Introduction
Every time you use Git to commit code, verify a Bitcoin transaction, or download a file via BitTorrent, you're using a data structure invented in 1979 by computer scientist Ralph Merkle. This invention, the Merkle tree (also called a hash tree), is one of the most important building blocks of modern cryptography and blockchain technology.
In this article, we'll explore what Merkle trees are, how they work, and why they're fundamental to how Anchora achieves 99.98% cost reduction in blockchain anchoring while maintaining cryptographic security.
The Problem Merkle Trees Solve
Imagine you have 1,000 documents and you want to prove that each one hasn't been tampered with. The naive approach is to:
- Hash each document individually
- Store each hash on a blockchain
- Pay for 1,000 separate blockchain transactions
At approximately $10 per transaction (Ethereum mainnet average), that's $10,000 to verify 1,000 documents. This makes blockchain verification economically impractical for most applications.
Merkle trees solve this problem elegantly: by combining multiple hashes into a tree structure, you can verify 1,000 documents with a single blockchain transaction, reducing the cost to approximately $0.04 (using Polygon via Anchora).
How Merkle Trees Work
A Merkle tree is a binary tree where:
- Leaf nodes contain hashes of individual data blocks
- Non-leaf nodes contain hashes of their children (combined)
- The root (Merkle root) is a single hash that represents ALL data in the tree
Step-by-Step Construction
Let's build a Merkle tree for 4 documents:
Documents: [Doc1, Doc2, Doc3, Doc4]

Step 1: Hash each document (leaf nodes)
┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│   H1    │   │   H2    │   │   H3    │   │   H4    │
│ 0x7f3a..│   │ 0xb2c1..│   │ 0x9e4d..│   │ 0x1a2b..│
└─────────┘   └─────────┘   └─────────┘   └─────────┘
     │             │             │             │
     └──────┬──────┘             └──────┬──────┘
            │                           │
Step 2: Hash pairs together
       ┌─────────┐                 ┌─────────┐
       │   H12   │                 │   H34   │
       │ 0x5678..│                 │ 0x9abc..│
       └─────────┘                 └─────────┘
            │                           │
            └─────────────┬─────────────┘
                          │
Step 3: Hash to get Merkle Root
                     ┌─────────┐
                     │  ROOT   │
                     │ 0x9876..│
                     └─────────┘
Where ("+" denotes byte concatenation):
H1 = SHA256(Doc1)
H2 = SHA256(Doc2)
H3 = SHA256(Doc3)
H4 = SHA256(Doc4)
H12 = SHA256(H1 + H2)
H34 = SHA256(H3 + H4)
ROOT = SHA256(H12 + H34)
The key insight is that any change to any document changes the Merkle root. If someone modifies Doc2, then H2 changes, which changes H12, which changes ROOT. This is called tamper evidence.
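The construction and the tamper-evidence property can be sketched in a few lines of Node.js. This is a minimal illustration using only the built-in `crypto` module, not Anchora's implementation; the helper names `sha256` and `merkleRoot` are made up for this example, and promoting an unpaired node unchanged is just one of several conventions for odd-sized levels.

```javascript
const { createHash } = require('crypto');

// SHA-256 of a Buffer or string, returned as a Buffer.
const sha256 = (data) => createHash('sha256').update(data).digest();

// Compute the Merkle root of a non-empty list of documents.
// Assumption: an unpaired node at the end of a level is promoted unchanged.
function merkleRoot(docs) {
  let level = docs.map((doc) => sha256(doc)); // leaf hashes H1..Hn
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(
        i + 1 < level.length
          ? sha256(Buffer.concat([level[i], level[i + 1]])) // H(left + right)
          : level[i] // odd node out: carry it up to the next level
      );
    }
    level = next;
  }
  return level[0];
}

const original = merkleRoot(['Doc1', 'Doc2', 'Doc3', 'Doc4']).toString('hex');
const tampered = merkleRoot(['Doc1', 'Doc2 (edited)', 'Doc3', 'Doc4']).toString('hex');

console.log(original);
console.log(tampered);
console.log(original === tampered); // false
```

Editing a single document changes its leaf hash, and that change propagates through every ancestor up to the root.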
The Power of Merkle Proofs
Here's where Merkle trees become truly powerful. To prove that Doc1 is part of the tree, you don't need all documents - you only need:
- H2 (sibling of H1)
- H34 (sibling of H12)
- The Merkle Root
To verify Doc1:
1. Compute H1 = SHA256(Doc1)
2. Compute H12 = SHA256(H1 + H2) ← H2 is in proof
3. Compute ROOT = SHA256(H12 + H34) ← H34 is in proof
4. Compare computed ROOT with stored ROOT
5. If match → Doc1 is authentic ✓
Proof for Doc1: [H2, H34]
Proof size: 2 hashes (O(log n))
For a tree with 256 documents, you only need 8 hashes (log2(256) = 8) to prove any single document: 8 × 32 bytes = 256 bytes, rather than the other 255 leaf hashes.
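The proof steps above can be sketched as follows. This is an illustrative version that assumes a power-of-two number of documents and records on which side each sibling sits. The names `buildLevels`, `getProof`, and `verify` are hypothetical, and the scheme differs from the sorted-pair hashing used in the merkletreejs example later in this article.

```javascript
const { createHash } = require('crypto');

const sha256 = (data) => createHash('sha256').update(data).digest();
const hashPair = (left, right) => sha256(Buffer.concat([left, right]));

// Build every level of the tree, from the leaves (index 0) up to the root.
// Assumption: the number of documents is a power of two, to keep the sketch short.
function buildLevels(docs) {
  const levels = [docs.map((doc) => sha256(doc))];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next = [];
    for (let i = 0; i < prev.length; i += 2) next.push(hashPair(prev[i], prev[i + 1]));
    levels.push(next);
  }
  return levels;
}

// Collect the sibling hash at each level on the path from leaf `index` to the root.
function getProof(levels, index) {
  const proof = [];
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const isRightChild = index % 2 === 1;
    const siblingIndex = isRightChild ? index - 1 : index + 1;
    proof.push({ hash: levels[depth][siblingIndex], siblingOnLeft: isRightChild });
    index = Math.floor(index / 2);
  }
  return proof;
}

// Recompute the root from the document and its proof, then compare.
function verify(doc, proof, root) {
  let hash = sha256(doc);
  for (const { hash: sibling, siblingOnLeft } of proof) {
    hash = siblingOnLeft ? hashPair(sibling, hash) : hashPair(hash, sibling);
  }
  return hash.equals(root);
}

const docs = ['Doc1', 'Doc2', 'Doc3', 'Doc4'];
const levels = buildLevels(docs);
const root = levels[levels.length - 1][0];
const proofForDoc1 = getProof(levels, 0); // [H2, H34]

console.log(proofForDoc1.length);                  // 2
console.log(verify('Doc1', proofForDoc1, root));   // true
console.log(verify('Forged', proofForDoc1, root)); // false
```

The verifier never sees Doc2, Doc3, or Doc4, only the two sibling hashes and the root.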
Proof Size Comparison
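As a rough illustration of how slowly proofs grow, the sketch below computes the maximum proof length for a few tree sizes. It assumes a binary tree and 32-byte digests; the function name `proofLength` and the sample sizes are chosen for this example only.

```javascript
// Maximum proof length, in hashes, for a binary tree with `leafCount` leaves.
// Counted with integer halving to avoid floating-point rounding in Math.log2.
function proofLength(leafCount) {
  let depth = 0;
  for (let width = leafCount; width > 1; width = Math.ceil(width / 2)) depth++;
  return depth;
}

const HASH_BYTES = 32; // SHA-256 and keccak256 both produce 32-byte digests

for (const n of [4, 256, 1024, 1000000]) {
  const hashes = proofLength(n);
  console.log(`${n} leaves: ${hashes} hashes, ${hashes * HASH_BYTES} bytes per proof`);
}
// 4 leaves: 2 hashes, 64 bytes per proof
// 256 leaves: 8 hashes, 256 bytes per proof
// 1024 leaves: 10 hashes, 320 bytes per proof
// 1000000 leaves: 20 hashes, 640 bytes per proof
```

Going from 256 leaves to a million adds only 12 hashes to each proof.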
Real-World Applications
1. Bitcoin and Blockchain
Every Bitcoin block contains thousands of transactions organized in a Merkle tree. The block header only stores the Merkle root (32 bytes), not all transactions. This allows Simplified Payment Verification (SPV) - lightweight clients can verify transactions without downloading the entire blockchain.
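Bitcoin's variant differs from the textbook construction in two ways: it applies SHA-256 twice, and it pairs the last hash with itself when a level has an odd number of entries. The sketch below illustrates just those two rules on placeholder inputs. It does not parse real transactions or handle the byte-order reversal used when transaction IDs are displayed, and the name `bitcoinMerkleRoot` is made up for this example.

```javascript
const { createHash } = require('crypto');

const sha256 = (data) => createHash('sha256').update(data).digest();
const sha256d = (data) => sha256(sha256(data)); // Bitcoin hashes twice

// Merkle root over transaction hashes (32-byte Buffers in internal byte order).
// When a level has an odd number of entries, the last one is paired with itself.
function bitcoinMerkleRoot(txHashes) {
  let level = txHashes.slice(); // copy so the caller's array is not modified
  while (level.length > 1) {
    if (level.length % 2 === 1) level.push(level[level.length - 1]);
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(sha256d(Buffer.concat([level[i], level[i + 1]])));
    }
    level = next;
  }
  return level[0];
}

// Placeholder "transactions" for illustration only; these are not real Bitcoin data.
const txHashes = ['tx-a', 'tx-b', 'tx-c'].map((tx) => sha256d(tx));
console.log(bitcoinMerkleRoot(txHashes).toString('hex'));
```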
2. Git Version Control
Git stores files and directories as content-addressed objects (blobs and trees) that together form a Merkle DAG. Each commit points to a root tree object whose hash covers the entire repository state at that point, and to its parent commits. This enables efficient comparison between versions and detection of any file changes.
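To make the content addressing concrete, the sketch below reproduces how Git derives the ID of a file's contents (a blob) in a default SHA-1 repository: it hashes a short header followed by the bytes of the file. The function name `gitBlobId` is made up for this example; tree and commit objects are hashed the same way with different headers and are not shown.

```javascript
const { createHash } = require('crypto');

// Object ID Git assigns to a blob: SHA-1 over the header "blob <size>\0" plus the content.
function gitBlobId(content) {
  const body = Buffer.from(content);
  const header = Buffer.from(`blob ${body.length}\0`);
  return createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

console.log(gitBlobId(''));
// e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known ID of the empty blob)
```

Because a tree object lists the IDs of its entries, changing one file changes the ID of every directory above it and of the commit that points to the root tree.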
3. IPFS and Content Addressing
The InterPlanetary File System (IPFS) uses Merkle DAGs (Directed Acyclic Graphs) to address content by its hash. Files are split into chunks, and the Merkle root becomes the file's permanent address.
4. Certificate Transparency
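Certificate Transparency, a framework originally developed at Google and specified in RFC 6962, uses Merkle trees to maintain append-only logs of SSL/TLS certificates. Because the logs are publicly auditable, a mis-issued or fraudulent certificate is much harder to use without detection.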
How Anchora Uses Merkle Trees
Anchora uses Merkle tree batching as its core innovation to make blockchain verification economically viable:
1. Collect records (up to 256 per batch)
   Records: [R1, R2, R3, ... R256]
2. Hash each record using SHA-256
   Hashes: [H1, H2, H3, ... H256]
3. Build Merkle tree using keccak256
   (keccak256 is Ethereum's hash function)
4. Anchor ONLY the Merkle root to Polygon blockchain
   Contract.anchorRoot(merkleRoot, recordCount)
   Cost: ~$0.001 per batch (regardless of batch size!)
5. Generate proof for each record
   R1.proof = [H2, H34, H5678, ...]
6. Store proofs with records
   Each record can be verified independently
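The batching steps can be sketched roughly as below. This is not Anchora's code: the names `toBatches` and `batchRoot` and the sample records are hypothetical. SHA-256 stands in for keccak256 because Node's built-in `crypto` module does not provide keccak256, and the on-chain call is left as a comment.

```javascript
const { createHash } = require('crypto');

const sha256 = (data) => createHash('sha256').update(data).digest();

const BATCH_SIZE = 256; // batch limit stated in the article

// Split records into consecutive batches of at most `size` entries.
function toBatches(records, size = BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < records.length; i += size) batches.push(records.slice(i, i + size));
  return batches;
}

// Root of a sorted-pair Merkle tree over the given leaf hashes.
// NOTE: SHA-256 stands in for keccak256 here, so roots will differ from on-chain values.
function batchRoot(leaves) {
  let level = leaves;
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 === level.length) {
        next.push(level[i]); // unpaired node is promoted unchanged
      } else {
        const pair = [level[i], level[i + 1]].sort(Buffer.compare);
        next.push(sha256(Buffer.concat(pair)));
      }
    }
    level = next;
  }
  return level[0];
}

// 1,000 made-up records: three full batches of 256 plus one of 232.
const records = Array.from({ length: 1000 }, (_, i) => `record-${i}`);
const batches = toBatches(records);

for (const batch of batches) {
  const root = batchRoot(batch.map((record) => sha256(record)));
  // In a real pipeline this is where the root would be submitted on-chain.
  console.log(`${batch.length} records -> root 0x${root.toString('hex')}`);
}
```

Sorting each pair before hashing means a verifier does not need to know whether a sibling sits on the left or the right.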
Cost Comparison
Implementation Deep Dive
Here's how Anchora implements Merkle trees using the merkletreejs library:
import { MerkleTree } from 'merkletreejs';
import keccak256 from 'keccak256';

// Step 1: Collect record hashes
const recordHashes = [
  'e3b0c44298fc1c149afbf4c8996fb924...',
  'd7a8fbb307d7809469ca9abcb0082e4f...',
  // ... up to 256 hashes
];

// Step 2: Create leaf nodes (keccak256 of each hash)
const leaves = recordHashes.map(h =>
  keccak256(Buffer.from(h, 'hex'))
);

// Step 3: Build Merkle tree
const tree = new MerkleTree(leaves, keccak256, {
  sortPairs: true // Ensures deterministic tree
});

// Step 4: Get Merkle root (this goes to blockchain)
const merkleRoot = '0x' + tree.getRoot().toString('hex');
console.log(merkleRoot);
// "0x9876543210fedcba9876543210fedcba..."

// Step 5: Generate proof for any record
const leaf = keccak256(Buffer.from(recordHashes[0], 'hex'));
const proof = tree.getProof(leaf);

// Convert proof to hex strings
const proofHex = proof.map(p =>
  '0x' + p.data.toString('hex')
);
console.log(proofHex);
// ["0x1234...", "0x5678...", "0x9abc...", ...]
Verification Process
// Verify a record against stored Merkle root
// (uses the same keccak256 import as the tree-building code)
function verifyMerkleProof(hash, proof, expectedRoot) {
  // Start from the leaf: the tree is built over keccak256(record hash),
  // so the record hash must be hashed once before walking the proof
  let computedHash = keccak256(Buffer.from(hash, 'hex'));

  for (const sibling of proof) {
    const siblingBuffer = Buffer.from(
      sibling.replace('0x', ''),
      'hex'
    );

    // Sort pair to ensure deterministic hashing (matches sortPairs: true)
    const combined = Buffer.compare(computedHash, siblingBuffer) < 0
      ? Buffer.concat([computedHash, siblingBuffer])
      : Buffer.concat([siblingBuffer, computedHash]);

    computedHash = keccak256(combined);
  }

  const computedRoot = '0x' + computedHash.toString('hex');
  // Compare case-insensitively, since hex roots may be stored in mixed case
  return computedRoot === expectedRoot.toLowerCase();
}

// Usage
const isValid = verifyMerkleProof(
  'e3b0c44298fc1c149afbf4c8996fb924...', // Record hash
  ['0x1234...', '0x5678...'],            // Merkle proof
  '0x9876543210fedcba...'                // Root from blockchain
);
console.log(isValid); // true
Security Properties
Merkle trees inherit their security from the underlying hash function (SHA-256 or keccak256). They provide:
- Collision resistance: It's computationally infeasible to find two different inputs that produce the same hash. With 2^256 possible outputs, even a birthday-style search needs on the order of 2^128 hash evaluations.
- Tamper evidence: Any modification to any leaf changes the Merkle root, so data cannot be altered without detection unless the hash function itself is broken.
- Selective disclosure: Merkle proofs reveal only the sibling hashes needed for verification. The other records in the tree are not disclosed.
- Proof of existence: Once a Merkle root is anchored to a blockchain, it serves as lasting evidence that the data existed at that time.
Why 256 Records Per Batch?
Anchora uses a batch size of 256 records for several reasons:
- Perfect Binary Tree: 256 = 2^8, so a full batch forms a perfect binary tree of depth 8 with exactly 8 hashes per proof
- Proof Size: 8 hashes x 32 bytes = 256 bytes per proof (manageable)
- Latency Balance: At 30-second batch intervals, this provides good throughput while maintaining reasonable anchoring times
- Cost Optimization: Gas cost is nearly constant regardless of batch size, so larger batches = lower per-record cost
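The trade-off in the list above can be made concrete with a little arithmetic. The sketch below takes the roughly $0.001 per-batch figure quoted earlier in this article and treats it as constant across batch sizes, which is an approximation. The name `batchTradeoff` and the sample batch sizes are chosen for illustration.

```javascript
const COST_PER_BATCH_USD = 0.001; // figure quoted in the article, assumed constant here
const HASH_BYTES = 32;            // size of a SHA-256 or keccak256 digest

// Proof size and amortized anchoring cost for a given batch size.
function batchTradeoff(batchSize) {
  let proofHashes = 0;
  for (let width = batchSize; width > 1; width = Math.ceil(width / 2)) proofHashes++;
  return {
    proofBytes: proofHashes * HASH_BYTES,
    costPerRecordUsd: COST_PER_BATCH_USD / batchSize,
  };
}

for (const batchSize of [16, 64, 256, 1024]) {
  const { proofBytes, costPerRecordUsd } = batchTradeoff(batchSize);
  console.log(
    `batch=${batchSize} proof=${proofBytes} bytes ` +
    `cost/record=$${costPerRecordUsd.toExponential(2)}`
  );
}
```

Under this assumption, quadrupling the batch size cuts the per-record cost by a factor of four while adding only two hashes (64 bytes) to each proof. The cost of larger batches is the extra time needed to fill them before anchoring.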
Comparison with Other Approaches
Conclusion
Merkle trees are a fundamental cryptographic primitive that enables efficient, secure, and privacy-preserving data verification. Their O(log n) proof size makes them ideal for blockchain applications where storage and bandwidth are expensive.
At Anchora, we leverage Merkle trees to batch 256 records into a single blockchain transaction, achieving 99.98% cost reduction while maintaining the same cryptographic security guarantees. Each record gets its own unique proof that can be verified independently without revealing other records in the batch.
Whether you're building credential verification systems, supply chain tracking, or audit logs, Merkle trees (and by extension, Anchora) provide the most efficient path to blockchain-backed data integrity.
Ready to leverage Merkle tree batching?
Anchora handles all the complexity of Merkle tree construction, blockchain anchoring, and proof generation. Start verifying data at a fraction of traditional costs.