A Tale of Bitcoin — 1. The Blockchain

September 03, 2019

Let’s introduce the first character in our story: Alice. Alice has been traveling around the world, from city to city, and on her travels has heard about this thing called a “blockchain”. Never one to be left behind, she decides to teach herself about all things crypto. At the most basic level, a blockchain is just a data structure that can hold any type of data. So, to keep things simple, Alice uses a blockchain to store the name of each city she visits and the date she arrived.

The genesis block

Data is stored in units called blocks, with the first referred to as the genesis block. Alice puts the first city she visited, London, and the date she arrived, January 2009, into the block.

Each block must have a unique identifier, a “fingerprint”, calculated using just the data in the block. Fortunately, cryptographers have just the tool — a cryptographic hash function. These special functions have a few crucial properties:

The output — called the hash — is always a fixed size, for example 256 bits, independent of the size of the input.
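It should be practically impossible to find two different inputs that produce the same hash.
It should be impossible to predict the hash of an input without computing it; even a tiny change to the input should produce a completely different hash.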
Given a hash it is practically impossible to determine what the input data is.

It’s not possible to prove that a given cryptographic hash function has all of these properties, but we have functions, such as SHA256 [1], that are considered sufficiently secure (at least for now). As an example, the Python snippet below calculates the SHA256 hash of two inputs that differ only in capitalization, yet the resulting hashes are completely different.

import hashlib

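# hexdigest() returns the hash as a hex string; printing the hash object itself would not
hash1 = hashlib.sha256(b"London - Jan 2009").hexdigest()
hash2 = hashlib.sha256(b"london - jan 2009").hexdigest()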

print(hash1)  # af668d8080e59607600592291494ad10d6f54e1d81033e0e4c5b890accb5ca3c
print(hash2)  # d592885149424c91fa3e382cf4320210cb25bc390d52a6cdb7afe14f762b1e5a
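The fixed-size property from the list of properties can also be checked directly. The short sketch below hashes three inputs of very different lengths (chosen here purely for illustration) and prints the size of each hash.

```python
import hashlib

# Inputs of very different lengths: empty, 17 bytes, and 17,000 bytes.
messages = [b"", b"London - Jan 2009", b"London - Jan 2009" * 1000]

for message in messages:
    digest = hashlib.sha256(message).digest()  # raw bytes rather than hex text
    print(f"input: {len(message):>5} bytes -> hash: {len(digest) * 8} bits")
```

Every line of output should report a 256-bit hash, however large the input.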

Creating more blocks

To store the data for a new city, Alice creates a new block. Except this time, she also includes the hash of the previous block inside the new block. For example, she visited Berlin in June 2011 and includes the hash of the previous block — A93B4 — to produce a new block with hash 5E7A8 (we’re only using these “hashes” as a demonstration; they’re not real). This process is repeated for all subsequent blocks.
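One way to picture this in code is the minimal sketch below. The block layout (a dictionary with `data`, `prev_hash` and `hash` fields), the helper names `block_hash` and `make_block`, and the use of an empty string as the genesis block’s previous hash are all assumptions made for illustration; they are not how Bitcoin encodes blocks.

```python
import hashlib

def block_hash(data, prev_hash):
    """Fingerprint of a block: SHA256 over the previous hash plus the data."""
    return hashlib.sha256(f"{prev_hash}|{data}".encode("utf-8")).hexdigest()

def make_block(data, prev_hash):
    """Build a block that records the hash of the block before it."""
    return {"data": data, "prev_hash": prev_hash, "hash": block_hash(data, prev_hash)}

# Assumption: the genesis block has no predecessor, so its previous hash is empty.
genesis = make_block("London - Jan 2009", "")
berlin = make_block("Berlin - Jun 2011", genesis["hash"])

print(genesis["hash"])
print(berlin["prev_hash"] == genesis["hash"])  # True
```

Because the previous hash is part of what gets hashed, the Berlin block’s own hash depends on the London block as well as on its own data.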

A blockchain

Referencing the hash of the previous block inside each new block links the blocks together, forming a chain of blocks — a blockchain. Alice can share the state of the blockchain with whomever she likes; but there’s a problem. What if there is a nefarious actor in the system, like Bob?
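Anyone who receives a copy of the chain can check that the links hold. The sketch below is one possible check, under the same kind of illustrative assumptions: blocks are dictionaries, the genesis block’s previous hash is an empty string, and the helper names and the Tokyo date are made up for the example.

```python
import hashlib

def block_hash(data, prev_hash):
    """Fingerprint of a block: SHA256 over the previous hash plus the data."""
    return hashlib.sha256(f"{prev_hash}|{data}".encode("utf-8")).hexdigest()

def build_chain(entries):
    """Turn a list of entries into linked blocks, oldest first."""
    chain, prev_hash = [], ""  # assumption: genesis has an empty previous hash
    for data in entries:
        current = block_hash(data, prev_hash)
        chain.append({"data": data, "prev_hash": prev_hash, "hash": current})
        prev_hash = current
    return chain

def is_valid(chain):
    """Check that each block points at its predecessor and matches its own hash."""
    prev_hash = ""
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True

# The Tokyo date is a placeholder; the post does not give one.
chain = build_chain(["London - Jan 2009", "Berlin - Jun 2011", "Tokyo - Mar 2013"])
print(is_valid(chain))  # True
```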

Making a forgery

Bob makes a copy of the blockchain and modifies it, changing Alice’s June 2011 trip from Berlin to Paris. Because its data has changed, the hash of the second block changes from 5E7A8 to 761AB. The third block is now invalid because its previous block hash no longer points to the “new” block that Bob changed, and so the link in the chain is broken.

But it’s trivial for Bob to relink the full chain again. He updates the previous block hash in the third block (Tokyo) to point to the new block. Since the content of the third block has now changed, its hash has changed to C19E4, so he must do the same for the next block, and so on until the end of the chain.

Bob can re-compute these hashes easily, and soon has a blockchain four blocks long, just like Alice — he has made a forgery. But, how can anyone tell the blockchain has been forged?
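Bob’s forgery can be reproduced in a few lines. This is only a sketch: the block layout, the helper names, the three-block chain and the Tokyo date are assumptions for illustration, and the hashes it produces are real SHA256 values rather than the short demonstration hashes used in the text.

```python
import hashlib

def block_hash(data, prev_hash):
    """Fingerprint of a block: SHA256 over the previous hash plus the data."""
    return hashlib.sha256(f"{prev_hash}|{data}".encode("utf-8")).hexdigest()

def build_chain(entries):
    chain, prev_hash = [], ""  # assumption: genesis has an empty previous hash
    for data in entries:
        current = block_hash(data, prev_hash)
        chain.append({"data": data, "prev_hash": prev_hash, "hash": current})
        prev_hash = current
    return chain

def is_valid(chain):
    prev_hash = ""
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True

# The Tokyo date is a placeholder; the post does not give one.
alice = build_chain(["London - Jan 2009", "Berlin - Jun 2011", "Tokyo - Mar 2013"])

# Bob copies the chain and rewrites the second block's data.
bob = [dict(block) for block in alice]
bob[1]["data"] = "Paris - Jun 2011"
print(is_valid(bob))  # False: the stored hash no longer matches the data

# Bob relinks: recompute every hash from the edited block to the end.
prev_hash = bob[0]["hash"]
for block in bob[1:]:
    block["prev_hash"] = prev_hash
    block["hash"] = block_hash(block["data"], prev_hash)
    prev_hash = block["hash"]

print(is_valid(bob))    # True: the forgery passes the same check
print(is_valid(alice))  # True: so does the original
```

Both chains pass the check, and the relinking loop costs Bob only one hash computation per block.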

Which version to trust?

Bob’s forged blockchain is completely valid — each block in the chain correctly points to the hash of the previous block in the chain. So, when Carol comes along, which blockchain should she trust? She doesn’t know Alice, so can’t simply ask her which one is the “true” version — the system is decentralized. If only we could somehow build trust into the blockchain itself. In the next post we will do just that, incorporating a concept called proof of work.

1. Bitcoin uses SHA256 as its block hashing algorithm. However, instead of hashing just once, it computes a double hash: SHA256(SHA256(data)).
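As a rough sketch of the double hash in this footnote, the second SHA256 is applied to the raw bytes of the first digest. The function name `double_sha256` is made up for this example, and the input is the travel entry from earlier rather than a real Bitcoin block header.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Compute SHA256(SHA256(data)), feeding the raw first digest into the second hash."""
    first = hashlib.sha256(data).digest()
    return hashlib.sha256(first).digest()

digest = double_sha256(b"London - Jan 2009")
print(digest.hex())  # still 256 bits, i.e. 64 hex characters
```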