This article proposes a simple and scalable design to scan a blockchain, that works for full Bitcoin node and (soon) SPV.
WARNING! The Scanner is now Obsolete in NBitcoin making the part "A simple and extensible design for Blockchain Scanning" obsolete.
Introduction
In the previous article, I wrote about bitcoin (Introduction article and one about stealth address and two factor), I did not invent anything special, I just wanted to explain in simpler terms how things work.
On the other hand, this article proposes a simple and scalable design to scan a blockchain, that works for full Bitcoin node and (soon) SPV. I entirely made up this design, and it is experimental for now, and expect some bugs lurking around despite the 100+ unit tests I wrote. I expect you will use it and give me feedback about it.
Being experimental, the code of this article might have changed since I wrote it. But the general principle stays the same.
For people that know how bitcoin works, you can skip the “Bitcoin Basics” part.
Bitcoin Basics
What is a Transaction
For people not familiar with Bitcoin, I will explain how it works internally so you can understand the rest of this article, for others, skip it.
Bitcoin, as you have heard, wants to be a decentralized currency. Decentralized means that anybody can verify, confirm, and spread a transaction on the network, without any permission and authentication.
But what is a transaction? A transaction is an atomic operation that moves bitcoin, from sender(s ) to receiver(s ). (One transaction can involve multiple parties)
The senders fill the inputs part of the transaction (by convention at the top). The receivers fill the outputs part of the transaction (by convention at the bottom).
The input contains a reference to a previous output called an OutPoint
, and a proof of ownership from the sender called a scriptSig
(typically, but not limited to, a ECDSA signature, as I explain in this article).
The output contains a scriptPubKey
that specifies the way the receiver can prove its ownership to spend the funds. The output also contains an amount of BTC to spend.
Here is an example of Yellow paying Green (1 BTC), Blue (10 BTC), Orange (5 BTC).
Now imagine that this transaction is confirmed on the network, and Orange wants to pay 4 BTC to Green. In Bitcoin, you can’t partially consume an output. You consume it all at once.
So, the previous Orange output becomes referenced as an input. And outputs contains the 4BTC to Green, and 1 BTC back to orange.
Now, if this transaction is confirmed on the network, and Green wants to pay 9 BTC to Blue. In the same way, green, will reference its two outputs as input in the new transaction. And send back the surplus to himself.
How Transactions Are Verified
Being decentralized means that anybody can verify the validity of a transaction by himself. What do you need to verify?
- Each input references an unspent output.
- Each input correctly proves ownership of the referenced output (as I explain in this article).
- The sum of spent outputs (output referenced by the outpoint of an input) is equal to or more than the sum of outputs. The difference between these two amounts is called a fee.
So, if you want to verify the validity of a transaction, then you need all the confirmed transactions from the beginning of time (currently approximately 20 GO), and index all unspent output, to efficiently check that the referenced output is not already spent.
How Transactions Are Confirmed
A transaction is confirmed by miners. A miner is an agent that is economically incentivized to confirm transactions that are spreading into the network.
What motivates the miner?
- The miner receives all the fees of all transaction he confirms.
- The miner receives a newly created amount of BTC, called a reward. The amount is currently 25 BTC, but will decrease in the future until a maximum of 21 million BTC is emitted. (See this page.)
A miner gathers transactions he wants to confirm into a block, and the hash of such block needs to start with a certain number of zero, for the transaction to be “confirmed” by the network.
The number of zero is called “difficulty,” the difficulty is adjusted every 2 weeks depending on how fast blocks where mined, so you can expect one block discovered in the world every 10 minutes.
How to Instantly Confirm a Transaction
Some people argue that bitcoin takes too much time to confirm a transaction compared to say, visa. This is simply not true. You don’t need to wait so much time to be sure the sender of the money does not try to spend multiple time his outputs.
As explained in previous article, you can create an output that can be spent only if 2 different keys sign it correctly.
If the buyer and the merchant have a shared trusted party that will co-sign an input with the buyer, then the merchant can just verify that the transaction is signed by the trusted party and trust such trusted party will not sign another transaction to spend the same output.
What Is the Block Chain
The block chain is the ordered list of all blocks ever discovered in the bitcoin network since its creation. So it is the set of all transactions ever confirmed.
How to Get Your Balance
You need to scan the whole blockchain for unspent Outputs for which you know how to prove ownership. How to scan simply? This is what my article talks about next.
(Obsolete) a Simple and Extensible Design for Blockchain Scanning
Scanner
As said earlier, a scanner is an object that processes transactions from the blockchain for unspent output you are interested in tracking.
Here are some ideas of scanner, some of which I developed, some of which I will:
- Colored Coins Scanner
PubKeyHashScanner
(classic bitcoin address) PubKeyScanner
(same thing except the pubkey
is used in the scriptPubKey
instead of its hash) StealthPaymentScanner
(Get your funds with the scan key, Dark wallet compliant, as explained in this article)
A scanner provides two things: If it can, a pattern of data you want to find in a transaction with GetScannedPushData
, thanks to that, you will considerably lower scan time, because you won’t have to download most of the Block you are not interested in (thanks to BIP 37).
ScanCoins
which take a block and its height, and output a set of ScannedCoin
(which is nothing but a transaction stripped from irrelevant data for the scanner)
A scanner is stateless, so we will talk about how to save a scanner’s progression.
ScanState
If you don’t want to scan the blockchain everytime you restart your program, you need to store two kinds of data, that I encapsulated into the ScanState
class:
- the
Chain
you scanned so far - the
Account
that gives you the set on unspent outputs and your balance so far.
You program will have a full mainChain
that will reflect the current BlockChain
.
However, each ScanState
have their own chain that can be partial (If you create a Key
today, you don’t want to scan the whole blockchain from 2008 to now).
Everytimes you want to scan the mainChain
you call the Process
method, and pass to it a IBlockProvider
.
The method will :
- Cancel some
AccountEntry
in Account if Chain
is a fork. - Traverse the chain from the latest fork between
mainChain
and ScanState.Chain
,
updating both Account
and Chain
while doing it.
However Account
and Chain
might become very big, so in order to be efficient, one need to save Account
and Chain
incrementally. The approach use to do so is to save Account and Chain as a stream of change. (Some people call such design Event Sourcing). Every time the state of Account or Chain is changed, it is saved as a change, that will be replayed during deserialization.
The class responsible for saving such incremental changes is called ObjectStream
.
And it is used by both, Chain
and Account
, (ObjectStream<ChainChange>
and ObjectStream<AccountEntry>
respectively)
The only current implementation is StreamObjectStream<T>
which save your changes into a Stream
. (File stream, or in memory stream).
Let’s go back to ScanState.Process
.
A Chain
is a set of BlockHeader
, so it does not include the transactions of a Block
, this mean that the scanner need a way to get a Block
from the BlockHeader
.
This is the goal of IBlockProvider
.
You can searchedData
to IBlockProvider.GetBlock
, so, when I will implement SPV, the block provider will only download block that contains pushes with the searchedData
.
Currently, it exists only one implementation that require the full blockchain on your computer: IndexedBlockStore
.
IndexedBlockStore
requires two things: a key value store called NoSqlRepository
(only existing implementation is SQLiteNoSqlRepository
).
And a BlockStore
which is a raw, enumerable set of downloaded block. The format it uses is the same as bitcoinq. (blkXXX.dat)
In summary, here is how to Index your own IndexedBlockStore
.
var store = new BlockStore("bitcoin/datadir", Network.Main);
var index = new SQLiteNoSqlRepository("myindex.db", createNew: true);
var indexedStore = new IndexedBlockStore(index, store);
indexedStore.ReIndex();
ScanState.Process
need a second thing: mainChain
which is the current full chain of BlockHeader
. You can get this one very easily from the Bitcoin network by using NodeServer
.
NodeServer
is misnamed, because it is a server only if you call Listen()
, else it is a simple client Node on the Bitcoin Network.
NodeServer client = new NodeServer(Network.Main);
var inFile = new StreamObjectStream<ChainChange>
(File.Open("MainChain.dat",FileMode.OpenOrCreate));
var mainChain = client.BuildChain(infile);
In this example, NodeServer.BuildChain
will
- build the chain from the given
ObjectStream<ChainChange>
, by replaying the changes. - download all the
BlockHeader
from the last fork of the chain and the downloaded chain (takes 1 minutes if you download the current 300 000 block headers)
Manual Scanning
All the previous scans are good if you want to track interesting transfers on the BlockChain. But imagine you want to make a simple analytic application?
For example, is some case, I needed to make a query to retrieve all transactions between December 2012, and January 2013, with an amount around 1100 BTC.
First, let's get the chain of all transactions (1 minutes), you only need to do one time, because the chain is saved to MainChain.dat
var node = Node.ConnectToLocal(Network.Main);
node.VersionHandshake();
var chain = node.GetChain();
Then, let's build an index of all blocks, so we can query blocks by blockId
after.
BlockStore store = new BlockStore("E:\\Bitcoin\\blocks",
Network.Main); //the folder is the blocks folder of all blocks saved by Bitcoin-QT,
//the original bitcoin client application, this "store" can only browse blocks sequencially
IndexedBlockStore index =
new IndexedBlockStore(new SQLiteNoSqlRepository("PlayIndex"), store); //Save a SqlLite
//index in a file called PlayIndex
index.ReIndex(); //Index, to do one time only to fill the index
Once you have both, the chain and the index, you can run the analysis.
var headers =
chain.ToEnumerable(false)
.SkipWhile(c => c.Header.BlockTime < from)
.TakeWhile(c => c.Header.BlockTime < to)
.ToArray();
var target = Money.Parse("1100");
var margin = Money.Parse("1");
var en = new CultureInfo("en-US");
FileStream fs = File.Open("logs", FileMode.Create);
var writer = new StreamWriter(fs);
writer.WriteLine("time,height,txid,value");
var lines = from header in headers
let block = index.Get(header.HashBlock)
from tx in block.Transactions
from txout in tx.Outputs
where target - margin < txout.Value && txout.Value < target + margin
select new
{
Block = block,
Height = header.Height,
Transaction = tx,
TxOut = txout
};
foreach(var line in lines)
{
writer.WriteLine(
line.Block.Header.BlockTime.ToString(en) + "," +
line.Height + "," +
line.Transaction.GetHash() + "," +
line.TxOut.Value.ToString());
}
writer.Flush();
What's Next?
Next of NBitcoin
will be the creation of an IBlockProvider
for SPV scenario. It will use NodeServer
, an IndexedBlockStore
cache, and BloomFilter
to download block that are needed by the scanner. Such BlockProvider
will be thread safe, so multiple Scanner can download blocks at the same time, while sharing the same cache.
Also, NBitcoin lacks a coherent model of Wallet that reuses this concept of ScanState
, this will come soon though.
Conclusion
There is not tons of code in this article, but I expect you to discover and use this lib thanks to the 150+ unit test suite I wrote. (Once again, I also ported tests from bitcoin core.)
This lib took me a tremendous amount of time to code, so I’ll be more than happy you give me your feedback or improvement, or tip at 15sYbVpRh6dyWycZMwPdxJWD4xbfxReeHe, so I can keep being motivated, eat pizza and drink coffee!
History
- 10th June, 2014: Initial version
- 5th March, 2021: Updated code