Blockchain apps must be closed systems

Blockchain tech, especially smart contracts, is the hot new “internet”. Since the creation of Bitcoin, we’ve seen the rise of the public smart contract platform Ethereum and of several private systems like the Linux Foundation’s Hyperledger. These distributed ledgers have become a brand new foundation to build apps on, as app developers hope to leverage the additional trust that these ledgers are supposed to provide by virtue of their distributed nature.

(Cross posted here from blog.imaginea.com - Blockchain apps must be closed systems)

Demian Brener of OpenZeppelin, which provides a set of reusable smart contracts atop Ethereum, writes -

There are no tools for developers to easily create, test, verify and audit smart contracts, and do so collaboratively.

As common contract tools and libraries accumulate, a blockchain app developer’s job becomes easier. But how should we audit the set of contracts that make up our applications to ensure that they leverage the blockchain platform in a foolproof manner?

That’s a hard problem, especially for a domain expert looking to leverage smart contracts as a platform.

To explain why it is hard, we need to shed light on what a blockchain really is, in terms of the system properties it offers to apps. In short, a blockchain provides you a database -

  1. whose records cannot be mutated once created,
  2. which is very hard to tamper with, and
  3. which is auditable by non-participants.

The immutability of records comes from chaining each new block of records onto the existing chain (hence “blockchain”). Tampering is evident because each block carries a cryptographic hash of the block before it: if one record is altered, the hash of its block, and therefore the link stored in every subsequent block, no longer matches. Furthermore, even if a tamperer recomputed all those hashes so that they verified fine, all blockchain peers would still have to agree to the change. Auditability comes from an open and inexpensive mechanism for recomputing these hashes from the content of each block.
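To make the hash chaining concrete, here is a minimal Python sketch, assuming a toy chain held in memory, with SHA-256 over a canonical JSON encoding of each block. The `Block`, `append` and `verify` names and the sample records are illustrative, not taken from any real blockchain implementation, and the “peers must agree” step is modelled crudely as a single trusted head hash.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    prev_hash: str  # hash of the previous block; this is the chain link
    records: list   # the records stored in this block

    def hash(self) -> str:
        # Serialize canonically so identical content always yields the same hash.
        payload = json.dumps(
            {"prev": self.prev_hash, "records": self.records}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, records: list) -> None:
    """Add a block that points at the hash of the current last block."""
    prev = chain[-1].hash() if chain else GENESIS_PREV
    chain.append(Block(prev, list(records)))


def verify(chain: list, trusted_head: str) -> bool:
    """Recompute every link, then compare the last block with the agreed head hash."""
    if not chain or chain[0].prev_hash != GENESIS_PREV:
        return False
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].hash():
            return False
    return chain[-1].hash() == trusted_head


chain: list = []
for batch in (["alice pays bob 5"], ["bob pays carol 2"], ["carol pays dan 1"]):
    append(chain, batch)
head = chain[-1].hash()  # stand-in for the head hash the peers agreed on

print(verify(chain, head))  # True
chain[0].records[0] = "alice pays bob 500"
print(verify(chain, head))  # False: block 1's stored link no longer matches
```

Re-linking the later blocks after tampering repairs every `prev_hash`, but it still changes the head hash, which is why a tamperer would also need the peers to accept a new head.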

The cryptography that goes into providing these properties is nothing short of genius, but the value of these properties is, relatively speaking, not hard to articulate and understand. After all, when these properties are achieved without a central authority stepping in, they should vastly increase our trust in such a system.

If we put aside the immutability and tamper evidence properties, and focus on the auditability of the records on a blockchain, we can see that all information placed in these records must be independently and timelessly verifiable. In other words, the data placed there cannot refer to entities that can change over time. For example, if we want to log that The New York Times wrote an article about Donald Trump on 29th May 2017, it is not sufficient to just place a link to the article, note down the date and time, and add it as a record on the blockchain. The server serving up the link may go down. The maintainer may change the content of the link to say something else entirely. The link may redirect to another article or to a cat GIF. Any of these changes makes it impossible to audit the record at an arbitrary later time.

In that sense, the blockchain is best not used as an arbitrary database. Doing so unnecessarily increases the cost of storing the record while reaping none of the benefits listed above. One might as well create a regular public RDBMS and place the info on it.

Only information of a certain nature benefits from existing on a blockchain.

So, to create a public record that the NYT did publish such an article, what can we do? Let’s say we have a system that archives all NYT articles as they appear. We could store a copy of the entire article as a record. But what then stops anyone from uploading any content and claiming that it was published by the NYT? One option is to store a cryptographic hash of the article on the blockchain while archiving the whole article in a regular database. This also minimizes the amount of data we store on the blockchain. Since the hash can be recomputed from the archived article at any time, we can prove, with negligible probability of error, that the archived article is exactly what was logged on the blockchain.
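A minimal sketch of this split between a hash on-chain and the full text off-chain, assuming an in-memory list as the ledger and a dict as the archive (both hypothetical stand-ins) and SHA-256 as the hash:

```python
import hashlib

# Hypothetical stand-ins: `ledger` plays the blockchain (only digests go there)
# and `archive` plays the regular database that holds the full text.
ledger: list = []
archive: dict = {}


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def log_article(text: str, published: str) -> str:
    """Archive the full article off-chain and log only its hash on-chain."""
    digest = sha256_hex(text)
    archive[digest] = text
    ledger.append({"kind": "article", "sha256": digest, "published": published})
    return digest


def verify_archived(digest: str) -> bool:
    """Recompute the hash from the archived copy and check it against the ledger."""
    on_ledger = any(entry["sha256"] == digest for entry in ledger)
    text = archive.get(digest)
    return on_ledger and text is not None and sha256_hex(text) == digest


d = log_article("Placeholder article body", published="2017-05-29")
print(verify_archived(d))  # True
archive[d] = "Quietly edited text"
print(verify_archived(d))  # False: the archived copy no longer matches the logged hash
```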

But what if someone records a fake article on the blockchain? What we need is a way to show that a hash logged on the blockchain was indeed computed from a genuine NYT article. For example, if CDNs logged hashes of the articles they serve (e.g. ETags), then showing that the hash we logged on the blockchain also appears in the logs of a few independent CDNs, over which we have no administrative control, would increase the trust others place in our claim.
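The corroboration idea can be sketched as a quorum check. This assumes, hypothetically, that each independent CDN exposes the set of content hashes it has served; the source names and the default quorum of 2 are illustrative. Real HTTP ETags are opaque validators and are not guaranteed to be content hashes, so this treats the logs as explicit hash logs.

```python
from __future__ import annotations


def corroborating_sources(digest: str, independent_logs: dict[str, set[str]]) -> list[str]:
    """Names of the independent logs that contain this digest."""
    return sorted(name for name, hashes in independent_logs.items() if digest in hashes)


def is_corroborated(digest: str, independent_logs: dict[str, set[str]], quorum: int = 2) -> bool:
    """True if at least `quorum` independent logs saw the same hash."""
    return len(corroborating_sources(digest, independent_logs)) >= quorum


# Hypothetical hash logs from CDNs we do not control.
logs = {"cdn-a": {"abc", "def"}, "cdn-b": {"abc"}, "cdn-c": {"xyz"}}
print(corroborating_sources("abc", logs))  # ['cdn-a', 'cdn-b']
print(is_corroborated("abc", logs))        # True
print(is_corroborated("def", logs))        # False: only one source saw it
```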

In short, injecting external data into a blockchain record is a non-trivial problem.

On the other hand, suppose we’re logging information about a book. We can refer to the book by its ISBN. The ISBN database is maintained reliably across the globe and lets us check, at any time, any metadata associated with the book once we know its ISBN. The ISBN, therefore, is an auditable, timeless data item that can be placed in a blockchain record. The probability that the highly replicated ISBN database is tampered with in an undetectable way is pretty low: to change the book that a number refers to, we would need to change not only the database content but also every printed or downloaded copy of the book that features the ISBN. To strengthen this further, if the ISBN organization registered each new book on a public blockchain instead of in a conventional database, the record’s cryptographic hash could be used in place of the ISBN to refer to the book in a timeless manner.
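Using a record’s own hash as its identifier can be sketched as follows, assuming records are JSON-serializable dicts and that canonical JSON (sorted keys, fixed separators) is an acceptable serialization; the book fields are made up. The identifier is stable for identical content and changes on any edit, which is what makes it a timeless reference.

```python
import hashlib
import json


def record_id(record: dict) -> str:
    """Content address: SHA-256 of a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical book metadata.
book = {"title": "An Example Book", "author": "A. N. Author", "edition": 1}
same_book = {"edition": 1, "author": "A. N. Author", "title": "An Example Book"}
revised = dict(book, edition=2)

print(record_id(book) == record_id(same_book))  # True: key order does not matter
print(record_id(book) == record_id(revised))    # False: any change yields a new ID
```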

Taken to its conclusion, this means any information that a blockchain record refers to is best folded into the blockchain itself, making the entire system closed.

If we fail to create such a closed system, the next best thing is to only refer to highly trusted and timeless systems. Since the strength of the blockchain system is increased by the volume of transactions recorded on it, folding statements produced by these highly trusted systems into blockchain records would, effectively, carve them on digital diamonds.

In a sense, once a blockchain app touches a piece of information, the demand for immutability spreads to every part of the system that information comes from. Immutability is infectious.

Not even something as common as an email address necessarily qualifies as immutable data. Of course, nothing stops us from placing an email address in a record, but what does it refer to? Does it refer to the author of a book? If so, has the author changed her email address since? Has the provider reassigned the email address to someone else? Was the email address placed there without the consent of its owner? Did it even exist in the first place? Should we include some proof that an email actually got sent from this address? Should we include info about the IP address from which the mail got sent? What about the DKIM signature of the mailer that sent it? Different applications would require different answers to these questions. Bitcoin, for example, neatly sidesteps this identity problem by creating its own identity system, the wallet address, which is created within the blockchain system itself.
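To illustrate an identity that exists entirely within the system, here is a simplified, Bitcoin-flavoured address derivation: the address is computed only from a public key, with a Base58Check-style encoding (a version byte plus a 4-byte double-SHA-256 checksum). This is a sketch, not Bitcoin’s exact scheme: truncated SHA-256 replaces Bitcoin’s RIPEMD-160 step, and the public key is a placeholder byte string rather than a real key.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    # Each leading zero byte is written as a leading "1".
    return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out


def b58decode(s: str) -> bytes:
    n = 0
    for ch in s:
        n = n * 58 + B58.index(ch)  # raises ValueError on a non-Base58 character
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * (len(s) - len(s.lstrip("1"))) + body


def checksum(payload: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def address_from_pubkey(pubkey: bytes, version: bytes = b"\x00") -> str:
    # Simplification: truncated SHA-256 stands in for RIPEMD-160(SHA-256(pubkey)).
    payload = version + hashlib.sha256(pubkey).digest()[:20]
    return b58encode(payload + checksum(payload))


def is_valid_address(address: str) -> bool:
    """Self-validating: no external registry is consulted, only the checksum."""
    try:
        raw = b58decode(address)
    except ValueError:
        return False
    return len(raw) == 25 and checksum(raw[:-4]) == raw[-4:]


placeholder_pubkey = bytes.fromhex("02" + "11" * 32)  # not a real key
addr = address_from_pubkey(placeholder_pubkey)
print(addr.startswith("1"), is_valid_address(addr))  # True True
```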

So, how do we audit a blockchain application expressed as smart contracts?

We need to ensure that, literally, every bit of information we include in our contract comes from a timeless source.
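One way to mechanize part of this check is a provenance audit over a contract’s fields. The sketch below assumes each field carries a hypothetical `source` tag. It accepts in-chain references only if they resolve to an existing record, accepts an allowlist of external systems the auditor chooses to treat as timeless (ISBN here, following the book example), and flags everything else, such as email addresses. Field names and values are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    value: str
    source: str  # hypothetical provenance tag: "chain", "isbn", "url", "email", ...


# Assumption: external systems this auditor is willing to treat as timeless.
TIMELESS_EXTERNAL = {"isbn"}


def audit(fields: list[Field], on_chain_ids: set[str]) -> list[str]:
    """Describe every field that does not come from a timeless source."""
    problems = []
    for f in fields:
        if f.source == "chain":
            # A reference inside the closed system must resolve to an existing record.
            if f.value not in on_chain_ids:
                problems.append(f"{f.name}: no record {f.value!r} on the chain")
        elif f.source not in TIMELESS_EXTERNAL:
            problems.append(f"{f.name}: source {f.source!r} is not timeless")
    return problems


record = [
    Field("book", "9f2c", "chain"),
    Field("isbn", "978-0-00-000000-2", "isbn"),
    Field("author_email", "author@example.com", "email"),
]
print(audit(record, on_chain_ids={"9f2c"}))
# ["author_email: source 'email' is not timeless"]
```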

With data that goes on the blockchain, it is all the more imperative to answer the timeless questions of epistemology -

  1. What do you know? - i.e. what does the data on the blockchain signify?
  2. How do you know it? - i.e. how are you sure that it indeed does signify what you claim it to?