Define state model and state root

alessandro.ricottone · May 22, 2021, 2:00pm

Dear all,

I would like to propose a LIP for the roadmap objective: “Define state model and state root”.
This LIP specifies a new state model for a Lisk blockchain.

Here is the complete LIP draft:

LIP: <LIP number>
Title: Define state model and state root
Author: Alessandro Ricottone <alessandro.ricottone@lightcurve.io>
Type: Informational
Created: <YYYY-MM-DD>
Updated: <YYYY-MM-DD>
Required: Introduce sparse Merkle trees LIP

Abstract

The state of an interoperable chain in the Lisk ecosystem is organized as a sparse Merkle tree, the state tree, built on top of a generic key-value storage for each module of the chain. This allows to authenticate the whole state with the tree Merkle root, the state root. In this LIP, we define the state model of a Lisk blockchain and the construction of the state tree from which the state root is calculated.

Copyright

This LIP is licensed under the Creative Commons Zero 1.0 Universal.

Motivation

A blockchain created with the Lisk SDK is organized in a modular way: each module registered in the chain defines its own state and the possible state transitions, i.e., the transactions defined within the module or the handlers that can be called by other modules. This is reflected by the state tree: each module maintains its own key-value storage and sparse Merkle tree, which are finally merged together to form the complete state tree. This LIP specifies the general format for the key-value storages and the way in which they are combined together in the state tree.

Organizing the state of a blockchain as a Merkle tree allows to cryptographically authenticate the whole state with a single hash, the state root. The state root property is calculated at the end of the block processing and included in the block header. When a certificate is posted to a chain, the state root is used to authenticate the information contained in it against the block header, which has been signed by the sending chain validators. For example, a node can verify that the cross-chain messages in the certificate payload have been included in the outbox of the sending chain by recomputing the state root signed in the block header from the outbox root. This is done using a “witness”, i.e. an array of hashes used to recompute the root starting from the known elements of the tree.

In the future, the state root could further facilitate the introduction of light clients. Light clients would only check the validity of block headers, without fully verifying every transaction in a block. The state root can then be used to verify the correctness of a portion of the state, such as a user balance, requested by the light client to a full node.

Furthermore, the state root could be used for faster synchronization: a node could download a snapshot of the blockchain at a block close to the tip, therefore synchronizing quickly to the last state, while in the background it downloads and verifies blocks in the past. In this case, the state root would be used to verify the validity of the snapshot.

Finally, the state root can be employed to improve the regenesis protocol. A node could download the state root from a trusted party and then check that the genesis block of a chain defines a state that corresponds to the trusted state root.

Rationale

Each module registered in a chain maintains a separate key-value storage. In this LIP, we specify how the entries of the key-value storage are inserted in a sparse Merkle tree (introduced in LIP “Introduce sparse Merkle trees”), the state tree. The state root is then set to the root of the state tree. The key-value entries of the persistent storage are specified in the respective modules. There is no a-priori separation between user and chain accounts, as they are treated as any generic key-value entry.
Each entry specifies a storage prefix and a storage key. The storage prefix identifies different ‘‘buckets’’ within the module storage, i.e. group of entries with different scope. The storage key identifies different entries within a bucket.

Keys of the state tree leaf nodes are obtained by concatenating the module ID, the storage prefix and the hash of the storage key. Prepending the module ID allows to separate the state tree in smaller subtrees (one per module), each managed by the respective module, which is also responsible for creating witnesses relative to properties stored in their database (see Figure 1). Similarly, the storage prefix separates the different buckets within the same database into different subtrees. Storage keys are hashed in order to randomize them. This effectively mitigates spam attacks that aim at creating many database entries (e.g., accounts) in a certain key neighborhood (e.g., close to the target account address), in order to increase the length of inclusion proofs for that key (in our SMT implementation).

Given B buckets in a module store and N entries in the bucket, the module subtree will contain on average Log(B) + Log(N) layers. The expected number of operations needed to validate a witness from a leaf to the state root is 32 (for the 4 bytes of the module ID), plus 16 (for the 2 bytes of the storage prefix), plus Log(N) for the module subtree. Empty hashes are not included in the witness, and the resulting witness size is therefore Log(M) + Log(B) + Log(N), where M indicates the number of modules registered in the chain.

Specification

Figure 1: The general structure of the state sparse Merkle tree for a Lisk blockchain defining two custom modules. The state root is the Merkle root. Each module defines a different generic key-value store.
The keys of the leaf nodes start with the module IDs, so that each module subtree is separated from the others. Not all modules are depicted in this example.

The state root is the root of a sparse Merkle tree (specified in LIP “Introduce sparse Merkle trees”), the state tree. Each registered module of the chain defines a generic key-value storage. Storage values can be arbitrary byte arrays, but they typically correspond to data structures serialized according to LIP 0027. The corresponding leaf nodes have keys given by the concatenation of the module ID, storage prefix, and the SHA-256 hash of the storage key. Here, module IDs are serialized as uint32, with a fixed length of 4 bytes, while storage prefixes are serialized as uint16, with a fixed length of 2 bytes. This implies that a leaf-node key of the state tree has a fixed length of 4+2+32 = 38 bytes.

In summary, for a module identified by moduleID and an entry with the storage prefix storagePrefix, storage key storageKey, and storage value storageValue, the corresponding leaf node is added to the state tree using SMT.update(moduleID || storagePrefix || hash(storageKey), hash(storageValue)), where || indicates the bytes-concatenation operation. This leaf node will have the following properties:

leaf.key  = moduleID || storagePrefix || hash(storageKey),
leaf.value = hash(storageValue),
leaf.hash = leafHash(leaf.key, leaf.value),

where the leafHash function is defined in LIP “Introduce sparse Merkle trees”.

The value of state root at a certain height h, stateRoot(h), is the root of the sparse Merkle tree computed from all key-value entries of all modules (as described above), after processing the block at height h.

Backwards Compatibility

This LIP is purely informational, hence it does not imply any incompatibilities per se.

alessandro.ricottone · June 28, 2021, 7:14am

I created a pull request to merge this proposal in the LIP repository.

alessandro.ricottone · July 27, 2021, 12:11pm

The PR has been merged