Define New Transaction Schema

Hello everyone,

I would like to propose a new LIP for the roadmap objective “Define state and state transition model”. This LIP updates the terminology used for transactions and transaction properties and defines a new schema for serializing transactions.

I’m looking forward to your feedback.

Here is the complete LIP draft:

LIP: <lip number>
Title: Define new transaction schema
Author: Maxime Gagnebin <maxime.gagnebin@lightcurve.io>
Discussions-To: <Link to discussion in Lisk Research>
Type: Standards Track
Created: <YYYY-MM-DD>
Updated: <YYYY-MM-DD>

Abstract

This LIP defines a new schema for serializing transactions. The main change is that module and command identifiers are now of type bytes. This LIP also updates the terminology used for transactions and transaction properties.

Copyright

This LIP is licensed under the Creative Commons Zero 1.0 Universal.

Motivation

The Lisk protocol handles identifiers for transactions, modules, commands, and many more. The type of those identifiers is however not fully consistent, as some are of type uint32 (like module ID and chain ID) and others of type bytes (like transaction ID and block ID). Moreover, all identifiers used in the new state model to compute the store keys must be first converted to type bytes. Unifying all identifier types to be of type bytes simplifies their handling and avoids unnecessary type conversion.

Rationale

Unifying Identifier Type

The identifier types in the Lisk protocol are not fully consistent, as some are of type uint32 and others of type bytes. This means that special care is required to use the proper type whenever an identifier is used. Further, identifiers are often used to compute the store keys in the state tree, and for this purpose they must always be of type bytes. Hence, identifiers of type uint32 need to be converted first. As a result, the implementation very often converts those identifiers from their integer form (as schema entries) to the corresponding bytes (as store keys). The new transaction schema introduced in this LIP sets the module ID and command ID to type bytes.

Defining identifiers as type bytes also makes it possible to fix the length of the identifier. This was not possible when using identifiers of type uint32, as the full 4 bytes of the maximal range always had to be assumed when using the identifier in the state tree.
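To illustrate the conversion that a uint32 identifier forces, here is a minimal Python sketch. The big-endian encoding and the `store_key` helper are assumptions made for illustration only (big-endian matches the appendix example, where the module ID is 00000002); the actual store key layout is defined in the state model LIP, not here.

```python
MODULE_ID_LENGTH_BYTES = 4  # value taken from the Constants section of this LIP


def module_id_to_bytes(module_id: int) -> bytes:
    """Convert a uint32 module ID to fixed-length bytes (big-endian assumed)."""
    return module_id.to_bytes(MODULE_ID_LENGTH_BYTES, "big")


def store_key(module_id: bytes, key: bytes) -> bytes:
    """Hypothetical store key: the module ID is used as a prefix of the key."""
    if len(module_id) != MODULE_ID_LENGTH_BYTES:
        raise ValueError("module ID has the wrong length")
    return module_id + key


# With a uint32 ID, every use as a key prefix needs a conversion first.
old_style = store_key(module_id_to_bytes(2), b"\x01")

# With a bytes ID, the value from the schema can be used directly.
new_style = store_key(bytes.fromhex("00000002"), b"\x01")
```

Both calls produce the same key; the difference is only where the conversion has to happen.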

New Property Names

All properties in the proposed transaction schema are equivalent to the ones defined in LIP 0028. The only changes are the renaming of assetID to commandID and of asset to params, as described in the LIP “Update Lisk SDK modular blockchain architecture”.

Specification

The transaction schema defined in LIP 0028 is superseded by the one defined below.

The params property must follow the schema corresponding to the moduleID, commandID pair defined in the corresponding module.

All transaction procedures (serialization, deserialization, signature calculation, signature validation, and transaction ID calculation) follow the specifications already defined in LIP 0028. The resulting serializations and signatures are, however, different when the proposed transaction schema is used.

Constants

Global Constants

| Name | Value | Description |
|------|-------|-------------|
| MODULE_ID_LENGTH_BYTES | 4 | The length of module IDs. |
| COMMAND_ID_LENGTH_BYTES | 2 | The length of command IDs. |
| PUBLIC_KEY_LENGTH_BYTES | 32 | The length of public keys. |
| SIGNATURE_LENGTH_BYTES | 64 | The length of signatures. |

Configurable Constants

| Name | Mainchain Value | Description |
|------|-----------------|-------------|
| MAX_PARAMS_SIZE_BYTES | 14 KiB (14*1024 bytes) | The maximum allowed length of the transaction parameters. |

JSON Schema

Transactions are serialized using transactionSchema given below.

transactionSchema = {
    "type": "object",
    "required": [
        "moduleID",
        "commandID",
        "nonce",
        "fee",
        "senderPublicKey",
        "params",
        "signatures"
    ],
    "properties": {
        "moduleID": {
            "dataType": "bytes",
            "fieldNumber": 1
        },
        "commandID": {
            "dataType": "bytes",
            "fieldNumber": 2
        },
        "nonce": {
            "dataType": "uint64",
            "fieldNumber": 3
        },
        "fee": {
            "dataType": "uint64",
            "fieldNumber": 4
        },
        "senderPublicKey": {
            "dataType": "bytes",
            "fieldNumber": 5
        },
        "params": {
            "dataType": "bytes",
            "fieldNumber": 6
        },
        "signatures": {
            "dataType": "array",
            "items": {
                "dataType": "bytes"
            },
            "fieldNumber": 7
        }
    }
}

Validation

For a transaction trs to be valid, it must satisfy the following:

  • trs.moduleID is of length MODULE_ID_LENGTH_BYTES.
  • trs.commandID is of length COMMAND_ID_LENGTH_BYTES.
  • trs.senderPublicKey is of length PUBLIC_KEY_LENGTH_BYTES.
  • all elements of trs.signatures are of length SIGNATURE_LENGTH_BYTES.
  • trs.params is of length less than or equal to MAX_PARAMS_SIZE_BYTES.
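The checks above could be sketched as follows. This is a minimal Python illustration, assuming the transaction has already been deserialized into a dictionary whose byte properties are Python `bytes` objects; the function name `is_valid_transaction` and the placeholder values in the example are hypothetical.

```python
MODULE_ID_LENGTH_BYTES = 4
COMMAND_ID_LENGTH_BYTES = 2
PUBLIC_KEY_LENGTH_BYTES = 32
SIGNATURE_LENGTH_BYTES = 64
MAX_PARAMS_SIZE_BYTES = 14 * 1024  # mainchain value


def is_valid_transaction(trs: dict) -> bool:
    """Return True if all length checks listed above pass."""
    return (
        len(trs["moduleID"]) == MODULE_ID_LENGTH_BYTES
        and len(trs["commandID"]) == COMMAND_ID_LENGTH_BYTES
        and len(trs["senderPublicKey"]) == PUBLIC_KEY_LENGTH_BYTES
        and all(len(sig) == SIGNATURE_LENGTH_BYTES for sig in trs["signatures"])
        and len(trs["params"]) <= MAX_PARAMS_SIZE_BYTES
    )


# Placeholder transaction: only the lengths matter for these checks.
example_trs = {
    "moduleID": bytes.fromhex("00000002"),
    "commandID": bytes.fromhex("0000"),
    "nonce": 5,
    "fee": 1216299416,
    "senderPublicKey": bytes(32),
    "params": b"",
    "signatures": [bytes(64)],
}

# The same transaction with a 3-byte module ID fails the first check.
bad_trs = dict(example_trs, moduleID=bytes.fromhex("000002"))
```

Note that these are only the schema-level length checks; signature verification and module-specific validation of params are out of scope for this sketch.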

Backwards Compatibility

This LIP results in a hard fork as nodes following the proposed protocol will reject transactions following the previous schema, and nodes following the previous protocol will reject transactions following the proposed schema.

Reference Implementation

TBD

Appendix

In this section, we present a serialization example for a transfer transaction. To calculate the signature, we use the network identifier: networkID = 9ee11e9df416b18bf69dbd1a920442e08c6ca319e69926bc843a561782ca17ee and the tag: tag = "LSK_TX_".encode().

Transaction object to serialize:

myTrs = {
  "moduleID": '00000002',
  "commandID": '0000',
  "nonce": 5n,
  "fee": 1216299416n,
  "senderPublicKey": '6689d38d0d89e072b5339d24b4bff1bd6ef99eb26d8e02697819aecc8851fd55',
  "params": {
    "amount": 123986407700n,
    "recipientID": '2ca4b4e9924547c48c04300b320be84e8cd81e4a',
    "data": 'Odi et amo. Quare id faciam, fortasse requiris.'
  },
  "signatures": [
    '9953f164f9664e05526c1e3a10c4631715cdcb9fd4f376bf7db5334ded3bbc8470bce023d67c7aca16cf3389ea01f3e3c011820c317f1f5a63f98bb6d6b34b07',
    'a95fc611f7207ddaaaf7929f8f19b7c1cb2473ead20e9be99b8c0abc148b4ea35713ed296acbd6612f124698e96d57e6fde0eddbb998b86203d04ff3c3976700'
  ]
}

Binary message without signatures (132 bytes):

0a0400000002120200001805209883fdc3042a206689d38d0d89e072b5339d24b4bff1bd6ef99eb26d8e02697819aecc8851fd55324e0894e2a9f1cd0312142ca4b4e9924547c48c04300b320be84e8cd81e4a1a2f4f646920657420616d6f2e2051756172652069642066616369616d2c20666f7274617373652072657175697269732e
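The binary message above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the reference implementation: it hand-rolls the protobuf-style encoding (varint keys, wire type 0 for integers, wire type 2 for bytes and strings), and the field numbers of the params object (amount = 1, recipientID = 2, data = 3) are read off the binary message rather than taken from this LIP.

```python
def varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uint(field_number: int, value: int) -> bytes:
    """Key with wire type 0 (varint), followed by the value."""
    return varint((field_number << 3) | 0) + varint(value)


def encode_bytes(field_number: int, value: bytes) -> bytes:
    """Key with wire type 2 (length-delimited), the length, then the bytes."""
    return varint((field_number << 3) | 2) + varint(len(value)) + value


# Params of the transfer command (field numbers inferred from the message).
params = (
    encode_uint(1, 123986407700)
    + encode_bytes(2, bytes.fromhex("2ca4b4e9924547c48c04300b320be84e8cd81e4a"))
    + encode_bytes(3, "Odi et amo. Quare id faciam, fortasse requiris.".encode("utf-8"))
)

# Fields 1 to 6 follow transactionSchema; signatures (field 7) are left out.
message = (
    encode_bytes(1, bytes.fromhex("00000002"))  # moduleID
    + encode_bytes(2, bytes.fromhex("0000"))  # commandID
    + encode_uint(3, 5)  # nonce
    + encode_uint(4, 1216299416)  # fee
    + encode_bytes(
        5,
        bytes.fromhex("6689d38d0d89e072b5339d24b4bff1bd6ef99eb26d8e02697819aecc8851fd55"),
    )  # senderPublicKey
    + encode_bytes(6, params)  # params, serialized separately
)

print(len(message))
print(message.hex())
```

Under these assumptions, the script prints a length of 132 and the same hex string as above. The params object is serialized first and then embedded as an opaque bytes value, which is why its schema does not appear in transactionSchema.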

Transaction ID:

48d354de94872d87556d6be51d2b6418dcadcec9

First key pair:

private key = 42d93fa53d631181540ad630b9ad913835db79e7d2510be915513836bc175edc
public key = 6689d38d0d89e072b5339d24b4bff1bd6ef99eb26d8e02697819aecc8851fd55

Second key pair:

private key = 3751d0dee5ee214809118514303fa50a1daaf7151ec8d30c98b12e0caa4bb7de
public key = aa3f553d66b58d6167d14fe9e91b1bd04d7cf5eef27fed0bec8aaac6c73c90b3

I appreciate the consistency of the changes.

On a related note, this will make creating a transaction manually less intuitive.
The tooling will have to be updated so that in userland developers can still pass a plain old JavaScript number or a string, because that’s how the IDs will be documented.
So there is less type conversion internally (good) but more type conversion in userland (bad). One could even argue that this is worse, because it is more error-prone in userland, while the internals of the SDK are a more controlled environment.
Also, at some point (block or command lifecycle, plugins) developers may want to check the type of a command to conditionally execute some code. That will require them to compare the moduleID and the commandID of the transaction with the app constants, which will be known as numbers or strings, and that requires additional boilerplate.
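As a rough idea of the boilerplate in question, here is a small Python sketch. The constant names, the `is_command` helper, and the big-endian interpretation of the ID bytes are all hypothetical; they only illustrate the kind of comparison a developer would need to write.

```python
# Hypothetical app constants, documented as plain numbers.
TOKEN_MODULE_ID = 2
TRANSFER_COMMAND_ID = 0


def is_command(trs: dict, module_id: int, command_id: int) -> bool:
    """Compare the bytes IDs of a transaction with numeric app constants."""
    return (
        int.from_bytes(trs["moduleID"], "big") == module_id
        and int.from_bytes(trs["commandID"], "big") == command_id
    )


trs = {"moduleID": bytes.fromhex("00000002"), "commandID": bytes.fromhex("0000")}
is_transfer = is_command(trs, TOKEN_MODULE_ID, TRANSFER_COMMAND_ID)
```

Tooling could hide this helper, but every comparison in userland code would have to go through something like it.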

Using bytes instead of uint actually opens up the possibility of string-based IDs. Building on this, why not extend the allowed length to allow more descriptive IDs?

Using bytes instead of uint actually opens up the possibility of string-based IDs. Building on this, why not extend the allowed length to allow more descriptive IDs?

I like the idea here. If we allow general bytes, it is possible to have string IDs, and we could even merge the ID and the name of modules and commands in the SDK.
The downside is the increase in transaction size, and thus in transaction fees and storage size.
If we can agree on and accept that, I think it would be quite good for the experience of both developers and users.

On the other hand, the moduleID is also used in the state store, and for that it needs to have a fixed size. Furthermore, a shorter moduleID implies fewer hashes when calculating the stateRoot or the eventRoot.

We could still think about mapping an arbitrary string to a unique fixed-sized output (e.g. by hashing the moduleID before using it as a key prefix), but then we would lose control on the state tree structure, which allows, for instance, to reserve some module IDs and even assign a very favorable one to the interoperability module, which in turn decreases the size of witness inside a cross-chain update command.

Why does the fact that the moduleID is stored in the state store require it to have a fixed size?

Fewer hashes is a real benefit in theory, but does it have a practical impact on performance? If so, does it become a problem?

Can you elaborate a bit more on the last part please?

The moduleID is used as a prefix for all leaf keys in the state tree: lips/lip-0040.md at main · LiskHQ/lips · GitHub

This allows us to separate the module stores from each other. It also implies that it has to have a fixed size, because keys in the sparse Merkle tree must all have the same size: lips/lip-0039.md at main · LiskHQ/lips · GitHub

Regarding the number of hashes: it is true that there is probably not a huge practical impact on performance, but we should also consider that it would imply longer witnesses in the sparse Merkle tree (because there are more opportunities to “branch” the tree with a non-empty node; I hope it’s clear what I mean).

This is exactly what I mean would happen in the last part of my comment. We are reserving a very “favourable” moduleID for the interoperability module. Basically the interop module would have the only ID starting with a 1 (details are not 100% set in stone yet). All other modules would therefore fall under the same subtree (the one whose first bit is a 0), so that only a single hash is required in all inclusion proofs involving the interop store, as, for instance, those included in a cross-chain update command.
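To make the argument concrete, here is a simplified Python sketch that counts the non-empty sibling hashes on the path to a key in a sparse Merkle tree. It is an illustration under several assumptions: module IDs are treated as the whole keys (in practice they are only prefixes of the store keys), the value 80000000 for the interoperability module is hypothetical, and the counting rule is a simplification of the proof format in LIP 0039.

```python
from typing import Iterable


def bits(key: bytes) -> str:
    """Return the key as a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in key)


def non_empty_siblings(target: bytes, keys: Iterable[bytes]) -> int:
    """Count the non-empty sibling hashes on the path from the root to target."""
    t = bits(target)
    others = [bits(k) for k in keys if k != target]
    count = 0
    for depth in range(len(t)):
        # The sibling at this depth is non-empty if some other key shares the
        # first `depth` bits with the target and differs at the next bit.
        if any(o[:depth] == t[:depth] and o[depth] != t[depth] for o in others):
            count += 1
    return count


INTEROP_ID = bytes.fromhex("80000000")  # hypothetical: the only ID starting with bit 1
module_ids = [
    INTEROP_ID,
    bytes.fromhex("00000002"),
    bytes.fromhex("00000005"),
    bytes.fromhex("0000000d"),
]

interop_witness = non_empty_siblings(INTEROP_ID, module_ids)
token_witness = non_empty_siblings(bytes.fromhex("00000002"), module_ids)
```

In this toy setup the interoperability ID needs a single non-empty sibling (the root of the subtree containing all other modules), while the ID 00000002 needs three. Adding more modules under the 0 subtree does not change the count for the interoperability ID.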

For the fixed length, we can set a reasonable maximum length, and the Merkle tree implementation will pad the moduleID with null (or any other) bytes.
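A minimal Python sketch of this padding idea, assuming a hypothetical maximum length of 16 bytes and null-byte padding on the right:

```python
MAX_MODULE_ID_LENGTH = 16  # hypothetical maximum length


def pad_module_id(module_id: bytes, length: int = MAX_MODULE_ID_LENGTH) -> bytes:
    """Right-pad a variable-length module ID with null bytes to a fixed length."""
    if len(module_id) > length:
        raise ValueError("module ID is longer than the maximum length")
    return module_id.ljust(length, b"\x00")


padded = pad_module_id("token".encode("utf-8"))

# Caveat: two IDs that differ only by trailing null bytes map to the same key.
collision = pad_module_id(b"token") == pad_module_id(b"token\x00")
```

The collision in the last line shows that such a scheme would also have to forbid IDs ending in the padding byte, or encode the length in some other way.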

Regarding the Merkle tree witnesses, I’m not familiar enough with the topic: what is the impact of having more witnesses?

For the favorable ID, you can reserve some prefix; let’s say a moduleID cannot start with “@” or byte 0. You can even make that transparent in the implementation by automatically prefixing userland modules with some bytes you like.

Having a longer witness, i.e. the fact that more non-empty hashes are necessary to recalculate the Merkle root, impacts the size of CCUs, which in turn means higher fees for these transactions.

I think what you are proposing is basically to have longer moduleIDs (sorry if I took a while :slight_smile: )

Indeed, we can just reserve the first couple of bits to constrain the maximum size of witnesses in a CCU, so that would not necessarily be a problem. Keep in mind that this probably means that we lose a whole byte when we UTF-8 encode strings (since IIRC you need at least one byte per character).

Maybe we can then phrase the discussion in these terms: should we extend the length of moduleIDs so that it would be possible to also encode memorable string values in them (but still keeping them constant size)? And if this is the case, what could be a sensible length?

pro: more user friendly IDs
con: larger transactions
con: performance of the sparse Merkle tree (but we should really check this)
con: maybe the DB I/O is also marginally impacted (this is probably minimal)

To make sure we stay at one byte per character, we can restrict IDs to the ISO-8859-1 charset. But I don’t see why we should limit that, because the most common characters take only one byte with UTF-8. So using UTF-8 would cost almost nothing while improving the developer experience by reducing the number of bad surprises.
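The difference between the two encodings is easy to check in Python. The sample strings below are arbitrary examples, and `encoded_length` is a hypothetical helper:

```python
from typing import Optional


def encoded_length(text: str, encoding: str) -> Optional[int]:
    """Return the encoded length in bytes, or None if the text is not encodable."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for text in ["token", "café", "日本"]:
    print(text, encoded_length(text, "utf-8"), encoded_length(text, "latin-1"))
```

ASCII-only IDs such as "token" take one byte per character in both encodings. An accented character takes two bytes in UTF-8 but one in ISO-8859-1 (`latin-1` in Python), and characters outside that charset cannot be encoded in ISO-8859-1 at all.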

A “good enough” length would be 16 I believe although 24 or 32 would be more comfortable.

The length does not have to be fixed except for the Merkle tree (afaik), so the Merkle tree implementation will have to take care of the padding.

The DB performance won’t be impacted. We are talking about a few bytes, that’s nothing.

For the Merkle tree, this will have to be benchmarked.

I think we should merge this LIP as is, and come back to the question of the module ID length later on, once we have benchmarked the tree. If we change the length, it would also impact other LIPs regardless.

I think if we merge it as it is, it will be forgotten and we’ll never get back to it.

But this LIP does not define the length of module IDs per se. The most relevant LIP for this would be LIP 40, and then as I said other LIPs would need to be updated. Maybe you can open another discussion thread for this topic?

You’re right, the spirit is just to turn the IDs into bytes. However, the newly defined schema clearly states a fixed length. It’s up to you if you want to go forward with it somewhere else :wink:

Ok, my suggestion would be to move this discussion to this new thread: Length of `moduleID` property

I created a pull request for this proposal on GitHub:

https://github.com/LiskHQ/lips/pull/155