16. Storage

Blockchains generate several types of data that need to be persisted. Over the years, many database systems have been developed to meet different use cases. In-memory databases usually offer the best access speed, but because RAM capacity is limited, keeping everything in memory isn’t always practical. At the other end, cloud storage services like Amazon S3 provide cost-efficient options for archived data at the cost of access speed.

1. Embedded DB

Most blockchains rely on embedded databases like LevelDB and RocksDB to store various types of data. Embedded databases are known for good performance and low latency. In addition, because everything is in one piece, little maintenance is required. For example, Ethereum stuffs everything (Merkle tree, blocks, receipts, etc.) into a LevelDB instance.

However, as the network grows, storage becomes a big problem. First, embedded databases are difficult to scale once they run out of storage space. Second, embedded databases are still not fast enough to support high TPS.

2. Storage Service

Arcology replaces the stateDB with a storage service. The storage service provides a set of standard interfaces for external parties to use. The actual database system is hidden behind these service interfaces to make the service vendor-agnostic. The databases in the storage service only store flat state data, meaning the Merkle tree is no longer tangled with the state data. The eshing service handles Merkle proofs instead.
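The idea of hiding the database behind a service interface can be sketched as follows. This is a minimal illustration, not Arcology's actual API: the names `Backend`, `MemoryBackend`, `StorageService`, `read_state` and `write_state` are all hypothetical, and a plain dictionary stands in for a real database.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Backend(ABC):
    """Hypothetical minimal contract that any underlying database must satisfy."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        ...


class MemoryBackend(Backend):
    """In-memory stand-in for a real database (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class StorageService:
    """Callers see only this interface; the backend can be swapped freely."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def read_state(self, path: str) -> Optional[bytes]:
        # Flat key-value lookup: no Merkle tree is traversed here.
        return self._backend.get(path)

    def write_state(self, path: str, value: bytes) -> None:
        self._backend.put(path, value)


service = StorageService(MemoryBackend())
service.write_state("account/0xabc/balance", b"100")
```

Because callers depend only on `StorageService`, replacing `MemoryBackend` with another `Backend` implementation would not require any change on the caller's side.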

3. Decoupling

This design has some unique advantages. First, because everything is hidden behind the storage service, the underlying databases are completely decoupled from the data users. An embedded database instance is no longer the only choice. Users can even choose to run multiple database instances behind the service to fulfill different needs. Second, since the state data is no longer stored in the Merkle tree, it can be retrieved and updated through a direct key-value lookup, reducing search complexity from O(log n) to O(1).
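The difference between the two lookup costs can be illustrated with a rough sketch. The assumptions are loud ones: a binary search over sorted keys stands in for walking a balanced tree (a real Merkle tree has a different layout), a Python dictionary stands in for the flat key-value store, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def tree_lookup(sorted_keys: List[str], key: str) -> Tuple[bool, int]:
    """Binary search as a stand-in for walking a balanced tree.

    Returns (found, number of nodes visited); visits grow as O(log n).
    """
    lo, hi, visited = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        visited += 1
        if sorted_keys[mid] == key:
            return True, visited
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, visited


def flat_lookup(table: Dict[str, bytes], key: str) -> Optional[bytes]:
    """Single hash-table probe, O(1) on average regardless of table size."""
    return table.get(key)


# Zero-padded keys keep lexicographic order equal to numeric order.
keys = [f"key{i:07d}" for i in range(100_000)]
table = {k: b"v" for k in keys}

found, visited = tree_lookup(keys, "key0099999")
value = flat_lookup(table, "key0099999")
```

With 100,000 keys the tree-style lookup visits at most 17 nodes, and that number grows with the logarithm of the key count, while the flat lookup remains a single probe.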

4. Tiered Storage

An ideal DB system for blockchains has to meet a few seemingly contradictory requirements, and there is hardly a one-size-fits-all solution. Arcology classifies the data into different categories and assigns them to different databases based on their access frequency.

For example, access speed is paramount while executing a transaction. In contrast, to cut costs, it is best to put bulky and less frequently accessed data (e.g. old blocks and old receipts) into more cost-efficient storage solutions.
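Routing data to a tier by category could look roughly like the sketch below. The category names, the `Tier` enum, the `POLICY` table and the `TieredStore` class are all hypothetical, and dictionaries stand in for the actual memory and cloud backends.

```python
from enum import Enum
from typing import Dict, Optional


class Tier(Enum):
    HOT = "memory"   # data touched during transaction execution
    COLD = "cloud"   # bulky, rarely accessed data


# Hypothetical category-to-tier policy; users would tune this to their budget.
POLICY: Dict[str, Tier] = {
    "state": Tier.HOT,
    "old_block": Tier.COLD,
    "old_receipt": Tier.COLD,
}


class TieredStore:
    """Routes each record to a backend chosen by its data category."""

    def __init__(self, policy: Dict[str, Tier]) -> None:
        self._policy = policy
        # One dictionary per tier stands in for a real database.
        self._tiers: Dict[Tier, Dict[str, bytes]] = {tier: {} for tier in Tier}

    def tier_for(self, category: str) -> Tier:
        # Raises KeyError for a category the policy does not cover.
        return self._policy[category]

    def put(self, category: str, key: str, value: bytes) -> None:
        self._tiers[self.tier_for(category)][key] = value

    def get(self, category: str, key: str) -> Optional[bytes]:
        return self._tiers[self.tier_for(category)].get(key)


store = TieredStore(POLICY)
store.put("state", "account/0xabc/balance", b"100")
store.put("old_block", "block/1", b"<block bytes>")
```

Changing where a category lives only means editing the policy table; callers keep using the same `put` and `get` calls.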

5. Flexibility

There aren’t strict rules for how data should be categorized or exactly where it should be saved. The decision is up to users, based on their budgets and their expectations for overall system throughput. Performance-wise, putting all the data in an in-memory DB would be the ideal choice. On the other hand, depending solely on cloud storage is also perfectly possible.
