Edward Capriolo

Tuesday Feb 20, 2018

BlockChain! Building a NoSQL store - Part 15

My last blog entry in this series was on:


Triggers and Coprocessors - Building a NoSQL store - Part 14 (posted 04/17/15)

That is a God damn shame. That is what that is. So we are back! Humans are creatures of habit. I started looking at blockchain recently, and the first thing I thought was (you might have guessed) 'Let's build a NoSQL store with this!' And here we are.

First up: I really like the Ethereum stack. I am not interested in mining coins using GPUs or trying to make a big stock trade. I barely even contribute to my 401k. What I am interested in is Ethereum's smart contracts. Smart contracts look something like an RPC language; however, the interesting part is that you also define the data that the contracts manipulate.

It is a really heavy and interesting topic that I have not fully wrapped my brain around yet, but I made it work and I am impressed with myself.

Remix

Ethereum offers Remix, an online IDE for smart contracts. It is a pretty nifty way to get started.


 

Contract basics

Let's cover some smart contract basics.

contract ColumnFamily1 {

    struct Right {
        string column;
        string value;
        int256 version;
    }

    address public owner;
    string public columnFamilyName;
    uint8 keepVersions;

    function ColumnFamily1(string name, uint8 versions) public {
        owner = msg.sender;
        columnFamilyName = name;
        keepVersions = versions;
    }

    // ... mappings and functions are covered below ...
}

On the surface, a contract looks like a class definition. Constructors are functions that have the same name as the contract (newer Solidity releases replaced this with the constructor keyword). Variables defined outside of any function are storage variables: when a function modifies one, the new value is persisted in the contract's storage on the blockchain.

Some things to note:

  • address is a type that represents an account on the blockchain, such as a person or another contract.
  • msg is an implicit variable that holds information about the current call; for example, msg.sender is the address of the caller.

One access pattern is to store the address of the entity that created the contract, so that subsequent calls can require that the caller is the creator. Keep in mind that this is not the only, or even the preferred, way to handle auth. There are other schemes, such as maps keyed by address where users can only update their own entries. That discussion is out of scope (aka I do not understand it well).
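A minimal Python sketch of the two access patterns just mentioned. This is an off-chain model, not Solidity, and every name in it is hypothetical: the caller's address (msg.sender on chain) is passed in explicitly as `sender`, and a raised PermissionError stands in for a reverted transaction. On chain the owner check is commonly written as require(msg.sender == owner).

```python
from typing import Dict


class OwnedColumnFamily:
    """Hypothetical model of the owner-check pattern."""

    def __init__(self, sender: str, name: str, versions: int) -> None:
        # Mirrors the constructor: remember who deployed the contract.
        self.owner = sender
        self.column_family_name = name
        self.keep_versions = versions
        self.latest: Dict[str, Dict[str, str]] = {}

    def write_cell(self, sender: str, rk: str, column: str, value: str) -> None:
        # Reject the call (a revert on chain) unless the caller is the creator.
        if sender != self.owner:
            raise PermissionError("only the owner may write")
        self.latest.setdefault(rk, {})[column] = value


class PerAddressStore:
    """Hypothetical model of the alternative: a map keyed by caller address."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}

    def set_mine(self, sender: str, value: str) -> None:
        # The key is always the caller, so nobody can overwrite another entry.
        self.values[sender] = value
```

The owner check gives a single writer; the per-address map lets many users write while each one can only touch its own key.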

There is a particularly interesting feature called mapping.

mapping (string => Right[]) data;
//Maps only the latest value of a single cell
mapping (string => mapping(string => string)) latestData;

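Mappings are like hash maps stored in the blockchain. A mapping's values can be other mappings; however, the key (the left side) must be a simple type such as an integer, address, bytes, or string, and cannot be a mapping, a struct, or a dynamically sized array. This makes sense because the key itself is never stored: it is hashed to compute where in the contract's storage the value lives. As a consequence, a mapping cannot be iterated or asked for its size, and reading a key that was never written returns the zero value of the value type (for example, an empty string).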
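To make those mapping semantics concrete, here is a small Python model (the class name SolidityMapping is hypothetical, and this is not how the EVM stores anything). Reading a missing key yields the zero value instead of raising, nesting works the same way as mapping(string => mapping(string => string)), and iteration is refused just as it is on chain.

```python
from collections import defaultdict


class SolidityMapping:
    """Hypothetical Python model of a Solidity mapping's observable behavior."""

    def __init__(self, zero_factory):
        # zero_factory builds the zero value, e.g. str -> "", list -> [].
        self._slots = defaultdict(zero_factory)

    def __getitem__(self, key):
        # Every key "exists"; unwritten keys read as the zero value.
        return self._slots[key]

    def __setitem__(self, key, value):
        self._slots[key] = value

    def __iter__(self):
        # Mappings cannot be iterated. Defining this also stops Python from
        # falling back to calling __getitem__(0), (1), ... forever.
        raise TypeError("mappings are not iterable")
```

For example, latest_data = SolidityMapping(lambda: SolidityMapping(str)) behaves like latestData: after latest_data["ecapriolo"]["firstname"] = "edward", a read of latest_data["nobody"]["lastname"] returns "" rather than failing.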

Getting started: NoSQL in the blockchain

First, let's briefly talk about what flavor of NoSQL we want to implement. To keep it simple, we will implement a versioned ColumnFamily, which is somewhat like HBase. We will punt on deletes (tombstones) for now.

Simply:

  • put (string rowkey, string column, string value, long version)
  • get (string rowkey, string column)

put("ecapriolo", "firstname", "edward", 1)
put("ecapriolo", "firstname", "bla", 0)

get("ecapriolo", "firstname") --> returns "edward"
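Before looking at the contract, it can help to pin down the intended semantics with a reference model that ignores gas entirely. This Python sketch (class and method names are hypothetical) keeps every version and has get return the value with the highest version, regardless of write order.

```python
from typing import Dict, List, Tuple


class VersionedCells:
    """Hypothetical reference model of versioned put/get, with no gas concerns."""

    def __init__(self) -> None:
        # (rowkey, column) -> every (version, value) pair ever written
        self._cells: Dict[Tuple[str, str], List[Tuple[int, str]]] = {}

    def put(self, rowkey: str, column: str, value: str, version: int) -> None:
        self._cells.setdefault((rowkey, column), []).append((version, value))

    def get(self, rowkey: str, column: str) -> str:
        versions = self._cells.get((rowkey, column))
        if not versions:
            return ""  # like a Solidity mapping's zero value for a missing key
        # Compare on version only: the highest version wins.
        return max(versions, key=lambda pair: pair[0])[1]
```

Replaying the example above, put("ecapriolo", "firstname", "edward", 1) followed by put("ecapriolo", "firstname", "bla", 0) leaves get("ecapriolo", "firstname") returning "edward", because version 1 beats version 0 even though it was written first.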

contract ColumnFamily1 {

    struct Right {
        string column;
        string value;
        int256 version;
    }

    address public owner;
    string public columnFamilyName;
    //number of live versions of a cell to keep
    uint8 keepVersions;
    //Maps a rowkey to all columns
    mapping (string => Right[]) data;
    //Maps only the latest value of a single cell
    mapping (string => mapping(string => string)) latestData;

}

Notice there are two mappings, data and latestData. This is mostly an implementation detail, but it matters because in Ethereum the closest notion to CPU time is "gas": every operation a transaction executes consumes gas, and the sender of the transaction pays for it.

The data mapping keeps multiple versions of all cells inside a row. During a scan you would have to loop through the row's array, and every iteration costs gas.

The second mapping, latestData, holds only the most recent version of each cell, which provides an optimized get path. This will hopefully make more sense after seeing how set and get are implemented.
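A minimal Python sketch of why the second mapping pays off, with hypothetical names. It models one contract's data and latestData side by side and counts loop iterations in `steps` as a rough stand-in for gas; real gas depends on the opcodes executed and is not modeled here.

```python
from typing import Dict, List, Optional, Tuple


class TwoMappingStore:
    """Hypothetical model of the data / latestData layout."""

    def __init__(self) -> None:
        # rowkey -> every (column, value, version) cell in the row
        self.data: Dict[str, List[Tuple[str, str, int]]] = {}
        # rowkey -> column -> newest value only
        self.latest_data: Dict[str, Dict[str, str]] = {}
        self.steps = 0  # crude proxy for gas, not a real measurement

    def get_by_scan(self, rk: str, column: str) -> str:
        # Without latest_data, a read walks every cell in the row.
        best_version: Optional[int] = None
        best_value = ""
        for col, value, version in self.data.get(rk, []):
            self.steps += 1
            if col == column and (best_version is None or version > best_version):
                best_version, best_value = version, value
        return best_value

    def get_fast(self, rk: str, column: str) -> str:
        # With latest_data, a read is one keyed lookup, whatever the row width.
        self.steps += 1
        return self.latest_data.get(rk, {}).get(column, "")
```

Both reads return the same answer; the difference is that the scan's cost grows with the number of cells in the row, which is exactly what the extra mapping avoids on the read path at the price of an extra write.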

Implementing get

function get(string rk, string column) public constant returns (string value) {
    return latestData[rk][column];
}

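'That was easy' button pressed. In short, latestData is a mapping of rowkey to a mapping of column to value (string => string), designed specifically to make this lookup cheap. There is no pointer chasing involved: the storage location of each entry is computed by hashing the keys, so the lookup does not get slower as the structure grows. And because get is declared constant (it does not modify state), calling it directly from a client, rather than from inside a transaction, does not cost gas.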
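For the curious, here is a sketch of why the lookup stays cheap, based on the storage layout described in the Solidity documentation. Assume latestData occupies storage slot p (the actual number depends on the variables declared before it), write \| for byte concatenation, and note that p and s_row are encoded as 32-byte words while string keys are hashed as their raw bytes. The value of latestData[rk][column] then lives at slot s_cell:

```latex
\begin{aligned}
s_{\mathrm{row}}  &= \operatorname{keccak256}\big(\mathrm{bytes}(rk) \,\|\, p\big) \\
s_{\mathrm{cell}} &= \operatorname{keccak256}\big(\mathrm{bytes}(column) \,\|\, s_{\mathrm{row}}\big)
\end{aligned}
```

That is two hashes over the keys, no matter how many rows or columns have already been written.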

Implementing set

This is meatier than get. Set is implemented by writeCell, whose job is to update both mapping structures.

function writeCell(string rk, string column, string value, int256 version) public returns (int code) {

    Right[] memory r = data[rk];
    //If r is empty simply add
    if (r.length == 0){
        data[rk].push(Right({column: column, value: value, version: version}));
        latestData[rk][column] = value;
        return 10;
    }

Note: during debugging I added some return codes to make it easier to see which branches were hit and to avoid using the built-in Remix debugger. 'return 10' above is an example.


    int256 highestVersion = int256(-1);
    int256 highestIndex = int256(-1);
    //Both extremes are seeded from the first matching cell, so these
    //initial values are only placeholders and no magic bound is needed
    uint256 lowestIndex = uint256(0);
    int256 lowestVersion = int256(0);
    uint256 matchedCount = uint256(0);
    for (uint256 i=0; i< r.length; i++){
        //Solidity has no built-in string ==, so compare through a StringUtils library
        if (StringUtils.equal(r[i].column, column)){
            matchedCount = matchedCount + 1;
            if (matchedCount == 1 || r[i].version > highestVersion){
                highestIndex = int256(i);
                highestVersion = r[i].version;
            }
            if (matchedCount == 1 || r[i].version < lowestVersion){
                lowestIndex = i;
                lowestVersion = r[i].version;
            }
        }
    }


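Above is the standard pattern: iterate over the row's cells and, for the requested column, record how many cells it has and where its highest and lowest versions sit. Note that the loop visits every cell in the row, not just the requested column, so its gas cost grows with the width of the row.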
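Here is the same single pass written as a Python sketch (the names scan_row and ScanResult are hypothetical). Seeding both the highest and lowest version from the first matching cell avoids sentinel starting values such as -1 or 10000, which silently misbehave when real versions fall outside that range.

```python
from typing import List, NamedTuple, Optional


class Right(NamedTuple):
    # Same three fields as the contract's Right struct.
    column: str
    value: str
    version: int


class ScanResult(NamedTuple):
    matched_count: int
    highest_index: Optional[int]
    highest_version: Optional[int]
    lowest_index: Optional[int]
    lowest_version: Optional[int]


def scan_row(row: List[Right], column: str) -> ScanResult:
    """Hypothetical Python version of writeCell's loop: one pass over the row."""
    matched = 0
    hi_i: Optional[int] = None
    hi_v: Optional[int] = None
    lo_i: Optional[int] = None
    lo_v: Optional[int] = None
    for i, cell in enumerate(row):
        if cell.column != column:
            continue  # other columns share the row's list; skip them
        matched += 1
        # The first match seeds both extremes, so no sentinel is needed.
        if hi_v is None or cell.version > hi_v:
            hi_i, hi_v = i, cell.version
        if lo_v is None or cell.version < lo_v:
            lo_i, lo_v = i, cell.version
    return ScanResult(matched, hi_i, hi_v, lo_i, lo_v)
```

With versions 0, 20000 and -3 for one column, the scan reports 20000 as highest and -3 as lowest, both values a fixed 10000 or -1 starting point would have mishandled.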



    //Column not present in this row yet: add it
    if (matchedCount == uint256(0)){
        latestData[rk][column] = value;
        data[rk].push(Right({column: column, value: value, version: version}));
        return 20;
    }

These are built-in logging functions (log0 through log4 write raw log entries); I used them to trace values while debugging.


    log0(bytes32(version));
    log0(bytes32(highestVersion));

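This is where the bulk of the latest-version logic is implemented. A write is accepted only if its version is newer than the highest stored version of that column; otherwise it is ignored (return 50). The only magic is that when the column already holds keepVersions (or more) cells, the new cell overwrites the oldest one instead of being appended. This keeps each column to at most keepVersions cells, so a row's array stays bounded by the number of columns times keepVersions. We could do more, such as keeping the array sorted, but every manipulation also uses gas, so at that point it becomes a workload optimization problem.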


    if (version > highestVersion){
        if (matchedCount >= keepVersions) {
            //Column is full: overwrite its oldest version in place
            latestData[rk][column] = value;
            data[rk][lowestIndex] = Right({column: column, value: value, version: version});
            return 30;
        } else {
            latestData[rk][column] = value;
            data[rk].push(Right({column: column, value: value, version: version}));
            return 40;
        }
    }
    //Not newer than the latest stored version: ignore the write
    return 50;
}
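Putting the pieces together, here is an off-chain Python model of writeCell and get that returns the same debugging codes, handy for reasoning about the branches without deploying anything. It is a sketch, not the contract: the class and method names are hypothetical, and gas, storage copies, and the owner field are not modeled.

```python
from typing import Dict, List, NamedTuple, Optional


class Right(NamedTuple):
    column: str
    value: str
    version: int


class ColumnFamilyModel:
    """Hypothetical Python model of ColumnFamily1's writeCell and get.

    Codes: 10 empty row, 20 new column, 30 newest version replaced the
    oldest (column full), 40 newest version appended, 50 write ignored.
    """

    def __init__(self, name: str, keep_versions: int) -> None:
        self.column_family_name = name
        self.keep_versions = keep_versions
        self.data: Dict[str, List[Right]] = {}
        self.latest_data: Dict[str, Dict[str, str]] = {}

    def get(self, rk: str, column: str) -> str:
        # Missing keys read as "", like a Solidity mapping's zero value.
        return self.latest_data.get(rk, {}).get(column, "")

    def write_cell(self, rk: str, column: str, value: str, version: int) -> int:
        row = self.data.setdefault(rk, [])
        latest = self.latest_data.setdefault(rk, {})
        cell = Right(column, value, version)
        if not row:
            row.append(cell)
            latest[column] = value
            return 10
        matched = 0
        highest_version: Optional[int] = None
        lowest_version: Optional[int] = None
        lowest_index = -1
        for i, existing in enumerate(row):
            if existing.column != column:
                continue
            matched += 1
            if highest_version is None or existing.version > highest_version:
                highest_version = existing.version
            if lowest_version is None or existing.version < lowest_version:
                lowest_index, lowest_version = i, existing.version
        if matched == 0:
            row.append(cell)
            latest[column] = value
            return 20
        if version > highest_version:
            latest[column] = value
            if matched >= self.keep_versions:
                row[lowest_index] = cell  # evict the oldest version in place
                return 30
            row.append(cell)
            return 40
        return 50
```

Replaying the put/get example from earlier with keep_versions=2, the version 0 write of "bla" returns 50 and get still returns "edward".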

Running it from Remix

Remix makes it pretty simple to get things running. You cannot do everything in it, but for an editor that runs in the browser I was impressed. Here is a dialog where I call the constructor and then the writeCell and get functions.


Maybe next we can implement scan or something fun. But there you have it: NoSQL + BlockChain. If you are interested in this as a unicorn, please ping me! I want to get rich and retire.

I checked in the code here https://github.com/edwardcapriolo/ethereum-nosql
