
Data based on streams of immutable events.

Using messages to transform state.

I've been interested for a while now in the idea of using messages to transform state.

For example, a web server may receive a message-based API call (e.g. using something like Seneca in Node) that causes changes to the application state stored in a database.
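As a rough sketch of the shape this takes (the `role`/`cmd` pattern names and the in-memory `events` array are my own illustrative choices, not anything Seneca prescribes):

```javascript
const seneca = require('seneca')();
const events = []; // stand-in for a durable event store

// The handler subscribes to a message pattern and is the only place
// application state actually changes: it records an immutable event
// rather than mutating rows directly.
seneca.add({ role: 'account', cmd: 'deposit' }, (msg, reply) => {
  events.push({ type: 'deposited', account: msg.account, amount: msg.amount });
  reply(null, { ok: true });
});

// A web endpoint would translate the HTTP request into this message.
seneca.act({ role: 'account', cmd: 'deposit', account: 42, amount: 100 });
```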

The main reason for my dabbling in this area is to experiment with a simple idea: if every change that occurs is described by a message, then you get a stream of immutable events. That stream can be replayed, examined for debugging, or piped to multiple systems to apply the same effects in each. It can even act as a seed stream to recreate the dataset as it stood at any point in time, which introduces the concept of undo into your data.
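A minimal sketch of that replay idea (the event shapes and the `apply` function are illustrative assumptions, not a prescribed format):

```javascript
// Rebuilding state is a fold over the event stream; rebuilding
// historical state is a fold over a prefix of it.
function apply(state, event) {
  switch (event.type) {
    case 'deposited':
      return { ...state, balance: state.balance + event.amount };
    case 'withdrawn':
      return { ...state, balance: state.balance - event.amount };
    default:
      return state;
  }
}

const events = [
  { type: 'deposited', amount: 100 },
  { type: 'withdrawn', amount: 30 },
  { type: 'deposited', amount: 50 },
];

// Current state: replay everything.
const now = events.reduce(apply, { balance: 0 }); // { balance: 120 }

// State as it was two events in: replay a prefix. This is 'undo'.
const earlier = events.slice(0, 2).reduce(apply, { balance: 0 }); // { balance: 70 }
```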

As a very nice side-effect, the stream of events forms a complete and coherent chain of custody for the data. Everything is guaranteed to be fully audited, as the only means of change is via the event stream.

Turning the database inside out.

I've just read the transcript of Martin Kleppmann's 2014 talk "Turning the database inside-out with Apache Samza" and found it fascinating. Where I've missed a step in my thinking is an implicit assumption, based on 30 years of DBMS habit, that the event stream should end in database updates (whether SQL, NoSQL, flat-file or something else), and that those updates create a set of data (relational or otherwise) which systems can query for specific subsets of information, such as that needed for reporting or for feeding a web page's account view.

This is not ideal.

If you have a stream of immutable events which describe your data, then you have everything you need to present that data in any form you wish.

There is no need to involve the intermediate steps of normalisation, storage, stored procedures, warehouse transforms or whatever simply to repurpose that data for specialised use.

The event stream doesn't feed the dataset; it is the dataset.
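As an illustration, here are two entirely different query shapes computed straight from the same stream, with no normalised store in between (event shapes again assumed for the example):

```javascript
const events = [
  { type: 'deposited', account: 'a1', amount: 100 },
  { type: 'deposited', account: 'a2', amount: 25 },
  { type: 'withdrawn', account: 'a1', amount: 40 },
];

// Shape 1: current balance per account, for an account view.
const balances = events.reduce((acc, e) => {
  const delta = e.type === 'deposited' ? e.amount : -e.amount;
  acc[e.account] = (acc[e.account] || 0) + delta;
  return acc;
}, {}); // { a1: 60, a2: 25 }

// Shape 2: total value moved, for a reporting dashboard.
const totalMoved = events.reduce((sum, e) => sum + e.amount, 0); // 165
```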

How the snapshots of current state are stored is an implementation detail. Martin speaks of materialised views over the logs as one option, with the pleasant result that persistence and consistency are handled directly by the database. He also points out that using multiple snapshots, in the form of materialised views whose indexes take the place of cache keys, eliminates the need for a cache, as there is no longer any chance of a cache miss. Read his article for the details.
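Outside the database, the same idea is simply a snapshot kept current by applying each event as it arrives. Here is an in-memory sketch standing in for the materialised view (the class and event shapes are my own, for illustration):

```javascript
// The snapshot is updated incrementally, in stream order, which is the
// job a materialised view over the log performs inside the database.
// Reads never touch the raw stream, and because the snapshot is
// populated by construction there is no miss to handle.
class BalanceSnapshot {
  constructor() {
    this.balances = {};
  }
  apply(event) {
    const delta = event.type === 'deposited' ? event.amount : -event.amount;
    this.balances[event.account] = (this.balances[event.account] || 0) + delta;
  }
  read(account) {
    return this.balances[account] || 0;
  }
}

const view = new BalanceSnapshot();
view.apply({ type: 'deposited', account: 'a1', amount: 100 });
view.apply({ type: 'withdrawn', account: 'a1', amount: 40 });
view.read('a1'); // 60
```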

In summary, you have the following components:

- The event stream: the immutable events that are the single source of truth.
- The timeline: the portion of the stream under consideration (or all of it).
- The transform: the logic that shapes the timeline into a usable form.
- The snapshot: the queryable result of applying the transform to the timeline.

If the materialised view option is taken, then the snapshot is the view and the transform is its definition, whilst the timeline is the portion of the event stream the view is filtering on (or all of it).

What makes this particularly relevant at the moment is that I'm looking into Redux for React, and Martin's article appears in the Thanks at the bottom of the first page. Having an idea of how Redux works, I'm looking forward to using it to experiment with data based on streams of immutable events.
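Redux is built on exactly this shape: the store's state is a fold over a stream of dispatched actions. `createStore`, `dispatch` and `getState` are Redux's actual API; the reducer and action types below are just a toy example of mine:

```javascript
const { createStore } = require('redux');

// A reducer is the transform: (state, action) => new state.
function balance(state = 0, action) {
  switch (action.type) {
    case 'DEPOSITED':
      return state + action.amount;
    case 'WITHDRAWN':
      return state - action.amount;
    default:
      return state;
  }
}

const store = createStore(balance);
store.dispatch({ type: 'DEPOSITED', amount: 100 });
store.dispatch({ type: 'WITHDRAWN', amount: 30 });
console.log(store.getState()); // 70
```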