Data based on streams of immutable events


Using messages to transform state

I’ve been interested for a while now in the idea of using messages to transform state.

For example, a web server may receive a message-based API call (e.g. using something like Seneca in Node) that causes changes to the application state stored in a database.

The main reason for my dabbling in this area is to experiment with one idea: if every change that occurs is described by a message, you get a stream of immutable events. That stream can be replayed, examined for debugging, piped to multiple systems to produce the same effects, or used as a seed to recreate the dataset at any point in time - which introduces the concept of undo into your data.
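As a minimal sketch of that replay idea, assuming a toy account whose only changes are deposits and withdrawals: the event shapes, the `seq` field and the function names below are all hypothetical and exist purely for illustration.

```typescript
// Hypothetical event shapes for a toy account; not taken from any real library.
type AccountEvent =
  | { seq: number; type: "deposited"; amount: number }
  | { seq: number; type: "withdrawn"; amount: number };

interface AccountState {
  balance: number;
}

const initialState: AccountState = { balance: 0 };

// Pure function: given a state and one event, return the next state.
function applyEvent(state: AccountState, event: AccountEvent): AccountState {
  switch (event.type) {
    case "deposited":
      return { balance: state.balance + event.amount };
    case "withdrawn":
      return { balance: state.balance - event.amount };
  }
}

// Rebuild state by replaying every event up to and including `upToSeq`.
function replay(log: readonly AccountEvent[], upToSeq: number = Infinity): AccountState {
  return log
    .filter((event) => event.seq <= upToSeq)
    .reduce(applyEvent, initialState);
}

// The log is append-only: events are added, never edited or removed.
const log: readonly AccountEvent[] = [
  { seq: 1, type: "deposited", amount: 100 },
  { seq: 2, type: "withdrawn", amount: 30 },
  { seq: 3, type: "deposited", amount: 5 },
];

const current = replay(log);   // { balance: 75 }
const undone = replay(log, 2); // { balance: 70 }, the state before event 3
```

Nothing is deleted to achieve the undo here; the earlier state is simply recomputed from a shorter prefix of the same log.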

As a very nice side-effect, the stream of events forms a complete and coherent custody chain for the data. Everything is fully audited, because the only means of change is via the event stream.

Turning the database inside out

I’ve just read Martin Kleppmann’s transcript of his 2014 talk “Turning the database inside-out with Apache Samza” and found it greatly interesting. Where I’ve missed a step in my thinking is my implicit assumption, based on 30 years of DBMS habit, that the event stream should end in database updates - whether that be SQL, NoSQL, flat-file or something else. I had also assumed that those updates create a set of data (relational or otherwise) that systems then query for specific subsets of information, such as that needed for reporting or for feeding a web page account view.

This is not ideal.

  • It places data into a normalised form that is, by its very definition, an intermediate structure to support multiple use cases
  • All use cases for the data require querying and reconstructing it from that intermediate form
  • Without auditing or extra work such as warehouse-style stamping, changes to the data result in overwrites and hence loss of the previous values

If you have a stream of immutable events which describe your data, then you have everything you need to present that data in any form you wish.

There is no need to involve the intermediate steps of normalisation, storage, stored procedures, warehouse transforms or whatever simply to repurpose that data for specialised use.

  • Take your stream and use it to create snapshots of the data in ready-to-use structures. If your immutable events are for a stock control system, for example, then an event that describes an item being picked can feed snapshots of current stock levels, required JIT orders, picker performance and so forth.
  • The snapshots are effectively pre-prepared, so they are always both available and performant. Separate snapshots for each requirement mean every consumer reads from a source optimised for it.
  • Consistency is assured. All systems that consume the event stream to create their own snapshots will see the same result for the same point in the timeline.
  • There is a single source of truth, fully audited and replicable by the application of sequenced events to your system’s starting point.
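The stock control example above can be sketched roughly as follows. The event fields (`sku`, `quantity`, `picker`), the function names and the reorder threshold are assumptions made for the illustration, and the snapshots are held in plain in-memory objects rather than any particular store.

```typescript
// Hypothetical stock-control events; field names are illustrative only.
type StockEvent =
  | { type: "received"; sku: string; quantity: number }
  | { type: "picked"; sku: string; quantity: number; picker: string };

type Totals = Record<string, number>;

// Snapshot 1: current stock level per SKU.
function stockLevels(events: readonly StockEvent[]): Totals {
  const levels: Totals = {};
  for (const event of events) {
    const delta = event.type === "received" ? event.quantity : -event.quantity;
    levels[event.sku] = (levels[event.sku] ?? 0) + delta;
  }
  return levels;
}

// Snapshot 2: units picked per picker (a crude picker-performance view).
function pickerTotals(events: readonly StockEvent[]): Totals {
  const totals: Totals = {};
  for (const event of events) {
    if (event.type === "picked") {
      totals[event.picker] = (totals[event.picker] ?? 0) + event.quantity;
    }
  }
  return totals;
}

// Snapshot 3: SKUs at or below an assumed reorder threshold.
function reorderList(events: readonly StockEvent[], threshold: number): string[] {
  const levels = stockLevels(events);
  return Object.keys(levels).filter((sku) => (levels[sku] ?? 0) <= threshold);
}

const events: readonly StockEvent[] = [
  { type: "received", sku: "widget", quantity: 10 },
  { type: "received", sku: "gadget", quantity: 4 },
  { type: "picked", sku: "widget", quantity: 3, picker: "alice" },
  { type: "picked", sku: "gadget", quantity: 3, picker: "bob" },
  { type: "picked", sku: "widget", quantity: 2, picker: "alice" },
];

const levels = stockLevels(events);       // { widget: 5, gadget: 1 }
const performance = pickerTotals(events); // { alice: 5, bob: 3 }
const toReorder = reorderList(events, 2); // ["gadget"]
```

Each snapshot reads the same events and ignores whatever it does not need, so adding a new view later means writing one more function rather than changing a shared schema.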

The event stream doesn’t feed the dataset, it is the dataset

How the snapshots are stored is an implementation detail. Martin speaks of materialised views over the logs as one means, with the pleasant result that persistence and consistency are handled directly by the database. He also points out that keeping multiple snapshots as materialised views, with indexes standing in for cache keys, eliminates the need for a cache, as there is no longer the chance of a cache miss - read his article for details.

In summary, you have the following components:

  • timeline - the stream of immutable events forms a timeline of activity from which movements can be reported, snapshots in time derived, and data rolled forward or back.
  • snapshots - any number of frames which hold the state of the system at an instant in time and from a given perspective (current stock, yesterday’s cashing up).
  • transforms - functionality that can be fed a timeline (or portion thereof) and produce a snapshot.

If the materialised view option is taken then the snapshot is the view and the transform is its definition, whilst the timeline is the portion of the event stream the view is filtering on (or all of it).
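One way to picture that mapping is a small in-memory sketch in which a view is kept up to date as each event arrives. This is not how any database implements materialised views; `Transform`, `createView`, `fullReplay` and the till example are hypothetical names chosen for the illustration.

```typescript
// A transform is the view's definition: where to start, which events it
// reads (its portion of the timeline), and how one event changes the snapshot.
interface Transform<E, S> {
  initial: S;
  accepts(event: E): boolean;
  apply(snapshot: S, event: E): S;
}

// Derive a snapshot from scratch by replaying a whole timeline.
function fullReplay<E, S>(timeline: readonly E[], transform: Transform<E, S>): S {
  return timeline
    .filter((event) => transform.accepts(event))
    .reduce<S>((snapshot, event) => transform.apply(snapshot, event), transform.initial);
}

// Keep a snapshot current incrementally, one event at a time.
function createView<E, S>(transform: Transform<E, S>) {
  let snapshot = transform.initial;
  return {
    update(event: E): void {
      if (transform.accepts(event)) {
        snapshot = transform.apply(snapshot, event);
      }
    },
    read(): S {
      return snapshot;
    },
  };
}

// Hypothetical example: takings for a single till.
interface SaleEvent {
  till: string;
  amount: number;
}

const till1Takings: Transform<SaleEvent, number> = {
  initial: 0,
  accepts: (event) => event.till === "till-1",
  apply: (total, event) => total + event.amount,
};

const timeline: readonly SaleEvent[] = [
  { till: "till-1", amount: 20 },
  { till: "till-2", amount: 15 },
  { till: "till-1", amount: 5 },
];

const view = createView(till1Takings);
for (const event of timeline) {
  view.update(event);
}

const incremental = view.read();                      // 25
const replayed = fullReplay(timeline, till1Takings);  // 25
```

The incremental view and the full replay agree because both apply the same transform to the same events in the same order, which is the consistency property described earlier.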

What makes this particularly relevant at the moment is that I’m looking into Redux for React, and the article from Martin is in the Thanks at the bottom of the first page. Having an idea of how Redux works, I’m looking forward to using it to experiment with data based on streams of immutable events.