Tuesday, September 25, 2012

Lambda Architecture

aka "Runaway complexity in Big Data, and a plan to stop it."

Nathan Marz's talk tonight at Strange Loop coined the term "Lambda Architecture" to describe a hybrid batch-plus-realtime data engine built on functions running over immutable data. It builds on themes from his "Big Data" book.
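To make "functions running over immutable data" concrete, here is a minimal sketch, assuming a hypothetical pageview event stream (the `PageView` record and both function names are illustrative, not from the talk). New data is only ever appended to the master dataset, and a view is recomputed from scratch as a pure function of all of it.

```python
from collections import Counter
from typing import NamedTuple


class PageView(NamedTuple):
    # Hypothetical event record; the talk's actual schema is not given here.
    url: str
    timestamp: int


def append(master: tuple, event: PageView) -> tuple:
    # Immutable master dataset: return a new tuple, never update in place.
    return master + (event,)


def batch_view(master: tuple) -> dict:
    # The view is a pure function of all the data: same input, same output.
    return dict(Counter(e.url for e in master))


master = ()
for ev in [PageView("/a", 1), PageView("/b", 2), PageView("/a", 3)]:
    master = append(master, ev)

print(batch_view(master))  # {'/a': 2, '/b': 1}
```

Because the raw data never changes, a buggy view can be fixed by correcting the function and recomputing, rather than repairing mutated records.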

The pieces all exist, but nothing packages them simply: a distributed raw data store; MapReduce for batch processing (Hadoop/MapR with Pig, Hive, etc.) to precompute views, which are stored in fast-read, MapReduce-writable databases (Voldemort, ElephantDB); Storm for streams; a high-throughput, small-volume database for the Storm output (Cassandra, Riak, HBase); and a custom query merge on top of both. There's no pre-made piece for the custom query merge, though Storm might work there.
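The custom query merge is the glue. A rough sketch, assuming a simple count query and a hypothetical `cutoff` timestamp marking how far the last batch run got: the batch layer covers events up to the cutoff, the realtime layer covers only what came after, and the query combines the two. Plain in-memory `Counter`s stand in for the real stores (ElephantDB/Voldemort on the batch side, Cassandra/Riak/HBase on the realtime side).

```python
from collections import Counter


def batch_view(events, cutoff):
    # Slow, Hadoop-style path: recompute counts over all events up to the cutoff.
    return Counter(url for url, ts in events if ts <= cutoff)


def realtime_view(events, cutoff):
    # Fast, Storm-style path: incremental counts for events the batch run missed.
    view = Counter()
    for url, ts in events:
        if ts > cutoff:
            view[url] += 1
    return view


def query(url, batch, realtime):
    # Hypothetical query merge: counts add, so summing the two views is correct.
    # Counter returns 0 for missing keys, so either view may lack the URL.
    return batch[url] + realtime[url]


events = [("/a", 1), ("/b", 2), ("/a", 3), ("/a", 10), ("/b", 11)]
cutoff = 5
batch = batch_view(events, cutoff)
rt = realtime_view(events, cutoff)
print(query("/a", batch, rt))  # 3
print(query("/b", batch, rt))  # 2
```

Summing only works because counts are additive; other queries (unique visitors, medians) need their own merge logic, which is part of why there's no generic off-the-shelf piece for this layer.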

Exciting and awesome!

Slides and a Hacker News discussion

