Tuesday, September 25, 2012

Lambda Architecture

aka "Runaway complexity in Big Data, and a plan to stop it."

Nathan Marz's talk tonight at Strange Loop coined the term "Lambda Architecture" to describe a hybrid batch+realtime data engine built on functions running over immutable data. This builds on themes from his "Big Data" book.

The pieces all exist, but there's no simple packaging over all of them: a distributed raw data store; map-reduce for batch (hadoop/mapr with pig, hive, etc.) to precompute views that are stored in fast-read, map-reduce-writable DBs (voldemort, elephantdb); storm for streams; a high-throughput, small-volume db for the storm output (cassandra, riak, hbase); and a custom query merge on top of both. There's no pre-made piece for the custom query merge; possibly storm works there.
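
To make the "custom query merge" idea concrete, here is a toy shell sketch. It assumes, purely for illustration, that the batch view and the realtime view have each been dumped to a text file of "key count" lines; merge_views and that file layout are hypothetical and not part of any of the tools named above.

```shell
# Hypothetical query-time merge: sum per-key counts from a batch view
# and a realtime view. Both inputs are files of "key count" lines
# (an assumed format, chosen only to keep the sketch small).
merge_views() {
  batch_view="$1"
  realtime_view="$2"
  # Accumulate counts per key across both files, then print sorted by key.
  awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' \
    "$batch_view" "$realtime_view" | sort
}
```

With a batch file holding "a 10" and "b 5" and a realtime file holding "a 2" and "c 1", merge_views prints "a 12", "b 5", "c 1". A real merge would query the two stores directly rather than read flat files.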

Exciting and awesome!

slides and a HackerNews discussion

Monday, September 17, 2012

OSX + remote vim clipboard sync

IT WORKS! <sfx: evil genius laugh />

A few pieces are required to get smooth integration of the local OSX clipboard with the remote vim clipboard. I'll walk you through the configurations and you'll be cutting-and-pasting like it's no big thang. Pasting large blocks of text works so much better via "+p than via system paste into an xterm vim window.

Puzzle Pieces:

  • OSX clipboard syncing with the local X11
  • X11 forwarding in ssh
  • vim compiled with the +xterm_clipboard feature
  • optional: configure vim to use xterm clipboard by default
  • optional: a better OSX terminal: iTerm2
  • optional: screen + DISPLAY var

OSX clipboard sync with X11

  1. launch X11 (Applications::Utilities::X11)
  2. open pasteboard preferences (X11::preferences::Pasteboard)
  3. check:
    1. enable syncing,
    2. update Pasteboard when CLIPBOARD changes,
    3. update CLIPBOARD when Pasteboard changes
  4. you may want to quit X11 now to ensure the new settings are saved.
  5. Note: don't set update Pasteboard when CLIPBOARD changes, as it produces a very strange paste behavior where full lines will paste as relative.

ssh X-forwarding:

You can enable this on the fly via the -X flag to ssh or by adding "ForwardX11 yes" to your .ssh/config file. ForwardX11 can be set globally or per-host.
Example .ssh/config entry for my vm:
host vm53
  user vm53
  ForwardX11 yes
The forwarding provided via ForwardX11 is seen as untrusted by the X Security extension. Untrusted clients have several limitations: they can't send synthetic events or read data from other windows, and they are time limited.

If you really trust the remote host you can use Trusted forwarding. This is enabled with the -Y flag to ssh or the "ForwardX11Trusted true" option in .ssh/config. I've switched to using trusted connections when connecting to my local VM since my connections are open for days/weeks at a time.

host vm53
  user vm53
  ForwardX11 yes
  ForwardX11Trusted true
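
A quick way to confirm that forwarding took effect is to look at DISPLAY once you are logged in to the remote host. The helper below is a small sketch with a hypothetical name (x11_forwarding_status). It only checks that DISPLAY is set, which on a headless VM is a reasonable hint that ssh set it up, but it is not proof that the X connection works.

```shell
# Hypothetical helper: report whether this shell has a DISPLAY to talk to.
# Run it on the remote host right after connecting with ssh -X or ssh -Y.
x11_forwarding_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "DISPLAY is set: $DISPLAY"
  else
    echo "DISPLAY is not set; reconnect with ssh -X or ssh -Y"
    return 1
  fi
}
```

Typical use: ssh vm53, then run x11_forwarding_status at the remote prompt.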

Vim with +xterm_clipboard

Check the capabilities of your vim via vim --version; you're looking for +xterm_clipboard.
vm53% vim --version | grep xterm_clipboard
+xsmp_interact +xterm_clipboard -xterm_save
If your version of vim doesn't have xterm_clipboard, try another package. I'm using vim-nox for my debian/ubuntu machines.
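
If you check this on several machines, the grep can be wrapped in a small function. This is a sketch with a hypothetical name (has_xterm_clipboard); it reads the output of vim --version on stdin and succeeds only when the feature is listed with a leading plus.

```shell
# Hypothetical helper: succeed if the `vim --version` output on stdin
# lists +xterm_clipboard (a build without it shows -xterm_clipboard).
has_xterm_clipboard() {
  # -F treats the pattern as a fixed string, so "+" is matched literally.
  grep -qF -- '+xterm_clipboard'
}
```

Usage: vim --version | has_xterm_clipboard && echo "clipboard support found"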

At this point, you should be able to cut and paste using the + buffer to interact with the system clipboard. Paste with "+p and copy/yank with "+y. Under X the clipboard is in the "+" buffer; under windows it is the "*" buffer. In OSX gvim, "+" and "*" appear to be the same buffer?

configure vim to use xterm clipboard by default

Remembering to use the + buffer is extra work. We can make this automatic by setting the clipboard option in vim. Add set clipboard=unnamedplus (added in Vim 7.3.074) to your .vimrc to use the system clipboard when using the default (unnamed) buffer. At this point, p will paste from the system clipboard. AMAZING!
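
Since older vims don't know unnamedplus, it can help to guard the setting. The sketch below appends a guarded version to a vimrc file; add_clipboard_setting is a hypothetical helper name, and the default path of ~/.vimrc is an assumption about your setup.

```shell
# Hypothetical helper: append a guarded clipboard setting to a vimrc.
# Pass a path as the first argument, or let it default to ~/.vimrc.
add_clipboard_setting() {
  vimrc="${1:-$HOME/.vimrc}"
  cat >> "$vimrc" <<'EOF'
" Use the X11 CLIPBOARD (+ buffer) for plain yanks and pastes when supported.
if has('unnamedplus')
  set clipboard=unnamedplus
endif
EOF
}
```

Run add_clipboard_setting once on the remote host. A vim without unnamedplus support simply skips the setting instead of raising an error.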

iTerm2

You should ditch the default Terminal app that comes with OSX and use iTerm2 instead. You can have it do "copy on select," just as you'd expect from an Xterm, and it all ties into the work we did above. It also has some other interesting features, like native tmux support.

DISPLAY env with screen

When reconnecting to your remote screen session, you may end up with the DISPLAY variable out of sync. By default, I get DISPLAY=localhost:10.0 when I connect to my VM, but each new connection opens a new back channel on a new display: :11.0, :12.0, etc. You may need to update the value of DISPLAY inside your screen session via export DISPLAY=localhost:10.0, substituting the correct value for this ssh connection. Run echo $DISPLAY outside of the screen session to get the value.
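
One way to avoid retyping the value is to save it to a file before reattaching and load it from inside screen. This is a sketch: save_display, load_display and the ~/.display state file are hypothetical names, not something screen or ssh provide.

```shell
# Hypothetical helpers for carrying DISPLAY into an existing screen session.
# The state file path (~/.display by default) is an assumption.

# Run outside screen, right after ssh connects: record the current DISPLAY.
save_display() {
  printf "export DISPLAY='%s'\n" "$DISPLAY" > "${1:-$HOME/.display}"
}

# Run inside the screen session: pick up the recorded DISPLAY.
load_display() {
  . "${1:-$HOME/.display}"
}
```

Typical flow: log in, run save_display, reattach with screen -r, then run load_display in each screen window that needs the clipboard.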

P.S.

I had some troubles testing until I realized I was expecting select-to-copy behavior in Chrome Browser under OSX. Ha! I'm glad I finally spent the 20 minutes to get all these pieces aligned.

Update

Updated to show that X users want the + buffer rather than the * buffer, after reading up on the original patch.

Update

Updated with -Y/ForwardX11Trusted information.
Updated to warn against "update pasteboard" option in OSX X11 app.

Wednesday, September 12, 2012

Modify submit-type for Gerrit project via meta/config

The submit type of a Gerrit code review project cannot be changed in the UI after creation, but it can be modified via the hidden meta/config branch. Any setting available to create-project can be edited this way.

Project information is not stored in the gerrit database. The information is stored directly in the git repository in a branch named 'meta/config', in two files, 'project.config' and 'groups'. The values from these files are cached in the 'project_list' and 'projects' caches.

Steps to make a change:

  1. set read and push permissions on refs/meta/config
  2. check out the branch,
  3. change the files,
  4. push the repo back,
  5. clear the cache.

Check out the branch:

% git fetch origin refs/meta/config:refs/remotes/origin/meta/config
% git checkout meta/config

Push back the changes:

# directly:
% git push origin meta/config:refs/meta/config
# via review:
% git push origin meta/config:refs/for/refs/meta/config

Flush the caches:

% ssh gerrit gerrit flush-caches --cache project_list
% ssh gerrit gerrit flush-caches --cache projects

project.config

[access "refs/*"]
        owner = group MYgroup
[receive]
        requireChangeId = true
[submit]
        mergeContent = true
        action = merge always
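
Since project.config uses git's own config file syntax, the submit type can also be changed with git config instead of a text editor. The helpers below are a sketch: set_submit_type and get_submit_type are hypothetical names, and the value is passed through unchecked, so it is up to you to supply a string your Gerrit version accepts.

```shell
# Hypothetical helpers for editing submit.action in a checked-out
# project.config. The file path defaults to ./project.config.

# Set the submit type, e.g. set_submit_type "merge always".
set_submit_type() {
  git config -f "${2:-project.config}" submit.action "$1"
}

# Read the value back to confirm the edit before committing.
get_submit_type() {
  git config -f "${1:-project.config}" --get submit.action
}
```

For example, on the meta/config branch: set_submit_type "merge if necessary", then get_submit_type, then commit and push as described above. I believe "merge if necessary" is a valid value, but check the documentation for your Gerrit version.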

groups

# UUID                                          Group Name
eca2c52d733e5740a01747e71f018dcfdeadbeef        MYgroup

I found the meta/config branch mentioned in a couple of posts in the repo-discuss newsgroup.