Thursday, July 30, 2009
A month or two ago I started evaluating Hadoop, primarily HDFS. That was when I caught up to where people were 2+ years ago. I knew the Google papers were out, but I hadn't paid much attention to them. Now I'm back to using Hadoop (I'm in the process of creating a development cluster, 4 nodes total - baby steps) and trying to understand the sparse multidimensional map approach to data storage (primarily HBase, though it would seem to apply to BigTable, Cassandra, Dynamo, and Voldemort as well). I think this wiki entry from Jim Wilson has put me over the top. I think I actually get it.
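For what it's worth, here is roughly how I picture the "sparse multidimensional map" idea, as a toy in plain Python. This is only a sketch of the concept, not HBase's API: the `put`, `get`, and `scan` helpers, the row keys, and the `family:qualifier` column names are all made up for illustration. I'm assuming the usual description of the model, where a row key maps to columns, each column maps timestamps to values, and rows are read back in sorted key order.

```python
from collections import defaultdict

# Toy model (hypothetical, not the HBase API):
# row key -> "family:qualifier" column -> timestamp -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row, column, timestamp, value):
    """Store one versioned cell."""
    table[row][column][timestamp] = value


def get(row, column):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def scan():
    """Yield (row key, sorted column names) in sorted row-key order."""
    for row in sorted(table):
        yield row, sorted(table[row])


# Made-up sample data: rows do not need to share the same columns (sparse).
put("com.example.www", "contents:html", 1, "<html>v1</html>")
put("com.example.www", "contents:html", 2, "<html>v2</html>")
put("com.example.www", "anchor:home", 1, "Example")
put("org.example.blog", "contents:html", 1, "<html>blog</html>")
```

Asking for `get("com.example.www", "contents:html")` gives back the newest version, and a column that was never written for a row simply isn't there, which is the "sparse" part. A real store keeps the rows sorted on disk; sorting at scan time here is just the simplest way to mimic that.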