Interesting article in Technology Review today about the coming ‘Data Deluge’. Our ability to generate and collect data is surpassing our ability to analyze and process it. Re-inventing the classic relational database is no mean undertaking, but it seems inevitable. From the article:
Consider Facebook. Already host to more digital photos than any other company, Facebook is building new storage and processing infrastructure as fast as it can. Yet it is pushing the database technology it is using to the limit, splitting its famed social graph across 4,000 databases that must all work together as one, Stonebraker says. “They are just dying under the load of the management layer needed to keep this system up,” he says. “They have the hardest database problem on the planet, and there’s no current system that will meet their needs.”
The story goes on to mention column-based data stores, which led me to this article analyzing the performance of row-based vs. column-based data stores. The paper contains a really fascinating analysis of what it takes to execute a basic set of queries on a star schema using a row store vs. a column store, and of how the column-based reads end up being orders of magnitude faster.
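To make the row vs. column intuition concrete, here is a toy sketch (not the paper's method). It counts how many field values each layout has to read to answer one aggregate query against a made-up fact table. The schema, the column names, and the "values read" cost model are all assumptions for illustration. Real column stores also rely on other techniques, such as compression, that this sketch ignores.

```python
from typing import List, Tuple

# Hypothetical star-schema fact table; column names are illustrative only.
COLUMNS = ("order_id", "date_key", "customer_key", "product_key", "quantity", "price")


def make_rows(n: int) -> List[Tuple[int, ...]]:
    """Generate n deterministic synthetic fact rows."""
    return [(i, i % 365, i % 1000, i % 50, 1 + i % 5, 10 + i % 7) for i in range(n)]


class RowStore:
    """Keeps each record contiguous, as a row-oriented engine would."""

    def __init__(self, rows: List[Tuple[int, ...]]):
        self.rows = rows
        self.values_read = 0  # assumed cost model: one unit per field value touched

    def sum_where(self, agg_col: str, filter_col: str, filter_val: int) -> int:
        a, f = COLUMNS.index(agg_col), COLUMNS.index(filter_col)
        total = 0
        for row in self.rows:
            # Fetching a row brings in every field, needed or not.
            self.values_read += len(row)
            if row[f] == filter_val:
                total += row[a]
        return total


class ColumnStore:
    """Keeps each column in its own contiguous array."""

    def __init__(self, rows: List[Tuple[int, ...]]):
        self.columns = {name: [r[i] for r in rows] for i, name in enumerate(COLUMNS)}
        self.values_read = 0

    def sum_where(self, agg_col: str, filter_col: str, filter_val: int) -> int:
        filt, agg = self.columns[filter_col], self.columns[agg_col]
        # Only the two columns the query references are scanned.
        self.values_read += len(filt) + len(agg)
        return sum(v for v, k in zip(agg, filt) if k == filter_val)


if __name__ == "__main__":
    rows = make_rows(10_000)
    rs, cs = RowStore(rows), ColumnStore(rows)
    print(rs.sum_where("quantity", "product_key", 7), rs.values_read)  # 600 60000
    print(cs.sum_where("quantity", "product_key", 7), cs.values_read)  # 600 20000
```

Both layouts return the same answer. The column layout reads only 2 of the 6 columns, so it touches a third as much data. With the much wider fact tables typical of star schemas, that ratio grows accordingly.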