Founder of Cloudera Explains What His Company Contributes To The Greater Good

These questions originally appeared on Quora - the knowledge sharing network where compelling questions are answered by people with unique insights.

Answers by Jeff Hammerbacher, Professor at Hammer Lab, founder at Cloudera, investor at Techammer, on Quora.

Q: What made you see the need for Cloudera in industry?

A: There was a lot going on in 2008:

Microsoft tried to buy Yahoo! in February 2008 [1]. This offer catalyzed discussions between Google (Christophe Bisciglia) and Facebook (me) about how to ensure Hadoop development would continue if Microsoft were to buy Yahoo! and end their investment in Hadoop. The first version of Cloudera was discussed with Christophe, Mike Abbott, and me as the founding team and Accel as the VC. Eventually Mike decided to stay at Microsoft and I decided to stay at Facebook. But we kept talking...
The first Hadoop Summit was held in March 2008 [2]. I had been attending Hadoop meetups around the Bay Area for a little more than a year and it seemed that the community was only a few dozen people. The Hadoop Summit had over 400 attendees, though, and drew people from around the country. I was impressed by the scale of interest in the technology.
Teradata released the 2500 series of "low cost" data warehouse appliances in April 2008 [3]. Their "low cost" appliance was still $125k/TB! I figured my costs for a Hadoop cluster were easily 1/10th that and could probably be squeezed down to 1/100th that in a year or two.
We contributed Hive to the Hadoop project in June 2008 [4]. Hive was a proof-of-concept that you could build a data warehouse on top of HDFS and MapReduce. It was ugly but it worked, and for clusters bigger than around 30 nodes it was actually better than anything else we piloted.
Microsoft acquired DATAllegro in July 2008 [5]. A number of shared-nothing distributed database vendors focused on the data warehouse market got going between 1999 and 2005, including Netezza, Greenplum, Aster Data, and Vertica. DATAllegro was the first to exit, and the price (rumored to be $275M) was higher than most expected. I ran a pilot with every one of these vendors and realized they were immature technologies that couldn't scale. The reference provided for me by one vendor had never installed their software; another corrupted data in the middle of a very simple benchmark workload; and a third crashed on a table name larger than 256 characters. And none were thinking about programmability and non-tabular data.
Oracle released Exadata in September 2008 [6]. I piloted this product when it was called "Sage". Oracle, the largest database vendor in the market, had focused on a shared-disk approach to scale out for years with Oracle RAC. The release of Exadata was a sign that shared nothing was the right approach for the future.

The confluence of all of these signals made me believe that there was an opportunity to build a low-cost data management vendor that could handle more kinds of data, a higher volume of data, and a more complex workload than just SQL queries.

The vision is just starting to reach the market with Ibis and Arrow putting an expressive Python interface on top of Impala, a high-performance distributed query engine, and Kudu, a mutable column store optimized for scans. I'm excited to watch these individual projects cohere into a powerful and fast infrastructure for distributed data management and analysis.

[1] Microsoft Proposes Acquisition of Yahoo! for $31 per Share

[2] Announcing the Hadoop Summit at Yahoo, March 25th, 2008

[3] Teradata introduces lower-cost appliances

[4] Hive as a contrib project

[5] Microsoft to Acquire DATAllegro

[6] History of Exadata

...

Q: In what ways has Cloudera created public good?

A: A few ways:

Open source software. We will spend over one hundred million dollars this year on compensation for developers who write Apache-licensed (and often Apache Software Foundation-governed) open source software. If you believe that software is eating the world, you may also believe that open source software is a significant public good. If you're concerned about corporations owning strong AI, you should probably also be concerned with corporations owning the means to store and analyze large volumes of data. By making our core platform open source under a non-copyleft license, Cloudera ensures that any entity in the world can have access to the most powerful tools for data management and analysis at scale.
Cloudera Cares: we give our employees two paid days off per year to volunteer. We also donate a few tens of thousands of dollars each year to non-profits.
Cloudera Academic Partnership: we provide course material and software licenses for free to universities that would like to use our software in their classroom activities.

