You can't engage in a conversation about IT today without hearing cloud computing dropped in the first two sentences. But behind that term is an overwhelming number of types, issues, solutions, and architectures to consider and digest. The world could benefit from a translation of sorts that explains all this IT mumbo-jumbo to cloud non-experts. Here's my attempt.
Multiple Cloud Types
It's common to envision "the Cloud" as one huge computer network hoarding gobs of information. However, there are many clouds and even different types of clouds, each suitable for different types of problems. Specific features and benefits of cloud types should affect decisions in developing and deploying cloud solutions. And sometimes one cloud type isn't enough, and multiple cloud types need to be combined to solve a problem. For example, a utility cloud often provides the core computing resources needed for data and storage clouds. Here is a closer look at four of the most common types of clouds that I encounter in enterprises.
Utilizing the Utility Cloud
The utility cloud provisions and manages large networks of virtual machines to provide on-demand computing resources that scale horizontally on standard hardware. Utility clouds are often accessible via Application Programming Interfaces, or APIs. This allows nearly anyone, including business users, to provision compute resources to address their IT needs. Examples of utility clouds are Amazon Elastic Compute Cloud (EC2), OpenStack Compute (and the commercial Rackspace Cloud Servers offering), VMware vCloud Suite, Microsoft System Center, and Apache CloudStack.
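To make "scaling horizontally through an API" concrete, here is a minimal sketch in Python. It assumes a toy in-memory cloud: `ToyUtilityCloud`, `machines_needed`, and the load figures are hypothetical names and numbers invented for illustration, not the API of EC2, OpenStack, or any other product named above.

```python
import itertools
import math


class ToyUtilityCloud:
    """In-memory stand-in for a utility cloud API (hypothetical, not a real SDK)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}  # instance id -> machine size

    def provision(self, size="small"):
        """Start one virtual machine and return its id."""
        instance_id = f"vm-{next(self._ids)}"
        self.instances[instance_id] = size
        return instance_id

    def terminate(self, instance_id):
        """Stop a virtual machine and release its resources."""
        del self.instances[instance_id]

    def scale_to(self, target):
        """Scale horizontally: add or remove identical machines until `target` are running."""
        while len(self.instances) < target:
            self.provision()
        while len(self.instances) > target:
            # Remove the most recently started machine first.
            self.terminate(next(reversed(self.instances)))


def machines_needed(requests_per_second, capacity_per_machine):
    """Number of identical machines required to absorb the load (at least one)."""
    return max(1, math.ceil(requests_per_second / capacity_per_machine))


cloud = ToyUtilityCloud()
cloud.scale_to(machines_needed(950, 200))  # assumed peak load
peak = len(cloud.instances)
cloud.scale_to(machines_needed(150, 200))  # assumed quiet period
quiet = len(cloud.instances)
print(peak, quiet)
```

The point of the sketch is that capacity changes by adding or removing identical machines through calls, rather than by buying a bigger machine.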
The Storage Cloud Persists and Grows
The storage cloud type actually contains various subtypes. One subtype provides API-accessible storage that applications use to reach block devices or enterprise storage devices, and is typically used for backup, archiving, data retention, and document storage. Another subtype synchronizes content across multiple devices and systems, and often provides both API and human interfaces for accessing the stored content. Examples of storage clouds are Amazon Simple Storage Service (S3), OpenStack Object Storage, Apple iCloud, Google Drive, Google Cloud Storage, Microsoft SkyDrive, and SugarSync.
Dissecting the Data Cloud
The data cloud type is one of the fastest growing, and is characterized by analytics-driven, horizontally scalable processing of large volumes of data, complex data, and other Big Data sources. Of all the cloud types, the data cloud is currently the most complex, both in the number of software components involved and in the decision criteria to weigh when choosing which data cloud framework or frameworks to deploy. This section attempts to demystify the data cloud, the frameworks available, and why there are several. Data clouds require one or more analytics frameworks to make sense of the mass of information they hold. Analytics run complicated algorithms for data source correlation, data efficacy, entity disambiguation, relationship identification, trends, etc. However, the framework often determines how an analytic performs its magic.
Apache Hadoop is open-source software for reliable, scalable, distributed computing. Its framework allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop is used to process petabytes of data by some of the best-known web companies, including Yahoo, LinkedIn, Facebook, AOL and others. Hadoop is most often used for batch processing of large amounts of data, but streaming capabilities are available in the open-source project as well as in commercial offerings.
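The best-known of those "simple programming models" is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in plain Python on a single machine, using word counting as the example. It does not use Hadoop itself, and the function names are my own; in a real cluster the framework runs the map and reduce steps on many machines and handles the shuffle between them.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a batch of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the cloud stores data", "the data cloud analyzes data"])
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across as many machines as the data requires.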
Twitter's Big Data problem was more than just allowing its users to track the whereabouts and happenings of Lady Gaga and Justin Bieber. Twitter also provides Twitter Trends by user location, so users know what is hot in their area in near real-time. Twitter obtained Storm through its 2011 acquisition of BackType and now provides Storm as a free and open-source distributed real-time computation system, in addition to using it to power its own Twitter Trends. Whereas Hadoop is most often used for processing large, complex data objects in batch mode, Storm is most often used for processing smaller data objects in real-time.
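To show how stream processing differs from batch processing, here is a small Python sketch that keeps per-location topic counts over a sliding time window and updates them one event at a time. It is not Storm code and not how Twitter computes trends; `TrendTracker`, the 60-second window, and the sample events are all assumptions made for illustration. It also assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TrendTracker:
    """Counts topics per location over a sliding time window (illustrative only)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, location, topic), oldest first
        self.counts = {}       # location -> Counter of topics

    def observe(self, timestamp, location, topic):
        """Process one small event as it arrives, then drop events that aged out."""
        self.events.append((timestamp, location, topic))
        self.counts.setdefault(location, Counter())[topic] += 1
        self._expire(timestamp)

    def _expire(self, now):
        """Remove events older than the window and decrement their counts."""
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, location, topic = self.events.popleft()
            self.counts[location][topic] -= 1
            if self.counts[location][topic] == 0:
                del self.counts[location][topic]

    def top(self, location, n=1):
        """Return the n most frequent topics currently in the window for a location."""
        return [t for t, _ in self.counts.get(location, Counter()).most_common(n)]


tracker = TrendTracker(window_seconds=60)
tracker.observe(0, "Austin", "#sxsw")
tracker.observe(10, "Austin", "#sxsw")
tracker.observe(20, "Austin", "#bbq")
early = tracker.top("Austin")
tracker.observe(65, "Austin", "#bbq")
tracker.observe(75, "Austin", "#bbq")
late = tracker.top("Austin")
print(early, late)
```

The answer is always available and always current, because each event adjusts the running counts instead of waiting for a batch job to reread the whole data set.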
Several data cloud repositories exist and each provides unique benefits. Apache Cassandra was created by Facebook and is still used by companies like Netflix. Cassandra is a key-value data repository that provides a rack-aware, highly available service with no single point of failure. Apache HBase is an open-source, distributed, scalable, column-oriented key-value store modeled after Google Bigtable. It provides data versioning and is capable of handling tables with billions of rows and millions of columns. Apache Accumulo is an open-source, sorted, distributed key/value store that provides robust, scalable, high-performance data storage and retrieval. One of its key differentiators is that it offers secure, labeled access at the cell level. Accumulo was developed by a U.S. Intelligence Agency and has recently been the topic of Congressional debates on its use within the Intelligence Community (IC).
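The idea of labeled access at the cell level can be sketched in a few lines of Python. The toy store below keeps its keys sorted and attaches a set of required labels to every cell; a scan returns only the cells whose labels the reader holds. This is a simplification under stated assumptions: `LabeledStore` and the sample data are hypothetical, the store is neither distributed nor persistent, and real Accumulo visibility labels are boolean expressions rather than the plain "must hold every label" rule used here.

```python
import bisect


class LabeledStore:
    """Sorted key/value store where every cell carries a set of required labels."""

    def __init__(self):
        self._keys = []   # (row, column) keys kept in sorted order
        self._cells = {}  # (row, column) -> (required_labels, value)

    def put(self, row, column, value, labels=()):
        """Write one cell, optionally protected by one or more labels."""
        key = (row, column)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = (frozenset(labels), value)

    def scan(self, start_row, end_row, authorizations):
        """Yield visible cells with start_row <= row < end_row, in key order."""
        auths = frozenset(authorizations)
        lo = bisect.bisect_left(self._keys, (start_row,))
        for key in self._keys[lo:]:
            if key[0] >= end_row:
                break
            required, value = self._cells[key]
            if required <= auths:  # reader must hold every label on the cell
                yield key, value


store = LabeledStore()
store.put("patient:002", "name", "B. Jones")
store.put("patient:001", "name", "A. Smith")
store.put("patient:001", "diagnosis", "flu", labels={"medical"})

clerk = list(store.scan("patient:001", "patient:003", authorizations=set()))
doctor = list(store.scan("patient:001", "patient:003", authorizations={"medical"}))
print(clerk)
print(doctor)
```

Both readers query the same table with the same scan; the labels on each cell, not separate copies of the data, decide what each one sees.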
Prototyping in the Data Cloud
The prototyping cloud is actually a type of utility cloud, but whereas a utility cloud is most often focused on production scalability, a prototyping cloud is focused on prototyping new capabilities. Most utility cloud providers offer fast access to server images that can be used for prototyping.