Bridging Hadoop and the Cloud

Architecting a Hybrid Cloud Hadoop Solution

Using only HDFS has been the standard way to run Hadoop clusters in the data center. It takes advantage of local storage and local compute resources, with full rack awareness. This architecture delivers very high performance. However, what happens when you need more resources than you have locally?

Creating a cloud-based Hadoop architecture has been a challenge for many IT professionals. Keeping HDFS data in cloud block storage is too expensive. Keeping data in commodity cloud storage means working with unfamiliar protocols, and making it work takes a great deal of time and network bandwidth. Accessing NFS data from cloud compute incurs so much latency that the approach becomes nonviable.

This 11-minute video looks at how you can overcome these obstacles and build Hadoop clusters in the cloud while still accessing data stored on-premises. You'll learn:

  • Why standard Hadoop workloads lend themselves to a caching system that uses NFS protocols instead of HDFS
  • How a standard Hadoop architecture differs from one in the cloud
  • What decisions you will need to make to run Hadoop clusters more efficiently in the cloud
  • Why a hybrid Hadoop architecture is the most cost-efficient solution for many organizations, and how to build this infrastructure without sacrificing performance