Cloud Computing for Batch Operations
Cloud Bursting for On-Prem Data Access
Most large organizations today have significant capital and operational investments in their existing data centers. When considering the cloud for resource expansion, it’s important to get the most out of that existing infrastructure. This video provides an overview of how to best use cloud compute for batch operations, covering topics like how to work around data gravity, when to burst compute to the cloud, and where a cloud cache fits into the batch processing workflow.
Whether it’s in owned assets or a rental contract that won’t expire for a few years, existing infrastructure investments are hard to walk away from, so you can’t simply move everything to the cloud. Beyond that, moving all (or large portions) of your data into the cloud brings its own challenges. This phenomenon, known as “data gravity,” has several components. First, cost: how much time and money will it take to move data out of the data center into cloud storage? Second, portability: what happens if you later want to move your data to a different cloud provider, or split it between two?
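The time-and-money question above lends itself to a quick back-of-the-envelope calculation. The sketch below estimates transfer time and per-GB fees for a bulk migration; the link speed and $0.09/GB rate are illustrative assumptions, not any provider’s actual pricing.

```python
# Back-of-the-envelope estimate of what overcoming "data gravity" costs.
# The link bandwidth and per-GB transfer fee are illustrative assumptions.

def migration_estimate(data_tb, link_gbps=10, fee_per_gb=0.09):
    """Return (days_to_transfer, transfer_cost_usd) for moving data_tb."""
    data_gb = data_tb * 1024
    seconds = (data_gb * 8) / link_gbps   # gigabits over a Gbps link
    days = seconds / 86_400
    cost = data_gb * fee_per_gb
    return days, cost

# A 500 TB on-prem data set over a dedicated 10 Gbps link:
days, cost = migration_estimate(500)
print(f"~{days:.1f} days of transfer, ~${cost:,.0f} in per-GB fees")
```

Even under these generous assumptions, the move takes days of saturated bandwidth and a five-figure fee, which is why the data tends to stay put.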
Both existing investments and data gravity make moving entirely onto the cloud a non-starter for most organizations. On top of that, many methods and procedures already in place are designed for the on-prem environment. These three commitments (investments, data gravity, and existing processes) mean that the on-prem environment will be around for years to come.
Take Batch Operations to Cloud Compute
However, cloud resources can still be used in this type of environment. The most compelling use case is the idea of cloud bursting. Cloud bursting allows you to access massive cloud computing resources while keeping your data in your on-prem data center. Let’s look at an example of how cloud bursting is used as an alternative to other methods of expanding computing power.
For example, suppose you’ve been running batch processes on a grid and have now exhausted its capacity. You have two options for what to do next. The first is to expand the on-prem grid by adding more nodes, which means taking on more capital expense. The second is to stand up a grid in the compute cloud that runs the same operational software as the on-prem grid, so you pay for it only for the time you use it to run jobs. Both options add computing capacity; the difference lies in how, and how much, you pay for it.
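The bursting decision above can be sketched as an overflow scheduler: fill the already-paid-for on-prem grid first, then send only the remainder to an on-demand cloud grid. The node counts and the per-node-hour rate are illustrative assumptions.

```python
# A minimal sketch of the bursting decision: on-prem capacity is fixed
# (already paid for as capex); cloud nodes are billed only while used.
# Capacity and pricing figures below are illustrative assumptions.

ONPREM_NODES = 100          # fixed on-prem grid capacity
CLOUD_RATE = 0.50           # assumed $/node-hour for the cloud grid

def schedule(nodes_needed):
    """Split a batch across on-prem and burst capacity; return the plan."""
    onprem = min(nodes_needed, ONPREM_NODES)
    return {"onprem_nodes": onprem, "cloud_nodes": nodes_needed - onprem}

def burst_cost(plan, hours):
    """Marginal cost of the run: only the burst nodes add opex."""
    return plan["cloud_nodes"] * CLOUD_RATE * hours

plan = schedule(160)                   # this batch needs 160 nodes
print(plan)                            # → 100 on-prem, 60 burst to cloud
print(f"${burst_cost(plan, 4):.2f}")   # → $120.00 for a 4-hour run
```

The key property is that when `nodes_needed` fits on-prem, the cloud grid costs nothing at all, which is the pay-only-when-you-use-it economics the article describes.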
Keep Data Close with Caching
As shown above, cost is a compelling reason to use cloud bursting. Still, other important factors should be understood before getting started. With a cloud bursting model, you choose not to move your data into the cloud, which leaves a long distance between your data and your compute grid. Caching is the method used to manage the resulting latency: adding a caching layer near the compute grid reduces the bottleneck between compute and storage.
By placing a software-only caching tier inside the compute cloud, you can apply the same concepts to the cloud grid as to the on-prem grid. The caching tier pulls data up from on-prem storage and keeps the hottest data as close to the grid as possible, so the grid can process it without repeatedly reaching back across the WAN.
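The behavior of that tier can be sketched as a small least-recently-used (LRU) cache sitting between the cloud grid and on-prem storage. `fetch_from_onprem` below is a hypothetical placeholder for a read over the WAN, not a real API, and the tiny capacity is only for illustration.

```python
# A minimal sketch of the caching tier: an LRU cache inside the compute
# cloud that keeps the hottest on-prem objects next to the grid.
# fetch_from_onprem is a hypothetical stand-in for a slow WAN read.

from collections import OrderedDict

class GridCache:
    def __init__(self, capacity, fetch_from_onprem):
        self.capacity = capacity
        self.fetch = fetch_from_onprem
        self.entries = OrderedDict()    # order tracks recency of use
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1                    # slow path: go back on-prem
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the coldest entry
        return value

# Simulate a batch job that re-reads a small hot set of data blocks.
cache = GridCache(capacity=2, fetch_from_onprem=lambda k: f"data:{k}")
for block in ["a", "b", "a", "a", "c", "b"]:
    cache.read(block)
print(cache.hits, cache.misses)  # repeated hot-block reads hit the cache
```

Only the first read of each hot block pays the WAN round trip; subsequent reads are served from inside the compute cloud, which is the latency reduction the caching tier provides.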
In summary, bursting batch operations to cloud compute when local resources are exhausted can be a more cost-effective and faster-to-provision solution than expanding on-prem compute and storage.