Reducing the Need for HPC Data Migration

HPC workloads are often very large, with datasets that can reach several petabytes. With storage and compute requirements to match, organizations are working out how best to use the vast resources offered by cloud service providers to fill any gaps. However, large file sizes make it difficult to move HPC data to these remote resources. This video from SC16 looks at one way to solve these challenges and make HPC in the cloud a viable option.

Traditional methods of moving data are expensive and time-consuming, and they often negate the value that the cloud offers. You do not need to move all of your data to the cloud in order to use cloud compute for an individual application’s workload. In fact, you don’t need to move large datasets at all. A cloud caching filer can often take on just the data required to run each job, so the data movement is handled entirely by the caching appliance.

The large datasets do not need to leave your data center. The HPC data that a job actually needs (a small percentage of the total) is moved via the caching filer to the application running in the cloud, where the app uses it. Once the job is finished, the filer sends that data back to its on-prem location.
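As a rough illustration of this flow, the Python sketch below models a read-through, write-back cache sitting between a cloud job and on-prem storage. It is a toy model under stated assumptions, not Avere's product or API: the `CachingFiler` class, its method names, and the dictionary standing in for the on-prem repository are all hypothetical.

```python
class CachingFiler:
    """Toy read-through, write-back cache in front of a slow origin store.

    Hypothetical illustration only; not a real vendor API.
    """

    def __init__(self, origin):
        self.origin = origin    # dict standing in for on-prem storage
        self.cache = {}         # items pulled in on demand
        self.dirty = set()      # keys written in the cache, not yet sent back
        self.origin_reads = 0   # how many items actually left the data center

    def read(self, key):
        # Fetch from the origin only on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.origin[key]
            self.origin_reads += 1
        return self.cache[key]

    def write(self, key, value):
        # Writes land in the cache first and are marked for write-back.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Send modified items back to the on-prem location.
        flushed = len(self.dirty)
        for key in sorted(self.dirty):
            self.origin[key] = self.cache[key]
        self.dirty.clear()
        return flushed


# A large repository, of which the job touches only a small working set.
on_prem = {f"file{i}": f"data{i}" for i in range(1000)}
filer = CachingFiler(on_prem)

for name in ("file1", "file2", "file1"):  # the second read of file1 is a hit
    filer.read(name)

filer.write("result", "output")           # job output, cached in the cloud
flushed = filer.flush()                    # written back when the job ends
```

In this sketch only the two files the job reads ever leave the repository, and the single result is written back when the job finishes. The other 998 entries are never touched.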

If you were using a typical model that was entirely on-prem, you would need to move the data to free the local machine so that it can do the next run. With the cloud, you can deploy and tear down resources on demand. Once your workloads have finished running, you can stop the billing for your compute usage, and you have not had to purchase additional hardware.


Video Transcription

Rich - Hi, I'm Rich with insideHPC. We're here at SC16 in Salt Lake City, and this afternoon we're at the Avere Systems booth with Bernie. Bernie, welcome to Salt Lake.

Bernie - Thank you very much.

Rich - Well, Bernie, let's start at the beginning for folks who might not know, who is Avere Systems and who do you help?

Bernie - Oh yeah, that's fine. Avere Systems, we are a software and hardware appliance vendor. We create high performance file systems that allow you to run workloads either in the cloud or on premises using both object storage and file-based storage behind us.

Rich - What are you showcasing this week at SC16?

Bernie - Our main solution that we're showcasing here today this week is to really allow our customers to move their HPC workloads from on-prem into the cloud across various cloud providers, like Amazon and Google, and then move their workloads back on-prem, all while not having to move data around to accomplish these tasks.

Rich - One of the biggest sins in HPC is moving data, because it's very painful when the files are so big. How do you guys tackle that challenge?

Bernie - It's absolutely a challenge because nobody wants to move data. It's an expensive and time-consuming ordeal. So, what we do is we offer high performance file system caching that captures the active working set of your application.

So, if you have a massive repository, multi petabytes, but your application is only crunching on maybe a few hundred terabytes, we can actually cache all that data on demand either in the cloud or on premises and serve the HPC for them as they request all this data. So, when we think of the typical model, the super computer has all this data, they crunch the simulation, and then they send it off somewhere because they need to free up the machine to do the next run.

Rich - And that's where Avere Systems would come in, wouldn't it?

Bernie - Absolutely, and that's also where the cloud becomes very attractive because you can deploy resources on demand, do your work, and then tear down all your resources. You've accomplished your task and now you've stopped the billing, and you don't actually need to buy any hardware in that case. So, that's very attractive.

Rich - So, Bernie, this seems to be the supercomputing about machine learning and analytics, what you do with the data afterwards. How does Avere Systems come into play there?

Bernie - We provide the performance level that feeds these applications. So, people write these applications to run across large, large data sets to find that very important sliver of information, or perhaps correlation of data, that's the whole big data problem. And our ability to cache and deliver data at a very high throughput, low latency in the cloud or on-prem gives them the ability to accomplish those tasks much faster with much lower cost.