Genomics In the Cloud

Managing the growing volume of data created by genomic sequencing workflows can be simpler and more affordable with a hybrid cloud architecture that uses cloud compute and storage. In this video, Scott Jeschonek, Director of Cloud Products at Avere, introduces how a genomics-in-the-cloud architecture works.

Sequencing labs have more genomic data to analyze than ever before, and this has put a major strain on genome informatics infrastructure. Many have turned to public cloud providers (such as Amazon EC2 and Google Compute Engine) to take advantage of their virtually unlimited compute resources. However, several obstacles stand in the way of making these resources usable. This video explains where these issues occur and how to use a caching layer to start analyzing genomic data in the cloud.

In this example, you have sequencers on-premises in your data center. They then deposit their data on storage that is also located in the data center. You need to expand your compute power for post-processing analysis. For this, you’re interested in utilizing the vast resources of the cloud, but you run into a few challenges.

To get your genome data into the cloud, you first need to overcome the latency inherent in the WAN. Second, once your data is in the cloud, your applications need to be able to use it. These applications access data over NAS protocols and expect the data to be served the same way, but cloud storage is natively object storage rather than NAS. Finally, you need a way to store the results after processing, whether back in your data center or in the cloud.
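To see why WAN latency matters so much for this kind of workload, here is a rough back-of-the-envelope sketch in Python. It assumes a client that issues requests strictly one after another, so every request pays a full round trip. The function name and all of the numbers (data size, request size, round-trip times, link speed) are hypothetical values chosen for illustration, not measurements from the video.

```python
def transfer_seconds(total_bytes, request_bytes, rtt_seconds, bandwidth_bytes_per_s):
    """Estimate the time to read data with strictly sequential requests.

    Each request pays one round trip; the payload also takes time on the wire.
    This ignores pipelining, parallel streams, and protocol overhead.
    """
    requests = -(-total_bytes // request_bytes)  # ceiling division
    return requests * rtt_seconds + total_bytes / bandwidth_bytes_per_s


# Hypothetical workload: 1 GB read in 1 MB requests over a 1 Gbit/s link.
ONE_GB = 1_000_000_000
ONE_MB = 1_000_000
LINK = 125_000_000  # bytes per second, i.e. 1 Gbit/s

over_wan = transfer_seconds(ONE_GB, ONE_MB, 0.050, LINK)    # assumed 50 ms round trip
near_cache = transfer_seconds(ONE_GB, ONE_MB, 0.001, LINK)  # assumed 1 ms round trip
print(f"WAN: {over_wan:.1f} s, nearby cache: {near_cache:.1f} s")
```

With these assumed numbers the link speed is identical in both cases; the difference comes entirely from the per-request round trip, which is the cost a cache placed next to the compute nodes is meant to reduce.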

All of this comes together with Avere’s virtual Edge filer (vFXT), which offers cloud caching to help overcome each of the above obstacles. First, Avere’s caching puts the hottest data as close to the cloud compute grid as possible. This read-ahead caching smooths out WAN latency by reducing the distance client calls have to travel. Second, Avere works as a cloud gateway: it provides the NFS mounts needed by applications running in the cloud that use the same NAS protocols as the on-premises data.
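The idea behind read-ahead caching can be sketched in a few lines of Python. This is a toy model and not Avere's implementation: the class name, the `fetch_block` callable standing in for a slow WAN read, and the capacity and read-ahead parameters are all assumptions made for illustration. A real cache would prefetch asynchronously; here the prefetch runs inline to keep the example short.

```python
from collections import OrderedDict


class ReadAheadCache:
    """Toy LRU block cache that prefetches the following blocks on a miss."""

    def __init__(self, fetch_block, capacity=8, read_ahead=2):
        # fetch_block is a hypothetical callable: block index -> bytes,
        # or None when the index is past the end of the data.
        self.fetch_block = fetch_block
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, index, data):
        self.blocks[index] = data
        self.blocks.move_to_end(index)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, index):
        if index in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(index)
            return self.blocks[index]
        # Miss: fetch the requested block from the origin.
        self.misses += 1
        data = self.fetch_block(index)
        self._store(index, data)
        # Then pull the next few blocks so sequential reads hit the cache.
        for nxt in range(index + 1, index + 1 + self.read_ahead):
            if nxt not in self.blocks:
                ahead = self.fetch_block(nxt)
                if ahead is not None:
                    self._store(nxt, ahead)
        return data


# Hypothetical origin: six small blocks held in the on-premises data center.
origin = {i: f"block-{i}".encode() for i in range(6)}
cache = ReadAheadCache(origin.get)
for i in range(6):
    cache.read(i)
print(cache.hits, cache.misses)
```

In this sequential scan only the reads of blocks 0 and 3 have to wait on the origin; the other four are served from blocks that were prefetched earlier.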

Finally, Edge filers are used to return the data. Using write-behind caching, you can write the results back to your on-premises NAS. Alternatively, you may want to use low-cost S3 storage in the cloud, which again raises the object-versus-NAS issue. The Edge filer presents NFS mounts and translates the data written through them into objects in cloud storage. Clients write their data using the familiar NFS protocol, but it is in fact sent to object-based cloud storage.
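As a rough illustration of write-behind caching in front of an object store, consider the Python sketch below. It is not how the Edge filer is built: the class names, the in-memory dictionary standing in for a cloud bucket, the `results` key prefix, and the path-to-key mapping are all hypothetical. The point is only that a client write returns as soon as the cache holds the data, and a later flush turns each file path into an object key.

```python
class InMemoryObjectStore:
    """Stand-in for an object storage bucket: a flat key -> bytes mapping."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data


class WriteBehindCache:
    """Toy write-behind cache: acknowledge writes now, flush to the store later."""

    def __init__(self, store, prefix="results"):
        self.store = store
        self.prefix = prefix  # hypothetical key prefix for flushed objects
        self.dirty = {}       # path -> bytes written by clients, not yet flushed

    def write(self, path, data):
        # The client sees a file-style write; nothing crosses to the store yet.
        self.dirty[path] = data

    def read(self, path):
        # Unflushed data is still readable from the cache (None if unknown).
        return self.dirty.get(path)

    def to_key(self, path):
        # Map a file path to a flat object key (an assumed naming scheme).
        return f"{self.prefix}/{path.lstrip('/')}"

    def flush(self):
        """Push every dirty file to the object store and return the count."""
        flushed = 0
        for path in list(self.dirty):
            self.store.put(self.to_key(path), self.dirty.pop(path))
            flushed += 1
        return flushed


bucket = InMemoryObjectStore()
filer = WriteBehindCache(bucket)
filer.write("/mnt/out/sample1.vcf", b"variant calls")
pending = dict(bucket.objects)  # still empty: the write has not been flushed
flushed_count = filer.flush()
print(pending, flushed_count, sorted(bucket.objects))
```

A production system would also need to handle flush ordering, failures, and consistency between readers, all of which this sketch leaves out.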