Petabyte-Scale Active Archive in Private Object Storage

Accessible, Protected HPC and Big Data Analytics Archive with Avere and Western Digital

In the world of big data science, storage archives protect massive volumes of research-critical content. At the University of Warsaw (UW) Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) in Poland, scientists rely on a petabyte-scale active archive built on Avere Systems and Western Digital technology. The archive is an essential component of the supercomputing infrastructure at ICM’s OCEAN research data center, which has 10PB of primary storage and seven petabytes of archive capacity for high-performance computing (HPC) simulations/modelling and big data analytics.

The archive solution integrates an Avere storage gateway that gives systems and scientists seamless access to Western Digital Active Archive object storage[1]. Grzegorz Bakalarski, chief of the ICM division that administers supercomputing infrastructure, says the solution combines NAS functionality, which simplifies access, with the data durability required for cloud-scale environments. “The Avere technology lets us use familiar protocols and tools to connect to the Western Digital object storage. On the archive side, Western Digital’s 15-nines data durability ensures we can protect the valuable and often irreplaceable data generated by OCEAN supercomputers and researchers.”

ICM Deputy Director, Dr. Marek Michalewicz, adds, “One of the challenges we face in enabling big data science is providing sufficiently safe and affordable storage at petabyte scale. The combination of Avere FXT Edge filers and the Western Digital Active Archive System lets us take advantage of object storage efficiencies to support demand.”



Bakalarski says that in 2015 when ICM was planning for the OCEAN data center project, object storage was an unfamiliar architecture to both scientists and systems administrators – in Poland there were only a few small object storage installations. “As part of the public procurement process, we stipulated that the archive solution must provide petabyte scale as well as accessibility via NFS and SMB protocols to make the capacity more immediately usable by the entire OCEAN team.”

The primary objective of the OCEAN project was to build out a center dedicated to big data research and expertise, providing HPC-grade infrastructure for data collection and storage, data curation, and advanced data analysis. Bakalarski explains, “In May 2015 we began construction in an open field, building the entire 6,000 m² facility from ground to roof, including power plants, climate control, fire protection, BMS, network systems, etc.

“In November 2015 we began the final stage and installed the huge IT systems: a 1,100-node Cray XC40 supercomputer, 10PB of ultra-fast primary DDN storage, and a 400-node Huawei Big Data cluster.”

The archive system installation was one of the final deliverables. “When the Avere and Western Digital team arrived, we were able to count time-to-completion in hours. In less than one full day, the team had deployed the archive, offered up a brief workshop, and addressed all of our outstanding questions.”

The final success of the OCEAN project was achieved through the joint efforts of ICM’s staff, vendors, and the local IT integrator and partner, COMTEGRA S.A., which coordinated all IT systems deliveries and provided electric power and HVAC connections while interfacing with the construction company. COMTEGRA also installed the DDN and Huawei systems. The company specializes in storage delivery and service.



Today an Avere FXT Edge filer cluster front-ending the Western Digital Active Archive System at the ICM OCEAN supercomputing data center presents some seven petabytes of usable archive capacity. The archive enables reliable access to aging data, ensuring availability for long-term and future research activities. The archive provides capacity for interdisciplinary teams representing some 200 scientists and developers working in areas such as air transportation, bioinformatics, climate modelling, computer-assisted medicine, cosmology, digital libraries, drug discovery, epidemiology, agriculture, high-energy physics, machine learning, material science, neurobiology, social-network analysis, numerical weather prediction, and more.



Scale to Eliminate Data Here, There, and Everywhere

The Avere cluster integrates with existing systems using NFS, SMB, and S3-compliant access protocols, giving scientists and administrators the ability to use familiar protocols and tools to move data into and out of the Western Digital object storage. The system currently safeguards a massive virtual library of science containing multiple international scientific databases, scientific publications from Elsevier, Springer, Wiley, ACS, AIP, APS, and many others, and numerous complete publishers’ collections of scientific journals dating from 1995. Other protected content includes more than 300TB of results from evolution-of-the-universe cosmology simulations, hundreds of gigabytes of digital master files from UW’s broadcast-television programming, and medical simulations data for vein-implant research.

The archive also protects more than 500TB and twenty years’ worth of unique meteorological data. Bakalarski estimates that every week new national weather forecast data adds several terabytes to the archive. “The Avere and Western Digital archive solution gives us the large scale we need to keep up with data growth. For example, our natural environment modelling group is implementing more fine-grained grid sizes—from a 17-kilometer grid to 4 kilometers and eventually down to a 1.5-kilometer grid—that will consume significantly more archive capacity. We’re also in the process of providing on-demand access to our weather-data repository, making regional, country, and even local precipitation and other meteorological profiles available to both researchers and government institutions.

“Another benefit of the archive is manageability. We’re now able to give users the space required to keep all of their data in a single, centralized location—we don’t have to break up huge datasets, move data around, and keep track of copies. The solution eliminates wasted space, as well as the administrative nightmare of managing data here, there, and everywhere.”
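To give a rough sense of why the finer weather-model grids mentioned above consume so much more capacity, the short sketch below estimates how the number of horizontal grid cells grows when the grid spacing shrinks over the same area. It is only an illustration under stated assumptions: the function name is hypothetical, the estimate covers the horizontal grid alone, and it ignores vertical levels, output frequency, variables stored, and compression, all of which also affect real archive volumes.

```python
def horizontal_cell_factor(old_km: float, new_km: float) -> float:
    """Growth factor in horizontal grid cells over a fixed area when the
    grid spacing shrinks from old_km to new_km (assumes a uniform 2D grid)."""
    if old_km <= 0 or new_km <= 0:
        raise ValueError("grid spacing must be positive")
    # Halving the spacing doubles the cell count along each of two axes,
    # so the total scales with the square of the spacing ratio.
    return (old_km / new_km) ** 2


# Grid spacings quoted in the article: 17 km today, then 4 km, then 1.5 km.
for new_km in (4.0, 1.5):
    factor = horizontal_cell_factor(17.0, new_km)
    print(f"17 km -> {new_km} km: about {factor:.0f}x more horizontal cells")
```

Under these assumptions, moving from a 17-kilometer grid to a 4-kilometer grid means roughly 18 times as many horizontal cells, and a 1.5-kilometer grid roughly 128 times as many.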

Protection for Invaluable, Irreplaceable Data

In addition to providing inherently fault-tolerant object storage, the Western Digital Active Archive offers 15-nines data durability and a fail-in-place model with automated self-healing in the event of data corruption. Such functionality ensures availability and integrity of results and research data that may require decades-long accessibility. “Among the leading object storage vendors,” comments Bakalarski, “Western Digital offers one of the highest levels of durability and protection against the nearly inevitable multiple disk failures that at this scale could otherwise present serious risk of data loss.”
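As a back-of-the-envelope illustration of what a 15-nines figure can mean at scale, the sketch below converts a durability level into an expected number of lost objects per year. This is an assumption-laden reading, not the vendor's definition: it treats the figure as an annual, per-object durability of 99.9999999999999%, and the object count used in the example is purely hypothetical.

```python
def expected_annual_losses(object_count: int, nines: int = 15) -> float:
    """Expected number of objects lost per year, assuming the durability
    figure is an independent annual probability for each stored object."""
    if object_count < 0 or nines < 1:
        raise ValueError("object_count must be >= 0 and nines must be >= 1")
    # 15 nines of durability leaves a loss probability of 10^-15 per object.
    loss_probability = 10.0 ** -nines
    return object_count * loss_probability


# Hypothetical archive holding one billion objects (not a figure from ICM).
losses = expected_annual_losses(1_000_000_000)
print(f"Expected objects lost per year: {losses:.1e}")
```

Under that reading, even an archive holding a billion objects would expect about one millionth of an object lost per year, which is why durability at this level is attractive for data that must remain accessible for decades.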

Value for Data-intensive Scientific Research and Discovery

“Overall, the Avere and Western Digital archive solution has proven to be a reliable system that delivers the cost-efficient capacity, performance, manageability, and supportability we require in the OCEAN data center environment,” summarizes Bakalarski.

“At this scale,” affirms Michalewicz, “there are real efficiencies in building an archive on object storage. The Western Digital Active Archive System provides a high-density, low-power footprint and significant cost savings over other archive products on the market. We also found benefits in deploying the archive as an on-premises private cloud. Compared to public cloud capacity, our private cloud solution delivers cost savings, as well as data access and security advantages. The solution delivers excellent value to the data center and to the researchers tackling compute- and data-intensive research challenges.”


[1] The Western Digital Active Archive System was previously named Western Digital HGST.