How to Speed Up Backtesting Results and Financial Simulations

Scaling out big data cluster jobs in financial services

When looking at how to handle big data in financial simulations, backtests, and risk modeling, it's important to first understand how your compute and storage interact. This video provides an overview of workflows for financial institutions, then looks at ways to speed up financial analyses and backtesting results.


Before diving into storage, let's look at the compute side of things. Financial simulations that require vast amounts of computing power, such as backtests, are what make managed grids so popular. As analysts submit jobs to the grid, the resource manager and scheduler work together to distribute those jobs across all of the resources in the grid. As you grow, your grid will reach capacity and you'll need to add more assets to it. While expanding may be simple, it is not always economically efficient to build a massive infrastructure for compute capacity that is only needed part of the time. Check out this resource for more on how to flexibly expand resources for compute-intensive applications.
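The scheduler pattern above can be sketched in miniature. This is a hedged illustration, not any particular grid product: a pool of worker threads stands in for grid nodes, and the pool's scheduler distributes submitted jobs across them. The `run_backtest` function is a hypothetical placeholder for a real backtest.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one backtest job: here we just compute
# the mean of a small price series as a placeholder metric.
def run_backtest(prices):
    return sum(prices) / len(prices)

# Analysts generate many jobs; the executor plays the role of the
# resource manager/scheduler, spreading jobs across the "grid nodes".
jobs = [[100 + i, 101 + i, 99 + i] for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as grid:  # 4 simulated grid nodes
    results = list(grid.map(run_backtest, jobs))

print(results[0])  # mean of [100, 101, 99] -> 100.0
```

In a real managed grid the workers are separate machines and the scheduler also handles queuing, priorities, and node failures, but the fan-out shape is the same.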

How does storage performance hold up for financial applications?

On the storage side, other issues start to creep up. Network attached storage (NAS), used by many financial organizations, has a finite number of access points, meaning that compute nodes can only access the NAS data a limited number of times concurrently. As the number of jobs expands, the compute servers try to access the NAS more often. When more jobs try to access the data than the access points permit, they are bottlenecked by the storage that holds the large volumes of financial analysis data.
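The bottleneck can be modeled with a semaphore: only a fixed number of simulated jobs are allowed "into" the NAS at once, and everyone else queues. The access-point count and latency here are assumed values for illustration only.

```python
import threading
import time

ACCESS_POINTS = 2                       # assumed NAS connection limit
nas_gate = threading.Semaphore(ACCESS_POINTS)
peak = 0                                # most concurrent readers observed
active = 0
lock = threading.Lock()

def read_from_nas(job_id):
    global peak, active
    with nas_gate:                      # jobs beyond the limit block here
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)                # simulated read latency
        with lock:
            active -= 1

# Ten jobs contend for two access points; eight of them wait in line.
threads = [threading.Thread(target=read_from_nas, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds ACCESS_POINTS
```

No matter how many jobs are launched, concurrency is capped by the access points, so total time grows with job count rather than staying flat.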

Adding more access points to the NAS can be costly and complex. Instead, consider adding a caching layer (like Avere's FXT Edge filers) to your financial analysis and backtesting workflows. This caching layer is placed as close to the compute grid as possible, where it caches the hottest data, cutting down on the number of times analyst jobs reach back to the NAS. Instead, the analysts access the data via the caching device. Since it doesn't have the same restrictions on access points as the NAS devices, it can handle many more jobs, eliminate the bottlenecks, and speed up time to results.

Connecting Remote Financial Analysts

Another concern for IT departments in the financial world is connecting remote offices. One way to keep these employees connected to the data sets is to replicate the information at every location. Another option is to have each location connect to the main data center via a WAN. With this solution, however, you'll likely experience unacceptable latency.

The same caching layer described above can be used to keep the hottest data near the users at each remote site (a caching layer at each location), eliminating latency for most processes. Remote analysts access the hottest, most valuable (at that particular time) data from these nearby Edge filers, helping analysts in any location speed up the results of their models.

In summary, you can still achieve faster backtesting results on large volumes of financial simulation data without making huge investments with traditional storage vendors to boost performance.