In a recent post I discussed some of the issues encountered in the data center when attempting to build horizontal data services across a server-based storage infrastructure. Backup, mobility, replication, and other data protection services need to function just as well for server storage as they have historically with disk arrays. These data services sit just underneath the GemFire level in a data lake architecture:
Deploying horizontal scale-out of data services at the server level is non-trivial. As I mentioned in my last post, it is a foundational design point for a data lake architecture. Unfortunately this design point is not limited to the server; the scale-out data protection services must spill over to spinning disk as well.
In this post I'd like to spend some time discussing the relationship between the "hot edge" server level and the "cold core" world of spinning disk (HDDs). There is a set of problems that will affect storage software internals. These problems fall into three categories:
- The tiering algorithms
- The growing performance disparity between hot edge and cold core
- The desire to avoid (or minimize) the amount of local HDD managed on-premises
A good way to start analyzing the issues is to begin with the current state of storage system tiering:
Storage software inside state-of-the-art systems will analyze application traffic patterns and automatically move blocks of "hot" data closer to the app while moving "colder" data further away to various levels of spinning media (for simplicity I am only showing one tier here). EMC's implementation of this type of storage tiering is known as FAST (Fully Automated Storage Tiering or Technology).
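To make the idea of automated tiering concrete, here is a minimal sketch that ranks blocks by how often they were accessed and keeps the most frequently used ones on the hot tier. It is an illustration only and not a description of how FAST works internally; the function name `plan_tiers`, the `hot_capacity` parameter, and the sample access log are all assumptions made for this example.

```python
from collections import Counter

def plan_tiers(access_log, hot_capacity):
    """Split block IDs into (hot, cold) sets by access frequency.

    access_log   -- iterable of block IDs, one entry per observed access
    hot_capacity -- hypothetical number of blocks the hot tier can hold
    """
    counts = Counter(access_log)
    # Rank by access count (highest first); break ties by block ID so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    hot = set(ranked[:hot_capacity])
    cold = set(ranked[hot_capacity:])
    return hot, cold

# Hypothetical access trace: "a" is read three times, "b" twice, "c" and "d" once.
log = ["a", "b", "a", "c", "a", "b", "d"]
hot, cold = plan_tiers(log, hot_capacity=2)
print("hot:", sorted(hot), "cold:", sorted(cold))
```

A real tiering engine would also weigh recency, I/O size, and the cost of moving data, but the basic shape (observe traffic, rank, place) is the same.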
As server-based storage becomes more prevalent, the performance disparity between the hot edge solid-state tier and the cold core spinning disk will grow to the point where the servers will inevitably be waiting for the cold core to respond.
To solve this problem, two new types of algorithms will begin to appear in the storage software stack:
- Warm core solid state.
- Burst buffer technology.
The warm core solid state concept depicted here uses EMC's XtremIO product, which contains internal storage software designed specifically for solid state (and not HDD). Many data centers already deploy XtremIO as a primary storage system in its own right. However, an additional usage is emerging where the system is used as a warm spillover target for hot edge servers.
As early as May of last year we began seeing the first hints of this architecture in Brian Gallagher's VMAX discussion at EMC World. At the time Brian referred to it as "zero-ary" tiering (as opposed to secondary or tertiary tiering). His reasoning was that existing VMAX customers could stand up a massive solid state disk array (e.g. XtremIO) alongside the VMAX system to grow beyond the enterprise flash drives already in the VMAX system. The benefit of this approach is that it would preserve the data protection services (e.g. SnapCopy, RemoteMirroring) already in use by the applications on top of the VMAX.
The specific mechanisms to move data from the hot edge down to the warm core also need to improve. This may be accomplished by leveraging recent advancements in parallel burst buffer technologies.
Technologists at EMC and Los Alamos National Labs (LANL) have known for many years that the day was coming when the bandwidth required to burst data to persistent storage would require so many spinning disk drives that cost would become prohibitive. As early as three years ago these technologists proposed burst buffer storage technologies as an approach to solve this problem.
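The cost argument is easy to see with some back-of-the-envelope arithmetic. The sketch below counts how many devices are needed to absorb a given burst bandwidth. Every number in it is a hypothetical placeholder chosen for round figures, not a measurement from EMC or LANL, and the helper name `drives_needed` is made up for this example.

```python
import math

def drives_needed(burst_gb_per_s, per_drive_mb_per_s):
    """Devices required to sustain a burst, ignoring all overheads."""
    burst_mb_per_s = burst_gb_per_s * 1000
    return math.ceil(burst_mb_per_s / per_drive_mb_per_s)

# Hypothetical figures: a 1 TB/s burst, 100 MB/s per HDD, 500 MB/s per SSD.
hdd_count = drives_needed(burst_gb_per_s=1000, per_drive_mb_per_s=100)
ssd_count = drives_needed(burst_gb_per_s=1000, per_drive_mb_per_s=500)
print(hdd_count, ssd_count)  # 10000 2000
```

With these assumed numbers the spinning-disk tier has to be sized for bandwidth rather than capacity, which is where the cost starts to become prohibitive.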
A poster-size, high-level description of the EMC / LANL solution can be found here, and a similar presentation can be found here. This approach uses solid-state burst buffering, which ends up being both cheaper and faster than absorbing the burst with spinning disk alone.
Tiering continues to be a critical element of a data lake infrastructure. As we follow the path of data from hot edge -> warm core -> cold core, Milind Bhandarkar pointed out to me that Facebook is actually approaching absolute zero by building an ice-cold tier with Blu-ray:
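The general burst buffer idea can be sketched in a few lines: a fast, bounded buffer absorbs a burst immediately and then drains to the slower tier at whatever rate that tier can sustain. This is a toy model under assumed numbers and is not the EMC / LANL design; the class name `BurstBuffer`, its parameters, and the tick-based drain are all hypothetical.

```python
class BurstBuffer:
    """Toy model of a solid-state buffer in front of a slower tier."""

    def __init__(self, capacity_gb, drain_gb_per_tick):
        self.capacity_gb = capacity_gb              # size of the fast buffer
        self.drain_gb_per_tick = drain_gb_per_tick  # what the slow tier can absorb per tick
        self.buffered_gb = 0
        self.drained_gb = 0

    def write(self, size_gb):
        """Absorb a burst; return the number of GB that did not fit."""
        accepted = min(size_gb, self.capacity_gb - self.buffered_gb)
        self.buffered_gb += accepted
        return size_gb - accepted

    def tick(self):
        """Move one time step's worth of data down to the slower tier."""
        moved = min(self.buffered_gb, self.drain_gb_per_tick)
        self.buffered_gb -= moved
        self.drained_gb += moved
        return moved

# Hypothetical sizes: a 100 GB buffer draining 10 GB per tick.
bb = BurstBuffer(capacity_gb=100, drain_gb_per_tick=10)
overflow = bb.write(80)   # the whole burst fits, so the writer is released at once
ticks = 0
while bb.buffered_gb > 0:
    bb.tick()
    ticks += 1
print(overflow, ticks, bb.drained_gb)  # 0 8 80
```

The point of the model is that the writer only waits for the fast buffer; the slow tier catches up in the background and can be sized for average rather than peak bandwidth.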
Data Lake Surface Lures < Excellent article by @SteveTodd. But HDDs=warm tier. Cold tier=BluRay at FB http://t.co/BvQ0wG4oPI
— Milind Bhandarkar (@techmilind) March 1, 2014
Tiering is a second key point of discussion in the building of a 3rd platform architecture:
In my next post I'll discuss avoiding or minimizing the amount of HDD stored on-premises.
Steve
Twitter: @SteveTodd
EMC Fellow