I have been blogging my way through the theme that application workloads drive innovation into the underlying high-tech infrastructure. In my previous two posts I gave evidence that the disk array industry was itself launched by the evolution of application workloads. Both posts focused on the performance aspect of application workloads (the upper portion of the Y-axis displayed below).
For this post I'd like to highlight that innovation in high-tech is also driven by the service level needs of applications (the X-axis).
In my first post I mentioned that the Y-axis is a bit confusing. The units at the top (I/O performance) don't match the units on the bottom (storage capacity). In a similar way the X-axis is a bit vague. Some would argue that the X-axis represents increasingly strict levels of governance, risk, and compliance (GRC). Others would say that it represents the requirements on availability of IT infrastructure services (e.g. five nines, or downtime of less than six minutes per year).
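As a quick check on the "five nines" figure, here is a minimal sketch of the arithmetic that turns an availability percentage into allowed downtime per year. The function name and the 365.25-day year are my own assumptions for illustration.

```python
# Minutes in an average year (assumes 365.25 days to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime budget, in minutes per year, for an availability in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Print the budget for two, three, four, and five nines.
    for nines in range(2, 6):
        availability = 1.0 - 10.0 ** -nines
        minutes = downtime_minutes_per_year(availability)
        print(f"{availability:.5%} available: {minutes:.2f} minutes of downtime per year")
```

Five nines works out to a little over five minutes a year, which is consistent with the "less than six minutes" figure quoted above.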
At a high level, as the X-axis extends to the right, the application places increased criticality on protecting and accessing the correct application information. As it extends to the left, attributes such as correctness and availability become less critical to the application.
Historically, these service level requirements have driven innovation in the same way that performance requirements have. To demonstrate this point, I'd like to pick up where I left off in the last post.
The First Disk Array Innovators
Symmetrix innovated with the introduction of a cached disk array. The resulting product was extremely popular, causing a "gravitational pull" of data onto more and more Symmetrix systems.
In a similar way, CLARiiON innovated with the longest-lasting implementation of RAID5, and likewise customers began storing increasing amounts of critical data on the product.
Over time, users of both products realized that the lifeblood of their business was information, and they began to place very high service-level demands on their information infrastructure. Their disk array cabinets were holding large numbers of disk drives, thus increasing the likelihood that one (or more) of those disks would fail.
The two product teams responded to these evolving application workload service level requirements in very different ways.
TimeFinder and SRDF
In the 1990s the growing number of disk drives on the customer's premises increased the odds of a double disk failure, which could result in lost customer data. Standard practice at the time was tape backup and restore.
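To see why more drives mean more risk, here is a rough sketch of the probability that at least one drive in a cabinet fails within a year. It assumes drives fail independently and uses a made-up 3% annual failure rate; neither assumption comes from real Symmetrix or CLARiiON data.

```python
def prob_at_least_one_failure(n_drives: int, annual_failure_rate: float) -> float:
    """Probability that one or more of n_drives fails in a year.

    Assumes each drive fails independently with the same probability.
    """
    # P(no drive fails) = (1 - p)^n, so P(at least one fails) is its complement.
    return 1.0 - (1.0 - annual_failure_rate) ** n_drives


if __name__ == "__main__":
    HYPOTHETICAL_AFR = 0.03  # illustrative value only, not a measured rate
    for n in (1, 10, 50, 100):
        p = prob_at_least_one_failure(n, HYPOTHETICAL_AFR)
        print(f"{n:>3} drives: {p:.1%} chance of at least one failure per year")
```

Under these assumptions the chance of seeing a failure climbs quickly as the drive count grows, which is the pressure both product teams were responding to.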
Some customers, however, wanted a higher service level in terms of the amount of time it took to get back up and running with a restored copy of the data. This workload requirement caused the Symmetrix engineers to innovate by creating local copies of the data inside the Symmetrix itself. These copies, known as BCVs (business continuance volumes), were part of a product offering known as TimeFinder and could be used as a much faster backup and restore mechanism. Over the years TimeFinder has blazed a continuous trail of innovation and solved a number of other application service level use cases as well. TimeFinder copies have also been used for disaster recovery testing and validation, point-in-time recovery, database consistency checks, and test/dev of offline data.
This innovation is often referred to as Snap Copy (or Snapshot), and is depicted below.
In addition to making local service level copies of data inside the array, the SRDF innovation (Symmetrix Remote Data Facility) allowed customers to make remote copies as well. This innovation provided a high service level to application workloads that often had to enable the entire business to be up and running at a remote site in the face of an outage at the primary site.
Clearly, application workload service level requirements drove innovation in the case of the Symmetrix technology. The SRDF innovation is depicted below.
In the mid-range, service levels drove the CLARiiON product to accelerate the implementation of the industry's first mirrored write cache.
Restore as a Workload
Often during the workload discussion we think of applications such as financial trading, medical imaging, and database applications. Each of these represents a workload with distinct characteristics in terms of I/O rates, bandwidth/throughput, and service levels.
What is often left unstated is that the backup/restore process is a workload unto itself. And in the case of CLARiiON, the "restore" workload ended up accelerating the deployment of the industry's first mirrored write cache.
The first deployment of CLARiiON came with a warning: "don't use RAID5 for write-intensive workloads". The sweet spot for RAID5 applications would be a 70/30 read/write ratio (or anything with a higher read ratio than that). Unfortunately, for customers restoring data to a RAID5 configuration, the restore application itself represented a single-threaded 100% write workload! Restore would take hours upon hours to complete given the latencies involved with writing to CLARiiON's earliest version of RAID5.
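The gap between a 70/30 workload and a 100% write restore can be sketched with the textbook RAID5 small-write penalty, where an uncached write costs four disk I/Os (read old data, read old parity, write new data, write new parity). This is a generic model and not a measurement of CLARiiON; the function name and the 1,000 back-end IOPS figure are assumptions for illustration.

```python
RAID5_WRITE_PENALTY = 4  # textbook read-modify-write cost, assuming no write cache


def host_iops(backend_iops: float, read_fraction: float,
              write_penalty: int = RAID5_WRITE_PENALTY) -> float:
    """Estimate host-visible IOPS for a given read/write mix.

    Each host read costs one back-end I/O; each host write costs
    write_penalty back-end I/Os.
    """
    write_fraction = 1.0 - read_fraction
    backend_cost_per_host_io = read_fraction + write_fraction * write_penalty
    return backend_iops / backend_cost_per_host_io


if __name__ == "__main__":
    BACKEND_IOPS = 1000.0  # hypothetical capability of the disks in the RAID group
    for label, reads in (("100% read", 1.0), ("70/30 read/write", 0.7), ("100% write (restore)", 0.0)):
        print(f"{label:<22} {host_iops(BACKEND_IOPS, reads):7.1f} host IOPS")
```

In this model the 70/30 mix delivers roughly half of the back-end capability to the host, while a pure write stream such as a restore gets only a quarter of it.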
CLARiiON's roadmap always planned to have a caching mechanism (similar to what Symmetrix had introduced years before). Adding the capability, however, was no small feat. In order to add write caching functionality, the engineering team would have to add battery-backup capability, a vaulting area to de-stage the cache in the case of a power failure, and an industry-first mirroring of cached data to a separate processor.
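The three pieces mentioned above (mirroring to a second processor, a vault area, and de-staging) can be sketched as a toy model. This is only an illustration of the general idea and not CLARiiON's actual design; every class, method, and attribute name here is hypothetical.

```python
class MirroredWriteCache:
    """Toy model of a write cache mirrored across two storage processors."""

    def __init__(self) -> None:
        self.primary = {}  # dirty blocks cached on the owning processor
        self.mirror = {}   # copy of the same blocks on the peer processor
        self.vault = {}    # reserved disk area written while on battery power
        self.disk = {}     # the RAID group that ultimately holds the data

    def write(self, lba: int, data: bytes) -> str:
        # Acknowledge the host only after both cache copies exist,
        # so a single processor failure cannot lose the write.
        self.primary[lba] = data
        self.mirror[lba] = data
        return "ack"

    def destage(self) -> None:
        # Flush dirty blocks to the RAID group in the background.
        self.disk.update(self.primary)
        self.primary.clear()
        self.mirror.clear()

    def on_power_failure(self) -> None:
        # Battery backup keeps the cache alive long enough to dump it to the vault.
        self.vault.update(self.primary)

    def on_primary_processor_failure(self) -> None:
        # The surviving processor takes over using its mirrored copy.
        self.primary = dict(self.mirror)


if __name__ == "__main__":
    cache = MirroredWriteCache()
    cache.write(42, b"restored block")
    cache.primary.clear()                 # simulate losing the owning processor's memory
    cache.on_primary_processor_failure()  # recover from the mirror
    cache.destage()
    print(cache.disk)
```

The point of the sketch is that the host sees a fast acknowledgement from memory, while the mirror and vault protect the data until it reaches disk.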
This historical use case is another example of how service level workloads drive innovation. Over time, many disk arrays would possess innovative features such as RAID5 and mirrored caching, but the earliest versions were driven by new workloads.
In my next post I will introduce how customers began running diverse workloads against the storage infrastructure, which in turn introduced new advances into the industry.
Steve
http://stevetodd.typepad.com
Twitter: @SteveTodd
EMC Fellow