I've been writing about the origins of RAID, observing the continual growth of CLARiiON, and commenting on the original software decisions that have fueled this growth.
As it is with many technologies, the word "RAID" originated in the academic community. It gained recognition with the Berkeley paper. How appropriate. Today's class is all about learning. Customers buying RAID solutions need to worry about the three Rs: readin', wRitin', and 'Rithmetic.
Now I'm sure you're saying: I understand the reading part, and I understand the writing part. RAID software at its essence is all about fast and protected reading and writing.
So what's this about arithmetic?
Time to go back to school.
Class, let's review the reading and the writing. Pretend you're at Berkeley. I like that idea because it was 18 degrees Fahrenheit when I woke up this morning.
The Berkeley paper taught that the speed of disk drives was not increasing as fast as the speed of the CPUs asking for the data. Disks are slower; they can't keep up. So here comes RAID. Gang a bunch of disk drives together and make them look like one BIG disk drive. How does this address the problem? Well, this BIG disk can handle simultaneous read and write requests across the whole gang, and therefore the CPUs asking for the data think: "not only is this disk BIG, but it's FAST".
So fundamentally, RAID's genesis is related to disk performance. I've been consistent in my posts about that point. If I were teaching this class at Berkeley I'd say 'class dismissed' so we could all go out and get a little sunshine. Then I'd announce that our next lecture would be entitled 'RAID: what's the catch?'. If it's actually sunny where you live, maybe you can stop reading at this point. If you happen to be my co-worker in St. Petersburg, Russia, keep reading!
RAID: What's The Catch?
The Berkeley paper went on to state that the catch is the increased chance of disk failure. Yes, you get speed, but as soon as a disk fails, you've lost some of your data. And depending on what's on that disk, you might lose all of your data. So the paper proposed adding parity information alongside the customer data. This parity information is updated on every write, and consulted on every failure.
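For the arithmetic-minded, here is a minimal sketch of the parity idea, assuming simple XOR parity across one stripe. The block contents and the helper name xor_blocks are made up for illustration; this is not any product's code. It shows both halves of the sentence above: parity consulted on a failure, and parity updated on a write.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per disk, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Consulted on failure: the disk holding data[1] dies, so rebuild its
# block from the surviving data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Updated on write: new parity = old parity XOR old data XOR new data,
# so the other data disks do not need to be read.
new_block = b"ZZZZ"
new_parity = xor_blocks([parity, data[1], new_block])
data[1] = new_block
assert new_parity == xor_blocks(data)
```

Note that the write path touches only the block being changed and the parity block, which is part of the answer to the performance question.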
So now we've reached the second purpose of RAID: data integrity in the face of failures.
You might be asking, doesn't the update of parity information on write requests violate the first purpose of RAID (performance)? Hold that thought for a different lecture, because there's a good answer. I haven't made my point about math yet!
Class Dismissed
What do you mean class dismissed? We've talked about Readin', we've talked about wRitin', but we haven't talked about 'Rithmetic! Well, RAID 101 is over. We need to move to a different campus.
You can't ship academic papers to customers. You've got to make some choices about implementation. And to be true to RAID, the implementation must be (1) fast, and (2) correct (data integrity).
When faced with how to write the CLARiiON RAID algorithms, we made the decision to base our algorithms on math. Why? Because (1) math is fast, and (2) math is correct.
Math Class
So we've moved our lecture from the sunny Berkeley campus to the wintry streets of Hopkinton and Route 495. Sorry about that. Berkeley took it to one level, and companies had to make choices about what to do next. So let's begin.
RAID products have choices about how they lay out customer data onto disk spindles. They have choices about where they put the parity information. They have choices about how they ensure data integrity. It can be done in a variety of ways. But true RAID is fast and correct. And the fastest, most correct way, in my opinion, is to rely on mathematical lookup.
Mathematical Lookup is Fast
Customer asks for data. RAID algorithms receive the address where customer put said data. RAID algorithms plug address into mathematical algorithm to determine actual location. RAID algorithms proceed with operation. This is fast. CPUs can do this faster than looking on disk to determine data location. If the spirit of RAID is performance, mathematical lookup is the best way to get there.
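To make "plug address into mathematical algorithm" concrete, here is a rough sketch under assumptions of my choosing: a rotating-parity layout, a stripe unit of 128 blocks, and a hypothetical function named locate. It is not the actual CLARiiON layout, just an example of a lookup that needs no disk access and no table.

```python
def locate(lba, num_disks, stripe_unit=128):
    """Map a logical block address to (data_disk, disk_block, parity_disk).

    Assumed layout: one parity unit per stripe, rotating across disks.
    """
    data_disks = num_disks - 1                      # one unit per stripe holds parity
    unit_index, offset = divmod(lba, stripe_unit)   # which stripe unit, offset inside it
    stripe, pos = divmod(unit_index, data_disks)    # which stripe, position among data units
    parity_disk = stripe % num_disks                # parity rotates stripe by stripe
    data_disk = pos if pos < parity_disk else pos + 1   # skip over the parity disk
    disk_block = stripe * stripe_unit + offset      # block address on that disk
    return data_disk, disk_block, parity_disk

print(locate(0, 5))     # (1, 0, 0)
print(locate(600, 5))   # (0, 216, 1)
```

A few integer divisions give both the data location and the parity location, which is the whole point: the answer is computed, not fetched.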
Mathematical Lookup is Correct
My mathematical lookup is not going to make a mistake. There is not going to be a software bug, or a disk failure, that gives me the wrong answer. The math tells me exactly where the customer data is. And it also tells me exactly where my parity information is. This approach never returns to the application and says: "Sorry, I can't find your data", or "Sorry, there was a disk failure and I can't find the parity" (are these actual SCSI error codes?). It really is the safest way to go.
But there's more. I wrote in my last post that any true RAID solution needs DIBs, or data integrity bits. These DIBs are in addition to customer data and parity information. And if you can't find your DIBs (because you have to look up where they are), the integrity of your RAID solution is compromised.
So where are the CLARiiON DIBs? Well, like I told you in my last post, they're attached right to the end of every single block of customer (and parity) data. So if there's a failure, and I need to consult my DIBs (to ensure correctness), I mathematically know exactly where they are.
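Here is a small sketch of what "mathematically know exactly where they are" can look like. The sizes (512 bytes of data followed by 8 bytes of integrity information) and the CRC32 check are assumptions for illustration only; the names dib_offset, seal, and verify are hypothetical, and the real DIB contents are not described here.

```python
import zlib

DATA_BYTES = 512                        # assumed size of the data portion
DIB_BYTES = 8                           # assumed size of the integrity bits
SECTOR_BYTES = DATA_BYTES + DIB_BYTES   # what actually lands on disk

def dib_offset(disk_block):
    """Byte offset on disk of the integrity bits for a given block."""
    return disk_block * SECTOR_BYTES + DATA_BYTES

def seal(data):
    """Append integrity bits (a CRC32 stand-in, zero padded) to a block."""
    if len(data) != DATA_BYTES:
        raise ValueError("block must be exactly DATA_BYTES long")
    check = zlib.crc32(data).to_bytes(4, "big")
    return data + check.ljust(DIB_BYTES, b"\x00")

def verify(sector):
    """Recompute the check over the data and compare with the stored bits."""
    data, dib = sector[:DATA_BYTES], sector[DATA_BYTES:]
    return dib[:4] == zlib.crc32(data).to_bytes(4, "big")

sector = seal(bytes(DATA_BYTES))
assert verify(sector)
assert not verify(b"\x01" + sector[1:])   # a corrupted byte is detected
```

Because the integrity bits travel with the block, finding them is one multiplication and one addition, with no separate lookup that could itself fail.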
This mathematical decision, made in the late 80s, has resulted in a trusted and fast solution. In the IT world, CLARiiON is a brand name that translates into fast and trusted. And I believe math is a big reason why.
Well, just wait one second...
This static mapping is pretty inflexible, isn't it? What happens if I want to extend the capacity of my RAID solution? And there are no contiguous areas on disk? What about other types of virtualizing technologies and techniques that might add value to a customer?
I'm OK with adding all that stuff, because yes, it does add value. But at the very, very bottom of your virtualizing stack is your RAID implementation. Make sure it uses MATH (and DIBs)! It's the fastest and most correct way to go.
The three Rs. Be educated about RAID.
Steve