After nearly fifteen years of building block storage and SAN technologies, I joined the Centera organization. Talk about a different technology.
Talk about a breath of fresh air.
When I get asked for my opinion about the future of Centera, or of CAS, or of XAM, I use one word: bullish.
I've seen how it's built, so I know what it can do. And I don't think the technology has even come close to hitting its stride.
Here's why.
Four Differences
I build software. When I transferred to Centera, I wanted to know how CAS was built. So I literally spent my first two months reading the internals of the Centera software. At the end of the process, I was pretty impressed with what I had learned. The Centera team had tried a number of things that I had never even dreamed of trying in the SAN world. Here are some of the differences that stood out.
#1: Unmodified Trust
When I worked on the PowerPath framework, I wrote device drivers that passed data between an application and a storage system. Picture an application handing a buffer to a device driver and saying: "read this data, and guarantee me that it's exactly the same as the first time I ever wrote it". If somebody had asked me to write such a driver, I wouldn't have known where to begin. I had never even considered creating such a thing.
Centera does this on every read request. The application asks: "read this data". The Centera driver (the Centera SDK) fetches the data from the storage system, puts it into the application buffer, and says: "here it is, exactly the way you gave it to me the first time you ever wrote it. Guaranteed".
How is this possible? The Centera SDK examines every single byte that comes from the application and calculates a hash of the content when it is first written. When the application later requests that content, the returned bytes are hashed again and compared against the original hash, so any change to the bytes shows up as a mismatch. If the application modifies content and writes it again, the hash changes and a new piece of content is created. But the original content remains untouched.
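To make the read-side guarantee concrete, here is a minimal sketch of content addressing in Python. It is not the Centera SDK: the `ContentStore` class and its methods are hypothetical, and SHA-256 is an assumption chosen for illustration, not a statement about which hash Centera uses. The address returned by a write is the hash of the content, and every read re-hashes the stored bytes before handing them back.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content-addressed store; not the Centera SDK."""

    def __init__(self):
        self._blobs = {}  # address (hex digest) -> stored bytes

    @staticmethod
    def address_of(data: bytes) -> str:
        # Hash every byte the application hands over; the digest is the address.
        # SHA-256 is an assumption of this sketch.
        return hashlib.sha256(data).hexdigest()

    def write(self, data: bytes) -> str:
        address = self.address_of(data)
        # Same bytes -> same address; changed bytes -> a new address,
        # so an earlier version is never overwritten in place.
        self._blobs.setdefault(address, bytes(data))
        return address

    def read(self, address: str) -> bytes:
        data = self._blobs[address]
        # Recompute the hash over the bytes being returned and compare it
        # with the address; a mismatch means the content has changed.
        if self.address_of(data) != address:
            raise ValueError(f"content {address[:12]}... failed integrity check")
        return data


if __name__ == "__main__":
    store = ContentStore()
    v1 = store.write(b"xray scan, version 1")
    v2 = store.write(b"xray scan, version 2")  # an "overwrite" creates new content
    assert v1 != v2
    assert store.read(v1) == b"xray scan, version 1"  # the original is untouched
```

Because the address is derived from the bytes themselves, "overwriting" can only ever produce a second address; the first one keeps pointing at the original content.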
Again, I had not seen this in the block world, nor in any file system.
#2: Forced Metadata
Imagine a SAN where every piece of data written must be accompanied by metadata. Imagine a file system where every single write operation had to store additional application metadata. Applications aren't accustomed to using these technologies in such a way.
With Centera, however, it's a must. You cannot write data to a Centera without writing metadata alongside it. The data and metadata are intertwined by design. Consider an application that writes an XRAY to a Centera and then terminates without sending the XRAY's metadata. Guess what? You can't get that XRAY back. The write of the XRAY is not "acknowledged" until the metadata has been sent.
The Centera SDK creates a certain amount of metadata by default. If the application asks it to, the SDK will also automatically create application-specific metadata. On top of this, the application can add its own custom metadata for each piece of content (for example, patient="Steve Todd").
But at the end of the day, the application must commit the metadata to the Centera and receive an acknowledgement. Only then is the write considered "committed". Quite different, isn't it? The power of this approach is a topic for another post.
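Here is a sketch of that commit rule, under loudly stated assumptions: `Store`, `PendingWrite`, the default `size` field, and the JSON descriptor are all hypothetical and only loosely echo the idea that content and metadata are committed together; they are not Centera's internal format or SDK calls. Nothing is reachable until `commit()` returns an address, which plays the role of the acknowledgement, so an application that quits before committing leaves nothing it can read back.

```python
import hashlib
import json


class Store:
    """Hypothetical store where content is only reachable via a committed metadata record."""

    def __init__(self):
        self._records = {}  # address -> {"metadata": {...}, "content": bytes}

    def begin_write(self, content: bytes) -> "PendingWrite":
        return PendingWrite(self, content)

    def read(self, address: str) -> dict:
        # Only committed records have an address, so uncommitted content can't be read back.
        return self._records[address]

    def __len__(self):
        return len(self._records)


class PendingWrite:
    def __init__(self, store: Store, content: bytes):
        self._store = store
        self._content = bytes(content)
        # Default metadata attached without the application asking (hypothetical field).
        self._metadata = {"size": len(self._content)}

    def add_metadata(self, **fields):
        # Application-specific metadata, e.g. patient="Steve Todd".
        self._metadata.update(fields)

    def commit(self) -> str:
        # Bind metadata and content together: the address is the hash of a
        # descriptor holding the metadata plus a digest of the content.
        descriptor = json.dumps(
            {"metadata": self._metadata,
             "content_sha256": hashlib.sha256(self._content).hexdigest()},
            sort_keys=True,
        )
        address = hashlib.sha256(descriptor.encode("utf-8")).hexdigest()
        self._store._records[address] = {"metadata": dict(self._metadata),
                                         "content": self._content}
        return address  # the "acknowledgement" the application must wait for
```

In this design the content and its metadata share one address, so there is no way to hold on to one without the other.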
#3: Nothing to Do
Centera is a storage system. I've configured plenty of storage systems in my time. Unpack, cable, power-on, make first contact. Once network connectivity has been established, it's time to configure. Select disks. Select stripe size. Create LUNs. Assign to servers. Partition cache. Most storage system management software has automated many if not all of these steps.
When I first worked with Centera, and it came time to configure... well, there was nothing to configure! Once it is up on the network, the application simply opens the IP address and starts writing content (followed closely by metadata!). That's unique. Now, there are some configuration parameters that can be set, of course. Which brings me to another feature that I had not thought possible before.
#4: Application Zoning
Centera calls them virtual pools. It's kind of like LUN masking in a SAN. Instead of granting data accessibility on a hardware basis (e.g. WWNs or switch ports), Centera grants accessibility on a logical application basis. So if I have an application that writes XRAYs to a Centera, and another application writing faxed mortgage agreements to the same Centera, the XRAYs and mortgage agreements live in different logical pools, and neither application can see the other's content.
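Application zoning can be sketched the same way. In this hypothetical `PooledStore` (not Centera's virtual pool implementation or its administration interface), access is granted to a named application profile rather than to a WWN or switch port, and each profile sees only the pool it was granted. The profile and pool names are illustrative assumptions.

```python
import hashlib


class AccessDenied(Exception):
    pass


class PooledStore:
    """Hypothetical store that scopes content to logical pools by application profile."""

    def __init__(self):
        self._pools = {}     # pool name -> {address: bytes}
        self._profiles = {}  # application profile -> pool name

    def grant(self, profile: str, pool: str):
        # Access is granted per application, not per WWN or switch port.
        self._profiles[profile] = pool
        self._pools.setdefault(pool, {})

    def _pool_for(self, profile: str) -> dict:
        try:
            return self._pools[self._profiles[profile]]
        except KeyError:
            raise AccessDenied(f"no pool granted to {profile!r}") from None

    def write(self, profile: str, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._pool_for(profile)[address] = bytes(data)
        return address

    def read(self, profile: str, address: str) -> bytes:
        pool = self._pool_for(profile)
        if address not in pool:
            # Content in another application's pool is simply invisible here.
            raise AccessDenied(f"{address[:12]}... not visible to {profile!r}")
        return pool[address]

    def list_addresses(self, profile: str) -> list:
        # Enumerate only the content in this profile's own pool.
        return sorted(self._pool_for(profile))
```

For example, granting `"xray-app"` the pool `"medical"` and `"mortgage-app"` the pool `"finance"` lets both write to the same store while each one's reads and listings stay confined to its own pool.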
This particular feature highlights an aspect of Centera which is starting to become obvious: Centera is a storage box that's in tight with the application. It can automatically insert metadata on behalf of the application AND it can identify an application's content.
The BIG Difference
I mentioned earlier in this post that I'm bullish on this technology. Yes, there's a stock market analogy here, but there's also a "that bull is charging" analogy.
As a software engineer I've highlighted some unique behaviors that I have not seen built before. And I haven't even mentioned retention, or retention classes, or system query, or profile clips, or any of the other very cool features that are available for application use. And it's only a matter of time before more applications start figuring out some unique ways to use these novel concepts. Hundreds upon hundreds of applications have already integrated with the Centera SDK.
This is the "Centera hits its stride" part of my post. There's one final BIG DIFFERENCE that I have noticed about Centera. It's called XAM. XAM is an industry-wide standard API for writing content and metadata. And it removes one big obstacle that I believe has hindered Centera adoption: the view that the Centera SDK is EMC proprietary.
Take a look at the features and capabilities I've described. There was no available API that had that kind of richness, so EMC had to create its own. And as Centera adoption grew, EMC was quick on the trigger to standardize.
Why This is a Game-Breaker
I haven't even started describing the customer benefits of this technology. I've simply pointed out how it's different. I hope to write more about it soon. But many customers already know. And many customers are about to find out...
To put it plainly, there's been a hesitancy on the part of some customers to purchase Centera because of the proprietary nature of the Centera API. XAM removes this concern. Customers can now take a look at the richness of this technology without fear of vendor lock-in. More applications can figure out new uses for unmodified trust, application zoning, and forced metadata. More administrators can reap the benefits of simplified storage administration.
Keep an eye on Centera. And don't wear red.
Steve
Thanks for this interesting post. I think the major difference between Centera and other EMC storage solutions is that Centera is implemented at the application level and is not tied to specific hardware. That means that in theory you could take the Centera code and the XAM API and run them on any commodity hardware. I am curious whether EMC is planning to offer Centera as an independent software package not bundled with hardware.
Centera software sounds similar to the Apache Hadoop file system and API (http://hadoop.apache.org/), which is an amazing open-source project, and to the Google file system ( http://labs.google.com/papers/gfs.html ).
It is really nice to see that EMC is doing an amazing job in the field of distributed object-oriented file systems, because that is the future of storage.
Posted by: Eugene Gorelik | February 28, 2008 at 08:03 AM