In my last post I described how "global policies" are truly the Atmos special sauce.
These policies can control the initial placement of data, the migration of data, the compression and dedup of data, and so on.
How does Atmos implement these types of global policies? Answering this question requires a peek under the hood of how the Atmos software is architected. In this post I'll cover one of the more important data structures in Atmos, as well as a subset of the Atmos software components.
The data structure describing an Atmos policy is called an "LSO".
Policy Described as a Tree Structure
An Atmos LSO is a data structure which describes a customer policy. Consider a policy which synchronously mirrors all new content to Boston and New York, while asynchronously mirroring a copy to a "spin-down" location in Shanghai. An Atmos administrator creates a name for this policy (e.g. policy="Gold") and then a tree is built to describe the policy.
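To make the tree idea concrete, here is a minimal Python sketch of how the "Gold" policy might be represented as a tree. The node layout (`LsoNode`, its `kind` values, the `options` list) is entirely hypothetical. This post does not describe the actual LSO format, so treat this as an illustration of "policy as a tree", not the real structure.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass
class LsoNode:
    """Hypothetical policy-tree node; the real LSO layout is not described in the post."""
    kind: str                                          # assumed kinds: "root", "sync", "async", "replica"
    location: Optional[str] = None                     # geographic hint, set only on replica leaves
    options: List[str] = field(default_factory=list)   # e.g. ["spin-down"]
    children: List["LsoNode"] = field(default_factory=list)

def leaves(node: LsoNode) -> Iterator[Tuple[str, Optional[str], Tuple[str, ...]]]:
    """Yield (parent kind, location, options) for every leaf under `node`."""
    for child in node.children:
        if child.children:
            yield from leaves(child)
        else:
            yield (node.kind, child.location, tuple(child.options))

# The "Gold" policy: sync mirrors to Boston and New York, async copy to Shanghai with spin-down.
POLICIES = {
    "Gold": LsoNode("root", children=[
        LsoNode("sync", children=[LsoNode("replica", "Boston"), LsoNode("replica", "New York")]),
        LsoNode("async", children=[LsoNode("replica", "Shanghai", ["spin-down"])]),
    ]),
}

for mode, location, options in leaves(POLICIES["Gold"]):
    print(mode, location, options)
```

Walking the leaves recovers exactly the three placements the administrator described, each tagged with whether it is synchronous or asynchronous.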
This policy is stored on Atmos "metadata servers"; the software component responsible for this task is called MDS. Once the policy has been created, the administrator's next step is to specify the rules that trigger it. For example, there may be a specific user who wants content mirrored on the U.S. East Coast and (later) spun down in Shanghai. Let's call this rule "User = 'Joe'". Whenever "Joe" introduces content into Atmos, this policy should be applied. The administrator makes this association; the MDS stores it.
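The association between a rule and a named policy can be pictured as a small registry. The sketch below is hypothetical: `PolicyStore`, `bind`, and `match` are illustrative names rather than Atmos interfaces, and the "first matching rule wins" tie-break is an assumption the post does not address.

```python
from typing import Callable, Dict, List, Optional, Tuple

Metadata = Dict[str, str]
Rule = Callable[[Metadata], bool]

class PolicyStore:
    """Hypothetical stand-in for what the MDS keeps: named policies plus rule bindings."""

    def __init__(self) -> None:
        self.policies: Dict[str, dict] = {}
        self.bindings: List[Tuple[str, Rule, str]] = []  # (rule label, predicate, policy name)

    def create_policy(self, name: str, description: dict) -> None:
        self.policies[name] = description

    def bind(self, label: str, rule: Rule, policy_name: str) -> None:
        # Refuse to bind a rule to a policy that was never created.
        if policy_name not in self.policies:
            raise KeyError(f"unknown policy {policy_name!r}")
        self.bindings.append((label, rule, policy_name))

    def match(self, metadata: Metadata) -> Optional[str]:
        # Assumed tie-break: the first rule whose predicate accepts the metadata wins.
        for _label, rule, policy_name in self.bindings:
            if rule(metadata):
                return policy_name
        return None

mds = PolicyStore()
mds.create_policy("Gold", {"sync": ["Boston", "New York"], "async": ["Shanghai (spin-down)"]})
mds.bind("User = 'Joe'", lambda md: md.get("user") == "Joe", "Gold")
```

Modeling rules as predicates over metadata keeps the store agnostic about which metadata fields a rule inspects.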
Storing New Content
Let's take a look at how "Joe" then stores new content into Atmos. This requires the introduction of another software component: the Atmos Client. Atmos client software is a required component in the Atmos software architecture. If "Joe" is storing content to Atmos via Web Services, the Atmos client is invoked. If "Joe" is storing content via a file system interface, the Atmos client is invoked (think of a FUSE-type integration in this case).
When "Joe" creates content, the Atmos client contacts the MDS with a create request and sends along user metadata. This metadata typically includes user IDs, file names, timestamps, etc. It can also include additional rich metadata if "Joe" is using a web services interface. The MDS parses this metadata and finds a policy match: since the user's name is "Joe", policy="Gold" is the correct policy for Joe's data.
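A minimal sketch of the create request and the MDS-side match might look like the following. The request fields (`user_id`, `file_name`, `timestamp`, `user_metadata`) and the `RULES` table are hypothetical; the post only says the metadata "typically includes user ids, file names, timestamps, etc."

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class CreateRequest:
    """Hypothetical shape of the client's create request; field names are assumptions."""
    user_id: str
    file_name: str
    timestamp: float
    # Optional "rich" metadata, typically only available through the web services interface.
    user_metadata: Dict[str, str] = field(default_factory=dict)

# Rule bindings the administrator set up earlier: (metadata key, expected value) -> policy name.
RULES = [(("user", "Joe"), "Gold")]

def metadata_of(req: CreateRequest) -> Dict[str, str]:
    """Flatten the request into the key/value view the MDS matches rules against."""
    md = dict(req.user_metadata)
    # System-supplied fields are applied last so rich metadata cannot override them.
    md.update({"user": req.user_id, "filename": req.file_name, "timestamp": str(req.timestamp)})
    return md

def handle_create(req: CreateRequest) -> Optional[str]:
    """Return the name of the first policy whose rule matches, or None."""
    md = metadata_of(req)
    for (key, value), policy in RULES:
        if md.get(key) == value:
            return policy
    return None

print(handle_create(CreateRequest("Joe", "report.pdf", 0.0)))
```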
Populating the Location into an LSO
The LSO contains the geographic hints: New York, Boston, and Shanghai. At this point the MDS must locate the actual physical location to store the new data (e.g. there may be several Atmos storage servers in Shanghai, but only one supports spin-down). The MDS gets this information from a software component known as the Resource Management Server (RMS). The RMS knows "the state of the world" for any given Atmos deployment.
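The hint-to-server resolution can be sketched as follows, assuming the RMS exposes a simple inventory of servers with a site and a spin-down capability flag. `StorageServer`, `ResourceManager.resolve`, `populate`, and the server names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class StorageServer:
    name: str        # hypothetical server name
    site: str        # geographic location the LSO hint refers to
    spin_down: bool  # whether this server supports spin-down storage
    online: bool = True

class ResourceManager:
    """Hypothetical RMS stand-in holding the 'state of the world' for one deployment."""

    def __init__(self, servers: List[StorageServer]) -> None:
        self.servers = servers

    def resolve(self, site: str, need_spin_down: bool = False) -> StorageServer:
        """Pick the first online server at `site` that satisfies the spin-down requirement."""
        for s in self.servers:
            if s.site == site and s.online and (s.spin_down or not need_spin_down):
                return s
        raise LookupError(f"no suitable server at {site}")

def populate(hints: List[Tuple[str, str, bool]], rms: ResourceManager) -> List[Tuple[str, str]]:
    """Turn (mode, site, needs_spin_down) hints into (mode, server name) pairs."""
    return [(mode, rms.resolve(site, spin).name) for mode, site, spin in hints]

rms = ResourceManager([
    StorageServer("bos-1", "Boston", False),
    StorageServer("nyc-1", "New York", False),
    StorageServer("sha-1", "Shanghai", False),
    StorageServer("sha-2", "Shanghai", False),
    StorageServer("sha-3", "Shanghai", True),
])
gold_hints = [("sync", "Boston", False), ("sync", "New York", False), ("async", "Shanghai", True)]
print(populate(gold_hints, rms))
# [('sync', 'bos-1'), ('sync', 'nyc-1'), ('async', 'sha-3')]
```

Only `sha-3` qualifies for the Shanghai hint, mirroring the "several servers, only one supports spin-down" case above.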
The RMS provides the final piece of the puzzle. The MDS loads up the LSO with the physical location of the storage, and the LSO is returned to the Atmos client. Joe begins his write operation, and the client directly streams the content to the proper synchronous locations. When Joe is done, Atmos schedules the asynchronous piece (more on this in future posts). The overall flow between components is depicted below.
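The split between synchronous and asynchronous placement can be sketched with in-memory stand-ins. This is a simplification under stated assumptions: `STORES`, `ASYNC_QUEUE`, `write`, and `drain_async_queue` are hypothetical, and the way Atmos actually schedules the asynchronous copy is left for future posts.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical in-memory stand-ins for storage servers and the deferred-work queue.
STORES: Dict[str, Dict[str, bytes]] = {"bos-1": {}, "nyc-1": {}, "sha-3": {}}
ASYNC_QUEUE: Deque[Tuple[str, str, str]] = deque()  # (object id, source server, target server)

def write(object_id: str, data: bytes, lso: List[Tuple[str, str]]) -> None:
    """Write to every 'sync' target now; queue every 'async' target for later."""
    sync_targets = [srv for mode, srv in lso if mode == "sync"]
    async_targets = [srv for mode, srv in lso if mode == "async"]
    if async_targets and not sync_targets:
        raise ValueError("this sketch needs a synchronous copy to source async replicas from")
    for srv in sync_targets:
        STORES[srv][object_id] = data  # the client streams directly to storage
    for srv in async_targets:
        ASYNC_QUEUE.append((object_id, sync_targets[0], srv))

def drain_async_queue() -> None:
    """Later, after the write completes: copy each deferred object from its source replica."""
    while ASYNC_QUEUE:
        object_id, source, target = ASYNC_QUEUE.popleft()
        STORES[target][object_id] = STORES[source][object_id]

write("joe/report.pdf", b"hello", [("sync", "bos-1"), ("sync", "nyc-1"), ("async", "sha-3")])
# At this point only bos-1 and nyc-1 hold the object; the Shanghai copy is still queued.
drain_async_queue()
```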
Questions?
Administrators can create fairly sophisticated and powerful policies that can be represented using an LSO. If this description leads to more questions, leave a comment here.
Steve
Steve,
How does the client communicate the policies to Atmos? I can imagine how with the webby dubby protocols like SOAP, but let's say I'm using NFS and I write a file. How do I say "you know that file I wrote? Replicate it to Shanghai!"?
Thanks,
Stephen
Posted by: Stephen Foskett | November 11, 2008 at 06:58 AM
Hi Steve -
Great post. So how does this per-object policy lookup step affect ingest performance?
Mike
Posted by: Mike | November 11, 2008 at 07:07 AM
Stephen,
The client doesn't communicate policies to Atmos. Using your NFS example, the NFS create gets bridged to a local Atmos client, which scrapes out relevant metadata (e.g. the NFS client's userid) and sends it to the MDS. The MDS sends the LSO back to the Atmos client, which caches it. Subsequent NFS writes are again bridged to the Atmos client, which examines the cached LSO and streams content straight to the proper location(s).
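A rough sketch of that create/write bridging, assuming one MDS round trip per create and a per-file LSO cache on the client. `StubMds`, `NfsBridge`, and the server names are hypothetical, and "streaming" is reduced to recording (server, path, chunk) tuples.

```python
from typing import Dict, List, Tuple

Lso = List[Tuple[str, str]]  # (mode, server) pairs, as returned by the MDS

class StubMds:
    """Hypothetical MDS stub: always answers with the 'Gold' placement and counts round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    def create(self, metadata: Dict[str, str]) -> Lso:
        self.round_trips += 1
        return [("sync", "bos-1"), ("sync", "nyc-1"), ("async", "sha-3")]

class NfsBridge:
    """Hypothetical bridge forwarding NFS create/write calls to a local Atmos client."""

    def __init__(self, mds: StubMds) -> None:
        self.mds = mds
        self.lso_cache: Dict[str, Lso] = {}
        # Records of (server, path, chunk) standing in for direct streams to storage.
        self.streamed: List[Tuple[str, str, bytes]] = []

    def nfs_create(self, path: str, uid: str) -> None:
        # Scrape metadata from the NFS request and make the single MDS round trip.
        self.lso_cache[path] = self.mds.create({"uid": uid, "path": path})

    def nfs_write(self, path: str, chunk: bytes) -> None:
        # No MDS involvement: consult the cached LSO and stream to the synchronous targets.
        for mode, server in self.lso_cache[path]:
            if mode == "sync":
                self.streamed.append((server, path, chunk))

mds = StubMds()
bridge = NfsBridge(mds)
bridge.nfs_create("/export/joe/report.pdf", uid="joe")
bridge.nfs_write("/export/joe/report.pdf", b"part 1")
bridge.nfs_write("/export/joe/report.pdf", b"part 2")
```

Two writes produce four streamed chunks (two per synchronous target) but only one MDS round trip.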
Steve
Posted by: Steve Todd | November 11, 2008 at 08:14 AM
Mike,
Atmos makes extensive use of policy caching. There is an initial performance hit on the policy lookup, and then data is streamed straight to the physical location(s).
Steve
Posted by: Steve Todd | November 11, 2008 at 08:16 AM
Steve,
So there are no policies applied to NFS-written data? Or just generic policies?
Say I write File X via NFS to Atmos. I wouldn't be able to say "send that file to Shanghai"?
Still confused...
Posted by: Stephen Foskett | November 11, 2008 at 07:12 PM
So in many ways (and I know it is more functionally rich than this), if I am accessing file systems using NFS/CIFS, it kind of works like Acopia? I can set policies on a user/filesystem basis? So if I have a user who I know is a VIP, I could set his files to replicate, etc.?
I have another question: if user Joe goes to Shanghai and accesses the file, does he access the spun-down version? If so, can the LSO work out that I need that content replicated back?
Posted by: Martin G | November 12, 2008 at 12:22 AM
Stephen,
As I read your question again I realize I didn't answer it fully.
Once you've written a file via NFS, and it's already been stored (based on policy), any given file can be "moved" somewhere else by creating a new policy specifically for that file. Atmos has a "Job Services" component which will automatically execute the task once the new policy appears.
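A minimal sketch of the reconciliation such a job might perform once a new per-file policy appears: compare where the file lives now with where the new policy wants it, copy to new locations first, then remove the old copies. `plan_jobs`, `on_new_policy`, and `PLACEMENT` are hypothetical names, and the copy-before-delete ordering is an assumption, not a documented Job Services behavior.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record of where each object currently lives.
PLACEMENT: Dict[str, Set[str]] = {"fileX": {"Boston", "New York"}}

def plan_jobs(current: Set[str], desired: Set[str]) -> List[Tuple[str, str]]:
    """Copy to newly required locations first, then delete copies no longer wanted."""
    copies = [("copy", loc) for loc in sorted(desired - current)]
    deletes = [("delete", loc) for loc in sorted(current - desired)]
    return copies + deletes

def on_new_policy(object_id: str, desired: Set[str]) -> List[Tuple[str, str]]:
    """Triggered when a per-file policy is created; returns the jobs to run."""
    jobs = plan_jobs(PLACEMENT[object_id], desired)
    # This sketch assumes every job succeeds and records the new placement immediately.
    PLACEMENT[object_id] = set(desired)
    return jobs

print(on_new_policy("fileX", {"Shanghai"}))
```

Copying before deleting means the file never drops below one stored copy while the move is in progress.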
Steve
Posted by: Steve Todd | November 12, 2008 at 07:08 AM
Hey Martin,
Yes, if I know the user is a VIP, I can set his files to replicate.
Your question about "access from Shanghai" is a good one, and raises the question: "How do reads work?" That answer would take a few paragraphs (at least), so I think a future post would be more appropriate.
Steve
Posted by: Steve Todd | November 12, 2008 at 07:16 AM