ChrisAn's Blog

simplegeek

a.k.a. Chris Anderson

Do you really have the memory?

While reliability issues come in many flavors (reentrancy, exception handling, deadlocks, etc.), the most popular reliability issue to debate right now (at least inside of Microsoft) is out of memory, aka OOM.

In .NET 1.0 and 1.1 the CLR wasn't hardened to OOM. Basically, if you hit a hard out of memory failure (the page file can't grow and running the GC doesn't free anything), the CLR falls over. For Whidbey, the CLR execution engine and a small percentage of mscorlib have been "hardened" to OOM.

Ah, first, define harden.

One of the architects on the Indigo team had a great definition (I probably can't reproduce it verbatim, but I'll try): Hardening means tolerating a fault and leaving the component in a consistent state. That consistent state might be "unavailable", but the component is never left in a corrupted state.

In the overly simple example I did the other day, it was easy to make the component hardened to any failures.
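To make that definition concrete, here is a rough sketch (not the example from the earlier post). It is written in Java purely as a stand-in for a managed language, and every name in it (`HardenedLedger`, `transfer`, `faultPoint`) is made up for illustration. The idea, under those assumptions, is to do all the risky work on a copy and commit with a single reference assignment, so a fault halfway through can't leave the component half-updated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical component; all names are invented for this illustration.
class HardenedLedger {
    // Replaced wholesale on every change, never mutated in place.
    private Map<String, Long> balances = new HashMap<>();

    void open(String account, long amount) {
        Map<String, Long> next = new HashMap<>(balances);
        next.put(account, amount);
        balances = next;
    }

    long balance(String account) {
        return balances.get(account);
    }

    long total() {
        long sum = 0;
        for (long amount : balances.values()) {
            sum += amount;
        }
        return sum;
    }

    // faultPoint stands in for "anything that can fail halfway through",
    // for example an allocation failure between the two updates.
    void transfer(String from, String to, long amount, Runnable faultPoint) {
        Map<String, Long> next = new HashMap<>(balances); // work on a copy
        next.put(from, next.get(from) - amount);
        faultPoint.run();          // a fault here only damages the copy
        next.put(to, next.get(to) + amount);
        balances = next;           // commit: one reference assignment
    }
}
```

If `faultPoint` throws (say, a simulated `OutOfMemoryError`), the caller sees the failure but the ledger still holds the old, consistent balances. A cruder alternative would be to flip the component to "unavailable" on any fault; that is also consistent, just less useful.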

Ah, next define failures.

Here is a great one - how should your component behave if the power cable is unplugged? Power loss is a failure. Hard disk crashing is a failure. Running out of memory is a failure. You see the pattern. There are extremely reliable systems out there. I heard anecdotally about a banking system with a reliability policy that if a nuclear bomb went off in one city, pending transactions would be automatically rerouted to another city. That's a pretty high bar.

So, back to our little out of memory problem. Because of the dynamic nature of the CLR - virtual method calls, dynamic JIT, boxing, etc. - it is extremely hard to write code that is guaranteed to never require an allocation. In unmanaged code this tends to be easier (not easy, but easier) because allocations are always explicit and never asynchronous. With that, hitting a hard memory failure (you can't allocate a single byte) means that you basically can't continue to run managed code.
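Here is a small illustration of the "hidden allocation" point. Again this is Java used as a stand-in managed runtime, not CLR code, and the details differ between the two; the class and method names are hypothetical. Apart from the one list, none of this code explicitly asks for memory, yet several lines can allocate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical examples of code that allocates without saying so.
class HiddenAllocations {
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>();   // the only explicit allocation
        for (int i = 0; i < n; i++) {
            values.add(i + 1000);   // boxing: int becomes an Integer object
        }                           // (the list may also grow its backing array)
        long total = 0;
        for (Integer v : values) {  // the for-each loop obtains an iterator
            total += v;             // unboxing back to int
        }
        return total;
    }

    static String describe(int count) {
        return "count=" + count;    // concatenation builds a new String
    }

    // Boxing the same int twice: small values (-128..127) come from a
    // shared cache, larger ones are usually two separate heap objects.
    static boolean boxesAreDistinct(int value) {
        Object first = value;
        Object second = value;
        return first != second;
    }
}
```

Any of those implicit allocations can be the one that fails, which is why "just don't allocate on the failure path" is so much harder to promise in managed code than it sounds.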

Of course, nothing is ever so simple. In Whidbey the notion of "constrained execution regions" (CER) was introduced along with reliability policy, which allows for writing managed code that is in fact hardened to hard out of memory failures. But writing that type of code is truly rocket science.

So what is a component to do?

Caveat: I'm talking here just about the problems in this space... there is no need to panic.

06/12/2004 9:22 PM | #Software

Content © 2003 Chris Anderson
