ChrisAn's Blog

simplegeek

a.k.a. Chris Anderson

Avalon's rearchitecture - Performance

One of the original goals of Avalon was to leverage the power of the GPU. The idea of hardware-accelerated 2D graphics had been around for a long time, but it hadn't really taken off yet (remember, we started Avalon in early 2001 [fixed date]). Today it seems pretty obvious that leveraging the GPU for 2D is the right path, but at the time we started there was a lot of concern about the type of hardware that we would require.

One hugely controversial decision that we made early on was to write Avalon in managed code. As a platform component, the CLR is still immature, but it's getting better by leaps and bounds. The work to integrate the CLR into Yukon, the move of the entire system to 64-bit, and the work in .NET 1.1 and Whidbey have made the system amazing. Again, when we started, .NET had only just been released in beta. Also, the development teams working on Avalon were used to the C and C++ toolset that they had been working with for the past twenty years. The decision to move to managed code was a fun process, but I'll leave that for another day.

The DOTNET tree was not measuring up to our performance goals. Startup time was slow, but more troubling was the scalability of the system: the number of elements that we could create and manipulate was just too small. To achieve our goals around richness we would need a system that could scale to tens of thousands of elements, and we could only display a couple thousand elements with reasonable performance. Layout was slow, working set was huge, it was a bit scary.

Before we began the rearchitecture we spent a bunch of time focusing the entire team on performance. We started doing deep analysis and looking at how the system hung together. We created two virtual teams to try to drive the performance effort - a bottom-up team that focused on tuning components and code, and a top-down team that looked at the design of the system to see what we could do architecturally to make the system faster. When we started the rearchitecture effort we took the data from the bottom-up and top-down teams as baseline data.

We had several simplifications that we hoped would contribute to better performance:

  • One tree
  • Simplified property engine
  • CPS/DPS

One tree

By unifying the tree, we felt that for UI scenarios we would see a huge performance benefit. Specifically, you would only have a single node for an element (like a Rectangle), whereas before you might have up to 3 objects. Also, the connections between the elements would be much simpler: there was no need for back pointers from the visual tree to the element tree to manage input, etc.
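To make the object-count argument concrete, here is a minimal sketch in Python (not Avalon code). All class names are hypothetical, and the assumption that the three objects per item were an element, a layout object, and a visual is mine; the post only says "up to 3 objects" and mentions back pointers from the visual tree to the element tree.

```python
# Hypothetical illustration only: none of these names come from Avalon.

class SplitVisual:
    """Rendering-side object in the split design."""
    def __init__(self, owner):
        self.owner = owner  # back pointer to the element, e.g. to route input


class SplitLayout:
    """Layout-side object in the split design (assumed third object)."""
    def __init__(self, owner):
        self.owner = owner  # back pointer to the element


class SplitElement:
    """Element that drags two companion objects along with it."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.visual = SplitVisual(self)
        self.layout = SplitLayout(self)


class UnifiedNode:
    """Single node per item: no companions, no back pointers."""
    def __init__(self, name):
        self.name = name
        self.children = []


def count_split(root):
    # Each element costs three objects: itself, its visual, its layout object.
    return 3 + sum(count_split(child) for child in root.children)


def count_unified(root):
    # Each element costs exactly one object.
    return 1 + sum(count_unified(child) for child in root.children)


def build(node_type, fanout):
    """Build a root with `fanout` direct children of the given node type."""
    root = node_type("root")
    root.children = [node_type(f"rect{i}") for i in range(fanout)]
    return root
```

For a root with three rectangles, `count_split` reports 12 objects where `count_unified` reports 4. The ratio is the same at any size; what changes is how much of it the working set can absorb.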

Simplified property engine

The original property engine from DOTNET was designed to support a very complex expression evaluation system, track dependencies, deal with complex CSS-style rules, etc. Part of the rearchitecture was the decision to cut several of these features to make the property system much simpler, and therefore (hopefully) faster.

CPS/DPS

The DOTNET layout engine, using presenters, was a single monolithic system. That system handled everything from a single absolute X,Y layout up to a complex paginated table. The issue was that for all the simple UI scenarios we were doing a ton of work for relatively little value. The solution was to create the "Control Presenter System" that would handle basic UI layout, and a more advanced "Document Presenter System" that would deal with the rich pagination and typographic layout.

By the time we finished the design for all of this we had actually completely removed the concept of presenters from the system, but the name CPS/DPS stuck for the architectural design. Funny how those things happen.
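The idea behind the split can be sketched roughly as two separate code paths, so that simple UI never pays for pagination. This is a toy Python illustration under my own assumptions, not the actual CPS/DPS design; the function names and data shapes are hypothetical.

```python
# Hypothetical sketch: a cheap path for basic UI and a heavier path for documents.

def layout_simple(children):
    """Control-style path: every child already carries an absolute (x, y).

    There is nothing to flow or paginate, so the work is a single pass.
    """
    return [(child["name"], child["x"], child["y"]) for child in children]


def layout_paginated(line_heights, page_height):
    """Document-style path: flow lines down a page, breaking when it is full.

    Returns a list of pages, each page being a list of line indices.
    """
    pages = [[]]
    used = 0
    for index, height in enumerate(line_heights):
        # Start a new page if this line does not fit and the page is not empty.
        if used + height > page_height and pages[-1]:
            pages.append([])
            used = 0
        pages[-1].append(index)
        used += height
    return pages
```

A button placed at a fixed position goes through `layout_simple` and never touches the page-breaking logic. Only content that actually needs flowing text calls `layout_paginated`.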

As we rolled out the architecture proposal the performance aspects of the proposal were probably the most controversial. No prototype can accurately represent how a system will behave in the "wild", and our performance team had previously heard promises of large changes helping performance. Unfortunately we were never able to get a 100% accurate measurement of the perf gains, but we had to make a decision. As with a lot of decisions, you never have all the information that you would like to have.

During the implementation and conversion to the new architecture we hit several perf snags. We didn't get any huge working set improvements (we are still working on this today), however we did achieve our scalability goal. The system (in our tests, no warranties implied, etc.) scales linearly as you add more elements. The largest I've done is a couple hundred thousand elements... it was really slow, but it didn't crash.

Best performance bug: I forget the root cause, but when you ran a specific test the working set of the app would hit around 1.2GB (yes, gigabyte!) before the app would die.

We have made a couple other changes to the system since the big rearchitecture - the new styling system is one example - and we continue to try and make the system faster. The PDC bits are clearly way below what we want for performance.

11/21/2003 9:42 PM | #Longhorn

Content © 2003 Chris Anderson
