One of the original goals of Avalon was to leverage the power of the GPU. The idea
of hardware accelerated 2D graphics has been around for a long time, but it hadn't
really taken off yet (remember, we started Avalon in early 2001 [fixed date]). Today
it seems pretty obvious that leveraging the GPU for 2D is the right path, but at the
time that we started there was a lot of concern about the type of hardware that we
would require.
One hugely controversial decision that we made early on was to write Avalon in managed
code. As a platform component, the CLR is still immature, but it's getting better by
leaps and bounds. The work to integrate the CLR into Yukon, the move of the entire system
to 64-bit, and the work in .NET 1.1 and Whidbey have made the system amazing. Again,
when we started, .NET had only just been released in beta. Also, the development teams
working on Avalon were used to the C and C++ toolset that they had been working with
for the past twenty years. The decision to move to managed code was a fun process,
but I'll leave that for another day.
The DOTNET tree was not measuring up to performance goals. The startup time was slow,
but more troubling was the scalability of the system: the number of elements
that we could create and manipulate was just too small. To achieve our goals around
richness we would need a system that could scale to tens of thousands of elements,
and we could only display a couple thousand elements with reasonable performance.
Layout was slow and working set was huge. It was a bit scary.
Before we began the rearchitecture we spent a bunch of time focusing the entire team
on performance. We started doing deep analysis and looking at how the system hung
together. We created two virtual teams to try and drive the performance effort -
a bottom-up team that focused on tuning components and code, and a top-down
team that looked at the design of the system to see what we could do architecturally
to make the system faster. When we started the rearchitecture effort we took the data
from the bottom-up and top-down teams as baseline data.
We had several simplifications that we hoped would contribute to better performance:
- One tree
- Simplified property engine
- CPS/DPS
One tree
By unifying the tree, we felt that for UI scenarios we would see a huge performance
benefit. Specifically, you would only have a single node for an element (like a Rectangle)
whereas before you might have up to 3 objects. Also, the connections between the
elements would be much simpler: there was no need for back pointers from the
visual tree to the element tree to manage input, etc.
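To make the object-count argument concrete, here is a minimal Python sketch. It is not Avalon code: the class names (`LogicalNode`, `LayoutNode`, `VisualNode`, `UnifiedNode`) and the choice of three roles are assumptions made purely for illustration. It only shows how a split design multiplies allocations and back pointers per element, while a unified design keeps one object per element.

```python
# Illustrative sketch only: class names and the three roles are hypothetical,
# not actual Avalon types.

class LogicalNode:
    """Split design: the element the application programs against."""

    def __init__(self, name):
        self.name = name
        self.children = []
        # Each element drags along two companion objects.
        self.layout = LayoutNode(self)
        self.visual = VisualNode(self)

    def add(self, child):
        self.children.append(child)
        return child


class LayoutNode:
    def __init__(self, owner):
        self.owner = owner  # back pointer to the element tree


class VisualNode:
    def __init__(self, owner):
        self.owner = owner  # back pointer, e.g. for routing input to the element


class UnifiedNode:
    """Unified design: one object plays all roles, so no back pointers."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def walk(node):
    """Yield every node in the tree, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def object_count(root):
    """Count every object allocated for the tree rooted at `root`."""
    total = 0
    for node in walk(root):
        total += 1
        # Companion objects exist only in the split design.
        total += sum(1 for attr in ("layout", "visual") if hasattr(node, attr))
    return total


def build(node_type, child_count):
    """Build a root with `child_count` children of the given node type."""
    root = node_type("Canvas")
    for i in range(child_count):
        root.add(node_type(f"Rectangle{i}"))
    return root


split_total = object_count(build(LogicalNode, 3))
unified_total = object_count(build(UnifiedNode, 3))
print(split_total, unified_total)  # prints "12 4"
```

For a root with three children, the split version allocates 12 objects and the unified version allocates 4. The real saving would depend on what each object actually carried, which this sketch does not model.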
Simplified property engine
The original property engine from DOTNET was designed to support a very complex expression
evaluation system, track dependencies, deal with complex CSS-style rules, etc. Part
of the rearchitecture was the decision to cut several of these features to make the
property system much simpler, and therefore (hopefully) faster.
CPS/DPS
The DOTNET layout engine, using presenters, was a single monolithic system. That system
handled everything from a single absolute X,Y layout up to a complex paginated table.
The issue was that for all the simple UI scenarios we were doing a ton of work
for relatively little value. The solution was to create the "Control Presenter System"
that would handle basic UI layout, and a more advanced "Document Presenter System"
that would deal with the rich pagination and typographic layout.
By the time we finished the design for all of this we had actually completely removed
the concept of presenters from the system, but the name CPS/DPS stuck for the architectural
design. Funny how those things happen.
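As a rough illustration of why splitting simple layout from document layout can pay off, the Python sketch below contrasts the two kinds of work. Everything in it is hypothetical: `layout_absolute` and `layout_paginated` are invented names, and the pagination rule is the simplest one that could work, not how Avalon does it. The point is only that the simple path has no flow state to maintain, while the paginated path has to track how much of each page is used.

```python
# Illustrative sketch only: function names and data shapes are hypothetical.

def layout_absolute(items):
    """Simple UI path: every item already carries its own X,Y and size.

    `items` is a list of (x, y, width, height) tuples. No flow state is
    needed, so the work per item is constant.
    """
    return [{"x": x, "y": y, "width": w, "height": h} for (x, y, w, h) in items]


def layout_paginated(block_heights, page_height):
    """Document path: flow blocks top to bottom and break onto new pages.

    Returns a list of pages, each a list of (y_offset, height) tuples.
    """
    pages = [[]]
    used = 0
    for height in block_heights:
        if height > page_height:
            raise ValueError("block is taller than a page")
        # Start a new page when the block does not fit on the current one.
        if used + height > page_height and pages[-1]:
            pages.append([])
            used = 0
        pages[-1].append((used, height))
        used += height
    return pages


boxes = layout_absolute([(10, 20, 100, 50)])
pages = layout_paginated([40, 40, 40], page_height=100)
print(boxes)
print(pages)
```

With three 40-unit blocks and a 100-unit page, the first two blocks land on page one and the third starts page two. A button positioned at a fixed X,Y never needs any of that bookkeeping.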
As we rolled out the architecture proposal the performance aspects of the proposal
were probably the most controversial. No prototype can accurately represent how a
system will behave in the "wild", and our performance team had previously heard promises
of large changes helping performance. Unfortunately we were never able to get a 100%
accurate measurement of the perf gains, but we had to make a decision. As with a lot
of decisions, you never have all the information that you would like to have.
During the implementation and conversion to the new architecture we hit several perf
snags. We didn't get any huge working set improvements (we are still working on this
today); however, we did achieve our scalability goal. The system (in our tests, no warranties
implied, etc.) scales linearly as you add more elements. The largest I've done is
a couple hundred thousand elements... it was really slow, but it didn't crash.
Best performance bug: I forget the root cause, but when you ran a specific test
the working set of the app would hit around 1.2GB (yes, gigabyte!) before the app
would die.
We have made a couple other changes to the system since the big rearchitecture - the
new styling system is one example - and we continue to try and make the system faster.
The PDC bits are clearly way below what we want for performance.