memory and managed code

By mikehearn

Many new programmers are confused by a simple fact of industry – that Java is clearly easier and less bug-prone than C++, yet most programs they interact with daily are written in C++.

Microsoft popularised the term “managed code” to refer to software that is … well … managed, i.e. at runtime there is another program present supervising and helping the main program to run. Usually this helper program is called a virtual machine. So, I’ll talk about managed code and use that term to mean Java, any .NET-targeted language, and also Python/Ruby.

It doesn’t take long for new programmers to realise why managed code has not taken over completely: simply running managed apps on your own desktop gives you a feel for it. Mark Russinovich articulated it well – they’re just SLOW and that means people hate using them, because running one or two drags their computer into the mud.

the many faces of speed

Programmers like speed and dislike crack. The jury is out on weed. Actually everyone likes speed but programmers like it most of all because without it their programs are uncompetitive. One reason Eclipse blew away NetBeans so fast is that it felt more responsive. One reason MS Word still blows away OpenOffice Writer is because the former doesn’t take 30 seconds to start.

Over the years a lot has been written about the performance characteristics of Java. A few things we know – on a modern VM, managed code can execute extremely quickly. There are many benchmarks out there showing Java outperforming C++. The Fedora team are renaming their “Native Eclipse” to “Fedora Eclipse” after so many people asked why the so-called ‘native’ version was actually slower than the version running on Sun’s JVM. Tom Tromey summed it up well:


There is a common misconception that “AOT compilation == Super Duper Fast”. That isn’t true. Performance is tricky and the current crop of proprietary JITs are excellent compilers.

CPU-bound performance is, in other words, a solved problem: even with the overhead of compiling code at the same time as running it, modern VMs have had such huge quantities of smart people thrown at them that they can run code very fast. There are a few CPU-bottleneck problems in the design of the Java language, though I think C# fixed most of them. For instance, in Java all instance methods are virtual by default, whereas in C# (like C++) you must explicitly mark a method as virtual. Virtual calls are more expensive than direct calls and not always required, so this makes sense.
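To make the virtual/non-virtual distinction concrete, here is a minimal C++ sketch. The Shape and Circle types and the describe function are names I made up for illustration; they don't come from any real library.

```cpp
#include <string>

// Hypothetical types, for illustration only.
struct Shape {
    virtual ~Shape() = default;

    // Virtual: the call is resolved at run time through the vtable.
    virtual std::string name() const { return "shape"; }

    // Non-virtual (the default in C++ and C#): the call is resolved at
    // compile time from the static type, so it can be inlined.
    std::string kind() const { return "shape"; }
};

struct Circle : Shape {
    // Overrides the virtual method.
    std::string name() const override { return "circle"; }

    // Hides Shape::kind rather than overriding it.
    std::string kind() const { return "circle"; }
};

// Calls both methods through a base-class reference.
std::string describe(const Shape& s) {
    return s.name() + "/" + s.kind();
}
```

Calling describe on a Circle should give "circle/shape": the virtual call finds the Circle version at run time, while the non-virtual call is bound to Shape when the code is compiled. In Java both calls would behave like the first one unless the method is marked final, private or static.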

But the sheer quantity of instructions that can be executed per second is only one factor influencing performance. There are several, but the one I’ll focus on here is memory.

why does managed code use so much memory?

I think the reasons excessive memory usage is bad are obvious. In every situation where managed code is used – desktop, server and embedded (think mobile phones) – using too much memory is bad. Swapping on a desktop hurts end user experience, on a server it can cause the whole system to die, and on phones it increases the cost of the device for the consumer.

Whereas the raw runtime performance of Java and .NET has improved hugely over the years, memory usage generally hasn’t. It’s improved a bit. Here is a table comparing different early JRE versions:

Table 1. Measured memory usage (bytes)

Type                             Content      JRE 1.1.8  JRE 1.1.8  JRE 1.2.2  JRE 1.2.2
                                 (bytes)      (Sun)      (IBM)      (Classic)  (HotSpot 2.0 beta)
java.lang.Object                 0            26         31         28         18
java.lang.Integer                4            26         31         28         26
int[0]                           4 (length)   26         31         28         26
java.lang.String (4 characters)  8 + 4        58         63         60         58

This shows the overhead inherent in every single object on the heap. In other words, this is what you pay just for having an object that doesn’t do anything.

For the most recent version of the Java VM, the overhead is 12 bytes – one word (4 bytes on a 32-bit machine) for identity hashcode and GC control data, one word to point to the class object (i.e. the vtable pointer) and one word for the lock that every Java object is defined to have (Java calls them monitors).

In contrast, C/C++ structs have no overhead, and a C++ class usually only has the overhead of the vtable pointer. Oh, and sometimes the malloc bookkeeping data too.
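A quick way to see this is to ask the compiler for the sizes directly. This is only a sketch: PlainPoint, VirtualPoint and hidden_overhead are hypothetical names, and the exact numbers depend on your compiler and ABI (the C++ standard doesn't promise them), though mainstream compilers all behave this way.

```cpp
#include <cstddef>

// Hypothetical types, for illustration only.

// A plain struct: just its fields, no hidden header.
struct PlainPoint {
    int x;
    int y;
};

// The same fields plus a virtual destructor, which on mainstream
// compilers adds one hidden vtable pointer per object.
struct VirtualPoint {
    virtual ~VirtualPoint() = default;
    int x;
    int y;
};

// Bytes the virtual version costs beyond the raw fields
// (vtable pointer plus any alignment padding).
constexpr std::size_t hidden_overhead() {
    return sizeof(VirtualPoint) - sizeof(PlainPoint);
}
```

On a typical 32-bit build I'd expect hidden_overhead() to be 4, and 8 on a 64-bit one. Neither figure includes malloc bookkeeping, which only applies if you put the object on the heap in the first place.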

As far as I can see, it’s very hard to fix this. Escape analysis has been proposed as a way to eliminate the monitor in some cases, but the real problem is Java’s insistence on making everything a heap object, which results in far more overhead than in an equivalent C++ app.

strings

Things get worse when we get to strings. C++ uses the same string protocol as C, where a string is a series of non-zero characters of arbitrary length, terminated by a null. Actually the runtime doesn’t force you to use this, and you can use Pascal-style strings as well (where the length is stored as the first byte/word), but you’ll have to roll your own if you want to do that.
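For the curious, rolling your own Pascal-style string might look roughly like this. It's a sketch under my own assumptions: the function names are made up, and I've picked a single length byte, which caps the string at 255 characters.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Bytes a C-style string occupies: the characters plus the null.
std::size_t c_string_bytes(const char* s) {
    return std::strlen(s) + 1;
}

// Hypothetical encoder: one length byte followed by the characters,
// with no terminating null.
std::vector<unsigned char> to_pascal(const char* c_str) {
    std::size_t n = std::strlen(c_str);
    if (n > 255) n = 255;  // a single length byte can only count to 255
    std::vector<unsigned char> out;
    out.reserve(n + 1);
    out.push_back(static_cast<unsigned char>(n));
    out.insert(out.end(), c_str, c_str + n);
    return out;
}

// Reading the length is a single byte load instead of a scan for the null.
std::size_t pascal_length(const std::vector<unsigned char>& p) {
    return p.empty() ? 0 : p[0];
}
```

With a one-byte length the two layouts cost the same amount of storage: the length byte simply takes the place of the null terminator. What you gain is a constant-time length lookup.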

For the ASCII string “Hello World”, C++ will use 12 bytes: 11 characters plus the terminating null.

In Java, every string is also an object, so we immediately pay the 12 bytes of overhead there. Then, a string is actually backed by a character array, and in Java arrays are objects themselves, so we pay another 12 bytes of object overhead. Arrays store their length as a machine word as well, so that’s another 4 bytes (28 so far). We’ve already blown double the C++ storage and we haven’t actually stored anything yet! But it gets worse. Java is Unicode-enabled, which is a buzzword-compliant way of saying every character takes two bytes instead of one. This is silly because strings often represent system-internal things like config file keys or XML tag names, not end-user things like button labels. System-internal strings are ASCII 99% of the time, so that second byte goes unused. Still, you pay the price for all strings anyway. C++ can do Unicode too, but you can usually choose your encoding – UTF-8 for space efficiency, UTF-16 for CPU efficiency. So anyway, now we add 22 bytes to store the 11 characters and we’re up to 50. Finally, the String class has various other fields as well, like offset, count and hash, so we have to add another 12 bytes to give 62 bytes for “Hello World”, compared to 12 for C++.

62 vs 12

Ouch! That doesn’t even include the 4 byte reference.
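If you want to check the tally, or try it with other string lengths, the arithmetic can be written down as a small sketch. The constants are the figures quoted in this post for a 32-bit JVM, not values measured from a real VM, and the function names are my own.

```cpp
#include <cstddef>
#include <cstring>

// Figures quoted in this post for a 32-bit JVM (assumptions, not
// measurements).
constexpr std::size_t kObjectHeader = 12;  // per-object overhead
constexpr std::size_t kArrayLength  = 4;   // length word in every array
constexpr std::size_t kBytesPerChar = 2;   // two bytes per Java char
constexpr std::size_t kStringFields = 12;  // offset, count and hash

// Estimated heap bytes for a Java String holding `chars` characters.
constexpr std::size_t java_string_bytes(std::size_t chars) {
    return kObjectHeader                  // the String object itself
         + kObjectHeader + kArrayLength   // the backing char[] object
         + chars * kBytesPerChar          // the characters
         + kStringFields;                 // the String's own fields
}

// Bytes for the same text as a C string: characters plus the null.
std::size_t c_string_bytes(const char* s) {
    return std::strlen(s) + 1;
}
```

Plugging in 11 characters gives 12 + 16 + 22 + 12 = 62 for Java against 12 for C. The fixed cost is 40 bytes before a single character is stored, so short strings suffer the most.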

heap vs stack

Every Java value, except for primitives like integers, characters and booleans, is allocated on the heap. In C++ any complex type can be allocated either within another complex type, or on the stack, or on the heap. Smart decisions about where data should be stored can be a huge win.

In a previous journal entry I talked about stack allocation being a use for escape analysis, but C++ developers have been benefiting from this for years. Using the stack is good not only because it’s very fast to allocate and deallocate, not only because the stack is likely hot in the cache, but also because it reduces the amount of work the garbage collector has to do. Studies of Java programs have shown that most objects die young, that is, they don’t last very long relative to the others. This is the reason behind generational garbage collectors. In C++ the same problem is solved by allocating temporary, die-young objects on the stack, which reduces pressure on the heap manager significantly.
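Here is a rough C++ sketch of the three placements. Vec2, Segment and the two functions are hypothetical names chosen for illustration.

```cpp
#include <memory>

// Hypothetical types, for illustration only.
struct Vec2 {
    double x;
    double y;
};

// Embedded: a Segment stores its two Vec2 values inline, so there are
// no pointers to follow and no separate allocations.
struct Segment {
    Vec2 from;
    Vec2 to;
};

// Stack: the temporary lives in the function's frame and disappears
// when the function returns. No allocator is involved.
double length_squared_stack(double x, double y) {
    Vec2 v{x, y};
    return v.x * v.x + v.y * v.y;
}

// Heap: the same temporary, but each call pays for an allocation and
// a deallocation. This is roughly what Java does for every object
// unless the VM can prove it doesn't escape.
double length_squared_heap(double x, double y) {
    auto v = std::make_unique<Vec2>(Vec2{x, y});
    return v->x * v->x + v->y * v->y;
}
```

Both functions return the same answer. The difference is only in where the short-lived Vec2 lives and how much bookkeeping it costs to create and destroy it.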

conclusion

Java’s memory usage problems have many causes, most of them hard to fix. Even if Sun dropped the religious devotion to JIT compilation and spent the next year optimizing the VM, it wouldn’t help much because the Java platform itself lends itself to very memory-inefficient usage patterns.

Many people have found Ruby on Rails to be a very productive environment for web application development, and many people have wished that the benefits of Java/C# could be gained without the need for a hulking runtime environment and ridiculous memory usage. I’m sure that a dedicated team of smart people could produce a desktop equivalent to RoR – one that provides many of the benefits of Java-like languages in a form that can be comfortably deployed on desktop-class systems.

2 Responses to “memory and managed code”

  1. MatzeB Says:

    Would be interesting to know what .net does in this area. From what I’ve heard it does have a stack and can put temporary objects on it without any overhead (that’s what they call boxing/unboxing AFAIK)…

  2. Mike Says:

    C# has a stackalloc keyword which does what you’d expect, but it can only be used in unsafe code. It’s not really a general mechanism in other words.

    Stack allocation will come to VMs eventually via escape analysis …. but is the 10 years it took really worth it?
