Last time I talked about Singularity I rambled about micro-kernels for a while. You probably knew that stuff backwards anyway. Onwards!
**How does it work?**
Singularity pulls off the impressive feat of being a pure microkernel design that is nonetheless ~30% faster than the traditional microkernel approach, and about 10% faster than the monolithic approach on a file-IO-heavy benchmark. How does it do that?
The trick is very simple – they just throw away hardware memory protection entirely. In Singularity, everything runs in kernel mode, and everything runs in the same address space. The MMU, in other words, doesn’t do anything. There are no “processes” in the traditional sense.
Obviously, if that were the _only_ thing they did, it wouldn't be very interesting. 85% of Windows blue-screens are caused by drivers, not kernel bugs (I suspect the rest are caused by hardware failure). Bad drivers have caused innumerable privilege-escalation vulnerabilities, and every privilege escalation is a gift to the bad guys. Protection against software bugs is what makes micro-kernels useful.
Programs in Singularity _are_ isolated from each other, but the isolation is done entirely in software, using type theory instead of silicon. They can do this because programs in Singularity are all written in a C# derivative called Sing# (in theory any .NET language could be used, but Singularity relies on a few features added to C#, so other languages would need the same minor extensions).
You already know that in most modern languages like Java and C# you can’t access memory directly. Most obviously, there’s no way to write the following C in Java:
```c
*((char *)0x1234) = 'X'; // overwrite the byte at location 0x1234
```
It’s not just that there’s no syntax for it. It’s that the Java/C# compilers produce a form of assembly language (bytecode) that a verifier can check to mathematically prove the program never does this. Once proven, we can translate the JVM/.NET opcodes into actual code that the CPU can run, confident that if we don’t give the code a reference to an object, it can’t read or write it. That’s not only good for reliability – we can build a security system on top of it!
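The "reference as capability" idea can be sketched in plain Java. This is an illustrative toy (the class names are invented, not from Singularity): code can only touch objects it has been explicitly handed a reference to, and there is no pointer arithmetic with which to find anything else on the heap.

```java
// Toy sketch of reference-as-capability isolation. Without a reference
// to a Secret, code has no way to name its memory at all.
final class Secret {
    private final String value;
    Secret(String value) { this.value = value; }
    String reveal() { return value; }
}

public class CapabilityDemo {
    // This method can only access what it is explicitly given. There is
    // no Java equivalent of *((char *)0x1234) = 'X' for scanning the heap;
    // the bytecode verifier rejects any attempt to forge a reference
    // from an integer.
    static String useCapability(Secret s) {
        return s.reveal();
    }

    public static void main(String[] args) {
        Secret mine = new Secret("launch codes");
        System.out.println(useCapability(mine)); // access granted via reference
    }
}
```

Withhold the reference, and the object is unreachable – which is exactly the property a security system can be built on.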
**exceptions to the above**
OK, so that was the theory. In practice it wasn’t so easy. Just as theoretically “sandboxed” programs written in C can escape the sandbox by exploiting kernel bugs, Java and .NET programs can compromise the security system by exploiting bugs in the VM or in the native class libraries (written in C++).
Worse, in some cases features like reflection can be used to reach objects their authors never intended to expose.
Both these problems have caused breaches in JVM applet security in the past. Given how huge the modern JREs are, that isn’t surprising.
Singularity’s solution is again straightforward – almost all of the system is itself written in C#, with only a small amount of C++ and assembly language. This immediately makes the core “kernel” robust against the most common types of attack. Even then, Microsoft are working on research that will let them prove the safety of the tiny remaining pieces of unsafe code.
The building block in Singularity is called a SIP, for “software isolated process”. Singularity retains the idea of a process being something that owns resources, has its own memory space and is independently scheduled, but it’s all enforced with software and mathematics.
A SIP has its own heap, its own garbage collector (you can choose one from several the OS offers, reflecting the fact that no one GC fits all situations), and its own memory pages. So it’s more isolated than some similar approaches (like KaffeOS), in which all code is loaded into a single runtime, and objects can be exchanged between different programs. This has the disadvantage that you can’t easily exchange objects between two SIPs. It has the pragmatic advantage that SIPs can be quite different from each other – not only using different garbage collectors, but also different language runtimes and in-memory layouts. And it means they can be deallocated quickly, without doing a full-heap GC, which could potentially be extremely slow.
Given that we’re no longer using the CPU’s hardware modes, it’s no longer clear how we should define what the “kernel” really is. In Singularity, the “kernel” is the software component that connects SIPs together, does memory management, loads new SIPs and handles some other tasks. Logically, it also includes the trusted/unsafe parts of the system written in C++ or assembly. Some of these are actually attached to a SIP, like the garbage collectors, but because they are simply trusted to be correct they can be thought of as part of the kernel.
SIPs call into the kernel simply by doing a regular function call. You need a way to mark the stack so the kernel’s own garbage collector doesn’t interfere with the SIP’s garbage collector, but that’s easy and fast. Thus all the syscall overhead is avoided.
SIPs communicate via “channels”. These are message-based pipes, sort of like UNIX sockets, except strongly typed and faster. A channel is really an abstraction – sending a message via such a pipe does not involve any copying or fancy hardware tricks; it simply updates a few pointers in memory. Because of this, sending even very large amounts of data between SIPs (say, between the network driver and a web server) is fast.
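The zero-copy handoff can be modelled in a few lines of Java. This is a toy, not Singularity's actual channel API: "sending" moves only the reference to the message, and by convention the sender gives up its own reference afterwards, modelling the ownership transfer that Sing#'s type system actually enforces.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a Singularity-style channel: strongly typed, and send()
// hands over the message object itself -- a pointer moves, no payload
// bytes are copied. Names here are illustrative, not the real API.
final class Channel<T> {
    private final Deque<T> queue = new ArrayDeque<>();

    void send(T message) {   // enqueue the reference itself
        queue.addLast(message);
    }

    T receive() {            // dequeue: the receiver now owns the object
        return queue.removeFirst();
    }
}

public class ChannelDemo {
    public static void main(String[] args) {
        Channel<byte[]> ch = new Channel<>();
        byte[] packet = new byte[64 * 1024];  // e.g. a network buffer
        ch.send(packet);
        packet = null;  // sender relinquishes its reference (ownership transfer)
        byte[] received = ch.receive();
        // The receiver holds the very same object: however large the
        // buffer, the transfer cost was a couple of pointer updates.
        System.out.println(received.length);  // prints 65536
    }
}
```

In real Singularity the compiler statically rejects any use of a message after it has been sent; here the `packet = null` line merely stands in for that guarantee.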
It might seem odd that invoking the kernel is just a regular function call. Surely that’s not possible? What stops a SIP from simply invoking the instructions to control the hard disk itself?
The answer is that a SIP is shipped (say, on CD) as a set of MSIL bytecode files. These files are not compiled at runtime as in a regular Java or .NET system. Instead, when the software is installed, it is compiled ahead of time into native code after being statically checked using a variety of analyses. Software installation is a privileged operation in Singularity – it’s handled by the kernel itself. Only software installed by the kernel will be allowed to run.
Because you can’t represent CPU-specific instructions like “write to this IO port” or “trigger this interrupt” in safe MSIL, the only way for a piece of code to do that is to be linked with some trusted native library that will do it for you. Because the kernel is in charge of software installation, it can verify that only certain software is linked with such libraries, and even then, only in certain ways. Thus, the kernel can control access to hardware resources without relying on hardware checks – by controlling who gets access to the hardware (and how) at install time.
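The install-time policy check amounts to inspecting what a program links against before it is ever compiled to native code. Here is a heavily simplified sketch of that idea – the library name and the policy rule are both invented for illustration, not taken from Singularity:

```java
import java.util.List;
import java.util.Set;

// Hedged sketch of an install-time linking policy: before a SIP's
// bytecode is compiled to native code, the installer checks which
// trusted native libraries it wants to link against. The library
// name and rule below are hypothetical.
public class InstallPolicy {
    // Only approved drivers may link against the hardware-access library.
    static final Set<String> HARDWARE_LIBS = Set.of("Hypothetical.HardwareIo");

    static boolean mayInstall(boolean isApprovedDriver, List<String> linkedLibs) {
        for (String lib : linkedLibs) {
            if (HARDWARE_LIBS.contains(lib) && !isApprovedDriver) {
                return false; // ordinary apps can't touch IO ports or interrupts
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(mayInstall(false, List.of("System", "Hypothetical.HardwareIo"))); // false
        System.out.println(mayInstall(true,  List.of("Hypothetical.HardwareIo")));           // true
    }
}
```

Because the check happens once, at install time, there is no per-access cost at runtime – the hardware trap machinery a conventional OS relies on simply isn't needed.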
In a regular OS, a process more or less corresponds to a program. A few programs have multiple processes – for instance, iTunes installs a helper that navel-gazes until you plug in an iPod. But generally speaking, one program uses one process, and it’s actually quite common for one process to contain more than one program – for instance, a web browser that hosts plugins.
Because SIPs are so cheap, it’s reasonable – encouraged, even – to split a single program into several co-operating SIPs. Thus we have a problem – how do we start a program? What even _is_ a program in such a setup?
The answer is that a program in Singularity is defined by a manifest. A manifest is (like in .NET) an XML file describing the SIPs that make up a program, and the connections between them. A manifest is mostly auto-generated from metadata annotations in the code. When you start a program, you actually invoke a manifest.
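To make this concrete, a manifest for a two-SIP program might look something like the following. This is a hypothetical sketch – the element and attribute names are invented for illustration and do not match Singularity's actual manifest schema:

```xml
<!-- Hypothetical manifest; names are illustrative, not the real schema -->
<application name="WebServer">
  <processes>
    <process id="listener" image="Listener.exe" />
    <process id="worker"   image="Worker.exe" />
  </processes>
  <channels>
    <!-- a strongly typed channel connecting the two SIPs -->
    <channel from="listener" to="worker" contract="HttpRequestContract" />
  </channels>
</application>
```

Starting the program means instantiating each listed SIP and wiring up the declared channels between them.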
**drivers in singularity**
This has interesting implications for how drivers are managed. Unfortunately, I blew my word count again. It’ll have to wait for next time.