Archive for November, 2005

wednesday stuff

November 30, 2005

website

The new website work that’s been going on is awesome. I love both what Isak and Hongli have done, though they focus on different things. Specifically:

  • I love the new CSS and layout Isak has done. It’s clean, clear and to the point. I like the way he restructured the sidebar with apparently no loss of information!
  • Hongli’s mockup has the right idea with respect to the content on the front page: a few big, clear buttons for the common tasks non-technical end users will want to do.

I wonder if they can be combined somehow? The most important thing IMHO is the content. The current visual design of autopackage.org is a bit old now, but it doesn’t bother me too much. The problem, as Isak identified, is that there’s a lot of content and it kind of sprawls. It’s hard to find what you want. Another – worse – problem is that the front page lacks focus. There’s a lot of crud like the status and todo section there left over from before we really released to the world. It should die in favour of a page that’s focussed like a FRICKIN’ LAZER on the user. We can split extra stuff into a developer zone if need be.

This all revolves around a key question of course – who is our target audience for the website? Is it non-technical end users, technical end users, developers, packagers, random interested enthusiasts who got there via slashdot? Who is it? The reality is, it’s a mix of all of these. I think we should make the front page non-technical end user focussed because the technical/enthusiast types are the ones most likely to explore on their own. Beyond a few simple pages though, we don’t need to worry about them too much. Autopackage isn’t that interesting to end users: if they got here, they’re probably either curious or stuck.

Firefox 1.5

The new mozilla.com site is great. This demonstrates exactly the sort of focus we need for autopackage. It’s a bit over the top in the corporate zone though: they have a careers page that reads like some fluff-filled “synergistic” company with 10,000 employees, not the 45-man operation that it is. Made me laugh anyway :) Fortunately the front page is far more friendly and they devote a whopping 30% of the page to the primary use case: downloading Firefox. The front page is also almost entirely devoid of clutter, on my screen it doesn’t even reach the bottom of the browser window.

One thing that isn’t so great is the terminology confusion. Where do you download extensions? That’d be a site called “Mozilla Addons”. Check out that page – we have a “top extensions” link and a “most popular addons” link. I suspect this mess is the result of calling {extensions, plugins, themes, search engines} collectively addons, but if so then this organisation is far from obvious. My first intuition was simply that addons==extensions. The naming of some other stuff is confusing too – “search engines” makes it sound like you can literally install Google into your web browser (all it really does is set up default search pages and stuff).

Zach Lipton made a good point: for end users Firefox 1.5 is BORING. I am advertising it in my MSN Display Name right now, which is something I rarely do (in fact I rarely put anything computer related in it at all) because I think it’s important that people know about and use Firefox. But the upgrade is kind of a hard sell. The benefits are not visible. The Firefox 2 roadmap is terrifying – as Zach says:

There are some very cool ideas floating out on the horizon about revamping Bookmarks and History into Places and finally fixing feed handling. I hope that these things happen and quickly. Firefox 2.0 cannot be about multi-locale installers, customizable toolbars, or fixing Extension Manager. These are not features that make users excited. If we cannot innovate, we are dead. Plain and simple.

What’s even more stupid from my perspective is that Firefox has a TON of amazingly cool and useful features written for it already in the form of extensions. Every release of Firefox could have cute new features with minimal UI impact folded into the core by doing something as simple as merging extensions in. For instance, tab preview, flashblock and so on. I hope they focus on the history search stuff – this interests me.

blogs

I discovered an excellent new blog on usability. It’s one of those very rare blogs that, after reading an entry or two, compelled me to go back to the beginning and read ALL the entries. Wow. The only other blogs that have made me do that are belle de jour (diary of a London hooker) and Chris Brumme (articles from a .NET runtime developer).

I have a few of these “must read” blogs now, but it’s a pain checking back for new content all the time. I used Firefox Live bookmarks up until now but they kinda suck too, in particular, you can’t click the button to actually go to the website itself. I know, I know – this is why feed readers were invented. Unfortunately I’ve yet to find any (except Planet) that don’t blow. I clearly need to spend more time looking for one.

Anyway, a lot of what is written in Flow|State (I dunno the guy’s name) applies to autopackage. In fact, I’d make it mandatory reading for anybody writing desktop software that will potentially be used by large numbers of people. The stuff on BBOPs was good – whilst this sort of UI is nothing new and could correctly be called ‘menu driven UI’, BBOP seems to capture the style more appropriately. I’ve been struggling to express ideas about BBOP UIs for a while now, and having a name for it makes it so much easier.

Lesson learned: naming something makes it easier to work with, as long as that name comes with a reasonably precise definition of it. Don’t struggle with concepts. Name them.

lockdown

November 27, 2005

Given the mass epidemics of spyware, viruses and bots we’ve seen over the past few years it’s not surprising people are looking at ways to deal with rogue software. The old world assumption that programs are a tool and the user an experienced craftsman no longer holds – arguably it hasn’t for decades – and that changes everything.

How can we stop rogue programs? One way is to lock them down so rather than having full access to the entire system (as programs running on XP Home have today), they only have the permissions they need to do their job. This is hard. It mixes up different concepts that use the same implementation – least privilege is used to stop hackers gaining extra influence by hacking a program, yet people also want to use it to stop evil programmers doing things they shouldn’t.

The hard part isn’t really the technology: SELinux lets us specify to a fine level what a program can and cannot do. The hard part is deciding what constitutes “evil”. Right now, applications have full access to the entire system, at all times. There is no defence against evil software, not on Windows, OS X or Linux. This applies even if the programs don’t run as root, as typically they are installed as root and that means they can mark themselves suid-root if they so wish. For autopackages you can install to $HOME to slightly mitigate this risk, but the root vs user distinction isn’t enough to stop the majority of “evil” programs out there.

What is Evil?

We can start with a very basic idea of what evilness is, and work upwards from there. To start with, let’s say that evil software is software that tries to tamper with the core operating system internals. An obvious example is adware like Aurora. This program employs several tricks like hiding a respawner inside svchost.exe and modifying winlogon.exe internal registry entries, forcing itself to load at Windows startup in such a way that makes it near-impossible to delete (even in safe-mode!)

This sort of random hooking of OS entrypoints is pretty clearly Evil. It has no legitimate uses for non operating system developers, as there are documented ways to load your program at startup.

Embedding yourself into the system so tightly you cannot be removed is one way to tamper with OS internals. There are others, like the browser helper objects (BHOs) and winsock LSP chains which can be used to filter or rewrite web pages arbitrarily.

Examples on Linux would include patching the kernel system call table, overwriting or modifying “init”, and other techniques favoured by crackers.

Stopping it

We can draw up a simple set of rules to protect the operating system core that are likely to completely eliminate undesirable application behaviour with minimal risk of breaking backwards compatibility. Obviously, doing this requires some distinction between an application and an operating system component but autopackage vs rpm/deb already provides that.

Here’s a quick stab at such rules. Programs (including their install scripts) should not be able to:

  • Load kernel modules
  • Modify or overwrite existing files
  • ptrace an existing process running at a different privilege level
  • Add files to people’s home directories EXCEPT for menu items, desktop icons, and a few other well defined integration points.

Of course, there are several ways to implement these rules. One is SELinux policy, which will become an option once Fedora Core 5 ships. Another way is to use the NeXT/Apple approach of “interpreted integration”, in which a package is an inert thing containing files that are loaded and interpreted by the system shell. Installers can’t modify anything the OS doesn’t let them modify. For autopackage, SELinux is probably the only feasible method, though I slightly prefer the NeXT approach.
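As a small illustration of the "do not modify or overwrite existing files" rule, here is a sketch in C of an installer helper that can only ever create new files. It relies on the standard POSIX `O_CREAT | O_EXCL` flags, which make `open()` fail with `EEXIST` if the path already exists. The name `install_new_file` is hypothetical, and this is the cooperative version of the rule: a well-behaved installer choosing to obey it, not the system enforcing it, which is what SELinux or the NeXT-style approach would add.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates `path` containing `data`.
 * Returns 0 on success, -1 on failure. If the file already exists,
 * nothing is touched and errno is EEXIST. */
int install_new_file(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);

    /* O_EXCL together with O_CREAT refuses to open an existing file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    ssize_t written = write(fd, data, len);

    if (close(fd) != 0 || written < 0 || (size_t)written != len) {
        unlink(path); /* do not leave a half-written file behind */
        return -1;
    }
    return 0;
}
```

Calling it twice on the same path succeeds the first time and fails the second, leaving the original contents alone.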

balls

November 25, 2005

One of the things we like to do at Durham University is have balls. These are fancy affairs where we dress up in black tie, have a nice dinner, get drunk and then everybody dances to cheesy music of the type played at Klute (officially the worst nightclub in Europe, oh yes).

There’s one coming up soon, at Lumley Castle. It’s the History Society ball, and even though I don’t study history I’m going to it with a few friends. Not too many friends though, as it’s on the same night as Ladies Night. Ladies night is a ball just like every other one, except with a twist – the ladies of the castle have to ask the men, instead of the other way around. Ah yeah, I forgot to mention. Durham is so old fashioned that for some balls (not all) you have to go with a partner, sometimes a boyfriend/girlfriend/date but usually just a mate. And of course it’s usually the guys who have to propose ;) Anyway, I didn’t get invited by a castle Lady (boo hoo), but that’s OK, I’m hardly alone. Everybody who’s going is either going with friends or partners from back home anyway.

The highlight of the calendar, at least for us Castle students, is the June Ball. You can read more about it on that wikipedia article, but basically it’s a ball but BIGGER. I’ve only been once before (well you can only go 3 times if you’re on a 3 year course!) and it was pretty spectacular I’ve gotta admit. There were fireworks, hypnotists, a chocolate fountain and even a hot air balloon! Every year the theme is different, when I went it was “Around the world in 80 days”. Fortunately, even though it’s themed it’s one of the few entertainments you don’t need fancy dress for.

Yeah. Durham is a bit odd. But it has its charms :)

on compilers

November 22, 2005

Something that’s interested me for a while now is compilers and compiler optimization – it’s quite astonishing the things optimizing compilers can do to your code. This is more important than ever: since CPU speeds hit the wall, one obvious way to make things go faster is to improve the compiler. Even better, on an open source system like Linux, when you improve the compiler everything gets faster. More on that in a moment.

One interesting thing about this field is that most of the research seems to be done by academia, often many years before the techniques become used in mainstream compilers. For instance, mainstream compilers that use static single assignment (SSA) form have only become common recently (GCC only got it with v4!) even though the technology was being researched back in the 80s.

so how does this relate to autopackage?

One of the things autopackage does is move responsibility for actually running the compiler to the upstream developers instead of the distribution. This has both advantages and disadvantages. An advantage is that upstream can control which version of the compiler is used, and with which optimization settings. As anybody who has had to deal with Gentoo 0pt1miz3d builds triggering crashes, build failure or worse can attest, that’s useful. Being able to control compiler version is also useful – with every release of GCC some code that previously compiled no longer does. Usually when this happens the distributions each go through and try to patch the applications in their archives so they compile with the new compiler. It’s done by distros and not developers because developers use distros, so they have to wait for the distro to upgrade the compiler before they can … hence you get a catch-22.

One of the big disadvantages is that when new compiler optimizations are introduced, it may take a long time before users benefit from them across the board. Over time, various people have looked at ways of solving this one. A popular technique is to effectively split the compiler in two, with the frontend (the part that parses the code and generates an intermediate language-neutral form) being run by the developer and the backend (the part that generates the native machine code your chip actually executes) being run by the user. This setup is used by Java, .NET and a few others, and the backend is usually called a “virtual machine”. As the intermediate form which is input to the VM is usually independent of the underlying CPU architecture, this also has the nice advantage of abstracting the CPU, though these days x86 is the de-facto standard on the desktop anyway, so the value of this is questionable.

Most VMs are designed to compile code just-in-time (JIT); this is the approach Java and .NET take by default. But JIT compilation has the fairly obvious disadvantage that it has to be very fast, and this limits possibilities for optimization. The end result is you could end up running poor quality code. A more recent trend has been to move away from JIT compilers towards ahead of time (AOT) compilers which are allowed to take as long as they want to generate the final executable image. That allows for quite advanced optimisations which may be “slow” (we’re still probably talking about a few seconds on a fast machine).

getting the best of both worlds

If you keep both the original bytecode and the generated binaries, you can get most of the advantages of VM technology without most of the downsides. By upgrading the VM, you can transparently introduce many new optimizations to the user’s system without requiring upstream developers to do new releases. By generating the native binaries ahead of time (for instance, at install time) the optimization engine can use its full power to produce the fastest and most compact binaries possible. As users already expect a short delay whilst installing software, this is a perfectly acceptable place to do it.

Currently I’m not aware of any installers that do this – I think the .NET framework allows installers to do it if they so wish using the .NET ngen tool but I don’t think this is widespread. One obvious problem is that usually VMs are developed in conjunction with many other goals: new languages, new APIs, new security models, new everything! It’s quite hard to find a VM that doesn’t impose all this other stuff on you, however fortunately one does exist: LLVM. The LLVM bytecode format is essentially a serialized SSA tree and the engine is already capable of many advanced optimisations.

The Mono VM is also capable of doing AOT compilation, and as the .NET IL is capable of expressing C++ it’s probably capable of representing C too, so this is an alternative avenue. Right now LLVM seems like a more natural fit for our purposes (that of optimizing code clientside using an upgradable VM), as it’s smaller and the dev team focus almost entirely on optimization.

future

There are several new kinds of possible optimisations which are still in the research phase (as far as I’m aware anyway) and haven’t really hit the mainstream yet.

The first I want to cover is called escape analysis. This solves two persistent problems that plague all modern object oriented languages like Java or C#:

  • Huge amounts of heap churn as lots of small objects are rapidly allocated and turned into garbage (think temporary strings, arrays etc).
  • Object locking takes place behind the scenes all the time, even if your application isn’t multi-threaded

The first problem has been known for a long time, and is the primary motivation behind generational garbage collectors. Still, generational GC has overhead, as does allocating space for all these objects from the heap. The second is a more subtle issue – like reference counting or virtual function calls, it can lead to death by a thousand cuts. It’s unlikely to show up on profiles as the overhead is spread out over such a wide area, but it can still be significant.

Escape analysis allows you to determine information about the scope and lifetime of an object. The information gained from it can be used to eliminate locks from an object (if you can prove it’s never accessible by more than one thread) and to allocate the object on the stack. Stack allocation is interesting because allocating and deallocating stack space is extremely fast (one instruction on most CPUs), and because it doesn’t touch the heap it reduces GC pressure significantly. In some tests, over 90% of object allocations were found to be stack allocatable. Correct use of stack based allocation is one reason C++ code has traditionally had much better memory footprint than Java apps. The Mustang release of the Sun Java VM is supposed to feature escape analysis in the VM.
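In C the programmer makes the stack-or-heap decision by hand, which makes it a handy way to picture what escape analysis works out automatically. A rough sketch (the struct and function names are made up for illustration): an object "escapes" if a reference to it outlives the function that created it.

```c
#include <assert.h>
#include <stdlib.h>

struct point { int x, y; };

/* Escapes: the pointer is handed back to the caller, so the object must
 * outlive this call and has to be heap allocated. */
struct point *point_new(int x, int y)
{
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

/* Does not escape: `delta` is never stored or returned, so it can live
 * on the stack and vanishes for free at the end of each iteration. */
int path_length(const int *xs, const int *ys, size_t n)
{
    assert(xs != NULL && ys != NULL);

    int total = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        struct point delta = { xs[i + 1] - xs[i], ys[i + 1] - ys[i] };
        total += abs(delta.x) + abs(delta.y);
    }
    return total;
}
```

In Java or C# both objects would be written as `new Point(...)` and land on the heap; escape analysis is what lets the VM notice that the second kind never leaves the function and treat it like the stack variable above.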

Another interesting optimization is purity analysis. A “pure” function is one that doesn’t have side-effects, that is, given the same inputs it will always generate the same outputs (roughly). This is also known as referential transparency. If a function is pure, many more optimizations can be applied to it. For instance:

    for (int i = 0; i < foo; i++)
    {
        int x = strlen(input_str);
        // do something with x
    }

Given this code it’s fairly obvious that input_str doesn’t change, so constantly recalculating its length is redundant. An optimisation called loop invariant code motion can convert it to this:

    int x = strlen(input_str);
    for (int i = 0; i < foo; i++)
    {
        // do something with x
    }

Which is clearly more efficient. But the compiler can only do this because it knows that strlen() is pure. If we were calling some other function we’d written ourselves, it couldn’t do this. You can mark functions as pure using a GCC extension, but a better solution is if the compiler can figure out purity itself. New research from the likes of Alexandru Salcianu demonstrates ways to detect purity at compile time, even in functions that do things like allocate new objects and modify them on the heap.
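For reference, the GCC extension in question is `__attribute__((pure))`. Here is a minimal sketch of marking one of your own functions this way; `count_char` and `repeated_total` are example names, not anything from a real library. The attribute is a promise from the programmer, not something the compiler checks, so it is only safe on functions that really have no side effects.

```c
#include <assert.h>
#include <stddef.h>

/* Promise to GCC: the result depends only on the arguments (and memory
 * they point to), and the call has no side effects. `s` must not be NULL. */
size_t count_char(const char *s, char c) __attribute__((pure));

size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
    }
    return n;
}

size_t repeated_total(const char *s, char c, int rounds)
{
    assert(s != NULL);

    size_t total = 0;
    for (int i = 0; i < rounds; i++) {
        /* Loop-invariant call: because count_char is declared pure, the
         * optimizer is allowed to compute it once, outside the loop. */
        total += count_char(s, c);
    }
    return total;
}
```

Whether the call actually gets hoisted depends on the compiler version and optimization level; the attribute only gives it permission.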

various things

November 20, 2005

We finally have a planet. That’s only been on the todo list for, oh, I don’t know, several months. The original request from our users is here:

http://autopackage.org/forums/viewtopic.php?t=116

and it was posted on August 3rd. But hey, we got there in the end. Hopefully the planet will give a nice half-way point between the mailing list which can be very technical at times and the end user forums, which don’t carry much project news.

OK. In future, I’m going to blog about whatever the hell I like, autopackage related or not. But to start with, here’s an update on where we’re at for people who don’t follow the mailing list religiously.

So what’s new then?

Ye gods, where to start? The 1.2 release is finally nearing completion. It features, amongst other things:

  • C++ support. And by support we really mean it – each binary is compiled twice, a binary delta is calculated to keep file size down, and then the patch is optionally applied at install time. This means you can ship even KDE or Qt apps inside autopackages and they’ll run on distros compiled with both GCC < 3.4 and GCC >= 3.4, which accounts for 99% of desktop Linux installs these days.

  • Prettier graphics. Hurray for eyecandy! Previously uninstalling packages was mostly non-graphical so it’s nice to get that fixed.

  • We can uninstall packages if they conflict now – it has to be activated by packagers though. It’s useful for things like upgrading an RPM install to the autopackage version, but should only be used by applications. Libraries and other support code should continue to install to the /usr/lib/autopackage/ directory, which is automatically used by all apbuild compiled programs. Also we only uninstall things on RPM systems. Debian systems based on apt are too fragile to take this.

  • LZMA compression (in English, that means smaller files and faster downloads)
  • Better internationalization support – for users, that means fewer English strings in the GUI if you don’t want them. For developers, it means autopackage can suck translations directly from .desktop files, so your intltool based infrastructure is reused.

  • Anonymous statistics service: this allows us to gather useful data about which packages succeed or fail the most often. We can therefore identify “hot dependencies”, those which often cause users pain, and packagers can use this information to improve their packages. It’ll also be useful in the construction of any future desktop linux platform, which is a topic I love to bang on about at times. Of course this statistics reporting will be optional (just gotta add the checkbox ….)
  • Various improvements to relaytool. This program lets developers convert hard dependencies (I must have this library) to soft dependencies (I’d like this library). In other words, things get easier to install. Of course, not every project uses this ability – some, like Gaim, consistently reject it on the grounds that it’d mean the Debian packagers would have to manually add the library to their specfiles (boo hoo).

  • And lots of other stuff I’ve probably forgotten about.

So, there are lots of interesting additions here for both end users and developers. The C++ support is especially useful as it opens up the possibility of packaging KDE/Qt apps. One such program we’re developing a package for right now is Scribus.
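The hard-versus-soft dependency idea mentioned in the relaytool item can be sketched by hand in C with `dlopen()` and `dlsym()`: try to load the library at runtime, and carry on without the feature if it isn't there. This is only an illustration of the concept under that assumption, not relaytool's actual mechanism or interface, and the helper names `load_optional` and `optional_symbol` are invented here.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Tries to load an optional library by soname.
 * Returns a handle, or NULL if the library is not installed. */
void *load_optional(const char *soname)
{
    assert(soname != NULL);
    return dlopen(soname, RTLD_LAZY | RTLD_LOCAL);
}

/* Looks up a symbol in an optional library.
 * Returns NULL if the library or the symbol is missing, in which case
 * the caller should simply disable the feature that needed it. */
void *optional_symbol(void *handle, const char *name)
{
    if (handle == NULL)
        return NULL;
    return dlsym(handle, name);
}
```

With a hard dependency the program refuses to start when the library is missing; with this pattern it starts anyway and greys out one feature. On older glibc versions you need to link with -ldl.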

Other stuff

Python continues to cause us pain. Developers, if you’re considering adding a Python scripting interface to your app – don’t! The Python developers continue to ignore even basic needs of binary compatibility – this time around the problem is that you can configure it for either 2 byte or 4 byte unicode strings. Of course, some distros configure it differently to others, and then the exported API changes. That makes distribution of any program that uses libpython in binary form a waste of time. A year or two ago I’d have sighed, sat down and written a shim to convert strings back and forth as necessary so apps could be compatible with both. These days, I’m just too tired. At some point developers who are writing libraries that are treated as platform components just have to start meeting that responsibility; we can’t work around their mistakes forever.

And finally ….

Does anybody else here watch Lost? This show has rapidly replaced 24 as my “must watch” TV of the week. It’s not quite as good as 24 was (before series 4), sometimes the script writing gets a bit unbelievable, but it’s still really good fun to watch.

f1st p0st

November 16, 2005

When talking to people about autopackage, security is a common concern. There was (yet another) thread about it on the forums so I moved it to the mailing list along with a new proposal. Hopefully we can get some traction on this issue so people stop bugging us about it (even though autopackage is just as secure as rpms or tarballs IMHO).

I just found a neat feature of StarDict – it watches what you’re typing or selecting and then shows you the translations for those words. Now if only I could read Chinese. Anyway, welcome to my (new) blog! 热情的!

Currently listening to Stars, by Dubstar