Archive for January, 2006

royksopp

January 31, 2006

One of my favourite tracks of all time has to be Royksopp’s “Only this moment”. I first heard this song some time last year from MusicOne and have loved it ever since. The video is quite enchanting too; it seems to be some young people fighting for the revolution whilst falling in love :) NB the Flash version I just linked weirdly plays the video in mirror-image (and quality is typical streaming affair). Plus, they actually rearranged the song from the album version which is what I’m used to! It’s still good though.

Something that annoys me about the web is the huge number of song lyrics sites that just copy and paste from each other. Sometimes they’re clearly wrong so, in case there are any other people out there who love the song and want to know what they are saying, here is my interpretation of the lyrics :)

Only this moment
Holds us together
Close to perfection
Nothing else out there
No one to guide us
Lost in our senses
Deep down inside I know our love will die

 ... okay ...

Only this moment
Holds us together
Lost in confusion
Feelings are out there
Scared of devotion
Doubting intentions
Deep down inside I know our love will die

(guitar solo, this is where the biplane flies around in the video)

Stay or forever go,
Play or you'll never know,
When heaven decided,
You can't deny it,
The soul you'll be waiting for,

Stay or forever go,
Play or you'll never know,
Your spirit's divided,
You will decide if I'm all you've been waiting for,

Clouds in my head have been parted with grace
By the voice of an angel revealing her face
and her words they make sense and I don't understand
Falling in love isn't part of a plan

Forces within me mix reason with lust, but
I try to accept it and not make it worse,
'Cause I know I might lose it by taking a chance,
But love without pain isn't really romance

Only this moment
Holds us together
Close to perfection
Nothing else out there
Always beside her
Trusting my senses
Deep down inside I know love will survive

(repeat)

(vocal melody)

insomnia

January 29, 2006

I haven’t been sleeping well lately. Last night was the worst so far – I lay in bed all night, exhausted but unable to sleep.

I now have some “Nytol” tablets, but they don’t seem to do anything, possibly if this continues I’ll need to get some stronger stuff from the doctor. I am hoping it’s just a temporary glitch, as there’s nothing I can see that would obviously cause it.

Anybody got tips for getting to sleep at night? I tried calm music, counting sheep, reading etc …. all the usual suspects. No dice.

bits and pieces

January 22, 2006

piccolo

January 22, 2006

Graphics frameworks seem to be slowly heading in two directions:

  • A roughly PostScript style immediate mode API. By “immediate mode” I mean you issue commands like “move here, then draw a line to here, then create a path that goes here here and here, finally fill that path in”. Take a look at the Cairo API to see what I mean.

    Apple (arguably NeXT) got there first with Display PostScript, but DPS was a total sewer and nobody uses it anymore. The main problem with Display PostScript was that PostScript was far more powerful than was really needed … in particular PS is a Turing-complete programming language. That meant you could do the equivalent of uploading arbitrary programs to the display server. Imagine if any program could load any shared libraries it wanted into the X server: at best, the feature would be used badly, at worst, it’d be abused for nefarious ends. So Apple now use Display PDF, which is a bit like Display PostScript except different. You draw graphics by rendering using PDF primitives. In the Tiger (10.4) release, Display PDF rendering isn’t hardware accelerated.

    The Linux community got there next with the integration of Cairo into GTK+ 2.8, Mozilla, OpenOffice and other projects. Cairo isn’t much different to Display PDF in concept, but the API is cleaner and easier to use as it didn’t try and ram a square peg (Acrobat files) into a round hole (a simple drawing API). Cairo is, theoretically, hardware accelerated as it simply layers over XRENDER – unfortunately most drivers ignore the opportunity. The nVidia drivers support it but it’s off by default for stability reasons. The nVidia guys claim they can’t enable it until there’s a test suite, the X developers have sent mixed messages back about this over the years. On the other hand, making Cairo use Glitz and therefore OpenGL isn’t too hard, and then it really is hardware accelerated, using the 3D acceleration pipeline. On most cards this is and will always be better than using the 2D hardware acceleration, which is what most RENDER accelerated drivers use (as far as I know).

    Microsoft will get there last, for a change. Well OK, technically they got there before the Linux community did with GDI+ but GDI+ was really quite a crappy implementation (for instance, not hardware accelerated at all, very slow, lacking in features etc) and this sort of add-on DLL had been floating around for ages. I’m interested mostly in fundamental changes in graphics architecture here, not throwaway DLLs, so we can ignore GDI+. I’m not sure what the immediate mode drawing API in Vista will be, but it’s probably going to be a thin layer on top of Direct3D.

    So basically, every major OS is heading towards having an immediate mode drawing API for 2D graphics that is accelerated by the 3D engine on the GPU. Cairo is looking good, and follows in the footsteps of OpenGL: provide a cross-platform, easy to use, hardware accelerated API and the world will beat a path to your door.

  • Some kind of graph-based retained mode API layered on top. By “retained mode” I mean you say “I want an object here, another object there, and I want the last object to animate” and after you set it up, the actual rendering commands are generated by the framework. Usually this API also handles animation, input event handling and so on. Often this is called a “canvas” and many apps implement them internally for themselves. Attempts at creating a re-usable canvas API litter the programming landscape, most crater dramatically.

    Nonetheless, people keep trying, mostly because a generic canvas API is a very powerful tool in much the same way that a widget toolkit is powerful. Write it once, everybody saves time (in theory) and apps can become more consistent as a result. So far nobody seems to have got the balance between features and ease of use right yet.
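The difference between the two styles is easier to see in code. What follows is only a toy sketch: `Recorder`, `Polygon` and `Canvas` are hypothetical names made up for illustration, not the Cairo or Piccolo APIs, and the "surface" just records the commands it is given instead of drawing anything.

```python
# Hypothetical names throughout; this is not the Cairo or Piccolo API.

class Recorder:
    """Stand-in for a drawing surface: it only records the commands it receives."""

    def __init__(self):
        self.commands = []

    def move_to(self, x, y):
        self.commands.append(("move_to", x, y))

    def line_to(self, x, y):
        self.commands.append(("line_to", x, y))

    def fill(self):
        self.commands.append(("fill",))


def draw_triangle_immediate(surface):
    """Immediate mode: the application issues every drawing command itself."""
    surface.move_to(0, 0)
    surface.line_to(10, 0)
    surface.line_to(5, 8)
    surface.fill()


class Polygon:
    """Retained mode: the application describes an object and keeps it around."""

    def __init__(self, points):
        self.points = list(points)

    def translate(self, dx, dy):
        self.points = [(x + dx, y + dy) for x, y in self.points]

    def render(self, surface):
        # The framework, not the application, turns the object into commands.
        first, *rest = self.points
        surface.move_to(*first)
        for x, y in rest:
            surface.line_to(x, y)
        surface.fill()


class Canvas:
    """A minimal scene graph: a flat list of nodes re-rendered on demand."""

    def __init__(self):
        self.nodes = []

    def add(self, node):
        self.nodes.append(node)
        return node

    def render(self, surface):
        for node in self.nodes:
            node.render(surface)


# Immediate mode: call the drawing commands directly.
immediate = Recorder()
draw_triangle_immediate(immediate)

# Retained mode: set the object up once, let the canvas generate the commands.
canvas = Canvas()
triangle = canvas.add(Polygon([(0, 0), (10, 0), (5, 8)]))
retained = Recorder()
canvas.render(retained)

# Moving the object needs no new drawing code, just another render pass.
triangle.translate(2, 0)
moved = Recorder()
canvas.render(moved)
```

Both routes end up issuing the same low-level commands; the retained version simply moves the job of generating them out of the application, which is also where animation and input handling would hang off in a real canvas.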

One of the attempts at creating a useful canvas I’ve been looking at lately is Piccolo. This has been created by a team of developers that have been writing canvases since the early 90s, and they’ve been around the block a few times. The API and implementation, at first glance, seem almost too good to be true. They’ve clearly learnt their lesson: Piccolo is a direct evolution of Jazz, which apparently died the typical death-through-overengineering such frameworks often do. Whilst most of the capabilities are the same, the rewrite focussed almost entirely on API ease of use and implementation size/speed. Piccolo is an incredibly small 180 KB, complete with utility libraries!

Piccolo comes in several flavours: Java, .NET, Compact .NET, and .NET with Direct3D. So far the .NET version interests me the most, it’s the one the Piccolo team use internally for most of their new work, and of course it means we can use Mono which is a generally more pleasant environment than Java is.

So where’s the catch? The catch, of course, is that the .NET version relies on Windows. Specifically, it delegates a lot of the meaty hard work to the System.Drawing API (which is a wrapper around GDI+). This is mostly implemented in Mono, save for a few parts, which aren’t. Usual story. Still, the Piccolo code seems very straightforward and porting it directly to use Cairo instead of System.Drawing doesn’t look too hard. By setting Cairo up to use Glitz, you’d achieve a similar setup to the Direct3D implementation whilst simultaneously opening up the possibility of mixing 3D and 2D rendering.

This sort of super easy to use canvas opens up new possibilities in user interface design. And of course it might be useful for autopackage too – right now some of the graphics in autopackage like the uninstall animation are quite hacky, slow, flaky etc. And of course not as pretty as they could be. Providing a new frontend that is based on Piccolo could let us get a lot closer to my original vision for the graphical frontend, which wasn’t much like what we have today (dreams never are, are they?) ;)

reflecting on open source project management

January 18, 2006

Progress on autopackage 1.2 is agonizingly slow, yet nonetheless I’m happy. How can this be?

One reason is that things are still happening despite outsiders seeing a glacial speed of development. In fact there is plenty of discussion on the mailing list, new people are appearing and writing patches, and we get a lot of testing feedback.

Another reason is since I’ve been spirited away by commercial and academic work pretty much constantly since 1.0 came out, it’s been a great chance to see the little team we’ve got work whilst I’m “away”.

I guess it’s not fashionable to say stuff like this, at least I don’t see open source maintainers say it very often, but I’m hugely proud of the guys who are currently hacking on autopackage. The project has gone from being just me to having a wonderfully talented and dedicated team. If I was running a company and any of them applied for a job, they’d get it just like that. Because I know they’re that good.

good people can’t be bought

I’ve been reading two things tonight – the story of Ion Storm and Daikatana, which talks about company culture (or the lack of it), and Joel Spolsky. I should have been working but oh well, I finished an assignment earlier today so maybe I get a night off. One article that sticks in my mind is The Guerrilla Guide to Interviewing, which interests me as I’ve only ever done a few job interviews at software companies and have never run one myself (I was kinda fortunate in getting most of my work over the past few years via headhunting …). As I’m thinking of one day running my own company, it’s especially interesting.

The article is worth reading in full but it can be summed up simply:

Hire people who are smart
Hire people who get things done

I’m incredibly lucky that people like Hongli Lai, Curtis L. Knight, Isak Savo, Taj Morton and Rykel have chosen to sit down and work with me. I’m lucky that people like Rob Staudinger, Aaron Spike and Tim Ringenbach took the risk of trying a new thing and helping us through the teething problems. I’m still pretty young and inexperienced when it comes to the software industry, but I’ve already worked with a variety of people and have read stories of many others, so I can appreciate what we’ve got here. The value of these guys is simple – they’re smart and they get things done.

culture

I’m happy for another reason. In the past, when spending time on autopackage I’ve traditionally allocated a portion of my time to things that weren’t directly related to building the software itself. Things like dealing with email, working on the website, writing articles, chatting on IRC and even sometimes doing interviews. Long hours have been spent debating some philosophical point with somebody on IRC or the mailing list. Sometimes this felt like wasted time, but it had two important benefits:

  • It helped spread the word. Sometimes these people left unconvinced (or sometimes they convinced me that I was wrong!). But sometimes I changed their mind and they went ahead and continued the debate in other forums, arguing on our behalf. This is especially noticeable in places like the Ubuntu forums, in which threads that talk about autopackage routinely run for months on end.

  • It helped cement the culture of the project, and pass on its ethos and values to my team members.

There are some aspects of the project which have been called out by spectators as rare or unusual. For instance, the strong focus on non-technical users (evident in the graphical how-to and Flash demo) or the total commitment to backwards compatibility. I think most successful companies have a set of core values and a strong culture. Many have strong visions of what they are for – the dreaded “mission statement” is usually written by companies who don’t have such a vision but feel they should.

A lot of the benefit of ‘wasting’ time debating the same points, sometimes over and over, in public forums is that people can watch and when they watch they absorb the values and visions we have for the project. The autopackage project doesn’t need an explicit mission statement because it’s obvious and can be figured out just by watching what we do.

… and how it took hold

Something that makes me very happy is when I see people like Isak and Taj debating whether a particular change will break backwards compatibility or not. There’s no ifs or buts here – no arguing over whether package developers who relied on a bug or underspecified feature “deserve to lose” – compatibility with 1.0 packages is essential for 1.2 and everybody knows that. People who disagree don’t flame or debate this with us, because they already decided their core values are incompatible with ours and went elsewhere. The backwards compatibility issue makes me especially happy because it’s a very rare thing in the open source world – some projects claim they guarantee backwards compatibility, but when it comes to the crunch they’d rather fix an obscure bug and break a popular app or (worse) language binding. I didn’t pull this value out of thin air, it was largely inspired by reading Raymond Chen and Spolsky’s How Microsoft lost the API war.

Something else that makes me happy is seeing Isak redesigning the user interface and web site to be more user friendly. In some projects or work environments this just wouldn’t happen. The reasons are varied – perhaps nobody wants to offend the boss, perhaps there is a culture of things being “done” and then moving onto the next task meaning that nobody revisits what was already completed, or perhaps the budget just doesn’t allow it. The originals were largely done by myself and Hongli, and whilst they were not bad when we first released them hindsight is always 20:20 – the new versions are a big improvement and refine our focus on the non-technical end user still further.

The culture isn’t something I built myself. Lots of people had an effect on it. For instance, one thing autopackage has lots of is (IMHO) attractive and easy to read documentation, both for developers and end users. Usually this is Hongli’s work – he’ll take a boring piece of text like an API spec and dress it up with some nice CSS and images. The difference is surprising. He also really helped push the end user focus, with things like the how-to document. Now that he is suffering the same fate as me, Isak is picking up the torch. Meanwhile Curtis continues the tradition of posting large patches for review – we all have commit access, but nonetheless the opportunity for patch review is always there and Curtis has been awesome at doing this properly even for large or complex features (often implemented entirely by him).

what next?

So now the trick is this. I won’t be around forever – this should be obvious. At some point, I will move on to pastures new and leave autopackage behind. I’m a firm believer in the benign dictator model for open source projects. Even if that dictator is pretty hands off (as I am right now) I think there needs to be somebody, worst case anybody, who can resolve disputes and potentially change the direction of the project. If there isn’t such a person then the project becomes effectively rudderless and usually ends up drifting and unable to adapt to big changes. The problem is, when I have several people on the project, all of whom are good – how can I choose a single person? Argh! Management is hard. Why the hell am I thinking of one day starting a company?!

arguing for fixed policy trust

January 16, 2006

There are three ways to approach the problem of determining the correct constraints for software, as discussed in a previous journal entry.

  1. Ask the user to grant permissions “just in time”
  2. Provide pre-written trust profiles that bundle together allowed actions, then let the user select each one
  3. Have some external authority determine trust. Won’t deal with this one today.

A real world implementation of the first is my Sony Ericsson W800 mobile phone. Like most J2ME phones, when a mobile app wants to do something potentially Evil, the application manager asks the user a simple question like, “Allow FooApp to take photos with the camera?” or “Allow FooApp to access the internet?”. Access decisions can be remembered. There are only 7 permissions:

  1. Internet access
  2. Messaging
  3. Automatic start
  4. Local connectivity
  5. Multimedia
  6. Read user data
  7. Write user data

These are all self explanatory. Even though apps can be quite sophisticated and do a lot of stuff, the permissions scheme is still intuitive, manageable and it doesn’t normally get in your way. Note that “Read/write operating system internals” is not an option here, which is sensible because you can’t upgrade the OS by installing a J2ME Jar file. Instead you must use a Sony Ericsson proprietary upgrade app which works via the USB cable. The different mechanism (wireless/over the air install vs USB) acts as a simple trust cue: no app should ask you to install it in a totally different way to every other app. If it does that’s a good sign somebody is up to mischief.
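A rough sketch of the “just in time” scheme, using the seven permissions listed above. `ApplicationManager`, `check` and `scripted_user` are hypothetical names invented for this example; this is not the J2ME API, and the scripted user stands in for a real prompt on the phone screen.

```python
from enum import Enum


class Permission(Enum):
    INTERNET_ACCESS = "Internet access"
    MESSAGING = "Messaging"
    AUTOMATIC_START = "Automatic start"
    LOCAL_CONNECTIVITY = "Local connectivity"
    MULTIMEDIA = "Multimedia"
    READ_USER_DATA = "Read user data"
    WRITE_USER_DATA = "Write user data"


class ApplicationManager:
    """Asks the user at the moment an app needs a permission (hypothetical class)."""

    def __init__(self, ask_user):
        # ask_user(app, permission) must return a pair (allowed, remember).
        self._ask_user = ask_user
        self._remembered = {}

    def check(self, app, permission):
        key = (app, permission)
        if key in self._remembered:
            # A remembered decision means no prompt this time.
            return self._remembered[key]
        allowed, remember = self._ask_user(app, permission)
        if remember:
            self._remembered[key] = allowed
        return allowed


prompts = []


def scripted_user(app, permission):
    """Pretend user: allows internet access only, and ticks 'remember' every time."""
    prompts.append(f"Allow {app} to use: {permission.value}?")
    return (permission is Permission.INTERNET_ACCESS, True)


manager = ApplicationManager(scripted_user)
first = manager.check("FooApp", Permission.INTERNET_ACCESS)   # prompts the user
second = manager.check("FooApp", Permission.INTERNET_ACCESS)  # remembered, no prompt
denied = manager.check("FooApp", Permission.MESSAGING)        # prompts, user says no
```

With only seven permissions the remembered-decision table stays tiny, which is a big part of why the scheme doesn’t get in the way.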

Internet Explorer is an example of a program that uses the second approach. You can change the security level of each “zone”. In high security mode many things are disabled, like ActiveX controls and even cookies. In low security mode things are very permissive so you get max compatibility. Often a workaround for an exploit is to change security levels.

applicability to desktop operating systems

Sometimes people argue we should use the first system – selective granting of permissions – on desktop computers too. I’ve even seen somebody say with a straight face that it should be done on an individual syscall level, a suggestion that was rightly laughed at. But does this make sense? Is it possible to apply only 7 permissions to the full universe of software and get away with it? If not, how many permissions are too many? SELinux already has many, many different permutations of permission sets a process can have, most of them esoteric and low level (eg, connect to server X, ptrace, send data over message bus, screenshot an X window, access USB devices …)

So what about the second approach, that of fixed profiles which bundle up many different permissions? The user would assign a program a profile at install time. There are a whole bunch of different design issues here. For one, should you use an IE style “high security/low security” sliding scale? I don’t think so. How can the user objectively decide this? “High” is so relative it’s meaningless, there’s no way to figure out if an application needs high or low privs to work correctly.

A better style would be something like, “Internet Application”, “Game”, “Multimedia Application”, “Maintenance Tool” etc. But obviously much software can’t be pigeonholed like that. iTunes is both a Multimedia Application and an Internet Application. The next logical step is to let the user select more than one profile at once, the set of privileges granted to a process is then the union of all the selected profiles. It’s up to the user to decide what categories a program falls into, and up to the program to test for the exact privs it needs and if it doesn’t have them, ask the user to grant them along with an explanation of why it’s necessary. Arguably what we’ve done now is go back to the explicit privilege granting model except not done just-in-time and with renamed privs. But that’s not necessarily a bad thing.

Of course, users can still be socially engineered into granting excess privileges to an application which is why I argue the most important privs like operating system modification shouldn’t EVER be grantable by a user. But, through careful selection and authoring of profiles, I think we can help the user make the correct decisions much more often than they do now.
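To make the profile idea concrete, here is a minimal sketch. The profile names come from the text above, but the privilege names inside each profile (and the `NEVER_GRANTABLE` set) are invented purely for illustration; a real system would need far more careful authoring.

```python
# Profile names are from the post; the privileges inside them are made-up examples.
PROFILES = {
    "Internet Application": frozenset({"network"}),
    "Game": frozenset({"fullscreen", "audio"}),
    "Multimedia Application": frozenset({"audio", "read_media_files"}),
    "Maintenance Tool": frozenset({"read_user_data", "write_user_data"}),
}

# Privileges the user can never grant, whatever profiles get selected (assumption).
NEVER_GRANTABLE = frozenset({"modify_operating_system"})


def granted_privileges(selected_profiles):
    """The process gets the union of every selected profile's privileges."""
    granted = frozenset()
    for name in selected_profiles:
        granted |= PROFILES[name]
    return granted - NEVER_GRANTABLE


def missing_privileges(needed, selected_profiles):
    """What the program would still have to ask for, with an explanation."""
    return frozenset(needed) - granted_privileges(selected_profiles)


# Something like iTunes falls into two categories at once.
itunes_profiles = ["Multimedia Application", "Internet Application"]
itunes_privs = granted_privileges(itunes_profiles)
still_needed = missing_privileges({"network", "write_user_data"}, itunes_profiles)
```

Subtracting `NEVER_GRANTABLE` at the end means that even a badly authored profile can’t hand out operating system modification.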

security and usability

January 12, 2006

Nobody likes security that gets in their way. A friend of mine found a 3com wireless router that didn’t have WEP enabled, which is common enough, but also had the default admin password. So he was able to get in and (if he had wanted to) reconfigure his neighbour’s router.

You can change the admin password when setting it up, but clearly people sometimes don’t, because it’s getting in the way of their goal which is to get online. And nobody wants another password to remember.

If I was designing a wireless router, I’d probably have a button on it that needed to be pressed to let a browser into the admin interface. Then I’d let you optionally set up email-address/password combos afterwards for the (few?) who wanted to reconfigure it remotely.
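A tiny sketch of how the button idea might work. Everything here is an assumption for illustration: the `RouterAdmin` class, the injectable clock, and especially the 60 second window, which is just one way to stop a single button press unlocking the admin interface forever.

```python
class RouterAdmin:
    """Hypothetical admin gate: open only for a short while after a button press."""

    def __init__(self, clock, window_seconds=60):
        self._clock = clock            # callable returning the current time in seconds
        self._window = window_seconds  # assumed timeout, not part of the original idea
        self._pressed_at = None

    def press_button(self):
        # Requires physical access to the router, which a neighbour doesn't have.
        self._pressed_at = self._clock()

    def admin_allowed(self):
        if self._pressed_at is None:
            return False
        return self._clock() - self._pressed_at <= self._window


now = [0]                              # fake clock so the example is deterministic
router = RouterAdmin(lambda: now[0])

before_press = router.admin_allowed()  # nobody has touched the router yet
router.press_button()
now[0] = 30
within_window = router.admin_allowed()
now[0] = 100
after_window = router.admin_allowed()
```

There is no password to forget and no default one to leave in place.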

qed

January 10, 2006

A few weeks ago I did a post about apt-get and security, and how autopackage affects all that.

One point I made is that distros should not be able to control what software users install, because they are inevitably biased in their selections. No distro today is a shining tower of justice and equal opportunity – every one of them discriminates. Sometimes that’s because of ideology (Debian) or lack of manpower, and for distros funded by corporations it can be due to a ton of factors, most of which are invisible.

I used Mono and Fedora as an example of how this discrimination can be opaque and unaccountable, and got slammed in the comments by some people who felt Red Hat’s explanation of “legal reasons” was good enough, and how dare I suggest it was because Red Hat were aggressively funding GCJ/Classpath/Eclipse. It was said:

This is a poor example because this is in fact a legal issue. Whether you understand the legal issues involved in a detailed way to make such a claim is a different thing. Very few people do but to suggest that this is somehow non legal is a misleading claim

… and …

If you ever learned anything about legal issues it is that you dont explain everything in minute detail because everybody who wants to try and poke holes will do that anyway and nitpick.

Yet today, Chris Blizzard posts to his blog the following:

We’ve been the longest holdout in shipping mono. This was for a variety of reasons; Some were business-related and others were strategic in nature but those don’t really matter right now.

I don’t see the word legal in that. To me, this nicely rams home the point that controlled repositories are divisive and that even generally awesome corporations like Red Hat will end up abusing them.

Epilogue – I should note that I love Red Hat, and I think they’re one of the most impressive companies around. They managed to build a successful company employing hundreds of people based on a fundamentally idealistic premise: that software should be free. They’ve always been true friends of the community, and I’ll always be grateful for what they’ve done and continue to do. I’m picking on Red Hat here because Mono is such a neat demonstration of my point, but there are plenty of other examples. Nobody should interpret this post as a “Red Hat is evil” rant, because they aren’t. And the fact that eventually they came around to the community’s decision and admitted they were wrong is a wonderful demonstration of that.

memory and managed code

January 7, 2006

Many new programmers are confused by a simple fact of industry – that Java is clearly easier and less bug-prone than C++, yet most programs they interact with daily are written in C++.

Microsoft popularised the term “managed code” to refer to software that is … well … managed, ie at runtime there is another program present supervising and helping the main program to run. Usually this helper program is called a virtual machine. So, I’ll talk about managed code and use that term to mean Java, any .NET targeted language, and also Python/Ruby.

It doesn’t take long until they realise why managed code has not taken over completely: simply running managed apps on your own desktop gives you a feel for it. Mark Russinovich articulated it well – they’re just SLOW and that means people hate using them, because running one or two drags their computer into the mud.

the many faces of speed

Programmers like speed and dislike crack. The jury is out on weed. Actually everyone likes speed but programmers like it most of all because without it their programs are uncompetitive. One reason Eclipse blew away NetBeans so fast is that it felt more responsive. One reason MS Word still blows away OpenOffice Writer is because the former doesn’t take 30 seconds to start.

Over the years lots has been written about the performance characteristics of Java. A few things we know – on a modern VM, managed code can execute extremely quickly. There are many benchmarks out there showing Java outperforming C++. The Fedora team are renaming their “Native Eclipse” to “Fedora Eclipse” after so many people asked why the so-called ‘native’ version was actually slower than the version running on Sun’s JVM. Tom Tromey summed it up well:


There is a common misconception that “AOT compilation == Super Duper Fast”. That isn’t true. Performance is tricky and the current crop of proprietary JITs are excellent compilers.

CPU-bound performance is, in other words a solved problem: even with the overhead of compiling code at the same time as running it, modern VMs have had such huge quantities of smart people thrown at them that they can run code very fast. There are a few CPU-bottleneck problems in the design of the Java language, I think C# fixed most of them. For instance in Java all methods are virtual, whereas in C# (like C++) you must manually specify it. Virtual methods are expensive, and not always required, so this makes sense.

But the sheer quantity of instructions that can be executed per second is only one factor influencing performance. There are several, but the one I’ll focus on here is memory.

why does managed code use so much memory?

I think the reasons excessive memory usage is bad are obvious. In every situation where managed code is used – desktop, server and embedded (think mobile phones) using too much memory is bad. Swapping on a desktop hurts end user experience, on a server it can cause the whole system to die, and on phones it increases the cost of the device for the consumer.

Whereas the raw runtime performance of Java and .NET have improved hugely over the years, memory usage generally hasn’t. It’s improved a bit. Here is a table comparing different early JRE versions:

Table 1. Measured memory usage (bytes)

Object type                      Content     JRE 1.1.8 (Sun)  JRE 1.1.8 (IBM)  JRE 1.2.2 (Classic)  JRE 1.2.2 (HotSpot 2.0 beta)
java.lang.Object                 0           26               31               28                   18
java.lang.Integer                4           26               31               28                   26
int[0]                           4 (length)  26               31               28                   26
java.lang.String (4 characters)  8 + 4       58               63               60                   58

This is showing the overhead inherent in every single object on the heap. In other words, this is what you pay just for having an object that doesn’t do anything.

For the most recent version of the Java VM, the overhead is 12 bytes – one word (4 bytes on a 32 bit machine) for identity hashcode and GC control data, one word to point to the class object (ie, the vtable pointer) and one word for the lock that every Java object is defined to have (Java calls them monitors).

In contrast, C/C++ structs have no overhead, and a C++ class usually only has the overhead of the vtable pointer. Oh, and sometimes the malloc bookkeeping data too.

As far as I can see, it’s very hard to fix this. Escape analysis has been proposed as a way to eliminate the monitor in some cases, but the real problem is Java’s insistence on making everything a heap object which results in far more overhead than in an equivalent C++ app.

strings

Things get worse when we get to strings. C++ uses the same string protocol as C, where a string is a series of non-zero characters of arbitrary length, terminated by a null. Actually the runtime doesn’t force you to use this, and you can use Pascal-style strings as well (where the length is stored as the first byte/word), but you’ll have to roll your own if you want to do that.

For the ASCII string “Hello World”, C++ will use 12 bytes: 11 characters plus the terminating null.

In Java, every string is also an object, so we immediately pay the 12 bytes of overhead there. Then, a string is actually a character array, and in Java arrays are objects themselves, so we pay another 12 bytes of object overhead. Arrays store their length as a machine word as well, so that’s another 4 bytes (28). We already blew double the C++ storage and we haven’t actually done anything yet! But it gets worse. Java is Unicode-enabled, which is a buzzword compliant way of saying every character takes two bytes instead of one. This is silly because often strings represent system internal things like config file keys or XML tag names, and not end user things like button labels. System internal strings are 99% of the time ASCII, so that second byte will go unused. Still, you have to pay the price for all strings anyway. C++ can do Unicode too, but you can usually choose your encoding – UTF-8 for space efficiency, UTF-16 for CPU efficiency. So anyway, now we add 22 bytes to store the string and we’re up to 50. Finally the String class has various other fields as well like offset, count and hash, so we have to add another 12 bytes to give 62 bytes for “Hello World”, compared to 12 for C++.

62 vs 12

Ouch! That doesn’t even include the 4 byte reference.
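The tally above can be written out as plain arithmetic. This is only a sketch that follows the figures quoted in this post (32 bit words, a 12 byte object header, three one-word String fields); it doesn’t measure a real JVM, and the function names are made up for illustration.

```python
WORD = 4                   # bytes per machine word on a 32 bit machine
OBJECT_HEADER = 3 * WORD   # hashcode/GC word + class pointer + monitor = 12 bytes


def c_string_bytes(text):
    """A C-style string: one byte per ASCII character plus the terminating null."""
    return len(text.encode("ascii")) + 1


def java_string_bytes(text):
    """Follows the post's tally for a Java String; not a measurement of any real VM."""
    string_object = OBJECT_HEADER   # the String itself is an object
    array_object = OBJECT_HEADER    # its backing char array is another object
    array_length = WORD             # arrays store their length
    characters = 2 * len(text)      # two bytes per character
    string_fields = 3 * WORD        # offset, count and hash
    return string_object + array_object + array_length + characters + string_fields


hello_c = c_string_bytes("Hello World")
hello_java = java_string_bytes("Hello World")
fixed_overhead = java_string_bytes("")   # what an empty string costs under this tally
```

The fixed part (everything except the characters themselves) comes to 40 bytes under these assumptions, so short strings are hit the hardest.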

heap vs stack

Every Java type, except for primitives like integers, characters and booleans, is allocated on the heap. In C++ any complex type can be allocated either within another complex type, or on the stack, or on the heap. Smart decisions about where data should be stored can be a huge win.

In a previous journal entry I talked about stack allocation being a use for escape analysis, but C++ developers have been benefiting from this for years. Using the stack is good not only because it’s very fast to allocate and deallocate, not only because the stack is likely hot in the cache, but also because it reduces the amount of work the garbage collector has to do. Studies of Java programs have shown that most objects die young, that is, they don’t last very long relative to the others. This is the reason behind generational garbage collectors. In C++ the same problem is solved by allocating temporary die young objects on the stack, which reduces pressure on the heap manager significantly.

conclusion

Java’s memory usage problems have many causes, most of them hard to fix. Even if Sun dropped the religious devotion to JIT compilation and spent the next year optimizing the VM, it wouldn’t help much because the Java platform itself lends itself to very memory-inefficient usage patterns.

Many people have found Ruby on Rails to be a very productive environment for web application development, and many people have wished that the benefits of Java/C# could be gained without the need for a hulking runtime environment and ridiculous memory usage. I’m sure that a dedicated team of smart people could produce a desktop equivalent to RoR – one that provides many of the benefits of Java-like languages in a form that can be comfortably deployed on desktop-class systems.

captcha

January 1, 2006

We’ve got a big problem with comment spam on the autopackage forums. The typical solution to this is a captcha. Existing captchas are almost all based around a warped or distorted series of letters in an image. This is often trivially defeatable using OCR and AI techniques, or even better, by using the lure of free porn to make humans solve them (by inlining them in the web page).

It seems to me that there must be better ways to design these things. Shape recognition is something that has been around in a commercial form for ages, but people can do far harder tasks than this which are still research problems for computers. For instance:

  • Answering arbitrary questions such as “New York is a ?????” (city) or “We are held to the ground by ?????” (gravity)
  • Selecting correct statements from a series of potentially incorrect ones, eg:
    • Dogs and cats are both animals
    • Books are usually made of metal
    • The rain makes things dry.
  • Complex image recognition – for instance, picking out a tank that is hiding in the trees
  • Language recognition. Pick out the animal-related sentence which makes sense from this nonsense text (here it is “Snakes have cold blood.”):


    Following an encounter this morning in the bishopric with teachers and students of the springs adding to its velocity. At each bounce, Mr. Coyote came into contact with the enormous mammaries. The prize was instituted in 1972 by a culture of suspicion, of desperation and of the Rocket Sled to aid him in pursuit of his prey. Upon receipt of the Rocket Sled. Again, Defendant sold over the palate, Sierra Nevada Porter also struck a good thing: the opinion I heard was that you could talk quietly and your voice would bounce off the ceiling a couple of bucks and watch any of several dozen shoot-em-ups, again without parental consent. Snakes have cold blood. So why doesn’t Warner Bros. take all Dirty Harry movies and cut four cables in that area, then the asphalt faded, replaced by dirt and gravel and I fudged history to make it into Mobicom and got rewritten as Internet-Drafts. There was a Historical Society in the old days. It’s funny, humor is. God damned is it funny.
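The statement-selection idea from the list above could be sketched roughly like this in Python. The statement pool, the function names and the all-or-nothing grading rule are all assumptions made up for the example; it only shows the shape of the thing, not a tested design.

```python
import random

# Hypothetical pool of (statement, is_true) pairs. A real pool would need to
# be far larger, otherwise a spammer could simply memorise the answers.
STATEMENTS = [
    ("Dogs and cats are both animals", True),
    ("Books are usually made of metal", False),
    ("The rain makes things dry", False),
    ("New York is a city", True),
    ("We are held to the ground by gravity", True),
    ("Fish live in trees", False),
]


def make_challenge(rng, size=3):
    """Pick `size` statements, with at least one true and one false."""
    true_ones = [s for s in STATEMENTS if s[1]]
    false_ones = [s for s in STATEMENTS if not s[1]]
    picked = [rng.choice(true_ones), rng.choice(false_ones)]
    rest = [s for s in STATEMENTS if s not in picked]
    picked += rng.sample(rest, size - 2)
    rng.shuffle(picked)
    return picked


def grade(challenge, ticked):
    """Pass only if the ticked indices are exactly the true statements."""
    expected = {i for i, (_, is_true) in enumerate(challenge) if is_true}
    return set(ticked) == expected
```

The server would call make_challenge, show only the statement text to the visitor, and pass the indices of the ticked boxes to grade.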

There are other problems that aren’t necessarily hard for a computer – like speech recognition – but which could be made very hard by using other technologies like Windows Media DRM. There is currently no known way to decrypt a DRM’d file to raw samples of the type you can feed to a recognition engine, so a captcha based on playing a short protected file of somebody talking would be effective, as you can’t write a program to do it automatically.

free porn

Solving the free porn problem is hard too. Ultimately it relies on the ability to cut the captcha out of the original page and inline it into another one, then intercept the human’s response. Preventing this is very difficult when you take into account things like framesets or popup windows (the human can be asked to look at the whole web page), though there are techniques which could defeat this too. For instance, if the captcha response can’t be intercepted (eg if it’s encrypted), this would make it a lot harder to do. Flash provides such abilities, as well as making it a lot harder to process the input files mechanically.

If I was to design a new captcha tomorrow, I’d probably build a Flash program that downloads a photo at random from a library of several thousand images that a human could easily identify, then offers 5 possible answers to select from. 4 of them would be obviously wrong. The image would be warped/distorted in various ways to prevent trivial screenshot/image matching attacks. The Flash program would encrypt the response and directly submit it back to the originating server so, short of hacking the Flash plugin or file itself, there’s no way to mechanically figure out what the user entered.
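The server side of that design might look something like the following Python sketch. Everything named here (LIBRARY, new_challenge, check_response, the in-memory _pending table) is hypothetical, and the image warping and the encrypted Flash submission are left out entirely; it only covers picking a photo, offering five answers and checking the reply once.

```python
import random
import secrets

# Hypothetical image library: image id -> the label a human would give it.
# The real thing would hold several thousand photos.
LIBRARY = {
    "photo_0001": "dog",
    "photo_0002": "bicycle",
    "photo_0003": "teapot",
    "photo_0004": "lighthouse",
    "photo_0005": "banana",
    "photo_0006": "umbrella",
}

# Answers to outstanding challenges, kept on the server only (assumption:
# a single process; a real deployment would also need expiry).
_pending = {}


def new_challenge(rng, n_options=5):
    """Pick a random photo and return (challenge id, image id, options)."""
    image_id = rng.choice(sorted(LIBRARY))
    answer = LIBRARY[image_id]
    wrong = sorted(set(LIBRARY.values()) - {answer})
    options = rng.sample(wrong, n_options - 1) + [answer]
    rng.shuffle(options)
    challenge_id = secrets.token_hex(16)
    _pending[challenge_id] = answer
    return challenge_id, image_id, options


def check_response(challenge_id, chosen):
    """Accept a correct answer once; the challenge is discarded either way."""
    answer = _pending.pop(challenge_id, None)
    return answer is not None and chosen == answer
```

Throwing the challenge away after the first attempt means a bot can’t just try all five options against the same image.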

If we’re allowed to use a separate plugin entirely then things get a lot more robust – for instance we could use the video mechanisms of the OS to colour-key a part of the screen and then feed it the puzzle as encoded video, preventing screenshot attacks entirely (of course really this should be a part of the display server security system, a la SE-X).

Long term we need a fully fledged, official and standardised infrastructure for providing “proof of life”. Until then a nasty arms race will have to do.