Markup, not HTML, for source code
15 May 2008, by Ben
Jeff Atwood’s idea to use HTML for his upcoming stackoverflow.com site is not a terrible one, except for one thing: source code. Even inside <pre> tags, < and & characters have to be escaped by hand. So instead of saying
if y < 32:
    x = (y << 3) & 0xFF
you have to write
if y &lt; 32:
    x = (y &lt;&lt; 3) &amp; 0xFF
Please, no!
Admittedly, it wouldn’t be hard for them to have their back-end convert these characters to HTML entities if they’re inside a <pre> tag. But then you can ask, is it really HTML anymore?
And would it correctly interpret “if n<pre: x >>= 3”?
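As a rough illustration of the back-end conversion mentioned above, here is a minimal Python sketch. The function name escape_pre_blocks and the regular expression are assumptions made for this example, not anything stackoverflow.com is known to use, and the sketch only recognises a literal lowercase <pre> tag with no attributes.

```python
import html
import re

# Matches one <pre>...</pre> block. Assumes blocks are not nested and that
# the tags are lowercase with no attributes (a simplification).
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.DOTALL)

def escape_pre_blocks(text):
    """Escape &, < and > inside each <pre> block, leaving the rest alone."""
    def fix(match):
        return "<pre>" + html.escape(match.group(1), quote=False) + "</pre>"
    return PRE_BLOCK.sub(fix, text)

post = "<p>Try this:</p><pre>if y < 32:\n    x = (y << 3) & 0xFF</pre>"
print(escape_pre_blocks(post))
```

Text such as “if n<pre: x >>= 3” outside a block is left untouched by this particular pattern, but only because the pattern insists on an exact “<pre>”. Any such heuristic is guessing at what the writer meant, which is the point of the question above.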
Even apart from all that, however, I much prefer the simplicity and relative beauty of a good markup language (read: Textile or Markdown). And it’s hardly a learning barrier if you’ve got a pop-up “help window” nearby, like reddit or this blog do (see “formatting help” below).
Decent: cool, minus the cucumber.
29 April 2008, by Ben 2 comments
Decent.
The word’s just got a ring to it.
Yeah, pretty decent.
It’s like, totally way better than Cool. Same meaning and all that, but without those frigid “as a cucumber” connotations. Plus, it’s got a much bigger moral and artistic backbone.
Trust me, all the decent people are doing it. All the people here, at my place. Well, I am, at any rate. Woke up one morning and just switched. Easy as s/cool/decent/g.
Chesterton said we should break the conventions but keep the commandments. So why not start with your lingo, blingo, rap rap, you know what I’m sayin’?
C’mon, join the revolution: bring decent to a town near you.
Decent, man.
P.S. Yep, I’m afraid this is tongue-in-cheek. Though I have “switched” — and years before DecentURL was conceived. Then again, maybe it’s just getting late. :-)
Tetris, the new FizzBuzz
23 April 2008, by Ben 5 comments
No, I don’t really think Tetris is going to become the new FizzBuzz anytime soon. But a few days ago a colleague and I were talking about Tetris, and he mentioned that it’s apparently the most ported game out there. Not hard to believe — it’s fun, simple, and doesn’t require any fancy graphics.
Well, I’ve never actually written a Tetris clone. (I think you’re allowed to say that as a programmer.) Mainly I was curious how long it’d take. Things always take longer than you expect, so I guessed maybe half a day or a couple of evenings.
Turns out it wasn’t too tricky — I was having fun with the finished product in a short while (about two and a half hours). Almost half of that was spent on the “graphics” and keyboard input.
I wanted to time myself excluding writing the always-fiddly I/O, so what I did was start by writing four simple functions:
void setup(void);
void teardown(void);
void putsquare(int x, int y, int colour);
int waitkey(void);
I wrote a fairly basic (but colourful) implementation of these for the Windows console, figuring that I really wanted to concentrate on writing the gameplay, not spend time on fancy graphics. (Half of my game projects as a teenager died because I started with the graphics, and keeping track of all those VGA registers just got too hard.)
So here’s the challenge: take my Tetris I/O header file and my Win32 implementation, and see how long it takes to write your own tetris.c.
Or choose your favourite language and write your own versions of the I/O routines, and then time yourself writing the gameplay.
Email me your own version, and I’ll post it here along with mine in a couple of days. And no cheating, Googlers. :-)
Just for fun, for the version I like best I’ll give $15 for the author to pledge to a project of his choice over at microPledge, or a DecentURL premium subscription if you prefer.
Update: I’ve now added my version below.
microPledge, customized.
11 April 2008, by Ben
For almost a year now, microPledge has been helping fund various open source projects. But we’re now also branching into the business world.
The basic idea is tailor-made versions of microPledge that businesses can use for their own products or features.
This extends easily to non-software products such as books and movies. For example, if a book publisher wants to get a feel for how popular a book will be before having to publish it, he can post a “project” on his book-tailored microPledge.
Potential readers can pledge money towards books they like. If a book’s pledges reach a level the publisher’s happy with, he goes ahead and does his publishing magic, and everyone’s happy. (But if the target’s not reached, he knows there wasn’t enough interest, and nobody’s lost anything.)
- The ideas stage: You propose a new product or feature.
- Get committed customers: Your customers pledge money to your product. This builds up in our trust account to pay you.
- How you plan: The amount pledged gives you a way to gauge demand for your product before it’s even designed.
- Getting it done: Pledgers get the finished product (or money-back security).
We already run a similar service targeted at open-source software — feel free to try that out at microPledge.com.
We’re offering businesses tailor-made microPledge sites, and we have various levels of customization in mind, priced according to your needs:
- Basic customization and hosting, where you get microPledge running on your own domain with your logo, integrated with your own website.
- A private site with a separate database and an authorization option so only users of your products can sign in.
- Totally customized look and feel (e.g., book publication site) with additional branding options.
Please have a look at the PDF version of our brochure. Alternatively, we can send a glossy, printed version of this brochure to you or your manager on request.
For more information, please contact us.
GET, POST, safety, idempotency
29 March 2008, by Ben 7 comments
About a year ago I wrote a short article asking when it’s okay to use GET to do POST’s job. Since then I’ve learnt a bit about web standards and web pragmatism, but also about the specifics: safety and idempotency.
Recently I found a blog devoted to well-designed URLs. I love well-designed URLs (enough to have made DecentURL.com, a web service that turns ugly URLs into decent ones). For instance, there’s no question about which of these URLs is better:
http://micropledge.com/projects/modwsgi
http://micropledge.com/app.cgi?type=3&id=2401
Unfortunately, I think Mike Schinkel got a bit carried away in his somewhat ranty post about how SnipURL’s GET-based API is bad.
On the web, it’s not simply a case of “GET is always evil for requests that aren’t safe”. I’m afraid he’s ignoring the evidence that GET sometimes just works better. Paul Buchheit sums it up nicely:
There’s no question that POST is the “right” way and generally safer, but sometimes it’s annoying. Even though GET isn’t “supposed” to work, it often can be made to. Don’t believe me? Google does billions of dollars a year in GET based transactions in the form of CPC ad clicks (which can cost over $50/click).
Okay, perhaps I’m a touch biased. Perhaps I’m being a little defensive because DecentURL’s API works just like SnipURL’s. :-)
But apart from the pragmatics (“it works”), I believe Mr SnipURL and I are sticking to the standard. It’s not just about GET vs POST. There’s this other little point: the distinction between safe and idempotent.
Safe means a request doesn’t cause any side effects. A safe request just grabs data from a database and displays it. Static pages, browsing source code, reading your email online — these are all “safe” requests.
Idempotent means that doing the request 10 times has the same effect as doing it once. An idempotent request might create something in a database the first time, but it won’t do it again. Or it’ll just return the reference to it the next time around. As a friend said to me:
From the browser’s perspective, there is no difference than if the response had always existed for all time prior to the first request. One can cache that response without any perceptible effect, for instance, and bots can request it again and again without damaging anything.
Idempotent is exactly what creating a DecentURL or a SnipURL is, and it’s why we’re allowed to use GET. You do it the first time, and the service creates a record in the database. But there’s no harm in GETting it again — the service simply grabs the existing database entry.
As the HTTP standard notes:
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”.
It’s a “should”, a rule of thumb for good reason. But what the standard does actually mandate is that GET must be idempotent (see section 9.1.2 of RFC 2616).
And it seems like the makers of all the popular URL redirection services realise this (TinyURL, SnipURL, Metamark, notlong.com). All of those services either use GET normally, allow GET, or allow GET in the API call that creates a URL.
But what about the spider trap that Mike proposes? (See the PHP code in his blog entry.) Won’t it fill our databases? Won’t it suck our non-safe services into recursive oblivion? Well, apparently it doesn’t happen, or TinyURL and co would have big problems on their hands.
Also, if you create a decent robots.txt, you’ll stop spider-trap problems, at least from good spiders. And if it’s a spider with malicious intent, it could just as easily break a POST-only service as an allow-GETs service. Just like with XSRF, using POST isn’t a catch-all for baddies.
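To make the robots.txt point concrete, here is a small Python sketch using the standard library’s urllib.robotparser to check what a well-behaved spider would do. The /api/ path, the example.com URLs, and the ExampleBot agent name are assumptions for illustration, not the layout of any real service.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps polite spiders away from the
# URL-creating API while leaving ordinary pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="ExampleBot"):
    """True if a spider that honours robots.txt may fetch url."""
    return parser.can_fetch(agent, url)
```

With these rules, allowed("http://example.com/api/create?u=x") is False and allowed("http://example.com/about") is True. A malicious spider simply ignores the file, which is why this only helps against the good ones.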
An author who groks hackers
18 February 2008, by Ben
I’m halfway through reading Quentin Schultze’s Habits of the High-Tech Heart, a book about the ethics of all this technology we bask in. It’s quite a good book, certainly very thought-provoking for a geek like me.
Anyway, I was happy (and a little surprised) that he groks us hackers. It seems he understands the term and the culture fairly well, which is kind of unusual for authors. To quote a couple of paragraphs from chapter 4:
Often the people who identify high-tech moral issues are not in-house managers or technicians but rather extra-organizational “hackers.” The media portray hackers as evil technological zealots, when in fact they are frequently creative and hardworking people who strongly desire to advance information technology for the good of society. Hackers are different from “crackers,” who crack into computer systems often with self-serving and even malevolent intentions. The term “hacker” originated as a description of writers of software who “hack” together the code that makes programs work. But the idea of hacking has come to mean in some technology circles a kind of good-intentioned surveillance of cyberspace with an eye toward helping the common users avoid being exploited—a cyber-version of Robin Hood. The “open source” movement, which supports making nonproprietary software code available to all interested parties, is one example of this altruistic intent; long before this movement, however, hackers shared code with each other. To a typical hacker, poorly written code is a testament to foolishness. Moreover, unstable or insecure code is morally wrong. Most hackers exploit network weaknesses in order to document how vulnerable such systems really are, not to steal information or destroy an organization’s technology.
The truth is that hackers are sometimes the only people willing to point out the moral foolishness behind cyber-hype. In the current era of widespread cyber-foolishness, hackers are whistleblowers who alert the wider world to the folly of corporations and governments that operate vulnerable technological empires. If hackers can break into a federal agency’s computer, for instance, so can other governments or terrorists. Hackers embarrass us all by demonstrating that our information systems are overly touted. They often reveal the imprudence within mainstream technology endeavors, becoming the de facto consciences among technologists. Whereas businesses and governments tend to ignore the larger moral issues, except as they affect the bottom line, hackers are often the only people who will admit publicly that the techno-emperor is naked. Some hacker groups even hold to a “hacker ethic,” whereas most corporations and nonprofit organizations never establish significant ethical standards for their own information technology departments.
Maybe I just like it because he’s patting hackers on the back :-), but I do think he’s onto something. I wish the book were a bit more punchy and concise, but it’s definitely worth a read.
SOAP won’t make you clean
6 February 2008, by Ben 18 comments
I am something of a minimalist, so maybe it’s just me, but for a while now I’ve had bad feelings about SOAP. (Yeah, I mean the XML-based remote procedure thingy, not the stuff you wash your hands with.)
However, it wasn’t until I implemented a simple query to get my PayPal balance that I had actual evidence. Here’s how you get your balance …
First send this XML to https://api-3t.paypal.com/2.0/:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:Header>
    <RequesterCredentials xmlns="urn:ebay:api:PayPalAPI"
        soapenv:actor="http://schemas.xmlsoap.org/soap/actor/next"
        soapenv:mustUnderstand="1">
      <ebl:Credentials xmlns:ebl="urn:ebay:apis:eBLBaseComponents">
        <ebl:Username>U</ebl:Username>
        <ebl:Password>P</ebl:Password>
        <ebl:Signature>S</ebl:Signature>
      </ebl:Credentials>
    </RequesterCredentials>
  </soapenv:Header>
  <soapenv:Body>
    <GetBalanceReq xmlns="urn:ebay:api:PayPalAPI">
      <GetBalanceRequest>
        <Version xmlns="urn:ebay:apis:eBLBaseComponents">2.30</Version>
      </GetBalanceRequest>
    </GetBalanceReq>
  </soapenv:Body>
</soapenv:Envelope>
And wait for this equally lovely-looking response:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:cc="urn:ebay:apis:CoreComponentTypes"
    xmlns:wsu="http://schemas.xmlsoap.org/ws/2002/07/utility"
    xmlns:saml="urn:oasis:names:tc:SAML:1.0:assertion"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
    xmlns:market="urn:ebay:apis:Market"
    xmlns:auction="urn:ebay:apis:Auction"
    xmlns:sizeship="urn:ebay:api:PayPalAPI/sizeship.xsd"
    xmlns:ship="urn:ebay:apis:ship"
    xmlns:skype="urn:ebay:apis:skype"
    xmlns:wsse="http://schemas.xmlsoap.org/ws/2002/12/secext"
    xmlns:ebl="urn:ebay:apis:eBLBaseComponents"
    xmlns:ns="urn:ebay:api:PayPalAPI">
  <SOAP-ENV:Header>
    <Security xmlns="http://schemas.xmlsoap.org/ws/2002/12/secext"
        xsi:type="wsse:SecurityType">
    </Security>
    <RequesterCredentials xmlns="urn:ebay:api:PayPalAPI"
        xsi:type="ebl:CustomSecurityHeaderType">
      <Credentials xmlns="urn:ebay:apis:eBLBaseComponents"
          xsi:type="ebl:UserIdPasswordType">
        <Username xsi:type="xs:string"></Username>
        <Password xsi:type="xs:string"></Password>
        <Signature xsi:type="xs:string">S</Signature>
        <Subject xsi:type="xs:string"></Subject>
      </Credentials>
    </RequesterCredentials>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body id="_0">
    <GetBalanceResponse xmlns="urn:ebay:api:PayPalAPI">
      <Timestamp xmlns="urn:ebay:apis:eBLBaseComponents">2008-02-06T00:29:17Z</Timestamp>
      <Ack xmlns="urn:ebay:apis:eBLBaseComponents">Success</Ack>
      <CorrelationID xmlns="urn:ebay:apis:eBLBaseComponents">9ed8e32f98405</CorrelationID>
      <Version xmlns="urn:ebay:apis:eBLBaseComponents">2.300000</Version>
      <Build xmlns="urn:ebay:apis:eBLBaseComponents">499645</Build>
      <Balance xsi:type="cc:BasicAmountType" currencyID="USD">1234.56</Balance>
      <BalanceTimeStamp xsi:type="xs:dateTime">2008-02-06T00:29:17Z</BalanceTimeStamp>
    </GetBalanceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Then you parse out the only thing you care about, the number 1234.56 inside the <Balance> tag.
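For a sense of what that parsing step involves, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. The RESPONSE string is a trimmed-down copy of the reply above, and parse_balance is a name invented for this example. It is not how any PayPal library is claimed to do it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed-down copy of the response above, for illustration only.
RESPONSE = """\
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Body id="_0">
    <GetBalanceResponse xmlns="urn:ebay:api:PayPalAPI">
      <Ack xmlns="urn:ebay:apis:eBLBaseComponents">Success</Ack>
      <Balance xsi:type="cc:BasicAmountType" currencyID="USD">1234.56</Balance>
    </GetBalanceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

def parse_balance(xml_text):
    """Return (amount, currency) from a GetBalance response."""
    root = ET.fromstring(xml_text)
    # <Balance> inherits the default namespace of <GetBalanceResponse>.
    balance = root.find(".//{urn:ebay:api:PayPalAPI}Balance")
    if balance is None:
        raise ValueError("no <Balance> element in response")
    return Decimal(balance.text), balance.get("currencyID")
```

Here parse_balance(RESPONSE) returns (Decimal("1234.56"), "USD"). Even with a library doing the heavy lifting, you still need to know which namespace the one interesting element lives in.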
There’s something a little bit wrong with having to process three pages of XML when all anyone wants is a 7-byte string. Oh, and maybe the fact that it’s in USD (another 3 bytes).
Contrast that with a more RESTful and plain-texty approach, also known as KISSing, where the entire request and response would be:
https://api.paypal.com/balance?username=U&password=P&signature=S
1234.56 USD
(Admittedly, PayPal does support a simple “name-value pair” approach with many of their API calls, but for some reason not this one.)
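For comparison, client code for the plain-text version could be about this small. It is a sketch only: the api.paypal.com/balance endpoint is the imaginary one from this post, not a real PayPal URL, and both function names are made up for the example.

```python
from decimal import Decimal
from urllib.parse import urlencode

# Imaginary endpoint from the text above; it does not exist.
BALANCE_URL = "https://api.paypal.com/balance"

def balance_request_url(username, password, signature):
    """Build the whole request: one URL with three query parameters."""
    query = urlencode({"username": username,
                       "password": password,
                       "signature": signature})
    return BALANCE_URL + "?" + query

def parse_plain_balance(body):
    """Parse a body such as '1234.56 USD' into (amount, currency)."""
    amount, currency = body.split()
    return Decimal(amount), currency
```

No namespaces, no envelope, and the response parser is a single split.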
I’m sure some people will say, “But you never see that ugly XML if you use libraries.” Sure, you can hide most of the ickiness of SOAP behind bloated XML parsers and WSDL libraries, but why did you need the ugliness in the first place?
So, next time you’re thinking of implementing a SOAP server, think again. Hearken instead to the cries of Roy Fielding and Douglas Crockford.
Bounties for bug fixers: a bug-tracker plugin
4 February 2008, by Berwyn one comment

Want to promote a bug fix or new feature? An extremely simple microPledge-based solution for your favorite bug-tracker is on its way. Here’s how it works:
You’ve just found an annoying bug in your cutting-edge Instant Messenger: when you type a smiley, it sends out a frowney. Humph. What a nuisance.
You’d like to do something about it, but you know that if you file a bug report on their bug tracker, it won’t get looked at for 3 months because it’s a cutting-edge IM client, and the authors are all excited about the new voice chat feature they’re working on.
Wouldn’t it be nice if you could provide incentive to the developers to fix your bug? You guess there are at least 100 other people who are annoyed by the bug. If you could team up with them … let’s say you could all pledge $5 to the fix. I’ll bet you’d have a developer on the case in minutes. Even most rich developers that I know will snap up $500 just to change frowney to smiley.
Well, here comes Fund-a-bug. This project represents the first endeavor to integrate a bug tracker with microPledge. The Trac system is an obvious choice to start with because of its widespread use and its simple way of writing plugins.
Of course, you wouldn’t need this plugin to start a microPledge project to fix your bug. But writing up a project would take effort. And getting people to know about the funding project would be a challenge. But with this Trac plugin, the bug appears right there in Trac where people are looking. And all they need to do is click “Fund this bug” and punch in a dollar figure. How simple can it get? The microPledge project will be automatically created as required. Cool.
Link rot, soft 404s, and DecentURL
25 January 2008, by Ben 2 comments
Go straight to the “soft 404” detector code.
The problem
So, you’ve just put together a really good-looking résumé, saved it out as a “preserve my formatting” PDF file with clickable links, and you’re ready to go job-hunting. You email it out to a bunch of promising companies, not to mention a few recruiting agencies, just in case.
Then, horror of horrors, the previous company you worked for does a “website upgrade”, changing the structure of all their web addresses. Suddenly half of the links in your résumé — which is already in the hands of potential employers — are broken. Dead, link-rotting away. Not a good look for someone who’d called himself an “accomplished web developer”.
A solution
You hit yourself and wish you’d piped your URLs through some URL redirection service that allowed you to change where they pointed later. Happily, this is one of the services DecentURL provides.
Then you think, “Hey, it’d be nice if the redirection service could automatically email me when my links went bad, so I didn’t find out three weeks later from my friend’s cousin’s son.”
But (what a coincidence!) DecentURL’s premium services do that too. I’ve implemented a system that checks your URLs for dead pages every three days, and if any of them are bad, it lets you know.
Soft 404s and cleverly detecting dead pages
It turns out to be non-trivial to detect dead pages. Some web servers, instead of returning Not Found on dead pages (the 404 error code), return OK (200) and present you with the home page, or redirect you somewhere else. (I wish we could all just follow the standards.)
Alas. Here I’d thought that checking for dead pages would be this simple:
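import urllib2  # Python 2 standard library

def is_dead(url):
    """Naive check: only a hard HTTP error counts as dead."""
    try:
        fp = urllib2.urlopen(url)
        fp.read()
        return False
    except urllib2.HTTPError:
        return True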
So I dreamed up a few ad-hoc ways to try and detect fake error pages (does the URL give me the home page? if so, it’s a bad link), but then I discovered a paper on the web’s decay by some IBM research guys.
Section 3 calls the fake 200 OK errors “soft 404 pages”, and gives some pseudo-code for and an explanation of a fairly simple and general algorithm for detecting dead pages.
I’ve turned this into a little Python library, soft404.py. Feel free to use that in your own stuff — though I’d be interested in hearing about what you’re working on if you do.
How it works
Here’s just a quick overview of the algorithm, taken from the comment at the top of my code:
Basically, you fetch the URL in question. If you get a hard 404, it’s easy:
the page is dead. But if it returns 200 OK with a page, then we don’t
know if it’s a good page or a soft 404. So we fetch a known bad URL (the
parent directory of the original URL plus some random chars). If that
returns a hard 404 then we know the host returns hard 404s on errors,
and since the original page fetched okay, we know it must be good.
But if the known dead URL returns a 200 OK as well, we know it’s a host
which gives out soft 404s. So then we need to test the contents of the
two pages. If the content of the original URL is (almost) identical to
the content of the known bad page, the original must be a dead page too.
Otherwise, if the content of the original URL is different, it must be a
good page.
That’s the heart of it. HTTP redirects complicate things just slightly, but not much. For more info, see my code or read the paper.
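The overview above can be sketched in a few lines of Python 3. To be clear, this is not soft404.py itself, just an illustration under assumptions: fetch is a caller-supplied function returning (status, body), any non-200 status is treated as dead, redirects are ignored, and the difflib similarity ratio with a 0.9 threshold is an arbitrary stand-in for whatever comparison the paper and the real library use.

```python
import random
import string
from difflib import SequenceMatcher
from urllib.parse import urljoin

def is_dead(url, fetch, similarity=0.9):
    """Return True if url looks dead, counting soft 404s.

    fetch(url) is a caller-supplied function returning (status, body).
    The 0.9 similarity threshold is an assumption, not a tuned value.
    """
    status, body = fetch(url)
    if status != 200:
        return True                  # hard error: the page is dead

    # Known-bad URL: the parent directory of url plus random characters.
    junk = "".join(random.choice(string.ascii_lowercase) for _ in range(25))
    bad_status, bad_body = fetch(urljoin(url, junk))
    if bad_status != 200:
        return False                 # host gives hard 404s, so url is good

    # Host gives soft 404s: compare the page against the known-bad page.
    ratio = SequenceMatcher(None, body, bad_body).ratio()
    return ratio >= similarity
```

Passing fetch in as a parameter keeps the decision logic separate from the networking, so it can be tried out against fake hosts before pointing it at real ones.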
The end
You’re still reading? Good going. I’d be honoured if you’d sign up for DecentURL’s premium services, which use this algorithm. Otherwise, just have fun using the code!
Ten quirky things about Python
16 January 2008, by Ben 37 comments
Just thought I’d share a bunch of neat (and weird) things I’ve noticed about the Python programming language:
- You can chain comparisons as in assert 3.14 < pi < 3.15. It’s a neat equivalent of assert pi > 3.14 and pi < 3.15 that you can’t do in most other languages.
- Ints don’t overflow at 31 (or 32) bits, they just get promoted to longs automatically. And long in Python doesn’t mean 64 bits, it means arbitrarily long (albeit somewhat slower). In fact, it looks like in Python 3000 there won’t even be the int/long distinction.
- Default values are only evaluated once, when the def statement is executed, not each time the function is called. Try def func(A=[]): A.append(42); return A and the A-list will grow between calls. The Python tutorial has more.
- When concatenating strings, ''.join(list) is much faster than for x in list: s += x. In fact, the join is O(N) whereas the += is O(N²). There’s been a lot of debate about making this faster, and it looks like it should be faster in Python 2.5, but my tests show otherwise. Any ideas why?
- The syntax of print >>file, values is just plain weird. Not to mention the spacing “features” of print. I’m glad to hear that for Python 3000 they’re making print a function, and one with more sensible habits.
- You can create a one-element tuple with (x,). Tuples are normally written (x, y, z), but if you go (x) Python sees it as just a parenthesised value.
- And for all those times you reference methods of integer literals, you can go (5).__str__. You’d think it’d be just 5.__str__, but the parser thinks the 5. is a float and then gets stuck.
- You can use properties instead of getter and setter functions. For example, serial.baudrate = 19200 can set serial._baud as well as running some code to set your serial port’s bit rate.
- An else: clause after a for loop will be executed only if the loop doesn’t end via break. Quite useful for search loops and the like; in other languages you often need an extra test after the loop.
- You tell me yours. Ha! You thought you were going to escape. :-)
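Several of the quirks above are easy to try for yourself. The snippet below is a small sketch written for Python 3 (so it skips the int/long and print-statement items). The Serial class and the find helper are invented for illustration and are not from any real serial-port library.

```python
import math

# Chained comparison: both tests share the middle operand.
assert 3.14 < math.pi < 3.15

# A mutable default is created once, when the def statement runs.
def func(A=[]):
    A.append(42)
    return A

first = list(func())     # snapshot after the first call
second = list(func())    # the same list has grown by the second call

# One-element tuple versus a plain parenthesised value.
single = (5,)
not_a_tuple = (5)

# Calling a method on an integer literal needs the parentheses.
five_as_text = (5).__str__()

# A property runs code on assignment (hypothetical Serial class).
class Serial:
    def __init__(self):
        self._baud = 9600
        self.log = []

    @property
    def baudrate(self):
        return self._baud

    @baudrate.setter
    def baudrate(self, value):
        self._baud = value
        self.log.append("port set to %d baud" % value)

# for/else: the else branch runs only if the loop never hit break.
def find(items, wanted):
    for index, item in enumerate(items):
        if item == wanted:
            break
    else:
        return -1
    return index
```

After the two calls, first is [42] and second is [42, 42], which is the growing A-list in action. Assigning serial.baudrate = 19200 on a Serial instance updates _baud and records one log entry.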

