Consider the humble URL. In a few short years it’s become so ubiquitous as to be rendered invisible. It’s hard to imagine a world without it, and it’s hard to remember that there was once a time when not having a uniform means of locating resources was considered a fundamental stumbling block to the deployment of any large-scale hypertext system — never mind a world-wide one.
But despite the universality of URLs, we often forget that they’re not just a handy way to address network resources. They’re also valuable communication tools. They help orient users in your architecture, and can suggest whether other options are available.
In Edward Tufte’s classic book The Visual Display of Quantitative Information, he coins the term chartjunk to refer to needless visual flourishes that contribute nothing to the effectiveness of an information design in communicating to its audience. These days, our URLs are loaded down with something very similar: long strings of characters that exist only to satisfy some technical constraint, detracting from the effectiveness of our URLs as communication tools. Call it CMSjunk.
Back when every asset was a file in a directory on the server’s hard drive, we had to give them sensible names, like products.html or bookcover.jpg. This was primarily for our own convenience; when your content management process involves poking around in a filesystem, names you can remember and scan the screen for work much better than names you have to write down and compare character for character.
This practice of giving files easy-to-remember names had one unintended side effect that benefited our users as well. Not only did our filenames make it easy for us to remember which file was which, they also made it easy for our users to tell them apart when they appeared in the URLs on our site.
The advent of content management systems has been a boon in many ways, but the readability of URLs is not one of them. Databases don’t give assets names; instead, they need formulas for retrieving those assets. CMS developers, figuring nobody reads URLs, simply embedded those formulas right there. Sometimes that would manifest itself as just an inscrutable number; at other times, the URL would include a whole string of parameters needed for the CMS to function.
One of my favorite examples of user-hostile URLs is Toronto’s Globe and Mail newspaper. Every URL from this site seems to be loaded down with CMSjunk:
http://www.globeandmail.com/servlet/ArticleNews/PEstory/TGAM/20020909/RVCRR/Business/business/business_temp/2/2/5/
In case the URL didn’t tip you off — over and over and over — this is a story from the Business section. This isn’t even as bad as the Globe and Mail URLs get, though. Here’s a different URL for the very same story, returned by the site’s search engine:
http://www.globeandmail.com/servlet/GIS.Servlets.HTMLTemplate?current_row=3&tf=tgam/search/tgam/SearchFullStory.html&cf=tgam/search/tgam/SearchFullStory.cfg&configFileLoc=tgam/config&encoded_keywords=dvd&option=&start_row=3&start_row_offset1=&num_rows=1&search_results_start=1&query=dvd
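To get a rough sense of how much of that search URL is CMSjunk, here is a small Python sketch that pulls its query string apart. The URL is the one quoted above with the spaces from the printed version removed. The decision to count only query as meaningful to a reader is my own assumption, and the function name split_parameters is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# The search-result URL quoted in the article, joined into one string.
SEARCH_URL = (
    "http://www.globeandmail.com/servlet/GIS.Servlets.HTMLTemplate"
    "?current_row=3&tf=tgam/search/tgam/SearchFullStory.html"
    "&cf=tgam/search/tgam/SearchFullStory.cfg&configFileLoc=tgam/config"
    "&encoded_keywords=dvd&option=&start_row=3&start_row_offset1="
    "&num_rows=1&search_results_start=1&query=dvd"
)


def split_parameters(url, meaningful=("query",)):
    """Separate query parameters a reader might recognize from the rest.

    Which names count as `meaningful` is an assumption made for this
    sketch, not something the site documents.
    """
    # keep_blank_values=True so empty parameters such as "option=" are counted.
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    for_reader = [(k, v) for k, v in pairs if k in meaningful]
    for_cms = [(k, v) for k, v in pairs if k not in meaningful]
    return for_reader, for_cms


for_reader, for_cms = split_parameters(SEARCH_URL)
print(len(for_reader), "parameter(s) a reader might recognize:", for_reader)
print(len(for_cms), "parameter(s) that only the CMS needs")
```

Under that assumption, ten of the eleven parameters exist purely for the benefit of the software.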
Of course, the staff of the Globe and Mail shouldn’t bear all of the blame for this mess. After all, many of the most successful commercial CMS vendors require their customers to go to extraordinary lengths to implement an alternative to the cumbersome strings of commas, dashes, and digits their systems generate by default.
Compare the above to URLs from a couple of the Globe and Mail’s competitors:
There’s still some room for improvement here, but at least you could read these URLs to someone over the phone. Some corporate sites go one step further: not only are their URLs human-readable, they’re also human-guessable.
Each of these URLs contains a product name. Swap out that product for another of the company’s products — such as ipod instead of powerbook, or acrobat instead of photoshop — and the URL still works. So if you know the URL for a particular bit of information about one product, you can easily find that same information about any of the company’s other products.
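The swap described above is mechanical enough to sketch in a few lines of Python. This is only an illustration: the www.example.com address and its path layout are hypothetical stand-ins, not any real company's URL scheme, and swap_product is a name invented for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_product(url, old, new):
    """Return `url` with the path segment `old` replaced by `new`.

    Only whole path segments are replaced, so a product name that merely
    appears inside a longer segment is left alone.
    """
    parts = urlsplit(url)
    segments = [new if s == old else s for s in parts.path.split("/")]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical URL layout, used only to show the idea.
known = "http://www.example.com/products/powerbook/specs.html"
guessed = swap_product(known, "powerbook", "ipod")
print(guessed)
```

A guess like this only pays off when the site keeps the same structure for every product, which is exactly the consistency that makes a URL human-guessable.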
Some sites go even further with this principle of supporting URLs that people can guess. I almost never have to navigate Microsoft’s site because I can often guess a URL for the content I want. Having trouble with Visio? Try microsoft.com/visio. Need a new driver for that spiffy optical mouse? microsoft.com/mouse will get you there.
Note that these aren’t the actual URLs of the pages I land on. Instead, these top-level redirects transport me to the page I’m looking for. So microsoft.com/visio becomes http://www.microsoft.com/office/visio/default.asp and microsoft.com/mouse becomes http://www.microsoft.com/hardware/mouse/default.asp. Microsoft’s site doesn’t scold me for not knowing the correct address — it just takes me there. And seeing that new URL http://www.microsoft.com/hardware/mouse/default.asp in my browser might just encourage me to explore what other content might be filed under microsoft.com/hardware.
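A top-level redirect of this kind can be as simple as a lookup table. The Python sketch below uses the two Microsoft addresses mentioned above as sample data. Everything else is an assumption for illustration: I don't know how Microsoft actually implements its redirects, the resolve function is hypothetical, and the choice of a 302 status code is mine.

```python
# Short, guessable paths mapped to the real locations of the pages.
# The two entries come from the examples in the article.
REDIRECTS = {
    "/visio": "http://www.microsoft.com/office/visio/default.asp",
    "/mouse": "http://www.microsoft.com/hardware/mouse/default.asp",
}


def resolve(path):
    """Return an (HTTP status, target URL) pair for a requested path.

    The path is normalized first, so "/Visio/" and "visio" both match
    the "/visio" entry. Unknown paths get a 404 and no target.
    """
    key = "/" + path.strip("/").lower()
    target = REDIRECTS.get(key)
    if target is None:
        return 404, None
    # 302 (temporary redirect) is an assumption; a site could use 301.
    return 302, target


print(resolve("/visio"))
print(resolve("/Mouse/"))
print(resolve("/no-such-product"))
```

Normalizing case and trailing slashes before the lookup is in the same spirit as the redirect itself: the visitor's guess doesn't have to be exact to be rewarded.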
This approach isn’t unique to Microsoft; top-level redirects are being used on many other sites. The next time you find yourself about to go hunting for product information on a corporate site, try guessing at a top-level redirect. You might be surprised at what you find.
Some might argue that, in a perfect world, URLs would be used only by machines, hidden entirely from users. But in our imperfect world, users have come to depend on URLs to communicate key information as they navigate through the Web. Systems that don’t take this user behavior into account pull the rug out from under users who have come to rely on readable URLs. Recognizing that people really do read URLs, and in turn making those URLs easy for people to read, is just an extension of the user-centered philosophy of design. It’s all about creating systems that work the way people work, rather than the way technology works.