Short URL Auto-Discovery


author: Robert Spychala (http://twitter.com/robspychala) - 4/1/2009

version 1.0.1 - added diagram 4/2/2009
version 1.0.2 - added second diagram 4/3/2009
version 1.0.3 - added Bookmarklet 4/6/2009
version 1.0.4 - fixed Bookmarklet (@clint's suggestion), updated link spec, updated implemented list 4/7/2009
version 1.0.5 - fixed Bookmarklet (@amoebe's suggestion)
version 1.1.0 - removed to rel="alternate" from feedback (Sam Johnston) and case against rev=canonical (John Gruber) 4/12/2009
version 1.1.1 - reverted back to the original "shorturl" 4/13/2009
version 1.1.2 - http://metamark.net/ suggested to use redirect HTTP code 301 - 4/16/2009
version 1.1.3 - added http://shorturl.appjet.net/ - 4/16/2009
version 1.1.4 - modified Bookmarklet to use shorturl.appjet.net API 4/29/2009
version 1.1.5 - appjet.net discontinued their service so I moved the bookmarklet to http://relshorturl.appspot.com/ - 7/26/2009

Summary

Short URL auto-discovery is a simple way to link a long URL with a short URL. The following code should be placed in the <head> section of the HTML page.

<link rel="shorturl" href="http://short.com/1234" />

or add the following to the HTTP Headers of the page

Link: <http://short.com/1234>; rel=shorturl

In most real-world situations, the short URL then redirects with an HTTP code 301 to the long URL, but that behavior is not covered by this RFC.

That's it! :) try it at: http://relshorturl.appspot.com/

why not use rel="alternate .... "

Sam Johnston pointed out that alternate doesn't make sense, since it implies a link to the same content in a different format, such as a PDF.

why not rel="shortcut"

Shortcut is not well-understood nomenclature for short URLs in the web context. It is fine for defining shortcut icons with rel="shortcut icon", and if we wanted to follow that adjective-noun model we'd use rel="shortcut url", but that seems excessive.

Potential legacy code breakage as suggested by http://twitter.com/soypunk/status/1509403319

Also somehow shortcut seems like the wrong wording... implies a link that will bypass something ... a splash screen, etc.

why not rel="shorter" or rel="short"

Implies a shorter version of the content.

why not rev="canonical"


The rev attribute is absent from HTML5, is easily confused with rel="canonical", and breaks Google's proposed definition of canonical for search purposes.


Part of making a new RFC to describe a simple concept is simple naming. People know that a URL is what's in the location bar in their browser. Besides, we'd never see a URI that's not a URL in this context.

why not rel="short_url"

The _ is ugly.

why not rel="shortlink"


Nice, but not different enough from shorturl to warrant a change IMHO. shortaddress? shortlocation? There are tons of other options, e.g. tinylink. If the consensus is that it's better, I'll switch for sure. It's better to have one way to do a simple thing, but I don't want to make a knee-jerk change because people have already implemented rel="shorturl".

The Problem

Over the past few years, SEO efforts have led to longer and more descriptive canonical URLs for content pages. During this time URL shorteners such as tinyurl.com came in to help undo that trend and make URLs fit into character-limited contexts such as 140-character posts on twitter.com or SMS messages.

Unfortunately, URL shorteners lose link information. As a user, it is valuable to know whether a link you're clicking goes to a content site such as youtube.com or nytimes.com, or to a potentially harmful and malicious site.

Furthermore, you might have already visited the site behind the shortened URL, and clicking through to it again might not be what you intended.


Suggestions from community


http://laughingmeme.org/2009/04/03/url-shortening-hinting/ (rev="canonical")

I personally don't like this solution, mainly because it took me a few seconds to "get it." I don't think simple concepts like this should be so complicated.

I like http://revcanonical.appspot.com/ and Kellan Elliot's suggestion of adding "alternate" into the mix with <link rel="alternate shorter" href="..."> ... although I think "alternate short" makes more sense. People emailed me suggesting changing shorturl to short and I agree that the underscore and the word "url" seem excessive. (edit 4/13/2009 alternate doesn't make sense)

Also, I think using rev=canonical muddies the original intended use of canonical, as published by Google: telling search engines to drop session GET parameters from URLs.

http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

For example, if I try to bookmark:

http://example.com/1234.html?session_id=1111

http://example.com/1234.html?session_id=2222

and they have <link rev="canonical" href="http://bit.ly/1234">

it would imply that both those are the canonical for http://bit.ly/1234 ... and really only the http://example.com/1234.html URL is the canonical.

Also, I just read that the rev attribute is gone from HTML 5 (I would guess because it wasn't being used).

http://www.w3.org/TR/html5/semantics.html#the-link-element


http://userscripts.org/scripts/show/40582 (thanks @THE_REAL_ROSS and waxy.org/links for the link)

http://joshua.schachter.org/2009/04/on-url-shorteners.html (joshua schachter blog entry on the subject.)

http://www.scripting.com/stories/2009/03/07/solvingTheTinyurlCentraliz.html (Dave Winer)


The Proposed Solution


Sites that implement #shorturl RFC as of 4/13/2009


Auto-discovery for short URLs

The proposed solution is to provide a mechanism to auto-discover whether a long URL entered by a user has a canonical short URL that is preferred over one from bit.ly, tinyurl.com, etc.

URL Providers history (tumblr.com, snaplog.com, blogger.com, et al.)

Some websites already have the functionality to shorten their URLs. For example on snaplog.com a URL can be accessed 2 ways:

http://robert.snaplog.com/:E7d/brooklyn

or the shortened way:

http://snaplog.com/:E7d

Currently there is no standard way to show the relationship between the long URL and the short URL for a page.


via <HEAD> addition

HTML content pages would programmatically describe the shortened URL by embedding a <link> element in the page.

<link rel="shorturl" href="http://short.com/1234" />

The above should appear once in the <head> of the html document.
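As a rough illustration of how a consumer might read this tag, here is a minimal Python sketch using only the standard library's html.parser. The names ShortUrlFinder and find_short_url are made up for this example and are not part of the proposal. It assumes rel is treated as a space-separated, case-insensitive list of tokens and that the first matching <link> wins.

```python
from html.parser import HTMLParser


class ShortUrlFinder(HTMLParser):
    """Remembers the href of the first <link> whose rel contains "shorturl"."""

    def __init__(self):
        super().__init__()
        self.short_url = None

    def handle_starttag(self, tag, attrs):
        # html.parser also routes self-closing tags such as <link ... /> here.
        if tag != "link" or self.short_url is not None:
            return
        attr_map = dict(attrs)
        # Assumption: rel is a space-separated token list, matched case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "shorturl" in rel_tokens and attr_map.get("href"):
            self.short_url = attr_map["href"]


def find_short_url(html_text):
    """Return the advertised short URL, or None if the page has none."""
    finder = ShortUrlFinder()
    finder.feed(html_text)
    return finder.short_url


page = '<html><head><link rel="shorturl" href="http://short.com/1234" /></head></html>'
print(find_short_url(page))
```

A page without the tag simply yields None, which leaves the caller free to fall back to a third-party shortener.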


via HTTP Headers

Link Header Draft (linked from Sam Johnston's blog)

Link: <http://short.com/1234>; rel=shorturl

Parsing algorithms should match for the existence of the shorturl string in the rel attribute.
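A parsing routine along these lines could look like the following Python sketch. It is a simplified, regex-based reading of the Link header, not a full implementation of the header grammar: it assumes targets are wrapped in angle brackets, that rel may be quoted or unquoted, and that parameter values contain no "<" characters. The function name short_url_from_link_header is hypothetical.

```python
import re

# One link-value: <target> followed by everything up to the next "<".
LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# The rel parameter, either quoted (rel="a b") or bare (rel=a).
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def short_url_from_link_header(header_value):
    """Return the target of the first link whose rel contains "shorturl"."""
    for target, params in LINK_VALUE.findall(header_value):
        match = REL_PARAM.search(params)
        if not match:
            continue
        # rel may carry several space-separated tokens; match case-insensitively.
        rel_tokens = (match.group(1) or match.group(2) or "").lower().split()
        if "shorturl" in rel_tokens:
            return target
    return None


print(short_url_from_link_header('<http://short.com/1234>; rel=shorturl'))
```

Matching on whole tokens rather than a substring means a value such as rel="notshorturl" would not be picked up by accident.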


URL Consumers (twitter.com, Tweetie.app, Twitterrific.app, et al.)

Twitter clients and other consumers would load the URL that the user is embedding at posting time and check for the <link rel="shorturl" href="..." /> tag. If it exists, the application should honor the href value and use it in place of the original URL.

If the <link rel="shorturl" tag is missing, the application will shorten the URL with tinyurl.com or another similar site.
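The consumer-side flow described above might be sketched in Python as follows. Everything here is illustrative: url_to_post, discover_short_url, fetch_html and the stub shortener are hypothetical names, the fetch assumes UTF-8 pages, and the fallback stands in for whatever third-party shortener a client already uses. The fetch and fallback are passed in as functions so the logic can be tried without touching the network.

```python
from html.parser import HTMLParser
import urllib.request


class _LinkCollector(HTMLParser):
    """Collects (rel tokens, href) for every <link> element on the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attr_map = dict(attrs)
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            self.links.append((rel_tokens, attr_map.get("href")))


def discover_short_url(html_text):
    """Return the href of the first rel="shorturl" link, or None."""
    collector = _LinkCollector()
    collector.feed(html_text)
    for rel_tokens, href in collector.links:
        if "shorturl" in rel_tokens and href:
            return href
    return None


def fetch_html(url):
    """Download the page body (real network call; assumes UTF-8 for simplicity)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def url_to_post(long_url, fetch=fetch_html, fallback=None):
    """Prefer the page's own short URL; otherwise use the fallback shortener."""
    try:
        short = discover_short_url(fetch(long_url))
    except OSError:
        short = None  # page unreachable: treat it as advertising no short URL
    if short:
        return short
    return fallback(long_url) if fallback else long_url


# Stubs standing in for the network and for a third-party shortener.
pages = {
    "http://example.com/long-article": '<head><link rel="shorturl" href="http://short.com/1234" /></head>',
    "http://example.com/other": "<head></head>",
}
stub_fetch = pages.__getitem__
stub_shortener = lambda url: "http://tinyurl.example/abc"

print(url_to_post("http://example.com/long-article", stub_fetch, stub_shortener))
print(url_to_post("http://example.com/other", stub_fetch, stub_shortener))
```

The first call returns the site's own short URL; the second finds no tag and hands the long URL to the stub shortener instead.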


Bookmarklet


Flow Diagram


Shows the URL Consumers checking the website for auto-discovery tag information.



Shows the URL shortener tools such as tinyurl.com checking the website for auto-discovery tag information.



Challenges

This proposed API would have to be implemented by both parties to be successful:
  1. content providers like snaplog.com, tumblr.com, blogger.com, typepad.com et al.
  2. URL consumers like Twitter and Facebook, plus blogging tools and clients, which would check for the shorturl tag

Other <link> based auto-discovery approaches

RSS link Autodiscovery:

http://jeremy.zawodny.com/blog/archives/000967.html

Share Partners:

http://www.facebook.com/share_partners.php

http://digg.com/tools/thumbnails

Favicon:

http://en.wikipedia.org/wiki/Favicon (thanks http://news.ycombinator.com/user?id=sam_in_nyc)

OpenSearch:


Request For Comments / Feedback (RFC):

On Twitter, use #shorturl with feedback.

