Extend HTML with your own metadata

Ah, the endless joy of specs
You know that story about the space shuttle rockets being two horses' asses wide? If you want to save some time, the moral of the story is: specifications and bureaucracies live forever.

I do love HTML, despite the fact that we are still tied to the HTML 4.01 spec (1999). I could use a couple more attributes; call it semantic sugar or whatever.

Let's take an example use case: validating that a number lies between 0 (inclusive) and 100 (exclusive).
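To make the use case concrete, here is a minimal sketch of the check itself in plain JavaScript. The function name inRange is hypothetical; the rule names (min, max, excludeMin, excludeMax) are the ones used in the examples in this post. Treat it as an illustration, not a finished validator.

```javascript
// Hypothetical helper: checks a value against a set of range rules.
// A missing min or max means that side of the range is unbounded.
function inRange(value, rules) {
  const text = String(value).trim();
  const n = Number(text);
  if (text === '' || Number.isNaN(n)) {
    return false; // empty or not a number at all
  }
  if (rules.min !== undefined) {
    // an exclusive bound rejects the bound itself
    if (rules.excludeMin ? n <= rules.min : n < rules.min) return false;
  }
  if (rules.max !== undefined) {
    if (rules.excludeMax ? n >= rules.max : n > rules.max) return false;
  }
  return true;
}

const rules = { min: 0, max: 100, excludeMin: false, excludeMax: true };
console.log(inRange('0', rules));   // true: 0 is inclusive
console.log(inRange('100', rules)); // false: 100 is exclusive
```

The rest of this post is about where those four rule values should live in the page.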

Inject JSON attributes

<input type="text" name="foo" id="foo"/>

<script>
// one of these must be added per input field
Object.extend($('foo'), {
   min: 0,
   max: 100,
   excludeMin: false,
   excludeMax: true   
});
</script>

pros

  • Even my grandma can get this code.

cons

  • With any mid-size form there will be a ton of script tags opening and closing all over the page. Even if you manage to group them all together, you will not feel very proud of it.
  • Every input field must have a unique ID attribute to locate it for extension.
  • You should NOT generate JavaScript code on the server side. Ever.
  • Most important: the DOM node must exist before you extend it. This is an uncomfortable restriction on where the JavaScript snippet can go, most probably after the closing </form> tag, which is a serious problem if you plan to develop a tag library and do not know the surrounding HTML code.

Inject attributes using CSS classes

Lots of people do this (jQuery plugin authors, for example):

<input type="text" name="field" class="extended __min_0 __max_100 __excludemin_false __excludemax_true"/>

<script>
// this must be called just once for the entire page
document.observe('dom:loaded', function() {
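  $$('.extended').each(function(element) {
    // copy every "__name_value" class token onto the element;
    // values arrive as strings ("0", "false"), so convert them before use
    element.className.scan(/__(\w+)_(\w+)/, function(p) { element[p[1]] = p[2] });
  });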
});
</script>

pros

  • It does not have any of the JSON problems.
  • It is a valid use of the class attribute. From the HTML spec: "The class attribute has several roles in HTML: [...] For general purpose processing by user agents."

cons

  • UGLY. Again. Sure, for simple cases it just seems right, but try to explain that to your web designer next time he asks about a "required date __min_2005-10-20T10:00:00Z __something_ortheother". When the real world challenges your application, your HTML gets a legacy reek.
  • You are mixing up style and behavior metadata in the same attribute.
  • You will have to URL-encode values that contain special characters (double quotes, spaces, etc.).
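To show what that last point costs you, here is a rough sketch of the encode/decode round trip in plain JavaScript, without Prototype. The function names encodeClassToken and parseClassMetadata are hypothetical, and the sketch assumes attribute names contain only letters and digits (no underscores), so that the first underscore after the name always separates it from the value.

```javascript
// Builds one class token following the "__name_value" convention,
// URL-encoding the value so it cannot contain spaces or quotes.
function encodeClassToken(name, value) {
  return '__' + name + '_' + encodeURIComponent(String(value));
}

// Extracts every "__name_value" token from a class string into a plain object.
// Values come back as decoded strings; converting them to numbers or booleans
// is left to the caller. decodeURIComponent throws on malformed "%" sequences.
function parseClassMetadata(className) {
  const metadata = {};
  className.split(/\s+/).forEach(function (token) {
    const match = /^__([A-Za-z0-9]+)_(.*)$/.exec(token);
    if (match) {
      metadata[match[1]] = decodeURIComponent(match[2]);
    }
  });
  return metadata;
}

const classes = ['required', 'date', encodeClassToken('min', '2005-10-20T10:00:00Z')].join(' ');
console.log(classes); // required date __min_2005-10-20T10%3A00%3A00Z
console.log(parseClassMetadata(classes).min); // 2005-10-20T10:00:00Z
```

It works, but now your designer gets to read percent signs in the class attribute too.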

Namespaced attributes

I like this extensibility thing about XHTML:

<html
   xmlns:v="http://myhost/schemas/validation"
   xmlns="http://www.w3.org/1999/xhtml">
<input type="text" name="field" v:min="0" v:max="100" v:excludemin="false" v:excludemax="true"/>

This approach is used by W3C ARIA and Wicket.

pros

  • At last, something I can look at.

cons

  • You must develop and maintain your own XML schema for everything you extend.
  • You think you are going XML? You are not.

Long story short, your browser parses your page as HTML (not XML) unless you serve it with an "application/xhtml+xml" Content-Type HTTP header. Once you do, everything is fine: any subtle mistake in your XML document will render a blank page in the browser, JavaScript snippets get picky about their contents, and oh, it will not work at all in any IE browser. That is what "fine" means in my book.

Word is that browser vendors are focusing on supporting HTML 5 instead of XHTML 1 and 2, but as long as you know what you are doing, schemas work just fine.
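The HTML-versus-XML parsing difference also shows up when you read these attributes from script. The sketch below is one way to cope with both, under stated assumptions: the helper name readValidationAttribute is hypothetical, the namespace URI is the one from the example above, and the prefix defaults to "v". When the page is parsed as XML, getAttributeNS finds the attribute by namespace; when it is parsed as HTML, the attribute is literally named "v:min", so the helper falls back to getAttribute with the prefixed name.

```javascript
// Namespace URI taken from the xmlns:v declaration in the example above.
const VALIDATION_NS = 'http://myhost/schemas/validation';

// Hypothetical helper: reads the v:<localName> attribute from an element,
// whichever way the page was parsed.
function readValidationAttribute(element, localName, prefix) {
  // Some browsers have no getAttributeNS at all, so test for it first.
  if (typeof element.getAttributeNS === 'function') {
    const value = element.getAttributeNS(VALIDATION_NS, localName);
    // A missing attribute is reported as null (or '' by some older browsers).
    if (value !== null && value !== '') return value;
  }
  // HTML parsing: the attribute name includes the prefix and the colon.
  return element.getAttribute((prefix || 'v') + ':' + localName);
}

// In a browser, something like:
//   readValidationAttribute(document.getElementsByName('field')[0], 'max')
```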

HTML attributes

The alternative to XHTML is HTML tag soup, which looks more or less like XHTML. The browser tolerates any attribute it does not recognize, and as long as those attributes belong to valid tags they can be reached from your JavaScript code. This is exploited by the Dojo Toolkit and HTML 5:

<input type="text" name="field" min="0" max="100" data-exclude-min="false" data-exclude-max="true"/>

In this example I have used some HTML 5 attributes and prefixed the rest with "data-". It would be nice if the HTML working group took some advice from John Resig and listened to the community about what is needed in the spec. Out of the ten validation parameters I use frequently, HTML 5 only supports three: min, max, and pattern.
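Reading these attributes back is just getAttribute, with one catch: every value is a string, so "false" has to be converted explicitly. Here is a small sketch of that step. The helper name readRangeRules is hypothetical, and it assumes the four attribute names from the example above.

```javascript
// Hypothetical helper: turns the string attributes of an input element
// into typed range rules.
function readRangeRules(element) {
  function number(name) {
    const raw = element.getAttribute(name);
    // a missing or empty attribute means "no bound on this side"
    return raw === null || raw === '' ? undefined : Number(raw);
  }
  function flag(name) {
    // the string "false" is truthy in JavaScript, so compare against "true"
    return element.getAttribute(name) === 'true';
  }
  return {
    min: number('min'),
    max: number('max'),
    excludeMin: flag('data-exclude-min'),
    excludeMax: flag('data-exclude-max')
  };
}

// In a browser, something like:
//   readRangeRules(document.getElementsByName('field')[0])
```

For the input above this should yield min 0, max 100, excludeMin false and excludeMax true.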

pros

  • Here goes my grandma again, lecturing me about how much she loves the simplicity in this code.

cons

  • You cannot put any cool buzzwords on your resume.

Drop by Madrid to enjoy some deeper insight

Do not miss the Open Java Day event organized by JavaHispano and Sun on June 26-27. I'll be talking about practical lessons learned from the web tier: CSS frameworks, charset handling, JavaScript tests, accessibility, etc. I don't know how much I can fit into one hour, but it will be fun.

UPDATE: John Resig has added further insight on this subject here.