Working on I’m coming to realise that there are at least two usefully distinct levels of semantic data on the web:

There’s the basic “object” level at which microformats act, defining simple, basic-level objects like posts and people with properties like name, phone and content.

Then there’s the level at which HTML works, marking up blocks of text and creating a tree of elements, each of which gives context to the text it contains, for example blockquote elements for containing content from another source, code elements for “computer code” (might be some space to make that more useful — who’s up for adding the type attribute to code?) and so on.

So what? So these are the two sufficiently standardised levels at which content on the web can be made portable, and mutually understood by many parties. Any additional undefined semantics introduced by author-defined classnames and the meaning communicated by their default styling is unportable, and will be lost when that content is viewed elsewhere (for example shown in a reader or as a cross-site comment.

So how can you tell if your content is sufficiently portable? For the object-level (microformats) a validator like indiewebify.me can be used. Strangely, there aren’t as many tools for the markup level, but one surefire way to check is to disabled CSS in your browser. Is your content still understandable using only the default styles? If so it’s probably pretty portable.

updated: