The "semantic web" as a theory has been around for ages and I remember working with people, a decade ago, who were investigating how to build a semantic web.
The semantic web, a term coined by Sir Tim Berners-Lee, is a vision that would allow automated agents and software to access the Web intelligently, via machine-readable metadata embedded within content.
There are a number of standards, tools, methodologies and technologies around that have been created to aid in the development of a semantic web, yet it is still unrealised and eludes the world.
There are a number of reasons for this including the physical size of the web, the vastness of knowledge and how to categorise it all into suitable classes, and the completeness, consistency and standardisation of information, to name just a few issues to deal with. I imagine some even question whether it is truly possible due to the sheer scale and requirements involved.
Probably the biggest outcome of working towards a semantic web is Extensible Markup Language (XML). XML is a set of rules for encoding documents in machine-readable form. However, it's not always practical to create and/or display all the information and data of a site in XML form. Luckily there are various other ways to add metadata to existing HTML docs.
Three popular (and when I say popular I mean supported by Google - http://www.google.com/support/webmasters/bin/topic.py?hl=en&topic=21997) methods for adding metadata to HTML content are:
- RDFa
- Microformats
- Microdata
Adding this markup to HTML gives machines and agents, such as search engine bots and spiders, structured data they can understand, which Google uses to create "rich snippets" in its search results.
RDFa (Resource Description Framework in attributes)
RDFa extends XHTML with attribute-level extensions that allow rich metadata to be embedded within Web documents. It is not limited to XHTML though: RDFa can add metadata to any XML-based language.
http://www.w3.org/TR/xhtml-rdfa-primer/
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=146898
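To make the attribute-level idea concrete, here is a minimal sketch in Python. The person, job title and company are made up, and the "v:" vocabulary (http://rdf.data-vocabulary.org/#) is the one I recall from Google's rich snippets documentation, so treat the property names as an assumption. The small parser is only there to show what a machine could pull out of the markup; it is not a conformant RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical RDFa snippet: the person and company are invented, and the
# "v:" vocabulary is assumed from Google's rich snippets documentation.
RDFA_SNIPPET = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span property="v:name">Jane Example</span>,
  <span property="v:title">Web Developer</span> at
  <span property="v:affiliation">Example Ltd</span>
</div>
"""


class RdfaPropertyParser(HTMLParser):
    """Collects the text of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # name of the property currently being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        # Only keep text that sits inside an element with a property.
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


rdfa_parser = RdfaPropertyParser()
rdfa_parser.feed(RDFA_SNIPPET)
rdfa_parser.close()

for name, value in rdfa_parser.properties.items():
    print(name, "=", value)
```

The visible text of the page stays the same; the extra attributes are what tell an agent that "Jane Example" is a name rather than just a string.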
Microformats
Microformats are already widely used by applications such as linkedin.com and facebook.com for user profile information. Microformats generally use the "class" attribute of elements to describe the encapsulated data, but do also sometimes use the id, title, rel or rev attribute as well.
Microformats continue to be developed, with standards for products, reviews, etc. being released.
http://microformats.org/
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=146897
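As a rough illustration of the class-based approach, the sketch below marks up a made-up person with hCard class names ("vcard", "fn", "title", "org") and reads them back in Python. The "highlight" class is a hypothetical styling class, included to show how data classes and presentation classes end up sharing the same attribute. The parser only knows the three properties listed and is a simplification, not a full microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical hCard snippet: the person and company are invented, and
# "highlight" stands in for an ordinary styling class.
HCARD_SNIPPET = """
<div class="vcard">
  <span class="fn highlight">Jane Example</span>,
  <span class="title">Web Developer</span> at
  <span class="org">Example Ltd</span>
</div>
"""

# Subset of hCard property names this sketch looks for.
HCARD_PROPERTIES = {"fn", "title", "org"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class list names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # hCard property currently being read

    def handle_starttag(self, tag, attrs):
        class_names = (dict(attrs).get("class") or "").split()
        for name in class_names:
            if name in HCARD_PROPERTIES:
                self._current = name
                self.card[name] = ""
                break  # the sketch reads one property per element

    def handle_data(self, data):
        if self._current:
            self.card[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


hcard_parser = HCardParser()
hcard_parser.feed(HCARD_SNIPPET)
hcard_parser.close()

for name, value in hcard_parser.card.items():
    print(name, "=", value)
```

Note that the parser has to know in advance which class names are data and which are styling, since nothing in the markup itself distinguishes them.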
Microdata
Microdata is a feature of HTML5. Much has been made of the canvas tag, video, animation and the interactive creative possibilities of HTML5; however, one area that has not received much attention is Microdata. HTML5 Microdata achieves effectively the same result as RDFa and Microformats, by semantically tagging data.
http://dev.w3.org/html5/md/
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=176035
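For comparison, here is the same made-up person tagged with Microdata, again as a hedged Python sketch. The itemscope, itemtype and itemprop attributes come from the HTML5 Microdata draft; the item type URL (http://data-vocabulary.org/Person) and property names are what I recall Google documenting, so treat them as assumptions. The parser handles a single, flat item only and ignores nested items.

```python
from html.parser import HTMLParser

# Hypothetical Microdata snippet: the person and company are invented, and
# the data-vocabulary.org item type is assumed from Google's documentation.
MICRODATA_SNIPPET = """
<div itemscope itemtype="http://data-vocabulary.org/Person">
  <span itemprop="name">Jane Example</span>,
  <span itemprop="title">Web Developer</span> at
  <span itemprop="affiliation">Example Ltd</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Reads one flat Microdata item: its type and its itemprop values."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # itemprop currently being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # itemscope is a valueless attribute, so test for its presence only.
        if "itemscope" in attributes:
            self.item_type = attributes.get("itemtype")
        prop = attributes.get("itemprop")
        if prop:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


microdata_parser = MicrodataParser()
microdata_parser.feed(MICRODATA_SNIPPET)
microdata_parser.close()

print("type =", microdata_parser.item_type)
for name, value in microdata_parser.properties.items():
    print(name, "=", value)
```

Unlike the class-based approach, the data lives in dedicated attributes, so it cannot be confused with styling hooks.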
And finally....
Compared with the other two, RDFa seems relatively complex to use and implement. Microformats are popular, yet currently limited in scope, and I also have a bit of an issue with using classes to describe data. Microdata seems like it will be the future for tagging content, but has limited support and is still in the early stages. Having said that, Google uses Microdata, and where Google goes, others follow.
As a fellow web developer, I completely agree with your assessment of the future of metadata on the web.
That being said, according to Google, all three of the types of semantic markup you've been discussing are actually supported to some extent, from what I read earlier this morning.
http://www.google.com/support/webmasters/bin/topic.py?topic=21997
Then again, HTML5 has various other benefits (iPhone video, simpler and more semantic markup, etc) in addition to having one of the most straightforward microdata markup schemes for SEO.
It's my opinion that this is the direction the web is going to be taking.
Hi James,
Thanks for the comment. You're right that Google currently supports all three types of semantic markup to varying extents, but this attempt at describing data still feels a little messy to me, with loads of different companies taking loads of different approaches and no de facto standard anywhere.
As you rightly point out though, HTML5, along with all its other features, also has some new elements such as 'nav', 'details' and 'menu', to name a few, which could help pave the way for making data more structured and easier to process. It's just a shame that, as web developers, we have to accommodate legacy browsers and wait for all browsers to become compliant.