Machine learning::Infobox



Machine learning

About 44.2% of Wikipedia articles contained an infobox in 2008, and about 33% in 2010. Automated semantic knowledge extraction using machine learning algorithms is used to "extract machine-processable information at a relatively low complexity cost". However, the low coverage makes extraction more difficult, though this can be partially overcome by complementing article data with data from the categories in which the article is included. The French Wikipedia initiated the project Infobox Version 2 in May 2011.<ref>The project is hosted on the French Wikipedia page Infobox/V2.</ref>

Knowledge obtained by machine learning can be used to improve an article, for example through automated software that suggests infobox data for editors to add. The iPopulator project created a system that adds a value to an article's infobox parameter by automatically parsing the text of that article.

DBpedia uses structured content extracted from infoboxes by machine learning algorithms to create a resource of linked data in a Semantic Web; it has been described by Tim Berners-Lee as "one of the more famous" components of the linked data project.

Machine extraction creates a triple consisting of a subject, a predicate or relation, and an object. Each attribute-value pair of the infobox is used to create an RDF statement using an ontology. This is facilitated by the narrower gap between Wikipedia and an ontology than exists between unstructured or free text and an ontology.
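As a rough illustration of how one attribute-value pair might become an RDF statement, the sketch below serializes a pair as a single N-Triples line. The namespaces `RESOURCE_NS` and `PROPERTY_NS` and the helper `to_ntriple` are hypothetical names chosen for this example; they are not taken from DBpedia or any other project, and real extractors typically emit typed literals or resource links rather than plain strings.

```python
from urllib.parse import quote

# Hypothetical namespaces, used for illustration only.
RESOURCE_NS = "http://example.org/resource/"
PROPERTY_NS = "http://example.org/property/"


def to_ntriple(subject: str, attribute: str, value: str) -> str:
    """Serialize one infobox attribute-value pair as an N-Triples line."""
    # Build IRIs for the subject and predicate; spaces become underscores.
    s = f"<{RESOURCE_NS}{quote(subject.replace(' ', '_'))}>"
    p = f"<{PROPERTY_NS}{quote(attribute.replace(' ', '_'))}>"
    # Escape backslashes, double quotes, and newlines inside the literal.
    literal = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'{s} {p} "{literal}" .'


print(to_ntriple("Crostata", "type", "Tart"))
```

The value is kept as a plain string literal here for simplicity; mapping it to another resource or to a typed value would depend on the ontology in use.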

The semantic relationship between the subject and object is established by the predicate. In the example infobox, the triple ("crostata", type, "tart") indicates that a crostata is a type of tart. The article's topic is used as the subject, the parameter name is used as the predicate, and the parameter's value as the object. Each type of infobox is mapped to an ontology class, and each property (parameter) within an infobox is mapped to an ontology property. These mappings are used when parsing a Wikipedia article to extract data.
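The mapping step described above can be sketched in a few lines of Python. This is a minimal sketch, not the method of any particular extractor: the tables `CLASS_MAP` and `PROPERTY_MAP`, the function `extract_triples`, and the sample wikitext are all assumptions made for illustration, and the parser only handles a flat infobox with no nested templates or links containing pipes.

```python
import re

# Hypothetical mappings from infobox types and parameters to ontology terms.
CLASS_MAP = {"Infobox food": "Food"}
PROPERTY_MAP = {"Infobox food": {"type": "type", "country": "origin"}}

# Illustrative wikitext; assumes a flat infobox without nested templates.
SAMPLE = """{{Infobox food
| name = Crostata
| type = Tart
| country = Italy
}}"""


def extract_triples(title, wikitext):
    """Return (ontology_class, triples) for the first infobox in wikitext."""
    match = re.search(r"\{\{\s*(Infobox[^|}]*)\|(.*?)\}\}", wikitext, re.DOTALL)
    if match is None:
        return None, []
    infobox_type = match.group(1).strip()
    properties = PROPERTY_MAP.get(infobox_type, {})
    triples = []
    for field in match.group(2).split("|"):
        name, sep, value = field.partition("=")
        name, value = name.strip(), value.strip()
        # Keep only non-empty parameters that have an ontology mapping.
        if sep and value and name in properties:
            # Article topic is the subject, mapped parameter the predicate,
            # and the parameter's value the object.
            triples.append((title, properties[name], value))
    return CLASS_MAP.get(infobox_type), triples


ontology_class, triples = extract_triples("Crostata", SAMPLE)
print(ontology_class, triples)
```

Parameters without an entry in the mapping table (such as `name` here) are skipped, which mirrors the idea that only mapped properties produce ontology statements.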

Many Wikipedia infoboxes also include microformat markup, making the text rendered on the page readable by software.<ref>Wikipedia:WikiProject Microformats</ref>
