Wikipedia™ is a fantastic resource for college students and the curious internet-goer: now, thanks to Lexalytics, its enormous potential is available for use in text mining. Just as we were first to market with our extraordinary sentiment analysis software back in 2004, Lexalytics is proud to introduce our revolutionary Concept Matrix, the first semantic analysis technology based on the contents of Wikipedia™.
The Concept Matrix utilizes the links between Wikipedia™ articles to develop a matrix of semantic associations between keywords and phrases, forming a comprehensive web of large concepts, the topics and entities that branch off of each concept, and the links between all of these articles.
Think of it this way: each article on Wikipedia™ contains dozens of links to other articles, all related to the concept of politics. Each of those articles, in turn, contains links to other pages, and so on. The closer an article is in this chain to the original topic, the more closely related it is to that topic, and the stronger the association is.
For example: the “Politics” article on Wikipedia™ contains in its second sentence a link to an article on “Governance”; this article in turn contains a link to the “Corporation” entry, which in turn includes links that discuss the legal influence of large corporations. The Concept Matrix recognizes the connections between all of these articles and the topics they discuss; this knowledge then comes in handy when analyzing a large collection of Tweets to determine consumer sentiment towards a large multinational.
We processed the top 640,000 articles on Wikipedia™ for their semantic associations. From our analysis, we found 1,100,000 words and bi-grams (two-word combinations) that have 56,000,000 links between them. Based on these links, we developed a matrix of associations and relations that form the basis for a number of interesting technologies.