Wikipedia Thesaurus

Wikipedia, a very large-scale Web-based dictionary, is an invaluable Web corpus for knowledge extraction. Its strengths go beyond scale: it is updated live, has a dense link structure, uses brief link texts, and identifies each concept by a URL. A number of early experiments strongly confirmed our conviction that Wikipedia is a notable Web corpus for knowledge extraction. As a first step, we extracted a Web thesaurus from Wikipedia. A thesaurus is a data structure that defines the semantic relatedness among words. The result is available at the following link.

Wikipedia Thesaurus Search

By analyzing 1.7 million concepts on Wikipedia, we constructed a very large-scale association thesaurus containing more than 78 million associations. Its accuracy is better than that of NLP-based methods because mining the link structure of a Web-based dictionary avoids the problems inherent in NLP. Have fun.
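
To make the idea of an association thesaurus concrete, here is a minimal sketch in Python. It is not the method used to build the Wikipedia Thesaurus: the toy link graph, the function names, and the choice of cosine similarity over outgoing-link sets are all assumptions made for illustration. It only shows how relatedness scores can be derived from link structure alone, without parsing any article text.

```python
from collections import defaultdict
from math import sqrt

# Toy link graph: concept -> set of concepts its article links to.
# These entries are invented for illustration, not real Wikipedia data.
LINKS = {
    "Apple": {"Fruit", "Tree", "Orchard"},
    "Pear": {"Fruit", "Tree"},
    "Fruit": {"Plant"},
    "Car": {"Engine", "Wheel"},
}


def relatedness(a, b, links):
    """Cosine similarity between the outgoing-link sets of two concepts.

    This measure is an assumption for the sketch, not the authors' measure.
    """
    out_a, out_b = links.get(a, set()), links.get(b, set())
    if not out_a or not out_b:
        return 0.0
    return len(out_a & out_b) / sqrt(len(out_a) * len(out_b))


def build_thesaurus(links, threshold=0.0):
    """Map each concept to its (related concept, score) pairs, strongest first."""
    thesaurus = defaultdict(list)
    concepts = sorted(links)
    for a in concepts:
        for b in concepts:
            if a != b:
                score = relatedness(a, b, links)
                if score > threshold:
                    thesaurus[a].append((b, score))
        # Accessing thesaurus[a] also creates an empty entry for concepts
        # that have no associations above the threshold.
        thesaurus[a].sort(key=lambda pair: pair[1], reverse=True)
    return dict(thesaurus)


thesaurus = build_thesaurus(LINKS)
```

In this toy graph, "Apple" and "Pear" share two of their outgoing links, so each appears as the other's only association, while "Car" shares none and ends up with an empty list. Comparing every pair of concepts as done here would not scale to millions of concepts; a real system would need an index over shared links instead.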

