
Classifying Web Pages by Using Knowledge Bases for Entity Retrieval

We propose a novel method for classifying Web pages by using knowledge bases for entity search, a typical kind of Web search that looks for information related to a person, location, or organization. Concretely, we apply the following procedure.

kiritani.jpg

  1. We obtain a set of Web pages as search results by using a search engine and an entity set based on a knowledge base.
  2. Web pages are mapped to entities according to a correspondence degree expressing how strongly a page corresponds to an entity. We compute this degree based on either similarity or frequency.
  3. We construct a PEC graph based on the pages, entities, classes, and their relations, as obtained from steps 1 and 2 and from the knowledge base.
  4. By analyzing the graph, the pages are classified into classes according to a correspondence degree expressing how strongly a page corresponds to a class. We compute this degree based on the distinguishability of pages and the universality of classes.
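
The four steps above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the toy knowledge base `KB`, the in-memory `PAGES` dictionary that stands in for search engine results, the function names `build_pec_graph` and `classify`, and the scoring itself. In particular, the page-to-entity degree uses plain relative mention frequency, and the page-to-class degree is a simple weight propagation that stands in for the distinguishability and universality measures, whose definitions are not given here.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: entity -> classes it belongs to.
KB = {
    "Kyoto": ["City"],
    "Osaka": ["City"],
    "Natsume Soseki": ["Writer"],
}

# Hypothetical stand-in for step 1: pages returned by a search engine
# when queried with the entities of the knowledge base.
PAGES = {
    "p1": "Kyoto travel guide. Kyoto temples and Osaka day trips.",
    "p2": "Natsume Soseki wrote Kokoro. Natsume Soseki was born in Tokyo.",
    "p3": "Osaka food. Osaka castle.",
}


def build_pec_graph(pages, kb):
    """Steps 2 and 3: build page-entity and entity-class edges.

    The page-entity weight is the relative frequency of entity mentions
    in the page text (one assumed reading of "frequency").
    """
    page_entity = defaultdict(dict)
    for page_id, text in pages.items():
        counts = {entity: text.count(entity) for entity in kb}
        total = sum(counts.values())
        for entity, count in counts.items():
            if count > 0:  # pages mentioning no entity get no edges
                page_entity[page_id][entity] = count / total
    entity_class = {entity: list(classes) for entity, classes in kb.items()}
    return page_entity, entity_class


def classify(page_entity, entity_class):
    """Step 4 (placeholder): propagate page-entity weights to classes.

    Each entity splits its weight evenly over its classes, and the page
    is assigned the class with the highest accumulated weight.
    """
    result = {}
    for page_id, entities in page_entity.items():
        scores = defaultdict(float)
        for entity, weight in entities.items():
            classes = entity_class[entity]
            for cls in classes:
                scores[cls] += weight / len(classes)
        result[page_id] = max(scores, key=scores.get)
    return result


page_entity, entity_class = build_pec_graph(PAGES, KB)
print(classify(page_entity, entity_class))
# {'p1': 'City', 'p2': 'Writer', 'p3': 'City'}
```

In this sketch the graph is kept as two edge maps (page to entity, entity to class) rather than as a single graph object, which keeps the example short; a graph library could be used instead if richer analysis is needed.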
project/en/kiritani.txt · Last modified: 2011/11/25 05:00 by ylab