Kikori: XML Information Retrieval System Using Relational Databases
Web search engines such as Google generally return a list of documents, so users must locate the relevant parts of each document themselves. For long documents such as academic papers, this is laborious. Such documents usually have an internal structure: a document consists of chapters, a chapter consists of sections, a section consists of subsections or paragraphs, and so on. These sub-documents (document fragments) can serve as fine-grained query results for structured documents. In recent years, XML has become a standard format for structured documents, and the retrieval of nested document fragments (elements) can be regarded as XML-IR (XML Information Retrieval). We have developed an XML-IR system that uses relational databases. We focused mainly on queries given as keyword sets and developed Kikori-KS. We also studied user interfaces for Kikori. Because INEX [1] is developing a test collection for XML-IR systems, our prototype system uses the XML documents provided by INEX.

Figure 1. An overview of the Kikori-KS system.
XML documents are stored in relational databases using a relational schema based on XRel [2]. In addition to the tables for the XML data, we prepared a table for storing term weight information. Figure 3 shows an example storage instance for the XML tree in Figure 2. The system automatically translates keyword sets into SQL statements and calculates the score of each relevant element.

Figure 2. An example of an XML tree.
docID  file
0      doc1.xml

docID  elemID  pathID  st  ed   label
0      0       0       1   236  database
0      1       1       10  44   XML Index
0      2       2       45  68   XML Index
:      :       :       :   :    :

docID  elemID  pathID  st   ed   label
0      4       4       75   143  Introduction
0      7       4       144  219  XML Labeling

pathID  pathexp
0       #/article
1       #/article#/transaction
2       #/article#/title
:       :

term      docID  elemID  tfipf
database  0      0       0.3
database  0      1       0.1
:         :      :       :
xml       0      0       0.3
xml       0      2       0.4
:         :      :       :
Figure 3. A storage example.
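
As a rough illustration of the translation step, the following sketch loads the rows shown in Figure 3 into an in-memory SQLite database and turns a keyword set into one SQL statement. The table names `element` and `term_weight`, the function names, and the scoring rule (summing `tfipf` over the matched keywords) are assumptions made for this sketch; the SQL actually generated by Kikori-KS and its scoring function are not described here.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding only the rows visible in Figure 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE element (docID INTEGER, elemID INTEGER, pathID INTEGER,
                              st INTEGER, ed INTEGER, label TEXT);
        CREATE TABLE term_weight (term TEXT, docID INTEGER,
                                  elemID INTEGER, tfipf REAL);
        """
    )
    conn.executemany(
        "INSERT INTO element VALUES (?, ?, ?, ?, ?, ?)",
        [
            (0, 0, 0, 1, 236, "database"),
            (0, 1, 1, 10, 44, "XML Index"),
            (0, 2, 2, 45, 68, "XML Index"),
        ],
    )
    conn.executemany(
        "INSERT INTO term_weight VALUES (?, ?, ?, ?)",
        [
            ("database", 0, 0, 0.3),
            ("database", 0, 1, 0.1),
            ("xml", 0, 0, 0.3),
            ("xml", 0, 2, 0.4),
        ],
    )
    return conn


def keyword_set_to_sql(keywords):
    """Translate a keyword set into a parameterized SQL statement.

    Assumed scoring rule: an element's score is the sum of the tfipf
    weights of the query keywords it contains.
    """
    if not keywords:
        raise ValueError("keyword set must not be empty")
    placeholders = ", ".join("?" for _ in keywords)
    sql = (
        "SELECT t.docID, t.elemID, e.label, SUM(t.tfipf) AS score "
        "FROM term_weight AS t "
        "JOIN element AS e ON e.docID = t.docID AND e.elemID = t.elemID "
        f"WHERE t.term IN ({placeholders}) "
        "GROUP BY t.docID, t.elemID, e.label "
        "ORDER BY score DESC, t.elemID ASC"
    )
    # Terms are stored in lower case in the sample data, so normalize the query.
    return sql, [k.lower() for k in keywords]


def search(conn, keywords):
    """Run the generated SQL and return (docID, elemID, label, score) rows."""
    sql, params = keyword_set_to_sql(keywords)
    return conn.execute(sql, params).fetchall()
```

Calling `search(build_sample_db(), ["database", "XML"])` ranks the elements of `doc1.xml` by their summed weights. Keeping the keywords as bound parameters rather than splicing them into the SQL text avoids quoting problems in user-supplied queries.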

Query processing can be accelerated by creating a materialized view that holds the result of joining the tables in Figure 3.
We have developed a user-friendly interface for displaying search results. To make browsing easier and to let users locate the relevant document fragments (elements) quickly, result elements are grouped by document, and outline elements (sections and subsections in the case of scholarly articles) are displayed together with the relevant elements. The anchor text of an element with a high score is shown in a larger font. By clicking an anchor text, users can browse the corresponding document fragment, highlighted within the document.
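
The idea of precomputing the join can be sketched as follows. SQLite has no native materialized views, so this sketch emulates one with `CREATE TABLE ... AS SELECT` plus an index; a DBMS with native support would use its own statement instead. The table names (`document`, `element`, `path`, `term_weight`, `term_element_mv`), the choice of joined columns, and the helper functions are assumptions for illustration, and the data are only the rows visible in Figure 3.

```python
import sqlite3

# Hypothetical schema and sample rows taken from Figure 3.
SCHEMA_AND_DATA = """
CREATE TABLE document (docID INTEGER, file TEXT);
CREATE TABLE element (docID INTEGER, elemID INTEGER, pathID INTEGER,
                      st INTEGER, ed INTEGER, label TEXT);
CREATE TABLE path (pathID INTEGER, pathexp TEXT);
CREATE TABLE term_weight (term TEXT, docID INTEGER, elemID INTEGER, tfipf REAL);

INSERT INTO document VALUES (0, 'doc1.xml');
INSERT INTO element VALUES (0, 0, 0, 1, 236, 'database'),
                           (0, 1, 1, 10, 44, 'XML Index'),
                           (0, 2, 2, 45, 68, 'XML Index');
INSERT INTO path VALUES (0, '#/article'),
                        (1, '#/article#/transaction'),
                        (2, '#/article#/title');
INSERT INTO term_weight VALUES ('database', 0, 0, 0.3),
                               ('database', 0, 1, 0.1),
                               ('xml', 0, 0, 0.3),
                               ('xml', 0, 2, 0.4);
"""

# Emulated materialized view: the join is computed once and stored as a table.
MATERIALIZE = """
DROP TABLE IF EXISTS term_element_mv;
CREATE TABLE term_element_mv AS
SELECT t.term, t.docID, t.elemID, t.tfipf,
       e.st, e.ed, e.label, p.pathexp, d.file
FROM term_weight AS t
JOIN element  AS e ON e.docID = t.docID AND e.elemID = t.elemID
JOIN path     AS p ON p.pathID = e.pathID
JOIN document AS d ON d.docID = e.docID;
CREATE INDEX term_element_mv_term ON term_element_mv (term);
"""


def build_materialized_join(conn):
    """(Re)build the precomputed join; rerun after the base tables change."""
    conn.executescript(MATERIALIZE)


def lookup(conn, term):
    """Answer a single-term lookup from the precomputed table, with no joins."""
    return conn.execute(
        "SELECT elemID, pathexp, file, tfipf FROM term_element_mv "
        "WHERE term = ? ORDER BY tfipf DESC",
        (term,),
    ).fetchall()
```

The trade-off is the usual one for materialization: queries read a single indexed table, while the stored join has to be rebuilt or refreshed whenever documents are added.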

Figure 4. FetchHighlight interface.

Figure 5. Browsing document fragment.
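
The grouping behind this result display can be sketched roughly as follows. The input tuple layout, the linear score-to-font-size mapping, the point sizes, and the ordering of documents by their best element score are all assumptions for illustration; the interface's actual rules are not given here.

```python
from collections import defaultdict


def font_size(score, max_score, min_pt=10, max_pt=20):
    """Map a score linearly to a font size in points (hypothetical scale).

    Outline elements without a score get the minimum size.
    """
    if score is None or max_score <= 0:
        return min_pt
    return round(min_pt + (max_pt - min_pt) * score / max_score)


def aggregate_by_document(results, outlines):
    """Group scored elements by document and merge in outline elements.

    results:  list of (docID, elemID, st, label, score) for relevant elements
    outlines: dict mapping docID to a list of (elemID, st, label) for
              sections and subsections
    Returns one entry per document, best-scoring document first, with the
    elements of each document listed in document order (by start position).
    """
    max_score = max((score for *_, score in results), default=0.0)

    per_doc = defaultdict(dict)
    for doc_id, elem_id, st, label, score in results:
        per_doc[doc_id][elem_id] = {
            "elemID": elem_id, "st": st, "label": label, "score": score,
        }

    listing = []
    for doc_id, entries in per_doc.items():
        # Add outline elements that are not themselves results.
        for elem_id, st, label in outlines.get(doc_id, []):
            entries.setdefault(
                elem_id,
                {"elemID": elem_id, "st": st, "label": label, "score": None},
            )
        ordered = sorted(entries.values(), key=lambda e: e["st"])
        for entry in ordered:
            entry["font_pt"] = font_size(entry["score"], max_score)
        best = max(e["score"] for e in ordered if e["score"] is not None)
        listing.append({"docID": doc_id, "best_score": best, "entries": ordered})

    listing.sort(key=lambda d: d["best_score"], reverse=True)
    return listing
```

Each entry keeps its start position `st`, which a front end could use to jump to and highlight the fragment when its anchor text is clicked.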

In addition, we have developed another type of user interface for XML documents created by marking up documents that were originally composed of pages, such as scholarly articles or books. In this interface, result elements are overlaid on the physical page layout.

Figure 6. Presentation of search results using page layout.
[1] INEX. "INitiative for the Evaluation of XML Retrieval," http://inex.is.informatik.uniduisburg.de/.
[2] M. Yoshikawa, T. Amagasa, T. Shimura, and S. Uemura, "XRel: A path-based approach to storage and retrieval of XML documents using relational databases," ACM Trans. on Internet Technology, vol.1, no.1, pp.110-141, Aug. 2001.