Kikori: XML Information Retrieval System Using Relational Databases
Web search engines such as Google generally return a list of whole documents, so users must locate the relevant parts of each document by themselves. For long documents such as academic papers, finding those parts is hard work. Such documents are often structured: a document consists of chapters, a chapter consists of sections, a section consists of subsections or paragraphs, and so on. These sub-documents (document fragments) can serve as fine-grained query results for structured documents.
In recent years, XML has become the standard format for structured documents, and the retrieval of nested document fragments (elements) is referred to as XML-IR (XML Information Retrieval).
We have developed an XML-IR system on top of relational databases. Our main focus has been keyword-set queries, for which we developed Kikori-KS. We have also studied user interfaces for Kikori. Because INEX [1] is building a test collection for XML-IR systems, our prototype uses the XML documents provided by INEX.
Figure 1. An overview of the Kikori-KS system.
XML documents are stored in relational databases using a relational schema based on XRel [2]. In addition to the tables that hold the XML data, we added a table that stores term weights.
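The Term table in Figure 3 stores a "tfipf" weight for each (term, element) pair, but this page does not define the weight. The sketch below assumes a tf-ipf scheme: term frequency multiplied by an inverse path frequency computed like idf, but only among elements that share the same path. The function name, input format, and exact formula are assumptions for illustration, not the weighting used in Kikori-KS.

```python
import math
from collections import Counter


def tfipf_weights(elements):
    """Compute {(term, docID, elemID): weight} for a list of elements.

    elements: iterable of (docID, elemID, pathID, text) tuples (hypothetical format).
    Assumed weighting: tf * ipf, where tf is the term's relative frequency in the
    element and ipf = log(N_p / n_pt), with N_p the number of elements on path p
    and n_pt the number of those elements that contain term t.
    """
    elements = list(elements)
    # Naive tokenization: lowercase and split on whitespace.
    tokens = {(d, e): text.lower().split() for d, e, _, text in elements}

    # Count elements per path, and elements per (path, term) containing the term.
    path_count = Counter(p for _, _, p, _ in elements)
    path_term_count = Counter()
    for d, e, p, _ in elements:
        for term in set(tokens[(d, e)]):
            path_term_count[(p, term)] += 1

    weights = {}
    for d, e, p, _ in elements:
        terms = tokens[(d, e)]
        for term, freq in Counter(terms).items():
            tf = freq / len(terms)  # relative term frequency within the element
            ipf = math.log(path_count[p] / path_term_count[(p, term)])
            weights[(term, d, e)] = tf * ipf  # key order mirrors the Term table
    return weights
```

Under this assumed formula, a term that occurs in every element of a path gets weight 0 for that path, just as a term occurring in every document gets an idf of 0.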
Figure 3 shows an example storage instance for the XML tree in Figure 2.
The system automatically translates keyword sets into SQL statements and calculates the score of each relevant element.
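A minimal sketch of such a translation, assuming the Element, Path, and Term tables of Figure 3 and assuming that an element's score is the sum of its tfipf weights over the query keywords it contains. The helper names (build_sample_db, keywords_to_sql), the column types, and the scoring rule are hypothetical; the actual Kikori-KS SQL and scoring may differ.

```python
import sqlite3

# Sample rows copied from Figure 3 (Element, Path, and Term tables).
ELEMENTS = [(0, 0, 0, 1, 236, "database"), (0, 1, 1, 10, 44, "XML Index"),
            (0, 2, 2, 45, 68, "XML Index")]
PATHS = [(0, "#/article"), (1, "#/article#/transaction"), (2, "#/article#/title")]
TERMS = [("database", 0, 0, 0.3), ("database", 0, 1, 0.1),
         ("xml", 0, 0, 0.3), ("xml", 0, 2, 0.4)]


def build_sample_db():
    """Create an in-memory database with an assumed version of the Figure 3 schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
        CREATE TABLE Element (docID INTEGER, elemID INTEGER, pathID INTEGER,
                              st INTEGER, ed INTEGER, label TEXT);
        CREATE TABLE Term (term TEXT, docID INTEGER, elemID INTEGER, tfipf REAL);
    """)
    conn.executemany("INSERT INTO Element VALUES (?, ?, ?, ?, ?, ?)", ELEMENTS)
    conn.executemany("INSERT INTO Path VALUES (?, ?)", PATHS)
    conn.executemany("INSERT INTO Term VALUES (?, ?, ?, ?)", TERMS)
    return conn


def keywords_to_sql(keywords):
    """Translate a keyword set into a parameterized SQL query plus its parameters."""
    terms = sorted({k.lower() for k in keywords})  # a set: duplicates are ignored
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT e.docID, e.elemID, p.pathexp, SUM(t.tfipf) AS score "
        "FROM Term t "
        "JOIN Element e ON e.docID = t.docID AND e.elemID = t.elemID "
        "JOIN Path p ON p.pathID = e.pathID "
        f"WHERE t.term IN ({placeholders}) "
        "GROUP BY e.docID, e.elemID, p.pathexp "
        "ORDER BY score DESC"
    )
    return sql, terms
```

For the keyword set {XML, database} on the Figure 3 rows, `conn.execute(*keywords_to_sql({"XML", "database"}))` ranks element 0 (the article root) first, because it carries weights for both terms. Binding the keywords as parameters rather than splicing them into the SQL string keeps user input out of the statement text.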
Figure 2. An example of an XML tree.
Document

Element
docID | elemID | pathID | st | ed | label
0 | 0 | 0 | 1 | 236 | database
0 | 1 | 1 | 10 | 44 | XML Index
0 | 2 | 2 | 45 | 68 | XML Index
: | : | : | : | : | :

Outline
docID | elemID | pathID | st | ed | label
0 | 4 | 4 | 75 | 143 | Introduction
0 | 7 | 4 | 144 | 219 | XML Labeling

Path
pathID | pathexp
0 | #/article
1 | #/article#/transaction
2 | #/article#/title
: | :

Term
term | docID | elemID | tfipf
database | 0 | 0 | 0.3
database | 0 | 1 | 0.1
: | : | : | :
xml | 0 | 0 | 0.3
xml | 0 | 2 | 0.4
: | : | : | :

Figure 3. A storage example.
Query processing can be sped up by creating a materialized view that stores the result of joining the tables in Figure 3.
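A minimal sketch of this idea using Python's built-in sqlite3 module. SQLite has no materialized views, so the sketch emulates one with CREATE TABLE ... AS SELECT plus an index on the term column; systems such as PostgreSQL provide CREATE MATERIALIZED VIEW directly. The table name TermView, the helper names, and the summed-tfipf scoring are assumptions, not the Kikori-KS implementation.

```python
import sqlite3


def build_sample_db():
    """In-memory database holding the Figure 3 sample rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
        CREATE TABLE Element (docID INTEGER, elemID INTEGER, pathID INTEGER,
                              st INTEGER, ed INTEGER, label TEXT);
        CREATE TABLE Term (term TEXT, docID INTEGER, elemID INTEGER, tfipf REAL);
        INSERT INTO Path VALUES (0, '#/article'), (1, '#/article#/transaction'),
                                (2, '#/article#/title');
        INSERT INTO Element VALUES (0, 0, 0, 1, 236, 'database'),
                                   (0, 1, 1, 10, 44, 'XML Index'),
                                   (0, 2, 2, 45, 68, 'XML Index');
        INSERT INTO Term VALUES ('database', 0, 0, 0.3), ('database', 0, 1, 0.1),
                                ('xml', 0, 0, 0.3), ('xml', 0, 2, 0.4);
    """)
    return conn


def materialize(conn):
    """Compute the Term-Element-Path join once and store it (emulated materialized view)."""
    conn.executescript("""
        DROP TABLE IF EXISTS TermView;
        CREATE TABLE TermView AS
            SELECT t.term, t.docID, t.elemID, p.pathexp, e.st, e.ed, t.tfipf
            FROM Term t
            JOIN Element e ON e.docID = t.docID AND e.elemID = t.elemID
            JOIN Path p ON p.pathID = e.pathID;
        CREATE INDEX TermView_term ON TermView (term);
    """)


def search(conn, keywords):
    """Rank elements for a keyword set by reading only the precomputed view (no joins)."""
    terms = sorted({k.lower() for k in keywords})
    placeholders = ", ".join("?" for _ in terms)
    return conn.execute(
        "SELECT docID, elemID, pathexp, SUM(tfipf) AS score FROM TermView "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY docID, elemID, pathexp ORDER BY score DESC",
        terms,
    ).fetchall()
```

The trade-off is the usual one for materialized views: each query avoids the joins, but the stored table takes extra space and must be rebuilt (here, by calling `materialize` again) whenever the underlying tables change.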
We have developed a user-friendly interface for displaying search results. To make browsing easier and to let users quickly locate the relevant document fragments (elements), result elements are grouped by document, and outline elements (sections and subsections, in the case of scholarly articles) are displayed together with the relevant elements.
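A minimal sketch of this grouping, assuming that st and ed are region labels in the style of XRel, so that an outline element contains a result element when its region encloses the result's region. In Figure 3, for example, the "Introduction" outline element spans positions 75 to 143. The Region class, the aggregate function, and the ordering of documents by their best element score are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Region:
    """An element identified by document and region (st, ed), as in Figure 3."""
    docID: int
    elemID: int
    st: int
    ed: int
    label: str = ""
    score: float = 0.0


def contains(outer, inner):
    """True if outer's region encloses inner's (ancestor-or-self in the same document)."""
    return outer.docID == inner.docID and outer.st <= inner.st and inner.ed <= outer.ed


def aggregate(results, outlines):
    """Group result elements by document and attach them to enclosing outline elements.

    Returns a list of (docID, ranked result elemIDs, [(outline label, [elemIDs])]),
    with documents ordered by their best element score (an assumption).
    """
    by_doc = defaultdict(list)
    for r in results:
        by_doc[r.docID].append(r)

    pages = []
    for doc, elems in sorted(by_doc.items(), key=lambda kv: -max(e.score for e in kv[1])):
        ranked = sorted(elems, key=lambda e: -e.score)
        outline = []
        # Outline elements in document order, each with the results it encloses.
        for o in sorted((o for o in outlines if o.docID == doc), key=lambda o: o.st):
            outline.append((o.label, [e.elemID for e in ranked if contains(o, e)]))
        pages.append((doc, [e.elemID for e in ranked], outline))
    return pages
```

Because containment is a simple comparison of st and ed values, the outline for a result page can be assembled without traversing the XML tree.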
Anchor texts of elements with high scores are displayed in a larger font.
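The page does not say how scores map to font sizes. One simple possibility, sketched here as an assumption, is linear scaling between a minimum and a maximum point size; the helper name font_sizes and the 10 to 20 pt range are hypothetical.

```python
def font_sizes(scores, min_pt=10, max_pt=20):
    """Map {element: score} to {element: font size in pt} by linear scaling.

    Hypothetical helper: the lowest score gets min_pt, the highest gets max_pt.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: avoid dividing by zero
        return {k: max_pt for k in scores}
    return {k: round(min_pt + (s - lo) / (hi - lo) * (max_pt - min_pt))
            for k, s in scores.items()}
```

For example, `font_sizes({"a": 0.0, "b": 0.5, "c": 1.0})` returns sizes 10, 15, and 20 pt.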
By clicking an anchor text, users can browse the corresponding document fragment, highlighted within the whole document.
Figure 4. FetchHighlight interface.
Figure 5. Browsing a document fragment.
In addition, we have developed another type of user interface for XML documents created by marking up documents that were originally composed of pages, such as scholarly articles or books. In this interface, result elements are overlaid on the physical page layout.
Figure 6. Presentation of search results using page layout.
- Toshiyuki Shimizu and Masatoshi Yoshikawa, "XML Information Retrieval Considering Physical Page Layout of Logical Elements," 10th International Workshop on the Web and Databases (WebDB 2007), Beijing, China, June 15, 2007. (demo)
- Toshiyuki Shimizu, Norimasa Terada, and Masatoshi Yoshikawa, "Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML," 9th International Conference on Asian Digital Libraries (ICADL 2006), Lecture Notes in Computer Science (LNCS), Springer-Verlag, Vol. 4312, pp. 390-399, Kyoto, Japan, November 27-30, 2006. (46/170 = 27%)
- Kei Fujimoto, Toshiyuki Shimizu, Norimasa Terada, Kenji Hatano, Yu Suzuki, Toshiyuki Amagasa, Hiroko Kinutani, and Masatoshi Yoshikawa, "Implementation of a High-Speed and High-Precision XML Information Retrieval System on Relational Databases," 4th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2005), Lecture Notes in Computer Science (LNCS), Springer-Verlag, Vol. 3977, pp. 254-267, Dagstuhl Castle, Germany, November 28-30, 2005.

[1] INEX, "INitiative for the Evaluation of XML Retrieval," http://inex.is.informatik.uniduisburg.de/.
[2] M. Yoshikawa, T. Amagasa, T. Shimura, and S. Uemura, "XRel: A path-based approach to storage and retrieval of XML documents using relational database," ACM Trans. on Internet Technology, vol. 1, no. 1, pp. 110-141, Aug. 2001.

Last updated on Mar 9, 2008.