XPath-wrapper induction for data extraction
Proceedings - 2010 International Conference on Asian Language Processing, IALP 2010. Year 2010 (pages 150-153)
DOI: 10.1109/IALP.2010.33
Document category: Scopus
Conference Paper
English
Keywords: Amount of information; Data extraction; Human being; Structured information; Template-based; User query; Wrapper induction; Natural language processing systems
English abstract
The Web contains an enormous amount of information that is formatted for human beings. This makes it difficult for computers to extract relevant content from various sources. This paper presents an XPath-wrapper induction algorithm that leverages user queries and template-based sites to extract structured information. Our experiments show an average accuracy of 94%. © 2010 IEEE.