Towards Constructing a Chinese Information Extraction System to Support Innovations in Library Services

Information System Department, Library of the Chinese Academy of Sciences (LCAS), Information Technology Section of IFLA, zhangzhx{at}mail.las.ac.cn, Digital Library Research and Development of the Library Society of China, Graduate University of the Chinese Academy of Sciences
Library of Chinese Academy of Sciences, majoring in information extraction, adam.li{at}sap.com
Information System Department of the Library of the Chinese Academy of Sciences, wuzx{at}mail.las.ac.cn
Library of Chinese Academy of Sciences, liny{at}lib.bnu.edu.cn

Recognizing the importance of Information Extraction (IE) in supporting innovation in many areas of library services, the authors set out to construct a Chinese information extraction system that can effectively process large volumes of Chinese information resources. They propose a Chinese IE solution that builds on the GATE (General Architecture for Text Engineering) system from the University of Sheffield, developing a Chinese IE plug-in within the GATE framework. The article analyses the framework of the GATE system, describes the Chinese IE solution based on it, and focuses on three key difficulties in implementing a Chinese information extraction system: (1) Chinese tokenization; (2) professional gazetteers; (3) Chinese named entity recognition. The authors have implemented the system and carried out an experiment in which it successfully extracted information from thousands of science and technology news items. The authors regard the system as a significant trial that lays a good foundation for future research.
Key Words: information extraction; Chinese language; natural language processing; General Architecture for Text Engineering; GATE; innovation
IFLA Journal, Vol. 33, No. 4, 340-350 (2007)