Abstract
We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible to both new languages and new domains. To achieve this, we use XML as our common exchange format, and the monolingual NERC components use a combination of rule-based and machine-learning techniques. Processing web pages has been challenging because they contain heavily structured data in which text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.
Original language | English |
---|---|
Title of host publication | Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002) |
Pages | 1060-1067 |
Number of pages | 8 |
Publication status | Published - 2002 |
Externally published | Yes |
Event | 3rd International Conference on Language Resources and Evaluation - Las Palmas, Canary Islands, Spain. Duration: 29 May 2002 → 31 May 2002 |
Conference
Conference | 3rd International Conference on Language Resources and Evaluation |
---|---|
Abbreviated title | LREC 2002 |
Country/Territory | Spain |
City | Las Palmas, Canary Islands |
Period | 29/05/02 → 31/05/02 |
ASJC Scopus subject areas
- Linguistics and Language
- Language and Linguistics
- Education
- Library and Information Sciences