Multilingual XML-based named entity recognition for E-retail domains

Claire Grover, Scott McDonald, Donnla Nic Gearailt, Vangelis Karkaletsis, Dimitra Farmakiotou, Georgios Samaritakis, Georgios Petasis, Maria Teresa Pazienza, Michele Vindigni, Frantz Vichot, Francis Wolinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system which is currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible, both to new languages and new domains. To achieve this aim we use XML as our common exchange format and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages which contain heavily structured data where text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.
Original language: English
Title of host publication: Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002)
Pages: 1060-1067
Number of pages: 8
Publication status: Published - 2002
Externally published: Yes
Event: 3rd International Conference on Language Resources and Evaluation - Las Palmas, Canary Islands, Spain
Duration: 29 May 2002 to 31 May 2002

Conference

Conference: 3rd International Conference on Language Resources and Evaluation
Abbreviated title: LREC 2002
Country/Territory: Spain
City: Las Palmas, Canary Islands
Period: 29/05/02 to 31/05/02

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences
