Collocation discovery for optimal bilingual lexicon development

Scott McDonald, Davide Turcato, Paul McFetridge, Fred Popowich, Janine Toole

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The accurate translation of collocations, or multi-word units, is essential for high quality machine translation. However, many collocations do not translate compositionally, thus requiring individual entries in the bilingual lexicon. We present a technique for collocation extraction from large corpora that takes into account the dispersion of the collocations throughout the corpus. Collocations are ranked to more accurately reflect how likely they are to occur in a wide variety of texts; collocations which are specific to a particular text are less useful for lexicon development. Once the collocations are extracted, appropriate bilingual lexical entries can be developed by lexicographers.
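
The abstract does not state the ranking measure, so the following is only a minimal sketch of the general idea of dispersion-aware ranking, not the authors' method. It assumes collocations are adjacent word pairs and scores each pair as its total frequency multiplied by the fraction of texts it appears in; the function name `rank_collocations`, the `min_count` threshold, and the toy corpus are all hypothetical.

```python
from collections import Counter


def rank_collocations(documents, min_count=2):
    """Rank adjacent word pairs by frequency scaled by dispersion.

    documents: list of token lists, one list per text.
    Score = total count * (texts containing the pair / number of texts).
    This weighting is an illustrative assumption, not the paper's measure.
    """
    total = Counter()     # occurrences of each pair across the whole corpus
    doc_freq = Counter()  # number of texts in which each pair occurs
    for tokens in documents:
        bigrams = list(zip(tokens, tokens[1:]))
        total.update(bigrams)
        doc_freq.update(set(bigrams))  # count each pair once per text

    n_docs = len(documents)
    scored = []
    for pair, count in total.items():
        if count < min_count:
            continue  # skip pairs too rare to judge
        dispersion = doc_freq[pair] / n_docs
        scored.append((pair, count * dispersion))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy corpus (hypothetical): "red herring" is frequent but confined to one text.
docs = [
    "the prime minister spoke".split(),
    "the prime minister left".split(),
    "red herring red herring red herring".split(),
]
ranked = rank_collocations(docs)
for pair, score in ranked:
    print(" ".join(pair), round(score, 3))
```

In this toy corpus "prime minister" occurs twice across two texts, while "red herring" occurs three times in a single text, so the dispersion weighting ranks "prime minister" higher despite its lower raw count.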
Original language: English
Title of host publication: Proceedings of 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence (AI 2000)
Editors: Howard J. Hamilton
Publisher: Springer Verlag
Pages: 126-137
Number of pages: 12
ISBN (Print): 9783540675570
Publication status: Published - 19 May 2000
Externally published: Yes
Event: 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence: Advances in Artificial Intelligence - Montreal, Canada
Duration: 14 May 2000 to 17 May 2000

Publication series

Name: Lecture Notes in Computer Science
Volume: 1822
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence: Advances in Artificial Intelligence
Abbreviated title: AI 2000
Country/Territory: Canada
City: Montreal
Period: 14/05/00 to 17/05/00

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
