A Cognitive Model of Chinese Word Segmentation for Machine Translation

Author/Editor: Wu, Zhijie

Year of publication: 2011

Keywords: Chinese word segmentation, machine translation, pragmatically-oriented language, contextual information, cognitive model

Place of Publication & Publisher: Meta (Montreal: Les Presses de l’Université de Montréal), Vol. 56, No. 3, pp. 631-644.

Publisher URL: http://www.erudit.org/revue/meta/2011/v56/n3/index.html

ISBN/ISSN: 0026-0452

Chinese, unlike English, is written without marked word boundaries, and Chinese word segmentation is often described as the bottleneck for Chinese-English machine translation. Current word-segmentation systems in machine translation are either linguistically oriented or statistically oriented. Chinese, however, is a pragmatically oriented language, which explains why existing Chinese word-segmentation systems in machine translation have not been successful in dealing with it. Drawing on a language investigation consisting of two surveys and eight interviews, and on its findings about how Chinese readers segment a sentence into words while reading, we have developed a new word-segmentation model that aims to address the word-segmentation problem in machine translation from a cognitive perspective.

