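Sharing 《镜子大全》 and 《朝华午拾》 http://blog.sciencenet.cn/u/liwei999 Former Little Red Guard, later sent down to the countryside to "repair the earth"; left home and country in 1991, whereabouts unsettled ever since.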


ALPAC Black Book 4/9: Machine-Aided Translation

2013-10-6 15:54 | Personal category: Liwei's Popular Science | System category: Research Notes | ALPAC

Machine-Aided Translation at Mannheim and Luxembourg


As it became increasingly evident that fully automatic high-quality machine translation was not going to be realized for a long time, interest began to be shown in machine-aided translation. The Committee has knowledge of two important machine-aided translation systems in operation: the Federal Armed Forces Translation Agency, Mannheim, Germany, and the Terminological Bureau of the European Coal and Steel Community, Luxembourg. At these centers the approach is conservative; a machine is used to produce specialized glossaries helpful in the translation of particular documents. (Although the translation system in operation at the USAF Foreign Technology Division, Wright-Patterson Air Force Base, is being called, with increasing frequency, “machine-aided translation,” it is actually a system of human-aided machine translation, relying, as it must, on posteditors to make up for the deficiencies of the machine output.)


MACHINE-AIDED TRANSLATION AT THE FEDERAL ARMED FORCES TRANSLATION AGENCY, MANNHEIM, GERMANY

The Federal Armed Forces Translation Agency conducted an experiment designed to determine to what extent and in what areas machine output could aid the human translator. Two translators were given identical English texts to be translated into German. Neither translator was a specialist in the technical field treated in the text. Translator A had the conventional dictionaries and other reference works found in technical libraries and access to experienced experts. Translator B was given only a text-based or text-related glossary (TRG) that listed all and only the technical terms in the original text in the sequence in which they occurred plus their German equivalent or equivalents. To minimize any differences in the translators' abilities, a second text was translated in which translator A used the TRG and translator B worked in the conventional way.

The procedure above was repeated with two different translators and two different technical texts. Results of the test indicated that a translator working with conventional aids requires between 50 and 86 percent (average, 66 percent) more time than a translator working with a text-related glossary. In addition to increased speed, another advantage of the TRG type of translation was that translators using this method made one third fewer errors.
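To make the timing figures concrete, here is a small arithmetic sketch. It assumes the percentages mean that conventional work takes (1 + p/100) times as long as TRG-assisted work on the same text; the function name is illustrative and does not come from the report.

```python
def time_saved_fraction(extra_time_pct: float) -> float:
    """Fraction of conventional working time saved by using a TRG.

    Assumes: conventional time = (1 + extra_time_pct / 100) * TRG time.
    """
    ratio = 1 + extra_time_pct / 100
    return 1 - 1 / ratio


# Low, average, and high figures reported for the Mannheim experiment.
for pct in (50, 66, 86):
    saved = time_saved_fraction(pct)
    print(f"{pct}% more time with conventional aids -> {saved:.1%} of that time saved with a TRG")
```

Under this reading, the average figure of 66 percent corresponds to a saving of roughly 40 percent of the conventional working time.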

We quote below from a translation of a paper titled “Production of Text-Related Technical Glossaries by Digital Computer, A Procedure to Provide an Automatic Translation Aid,” by F. Krollmann, H. J. Schuck, and U. Winkler (the German original appeared in the January 1965 issue of Beiträge zur Sprachkunde und Informationsverarbeitung):

These two experiments have shown that the speed (and thus the cost) of the translator's work as well as the quality of his product (and thus the output of the editor) can be considerably improved if it is possible to relieve the translator of the unproductive and tiresome search for the correct technical term that frequently cannot possibly be included yet in any of the conventional dictionaries. These figures would suggest that, ideally, the error quota in translations of technical-scientific texts can be reduced by approximately 40 percent – a figure which experience indicates can be improved by at least another 10-15 percent since better understanding of the text frequently results in improved linguistic rendition (unambiguity of style) – and that translator productivity can be increased by over 50 percent.

The system works in the following way. The translator reads through the text to be translated and underlines the English words for which he desires to know the German equivalent. The text is then given to a keypunch operator who punches the cards for the underlined words and at the same time performs morphological reduction of the English words (in most cases this simply involves omitting the inflectional suffixes). The information on the cards is then put into the computer, which can produce three or four text-related glossaries in about 10 min. The TRG system became operational in 1965 and in early 1966 was connected by a data-link with a Telefunken TR-4 computer in Trier.
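The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary entries, the suffix list, and the names `TERM_DICTIONARY`, `reduce_form`, and `build_trg` are hypothetical, and the suffix stripping automates a step that, at Mannheim, the keypunch operator performed by hand. Listing a repeated term only once is also an assumption.

```python
# Hypothetical mini-dictionary; real entries would come from the agency's term files.
TERM_DICTIONARY = {
    "valve": ["Ventil"],
    "bearing": ["Lager"],
    "gear": ["Zahnrad", "Getriebe"],
}

# Deliberately naive list of inflectional suffixes to try removing.
SUFFIXES = ("ings", "ing", "es", "ed", "s")


def reduce_form(word: str) -> str:
    """Return the dictionary form: the word itself, or a suffix-stripped variant found in the dictionary."""
    word = word.lower()
    if word in TERM_DICTIONARY:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in TERM_DICTIONARY:
            return word[: -len(suffix)]
    return word  # unknown term: keep as typed


def build_trg(underlined_words):
    """List each underlined term once, in order of first occurrence, with its German equivalents."""
    glossary = []
    seen = set()
    for word in underlined_words:
        base = reduce_form(word)
        if base in seen:
            continue
        seen.add(base)
        glossary.append((base, TERM_DICTIONARY.get(base, [])))
    return glossary


for term, equivalents in build_trg(["valves", "bearings", "gears", "valve", "flange"]):
    print(term, "->", ", ".join(equivalents) or "(no entry)")
```

Keeping the entries in text order, rather than alphabetical order, mirrors the description of the TRG as listing terms in the sequence in which they occur.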

At present the Federal Armed Forces Translation Agency has a cooperative agreement for exchange of terminologies with the U.S. Defense Language Institute/West Coast Branch, the British Admiralty, the European Coal and Steel Community, and others.

An analysis of a test run and some sample output are to be found in Appendix 12. This technique was developed by the Federal Ministry of Defense of West Germany, which very kindly made the material in Appendix 12 available for the Committee's use.


MACHINE-AIDED TRANSLATION AT THE EUROPEAN COAL AND STEEL COMMUNITY, LUXEMBOURG

The Terminological Bureau of the European Coal and Steel Community (CECA) was established in 1950 to provide assistance to the Translation Bureau, which had the task of performing translations into and out of the four official languages of CECA: French, Dutch, Italian, and German.

The Head of the Terminological Bureau, Mr. J. A. Bachrach, estimates that a minimum of 25 percent of the translator's time is spent on terminological questions and that, in difficult documents, up to 75 percent of the translator's time is spent on these problems. In collaboration with Mrs. Lydia Hirschberg of the Free University of Brussels and her group, various approaches to this problem were considered. Soon a system was devised by which the translator's time-consuming job of finding the answers to questions of terminology was made easier.

The system utilized at CECA is one of automatic dictionary look-up with context included. The operation is similar to that used at Mannheim, but the output is somewhat different. It is similar in that the translator indicates, by underlining, the words with which he desires help. The entire sentence is then keypunched and fed into a computer. The computer goes through a search routine and prints out the sentence or sentences that most nearly match (in lexical items) the sentence in question. The translator then receives the desired items printed out with their context and in the order in which they occur in the source.

The translation of the sentence is not done by the computer, but by a human translator. However, since the data produced by each query are added to the data base, the more the system is in use, the greater is the probability of finding sentences that have the desired term in the proper context. A sample of typical CECA French-English output is shown in Appendix 13.
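The look-up with context described above can be illustrated with a toy retrieval sketch. Everything here is an assumption made for illustration: the class `ContextStore`, the overlap score (a count of shared word forms), the requirement that a stored sentence contain the underlined term, and the French-English sample sentences. The report does not describe the actual CECA search routine in this detail.

```python
import re


def lexical_items(sentence: str) -> set:
    """Lowercased word forms of a sentence (letters only, accents kept)."""
    return set(re.findall(r"[^\W\d_]+", sentence.lower()))


class ContextStore:
    """Toy store of (source sentence, human translation) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, source: str, translation: str) -> None:
        """Record a sentence once a human translator has translated it."""
        self.entries.append((source, translation))

    def lookup(self, query: str, term: str, top_n: int = 2):
        """Return stored pairs containing `term`, ranked by word overlap with `query`."""
        query_items = lexical_items(query)
        term = term.lower()
        scored = []
        for source, translation in self.entries:
            items = lexical_items(source)
            if term not in items:
                continue
            scored.append((len(items & query_items), source, translation))
        scored.sort(key=lambda entry: entry[0], reverse=True)
        return [(source, translation) for _, source, translation in scored[:top_n]]


store = ContextStore()
store.add("Le laminoir produit des tôles fines.", "The rolling mill produces thin sheets.")
store.add("Le laminoir à froid est en réparation.", "The cold rolling mill is under repair.")
store.add("Le haut fourneau produit de la fonte.", "The blast furnace produces pig iron.")

query = "Le nouveau laminoir produit des tôles épaisses."
results = store.lookup(query, term="laminoir")
for source, translation in results:
    print(source, "|", translation)
```

Once the translator has finished the query sentence, calling `store.add(...)` with the new pair grows the store, which is the sense in which heavier use raises the chance of finding the term in a suitable context.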

The information that has been built up by CECA not only is of value in answering the queries of translators but also enables CECA to publish specialized glossaries in a very short time. A copy of one extract from a five-language glossary prepared for the Congress on Steel Utilization is attached as Appendix 13.

The Committee finds it difficult to assess the difficulty and cost of postediting. An initial reaction is apt to be like that of R. T. Beyer [Phys. Today 18 (1), 50 (1965)]:

I must confess that the results were most unhappy. I found that I spent at least as much time in editing as if I had carried out the entire translation from the start. Even at that, I doubt if the edited translation reads as smoothly as one which I would have started from scratch. I drew the conclusion that the machine today translates from a foreign language to a form of broken English somewhat comparable to pidgin English. But it then remains for the reader to learn this patois in order to understand what the Russian actually wrote. Learning Russian would not be much more difficult. Someday, perhaps, the machines will make it, but I as a translator do not yet believe that I must throw my monkey wrench into the machinery in order to prevent my technological unemployment.

The Committee had some postediting done as an experiment (see Appendix 14). Postediting took as long as translation, yet people said they were willing to do it for less per word! FTD figures indicate that in-house postediting is done faster than in-house translation.

Studies of the FTD operation indicate that keyboard transcription of the Cyrillic text is a very minor part of the total cost. Thus, automatic character recognition could cut the cost of the operation only a little. On the other hand, a large fraction of the cost is in putting the final translation together, with figures and equations, and reproducing it.

If we compare the cost of human in-house translation ($40 per 1,000 Russian words) with the cost of machine-aided translation within FTD ($36 per 1,000 Russian words), machine-aided translation appears to be somewhat less expensive. But FTD machine-aided translation is costlier than contract translation ($33 per 1,000) and far costlier than Joint Publications Research Service (JPRS) translation ($16 per 1,000 English words).
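The comparison can be laid out as a short calculation. Note the assumption: the figures are treated as directly comparable, although the text quotes some per 1,000 Russian words, the JPRS figure per 1,000 English words, and the contract figure without a stated unit. The dictionary name `COSTS` is illustrative.

```python
# Cost figures quoted in the text, in dollars per 1,000 words.
# Assumption: treated as comparable even though the word units differ.
COSTS = {
    "in-house human translation": 40,
    "FTD machine-aided translation": 36,
    "contract translation": 33,
    "JPRS translation": 16,
}

ftd = COSTS["FTD machine-aided translation"]
for service, cost in COSTS.items():
    # Express each cost relative to the FTD machine-aided figure.
    print(f"{service}: ${cost} per 1,000 words, {cost / ftd:.0%} of the FTD figure")
```

On these numbers the JPRS rate is less than half the FTD machine-aided rate, while in-house human translation is only about a tenth more expensive than FTD.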

Appendix 15 gives data on a comparison by experts of the quality of some recent JPRS translations and FTD machine-aided translations. The text of the JPRS translations was judged to be better than that of the FTD translations. The quality of the reproduction of text and figures was judged to be poor in both cases, with JPRS superior to FTD. We wonder why the Air Force pays more for translations made by FTD than superior and prompter JPRS translations would cost.



https://blog.sciencenet.cn/blog-362400-730522.html
