Practical Q&A | How to use Memsource machine translation glossaries


2023-11-04 14:31 | Source: compiled from the web

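The following article is from 烟台译博云天公司; author: Memsource.

Memsource is an online computer-assisted translation (CAT) tool that is easy to use and fully featured, covering basics such as an editor, translation memories, term bases, and QA. It also integrates multiple MT engines and lets you manage them. In addition, Memsource machine translation glossaries let translators steer MT output according to their needs, which can greatly improve post-editing efficiency.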

Machine translation glossaries: why they matter and how to use them

Machine translation glossaries are one of the simplest ways to customize MT. Learn what they are, why they matter, and how to leverage them to improve MT output in the long run.

With machine translation (MT), precision and recall are critical to success. Every translation counts. The more curated and accurate the information you provide for your MT engines, the better they’ll perform.

Translator's note: In machine translation tasks, BLEU and ROUGE are two commonly used evaluation metrics. BLEU measures translation quality based on precision, while ROUGE measures it based on recall.

What are machine translation glossaries?

Glossaries, in the context of machine translation, are collections of words and phrases, each paired with a preferred machine translation. They’re sometimes referred to as:

Custom terminology

Custom vocabulary

Custom dictionaries, etc.

MT glossaries are similar to term bases, but instead of being used by linguists, they are designed to be used by machine translation software.

When attached to MT engines, glossaries help improve the quality of the MT output by ensuring that the MT engines correctly apply predetermined terminology.

Before an MT engine translates a source text, it compares the attached glossary file to the source text to identify terms that have a preferred translation and applies those.

It’s important to note that an MT glossary doesn’t retrain an engine; it simply overrides any matching term with a predetermined translation.

Why are MT glossaries important?

MT engines have dramatically improved in output quality over the past few years. Nevertheless, they still lack the contextual understanding of a human translator.

This means they can make some very basic errors, especially when handling an ambiguous word or a term that has a specific meaning in a given context.

Since glossaries are adapted to a domain’s or company’s specific terminology, they help machine translation output be far more accurate than if the engine just drew from general-purpose data sets.

How do MT glossaries work?

The steps that an MT engine usually follows are:

Receive a source text

Translate the source text

Display the output translation

With an MT glossary included, MT engines add an intermediate step to the process:

Receive a source text

Translate the source text

Search the translation and replace matches with your preferred terminology

Present the output translation

To put it another way, with the help of glossaries, the MT engine searches for matches and automatically applies them while translating.

For example, suppose you have a brand for a Bluetooth speaker called “Connected,” and you want to translate the following sentence into Spanish: “Your Connected device was not detected.”

Without an MT glossary, your MT engine would produce something like the following result: “No se ha detectado tu dispositivo conectado” (literal back-translation into English: “Your connected device was not detected”). As you can see, the brand name “Connected” has been translated as “conectado,” which would be incorrect in this case.

If you add the brand name “Connected” to your MT glossary, you can force the term to be left untranslated. In that case, the MT engine will produce this result: “No se ha detectado tu dispositivo Connected.” This is spot on: using an MT glossary significantly improves accuracy by automatically providing the desired translation.
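The “Connected” example can be sketched in a few lines of Python. This is only an illustration of one common way to enforce glossary terms (masking them with placeholders before translation and restoring them afterwards); it is not how Memsource or any particular MT engine implements glossaries. The names `translate_with_glossary` and `toy_engine` are hypothetical, and `toy_engine` is a fixed lookup table standing in for a real MT engine.

```python
import re
from typing import Callable, Dict


def translate_with_glossary(
    text: str,
    glossary: Dict[str, str],
    translate: Callable[[str], str],
) -> str:
    """Mask glossary terms, translate, then insert the preferred translations."""
    restored: Dict[str, str] = {}
    protected = text
    # Longest terms first, so a multi-word term wins over a shorter one inside it.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"__TERM{i}__"
        # Case-sensitive whole-word match: "Connected" matches, "connected" does not.
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        protected, count = pattern.subn(token, protected)
        if count:
            restored[token] = glossary[term]
    translated = translate(protected)
    for token, target in restored.items():
        translated = translated.replace(token, target)
    return translated


def toy_engine(text: str) -> str:
    """Hypothetical stand-in for an MT engine: a fixed English-to-Spanish lookup."""
    table = {
        "Your Connected device was not detected.":
            "No se ha detectado tu dispositivo conectado.",
        "Your __TERM0__ device was not detected.":
            "No se ha detectado tu dispositivo __TERM0__.",
    }
    return table[text]


source = "Your Connected device was not detected."
without_glossary = translate_with_glossary(source, {}, toy_engine)
with_glossary = translate_with_glossary(source, {"Connected": "Connected"}, toy_engine)
```

Here the glossary maps “Connected” to itself, which is how a do-not-translate entry is expressed in this sketch. Because the match is case-sensitive, a brand name at the start of a sentence could still be confused with the ordinary adjective, so the output would still need a human check.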

Best practices for using MT glossaries

To ensure MT glossaries remain reliable and always up to date, here are a few best practices to follow:

Keep it simple: Small glossaries, focusing only on the most essential terms, tend to be more effective; massive glossaries could even harm your translation output.

Limit customizations to words that you only want to be translated in one way: The translation suggested by the MT engine should match exactly what you want.

Ensure glossaries are free of errors: Keep your terms free of spelling mistakes, formatting errors, or incorrect translations.

Avoid having duplicate terms: MT engines can struggle to apply the correct term if multiple instances are found.

Post-edit essential translations: While glossaries can enhance translation quality, don’t trust them blindly. High-quality human checks on your MT output are always the best guarantee of accuracy. This process is called “post-editing.”

Be mindful of your language pair: In morphologically complex languages, like Finnish, Arabic, or Turkish, words may change shape depending on the context, so customizations for these languages may not always produce the best results.

Review documentation: Although the basic glossary functionality is similar across MT engines, the specifics might differ; it may be helpful to read the available documentation to find out how to best work with a given engine.

Not all kinds of terms are appropriate for glossaries: For the best results, focus on compound nouns; examples often include product names, like “Postmates,” or other specific terms like “WeWork.”
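Some of these practices (no duplicates, no empty or badly formatted entries) can be checked automatically before a glossary is uploaded. The following is a minimal sketch, assuming the glossary is held as a list of (source term, target term) pairs; `lint_glossary` is a hypothetical helper, not part of Memsource or any MT provider's tooling. Treating terms that differ only in case as duplicates is also an assumption and may be too strict for some glossaries.

```python
from collections import Counter
from typing import List, Tuple


def lint_glossary(entries: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable warnings for common glossary problems."""
    issues: List[str] = []
    for src, tgt in entries:
        # Leading or trailing spaces are a frequent formatting error.
        if src != src.strip() or tgt != tgt.strip():
            issues.append(f"stray whitespace: {src!r} -> {tgt!r}")
        # An entry needs both a source term and a preferred translation.
        if not src.strip() or not tgt.strip():
            issues.append(f"empty term: {src!r} -> {tgt!r}")
    # Compare source terms after trimming and case folding (assumption: case-insensitive).
    counts = Counter(src.strip().casefold() for src, _ in entries)
    for key, n in counts.items():
        if key and n > 1:
            issues.append(f"duplicate source term: {key!r} appears {n} times")
    return issues


sample = [
    ("Connected", "Connected"),
    ("connected ", "conectado"),  # trailing space, and clashes with the entry above
    ("TMS", ""),                  # missing translation
]
warnings = lint_glossary(sample)
```

A check like this only catches mechanical problems; whether a translation is actually correct still has to be reviewed by a person.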

What terms are suitable for MT glossaries?

To maximize the impact and accuracy of MT glossaries, it’s important to use them for specific types of terms:

Product names like “Ford Mondeo,” “Samsung Galaxy Note 5,” etc.

Company names like “Apple,” “Microsoft,” etc.

Ambiguous words, e.g., homonyms (multiple-meaning words) like “crane” (a machine vs. an animal) or “lead” (the metal vs. a potential client).

Abbreviations: A shortened form of a word or phrase that’s frequently used in the industry or domain of interest, e.g., TMS for “translation management system.”

Borrowed words: Foreign words that the MT engine will likely keep in the original language, like the French dish “côte de boeuf,” but which you nevertheless want translated (in this case, as “rib eye”).

What terms are less suitable for MT glossaries?

At the same time, some morphological categories are less suitable to be documented and used in a machine translation glossary:

Verbs: MT glossaries can’t conjugate them correctly in grammatical person, number, gender, tense, aspect, mood, voice, degree of formality, clusivity, transitivity, or valency.

Inflected languages with many cases and grammatical genders: MT glossaries can’t currently change the form or ending of a word when its role in the sentence changes.
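The limitation is easy to see with a small experiment. The sketch below assumes the simplest possible lookup, an exact whole-word match against the glossary's source terms; `find_exact_hits` is a hypothetical helper, and real engines may match terms differently. With this kind of matching, any inflected form of a term is simply not recognized.

```python
import re
from typing import List, Tuple


def find_exact_hits(text: str, terms: List[str]) -> List[Tuple[str, int]]:
    """Return (term, start offset) for every exact whole-word match in text."""
    hits: List[Tuple[str, int]] = []
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        for match in pattern.finditer(text):
            hits.append((term, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


sentence = "The cranes were moved by one crane."
hits = find_exact_hits(sentence, ["crane"])
# Only the singular "crane" at the end is found; the plural "cranes" is missed.
```

English only adds an “s” here. In languages with many cases and genders the number of surface forms per term is much larger, which is why such terms are a poor fit for a glossary of fixed strings.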

Managing MT glossaries for all engines directly within a TMS

Translation management systems (TMS) allow localization managers not only to centralize and automate the localization workflow but also to make full use of well-established translation technology like translation memories and glossaries.

Modern TMS solutions, like Memsource, enable the use and management of glossaries without the need to upload and manage them with each individual MT provider.

In Memsource, you can directly upload, edit, and use MT glossaries for all supported engines, which can significantly reduce deployment and management time.

How does glossary support work with each MT engine in Memsource?

MT glossaries are available as a part of Memsource Translate, the platform’s MT management hub. Besides MT glossaries, Memsource Translate subscribers can take advantage of a number of fully managed machine translation engines and advanced AI-powered features like MT Quality Estimation and MT Autoselect.

Figure: Memsource MT glossaries

Through Memsource Translate, users can also add their own MT glossaries, which they can apply to fully managed MT engines:

Google Translate

Amazon Translate

DeepL

Microsoft Translator

Rozetta Translate

Tencent TranSmart

Figure: Memsource fully managed MT engines

Once you create a custom glossary, you need to attach it to an existing MT profile. You can create multiple MT glossaries and use them for different translation projects.

Looking to the future

MT glossaries are a simple and effective way to increase machine translation output quality. This is especially true for:

Domains with low-frequency terms, or with translation memories that aren’t very large or well curated

Small to mid-sized companies without big enough datasets to use custom MT

Bigger companies that have compiled substantial amounts of terminology data over several years or decades, but whose data isn’t consistent or whose language or style best practices have evolved or changed

Nevertheless, MT glossaries come with limitations as well. At some point, an MT glossary can get so large that it hinders the localization managers who maintain it: regular updates may become a headache and carry a higher risk of accidentally introducing errors.

Equally important, most MT glossaries available on the market still rely on plain search-and-replace functionality. With the continuous improvement in MT technology, engines are expected to get even better and let everyone use glossary terms with morphologically correct inflections.

To make the most of their machine translation efforts, localization managers should always weigh their needs and available resources before deciding if custom machine translation glossaries are right for their use case.

What are MT Glossaries? (For the video, see the original post.)

Links:

1. https://www.memsource.com/machine-translation/
2. https://www.memsource.com/blog/post-editing-machine-translation-best-practices/
3. https://www.memsource.com/translation-management-system/
4. https://www.memsource.com/features/machine-translation/
5. https://help.memsource.com/hc/en-us/articles/4409263455762-MT-Glossaries

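Reposted from: the 烟台译博云天公司 WeChat official account

Repost editor: 丁羽翔

The translation is for reference only; readers are welcome to point out any inaccuracies by leaving a message.

This article comes from the WeChat official accounts “翻译技术教育与研究” and “语言服务行业”, which are dedicated to news and insights on the language services industry.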


