Machine Translation – Introducing the Latest arXiv Papers

The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure [25.0]
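We demonstrate the existence of an implicit task-solving -> translation pipeline for generation. We test this hypothesis on a word translation task across 108 language pairs. We find that a significant portion of overall failures stems from translation failure.
Paper reference translation (metadata) (Sat, 28 Jun 2025 02:09:21 GMT)
The paper points out: "We find that a significant portion of overall failures indeed stems from translation failure, or the model’s inability to translate correctly solved intermediate concepts into the target language. This is especially true for low-resource target languages."
The behavior itself seems plausible in light of Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? – Introducing the Latest arXiv Papers. Still, the intermediate language is presumably shaped by whichever language dominated training, and I am not fully convinced that this is acceptable.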
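To make the decomposition in the quote above concrete, here is a minimal sketch of how failures could be split into translation failures and task-solving failures. It assumes each example already comes with two boolean judgments (whether the intermediate concept was correct, and whether the final target-language output was correct). How the paper actually obtains those judgments is not covered in this post, and the names `classify` and `failure_breakdown` are hypothetical.

```python
from collections import Counter


def classify(intermediate_correct: bool, final_correct: bool) -> str:
    """Label one generation using the two-stage view (task solving, then translation)."""
    if final_correct:
        return "success"
    if intermediate_correct:
        # The concept was solved internally but not rendered in the target language.
        return "translation_failure"
    return "task_solving_failure"


def failure_breakdown(records):
    """Return each failure type's share of all failures.

    `records` is an iterable of (intermediate_correct, final_correct) pairs.
    An empty dict is returned when there are no failures at all.
    """
    counts = Counter(classify(i, f) for i, f in records)
    total = counts["translation_failure"] + counts["task_solving_failure"]
    if total == 0:
        return {}
    return {
        kind: counts[kind] / total
        for kind in ("translation_failure", "task_solving_failure")
    }
```

Under this framing, the paper's claim corresponds to the `translation_failure` share being large, especially for low-resource target languages.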

Exploring Translation Mechanism of Large Language Models

Exploring Translation Mechanism of Large Language Models [23.7]
Large language models (LLMs) have been remarkably successful at multilingual translation tasks. This work examines the translation mechanism of LLMs from the perspective of computational components.
Paper reference translation (metadata) (Mon, 17 Feb 2025 13:50:29 GMT)
An analysis of how LLMs translate. The paper reports that "translation is predominantly facilitated by a sparse subset of specialized attention heads (less than 5%), which extract source language, indicator, and positional features. MLPs subsequently integrate and process these features by transiting towards English-centric latent representations."

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study [13.4]
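GemmaX2-28 is a 9B model that achieves top-tier multilingual translation performance across 28 languages. GemmaX2-28 consistently outperforms state-of-the-art (SOTA) models such as TowerInstruct and XALMA.
Paper reference translation (metadata) (Fri, 07 Feb 2025 06:59:27 GMT)
A proposal for a machine translation model that uses a "Parallel-First Monolingual-Second (PFMS) data mixing strategy" and claims "To the best of our knowledge, GemmaX2-28-9B is the open model with the highest translation quality." It is very instructive to see how much translation performance changes with the data recipe.
The repository is GemmaX2 – a ModelSpace Collection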

BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation [28.5]
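The dataset is first handcrafted in languages other than English. Each of these source languages is among the 23 languages commonly used by half of the world's population.
Paper reference translation (metadata) (Thu, 06 Feb 2025 18:56:37 GMT)
A benchmark for translation. Its distinguishing feature is "Non-English-centric focus. Source-BOUQuET is handcrafted by proficient speakers of French, German, Hindi, Indonesian, Mandarin Chinese, Russian, and Spanish."
The project site is Bouquet – a Hugging Face Space by facebook

As a related report, a document-level dataset was also proposed.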

DOLFIN — Document-Level Financial test set for Machine Translation [5.3]
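We propose a test set dedicated to document-level machine translation (MT). The dataset is built from specialized financial documents. The test set consists of an average of 1950 aligned sections for five language pairs.
Paper reference translation (metadata) (Wed, 05 Feb 2025 10:30:40 GMT)
It covers "en, fr, es, it, de". The repository is LinguaCustodia/dolfin · Datasets at Hugging Face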

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought [89.5]
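DRT-o1 is an attempt to bring the success of long chain-of-thought to neural machine translation (MT). The authors first mine sentences containing similes or metaphors from existing literature, then develop a multi-agent framework that translates these sentences via long thought. Experiments on literature translation demonstrate the effectiveness of DRT-o1.
Paper reference translation (metadata) (Mon, 23 Dec 2024 11:55:33 GMT)
An application of chain-of-thought to machine translation. The approach is to collect data, synthesize training data with a multi-agent framework, and then fine-tune. 124 GPU hours for the 14B model is fewer than I expected, yet performance improves substantially.
The project site is GitHub – krystalan/DRT-o1: DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought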

DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory

DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory [96.4]
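We introduce DelTA, a document-level translation agent for large language models (LLMs). DelTA features a multi-level memory structure that stores information across various granularities and spans. Experimental results show that DelTA significantly outperforms strong baselines in translation consistency and quality.
Paper reference translation (metadata) (Thu, 10 Oct 2024 17:30:09 GMT)
A machine translation agent built on LLMs. It has Proper Noun Records, a Bilingual Summary, Long-Term Memory, and Short-Term Memory.
The repository is GitHub – YutongWang1216/DocMTAgent: Code and data releases for the paper — DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory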
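As a rough picture of what a multi-level memory with those four components might look like, here is a minimal sketch. Only the four component names come from the paper. The field types, the `record` method, the window size, and the "first translation wins" rule for proper nouns are my own assumptions for illustration, not DelTA's actual implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TranslationMemory:
    """Hypothetical container mirroring the four memory components named above."""

    short_term_size: int = 3
    # Proper Noun Records: source term -> target term (assumed: first translation is kept).
    proper_nouns: dict = field(default_factory=dict)
    # Bilingual Summary: (source-side summary, target-side summary).
    bilingual_summary: tuple = ("", "")
    # Long-Term Memory: every translated (source, target) sentence pair so far.
    long_term: list = field(default_factory=list)
    # Short-Term Memory: only the most recent pairs, bounded by short_term_size.
    short_term: deque = field(init=False)

    def __post_init__(self):
        self.short_term = deque(maxlen=self.short_term_size)

    def record(self, source, target, nouns=None):
        """Store one translated sentence and any proper nouns found in it."""
        pair = (source, target)
        self.long_term.append(pair)
        self.short_term.append(pair)  # the oldest pair is dropped automatically
        for src_term, tgt_term in (nouns or {}).items():
            # Keep the earlier rendering so later sentences stay consistent.
            self.proper_nouns.setdefault(src_term, tgt_term)
```

The bounded `deque` is just one way to express "recent context only". Retrieval from long-term memory and summary updates would need an LLM or a retriever and are left out here.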

Preliminary WMT24 Ranking of General MT Systems and LLMs

Preliminary WMT24 Ranking of General MT Systems and LLMs [69.8]
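This is a ranking of WMT24 general MT systems based on automatic metrics. The official ranking will be based on human evaluation, which is superior to the automatic ranking.
Paper reference translation (metadata) (Mon, 29 Jul 2024 11:01:17 GMT)
"This is the preliminary ranking of WMT24 General MT systems based on automatic metrics." It relies on automatic evaluation, but it is still very interesting.
Is "Unbabel -Tower70B", which posts impressive results, a larger version of Announcing Tower : An Open Multilingual LLM for Translation-Related Tasks (unbabel.com) and Tower – a Unbabel Collection (huggingface.co)? I am curious about the details.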

sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting [27.1]
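This paper proposes a novel recipe for creating sPhinX, a multilingual synthetic instruction-tuning dataset. sPhinX is created by selectively translating instruction-response pairs from English into 50 languages. The authors test the effectiveness of sPhinX by fine-tuning two state-of-the-art models, Phi-3-Small and Mistral-7B.
Paper reference translation (metadata) (Sat, 13 Jul 2024 13:03:45 GMT)
Multilingual fine-tuning that makes effective use of LLM-based machine translation: "To mitigate this issue, we prompt GPT-4 to selectively translate the instructions, so that the tasks are translated into the appropriate language without changing the semantic meaning."
Including "We devise LAnguage-Specific N-shot Guided Instruction fine-tuning (LANG) strategy for enhancing the multilingual capabilities of LLMs", I think the approach is effective, but at present it is hard to use for licensing reasons... (I wonder whether it becomes practical with Nemotron, whose license permits this.)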

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation [22.7]
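Large language models (LLMs) have revolutionized the field of NLP. In this work, the authors evaluate more than 720 prompt templates for open-source LLM-based metrics on machine translation (MT) and summarization datasets.
Paper reference translation (metadata) (Wed, 26 Jun 2024 17:56:29 GMT)
A large-scale evaluation of prompt templates for machine translation and summarization evaluation. It is validated on multiple open LLMs, so the performance differences between LLMs are also informative. I would like to look at it in detail once the code is released.
The project site is NLLG (nl2g.github.io), and the repository is GitHub – Gringham/PrExMe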
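For a feel of how a template count in the hundreds can arise, here is a small sketch that builds a grid of prompts from a few independent components and picks the best one with a scoring function. The three component axes and the names `build_templates` and `best_template` are hypothetical. This post does not describe how the paper composes its templates or how it scores them.

```python
from itertools import product


def build_templates(base_prompts, task_descriptions, output_formats):
    """Combine every choice along each (assumed) axis into one prompt string."""
    return [
        f"{task}\n{base}\n{fmt}"
        for base, task, fmt in product(base_prompts, task_descriptions, output_formats)
    ]


def best_template(templates, score):
    """Return the template with the highest score.

    `score` is any callable mapping a template to a number, for example the
    correlation between metric outputs and human judgments on a dev set.
    """
    return max(templates, key=score)
```

The grid grows multiplicatively: 2 base prompts, 3 task descriptions and 4 output formats already give 24 templates, which is why a fixed search budget matters in this kind of study.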

Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models

Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models [79.5]
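Large language models (LLMs) have demonstrated multilingual capabilities, yet they are mostly English-centric because of imbalanced training corpora. This work extends the evaluation from NLP tasks to real user queries. For culture-related tasks that require deep language understanding, prompting in the native language tends to be more promising.
Paper reference translation (metadata) (Thu, 20 Jun 2024 11:09:42 GMT)
It probably also depends on the capability of the LLM, but the paper states: "We compare various multilingual prompting strategies in NLP tasks, finding that translation remains a strong baseline even for LLMs."
When the training data is heavily skewed (for example, specialized toward English) or the base performance is not high, machine translation seems especially effective, so the result does not run counter to intuition. It also makes sense that machine translation is unsuitable for some tasks.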
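The two strategies being compared can be sketched as follows. This is a minimal illustration rather than the paper's setup: `translate` and `solve` are hypothetical callables that you would back with an MT system and an LLM, and the pivot language defaulting to English is an assumption.

```python
from typing import Callable

# Assumed signatures for illustration only.
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
Solve = Callable[[str], str]                # prompt -> answer


def translate_then_solve(query: str, lang: str, translate: Translate,
                         solve: Solve, pivot: str = "en") -> str:
    """Translate the query into the pivot language, solve it, translate back."""
    if lang == pivot:
        return solve(query)
    pivot_query = translate(query, lang, pivot)
    pivot_answer = solve(pivot_query)
    return translate(pivot_answer, pivot, lang)


def native_prompt(query: str, solve: Solve) -> str:
    """Send the query to the model as-is, in the user's own language."""
    return solve(query)
```

The translation route adds two MT calls and can lose cultural nuance on the way in, which fits the observation that native-language prompting looks better on culture-related tasks.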
