February 18, 2025 – Introducing the latest arXiv papers

Enhancing LLM Character-Level Manipulation via Divide and Conquer [108.7]
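Large language models (LLMs) have demonstrated strong generalization capabilities across a wide range of natural language processing (NLP) tasks. However, they show notable weaknesses in character-level string manipulation, struggling with basic operations such as character deletion, insertion, and substitution. This paper proposes Character-Level Manipulation via Divide and Conquer, a novel approach that bridges the gap between token-level processing and character-level manipulation.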
Paper reference translation (metadata) (Wed, 12 Feb 2025 07:37:39 GMT)
A proposal for handling character manipulation, which is surprisingly hard because of the mismatch between token-level and character-level units. The paper illustrates the problem as follows: "For example, when prompting models to insert ‘a’ after every ‘e’ in the word “intelligence”, even one of the state-of-the-art LLMs, ChatGPT-4o, returns a wrong answer: “intellaigenca”." The method has three stages: "We first decompose the token into an atomized character sequence. Then, we perform character-wise manipulations on the individual characters. Finally, we reconstruct the token from the modified sequence."
The repository is said to be https://github.com/Eric2i/CharDC, but it returns a 404 at the time of writing.
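
The three stages quoted above can be sketched in plain Python. This is only an illustration of the decompose / manipulate / reconstruct flow, not the authors' implementation: ordinary string operations stand in for whatever the model does at each stage, and every function name here is hypothetical.

```python
# Illustrative sketch only: plain string operations stand in for the LLM.
# All function names are hypothetical and do not come from the paper.

def decompose(token):
    """Stage 1: split a token into an atomized character sequence."""
    return list(token)


def insert_after(chars, target, new):
    """Stage 2: character-wise manipulation (insert `new` after each `target`)."""
    out = []
    for ch in chars:
        out.append(ch)
        if ch == target:
            out.append(new)
    return out


def reconstruct(chars):
    """Stage 3: rebuild the token from the modified character sequence."""
    return "".join(chars)


def insert_after_every(word, target, new):
    """Run the three stages in order."""
    return reconstruct(insert_after(decompose(word), target, new))


result = insert_after_every("intelligence", "e", "a")
print(result)  # intealligeancea
```

For the example in the quote, the correct answer is "intealligeancea", which differs from the "intellaigenca" that the paper reports ChatGPT-4o returning.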

BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation [28.5]
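The dataset is handcrafted first in non-English languages. Each of these source languages is represented among the 23 languages commonly used by half of the world's population.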
Paper reference translation (metadata) (Thu, 06 Feb 2025 18:56:37 GMT)
A benchmark for translation. Its distinguishing feature is the "Non-English-centric focus. Source-BOUQuET is handcrafted by proficient speakers of French, German, Hindi, Indonesian, Mandarin Chinese, Russian, and Spanish."
The project site is Bouquet – a Hugging Face Space by facebook

As a related report, a document-level dataset has also been proposed.

DOLFIN — Document-Level Financial test set for Machine Translation [5.3]
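The paper proposes a test set dedicated to document-level machine translation (MT). The dataset is built from specialized financial documents. The test set consists of an average of 1,950 aligned sections for five language pairs.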
Paper reference translation (metadata) (Wed, 05 Feb 2025 10:30:40 GMT)
The languages covered are "en, fr, es, it, de"; the repository is LinguaCustodia/dolfin · Datasets at Hugging Face