June 2024 – Introducing the Latest arXiv Papers

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis [118.1]
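Video-MME is a comprehensive, fully multi-modal benchmark for evaluating MLLMs in video analysis. We evaluated a wide range of state-of-the-art MLLMs, including the GPT-4 series, Gemini 1.5 Pro, and open-source image models. Our experiments show that Gemini 1.5 Pro is the best-performing commercial model and significantly outperforms the open-source models.
Reference translation of the paper (metadata) (Fri, 31 May 2024 17:59:47 GMT)
A benchmark targeting video analysis. Humans annotated 2.7K QA pairs over 900 videos totaling 256 hours. The videos span a wide range of domains (GitHub – BradyFU/Video-MME: ✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis).
In the current benchmark results, Gemini 1.5 Pro performs best, followed by Gemini 1.5 Flash, GPT-4o, and GPT-4V. Note that it is hard to put the models on an equal footing, because, for example, each API accepts different types of input data. The paper includes notes such as "Since the video interface of GPT-4o has not been released yet, we sample 10 frames and evaluate the model using multiple images as input."
The repository is Video-MME: Welcome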
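GPT-4o was evaluated on 10 frames passed in as multiple images. The sampling scheme is not described here, so the following is only a minimal sketch, assuming frames are taken uniformly (the middle frame of each of 10 equal segments). The function name `sample_frame_indices` is hypothetical. The example clip length of about 17 minutes is simply 256 hours / 900 videos ≈ 17.07 minutes.

```python
def sample_frame_indices(total_frames: int, num_samples: int = 10) -> list[int]:
    """Pick num_samples frame indices spread uniformly over a video.

    Assumption (illustrative only): split the video into num_samples equal
    segments and take the middle frame of each. Video-MME may differ.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    # Never return more indices than there are frames.
    n = min(num_samples, total_frames)
    segment = total_frames / n
    # The middle of segment i sits at (i + 0.5) * segment; floor it to an index.
    return [int((i + 0.5) * segment) for i in range(n)]


# A 17-minute clip at 30 fps has 17 * 60 * 30 = 30600 frames.
print(sample_frame_indices(30600))
# → [1530, 4590, 7650, 10710, 13770, 16830, 19890, 22950, 26010, 29070]
```

Uniform sampling keeps the whole timeline represented, but with only 10 frames from a long video, brief events can easily be missed. This is one reason image-based and native video-based results are hard to compare directly.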

X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions

X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions [43.9]
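Large language models respond well in high-resource languages such as English, but they struggle in low-resource languages. In this work, we propose a novel method for constructing cross-lingual instructions that pair English instructions with responses in low-resource languages.
Reference translation of the paper (metadata) (Thu, 30 May 2024 06:45:23 GMT)
The paper proposes a method that builds a cross-lingual instructions dataset for low-resource languages in the following three stages (quoted from the repository).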
- X-Instruction Generation: Language models learn to generate cross-lingual instructions for multilingual texts using seed data.
- X-Instruction Refinement: Language models iteratively label and refine cross-lingual instruction samples.
- X-Instruction Diversification: The final instruction data are sampled from different clusters of embedding from the English instruction to increase the diversity.
The repository is GitHub – ZNLP/X-Instruction: Official code and data for ACL-2024 paper “X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions”
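The diversification stage samples the final data from different clusters of English-instruction embeddings. The following is a minimal sketch of that idea. It assumes plain k-means over precomputed embedding vectors and a fixed number of samples per cluster. The names `kmeans` and `diversify`, as well as `k` and `per_cluster`, are hypothetical, and the actual repository may cluster and sample differently.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def diversify(samples, embeddings, k, per_cluster=1, seed=0):
    """Pick up to per_cluster samples from each embedding cluster."""
    labels = kmeans(embeddings, k, seed=seed)
    rng = random.Random(seed)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for i in rng.sample(idx, min(per_cluster, len(idx))):
            chosen.append(samples[i])
    return chosen


# Two obvious groups of (toy, 2-D) instruction embeddings.
picked = diversify(["a", "b", "c", "d"], [[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(picked)  # one item from {"a", "b"} and one from {"c", "d"}
```

Sampling per cluster, rather than uniformly from the whole pool, prevents a dense region of near-duplicate instructions from dominating the final set.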

Artificial Intelligence Approaches for Predictive Maintenance in the Steel Industry: A Survey

Artificial Intelligence Approaches for Predictive Maintenance in the Steel Industry: A Survey [9.1]
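Predictive maintenance (PdM) has emerged as one of the pillars of Industry 4.0. This survey synthesizes the current state of knowledge on AI-based PdM in the steel industry.
Reference translation of the paper (metadata) (Tue, 21 May 2024 13:32:46 GMT)
A survey on the use of AI for predictive maintenance in the steel industry.
Although it is specific to one industry and task, it runs to 35 pages. It was interesting to see how many different approaches, including traditional methods, have been tried. "PdM" might bring "Product Manager" to mind, but here it means Predictive Maintenance.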

The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities

The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities [18.2]
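Fine-tuning large language models (LLMs) for machine translation improves overall translation quality. We perform extensive translation evaluations on models from the LLaMA and Falcon families, ranging in size from 7 billion to 65 billion parameters. We observe a decline in the ability to perform formality steering, to produce technical translations from few-shot examples, and to perform document-level translation.
Reference translation of the paper (metadata) (Thu, 30 May 2024 14:25:56 GMT)
The paper finds that "Our results show that while fine-tuning improves the general translation quality of LLMs, several abilities degrade." In response, it reports that "We show that incorporating a mix of monolingual and parallel data during fine-tuning can preserve abilities of LLMs."
It seems natural that some abilities drop when a model is specialized for translation. Still, it is surprising that simply mixing in monolingual data is enough to preserve them.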
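The remedy is to fine-tune on a mix of monolingual and parallel data. This post does not describe how the two sources are combined, so the following is a minimal sketch that assumes a fixed monolingual share and shuffles both sources together. The `mono_fraction` parameter, the example dict format, and the function name are all hypothetical.

```python
import random


def build_mixed_dataset(parallel, monolingual, mono_fraction=0.25, seed=0):
    """Mix parallel (source, target) pairs with monolingual texts.

    mono_fraction is a hypothetical knob: the target share of monolingual
    examples in the final dataset. All parallel pairs are kept; monolingual
    texts are subsampled to hit that share as closely as possible.
    """
    if not 0.0 <= mono_fraction < 1.0:
        raise ValueError("mono_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_mono / (n_par + n_mono) = mono_fraction for n_mono.
    n_mono = round(len(parallel) * mono_fraction / (1.0 - mono_fraction))
    n_mono = min(n_mono, len(monolingual))
    examples = [{"type": "translation", "source": s, "target": t} for s, t in parallel]
    examples += [{"type": "monolingual", "text": x} for x in rng.sample(monolingual, n_mono)]
    rng.shuffle(examples)  # interleave so batches see both kinds of data
    return examples


pairs = [(f"src{i}", f"tgt{i}") for i in range(6)]
mono = [f"doc{i}" for i in range(10)]
data = build_mixed_dataset(pairs, mono, mono_fraction=0.25)
print(len(data), sum(e["type"] == "monolingual" for e in data))  # → 8 2
```

With 6 parallel pairs and a 25% share, the solver asks for 6 × 0.25 / 0.75 = 2 monolingual texts, giving 8 examples in total.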

Transformer in Touch: A Survey

Transformer in Touch: A Survey [29.6]
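The Transformer model, which first achieved major success in natural language processing, has recently shown great potential in tactile perception applications. This paper reviews the application and development of Transformers in tactile technology.
Reference translation of the paper (metadata) (Tue, 21 May 2024 13:26:27 GMT)
Transformers are now being applied to the sense of touch as well, and this is a survey of that work.
It is remarkable how widely they are being used.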

METRAG: Multi–layEred Thoughts enhanced Retrieval-Augmented Generation framework

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts [39.5]
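We argue that similarity is not always a panacea, and that relying entirely on similarity can sometimes degrade the performance of retrieval-augmented generation. We propose MetRag, a Multi layEred Thoughts Enhanced Retrieval Augmented Generation framework.
Reference translation of the paper (metadata) (Thu, 30 May 2024 09:50:38 GMT)
Instead of RAG based on plain similarity search, the paper proposes combining similarity search with a model trained in a supervised manner, and then applying adaptive summarization before generation.
The approach looks very heavy, but it outperforms other methods on the benchmarks.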
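The core claim is that passages should not be ranked by similarity alone. The following is a minimal sketch of that idea. It assumes a linear blend of cosine similarity and a per-passage utility score, which is supposed to come from a separately trained, supervised model. The weight `alpha`, the utility values, and the function names are hypothetical. This is a simplified stand-in, not METRAG's actual scoring, and the adaptive summarization step is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, passages, alpha=0.5, top_k=2):
    """Rank passages by alpha * similarity + (1 - alpha) * utility.

    Each passage is a dict with "text", "vec" (its embedding) and "utility"
    (a score assumed to come from a supervised utility model).
    """
    scored = [
        (alpha * cosine(query_vec, p["vec"]) + (1 - alpha) * p["utility"], p["text"])
        for p in passages
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]


passages = [
    {"text": "A", "vec": [1.0, 0.0], "utility": 0.0},  # most similar, not useful
    {"text": "B", "vec": [0.8, 0.6], "utility": 0.9},  # fairly similar, very useful
    {"text": "C", "vec": [0.0, 1.0], "utility": 0.5},
]
print(rerank([1.0, 0.0], passages, alpha=1.0))  # similarity only → ['A', 'B']
print(rerank([1.0, 0.0], passages, alpha=0.5))  # blended → ['B', 'A']
```

Setting `alpha=1.0` reduces this to ordinary similarity search. Lowering it lets a passage that is somewhat less similar but more useful move ahead, which is the behavior the paper argues plain similarity misses.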

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges [35.9]
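Multi-modal machine translation has attracted great interest in both academia and industry. It takes both text and visual inputs and uses the visual context to resolve ambiguities in the source text.
Reference translation of the paper (metadata) (Tue, 21 May 2024 10:34:47 GMT)
A survey on multi-modal machine translation. The field has a long research history, but it looks likely to be strongly affected by MLLMs (the survey also mentions this).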

Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities

Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities [29.2]
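This study introduces a benchmark for evaluating social intelligence, one of the most distinctive aspects of human cognition. We develop a comprehensive theoretical framework for social dynamics and introduce two evaluation tasks: Inverse Reasoning (IR) and Inverse Inverse Planning (IIP). Extensive experiments and analyses show that humans outperform the latest GPT models and exhibit zero-shot learning, one-shot generalization, and adaptability to multiple modalities.
Reference translation of the paper (metadata) (Mon, 20 May 2024 07:34:48 GMT)
A benchmark for measuring social intelligence, built around Inverse Reasoning (IR) and Inverse Inverse Planning (IIP). Even GPT-4 still lags behind humans on some tasks. I was a little surprised to see ASI in the conclusion: "We hope that our study contributes valuable information towards the advancement of ASI."
The repository is GitHub – bigai-ai/Evaluate-n-Model-Social-Intelligence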

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.3]
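The rapid development of LLMs (Large Language Models) has demonstrated remarkable multilingual capabilities in natural language processing. Despite these breakthroughs, research on multilingual scenarios remains insufficient. This survey aims to support the research community's work on multilingual problems and provides a comprehensive understanding of the core concepts, key techniques, and latest developments in LLM-based multilingual natural language processing.
Reference translation of the paper (metadata) (Fri, 17 May 2024 17:47:39 GMT)
A survey on multilingual LLMs.
The repository is also worth a look: GitHub – kaiyuhwang/MLLM-Survey: The paper list of multilingual pre-trained models (Continual Updated).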

The SkatingVerse Workshop & Challenge: Methods and Results

The SkatingVerse Workshop & Challenge: Methods and Results [137.8]
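The SkatingVerse Workshop & Challenge aims to foster research on novel and accurate methods for human action understanding. The dataset used in the SkatingVerse Challenge has been released. About 10 teams from around the world competed in the SkatingVerse Challenge.
Reference translation of the paper (metadata) (Mon, 27 May 2024 14:12:07 GMT)
A paper on a dataset and competition for HAU (Human Action Understanding). It also briefly introduces the methods and tricks used by the top-ranked teams.
The project site is 1st SkatingVerse Challenge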
