November 11, 2024 – Introducing the Latest arXiv Papers

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems [62.4]
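Retrieval-Augmented Generation (RAG) aims to improve the knowledge capabilities of LLMs. HtmlRAG uses HTML instead of plain text as the format of the retrieved knowledge. We propose HTML cleaning, compression, and pruning strategies that shorten the HTML while minimizing information loss.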
Reference translation of the paper (metadata) (Tue, 05 Nov 2024 09:58:36 GMT)
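A proposal to use HTML as the format of the knowledge fed to RAG, reportedly with strong benchmark results. The results for each combination of base LLM (Llama 3.1 8B and 70B) and format (the proposed method, plain text, Markdown) are interesting. (That said, I am not entirely convinced the results clearly show that HTML is the best.)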
The repository is GitHub – plageon/HtmlRAG: HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems
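The abstract names HTML cleaning as one way to shorten HTML before it reaches the LLM. Here is a minimal sketch of rule-based cleaning with only Python's standard `html.parser`. It keeps tag names and text and drops attributes, comments, scripts, styles, and empty elements. The tag lists, the empty-element rule, and the names `HtmlCleaner` and `clean_html` are assumptions made for illustration, not the paper's actual rules. The real implementation is in the plageon/HtmlRAG repository, and the compression and pruning steps are not covered here.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical choices for this sketch, not the HtmlRAG paper's actual rules.
DROP_TAGS = {"script", "style", "noscript"}               # content carries no retrievable knowledge
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # elements without a closing tag
EMPTY_PAIR = re.compile(r"<(\w+)></\1>")                  # e.g. <div></div>


class HtmlCleaner(HTMLParser):
    """Re-emit HTML with tag structure and text only: no attributes, comments, scripts or styles."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag not in VOID_TAGS:
            self.out.append(f"<{tag}>")  # attrs (class, id, style, ...) are discarded

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            text = re.sub(r"\s+", " ", data)  # collapse whitespace runs to a single space
            if text.strip():
                self.out.append(escape(text, quote=False))

    # handle_comment is inherited unchanged: the base class ignores comments.


def clean_html(raw: str) -> str:
    parser = HtmlCleaner()
    parser.feed(raw)
    parser.close()
    cleaned = "".join(parser.out)
    # Remove empty element pairs until none remain (nested empties collapse step by step).
    while True:
        shorter = EMPTY_PAIR.sub("", cleaned)
        if shorter == cleaned:
            return cleaned
        cleaned = shorter


if __name__ == "__main__":
    page = ('<html><head><style>p{color:red}</style></head><body>'
            '<div class="x"><p id="a">Hello <b>world</b></p><div></div>'
            '<script>alert(1)</script><!-- ad slot --></div></body></html>')
    print(clean_html(page))
```

The sketch keeps tags such as `<p>`, `<b>` or `<li>`, so the model can still see headings, emphasis, and list structure. The attributes it drops mostly cost tokens without adding content.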

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent [83.4]
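Hunyuan-Large is an open-source, Transformer-based mixture-of-experts (MoE) model. We thoroughly evaluate Hunyuan-Large's strong performance on a variety of benchmarks. A key practice behind Hunyuan-Large is large-scale synthetic data, larger than that used in previous literature.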
Reference translation of the paper (metadata) (Tue, 05 Nov 2024 04:14:25 GMT)
A high-performing LLM with publicly released weights. It is an MoE with 389B total parameters, 52B of which are active, and the authors claim it outperforms Llama 3.1 70B and is competitive with Llama 3.1 405B. The license is relatively permissive, but one notable clause reads "THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW." It also states "This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China."
The repository is GitHub – Tencent/Tencent-Hunyuan-Large, and the model is at tencent/Tencent-Hunyuan-Large · Hugging Face
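The "389B total, 52B active" figure is easier to interpret with some quick arithmetic. The sketch below computes the active fraction, a rough per-token forward cost, and the memory needed for the weights. It relies on two assumptions that are not from the paper: the common rule of thumb of about 2 FLOPs per active parameter per token, and 16-bit (2-byte) weight storage. The function names are hypothetical. The Llama 3.1 models are dense, so for them every parameter is active.

```python
# Figures from the summary above, in billions of parameters.
TOTAL_PARAMS_B = 389   # all experts plus shared layers
ACTIVE_PARAMS_B = 52   # parameters used for each token


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of all parameters that take part in one token's forward pass."""
    return active_b / total_b


def forward_gflops_per_token(active_b: float) -> float:
    """Rough forward cost, assuming ~2 FLOPs per active parameter (rule of thumb)."""
    return 2 * active_b  # billions of parameters * 2 FLOPs = GFLOPs


def weight_memory_gb(total_b: float, bytes_per_param: float = 2.0) -> float:
    """Weights-only memory in decimal GB, assuming 16-bit storage; all experts must be resident."""
    return total_b * bytes_per_param


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
    for name, active in [("Hunyuan-Large", 52), ("Llama 3.1 70B", 70), ("Llama 3.1 405B", 405)]:
        print(f"{name}: ~{forward_gflops_per_token(active):.0f} GFLOPs/token (forward)")
    print(f"Hunyuan-Large weights in 16-bit: ~{weight_memory_gb(TOTAL_PARAMS_B):.0f} GB")
```

Under these assumptions, about 13% of the parameters are used per token, so per-token compute is below that of the dense 70B model. However, the full set of weights (roughly 778 GB in 16-bit) still has to be held in memory.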

Number Cookbook: Number Understanding of Language Models and How to Improve It [64.0]
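Large language models (LLMs) can solve a growing range of complex reasoning tasks, yet they make unexpected errors in basic numerical understanding and processing. This paper comprehensively investigates the numerical understanding and processing ability (NUPA) of LLMs.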
Reference translation of the paper (metadata) (Wed, 06 Nov 2024 08:59:44 GMT)
An analysis of numerical understanding and processing ability (NUPA) in LLMs, along with a study of how to improve it. Today the leading approaches rely on tools, for example by going through code generation. This work nevertheless excludes tool use, for these reasons: "1) we want to study the self-contained NUPA of LLMs, 2) calling external tools whenever encountering numbers increases the inference latency (Xu et al , 2024), and 3) we believe NUPA without tools is a necessary ability of AGI."
The current conclusion is: "We investigate NUPA of LLMs and introduce a comprehensive benchmark, the NUPA test, to reveal that numerical problems remain challenging for modern LLMs." It is still a hard problem. In practice it can be handled to some extent, for example via code generation, but even so...
The repository is GitHub – GraphPKU/number_cookbook
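The paper deliberately excludes tools. For contrast, here is a minimal sketch of the practical workaround mentioned above: numeric sub-problems are handed to exact code rather than computed "in the head" of the LLM. It uses Python's `decimal` module so that decimal strings are compared and added exactly. The helper names `compare_numbers` and `add_numbers` and the precision setting are hypothetical choices, not part of the paper or its benchmark.

```python
from decimal import Decimal, getcontext

# Precision large enough for long integers and decimals in typical prompts (an assumption).
getcontext().prec = 60


def compare_numbers(a: str, b: str) -> str:
    """Return '<', '>' or '=' by comparing the exact decimal values of two numeric strings."""
    x, y = Decimal(a), Decimal(b)
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


def add_numbers(a: str, b: str) -> str:
    """Exact decimal addition, avoiding binary floating-point artifacts."""
    return str(Decimal(a) + Decimal(b))


if __name__ == "__main__":
    print(compare_numbers("9.11", "9.9"))  # more digits after the point, but the smaller value
    print(add_numbers("0.1", "0.2"))       # exact decimal result
    print(0.1 + 0.2)                       # binary float, shown for contrast
    print(add_numbers("123456789123456789", "987654321987654321"))
```

This kind of helper is what the tool-based approach delegates to. The paper's point is that the model itself should handle such cases without it.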

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level [73.1]
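We introduce Agent K v1.0, an end-to-end autonomous data science agent. It manages the entire data science lifecycle by learning from experience, and it optimizes long- and short-term memory by selectively storing and retrieving key information.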
Reference translation of the paper (metadata) (Tue, 05 Nov 2024 23:55:23 GMT)
A proposed agent system that claims Kaggle Grandmaster-level performance: "our results indicate that Agent K v1.0 has reached a performance level equivalent to Kaggle Grandmaster, with a record of 6 gold medals, 3 silver medals, and 7 bronze medals."
There is plenty to learn from, such as the pipeline design and the prompts. Still, I have some "is that really true?" doubts. One source is a caveat in the paper itself: "However, because this assessment relies on a custom split of the training data rather than the competition’s actual private test set, it remains uncertain whether an agent’s high ranking in this context would align with results on the original Kaggle leaderboard." Another is the possibility of leakage.

Neural Fields in Robotics: A Survey [39.9]
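Neural Fields have emerged as a transformative approach to 3D scene representation in computer vision and robotics. This survey explores their applications in robotics and highlights their potential to enhance perception, planning, and control. Their compactness, memory efficiency, differentiability, and seamless integration with foundation and generative models make them ideal for real-time applications.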
Reference translation of the paper (metadata) (Sat, 26 Oct 2024 16:26:41 GMT)
A survey that "provides a thorough review of Neural Fields in robotics, categorizing applications across various domains and evaluating their strengths and limitations, based on over 200 papers." According to the authors, research on and applications of Neural Fields are spreading across robotics.
The repository is Neural Fields in Robotics: A Survey