Introducing the latest arXiv papers

CC2Vec

CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection [20.7]
CC2Vec is a new code encoding method designed to quickly identify simple code clones. The paper evaluates CC2Vec on two widely used datasets, BigCloneBench and Google Code Jam.
Paper reference translation (metadata) (Wed, 01 May 2024 10:18:31 GMT)
The paper says: "In this paper, we introduce CC2Vec, a novel code encoding method designed to swiftly identify simple code clones while also enhancing the capability for semantic code clone detection." Being able to take semantics into account when judging clones is impressive.
The repository is GitHub – CC2Vector/CC2Vec
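
To make the "contrastive learning" part of the title concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss over pairs of code embeddings. This is not the loss or encoder from the CC2Vec paper; the function names, the temperature of 0.1, and the toy 3-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (an illustrative stand-in, not CC2Vec's objective).

    anchors[i] and positives[i] are treated as a clone pair; every other
    row of `positives` acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Hypothetical embeddings of three code fragments and their clones.
fragments = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
clones = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]

aligned_loss = info_nce_loss(fragments, clones)
# Rotating the clones pairs every fragment with the wrong partner.
mismatched_loss = info_nce_loss(fragments, clones[1:] + clones[:1])
```

The loss is small when each fragment sits closest to its own clone and grows when the pairs are mismatched, which is the pressure that pulls clone embeddings together during training.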

Why Tabular Foundation Models Should Be a Research Priority

Why Tabular Foundation Models Should Be a Research Priority [65.8]
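Tabular data is the dominant modality in many fields, yet it receives little research attention and lags considerably behind in scale and power. The authors believe the time has come to start developing tabular foundation models, or what they call Large Tabular Models (LTMs).
Paper reference translation (metadata) (Thu, 02 May 2024 10:05:16 GMT)
I would like to have a Large Tabular Model, but even after reading the paper I still have serious doubts about whether one can be made general-purpose and whether the cost would be justified.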

Causal Evaluation of Language Models

Causal Evaluation of Language Models [33.3]
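CaLM (Causal Evaluation of Language Models) is a comprehensive benchmark for evaluating the causal reasoning capabilities of language models. Its taxonomy consists of four modules: causal target (what to evaluate), adaptation (how to obtain the results), metric (how to measure the results), and error (how to analyze bad results).
Paper reference translation (metadata) (Wed, 01 May 2024 16:43:21 GMT)
A proposal of Causal Evaluation of Language Models (CaLM), a benchmark for evaluating causal reasoning in LLMs. GPT-4 tops the leaderboard, but I would like to see results for the latest models.
The project site is Causal Evaluation of Language Models (opencausalab.github.io)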

Is Bigger Edit Batch Size Always Better? — An Empirical Study on Model Editing with Llama-3

Is Bigger Edit Batch Size Always Better? — An Empirical Study on Model Editing with Llama-3 [2.6]
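This study presents a targeted model editing analysis focused on Llama-3, the latest large language model. It identifies the most effective layers for editing through an evaluation covering up to 4096 edits.
Paper reference translation (metadata) (Wed, 01 May 2024 17:50:37 GMT)
Model editing targeting Llama-3; that came out fast...
"Contrary to previous belief, our experiments show that earlier layers may be more optimal intervention points, and that smaller, frequent sequential batch size edits have a superior performance in comparison to larger batch sizes." I wonder whether techniques like this change every time a model is updated.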
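
The difference between one large edit batch and smaller sequential batches can be sketched as a scheduling question. The snippet below is only a scheduling skeleton under my own assumptions: the `(prompt, target)` edit format, the batch size of 256, and the `apply_batch` callback are hypothetical placeholders, and no real model editing method is implemented.

```python
from typing import Callable, List, Sequence, Tuple

Edit = Tuple[str, str]  # (prompt, new target): a hypothetical edit format


def batched(edits: Sequence[Edit], batch_size: int) -> List[List[Edit]]:
    """Split the edits into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(edits[i:i + batch_size]) for i in range(0, len(edits), batch_size)]


def apply_sequentially(
    edits: Sequence[Edit],
    batch_size: int,
    apply_batch: Callable[[List[Edit]], None],
) -> int:
    """Apply the edits batch by batch and return the number of update steps."""
    steps = 0
    for batch in batched(edits, batch_size):
        apply_batch(batch)  # placeholder for an actual editing method
        steps += 1
    return steps


edits = [(f"fact {i}", f"target {i}") for i in range(4096)]
applied: List[Edit] = []

# One large batch: all 4096 edits go into a single update.
one_shot_steps = apply_sequentially(edits, 4096, applied.extend)
# Smaller sequential batches: the same edits spread over several updates.
sequential_steps = apply_sequentially(edits, 256, applied.extend)
```

Both schedules apply the same 4096 edits; what changes is how many separate updates the model receives, which is the variable the quoted finding is about.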

The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights

The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.4]
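This paper proposes a question alignment approach to close the gap between the English and non-English performance of large language models. Experimental results suggest that question alignment is effective at improving multilingual performance across diverse reasoning scenarios. To understand the mechanism behind its success, the authors analyze the representation space, chain-of-thought, and translation data scale.
Paper reference translation (metadata) (Thu, 02 May 2024 14:49:50 GMT)
A proposal of a two-stage alignment approach (question alignment and response alignment) for improving multilingual performance. The paper also reports that "En-X translation training can implicitly bias LLM to generate non-English chain-of-thought and increase the question-response language consistency." The analysis and interpretation are interesting as well.
The repository is GitHub – NJUNLP/QAlign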

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.3]
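Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control. PSL achieves success rates of over 85%, outperforming language-based, classical, and end-to-end approaches.
Paper reference translation (metadata) (Thu, 02 May 2024 17:59:31 GMT)
A proposal of a framework for long-horizon planning, which remains difficult. It consists of high-level planning in natural language plus a "Sequencing Module" and a "Learning Module" that carry the plan out.
The repository is Plan-Seq-Learn (mihdalal.github.io)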

Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs

Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs [39.2]
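Chain-of-Thought (CoT) is a widely adopted prompting method that elicits the impressive reasoning abilities of large language models (LLMs). Inspired by the sequential thought structure of CoT, many Chain-of-X (CoX) methods have been developed to address a variety of challenges across diverse domains and tasks involving LLMs.
Paper reference translation (metadata) (Wed, 24 Apr 2024 06:12:00 GMT)
A survey of Chain-of-X; the variety of proposed methods makes for interesting reading.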

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments [51.4]
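Large language models (LLMs) show promise on a variety of tasks, but they often lack domain-specific knowledge and struggle to solve biological design problems accurately. This work introduces CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate the design process of CRISPR-based gene-editing experiments.
Paper reference translation (metadata) (Sat, 27 Apr 2024 22:59:17 GMT)
A proposal of an LLM agent for gene editing.
The affinity certainly seems high, and NLP has been applied in this field before, but it is still surprising to see LLMs arrive here too. This work does not appear to cover it, but I wonder whether the day is near when genes can be handled directly as a modality.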

KAN: Kolmogorov-Arnold Networks

KAN: Kolmogorov-Arnold Networks [16.8]
The authors propose Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-Layer Perceptrons (MLPs). KANs have learnable activation functions on edges ("weights"). This seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability.
Paper reference translation (metadata) (Tue, 30 Apr 2024 17:58:29 GMT)
A proposal of an architecture claimed to beat MLPs in both performance and interpretability. The paper says: "KANs and MLPs are dual: KANs have activation functions on edges, while MLPs have activation functions on nodes. This simple change makes KANs better (sometimes much better!) than MLPs in terms of both model accuracy and interpretability." It also notes that "Currently, the biggest bottleneck of KANs lies in its slow training. KANs are usually 10x slower than MLPs, given the same number of parameters." I wonder whether the claims hold up and whether KANs will be widely adopted.
The repository is GitHub – KindXiaoming/pykan: Kolmogorov Arnold Networks
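
The "activation functions on edges versus nodes" distinction is easier to see in code. Below is a minimal forward-pass sketch, not the pykan API and not the paper's parameterization: each edge function here is a weighted sum of three fixed Gaussian bumps, a simplification I chose to keep the example short, and all names (`CENTERS`, `edge_function`, `kan_layer`, `mlp_layer`) are hypothetical.

```python
import math
import random

CENTERS = [-1.0, 0.0, 1.0]  # fixed bump centers, an illustrative choice


def edge_function(x, coeffs):
    """Learnable 1-D function on one edge: a weighted sum of Gaussian bumps."""
    return sum(c * math.exp(-(x - mu) ** 2) for c, mu in zip(coeffs, CENTERS))


def kan_layer(x, coeffs):
    """KAN-style layer: each output just sums its incoming edge functions.

    coeffs[j][i] holds the bump coefficients of the edge from input i to output j.
    """
    return [
        sum(edge_function(xi, coeffs[j][i]) for i, xi in enumerate(x))
        for j in range(len(coeffs))
    ]


def mlp_layer(x, weights):
    """MLP-style layer: scalar weights on edges, a fixed tanh on each node."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


rng = random.Random(0)
n_in, n_out = 2, 3
coeffs = [[[rng.uniform(-1, 1) for _ in CENTERS] for _ in range(n_in)] for _ in range(n_out)]
weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

x = [0.5, -0.25]
kan_out = kan_layer(x, coeffs)    # nonlinearity lives on every edge
mlp_out = mlp_layer(x, weights)   # nonlinearity lives on every node
```

In this sketch the KAN-style layer stores three coefficients per edge where the MLP stores one weight, so an edge can represent a whole curve rather than a single slope.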

Capabilities of Gemini Models in Medicine

Capabilities of Gemini Models in Medicine [100.6]
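The paper introduces Med-Gemini, a multimodal model specialized for medicine. Med-Gemini is evaluated on 14 medical benchmarks and establishes new state-of-the-art (SoTA) performance on 10 of them. The results offer evidence of Med-Gemini's potential, but more rigorous evaluation is important before real-world deployment.
Paper reference translation (metadata) (Mon, 29 Apr 2024 04:11:28 GMT)
A report on Med-Gemini, a Gemini specialized for medicine. It outperforms GPT-4. In addition, "Finally, Med-Gemini’s performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization and referral letter generation, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education."
Fine-tuning for medicine was bound to produce something like this, but the pace of progress is still fast.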
