staka – Page 81 – arXiv最新論文の紹介 (Introducing the Latest arXiv Papers)

Apple Intelligence Foundation Language Models

Apple Intelligence Foundation Language Models [109.6]
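This report describes the model architecture, the data used to train the models, the training process, and the evaluation results. We place particular emphasis on Responsible AI and on how its principles are applied throughout model development.
Reference translation of the paper (metadata) (Mon, 29 Jul 2024 18:38:49 GMT)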
An introduction to Apple's foundation models. The report goes into considerable detail, for example: "AFM-server: We train AFM-server from scratch for 6.3T tokens on 8192 TPUv4 chips, using a sequence length of 4096 and a batch-size of 4096 sequences." For the on-device model ("AFM-on-device: For the on-device model, we found that knowledge distillation [Hinton et al., 2015] and structural pruning are effective ways to improve model performance and training efficiency."), the approach looks similar to MINITRON / Compact Language Models via Pruning and Knowledge Distillation – arXiv最新論文の紹介 (devneko.jp).
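To put the quoted AFM-server configuration in perspective, here is a rough back-of-the-envelope calculation. It assumes that every optimizer step uses the full batch of 4096 sequences of 4096 tokens and that all 6.3T tokens are trained with exactly these settings; this is a simplification for illustration, not a figure taken from the report.

```latex
\begin{aligned}
\text{tokens per step} &= 4096 \times 4096 = 2^{24} = 16{,}777{,}216 \\
\text{number of steps} &\approx \frac{6.3 \times 10^{12}}{16{,}777{,}216} \approx 3.76 \times 10^{5} \\
\text{tokens per chip per step} &= \frac{2^{24}}{8192} = \frac{2^{24}}{2^{13}} = 2^{11} = 2048
\end{aligned}
```

Under these assumptions, each of the 8192 TPUv4 chips handles about 2048 tokens (half of one 4096-token sequence) per step, over roughly 376K steps.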
The project site is Introducing Apple’s On-Device and Server Foundation Models – Apple Machine Learning Research
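The knowledge distillation cited in the quote [Hinton et al., 2015] trains a small student model to match the softened output distribution of a larger teacher. Below is a minimal, framework-free sketch of the classic loss from that paper: a temperature-scaled KL divergence multiplied by T², mixed with the ordinary hard-label cross-entropy. The function names, the mixing weight `alpha`, and the default temperature are illustrative assumptions, and this is not claimed to be the exact recipe used for AFM-on-device.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 as in Hinton et al. (2015)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # The T^2 factor keeps soft-target gradients on a comparable scale as T changes.
    return (temperature ** 2) * kl


def combined_loss(student_logits: Sequence[float],
                  teacher_logits: Sequence[float],
                  label: int,
                  alpha: float = 0.5,
                  temperature: float = 2.0) -> float:
    """Hypothetical mix: alpha * hard-label cross-entropy + (1 - alpha) * distillation loss."""
    hard = -math.log(softmax(student_logits)[label])  # ordinary cross-entropy at T = 1
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, -0.5]
    print(distillation_loss(student, teacher), combined_loss(student, teacher, label=0))
```

In a real training loop the same computation would run on batched tensors in a deep learning framework; the plain-Python version only makes the formula explicit.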

Preliminary WMT24 Ranking of General MT Systems and LLMs

Preliminary WMT24 Ranking of General MT Systems and LLMs [69.8]
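This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking is based on human evaluation, which is superior to the automatic ranking.
Reference translation of the paper (metadata) (Mon, 29 Jul 2024 11:01:17 GMT)
"This is the preliminary ranking of WMT24 General MT systems based on automatic metrics." Although it relies on automatic evaluation, the ranking is very interesting.
"Unbabel-Tower70B", which posts impressive results, may be a larger version of Announcing Tower : An Open Multilingual LLM for Translation-Related Tasks (unbabel.com) and Tower – a Unbabel Collection (huggingface.co). I would like to see more details.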

A Survey of Text-to-SQL Tasks

A Survey on Employing Large Language Models for Text-to-SQL Tasks [7.7]
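With the growing volume of data stored in relational databases, the need to query and use this data efficiently is increasing across many fields. To take advantage of recent developments in large language models (LLMs), a variety of new methods have emerged, with a focus on prompt engineering and fine-tuning.
Reference translation of the paper (metadata) (Sun, 21 Jul 2024 14:48:23 GMT)
A survey of SQL generation, a task of real practical importance.
LLMs have had a large impact on this area.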
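As a concrete picture of the prompt-engineering style of approach the survey covers, the sketch below builds a simple zero-shot prompt that puts the database schema (as `CREATE TABLE` statements) in front of the question, and then checks a generated query by execution: it counts as correct if it returns the same rows as a reference query. The prompt wording, the `employees` table, and the helper names are hypothetical, and no LLM is called; a hard-coded string stands in for the model's output.

```python
import sqlite3
from collections import Counter

# Hypothetical example schema used only for illustration.
SCHEMA = """
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
"""


def build_prompt(schema_ddl: str, question: str) -> str:
    """Zero-shot prompt: schema as CREATE TABLE statements, then the natural-language question."""
    return (
        "Given the following SQLite schema, write one SQL query that answers the question.\n"
        f"### Schema\n{schema_ddl.strip()}\n"
        f"### Question\n{question}\n"
        "### SQL\n"
    )


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Execution-based check: same multiset of result rows (row order and column names ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
        [("Aiko", "sales", 500), ("Ben", "sales", 700), ("Chen", "dev", 900)],
    )
    print(build_prompt(SCHEMA, "What is the average salary in each department?"))
    candidate = "SELECT dept, AVG(salary) FROM employees GROUP BY dept"  # stand-in for LLM output
    gold = "SELECT dept, AVG(salary) AS avg_salary FROM employees GROUP BY dept ORDER BY dept"
    print(execution_match(conn, candidate, gold))
```

Comparing rows as a multiset deliberately ignores ordering; for questions where `ORDER BY` matters, an order-sensitive comparison of the two row lists would be needed instead.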

LAMBDA: A Large Model Based Data Agent

LAMBDA: A Large Model Based Data Agent [7.2]
LAMBDA is an open-source, code-free multi-agent data analysis system, designed to address the challenges of data analysis in complex data-driven applications. LAMBDA shows strong performance on a variety of machine learning datasets.
Reference translation of the paper (metadata) (Wed, 24 Jul 2024 06:26:36 GMT)
A multi-agent data analysis system.
The repository is GitHub – Stephen-SMJ/LAMBDA: This is the offical repository of paper “LAMBDA: A large Model Based Data Agent”. https://www.polyu.edu.hk/ama/cmfai/lambda.html

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.2]
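SPIQA is a dataset designed for interpreting complex figures and tables within the context of scientific research papers. Both automatic and manual curation were used to build it. SPIQA's 270K questions are divided into training, validation, and three different evaluation splits.
Reference translation of the paper (metadata) (Fri, 12 Jul 2024 16:37:59 GMT)
A multimodal QA dataset built on scientific papers. In the zero-shot setting, GPT-4o appears to perform best, although this varies by split. It is also interesting that the authors point to the effectiveness of fine-tuning: "Furthermore, fine-tuning two open-source systems, LLaVA and InstructBLIP, on the SPIQA training set results in significant improvements over zero-shot evaluations, indicating promising avenues for designing specialized systems for scientific QA in the future."
The repository is GitHub – google/spiqa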

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism [28.8]
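Large language models (LLMs) excel at in-context learning (ICL). Recent studies present two conflicting views of ICL. This work provides a two-dimensional coordinate system that integrates both views into a systematic framework.
Reference translation of the paper (metadata) (Wed, 24 Jul 2024 05:26:52 GMT)
A paper that examines, as a matrix, two key factors in ICL: "task recognition" and "whether information from similar examples is supplied."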

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models [71.8]
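LMMS-EVAL is a unified, standardized multimodal benchmark framework covering over 50 tasks and more than 10 models. LMMS-EVAL LITE is a pruned evaluation toolkit that emphasizes both coverage and efficiency. Multimodal LIVEBENCH draws on continuously updated news and online forums to assess models' ability to generalize in the wild.
Reference translation of the paper (metadata) (Wed, 17 Jul 2024 17:51:53 GMT)
A benchmark for multimodal LLMs. On LiveBench, GPT-4 Turbo scores higher than GPT-4o.
The repository is GitHub – EvolvingLMMs-Lab/lmms-eval: Accelerating the development of large multimodal models (LMMs) with lmms-eval, and the leaderboard is LiveBench – a Hugging Face Space by lmms-lab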

Very Large-Scale Multi-Agent Simulation in AgentScope

Very Large-Scale Multi-Agent Simulation in AgentScope [115.8]
We have developed new features and components for AgentScope, a user-friendly multi-agent platform. To achieve high scalability and efficiency, we propose an actor-based distributed mechanism. We also provide a web-based interface for conveniently monitoring and managing large numbers of agents.
Reference translation of the paper (metadata) (Thu, 25 Jul 2024 05:50:46 GMT)
A framework designed for multi-agent simulation, released as OSS under the Apache 2.0 license. It looks easy to use, and it stands out for features such as "Users only need to simply specify the distributions of the population from several aspects, a large number of agents with detailed and diverse characteristics can be effortlessly generated accordingly."
The repository is GitHub – modelscope/agentscope: Start building LLM-empowered multi-agent applications in an easier way.
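The quoted feature (specify population distributions per aspect, then generate many diverse agents) can be illustrated with a small sketch. This is not AgentScope's API: `sample_population`, `build_system_prompt`, and the example aspects and weights are hypothetical, and each aspect is sampled independently, which is a simplifying assumption (real populations may need correlated attributes).

```python
import random
from typing import Dict, List

# Hypothetical per-aspect categorical distributions (weights need not sum to 1).
DISTRIBUTIONS: Dict[str, Dict[str, float]] = {
    "age_group": {"18-29": 0.3, "30-49": 0.4, "50+": 0.3},
    "occupation": {"student": 0.2, "engineer": 0.3, "teacher": 0.2, "other": 0.3},
    "personality": {"cautious": 0.5, "risk-taking": 0.5},
}


def sample_population(n: int,
                      distributions: Dict[str, Dict[str, float]],
                      seed: int = 0) -> List[Dict[str, str]]:
    """Draw n agent profiles; each aspect is sampled independently from its distribution."""
    rng = random.Random(seed)  # seeded so the same population can be regenerated
    population = []
    for i in range(n):
        profile = {"id": f"agent-{i}"}
        for aspect, dist in distributions.items():
            values = list(dist.keys())
            weights = list(dist.values())
            profile[aspect] = rng.choices(values, weights=weights, k=1)[0]
        population.append(profile)
    return population


def build_system_prompt(profile: Dict[str, str]) -> str:
    """Turn a sampled profile into a role description an LLM-backed agent could be given."""
    traits = ", ".join(f"{k}: {v}" for k, v in profile.items() if k != "id")
    return f"You are {profile['id']}. Your profile is: {traits}. Stay in character."


if __name__ == "__main__":
    agents = sample_population(1000, DISTRIBUTIONS, seed=42)
    for agent in agents[:3]:
        print(build_system_prompt(agent))
```

Sampling aspect by aspect keeps the specification compact (one small table per aspect) while still producing many distinct combinations; a framework that distributes agents across processes would then only need the seed and the distributions to reconstruct the population.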

A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication

A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication [15.9]
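Deep generative models have shown remarkable performance in a variety of computer vision applications. These models can also be used for malicious purposes such as misinformation, forgery, and copyright infringement. This paper provides a systematic and timely review of research on defenses against AI-generated visual media.
Reference translation of the paper (metadata) (Mon, 15 Jul 2024 09:46:02 GMT)
A survey that, in its own words, "provides a comprehensive overview of research on proactive and passive defenses against AI-generated visual media, covering the mainstream defense tasks of detection, disruption, and authentication, as well as their trustworthiness."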

OpenDevin

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents [109.9]
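We introduce OpenDevin, a platform for developing AI agents that interact with the world in ways similar to human developers. We describe how the platform enables the implementation of new agents, safe interaction with sandboxed environments for code execution, and the incorporation of evaluation benchmarks.
Reference translation of the paper (metadata) (Tue, 23 Jul 2024 17:50:43 GMT)
An open counterpart to Cognition | Introducing Devin, the first AI software engineer, which aims to automate software development. The evaluations on various benchmarks and the comparisons with other methods are also interesting. As a base model, Claude 3.5 Sonnet stands out, which raises high expectations for Claude 3.5 Opus.
The repository is GitHub – OpenDevin/OpenDevin: 🐚 OpenDevin: Code Less, Make More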
