arXiv最新論文の紹介 (Introducing the Latest arXiv Papers)

ViT-1.58b

ViT-1.58b: Mobile Vision Transformers in the 1-bit Era [27.7]
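This paper introduces ViT-1.58b, a new 1.58-bit quantized ViT model that substantially reduces memory and computational overhead. In experiments on CIFAR-10 and ImageNet-1k, ViT-1.58b maintains accuracy comparable to full-precision ViT.
Paper reference translation (metadata) (Wed, 26 Jun 2024 04:01:19 GMT)
This is the ViT counterpart of the work covered in "1-bit (1.58-bit) LLMs and HAWK/Griffin – arXiv最新論文の紹介 (devneko.jp)". The authors write, "Our results show that ViT-1.58b achieves competitive accuracy on benchmarks like CIFAR10 and ImageNet-1k with significantly lower resource requirements.", so the approach seems to give good results for ViT as well.
The repository is GitHub – DLYuanGod/ViT-1.58b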
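
To make "1.58-bit" concrete: a weight restricted to the three values {-1, 0, +1} carries log2(3) ≈ 1.58 bits. Below is a minimal sketch of ternary weight quantization, assuming an absmean scaling scheme in the style of 1.58-bit LLMs. The function names and the sample weights are hypothetical and are not taken from the ViT-1.58b repository.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map floats to {-1, 0, +1} using an absmean scale (assumed scheme)."""
    # Scale by the mean absolute value so typical weights land near +/-1.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate reconstruction of the original weights."""
    return [q * scale for q in quantized]


# Three possible states per weight give log2(3) bits of information.
bits_per_weight = math.log2(3)

sample_weights = [0.9, -0.05, -1.2, 0.4]  # hypothetical values
ternary, scale = ternary_quantize(sample_weights)
print(ternary, round(bits_per_weight, 2))
```

Only the ternary values and a single scale need to be stored per tensor, which is where the memory savings would come from under this assumed scheme.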

Evaluating Copyright Takedown Methods for Language Models

Evaluating Copyright Takedown Methods for Language Models [100.4]
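Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. This paper presents the first evaluation of the feasibility and side effects of copyright takedowns for LMs. It examines several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
Paper reference translation (metadata) (Wed, 26 Jun 2024 18:09:46 GMT)
A study of methods for preventing models from generating copyright-protected content. The authors build a dataset and evaluate a variety of methods. They report, "Through COTAEVAL, we discover that none of the mainstream takedown methods excel across all metrics.", so effective countermeasures do not appear to be easy.
The repository is CotaEval: Evaluating Copyright Takedown Methods for Language Models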
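
Of the strategies listed, decoding-time filtering is the easiest to illustrate. The following is a rough sketch of one possible form, an n-gram blocklist applied to candidate next tokens. It is an assumption for illustration only, not the CotaEval implementation, and every name in it is hypothetical.

```python
def build_ngram_set(protected_tokens, n):
    """Collect every n-gram that appears in the protected text."""
    return {
        tuple(protected_tokens[i:i + n])
        for i in range(len(protected_tokens) - n + 1)
    }


def filter_candidates(generated, candidates, blocked, n):
    """Drop candidate tokens that would complete a blocked n-gram."""
    if n > 1:
        prefix = tuple(generated[-(n - 1):])
        # Too little context so far to form an n-gram: allow everything.
        if len(prefix) < n - 1:
            return list(candidates)
    else:
        prefix = ()
    return [c for c in candidates if prefix + (c,) not in blocked]


protected = "the quick brown fox".split()  # stand-in for protected text
blocked = build_ngram_set(protected, n=3)
print(filter_candidates(["the", "quick"], ["brown", "red"], blocked, n=3))
```

A filter like this can also block harmless text that happens to share n-grams with protected material, which is one kind of side effect an evaluation of takedown methods would need to measure.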

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model [138.2]
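Transformer-based segmentation methods face the challenge of efficient inference when handling high-resolution images. This work explores different architectures, focusing on the design of an efficient Segment Anything model. RWKV-SAM is a simple, effective, and fast baseline for SAM-like models.
Paper reference translation (metadata) (Thu, 27 Jun 2024 17:49:25 GMT)
The paper compares RWKV and Mamba for Segment Anything models and proposes RWKV-SAM, a fast and high-performing architecture. The authors state, "In particular, we find that under the efficient segmentation setting of high-resolution image inputs, RWKV runs faster than Mamba."
The repository is GitHub – HarborYuan/ovsam: [arXiv preprint] The official code of paper “Open-Vocabulary SAM”.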

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.8]
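Image editing aims to edit a given synthetic or real image to meet users' specific requirements. Recent notable progress in this field builds on the development of text-to-image (T2I) diffusion models. T2I-based image editing methods substantially improve editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
Paper reference translation (metadata) (Thu, 20 Jun 2024 17:58:52 GMT)
A survey on image editing. It is comprehensive, with more than 300 references, and a repository is also available: GitHub – xinchengshuai/Awesome-Image-Editing.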

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? [24.7]
The authors focus on the recently released Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. The paper proposes, for the first time, ranking AI models with an Olympic medal table based on their overall performance across disciplines.
Paper reference translation (metadata) (Mon, 24 Jun 2024 16:31:12 GMT)
Benchmark results that include the latest LLMs. The paper reports, "Claude-3.5-Sonnet shows highly competitive overall performance over GPT-4o, even surpassing GPT-4o on a few subjects (i.e., Physics, Chemistry and Biology)" and "Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them.", so for now GPT-4o and Claude 3.5 Sonnet appear to be the top two.
The repository is GitHub – GAIR-NLP/OlympicArena: This is the official repository of the paper “OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI”
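
A medal-table ranking can be sketched as a lexicographic sort. The example below assumes the usual Olympic convention (gold first, then silver, then bronze); the paper's exact tie-breaking rule is not modeled here. The model names and medal counts are placeholders, not results from the paper.

```python
from typing import NamedTuple


class MedalRow(NamedTuple):
    model: str
    gold: int
    silver: int
    bronze: int


def rank_by_medals(rows):
    """Sort by gold, then silver, then bronze, highest first."""
    return sorted(
        rows,
        key=lambda r: (r.gold, r.silver, r.bronze),
        reverse=True,
    )


# Placeholder counts for illustration only.
table = [
    MedalRow("model_a", gold=1, silver=3, bronze=0),
    MedalRow("model_b", gold=2, silver=0, bronze=0),
    MedalRow("model_c", gold=1, silver=3, bronze=1),
]

for position, row in enumerate(rank_by_medals(table), start=1):
    print(position, row.model, row.gold, row.silver, row.bronze)
```

Note that under this convention a model with more gold medals outranks one with more medals overall, so the ordering can differ from a ranking by average score.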

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track [51.3]
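It is essential to have an arena for building, testing, visualizing, and systematically evaluating RAG-based search systems. The authors propose the TREC 2024 RAG Track.
Paper reference translation (metadata) (Mon, 24 Jun 2024 17:37:52 GMT)
A benchmark and framework for RAG evaluation, with a striking name.
The repository is GitHub – castorini/ragnarok: Retrieval-Augmented Generation battle!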

Google has released Gemma 2, an LLM that is large for an open model. The 9B and 27B models are now available, and they reportedly outperform competing open models such as Llama 3. The technical report (gemma-2-report.pdf (storage.googleapis.com)) states, "The 9 billion and 27 billion parameter models are available today, with a 2 billion parameter model to be released shortly." It also says, "We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3× bigger.", and the effective use of distillation is another interesting point. Judging from Section 5 (Ablations), the effect looks substantial.
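
The difference between knowledge distillation and plain next-token prediction can be sketched with two loss functions. This is a generic illustration of distillation in the sense of Hinton et al. (2015), not Gemma 2's training code; the logits are made-up numbers and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(student_logits, target_index):
    """Standard objective: cross-entropy against a single correct token."""
    return -math.log(softmax(student_logits)[target_index])


def distillation_loss(student_logits, teacher_logits):
    """Distillation: cross-entropy against the teacher's full distribution."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [2.0, 1.0, 0.1]  # made-up teacher logits over a 3-token vocabulary
student = [0.1, 1.0, 2.0]  # made-up student logits
print(next_token_loss(student, target_index=0))
print(distillation_loss(student, teacher))
```

The distillation target carries information about every token in the vocabulary rather than just one, which is a common explanation for why it can help smaller models.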

News release: Google launches Gemma 2, its next generation of open models (blog.google)
Repositories
- google/gemma-2-9b-it · Hugging Face
- google/gemma-2-27b-it · Hugging Face

It showed very high performance on my usual translation benchmark, so expectations are high. Machine translation performance of Gemma 2 9B | ぷるーふおぶこんせぷと (staka.jp)

OpenAI has released CriticGPT, which finds GPT-4's mistakes and suggests fixes. For now it targets code. It appears to have limitations, but research like this is important. Finding GPT-4’s mistakes with GPT-4 | OpenAI

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources [100.2]
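Foundation model development is attracting a rapidly growing body of contributors, scientists, and applications. To help shape responsible development practices, the authors introduce the Foundation Model Development Cheatsheet.
Paper reference translation (metadata) (Wed, 26 Jun 2024 02:19:01 GMT)
A cheatsheet for responsible foundation model development. Despite the name, its coverage is broad.
The project site is Resources for Foundation Models – Foundation Model Development Cheatsheet (fmcheatsheet.org)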

SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation

SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation [45.4]
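This paper introduces Self-Aware Knowledge Retrieval (SeaKR). SeaKR is an adaptive RAG model that extracts the LLM's self-aware uncertainty from its internal states. Experiments on both complex and simple question answering datasets show that SeaKR outperforms existing adaptive RAG methods.
Paper reference translation (metadata) (Thu, 27 Jun 2024 14:38:33 GMT)
A RAG approach built on the strategy "SEAKR activates retrieval when the LLMs present high self-aware uncertainty for generation." Its behavior is agentic and complex, and it outperforms FLARE (Fugu-MT paper translation (abstract): Active Retrieval Augmented Generation (fugumt.com)) and DRAGIN (Fugu-MT paper translation (abstract): DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models (fugumt.com)).
The repository is GitHub – THU-KEG/SeaKR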
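
The "retrieve only when uncertain" idea can be sketched with a simple gate. SeaKR reads uncertainty from the LLM's internal states; the sketch below swaps in the entropy of a next-token probability distribution as a stand-in, so it is not SeaKR's estimator. The threshold and the probability values are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_retrieve(probs, threshold):
    """Trigger retrieval only when the model looks uncertain."""
    return token_entropy(probs) > threshold


THRESHOLD = 0.5  # hypothetical value; would need tuning in practice

confident = [0.97, 0.01, 0.01, 0.01]  # model strongly prefers one token
uncertain = [0.25, 0.25, 0.25, 0.25]  # model has no preference

print(should_retrieve(confident, THRESHOLD))
print(should_retrieve(uncertain, THRESHOLD))
```

Skipping retrieval for confident cases saves retrieval cost and avoids injecting irrelevant passages, which is the general motivation behind adaptive RAG.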

Themis: Towards Flexible and Interpretable NLG Evaluation

Themis: Towards Flexible and Interpretable NLG Evaluation [39.1]
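The authors construct NLG-Eval, a large-scale NLG evaluation corpus with both human and GPT-4 annotations, to alleviate the lack of relevant data in this field. They propose Themis, an LLM dedicated to NLG evaluation.
Paper reference translation (metadata) (Wed, 26 Jun 2024 14:04:29 GMT)
The paper builds a dataset for evaluation (0.5 million samples and 58 datasets across 9 NLG tasks) and proposes a fine-tuned model. It outperforms UniEval and G-Eval.
The repository is GitHub – PKU-ONELab/Themis: The official repository for our NLG evaluation LLM Themis and the paper Themis: Towards Flexible and Interpretable NLG Evaluation.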
