December 2023 – Page 4 – Introduction to the Latest arXiv Papers

Image Super-Resolution with Text Prompt Diffusion

Image Super-Resolution with Text Prompt Diffusion [123.9]
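This work introduces text prompts into image super-resolution (SR) to provide degradation priors. Experiments show that introducing text prompts into image SR yields excellent results on both synthetic and real-world images.
Reference translation of the paper (metadata) (Fri, 24 Nov 2023 05:11:35 GMT)
A study that introduces text prompts into the super-resolution task. The scores do improve, which is interesting. One may wonder whether suitable prompts can actually be written, or whether there is leakage, but the prompt should add some information, so the approach seems likely to help.
Repository: GitHub – zhengchen1999/PromptSR: PyTorch code for our paper “Image Super-Resolution with Text Prompt Diffusion”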

Sequential Modeling Enables Scalable Learning for Large Vision Models

Sequential Modeling Enables Scalable Learning for Large Vision Models [120.9]
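This paper proposes a novel sequential modeling approach that makes it possible to train a Large Vision Model (LVM) without using any linguistic data. The authors define a common format, the "visual sentence", that can represent raw images and videos as well as annotated data sources.
Reference translation of the paper (metadata) (Fri, 1 Dec 2023 18:59:57 GMT)
A proposal for a model that uses no information other than pixels; even the prompts are pixels. The comment 「So, we graciously hand over to you, our gentle reader, the task of pondering whether our modest LVM also exhibits the much-vaunted ‘Sparks of AGI’.」 is amusing and spirited.
Project site: Large Vision Models (yutongbai.com)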

GPT-4V with Emotion: A Zero-shot Benchmark for Multimodal Emotion Understanding

GPT-4V with Emotion: A Zero-shot Benchmark for Multimodal Emotion Understanding [38.5]
GPT-4 with Vision (GPT-4V) has shown remarkable performance on a variety of multimodal tasks. This paper quantitatively evaluates GPT-4V's capabilities in multimodal emotion understanding.
Reference translation of the paper (metadata) (Thu, 7 Dec 2023 13:27:37 GMT)
Emotion classification with GPT-4V, which outperforms supervised methods on some tasks and domains. Robustness is also examined: 「This resilience to color space changes suggests that GPT-4V is inherently robust in this regard.」 On the other hand, the authors point out that 「However, GPT-4V performs poorly in micro-expression recognition (see Table 5), which indicates that GPT-4V is currently tailored for general domains.」 The results are mixed, but the model looks powerful for general-purpose use.
Repository: GitHub – zeroQiaoba/gpt4v-emotion: GPT-4V with Emotion

Exchange-of-Thought

Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication [76.0]
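Large language models (LLMs) have recently made significant progress on complex reasoning tasks through the Chain-of-Thought technique. This paper proposes Exchange-of-Thought (EoT), a novel framework that enables cross-model communication during problem solving.
Reference translation of the paper (metadata) (Mon, 4 Dec 2023 11:53:56 GMT)
A proposal for a framework in which models communicate with one another while working toward an answer. It appears to behave similarly to ChatEval – arXiv最新論文の紹介 (devneko.jp).
Performance is reported to be better than standard CoT. The inclusion of a cost analysis is also interesting.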
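
To make the idea of cross-model communication concrete, here is a minimal sketch. It is not the EoT protocol from the paper: the round structure, the consensus check, the majority vote, and the stub "models" (`make_fixed`, `make_follower`) are all assumptions made for illustration, and no real LLM is called.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "model" here is any callable that maps (question, peer answers) to an answer.
Model = Callable[[str, Sequence[str]], str]


def exchange_of_thought(question: str, models: List[Model], max_rounds: int = 3) -> str:
    """Let models answer, share answers with peers, and revise until they agree."""
    # Round 0: every model answers without seeing anyone else's answer.
    answers = [model(question, []) for model in models]
    for _ in range(max_rounds):
        if len(set(answers)) == 1:  # consensus reached, stop communicating
            break
        # Each model sees the answers of all other models and may revise its own.
        answers = [
            model(question, [a for j, a in enumerate(answers) if j != i])
            for i, model in enumerate(models)
        ]
    # Fall back to a majority vote if no consensus was reached.
    return Counter(answers).most_common(1)[0][0]


def make_fixed(answer: str) -> Model:
    """Hypothetical stub: a model that never changes its answer."""
    return lambda question, peer_answers: answer


def make_follower(initial: str) -> Model:
    """Hypothetical stub: a model that adopts the most common peer answer."""
    def model(question: str, peer_answers: Sequence[str]) -> str:
        if not peer_answers:
            return initial
        return Counter(peer_answers).most_common(1)[0][0]
    return model


final = exchange_of_thought(
    "6 + 6 = ?", [make_fixed("12"), make_fixed("12"), make_follower("13")]
)
print(final)
```

In a real setting each stub would be replaced by a call to a different LLM, and every extra round adds inference cost, which is presumably why a cost analysis matters for this kind of framework.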

TaskWeaver

TaskWeaver: A Code-First Agent Framework [51.8]
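TaskWeaver is a code-first framework for building LLM-powered autonomous agents. It converts user requests into executable code and treats user-defined plugins as callable functions. It provides support for rich data structures, flexible plugin usage, and dynamic plugin selection.
Reference translation of the paper (metadata) (Fri, 1 Dec 2023 07:42:56 GMT)
A proposal for a framework that behaves like ChatGPT with Advanced Data Analysis. The video in the repository makes it easy to understand.
Repository: GitHub – microsoft/TaskWeaver: A code-first agent framework for seamlessly planning and executing data analytics tasks.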
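
The "plugins as callable functions" idea can be sketched in plain Python as follows. This does not use the TaskWeaver API: the `plugin` decorator, the `PLUGINS` registry, `run_generated_code`, and the sample plugins are hypothetical names, and the `GENERATED` string stands in for code that an LLM would produce from a user request.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of user-defined plugins, keyed by function name.
PLUGINS: Dict[str, Callable[..., Any]] = {}


def plugin(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so that generated code can call it by name."""
    PLUGINS[func.__name__] = func
    return func


@plugin
def load_sales() -> List[Dict[str, Any]]:
    # Placeholder rows for illustration; a real plugin might query a database.
    return [
        {"region": "east", "amount": 120.0},
        {"region": "west", "amount": 80.0},
        {"region": "east", "amount": 30.0},
    ]


@plugin
def total_by_region(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def run_generated_code(code: str) -> Dict[str, Any]:
    """Execute generated code in a namespace where plugins are ordinary functions."""
    namespace: Dict[str, Any] = dict(PLUGINS)
    exec(code, namespace)  # a real system would sandbox this step
    return namespace


# Stand-in for LLM output: the user request has become executable code.
GENERATED = """
rows = load_sales()
result = total_by_region(rows)
"""

ns = run_generated_code(GENERATED)
print(ns["result"])
```

Because the intermediate values (`rows`, `result`) stay in the namespace as native Python objects, later steps can reuse them directly instead of passing everything around as text.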

Creative Leap-of-Thought

Let’s Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation [100.9]
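Chain-of-Thought (CoT) guides large language models (LLMs) to reason step by step and can bring out their logical reasoning ability. This paper explores the Leap-of-Thought (LoT) abilities of LLMs. LoT is a non-sequential, creative paradigm involving strong associations and knowledge leaps.
Reference translation of the paper (metadata) (Wed, 6 Dec 2023 03:20:29 GMT)
To address the problem that 「While effective for logical tasks, CoT is not conducive to creative problem-solving which often requires out-of-box thoughts and is crucial for innovation advancements.」, the authors use Oogiri (a Japanese improvised-comedy game) data for instruction tuning, among other things, which makes for a very interesting paper. The effect is reportedly confirmed through human evaluation.
Repository: GitHub – sail-sg/CLoT: Official Codebase of our Paper: “Let’s Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation”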

A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends [30.8]
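General-purpose large language models (LLMs) have demonstrated significant potential in software engineering tasks such as code generation. A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning. At present, a systematic survey of Code LLMs and their performance is lacking.
Reference translation of the paper (metadata) (Fri, 17 Nov 2023 07:55:16 GMT)
A survey on code generation with LLMs.
It shows just how many models there are, and comparison tables such as Table 4, "Performance of LLMs in HumanEval Benchmark", are very useful.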

Ego-Exo4D

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives [194.5]
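The authors present Ego-Exo4D, a diverse, large-scale multimodal video dataset and benchmark challenge. Ego-Exo4D centers on simultaneously captured egocentric and exocentric video of skilled human activities. More than 800 participants from 13 cities worldwide performed these activities in 131 different natural scene contexts.
Reference translation of the paper (metadata) (Thu, 30 Nov 2023 05:21:07 GMT)
A proposal for a foundational dataset for multimodal perception; at over 1,400 hours, it is large in scale.
Project site: Ego-Exo4D (ego-exo4d-data.org)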

Competition-Level Problems are Effective LLM Evaluators

Competition-Level Problems are Effective LLM Evaluators [124.8]
This paper aims to evaluate the reasoning capabilities of large language models (LLMs) on recent programming problems from Codeforces. The authors first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance, considering aspects such as the release time of the problems, their difficulty, and the types of errors encountered. Surprisingly, the perceived performance of GPT-4 drops off a cliff on problems released after September 2021, consistently across all difficulties and problem types.
Reference translation of the paper (metadata) (Tue, 5 Dec 2023 03:44:19 GMT)
The paper uses Codeforces problems to examine the data contamination problem in LLMs. The result, 「We find a significant decrease in perceived performance of GPT-4 on unseen problems, consistent across a range of difficulties, problem types, and experimental settings.」, is quite striking.
Other evaluations have raised similar concerns, and the Gemini technical report also notes: 「Evaluation on these benchmarks is challenging and may be affected by data contamination. We performed an extensive leaked data analysis after training to ensure the results we report here are as scientifically sound as possible, but still found some minor issues and decided not to report results on e.g. LAMBADA (Paperno et al., 2016).」 (gemini_1_report.pdf (storage.googleapis.com)). Evaluating models correctly is hard.
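
The kind of before/after comparison described above can be sketched as follows. This is an illustration only: the `Attempt` record, the helper names, and the cutoff day are assumptions, and the eight sample attempts are made-up placeholders rather than results from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Attempt:
    problem_id: str   # placeholder identifier, not a real Codeforces ID
    released: date    # when the problem was published
    solved: bool      # whether the model's submission was accepted


def split_by_cutoff(attempts: List[Attempt], cutoff: date) -> Tuple[List[Attempt], List[Attempt]]:
    """Separate problems released before the cutoff from those released on or after it."""
    before = [a for a in attempts if a.released < cutoff]
    after = [a for a in attempts if a.released >= cutoff]
    return before, after


def pass_rate(attempts: List[Attempt]) -> float:
    """Fraction of attempts that were solved; NaN when there are no attempts."""
    if not attempts:
        return float("nan")
    return sum(a.solved for a in attempts) / len(attempts)


# Assumed cutoff: "after September 2021" is read as "released on or after 2021-10-01".
CUTOFF = date(2021, 10, 1)

# Made-up placeholder data for illustration only.
attempts = [
    Attempt("old-1", date(2020, 5, 1), True),
    Attempt("old-2", date(2020, 11, 12), True),
    Attempt("old-3", date(2021, 3, 3), False),
    Attempt("old-4", date(2021, 8, 20), True),
    Attempt("new-1", date(2021, 12, 1), False),
    Attempt("new-2", date(2022, 4, 9), True),
    Attempt("new-3", date(2022, 9, 30), False),
    Attempt("new-4", date(2023, 2, 14), False),
]

before, after = split_by_cutoff(attempts, CUTOFF)
print(f"pass rate before cutoff: {pass_rate(before):.2f}")
print(f"pass rate after cutoff:  {pass_rate(after):.2f}")
```

A large gap between the two rates at comparable difficulty is the sort of signal that suggests older problems may have leaked into the training data. Controlling for difficulty and problem type, as the paper reports doing, is what makes the comparison meaningful.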

Chain of Code

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator [119.0]
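Language models (LMs) can leverage code writing to improve chain-of-thought reasoning. The authors propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning.
Reference translation of the paper (metadata) (Thu, 7 Dec 2023 17:51:43 GMT)
Having the LLM think through code reportedly improves performance ("Chain of Code achieves 84%, a gain of 12% over Chain of Thought"). Unlike approaches such as PAL that route reasoning through a programming language, the distinguishing feature is that code that can be executed is run by an interpreter, while code that cannot be executed is treated as pseudocode and interpreted by an LMulator (a portmanteau of LM and emulator).
Repository: Chain of Code (google.com)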
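
The interpreter-first, LMulator-fallback behavior can be sketched as below. This is a toy under stated assumptions, not the authors' implementation: `run_chain_of_code` and `stub_lmulator` are hypothetical names, the program is restricted to single-line assignments, and the "LMulator" is a hand-written stub where a real system would query a language model.

```python
from typing import Any, Callable, Dict, List

# An LMulator maps (failing line, current program state) to the value it "simulates".
LMulator = Callable[[str, Dict[str, Any]], Any]


def run_chain_of_code(lines: List[str], lmulator: LMulator) -> Dict[str, Any]:
    """Run each line with the Python interpreter; defer to the LMulator on failure."""
    state: Dict[str, Any] = {}
    for line in lines:
        try:
            exec(line, state)  # executable code: let the real interpreter run it
        except Exception:
            if "=" not in line:
                raise  # this toy only simulates simple "name = expression" lines
            # Non-executable pseudocode: ask the LMulator for the assigned value.
            target = line.split("=", 1)[0].strip()
            state[target] = lmulator(line, state)
    return state


def stub_lmulator(line: str, state: Dict[str, Any]) -> Any:
    """Hypothetical stand-in for an LM that simulates detect_sentiment(text)."""
    return "positive" if "love" in state["text"] else "negative"


program = [
    'text = "I love this movie"',
    "sentiment = detect_sentiment(text)",  # undefined function: semantic pseudocode
    'answer = 1 if sentiment == "positive" else 0',
]

state = run_chain_of_code(program, stub_lmulator)
print(state["sentiment"], state["answer"])
```

The first and third lines run in the interpreter, while the second raises `NameError` and is handed to the stub. That split is the point of the design: exact computation stays exact, and only the parts with no executable definition are delegated to the model.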
