October 16, 2023 – Introducing the latest arXiv papers

Mistral 7B [62.2]
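Mistral 7B outperforms Llama 2 13B on all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. A model fine-tuned to follow instructions, Mistral 7B Instruct, is also provided; it surpasses the Llama 2 13B Chat model on both human and automated benchmarks.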
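Reference translation of the paper (metadata) (Tue, 10 Oct 2023 17:54:58 GMT)
An LLM rumored to deliver high performance with a small parameter count; it is an open model released under the Apache 2.0 license.
Besides the blog (Mistral 7B | Mistral AI | Open source models) and the Hugging Face page (mistralai (Mistral AI_) (huggingface.co)), the documentation (Deploy with SkyPilot | Mistral AI Large Language Models) is also thorough.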

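The relationship between LLMs and factuality is attracting great interest from the standpoint of real-world deployment. Surveys and evaluation frameworks have been appearing in quick succession.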

Factuality Challenges in the Era of Large Language Models [113.3]
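Large language models (LLMs) tend to generate false, erroneous, or misleading content. LLMs can also be exploited for malicious applications. This poses a significant challenge to society because of the potential to deceive users.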
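Reference translation of the paper (metadata) (Tue, 10 Oct 2023 03:34:46 GMT)
A survey and set of recommendations focused on real-world deployment. With the statement "Given the rapid and widespread growth in the use of LLMs, society must act quickly with appropriate regulation, education, and collaboration.", it leans toward regulation.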

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators [78.6]
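Large language models (LLMs) outperform information retrieval techniques on downstream knowledge-intensive tasks. However, concerns abound in the community regarding the factuality and potential implications of using this uncensored knowledge. This work introduces CONNER, a framework designed to evaluate generated knowledge from six important perspectives.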
Reference translation of the paper (metadata) (Wed, 11 Oct 2023 08:22:37 GMT)
Proposes an evaluation framework covering Factuality, Relevance, Coherence, Informativeness, Helpfulness, and Validity.
The repository is GitHub – ChanLiang/CONNER: The implementation for EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.5]
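This survey addresses the crucial issue of factuality in large language models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.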
Reference translation of the paper (metadata) (Wed, 11 Oct 2023 14:18:03 GMT)
A survey that also covers retrieval-augmented LLMs.
The repository is GitHub – wangcunxiang/LLM-Factuality-Survey: The repository for the survey paper <<Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity>>