January 10, 2025 – Introducing the latest arXiv papers

M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs [66.8]
We introduce M$^3$oralBench, the first MultiModal Moral Benchmark for LVLMs. M$^3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and uses the text-to-image diffusion model SD3.0 to create corresponding scenario images. It conducts moral evaluation across the six moral foundations of Moral Foundations Theory (MFT) and includes tasks in moral judgement, moral classification, and moral response.
Reference translation of the paper (metadata) (Mon, 30 Dec 2024 05:18:55 GMT)
A multimodal moral benchmark built on six moral foundations: "Care/Harm (dislike for suffering of others), Fairness/Cheating (proportional fairness), Loyalty/Betrayal (group loyalty), Authority/Subversion (respect for authority and tradition), Sanctity/Degradation (concerns for purity and contamination), Liberty/Oppression (concerns on oppression and coercion)".
The repository is GitHub – BeiiiY/M3oralBench: The official Github page for “M³oralBench: A MultiModal Moral Benchmark for LVLMs”
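As a rough illustration of what evaluating "across six moral foundations" could look like in practice, the sketch below groups per-item results by foundation and reports an accuracy for each. This is not the benchmark's official scoring code: the `(foundation, correct)` record format and the function name `accuracy_by_foundation` are assumptions made here for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# The six foundations named in the summary above.
FOUNDATIONS = (
    "Care/Harm",
    "Fairness/Cheating",
    "Loyalty/Betrayal",
    "Authority/Subversion",
    "Sanctity/Degradation",
    "Liberty/Oppression",
)


def accuracy_by_foundation(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate hypothetical (foundation, correct) records into per-foundation accuracy."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for foundation, correct in records:
        if foundation not in FOUNDATIONS:
            raise ValueError(f"unknown foundation: {foundation}")
        totals[foundation] += 1
        hits[foundation] += int(correct)
    # Foundations with no records are left out rather than reported as 0.
    return {f: hits[f] / totals[f] for f in FOUNDATIONS if totals[f]}


if __name__ == "__main__":
    # Made-up example records, only to show the call shape.
    demo = [("Care/Harm", True), ("Care/Harm", False), ("Liberty/Oppression", True)]
    print(accuracy_by_foundation(demo))
```

Reporting one number per foundation, rather than a single overall score, keeps a model's weakness on one foundation from being hidden by strength on the others.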

How Panel Layouts Define Manga: Insights from Visual Ablation Experiments [24.4]
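This paper aims to analyze the visual characteristics of manga works, focusing in particular on panel layout. As the research method, a deep learning model was trained to predict the manga title from page images. Specifically, an ablation study restricted the page image information to the panel frames in order to analyze the characteristics of panel layouts.
Reference translation of the paper (metadata) (Thu, 26 Dec 2024 09:53:37 GMT)
An analysis of the characteristics of manga panel layouts.
"This study used deep learning to explore whether panel page designs in manga vary by work. Our experiments showed that even without characters and text, panel layouts exhibit inherent uniqueness, serving as a key distinguishing feature for manga. This was validated through classification tasks and supported by Grad-CAM visualizations." is about what I would expect. Whether deep learning was really necessary here is somewhat unclear, though...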
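To make the ablation idea concrete, here is a minimal sketch of reducing a page to its panel frames: everything except the panel borders is blanked out, so a classifier sees only the layout. It assumes rectangular panels whose bounding boxes are already known. The box format, the function name `frames_only_page`, and the plain nested-list "image" are illustrative assumptions, not the authors' pipeline.

```python
from typing import List, Tuple

# (left, top, right, bottom) in inclusive pixel coordinates; a hypothetical format.
Box = Tuple[int, int, int, int]


def frames_only_page(
    width: int, height: int, panels: List[Box], thickness: int = 1
) -> List[List[int]]:
    """Return a height x width grid with 1 on panel borders and 0 elsewhere."""
    page = [[0] * width for _ in range(height)]
    for left, top, right, bottom in panels:
        # Clamp the loops to the page so out-of-range boxes cannot raise IndexError.
        for y in range(max(top, 0), min(bottom, height - 1) + 1):
            for x in range(max(left, 0), min(right, width - 1) + 1):
                on_border = (
                    x - left < thickness
                    or right - x < thickness
                    or y - top < thickness
                    or bottom - y < thickness
                )
                if on_border:
                    page[y][x] = 1
    return page


if __name__ == "__main__":
    # Two side-by-side panels on a tiny 6x4 "page".
    for row in frames_only_page(6, 4, [(0, 0, 2, 3), (3, 0, 5, 3)]):
        print("".join("#" if v else "." for v in row))
```

A grid like this contains no characters or text, so any title-prediction accuracy a model achieves on it can only come from the layout itself.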