DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning

DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning [39.1]
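Document image segmentation is crucial for document analysis and recognition. Existing methods handle these tasks separately, which limits generalization and wastes resources. This paper introduces DocSAM, a transformer-based unified framework designed for a variety of document image segmentation tasks.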
Paper / reference translation (metadata) (Sat, 05 Apr 2025 07:14:53 GMT)
A proposed method for document image segmentation, which remains important even now that MLLMs dominate the field: "DocSAM integrates layout analysis, multi-grained text segmentation, and table structure decomposition into a single model, reducing the need for specialized models and enhancing efficiency."
The repository is at GitHub – xhli-git/DocSAM.
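The summary says a single model covers layout analysis, text segmentation, and table structure decomposition, even though each task's dataset has its own label set. One common way to let one mask-classification model serve heterogeneous label spaces is to score each predicted instance query against embeddings of the class names, so the label set becomes an input rather than a fixed-size output layer. The sketch below illustrates only that general idea under stated assumptions: it is not code from the DocSAM repository, and `cosine`, `classify_queries`, the class names, and the toy vectors are all hypothetical (a real system would get class embeddings from a text encoder and query features from the segmentation decoder).

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify_queries(
    queries: List[Sequence[float]],
    class_embeddings: Dict[str, Sequence[float]],
) -> List[str]:
    """Assign each instance query the class whose name embedding is most similar.

    The label set is passed in at call time, so the same query features can be
    scored against different datasets' label spaces without retraining a head.
    """
    return [
        max(class_embeddings, key=lambda name: cosine(q, class_embeddings[name]))
        for q in queries
    ]


# Hypothetical toy data: three instance-query features from one document image.
queries = [[1.0, 0.1, 0.0], [0.0, 0.9, 0.2], [0.1, 0.0, 1.0]]

# Hypothetical class-name embeddings for two datasets with different label sets.
layout_classes = {"paragraph": [1.0, 0.0, 0.0], "title": [0.0, 1.0, 0.0], "figure": [0.0, 0.0, 1.0]}
table_classes = {"cell": [0.9, 0.1, 0.0], "row": [0.0, 0.0, 1.0]}

print(classify_queries(queries, layout_classes))  # ['paragraph', 'title', 'figure']
print(classify_queries(queries, table_classes))   # ['cell', 'row', 'row']
```

The design point is that switching tasks only swaps the dictionary of class embeddings; the query features and the scoring function stay the same.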
