MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing [117.6]
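MinerU2.5 is a document parsing model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. The proposed method adopts a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition.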
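The paper's own code is not reproduced here. The sketch below only illustrates the coarse-to-fine idea under stated assumptions. The page is shrunk to a thumbnail whose longer side is at most `max_side` (an illustrative parameter, not MinerU's actual setting), and a layout model finds blocks on that thumbnail. Each block's box is then mapped back to native resolution so the recognizer sees a full-resolution crop. `detect_layout` and `recognize` are hypothetical stand-ins for the VLM calls. Images are represented only by their dimensions to keep the example dependency-free.

```python
import math
from typing import Callable, List, Tuple

# (x0, y0, x1, y1) pixel coordinates; x1/y1 are exclusive
Box = Tuple[int, int, int, int]


def thumbnail_size(width: int, height: int, max_side: int) -> Tuple[int, int, float]:
    """Shrink the page so its longer side is at most max_side, keeping aspect ratio.

    Returns (thumb_width, thumb_height, scale) where scale = thumbnail / native.
    Pages already smaller than max_side are not upscaled.
    """
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def to_native(box: Box, scale: float, width: int, height: int) -> Box:
    """Map a box predicted on the thumbnail back to native-resolution pixels.

    Start coordinates are floored and end coordinates ceiled so the crop never
    cuts off part of the region; the result is clamped to the page bounds.
    """
    x0, y0, x1, y1 = box
    return (
        max(0, math.floor(x0 / scale)),
        max(0, math.floor(y0 / scale)),
        min(width, math.ceil(x1 / scale)),
        min(height, math.ceil(y1 / scale)),
    )


# Hypothetical model interfaces (NOT MinerU's API):
#   detect_layout(thumb_w, thumb_h) -> [(category, thumbnail_box), ...]
#   recognize(category, native_box) -> recognized content for that crop
LayoutFn = Callable[[int, int], List[Tuple[str, Box]]]
RecognizeFn = Callable[[str, Box], str]


def parse_page(width: int, height: int, max_side: int,
               detect_layout: LayoutFn,
               recognize: RecognizeFn) -> List[Tuple[str, Box, str]]:
    tw, th, scale = thumbnail_size(width, height, max_side)
    # Stage 1 (coarse): global layout analysis on the small, downsampled thumbnail.
    layout = detect_layout(tw, th)
    results = []
    for category, thumb_box in layout:
        native_box = to_native(thumb_box, scale, width, height)
        # Stage 2 (fine): content recognition on the native-resolution crop only.
        results.append((category, native_box, recognize(category, native_box)))
    return results


if __name__ == "__main__":
    # Pretend the layout model found a title and a table on a 500x1000 thumbnail.
    def fake_layout(tw: int, th: int) -> List[Tuple[str, Box]]:
        return [("title", (50, 20, 450, 60)), ("table", (50, 100, 450, 300))]

    def fake_recognize(category: str, box: Box) -> str:
        return f"<{category} at {box}>"

    for block in parse_page(2000, 4000, 1000, fake_layout, fake_recognize):
        print(block)
```

For a 2000x4000 page with `max_side=1000`, the scale is 0.25, so the table box `(50, 100, 450, 300)` on the thumbnail becomes `(200, 400, 1800, 1200)` on the original page. The point of the split is that the whole-page pass stays small and cheap, while full-resolution pixels are only processed inside the detected regions.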
Reference translation of the paper (metadata) (Mon, 29 Sep 2025 16:41:28 GMT)
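This is the latest version of the model introduced in MinerU: An Open-Source Solution for Precise Document Content Extraction – arXiv最新論文の紹介: a powerful 1.2B-parameter VLM. It outperforms general-purpose models, commercial APIs, and specialized models.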
The repository is GitHub – opendatalab/MinerU: Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows. A demo is also available at MinerU – a Hugging Face Space by opendatalab. It is fast and performs well.
