Towards Best Practices for Open Datasets for LLM Training

Towards Best Practices for Open Datasets for LLM Training [21.4]
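Many AI companies train their large language models (LLMs) on data without the permission of the copyright owners. Creative producers have brought several high-profile copyright lawsuits. This trend of limiting data information causes harm by hindering transparency, accountability, and innovation.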
Reference translation of the paper (metadata) (Tue, 14 Jan 2025 17:18:05 GMT)
A summary of best practices for selecting datasets used for training and related purposes. The paper does state: "The permissibility of doing so varies by jurisdiction: in countries like the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous." Even so, the content is very important for Japan as well.
