AudioDER: A Deduplication-Enhanced Reasoning Dataset for Post-Training Large Audio-Language Models

Audio AI models have become good at recognizing what they hear, but complex reasoning, such as understanding causation, context, and implication across sound, speech, and music, remains a frontier challenge. A key bottleneck is training data: existing datasets are highly redundant, so models see many acoustically similar samples that provide overlapping rather than additive learning signal. AudioDER addresses this with a two-stage pipeline: it first deduplicates audio by acoustic similarity, then uses a large language model to generate chain-of-thought reasoning annotations for the remaining samples. The resulting 191,000-sample dataset, intended for post-training large audio-language models, consistently improves reasoning performance across multiple benchmarks. Applications include voice assistants that reason about complex audio scenes, medical audio analysis, accessibility tools, and any system requiring nuanced understanding of audio in context.

Authors: Hui Geng, Yi Su, Han Yin, Tianjiao Wan, Qisheng Xu, Jiaxin Chen, Zijian Gao, Hengzhu Liu, Xie Chen, Kele Xu
Paper: https://arxiv.org/abs/2606.14591v1
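
To make the deduplication stage more concrete, here is a minimal sketch of one common way to filter near-duplicates by acoustic similarity. The description does not say how AudioDER measures similarity, so this assumes each clip has already been mapped to a fixed-length embedding by some audio encoder, and uses greedy cosine-similarity filtering with a hypothetical threshold of 0.95. The function names `cosine_similarity` and `greedy_dedup` are illustrative only and are not the authors' method or code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_dedup(embeddings: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return the indices of clips kept after greedy near-duplicate removal.

    Assumption: each entry is an audio embedding from an unspecified encoder.
    A clip is kept only if its similarity to every previously kept clip is
    below `threshold` (a hypothetical value, not taken from the paper).
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: clip 1 is nearly identical to clip 0, clip 2 is distinct.
toy = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(greedy_dedup(toy))  # prints [0, 2]
```

This greedy pass compares each clip against every kept clip, so it is quadratic in the worst case; at the scale of a dataset like this one, an approximate nearest-neighbor index would be the more practical choice for the similarity search.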