Abstract:
We present an importance-sampling-based approach to the active learning problem of selecting additional training data to supplement a seed model. Our proposed Δ-AUC selection optimizes AUC improvement in keyword search and is evaluated on the Spanish Fisher corpus. We show that, across different training data sizes, Δ-AUC selection consistently outperforms random sampling by 1.05% to 2.69% absolute AUC and requires no more than 60% of the transcriptions needed by random sampling to achieve the same AUC. On terms not seen in the seed model's training data, the proposed algorithm achieves 3.47% higher AUC and a 4.66% reduction in word error rate. We also introduce a regression analysis model that can refine our Δ-AUC strategy in the future.
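The abstract does not spell out how Δ-AUC scores are estimated or how the sampling is carried out. As a generic illustration only, the sketch below shows one way to draw a batch of untranscribed utterances with probability proportional to a per-utterance utility score. The scores passed to `select_batch` are hypothetical stand-ins for estimated Δ-AUC values, and the sampler uses the Efraimidis-Spirakis key trick for weighted sampling without replacement, a swapped-in standard technique rather than the authors' stated method.

```python
import math
import random


def select_batch(scores, k, seed=0):
    """Return k distinct candidate indices, sampled without replacement
    with probability proportional to each (hypothetical) utility score.

    Efraimidis-Spirakis: give item i the key u_i ** (1 / w_i) with
    u_i ~ Uniform(0, 1] and keep the k largest keys. We compare
    log(u_i) / w_i instead, which preserves the ordering and avoids
    underflow when weights are small.
    """
    rng = random.Random(seed)
    keyed = []
    for idx, w in enumerate(scores):
        if w <= 0:
            continue  # candidates with no estimated utility are never picked
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        keyed.append((math.log(u) / w, idx))
    if k > len(keyed):
        raise ValueError("k exceeds the number of positive-score candidates")
    keyed.sort(reverse=True)  # largest keys first
    return sorted(idx for _, idx in keyed[:k])


if __name__ == "__main__":
    # Hypothetical estimated utility scores for five untranscribed utterances.
    utility = [0.0, 0.5, 2.0, 1.0, 0.1]
    print(select_batch(utility, 2, seed=1))
```

Sampling, rather than always taking the top-k scores, keeps some diversity in each selected batch; with a fixed `seed` the selection is reproducible.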
Published in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 20-25 March 2016
Date Added to IEEE Xplore: 19 May 2016
Electronic ISSN: 2379-190X