This study aims to explore the case of robust speaker recognition with multi-session enrollments and noise, with an emphasis on optimal organization and utilization of speaker information presented in the enrollment and development data. First, we investigate more robust back-ends to address noisy multi-session enrollment data for speaker recognition. This task is achieved by proposing novel back-end algorithms. Second, we construct a highly discriminative speaker verification framework. This task is achieved through intrinsic and extrinsic back-end algorithm modification, resulting in complementary sub-systems. Evaluation of the proposed framework is performed on the NIST SRE2012 corpus. The results not only confirm that the individual sub-systems improve on an established baseline; the final fused system also delivers a comprehensive overall improvement on the NIST SRE2012 core tasks.

Speech activity detection (SAD) is a front-end in most speech systems, e.g., speaker verification and speech recognition. Supervised SAD typically leverages machine learning models trained on annotated data. For applications like zero-resource speech processing and the NIST-OpenSAT-2017 public safety communications task, it might not be feasible to collect SAD annotations. SAD is challenging for naturalistic audio streams containing multiple noise sources simultaneously. We propose novel frequency-dependent kernel (FDK) based SAD features. FDK provides enhanced spectral decomposition from which several statistical descriptors are derived. The FDK statistical descriptors are combined by principal component analysis into one-dimensional FDK-SAD features. We further propose two decision backends: first, a variable model-size Gaussian mixture model (VMGMM), and second, Hartigan dip-based robust feature clustering (DipSAD). While VMGMM is a model-based approach, DipSAD is nonparametric. We used both backends for comparative evaluations in two phases: first, standalone SAD performance, and second, the effect of SAD on text-dependent speaker verification using RedDots data. The NIST-OpenSAD-2015 and NIST-OpenSAT-2017 corpora are used for standalone SAD evaluations. We establish two Center for Robust Speech Systems (CRSS) corpora, namely CRSS-PLTL-II and the CRSS long-duration naturalistic noise corpus. The CRSS corpora facilitate standalone SAD evaluations on naturalistic audio streams. We performed comparative studies of the proposed approaches with multiple baselines, including SohnSAD, rSAD, a semisupervised Gaussian mixture model, and Gammatone spectrogram features.

Index Terms: Clustering, DARPA RATS, frequency-dependent kernel, Hartigan dip test, peer-led team learning, speech activity detection, NIST OpenSAD, NIST OpenSAT.
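The SAD abstract states that the FDK statistical descriptors are combined by principal component analysis into a one-dimensional feature. The following is a minimal sketch of that kind of projection, not the authors' implementation. It assumes each audio frame already has a small vector of descriptors. The descriptor values, the function names `first_principal_component` and `project_to_1d`, and the use of power iteration are all illustrative assumptions.

```python
import math
import random


def first_principal_component(rows, iters=200, seed=0):
    """Return (mean, unit direction) of the top principal component.

    rows: list of equal-length lists, one descriptor vector per frame.
    Uses power iteration on the sample covariance matrix (an assumption
    made for this sketch; any eigen-solver would serve).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Seeded random start so the result is reproducible.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant data: no direction of variance
            break
        v = [x / norm for x in w]
    return mean, v


def project_to_1d(rows, mean, direction):
    """Project each centered descriptor vector onto the given direction."""
    d = len(mean)
    return [sum((r[j] - mean[j]) * direction[j] for j in range(d)) for r in rows]


# Hypothetical per-frame descriptors (made-up numbers, three per frame).
frames = [
    [0.10, 0.90, 0.20],
    [0.20, 1.10, 0.10],
    [0.80, 2.90, 0.30],
    [0.90, 3.10, 0.20],
]
mean, direction = first_principal_component(frames)
features = project_to_1d(frames, mean, direction)  # one scalar per frame
```

The resulting scalar per frame is the sort of one-dimensional feature that a decision backend (a mixture model or a clustering step) could then threshold or cluster. The sign of a principal direction is arbitrary, so a real system would need a convention for which side corresponds to speech.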