Existing singer identification (SID) methods follow the framework of speaker identification (SPID), which requires that singing data be collected beforehand to establish each singer's voice characteristics. This framework, however, is unsuitable for many SID applications, because acquiring solo a cappella recordings from each singer is usually far less feasible than collecting spoken data in SPID applications. Since a cappella data are difficult to acquire, many studies have tried to improve SID accuracy when only accompanied singing data are available for training, but the improvements are not always satisfactory. Recognizing that spoken data are usually easy to obtain, this work investigates the possibility of characterizing singers' voices using their spoken data instead of their singing data. Unfortunately, our experiments found it difficult to fully replace singing data with spoken data for singer voice characterization, owing to the significant difference between the singing and speaking voices of most people. We therefore propose two alternative solutions that require only a small amount of singing data. The first solution adapts a speech-derived model to cover singing voice characteristics. The second solution establishes the relationship between speech and singing via a transformation, so that an unknown test singing clip can be converted into its speech counterpart and then identified using speech-derived models; alternatively, training data can be converted from speech into singing to generate a singer model capable of matching test singing clips. Experiments conducted on a 20-singer database validate the proposed solutions.
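The first solution described above, adapting a speech-derived model toward the singing domain with only a few singing samples, can be illustrated with a minimal sketch. The abstract does not specify the adaptation technique, so the code below assumes a standard MAP mean adaptation of a diagonal-covariance Gaussian mixture model; the model parameters, the `tau` relevance factor, and the toy data are all hypothetical.

```python
# Hypothetical sketch: MAP-adapting the component means of a
# "speech-derived" GMM using a few "singing" frames. The specific
# adaptation scheme is an assumption, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def map_adapt_means(means, covs, weights, X, tau=10.0):
    """MAP-adapt the means of a diagonal-covariance GMM to data X.

    tau is the relevance factor: larger tau keeps the adapted means
    closer to the original speech-derived means.
    """
    # Per-frame log-likelihood of each diagonal-Gaussian component.
    log_prob = np.empty((X.shape[0], means.shape[0]))
    for k in range(means.shape[0]):
        diff = X - means[k]
        log_prob[:, k] = (np.log(weights[k])
                          - 0.5 * np.sum(np.log(2 * np.pi * covs[k]))
                          - 0.5 * np.sum(diff**2 / covs[k], axis=1))
    # Posterior responsibilities (softmax over components per frame).
    log_prob -= log_prob.max(axis=1, keepdims=True)
    resp = np.exp(log_prob)
    resp /= resp.sum(axis=1, keepdims=True)
    # Sufficient statistics and the MAP interpolation of the means.
    n_k = resp.sum(axis=0)                              # soft counts
    x_bar = (resp.T @ X) / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + tau))[:, None]                # adaptation weight
    return alpha * x_bar + (1 - alpha) * means

# Toy speech-derived model: 2 components in a 3-D feature space.
means = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]])
covs = np.ones_like(means)
weights = np.array([0.5, 0.5])

# A few singing frames, shifted away from the nearest speech mean.
X = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.3, size=(20, 3))

adapted = map_adapt_means(means, covs, weights, X, tau=5.0)
```

Because the singing frames are concentrated near the first component, its adapted mean moves toward them, while the second component, which receives almost no responsibility, stays close to its speech-derived value. The second solution (a speech-to-singing feature transformation) would replace this interpolation with a learned mapping between the two domains.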