As of: 2020-02-01

Hassam Sheikh

Overview of Speech Based Gender Identification


First edition. 2014. 76 pp. 220 mm
Publisher/Year: ANCHOR ACADEMIC PUBLISHING 2014
ISBN: 3-9548922-8-6 (3954892286)
New ISBN: 978-3-9548922-8-0 (9783954892280)



This book covers the basics of natural language processing and machine learning needed to build a standard speech-based gender identification system. It briefly explains the signal processing techniques required to understand those basics, including the various types of Fourier transform, basic speech enhancement techniques, voice activity detection, and pitch estimation using the subharmonic-to-harmonic ratio. The machine learning part explains the relevant models, such as Support Vector Machines, Gaussian Mixture Models and adaptive boosting. Finally, the results of different gender identification systems implemented with state-of-the-art techniques are presented and analysed.
Text sample:
Chapter 6, System Design and Implementation:
The previous chapters described the different elements of a gender identification
system, from speech enhancement methods to feature extraction and identification
models. This chapter explains how those methods and features are combined to build
a gender identification system.
6.1, Toolboxes:
The implementation was done in MATLAB, a high-level language commonly used for
mathematically complex tasks. The two main groups of MATLAB toolboxes used to
develop the system are described below.
6.1.1, Signal Processing Toolbox:
Building a gender identification system is mostly a signal processing task. The
Voicebox toolbox by [Bro11] is a MATLAB toolbox that was used to manipulate audio
files, implement noise reduction and extract acoustic features such as MFCC. To
extract SDC features from the MFCCs, code provided by [Sah12] was used. Finally,
to extract pitch, the code provided by [Sun02] was used.
6.1.2, Machine Learning Toolbox:
Building a gender identification system also involves a great deal of machine
learning. A number of MATLAB toolboxes are available online; the LIBSVM toolbox
by [CL11] and the NETLAB 3.3 toolbox by [NB04] were used to train the SVM model
and the Gaussian Mixture Models respectively.
6.2, System Design:
This section describes the components used to design the gender identification
system. Because this was a research project, many basic principles of software
development and software engineering were ignored.
6.2.1, Requirement:
The purpose of this research was to build a system that can identify the gender
of a speaker regardless of age, language, accent and dialect in real-world
environments, where silence and additive environmental sounds such as background
noise and music make it hard to achieve 100% accuracy. The system should be
robust enough to work for a variety of different speakers. The first choice was
to implement a system in which gender can be recognized while the speaker is
speaking into the microphone.
6.2.2, Initial Approach:
The fundamental difference between male and female speech is pitch. The first
approach to building a gender identification system was therefore a pitch-based
model that identifies the gender from the pitch of the speech. To extract the
pitch from the speech, code provided by [Sun02] was used. The pitch of every
25-millisecond frame was calculated, and the mean of the pitches of all frames
was used as the pitch of the speech input. The pitch of both genders lies
between 100 and 300 Hz, so all frequencies below 100 Hz and above 300 Hz were
ignored. For classification of the gender, an SVM was trained using the LIBSVM
toolbox by [CL11].
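The averaging step described above can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB code used in the book: the function name mean_pitch and the sample values are hypothetical, and the per-frame pitch estimates are assumed to have been produced already by some pitch extractor.

```python
from statistics import mean


def mean_pitch(frame_pitches_hz, low_hz=100.0, high_hz=300.0):
    """Average per-frame pitch estimates, ignoring values outside the range.

    frame_pitches_hz: one pitch estimate (in Hz) per 25 ms frame.
    Returns None when no frame falls inside [low_hz, high_hz].
    """
    kept = [p for p in frame_pitches_hz if low_hz <= p <= high_hz]
    if not kept:
        return None
    return mean(kept)


# Hypothetical per-frame estimates; 0.0, 95.0 and 450.0 are discarded.
frame_pitches = [0.0, 95.0, 110.0, 120.0, 130.0, 450.0]
print(mean_pitch(frame_pitches))  # 120.0
```

The single value returned per recording is what would be passed to the classifier as a one-dimensional feature.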
The algorithm used in this system is as follows. First, the audio files were
loaded into the MATLAB environment. Then speech enhancement techniques, namely
spectral subtraction and voice activity detection, were applied to remove noise
and silence respectively, using the functions specsub and vadsohn from Voicebox.
After that, pitch extraction was applied to the enhanced audio files. Finally,
the mean of the pitch extracted for each 25-millisecond frame was taken as the
pitch of each audio file. Each file served as one data point in the dataset.
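To make the 25-millisecond framing concrete, the following Python sketch shows how a sampled signal could be cut into frames before per-frame pitch extraction. It is an illustration under stated assumptions, not the book's implementation: the function names are hypothetical, the frames are assumed not to overlap, and a trailing partial frame is assumed to be dropped (the text specifies neither).

```python
def frame_length_samples(sample_rate_hz, frame_ms=25):
    """Number of samples in one frame of frame_ms milliseconds."""
    return round(sample_rate_hz * frame_ms / 1000)


def split_into_frames(samples, sample_rate_hz, frame_ms=25):
    """Split a signal into non-overlapping frames (assumption: no overlap).

    A trailing partial frame is dropped.
    """
    n = frame_length_samples(sample_rate_hz, frame_ms)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


# A 25 ms frame holds 400 samples at 16 kHz and 1200 samples at 48 kHz.
print(frame_length_samples(16000))  # 400
print(frame_length_samples(48000))  # 1200

# 1000 samples at 16 kHz give two full frames; the last 200 samples are dropped.
frames = split_into_frames(list(range(1000)), 16000)
print(len(frames))  # 2
```

Because the frame length depends on the sampling rate, recordings at different rates yield frames with different sample counts but the same duration.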
The training set consisted mostly of LibriVox recordings, one hour for each
gender, at sampling frequencies ranging from 16 kHz to 48 kHz, taken from the
VoxForge open-source database. Testing data was taken from the same database.
At first, only English-language speech utterances were used.
Initially a decision stump was used for this classification, but analysis of the
data showed that, although the data was one-dimensional, it was not linearly
separable, so a nonlinear SVM model using