Speech and facial emotion recognition have many applications, such as assessing customer satisfaction with the quality of service in a call center, detecting and assessing the emotional state of children in care, and enabling robots to recognize human emotion. These systems face several challenges, including recording a realistic dataset in a natural environment without any filtering device to enhance signal quality, the ambiguity in the list and definition of emotions, the lack of agreement on a manageable set of uncorrelated speech-based emotion-relevant features, and the difficulty of collecting emotion-related datasets under natural circumstances.
In this thesis, to cope with these challenges, a system for identifying human emotion from speech and the face using a Support Vector Machine (SVM) is proposed to improve detection performance across multiple emotions. Facial emotion was detected using only the lower half of the face, after extracting the important features with the histogram of oriented gradients (HOG) algorithm. The facial results reached a high accuracy of 91%, which compares favorably with systems that use the entire face and apply many algorithms to discover features in order to distinguish emotion.
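The facial pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the face images, labels, cell size, and the simplified HOG (per-cell orientation histograms without block normalization) are all assumptions for demonstration; a library such as scikit-image provides a full HOG.

```python
import numpy as np
from sklearn.svm import SVC

def lower_half(face):
    """Keep only the lower half of the face image, as in the proposed system."""
    return face[face.shape[0] // 2:, :]

def hog_features(img, cell=8, bins=9):
    """Simplified HOG: a histogram of gradient orientations per cell,
    weighted by gradient magnitude and L2-normalized."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical gradient
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))
    return np.concatenate(feats)

# Hypothetical toy data: random 64x64 "faces" with two synthetic labels.
rng = np.random.default_rng(0)
faces = np.stack([rng.random((64, 64)) for _ in range(20)])
labels = np.array([0, 1] * 10)

feats = np.stack([hog_features(lower_half(f)) for f in faces])
clf = SVC(kernel="rbf").fit(feats, labels)
pred = clf.predict(feats[:2])
```

Restricting the descriptor to the lower half of the face reduces the feature dimension while keeping the mouth region, which carries much of the expression information.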
Emotion was also detected from speech using Mel-frequency cepstral coefficients (MFCC) and pitch; after the important features were extracted, the speech results reached a high accuracy of 90%.
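The two speech features named above can be sketched in a minimal form. This is an illustrative assumption, not the thesis code: the frame size, mel filterbank layout, and the autocorrelation pitch estimator are simplifications, and library implementations (e.g. librosa) add details such as pre-emphasis and liftering.

```python
import numpy as np

def mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Minimal MFCC: framing, power spectrum, mel filterbank, log, DCT-II."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank between 0 Hz and sr/2.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(0, mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T

def pitch_autocorr(frame, sr, fmin=80, fmax=400):
    """Crude pitch estimate: the lag of the strongest autocorrelation peak."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin
    return sr / (lo + np.argmax(ac[lo:hi]))

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200 * t)  # a synthetic 200 Hz test tone

m = mfcc(tone, sr)               # one 13-coefficient vector per frame
f0 = pitch_autocorr(tone[:1024], sr)
```

Each frame yields one MFCC vector, so an utterance becomes a sequence of vectors that can be pooled (e.g. by mean and variance) into a fixed-length feature for the SVM, with the pitch estimate appended.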