AI Formed Audio and Human Audio Detection

Prof. K. S. Warke; Siddhi Choughule; Ketki Dandgavale; Anjali Mundhe

doi:10.47392/IRJASH.2024.028

Authors

Prof. K. S. Warke, Assistant Professor, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India.
Siddhi Choughule, Student, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India.
Ketki Dandgavale, Student, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India.
Anjali Mundhe, Student, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India.

DOI:

https://doi.org/10.47392/IRJASH.2024.028

Keywords:

Support Vector Machine (SVM), Convolutional Neural Network (CNN), Mel-Frequency Cepstral Coefficients, Common Voice dataset, Audio detection

Abstract

Our project, "AI Formed Audio and Human Audio Detection," addresses the limitations of current fake audio detection methods by developing an automated, end-to-end solution. We use a convolutional neural network (CNN) framework to distinguish human audio from AI-generated audio using speech waveforms and acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs). The framework extracts high-level representations from these inputs and accounts for prosodic differences between genuine and fake speech. For authentic human voice samples we use the Common Voice dataset from Kaggle, and for synthetic samples we use the pyttsx3 library to convert sentences from the Flickr8k.txt file into male and female synthetic voices. Feature selection and extraction center on MFCCs to provide a robust feature representation, and the resulting features are standardized with a Standard Scaler to improve model performance. Both a CNN and a Support Vector Machine (SVM) were trained for classification, with the CNN model outperforming the SVM in accuracy. To keep the system user-friendly and accessible, we provide an interactive user interface that accepts audio in various formats, such as WAV and MP3. By combining automated feature selection, MFCC-based feature extraction, CNN and SVM modelling, and an intuitive interface, our approach accurately distinguishes AI-formed audio from human audio, helping to safeguard against misinformation and privacy violations while remaining accessible to a broader audience.
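The abstract does not give implementation details, so the following is only a minimal sketch of the standardization step it mentions. It assumes each clip has already been reduced to a fixed-length MFCC feature vector; the function names `fit_scaler` and `transform` and the numeric values are hypothetical, chosen for illustration. The sketch uses the population standard deviation, which is also what scikit-learn's `StandardScaler` does, although the abstract does not confirm which implementation the authors used.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Return per-column (means, stds) computed from the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the later division never divides by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Rescale each column to zero mean and unit variance using fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy stand-ins for per-clip MFCC feature vectors (3 coefficients for brevity;
# these numbers are illustrative, not taken from the paper).
train = [
    [-210.0, 95.0, 12.0],
    [-190.0, 105.0, 8.0],
    [-230.0, 85.0, 10.0],
    [-170.0, 115.0, 10.0],
]
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)

# The statistics are fitted on training data only and reused for unseen clips.
new_clip_scaled = transform([[-200.0, 100.0, 10.0]], means, stds)
```

Fitting the statistics on the training split and reusing them at inference time keeps the classifier's inputs on the same scale for every clip, which matters for margin-based models such as an SVM.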
