AI Formed Audio and Human Audio Detection
DOI:
https://doi.org/10.47392/IRJASH.2024.028

Keywords:
Support Vector Machine (SVM), Convolutional Neural Network (CNN), Mel-Frequency Cepstral Coefficients, Common Voice dataset, Audio detection

Abstract
Our project, "AI Formed Audio and Human Audio Detection," addresses the limitations of current fake audio detection methods by developing an automated end-to-end solution. We leverage a convolutional neural network (CNN) framework to detect human audio from speech waveforms and acoustic features such as MFCCs, which capture high-level representations and the prosody differences between genuine and fake speech. We use the Common Voice dataset from Kaggle for authentic human voice samples, and the pyttsx3 library to convert sentences from the Flickr8k.txt file into male and female synthetic voices. Feature selection and extraction techniques centred on MFCCs ensure a robust feature representation, and the dataset is standardized with a Standard Scaler to improve model performance. Both a CNN and a Support Vector Machine (SVM) were used for classification, with the CNN model outperforming the SVM in accuracy. Prioritizing user-friendliness and accessibility, we provide an interactive user interface that accepts audio in common formats such as WAV and MP3. Our approach, combining automated feature selection, MFCC-based feature extraction, CNN and SVM modelling, and an intuitive interface, accurately distinguishes AI formed audio from human audio, helping to safeguard against misinformation and privacy violations while remaining accessible to a broad audience.
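The two preprocessing steps named in the abstract, MFCC extraction and StandardScaler-style standardization, can be sketched as follows. This is a minimal, self-contained NumPy illustration, not the paper's implementation: the frame length, hop size, FFT size, and filter counts are assumed typical defaults (the paper does not state its parameters), and a production pipeline would more likely call a library such as librosa or scikit-learn.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Slice the waveform into overlapping frames and apply a Hamming window.
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Per-frame power spectrum via the real FFT.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Filterbank energies -> log -> DCT-II yields the cepstral coefficients.
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    log_e = np.log(energies)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    (2 * n + 1) / (2.0 * n_filters)))
    return log_e @ basis.T  # shape: (n_frames, n_ceps)

def standardize(X):
    # StandardScaler-style: zero mean and unit variance per feature column.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma == 0, 1.0, sigma)
```

A feature matrix built this way (one row of standardized coefficients per frame, or per-utterance statistics over frames) is what would then be fed to the CNN and SVM classifiers.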
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.