AI Formed Audio and Human Audio Detection

Authors

  • Prof. K. S. Warke, Assistant Professor, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India
  • Siddhi Choughule, Student, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India
  • Ketki Dandgavale, Student, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India
  • Anjali Mundhe, Student, Computer Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune, India

DOI:

https://doi.org/10.47392/IRJASH.2024.028

Keywords:

Support Vector Machine (SVM), Convolutional Neural Network (CNN), Mel-Frequency Cepstral Coefficients, Common Voice dataset, Audio detection

Abstract

Our project, "AI Formed Audio and Human Audio Detection," addresses the limitations of current fake-audio detection methods by developing an automated, end-to-end solution for distinguishing AI-generated speech from human speech. We use a convolutional neural network (CNN) framework that takes speech waveforms and acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs) as input; the CNN extracts high-level representations from these inputs and takes prosodic differences between genuine and fake speech into account. Authentic human voice samples come from the Common Voice dataset on Kaggle, and synthetic samples are produced with the pyttsx3 library, which converts sentences from the Flickr8k.txt file into male and female synthetic voices. Feature selection and extraction center on MFCCs to give a robust feature representation, and the dataset is standardized with a Standard Scaler to improve model performance. Both a CNN and a Support Vector Machine (SVM) were used for classification, and the CNN achieved higher accuracy than the SVM. To keep the system easy to use and accessible, we provide an interactive user interface that accepts audio in various formats, such as WAV and MP3. By combining automated feature selection, MFCC-based feature extraction, CNN and SVM modelling, and an intuitive interface, our approach accurately detects AI-formed and human audio, helping to guard against misinformation and privacy violations while remaining accessible to a broader audience.
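The abstract does not state which library or parameters were used to compute the MFCCs. As a rough illustration only, the sketch below implements the textbook MFCC pipeline (pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log, and DCT-II) with NumPy alone. The function names and every parameter value (16 kHz audio, 512-sample frames, 160-sample hop, 26 mel filters, 13 coefficients) are assumptions, not the authors' settings; in practice a library routine such as `librosa.feature.mfcc` is a common choice.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak of 1 at the center bin
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii_ortho(x, n_out):
    """First n_out coefficients of the orthonormal DCT-II along the last axis."""
    n_in = x.shape[-1]
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the DC term
    return x @ basis.T

def mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=160, n_filters=26, preemph=0.97):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for a mono signal."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n - 1].
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    if len(emphasized) < n_fft:  # pad very short clips to one full frame
        emphasized = np.pad(emphasized, (0, n_fft - len(emphasized)))
    n_frames = 1 + (len(emphasized) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)  # small floor avoids log(0)
    return dct_ii_ortho(log_mel, n_mfcc)

# Usage: one second of a 440 Hz tone at 16 kHz (a stand-in for a real speech clip).
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(0.5 * np.sin(2 * np.pi * 440.0 * t), sr)
print(feats.shape)  # (97, 13)
```

Averaging the frame-wise coefficients over time (for example `feats.mean(axis=0)`) is one simple, assumed way to obtain the fixed-length vector an SVM needs; a CNN can instead consume the full frames-by-coefficients matrix.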
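The abstract mentions standardizing the data with a Standard Scaler and classifying it with an SVM, but gives no kernel or hyperparameters. The sketch below is a minimal NumPy stand-in, not the authors' implementation: z-score standardization (the transformation a standard scaler applies) followed by a linear SVM trained with the Pegasos stochastic sub-gradient method. The linear kernel, `lam`, `epochs`, the ±1 label encoding, and the synthetic 13-dimensional clusters standing in for frame-averaged MFCC vectors are all hypothetical.

```python
import numpy as np

def fit_scaler(X):
    """Per-feature mean and standard deviation for z-score standardization."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std

def transform(X, mean, std):
    """Standardize features to zero mean and unit variance using fitted statistics."""
    return (X - mean) / std

def add_bias(X):
    """Append a constant 1 column so the bias is learned as an extra weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: SGD on the L2-regularized hinge loss; labels y must be -1 or +1.
    Simplification: the bias weight is regularized along with the others."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = y[i] * (Xb[i] @ w)  # margin under the current weights
            w *= 1.0 - eta * lam  # shrink from the regularization term
            if margin < 1:  # hinge loss is active: step toward classifying x_i correctly
                w += eta * y[i] * Xb[i]
    return w

def predict(X, w):
    """Return +1 or -1 depending on the side of the learned hyperplane."""
    return np.where(add_bias(X) @ w >= 0, 1, -1)

# Usage on synthetic data (hypothetical encoding: -1 = human, +1 = AI-generated).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 13)),
               rng.normal(3.0, 1.0, size=(100, 13))])
y = np.array([-1] * 100 + [1] * 100)
mean, std = fit_scaler(X)
Xs = transform(X, mean, std)
w = train_linear_svm(Xs, y)
acc = np.mean(predict(Xs, w) == y)
print(f"training accuracy: {acc:.2f}")
```

The scaler's mean and standard deviation should be computed on the training split only and then reused for test audio, so that no test statistics leak into training; scikit-learn's `StandardScaler` and `SVC` are widely used, production-ready equivalents.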


Published

2024-07-27