Real-Time Gender and Age Detection Using Visual and Vocal Cues

Authors

  • Malavika M., Student, Dept. of Computer Science and Engineering, National Engineering College, Kovilpatti, India.
  • Afrin Dinusha J., Student, Dept. of Computer Science and Engineering, National Engineering College, Kovilpatti, India.
  • Mr. Shenbagharaman A., Assistant Professor, Dept. of Computer Science and Engineering, National Engineering College, Kovilpatti, India.
  • Dr. B. Shunmugapriya, Assistant Professor (Sr. Grade), Dept. of Computer Science and Engineering, National Engineering College, Kovilpatti, India.

DOI:

https://doi.org/10.47392/IRJASH.2025.012

Keywords:

Real-time webcam feeds, Face detection, Speech features, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), OpenCV

Abstract

This study describes a comprehensive multi-modal system for gender and age detection that accepts three different input modalities: voice signals, real-time webcam feeds, and facial photographs. Convolutional Neural Networks (CNNs) are used for image-based detection to extract facial features, classify gender, and estimate age. OpenCV is integrated for face detection and image pre-processing, ensuring that the model can handle a range of lighting conditions, facial expressions, and occlusions. Deep Neural Networks (DNNs) are used in voice-based identification to evaluate speech features such as pitch, tone, and rhythm, which serve as important markers of age and gender. Because it was trained on a wide range of voice-sample datasets, the system is resilient to variations in ambient noise, accents, and languages. The webcam-based input continually detects gender and estimates age from live facial data using a real-time processing pipeline that combines CNN inference with video-stream analysis, producing dynamic and accurate results even in difficult conditions. Each modality is designed to work in concert with the others to provide a flexible and adaptable solution. A thorough evaluation of the system's performance shows high accuracy for each of the three input types. This multi-input architecture is a flexible tool with real-world applications across a variety of industries, such as security, tailored marketing, human-computer interaction, and assistive technology for people with impairments. Subsequent research endeavours will centre on incorporating other modalities, such as behavioural inputs, and optimizing the model for even faster processing. The integration of different inputs allows the system to be highly adaptive and expandable, delivering personalized user experiences.
To provide a more complete picture of users, the proposed framework could also be expanded to include additional demographic predictions, such as ethnicity and emotion recognition. This work is an important advancement in the fields of artificial intelligence (AI) and human-computer interaction, helping to build intelligent systems that function effectively in real-time, multi-environment settings.
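As a sketch of the kind of speech feature the voice pipeline relies on, the example below estimates pitch (fundamental frequency), one of the markers named in the abstract, from a mono audio frame using autocorrelation. This is a hedged, minimal NumPy example, not the authors' method; the function name `estimate_pitch` and the 50-400 Hz search range are assumptions for illustration.

```python
import numpy as np

def estimate_pitch(signal, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame via autocorrelation."""
    signal = signal - np.mean(signal)
    corr = np.correlate(signal, signal, mode="full")
    corr = corr[len(corr) // 2:]       # keep non-negative lags only
    lag_min = int(sample_rate / fmax)  # shortest lag = highest pitch
    lag_max = int(sample_rate / fmin)  # longest lag = lowest pitch
    best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / best_lag

# Synthetic voiced frame: a 220 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(0, 0.05, 1.0 / sr)
frame = np.sin(2 * np.pi * 220.0 * t)
print(estimate_pitch(frame, sr))  # roughly 220 Hz
```

In a full system, per-frame pitch values like this (alongside tone and rhythm descriptors) would be fed to the DNN as input features for age and gender prediction.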

Published

2025-02-22