SHAP-Based Android Malware Detection Using Ensemble Learning

Authors

  • Dr N Anitha Devi, Assistant Professor, Dept. of IT, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India
  • C Karthika, UG Scholar, Dept. of IT, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India
  • V Pradeepa, UG Scholar, Dept. of IT, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India
  • C Sharmila, UG Scholar, Dept. of IT, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India

DOI:

https://doi.org/10.47392/IRJASH.2025.077

Keywords:

Android malware detection, Sensitive Function Call Graph, NetworkX, Word2Vec, Smali code, API semantic analysis, SHAP interpreter, social network analysis

Abstract

Android malware remains a critical threat to mobile security, demanding robust and transparent detection mechanisms. This work proposes an end-to-end method for identifying malicious Android apps that combines code analysis with graph-based techniques, making detection more precise and interpretable. The workflow begins with a detailed pre-processing stage in which APK samples are decompiled: the DEX files are extracted and disassembled into Smali code with Baksmali, exposing program behaviour and control flow. Androguard is also used to extract app metadata and permission declarations, supporting inspection of code semantics. We then build a Sensitive Function Call Graph (SFCG) for each app, in which vertices are functions that call sensitive APIs and edges are calls between those functions. The graphs are enriched with structural features, such as degree centrality, closeness centrality, and clustering coefficients, together with permission-usage patterns found in the Smali code. Semantic features are extracted by transforming the Smali code into token sequences and embedding them with Word2Vec. All of these features are then used to train an ensemble of multiple individual classifiers. To make the detector more transparent, we apply SHAP (SHapley Additive exPlanations) to produce feature-level explanations for each classification result. Experiments on a large benchmark dataset show that the proposed approach delivers accurate, interpretable, and scalable Android malware detection, reaching approximately 99.9% accuracy. Beyond strengthening security, the system promotes transparency, which is crucial in security-critical applications.
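The SFCG construction and structural feature extraction described in the abstract can be sketched in Python with NetworkX. This is a minimal illustrative sketch, not the authors' implementation, and it rests on several assumptions: the `SENSITIVE_APIS` set, the sample Smali text, and the helper names `parse_smali`, `build_sfcg`, and `structural_features` are hypothetical; overloaded methods are merged by name; the SFCG is taken to be the call graph induced on app methods that directly invoke a sensitive API; and per-node centralities are averaged into graph-level features, which is only one possible aggregation.

```python
import re

import networkx as nx

# Hypothetical sensitive-API list for illustration; the paper's actual list is not given.
SENSITIVE_APIS = {
    "Landroid/telephony/TelephonyManager;->getDeviceId",
    "Landroid/telephony/SmsManager;->sendTextMessage",
    "Landroid/location/LocationManager;->getLastKnownLocation",
}

CLASS_RE = re.compile(r"^\.class\b.*\s(L[^;\s]+;)$")  # e.g. .class public Lcom/example/Main;
METHOD_RE = re.compile(r"^\.method\b.*\s(\S+)$")      # e.g. .method private leak()V
INVOKE_RE = re.compile(r"^invoke-[\w/-]+\s*\{[^}]*\},\s*(L[^;]+;->[^(]+)\(")


def parse_smali(text):
    """Map each method 'Lcls;->name' in one .smali file to the set of methods it invokes."""
    calls, cls, current = {}, None, None
    for line in (raw.strip() for raw in text.splitlines()):
        if m := CLASS_RE.match(line):
            cls = m.group(1)
        elif m := METHOD_RE.match(line):
            # Drop the type descriptor, so overloaded methods are merged (a simplification).
            current = f"{cls}->{m.group(1).split('(')[0]}"
            calls.setdefault(current, set())
        elif line.startswith(".end method"):
            current = None
        elif current is not None and (m := INVOKE_RE.match(line)):
            calls[current].add(m.group(1))
    return calls


def build_sfcg(calls):
    """Assumed SFCG: call graph induced on methods that directly invoke a sensitive API."""
    sensitive = {m for m, callees in calls.items() if callees & SENSITIVE_APIS}
    g = nx.DiGraph()
    g.add_nodes_from(sensitive)
    for caller in sensitive:
        g.add_edges_from((caller, callee) for callee in calls[caller] if callee in sensitive)
    return g


def _mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def structural_features(g):
    """Average per-node centralities into one graph-level feature vector (assumed aggregation)."""
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "mean_degree_centrality": _mean(nx.degree_centrality(g).values()),
        "mean_closeness_centrality": _mean(nx.closeness_centrality(g).values()),
        # Clustering is computed on the undirected view of the call structure.
        "mean_clustering": _mean(nx.clustering(g.to_undirected()).values()),
    }


# Hypothetical sample class: log() calls no sensitive API, so it is excluded from the SFCG.
SAMPLE_SMALI = """
.class public Lcom/example/Main;
.super Landroid/app/Activity;

.method public onCreate(Landroid/os/Bundle;)V
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
    invoke-direct {p0}, Lcom/example/Main;->leak()V
    invoke-direct {p0}, Lcom/example/Main;->track()V
    invoke-direct {p0}, Lcom/example/Main;->log()V
    return-void
.end method

.method private leak()V
    invoke-virtual/range {v0 .. v5}, Landroid/telephony/SmsManager;->sendTextMessage(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V
    invoke-direct {p0}, Lcom/example/Main;->track()V
    return-void
.end method

.method private track()V
    invoke-virtual {v0, v1}, Landroid/location/LocationManager;->getLastKnownLocation(Ljava/lang/String;)Landroid/location/Location;
    return-void
.end method

.method private log()V
    return-void
.end method
"""

if __name__ == "__main__":
    print(structural_features(build_sfcg(parse_smali(SAMPLE_SMALI))))
```

In a full pipeline, `parse_smali` would presumably be applied to every `.smali` file produced by Baksmali, the per-file call maps merged, and the resulting structural vector concatenated with the permission and Word2Vec features before the ensemble is trained.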

Published

2025-07-25