With the rapid growth of mobile apps, users' concerns about their privacy have become increasingly prominent. Android app logs serve as crucial computer resources, aiding developers in debugging and monitoring the status of Android apps, while also containing a wealth of software system information. Previous studies have acknowledged privacy leaks in software logs and Android apps as significant issues without providing a comprehensive view of the privacy leaks in Android app logs. In this study, we build a comprehensive dataset of Android app logs and conduct an empirical study to analyze the status and severity of privacy leaks in Android app logs. Our study comprises three aspects: (1) Understanding real-world developers' concerns regarding privacy issues related to software logs; (2) Studying privacy leaks in the Android app logs; (3) Investigating the characteristics of privacy-leaking Android app logs and analyzing the reasons behind them. Our study reveals five different categories of concerns from real-world developers regarding privacy issues related to software logs and the prevalence of privacy leaks in Android app logs, with the majority stemming from developers' unawar
Despite over 3.5 million Android apps and 200+ million Android Auto-compatible vehicles, only a few hundred apps support Android Auto due to platform-specific compliance requirements. Android Auto mandates service-based architectures in which the vehicle system invokes app callbacks to render the UI and handle interactions, which is fundamentally different from standard Activity-based Android development. Through an empirical study analysis of 98 issues across 14 Android Auto app repositories, we identified three major compliance failure categories: media playback errors, UI rendering issues, and voice command integration failures in line with mandatory requirements for integrating Android Auto support. We introduce AutoComply, a static analysis framework capable of detecting these compliance violations through the specialized analysis of platform-specific requirements. AutoComply constructs a Car-Control Flow Graph (CCFG) extending traditional control flow analysis to model the service-based architecture of Android Auto apps. Evaluating AutoComply on 31 large-scale open-source apps, it detected 27 violations (13X more than Android Lint), while no false positives were observed, ach
Autonomous agents have become increasingly important for interacting with the real world. Android agents, in particular, have been recently a frequently-mentioned interaction method. However, existing studies for training and evaluating Android agents lack systematic research on both open-source and closed-source models. In this work, we propose AndroidLab as a systematic Android agent framework. It includes an operation environment with different modalities, action space, and a reproducible benchmark. It supports both large language models (LLMs) and multimodal models (LMMs) in the same action space. AndroidLab benchmark includes predefined Android virtual devices and 138 tasks across nine apps built on these devices. By using the AndroidLab environment, we develop an Android Instruction dataset and train six open-source LLMs and LMMs, lifting the average success rates from 4.59% to 21.50% for LLMs and from 1.93% to 13.28% for LMMs. AndroidLab is open-sourced and publicly available at https://github.com/THUDM/Android-Lab.
The Android ecosystem relies on either TrustZone (e.g., OP-TEE, QTEE, Trusty) or trusted hypervisors (pKVM, Gunyah) to isolate security-sensitive services from malicious apps and Android bugs. TrustZone allows any secure world code to access the normal world that runs Android. Similarly, a trusted hypervisor has full access to Android running in one VM and security services in other VMs. In this paper, we motivate the need for mutual isolation, wherein Android, hypervisors, and the secure world are isolated from each other. Then, we propose a sandboxed service abstraction, such that a sandboxed execution cannot access any other sandbox, Android, hypervisor, or secure world memory. We present Aster which achieves these goals while ensuring that sandboxed execution can still communicate with Android to get inputs and provide outputs securely. Our main insight is to leverage the hardware isolation offered by Arm Confidential Computing Architecture (CCA). However, since CCA does not satisfy our sandboxing and mutual isolation requirements, Aster repurposes its hardware enforcement to meet its goals while addressing challenges such as secure interfaces, virtio, and protection against in
Android Framework is a layer of software that exists in every Android system managing resources of all Android apps. A vulnerability in Android Framework can lead to severe hacks, such as destroying user data and leaking private information. With tens of millions of Android devices unpatched due to Android fragmentation, vulnerabilities in Android Framework certainly attract attackers to exploit them. So far, enormous manual effort is needed to craft such exploits. To our knowledge, no research has been done on automatic generation of exploits that take advantage of Android Framework vulnerabilities. We make a first step towards this goal by applying symbolic execution of Android Framework to finding bugs and generating exploits. Several challenges have been raised by the task. (1) The information of an app flows to Android Framework in multiple intricate steps, making it difficult to identify symbolic inputs. (2) Android Framework has a complex initialization phase, which exacerbates the state space explosion problem. (3) A straightforward design that builds the symbolic executor as a layer inside the Android system will not work well: not only does the implementation have to ensu
The security of Android has been recently challenged by the discovery of a number of vulnerabilities involving different layers of the Android stack. We argue that such vulnerabilities are largely related to the interplay among layers composing the Android stack. Thus, we also argue that such interplay has been underestimated from a security point-of-view and a systematic analysis of the Android interplay has not been carried out yet. To this aim, in this paper we provide a simple model of the Android cross-layer interactions based on the concept of flow, as a basis for analyzing the Android interplay. In particular, our model allows us to reason about the security implications associated with the cross-layer interactions in Android, including a recently discovered vulnerability that allows a malicious application to make Android devices totally unresponsive. We used the proposed model to carry out an empirical assessment of some flows within the Android cross-layered architecture. Our experiments indicate that little control is exercised by the Android Security Framework (ASF) over cross-layer interactions in Android. In particular, we observed that the ASF lacks in discriminating
Android malware is one of the most dangerous threats on the internet, and it's been on the rise for several years. Despite significant efforts in detecting and classifying android malware from innocuous android applications, there is still a long way to go. As a result, there is a need to provide a basic understanding of the behavior displayed by the most common Android malware categories and families. Each Android malware family and category has a distinct objective. As a result, it has impacted every corporate area, including healthcare, banking, transportation, government, and e-commerce. In this paper, we presented two machine-learning approaches for Dynamic Analysis of Android Malware: one for detecting and identifying Android Malware Categories and the other for detecting and identifying Android Malware Families, which was accomplished by analyzing a massive malware dataset with 14 prominent malware categories and 180 prominent malware families of CCCS-CIC-AndMal2020 dataset on Dynamic Layers. Our approach achieves in Android Malware Category detection more than 96 % accurate and achieves in Android Malware Family detection more than 99% accurate. Our approach provides a meth
The growing capabilities of large language models in natural language understanding significantly strengthen existing agentic systems. To power performant on-device mobile agents for better data privacy, we introduce DroidCall, the first training and testing dataset for accurate Android intent invocation. With a highly flexible and reusable data generation pipeline, we constructed 10k samples in DroidCall. Given a task instruction in natural language, small language models such as Qwen2.5-3B and Gemma2-2B fine-tuned with DroidCall can approach or even surpass the capabilities of GPT-4o for accurate Android intent invocation. We also provide an end-to-end Android app equipped with these fine-tuned models to demonstrate the Android intent invocation process. The code and dataset are available at https://github.com/UbiquitousLearning/DroidCall.
Many developers and organizations implement apps for Android, the most widely used operating system for mobile devices. Common problems developers face are the various hardware devices, customized Android variants, and frequent updates, forcing them to implement workarounds for the different versions and variants of Android APIs used in practice. In this paper, we contribute the Android Compatibility checkS dataSet (AndroidCompass) that comprises changes to compatibility checks developers use to enforce workarounds for specific Android versions in their apps. We extracted 80,324 changes to compatibility checks from 1,394 apps by analyzing the version histories of 2,399 projects from the F-Droid catalog. With AndroidCompass, we aim to provide data on when and how developers introduced or evolved workarounds to handle Android incompatibilities. We hope that AndroidCompass fosters research to deal with version incompatibilities, address potential design flaws, identify security concerns, and help derive solutions for other developers, among others-helping researchers to develop and evaluate novel techniques, and Android app as well as operating-system developers in engineering their s
The Android operating system is currently the most popular mobile operating system in the world. Android is based on Linux and therefore inherits its features including its Inter-Process Communication (IPC) mechanisms. These mechanisms are used by processes to communicate with one another and are extensively used in Android. While Android-specific IPC mechanisms have been studied extensively, Unix domain sockets have not been examined comprehensively, despite playing a crucial role in the IPC of highly privileged system daemons. In this paper, we propose SAUSAGE, an efficient novel static analysis framework to study the security properties of these sockets. SAUSAGE considers access control policies implemented in the Android security model, as well as authentication checks implemented by the daemon binaries. It is a fully static analysis framework, specifically designed to analyze Unix domain socket usage in Android system daemons, at scale. We use this framework to analyze 200 Android images across eight popular smartphone vendors spanning Android versions 7-9. As a result, we uncover multiple access control misconfigurations and insecure authentication checks. Our notable finding
Malware for Android is becoming increasingly dangerous to the safety of mobile devices and the data they hold. Although machine learning(ML) techniques have been shown to be effective at detecting malware for Android, a comprehensive analysis of the methods used is required. We review the current state of Android malware detection us ing machine learning in this paper. We begin by providing an overview of Android malware and the security issues it causes. Then, we look at the various supervised, unsupervised, and deep learning machine learning approaches that have been utilized for Android malware detection. Addi tionally, we present a comparison of the performance of various Android malware detection methods and talk about the performance evaluation metrics that are utilized to evaluate their efficacy. Finally, we draw atten tion to the drawbacks and difficulties of the methods that are currently in use and suggest possible future directions for research in this area. In addition to providing insights into the current state of Android malware detection using machine learning, our review provides a comprehensive overview of the subject.
Android malware attacks are increasing daily at a tremendous volume, making Android users more vulnerable to cyber-attacks. Researchers have developed many machine learning (ML)/ deep learning (DL) techniques to detect and mitigate android malware attacks. However, due to technological advancement, there is a rise in android mobile devices. Furthermore, the devices are geographically dispersed, resulting in distributed data. In such scenario, traditional ML/DL techniques are infeasible since all of these approaches require the data to be kept in a central system; this may provide a problem for user privacy because of the massive proliferation of Android mobile devices; putting the data in a central system creates an overhead. Also, the traditional ML/DL-based android malware classification techniques are not scalable. Researchers have proposed federated learning (FL) based android malware classification system to solve the privacy preservation and scalability with high classification performance. In traditional FL, Federated Averaging (FedAvg) is utilized to construct the global model at each round by merging all of the local models obtained from all of the customers that participa
To satisfy varying customer needs, device vendors and OS providers often rely on the open-source nature of the Android OS and offer customized versions of the Android OS. When a new version of the Android OS is released, device vendors and OS providers need to merge the changes from the Android OS into their customizations to account for its bug fixes, security patches, and new features. Because developers of customized OSs might have made changes to code locations that were also modified by the developers of the Android OS, the merge task can be characterized by conflicts, which can be time-consuming and error-prone to resolve. To provide more insight into this critical aspect of the Android ecosystem, we present an empirical study that investigates how eight open-source customizations of the Android OS merge the changes from the Android OS into their projects. The study analyzes how often the developers from the customized OSs merge changes from the Android OS, how often the developers experience textual merge conflicts, and the characteristics of these conflicts. Furthermore, to analyze the effect of the conflicts, the study also analyzes how the conflicts can affect a randomly
This study examines machine learning techniques like Decision Trees, Support Vector Machines, Logistic Regression, Neural Networks, and ensemble methods to detect Android malware. The study evaluates these models on a dataset of Android applications and analyzes their accuracy, efficiency, and real-world applicability. Key findings show that ensemble methods demonstrate superior performance, but there are trade-offs between model interpretability, efficiency, and accuracy. Given its increasing threat, the insights guide future research and practical use of ML to combat Android malware.
While extremely valuable to achieve advanced functions, mobile phone sensors can be abused by attackers to implement malicious activities in Android apps, as experimentally demonstrated by many state-of-the-art studies. There is hence a strong need to regulate the usage of mobile sensors so as to keep them from being exploited by malicious attackers. However, despite the fact that various efforts have been put in achieving this, i.e., detecting privacy leaks in Android apps, we have not yet found approaches to automatically detect sensor leaks in Android apps. To fill the gap, we designed and implemented a novel prototype tool, SEEKER, that extends the famous FlowDroid tool to detect sensor-based data leaks in Android apps. SEEKER conducts sensor-focused static taint analyses directly on the Android apps' bytecode and reports not only sensor-triggered privacy leaks but also the sensor types involved in the leaks. Experimental results using over 40,000 real-world Android apps show that SEEKER is effective in detecting sensor leaks in Android apps, and malicious apps are more interested in leaking sensor data than benign apps.
Currently, the majority of apps running on mobile devices are Android apps developed in Java. However, developers can now write Android applications using a new programming language: Kotlin, which Google adopted in 2017 as an official programming language for developing Android apps. Since then, Android developers have been able to: a) start writing Android applications from scratch using Kotlin, b) evolve their existing Android applications written in Java by adding Kotlin code (possible thanks to the interoperability between the two languages), or c) migrate their Android apps from Java to Kotlin. This paper aims to study this last case. We conducted a qualitative study to find out why Android developers have migrated Java code to Kotlin and to bring together their experiences about the process, in order to identify the main difficulties they have faced. To execute the study, we first identified commits from open-source Android projects that have migrated Java code to Kotlin. Then, we emailed the developers that wrote those migrations. We thus obtained information from 98 developers who had migrated code from Java to Kotlin. This paper presents the main reasons identified by the
Android is nowadays the most popular operating system in the world, not only in the realm of mobile devices, but also when considering desktop and laptop computers. Such a popularity makes it an attractive target for security attacks, also due to the sensitive information often manipulated by mobile apps. The latter are going through a transition in which the Android ecosystem is moving from the usage of Java as the official language for developing apps, to the adoption of Kotlin as the first choice supported by Google. While previous studies have partially studied security weaknesses affecting Java Android apps, there is no comprehensive empirical investigation studying software security weaknesses affecting Android apps considering (and comparing) the two main languages used for their development, namely Java and Kotlin. We present an empirical study in which we: (i) manually analyze 681 commits including security weaknesses fixed by developers in Java and Kotlin apps, with the goal of defining a taxonomy highlighting the types of software security weaknesses affecting Java and Kotlin Android apps; (ii) survey 43 Android developers to validate and complement our taxonomy. Based o
Android is a widely deployed operating system that employs a permission-based access control model. The Android Permissions System (APS) is responsible for mediating resource requests from applications. APS is a critical component of the Android security mechanism. A failure in the design of APS can potentially lead to vulnerabilities that grant unauthorized access to resources by malicious applications. Researchers have employed formal methods for analyzing the security properties of APS. Since Android is constantly evolving, we intend to design and implement a framework for formal specification and verification of the security properties of APS. In particular, we intend to present a behavioral model of APS that represents the non-binary, context dependent permissions introduced in Android 10 and temporal permissions introduced in Android 11.
With the increasing popularity of Android in the last decade, Android is popular among users as well as attackers. The vast number of android users grabs the attention of attackers on android. Due to the continuous evolution of the variety and attacking techniques of android malware, our detection methods should need an update too. Most of the researcher's works are based on static features, and very few focus on dynamic features. In this paper, we are filling the literature gap by detecting android malware using System calls. We are running the malicious app in a monitored and controlled environment using an emulator to detect malware. Malicious behavior is activated with some simulated events during its runtime to activate its hostile behavior. Logs collected during the app's runtime are analyzed and fed to different machine learning models for Detection and Family classification of Malware. The result indicates that K-Nearest Neighbor and the Decision Tree gave the highest accuracy in malware detection and Family Classification respectively.
Today, Android devices are able to provide various services. They support applications for different purposes such as entertainment, business, health, education, and banking services. Because of the functionality and popularity of Android devices as well as the open-source policy of Android OS, they have become a suitable target for attackers. Android Botnet is one of the most dangerous malwares because an attacker called Botmaster can control that remotely to perform destructive attacks. A number of researchers have used different well-known Machine Learning (ML) methods to recognize Android Botnets from benign applications. However, these conventional methods are not able to detect new sophisticated Android Botnets. In this paper, we propose a novel method based on Android permissions and Convolutional Neural Networks (CNNs) to classify Botnets and benign Android applications. Being the first developed method that uses CNNs for this aim, we also proposed a novel method to represent each application as an image which is constructed based on the co-occurrence of used permissions in that application. The proposed CNN is a binary classifier that is trained using these images. Evaluat