Prior studies on mobile app analysis often analyze apps across different categories or focus on a small set of apps within a category. These studies either provide general insights for an entire app store which consists of millions of apps, or provide specific insights for a small set of apps. However, a single app category can often contain tens of thousands to hundreds of thousands of apps. For example, according to AppBrain, there are 46,625 apps in the "Sports" category of Google Play apps. Analyzing such a targeted category of apps can provide more specific insights than analyzing apps across categories while still benefiting many app developers interested in the category. This work aims to study a large number of apps from a single category (i.e., the sports category). We performed an empirical study on over two thousand sports apps in the Google Play Store. We study the characteristics of these apps (e.g., their targeted sports types and main functionalities) through manual analysis, the topics in user reviews through topic modeling, as well as the aspects that contribute to the negative opinions of users through analysis of user ratings and sentiment. It is concluded tha
App developers aim to create apps that cater to the needs of different types of users. This development approach, also known as the "one-size-fits-all" strategy, involves combining various functionalities into one app. However, this approach has drawbacks, such as lower conversion rates, slower download speed, larger attack surfaces, and lower update rates. To address these issues, developers have created "lite" versions to attract new users and enhance the user experience. Despite this, there has been no study conducted to examine the relationship between lite and full apps. To address this gap, we present a comparative study of lite apps, exploring the similarities and differences between lite and full apps from various perspectives. Our findings indicate that most existing lite apps fail to fulfill their intended goals (e.g., smaller in size, faster to download, and using less data). Our study also reveals the potential security risks associated with lite apps.
Mobile software apps ("apps") are one of the prevailing digital technologies that our modern life heavily depends on. A key issue in the development of apps is how to design gender-inclusive apps. Apps that do not consider gender inclusion, diversity, and equality in their design can create barriers (e.g., excluding some of the users because of their gender) for their diverse users. While there have been some efforts to develop gender-inclusive apps, a lack of deep understanding regarding user perspectives on gender may prevent app developers and owners from identifying issues related to gender and proposing solutions for improvement. Users express many different opinions about apps in their reviews, from sharing their experiences and reporting bugs to requesting new features. In this study, we aim at unpacking gender discussions about apps from the user perspective by analysing app reviews. We first develop and evaluate several Machine Learning (ML) and Deep Learning (DL) classifiers that automatically detect gender reviews (i.e., reviews that contain discussions about gender). We apply our ML and DL classifiers on a manually constructed dataset of 1,440 app reviews from the Goo
While extremely valuable to achieve advanced functions, mobile phone sensors can be abused by attackers to implement malicious activities in Android apps, as experimentally demonstrated by many state-of-the-art studies. There is hence a strong need to regulate the usage of mobile sensors so as to keep them from being exploited by malicious attackers. However, despite the fact that various efforts have been put in achieving this, i.e., detecting privacy leaks in Android apps, we have not yet found approaches to automatically detect sensor leaks in Android apps. To fill the gap, we designed and implemented a novel prototype tool, SEEKER, that extends the famous FlowDroid tool to detect sensor-based data leaks in Android apps. SEEKER conducts sensor-focused static taint analyses directly on the Android apps' bytecode and reports not only sensor-triggered privacy leaks but also the sensor types involved in the leaks. Experimental results using over 40,000 real-world Android apps show that SEEKER is effective in detecting sensor leaks in Android apps, and malicious apps are more interested in leaking sensor data than benign apps.
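As a rough illustration of the idea behind sensor-focused taint analysis, the sketch below propagates sensor-type labels through a toy straight-line program and reports which sensor types reach a sink. It is not SEEKER's or FlowDroid's actual implementation: the statement format, the source names (`read_accelerometer`, `read_gyroscope`), and the sink names (`send_http`, `write_log`) are all hypothetical.

```python
# Toy taint propagation over straight-line code. Illustrative sketch only:
# the statement format and every name below are hypothetical and do not
# reflect SEEKER's or FlowDroid's real implementation.

# Hypothetical sensor sources, mapped to the sensor type they expose.
SOURCES = {"read_accelerometer": "ACCELEROMETER", "read_gyroscope": "GYROSCOPE"}
# Hypothetical sinks through which data could leave the device.
SINKS = {"send_http", "write_log"}


def find_sensor_leaks(program):
    """Return (sink, sorted sensor types) for each sink call fed tainted data.

    `program` is a list of statements, each one of:
      ("call", target_var, function_name, [arg_vars])
      ("assign", target_var, [source_vars])
    """
    taint = {}  # variable name -> set of sensor types it may carry
    leaks = []
    for stmt in program:
        if stmt[0] == "assign":
            _, target, sources = stmt
            # The target carries the union of its operands' taint.
            taint[target] = set().union(*(taint.get(v, set()) for v in sources))
        elif stmt[0] == "call":
            _, target, func, args = stmt
            arg_taint = set().union(*(taint.get(v, set()) for v in args))
            if func in SINKS and arg_taint:
                # Report the sink together with the sensor types involved.
                leaks.append((func, sorted(arg_taint)))
            if func in SOURCES:
                taint[target] = {SOURCES[func]}
            else:
                # Conservatively assume the result depends on all arguments.
                taint[target] = arg_taint
    return leaks


example = [
    ("call", "a", "read_accelerometer", []),
    ("assign", "b", ["a"]),
    ("call", "c", "format", ["b"]),
    ("call", None, "send_http", ["c"]),   # tainted data reaches a sink
    ("call", None, "write_log", ["x"]),   # untainted argument, no leak
]
leaks = find_sensor_leaks(example)
```

A real static analysis would also have to handle branches, loops, aliasing, and inter-procedural flows; the sketch only shows how a leak report can carry the sensor type along with the sink.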
Following OpenAI's introduction of GPTs, a surge in GPT apps has led to the launch of dedicated LLM app stores. Nevertheless, given its recent debut, there is a lack of sufficient understanding of this new ecosystem. To fill this gap, this paper presents a first comprehensive longitudinal (5-month) study of the evolution, landscape, and vulnerability of the emerging LLM app ecosystem, focusing on two GPT app stores: GPTStore.AI and the official OpenAI GPT Store. Specifically, we develop two automated tools and a TriLevel configuration extraction strategy to efficiently gather metadata (i.e., names, creators, descriptions, etc.) and user feedback for all GPT apps across these two stores, as well as configurations (i.e., system prompts, knowledge files, and APIs) for the top 10,000 popular apps. Our extensive analysis reveals: (1) the user enthusiasm for GPT apps consistently rises, whereas creator interest plateaus within three months of GPTs' launch; (2) nearly 90% of system prompts can be easily accessed due to widespread failure to secure GPT app configurations, leading to considerable plagiarism and duplication among apps. Our findings highlight the necessity of enhancing t
Context. Many Internet content platforms, such as Spotify and YouTube, provide their services via both native and Web apps. Even though those apps provide similar features to the end user, using their native version or Web counterpart might lead to different levels of energy consumption and performance. Goal. The goal of this study is to empirically assess the energy consumption and performance of native and Web apps in the context of Internet content platforms on Android. Method. We select 10 Internet content platforms across 5 categories. Then, we measure them based on the energy consumption, network traffic volume, CPU load, memory load, and frame time of their native and Web versions; then, we statistically analyze the collected measures and report our results. Results. We confirm that native apps consume significantly less energy than their Web counterparts, with large effect size. Web apps use more CPU and memory, with statistically significant difference and large effect size. Therefore, we conclude that native apps tend to require fewer hardware resources than their corresponding Web versions. The network traffic volume exhibits statistically significant difference in favou
With the onset of COVID-19, governments worldwide planned to develop and deploy contact tracing (CT) apps to help speed up the contact tracing process. However, experts raised concerns about the long-term privacy and security implications of using these apps. Consequently, several proposals were made to design privacy-preserving CT apps. To this end, Google and Apple developed the Google/Apple Exposure Notification (GAEN) framework to help public health authorities develop privacy-preserving CT apps. In the United States, 26 states used the GAEN framework to develop their CT apps. In this paper, we empirically evaluate the US-based GAEN apps to determine 1) the privileges they have, 2) if the apps comply with their defined privacy policies, and 3) if they contain known vulnerabilities that can be exploited to compromise privacy. The results show that all apps violate their stated privacy policy and contain several known vulnerabilities.
Fairness is one of the socio-technical concerns that must be addressed in software systems. Considering the popularity of mobile software applications (apps) among a wide range of individuals worldwide, mobile apps with unfair behaviors and outcomes can affect a significant proportion of the global population, potentially more than any other type of software system. Users express a wide range of socio-technical concerns in mobile app reviews. This research aims to investigate fairness concerns raised in mobile app reviews. Our research focuses on AI-based mobile app reviews as the chance of unfair behaviors and outcomes in AI-based mobile apps may be higher than in non-AI-based apps. To this end, we first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing model can detect fairness reviews with a precision of 94%. We then applied the best-performing model on approximately 9.5M reviews collected from 108 AI-based ap
With the emergence of deep learning techniques, smartphone apps now embed on-device AI features that enable advanced tasks like speech translation, to attract users and increase market competitiveness. A good interaction design is important to make an AI feature usable and understandable. However, AI features have their unique challenges, such as sensitivity to the input, dynamic behaviours, and output uncertainty. Existing guidelines and tools either do not cover AI features or do not consider mobile apps, as confirmed by our informal interview with professional designers. To address these issues, we conducted the first empirical study to explore user-AI interaction in mobile apps. We aim to understand the status of on-device AI usage by investigating 176 AI apps from 62,822 apps. We identified 255 AI features and summarised 759 implementations into three primary interaction pattern types. We further implemented our findings into a multi-faceted search-enabled gallery. The results of the user study demonstrate the usefulness of our findings.
Context: More than 50 countries have developed COVID contact-tracing apps to limit the spread of coronavirus. However, many experts and scientists cast doubt on the effectiveness of those apps. For each app, a large number of reviews have been entered by end-users in app stores. Objective: Our goal is to gain insights into the user reviews of those apps, and to find out the main problems that users have reported. Our focus is to assess the "software in society" aspects of the apps, based on user reviews. Method: We selected nine European national apps for our analysis and used a commercial app-review analytics tool to extract and mine the user reviews. For all the apps combined, our dataset includes 39,425 user reviews. Results: Results show that users are generally dissatisfied with the nine apps under study, except the Scottish ("Protect Scotland") app. Some of the major issues that users have complained about are high battery drainage and doubts on whether apps are really working. Conclusion: Our results show that more work is needed by the stakeholders behind the apps (e.g., app developers, decision-makers, public health experts) to improve the public adoption, software quality
While there have been various studies of Android apps and their development, there is limited discussion of the broader class of fake apps. Fake apps and their development are distinct from official apps and belong to the mobile underground industry. Due to the lack of knowledge of the mobile underground industry, fake apps, their ecosystem, and their nature remain a mystery. To fill this gap, we conduct the first systematic and comprehensive empirical study on a large-scale set of fake apps. Over 150,000 samples related to the top 50 popular apps are collected for extensive measurement. In this paper, we present discoveries from three different perspectives, namely fake sample characteristics, a quantitative study of fake samples, and fake authors' development trends. Moreover, valuable domain knowledge, like fake apps' naming tendency and fake developers' evasive strategies, is then presented and confirmed with case studies, demonstrating a clear vision of fake apps and their ecosystem.
The advancement of artificial intelligence (AI) and the significant growth in the use of food consumption tracking and recommendation-related apps in the app stores have created a need for an evaluation system, as minimal information is available about the evidence-based quality and technological advancement of these apps. Electronic searches were conducted across three major app stores and the selected apps were evaluated by three independent raters. A total of 473 apps were found and 80 of them were selected for review based on inclusion and exclusion criteria. An app rating tool is devised to evaluate the selected apps. Our rating tool assesses the apps' essential features, AI-based advanced functionalities, and software quality characteristics required for food consumption tracking and recommendations, as well as their usefulness to general users. Users' comments from the app stores are collected and evaluated to better understand their expectations and perspectives. Following an evaluation of the assessed applications, design considerations that emphasize automation-based approaches using artificial intelligence are proposed. According to our assessment, most mobile apps in th
Android is the most popular mobile operating system in the world, running on more than 70% of mobile devices. This implies a gigantic and very competitive market for Android apps. Being successful in such a market is far from trivial and requires, besides the tackling of a problem or need felt by a vast audience, the development of high-quality apps. As recently shown in the literature, connectivity issues (e.g., mishandling of zero/unreliable Internet connection) can result in bugs and/or crashes, negatively affecting the app's user experience. While these issues have been studied in the literature, there are no techniques able to automatically detect and report them to developers. We present CONAN, a tool able to statically detect 16 types of connectivity issues affecting Android apps. We assessed the ability of CONAN to precisely identify these issues in a set of 44 open source apps, observing an average precision of 80%. Then, we studied the relevance of these issues for developers by (i) conducting interviews with six practitioners working with commercial Android apps, and (ii) submitting 84 issue reports for 27 open source apps. Our results show that several of the identifie
Mobile apps have become indispensable for daily life, not only for individuals but also for companies/organizations that offer their services digitally. Owing to the mobility of devices, there are no limitations regarding the locations or conditions in which apps are being used. For example, apps can be used where no internet connection is available. Therefore, offline-first is a highly desired quality of mobile apps. Accordingly, inappropriate handling of connectivity issues and incorrect implementation of good practices lead to bugs and crashes that reduce users' confidence in the apps' quality. In this paper, we present the first study on Eventual Connectivity (ECn) issues exhibited by Android apps, by manually inspecting 971 scenarios related to 50 open-source apps. We found 304 instances of ECn issues (6 issues per app, on average) that we organized in a taxonomy of 10 categories. We found that the majority of ECn issues are related to the use of messages not providing correct information to the user about the connectivity status and to the improper use of external libraries/apps to which the check of the connectivity status is delegated. Based on our findings
In 2022, over half of the web traffic was accessed through mobile devices. By reducing the energy consumption of mobile web apps, we can not only extend the battery life of our devices, but also make a significant contribution to energy conservation efforts. For example, if we could save only 5% of the energy used by web apps, we estimate that it would be enough to shut down one of the nuclear reactors in Fukushima. This paper presents a comprehensive overview of energy-saving experiments and related approaches for mobile web apps, relevant for researchers and practitioners. To achieve this objective, we conducted a systematic literature review and identified 44 primary studies for inclusion. Through the mapping and analysis of scientific papers, this work contributes: (1) an overview of the energy-draining aspects of mobile web apps, (2) a comprehensive description of the methodology used for the energy-saving experiments, and (3) a categorization and synthesis of various energy-saving approaches.
Food recognition and nutritional apps are trending technologies that may revolutionise the way people with diabetes manage their diet. Such apps can monitor food intake as a digital diary and even employ artificial intelligence to assess the diet automatically. Although these apps offer a promising solution for managing diabetes, they are rarely used by patients. This chapter aims to provide an in-depth assessment of the current status of apps for food recognition and nutrition, to identify factors that may inhibit or facilitate their use, while it is accompanied by an outline of relevant research and development.
With the proliferation of smartphones, a major growth in the use of apps related to the health category, specifically those concerned with foot health, can be observed. Although new, these apps are being used in practice for scanning feet with the aim of providing accurate information about various properties of the human foot. With the availability of many 'foot scanning and measuring apps' in the app stores, an evaluation system for such apps is needed, as little information regarding the evidence-based quality of these apps is available. This study aims to characterize the assessment of measurement techniques and essential software quality characteristics of mobile foot measuring apps, and to determine their effectiveness for potential use as commercial professional tools for foot care health professionals such as pedorthists, podiatrists, orthotists and so on, to assist in measuring feet for custom shoes, and for individuals to enhance the awareness of foot health and hygiene and the prevention of foot-related problems. An electronic search across Android and iOS app stores was conducted between July 2020 and August 2020 for apps related to foot measurement. Mobile apps with s
Third-party security apps are an integral part of the Android app ecosystem. Many users install them as an extra layer of protection for their devices. There are hundreds of such security apps, both free and paid, in the Google Play Store, and some of them have been downloaded millions of times. By installing security apps, smartphone users place a significant amount of trust in the security companies that developed these apps, because a fully functional mobile security app requires access to many smartphone resources such as the storage, text messages and email, browser history, and information about other installed applications. Often these resources contain highly sensitive personal information. As such, it is essential to understand the mobile security apps ecosystem to assess whether it is indeed beneficial to install them. To this end, in this paper, we present the first empirical study of Android security apps. We analyse 100 Android security apps from multiple aspects such as metadata, static analysis, and dynamic analysis, and present insights into their operations and behaviours. Our results show that 20% of the security apps we studied potentially resell the data they collect
An increasing number of mental health services are offered through mobile systems, a paradigm called mHealth. Although there is an unprecedented growth in the adoption of mHealth systems, partly due to the COVID-19 pandemic, concerns about data privacy risks due to security breaches are also increasing. Whilst some studies have analyzed mHealth apps from different angles, including security, there is relatively little evidence for data privacy issues that may exist in mHealth apps used for mental health services, whose recipients can be particularly vulnerable. This paper reports an empirical study aimed at systematically identifying and understanding data privacy incorporated in mental health apps. We analyzed 27 top-ranked mental health apps from Google Play Store. Our methodology enabled us to perform an in-depth privacy analysis of the apps, covering static and dynamic analysis, data sharing behaviour, server-side tests, privacy impact assessment requests, and privacy policy evaluation. Furthermore, we mapped the findings to the LINDDUN threat taxonomy, describing how threats manifest on the studied apps. The findings reveal important data privacy issues such as unnecessary per
LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we collected 786,036 LLM apps from six major app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Our research integrates static and dynamic analysis, the development of a large-scale toxic word dictionary (i.e., ToxicDict) comprising over 31,783 entries, and automated monitoring tools to identify and mitigate threats. We uncovered that 15,146 apps had misleading descriptions, 1,366 collected sensitive personal information against their privacy policies, and 15,996 generated harmful content such as hate speech, self-harm, extremism, etc. Additionally, we evaluated the potential for LLM apps to facilitate malicious activities, finding that 616 apps could be used for malware generation, phishing, etc. Our findings highlight the urgent need for robust regulatory frameworks and enhanc