Object-oriented (OO) software systems evolve over time to meet new requirements. Starting from the initial release, continuous modification of the code drives this evolution, and software companies often develop variants of the original software depending on customers' needs. The main hypothesis of this paper is that, as software evolves over time, its code continues to grow, change, and become more complex. This paper proposes an automatic approach (Iris) to examine this hypothesis. The originality of the approach lies in exploiting software variants to study the impact of software evolution on software metrics. This paper presents the results of experiments conducted on three releases of drawing shapes software, sixteen releases of rhino software, eight releases of mobile media software and ten releases of ArgoUML software. The extracted software metrics support the Iris hypothesis.
Testing is an indispensable part of software development. However, a career in software testing is reported to be unpopular among students in computer science and related areas. This can potentially create a shortage of testers in the software industry in the future. The question is whether the perception that undergraduate students have of software testing is accurate and whether it differs from the experience reported by those who work in testing activities in the software development industry. This investigation demonstrates that a career in software testing is more exciting and rewarding, as reported by professionals working in the field, than students may believe. Therefore, in order to guarantee a workforce focused on software quality, academia and the software industry need to work together to better inform students about software testing and its essential role in software development.
The ability of Generative AI (GAI) technology to automatically check, synthesize and modify software engineering artifacts promises to revolutionize all aspects of software engineering. Using GAI for software engineering tasks is consequently one of the most rapidly expanding fields of software engineering research, with over a hundred LLM-based code models having been published since 2021. However, the overwhelming majority of existing code models share a major weakness: they are exclusively trained on the syntactic facet of software, significantly lowering their trustworthiness in tasks dependent on software semantics. To address this problem, a new class of "Morescient" GAI is needed that is "aware" of (i.e., trained on) both the semantic and static facets of software. This, in turn, will require a new generation of software observation platforms capable of generating large quantities of execution observations in a structured and readily analyzable way. In this paper, we present a vision and roadmap for how such "Morescient" GAI models can be engineered, evolved and disseminated according to the principles of open science.
Background: Research software is software developed by and/or used by researchers, across a wide variety of domains, to perform their research. Because of the complexity of research software, developers cannot conduct exhaustive testing. As a result, researchers have lower confidence in the correctness of the output of the software. Peer code review, a standard software engineering practice, has helped address this problem in other types of software. Aims: Peer code review is less prevalent in research software than it is in other types of software. In addition, the literature does not contain any studies about the use of peer code review in research software. Therefore, by analyzing developers' perceptions, this work aims to understand the current practice of peer code review in the development of research software, identify the challenges and barriers associated with it, and present approaches to improve it. Method: We conducted interviews and a community survey of research software developers to collect information about their current peer code review practices, difficulties they face, and how th
Context. Software startups face significant challenges in building minimum viable products (MVPs), particularly in the early stages, when resources are limited and expertise in user experience is scarce. Objective. Introduce StartFlow, a structured method that helps non-specialized professionals create MVP prototypes using the wireflow technique, a combination of wireframes and user flows. StartFlow consists of three steps: (i) organizing features; (ii) building wireflows; and (iii) verifying and refining them based on usability heuristics. Method. To assess StartFlow, we first conducted a focus group with researchers in Software Engineering, Human-Computer Interaction, and Software Startups. Afterward, we conducted a proof-of-concept study, which consisted of an experiment and a heuristic evaluation with experts. Results. The qualitative analysis of the focus group revealed that participants found the method straightforward, flexible, and helpful in structuring user flows and identifying visual components. However, they also pointed out the need to improve its presentation, clarify its iterative nature, and strengthen its connection to broader UX principles. The results of the
Agile software development relies on self-organized teams, underlining the importance of individual responsibility. How developers take responsibility and build ownership is influenced by external factors such as architecture and development methods. This paper examines the existing literature on ownership in software engineering and in psychology, and argues that a more comprehensive view of ownership in software engineering has great potential to improve the work of software teams. Initial positions on the issue are offered for discussion and to lay foundations for further research.
A flaky test yields inconsistent results upon repetition, posing a significant challenge to software developers. The presence and characteristics of flaky tests have been studied extensively in classical software but not in quantum software. In this paper, we outline challenges and potential solutions for the automated detection of flaky tests in bug reports of quantum software. We aim to raise awareness of flakiness in quantum software and encourage the software engineering community to work collaboratively to solve this emerging challenge.
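The definition in the first sentence (inconsistent results upon repetition) can be illustrated with a minimal rerun-based check. This is a generic sketch, not the bug-report-based detection the abstract discusses; the function names, the `runs` parameter, and the alternating stand-in test are all hypothetical and chosen only for illustration.

```python
import itertools
from typing import Callable


def classify_by_rerun(test: Callable[[], bool], runs: int = 20) -> str:
    """Run `test` repeatedly and label it from the set of observed outcomes."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outcomes = {bool(test()) for _ in range(runs)}
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    # Both passing and failing runs were observed for the same code.
    return "flaky"


def stable_test() -> bool:
    """A deterministic test: always passes."""
    return 1 + 1 == 2


def make_alternating_test() -> Callable[[], bool]:
    """Hypothetical stand-in for a nondeterministic test: pass, fail, pass, ..."""
    outcomes = itertools.cycle([True, False])
    return lambda: next(outcomes)


if __name__ == "__main__":
    print(classify_by_rerun(stable_test))               # pass
    print(classify_by_rerun(make_alternating_test()))   # flaky
```

A rerun check of this kind can only confirm flakiness when both outcomes happen to be observed; a test that fails rarely may still be labeled "pass" for a small `runs`.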
A paradigm shift is underway in Software Engineering, with AI systems such as LLMs playing an increasingly important role in boosting software development productivity. This trend is anticipated to persist. In the coming years, we expect a growing symbiotic partnership between human software developers and AI. The Software Engineering research community cannot afford to overlook this trend; we must address the key research challenges posed by the integration of AI into the software development process. In this paper, we present our vision of the future of software development in an AI-driven world and explore the key challenges that our research community should address to realize this vision.
More than twenty years ago, the Software Engineering (SE) research community became involved with Evidence-Based Software Engineering (EBSE). EBSE aims to inform industrial practice with the best evidence from rigorous research, preferably from systematic literature reviews (SLRs). Since then, SE researchers have conducted many SLRs, perfected their SLR procedures, proposed alternative ways of presenting their results (such as Evidence Briefings), and discussed at length how to conduct research that impacts practice. Nevertheless, there is still a feeling that SLRs' results are not reaching practitioners. Something is missing. In this vision paper, we introduce Evidence to Decision (EtD) frameworks from the health sciences, which propose gathering experts in panels to assess the best existing evidence about the impact of an intervention on all relevant outcomes and to make structured recommendations based on it. The insight we can leverage from EtD frameworks is not their structure per se but all the relevant criteria for making recommendations to practitioners from SLRs. Furthermore, we provide a worked example based on an SE SLR. We also discuss the challenges the SE research and pr
With software maintenance accounting for 50% of the cost of developing software, enhancing code quality and reliability has become more critical than ever. In response to this challenge, this doctoral research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) to perform software maintenance tasks. The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation. One distinct challenge is the last-mile problem: errors at the final stage of producing functionally and contextually relevant code. Furthermore, this project aims to overcome the inherent limitations of current LLMs on source code through a collaborative framework where agents can correct and learn from each other's errors. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned to the task of automated software improvement. Our main goal is to achieve a leap forward in the field of automatic software improvement by developing new tools and frameworks that can enhance the efficiency and reliability of
Large language models (LLMs) have rapidly gained popularity and are being embedded into professional applications due to their capabilities in generating human-like content. However, unquestioned reliance on their outputs and recommendations can be problematic as LLMs can reinforce societal biases and stereotypes. This study investigates how LLMs, specifically OpenAI's GPT-4 and Microsoft Copilot, can reinforce gender and racial stereotypes within the software engineering (SE) profession through both textual and graphical outputs. We used each LLM to generate 300 profiles, consisting of 100 gender-based and 50 gender-neutral profiles, for a recruitment scenario in SE roles. Recommendations were generated for each profile and evaluated against the job requirements for four distinct SE positions. Each LLM was asked to select the top 5 candidates and subsequently the best candidate for each role. Each LLM was also asked to generate images for the top 5 candidates, providing a dataset for analysing potential biases in both text-based selections and visual representations. Our analysis reveals that both models preferred male and Caucasian profiles, particularly for senior roles, and fav
Context: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). Objective: Our objective is to evaluate how SRs report research artifacts and to provide a comprehensive list of these artifacts. Method: We examined 537 secondary studies published between 2013 and 2023 to analyze the availability and reporting of research artifacts. Results: Our findings indicate that only 31.5% of the reviewed studies include research artifacts. Encouragingly, the situation is gradually improving, as our regression analysis shows a significant increase in the availability of research artifacts over time. However, in 2023, just 62.0% of secondary studies provide a research artifact, while an even lower percentage, 30.4%, use a permanent repository with a digital object identifier (DOI) for storage. Conclusion: To enhance transparency and reproducibility in SE research, we advocate for the mandatory publication of research artifacts in secondary studies.
Software quality estimation is a challenging and time-consuming activity, and models are crucial for managing its complexity in modern software applications. In this context, software refactoring is a crucial activity within development life-cycles where requirements and functionalities rapidly evolve. One main challenge is that improving distinct quality attributes may require contrasting refactoring actions, as in the trade-off between performance and reliability (or other non-functional attributes). In such cases, multi-objective optimization can provide the designer with a wider view of these trade-offs and, consequently, can help identify suitable refactoring actions that take into account independent or even competing objectives. In this paper, we present an approach that uses the NSGA-II genetic algorithm to search for optimal Pareto frontiers for software refactoring while considering many objectives. We consider performance and reliability variations of a model alternative with respect to an initial model, the number of performance antipatterns detected on the model alternative, and the architectural distance, which quantifies the eff
This chapter defines and presents different kinds of software ecosystems. The focus is on the development, tooling and analytics aspects of software ecosystems, i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining. The technical and social dependencies between these developers and software components form a socio-technical dependency network, and the dynamics of this network change over time. We classify and provide several examples of such ecosystems. The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems, as well as the techniques and research methods that can be used to analyse different aspects of these ecosystems.
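The socio-technical dependency network described above can be pictured with a small data-structure sketch. It assumes two plain mappings, `maintains` (developer to components) and `depends_on` (component to the components it depends on); these names, the sample data, and both helper functions are hypothetical and only illustrate how technical dependencies induce links between developers.

```python
from typing import Dict, Set

# Hypothetical sample ecosystem: who maintains what, and what depends on what.
maintains: Dict[str, Set[str]] = {
    "alice": {"core"},
    "bob": {"core", "plugin"},
    "carol": {"app"},
}
depends_on: Dict[str, Set[str]] = {
    "core": set(),
    "plugin": {"core"},
    "app": {"plugin"},
}


def transitive_dependencies(component: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """All components reachable from `component` by following dependency edges."""
    seen: Set[str] = set()
    stack = list(depends_on.get(component, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # the `seen` set also guards against cycles
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def upstream_developers(
    developer: str,
    maintains: Dict[str, Set[str]],
    depends_on: Dict[str, Set[str]],
) -> Set[str]:
    """Other developers who maintain a component this developer's components rely on."""
    needed: Set[str] = set()
    for component in maintains.get(developer, ()):
        needed |= transitive_dependencies(component, depends_on)
    return {
        other
        for other, components in maintains.items()
        if other != developer and components & needed
    }


if __name__ == "__main__":
    print(sorted(upstream_developers("carol", maintains, depends_on)))  # ['alice', 'bob']
```

Here carol never works on `core` directly, yet she depends on alice and bob through the chain `app` to `plugin` to `core`, which is the kind of indirect social link such a network makes visible.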
We have conducted a qualitative psychology study to explore the experience of feeling overwhelmed in the realm of software development. Through the candid confessions of two participants who have recently faced overwhelming challenges, we have identified seven distinct categories: communication-induced, disturbance-related, organizational, variety, technical, temporal, and positive overwhelm. While most types of overwhelm tend to deteriorate productivity and increase stress levels, developers sometimes perceive overwhelm as a catalyst for heightened focus, self-motivation, and productivity. Stress was often found to be a common companion of overwhelm. Our findings align with previous studies conducted in diverse disciplines. However, we believe that software developers possess unique traits that may enable them to navigate through the storm of overwhelm more effectively.
The estimation and improvement of quality attributes in software architectures is a challenging and time-consuming activity. For modern software applications, a model-based representation is crucial for managing the complexity of this activity. One main challenge is that improving distinct quality attributes may require contrasting refactoring actions on the architecture, for instance when looking for a trade-off between performance and reliability (or other non-functional quality attributes). In such cases, multi-objective optimization can provide the designer with a more complete view of these trade-offs and, consequently, can help identify suitable refactoring actions that take into account independent or even competing objectives. In this paper, we present open challenges and research directions to fill current gaps in the context of multi-objective software architecture optimization.
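The trade-off view that multi-objective optimization offers rests on Pareto dominance, which the following minimal sketch illustrates. It is a brute-force non-dominated filter, not a full optimization algorithm, and it assumes every objective is minimized; the candidate names and their (response time, failure probability) values are hypothetical.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical architecture alternatives: (response time in ms, failure probability).
alternatives: Dict[str, Objectives] = {
    "A": (120.0, 0.02),
    "B": (90.0, 0.05),
    "C": (130.0, 0.03),
    "D": (90.0, 0.04),
}

if __name__ == "__main__":
    print(pareto_front(alternatives))  # ['A', 'D']
```

Alternative D is faster and A is more reliable, so neither dominates the other and both stay on the front, while B and C are each beaten on every count by one of them. The designer then chooses among the remaining alternatives according to which objective matters more.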
The happy-productive worker thesis states that happy workers are more productive. Recent research in software engineering supports the thesis, and the ideal of flourishing happiness among software developers is often expressed among industry practitioners. However, the literature suggests that a cost-effective way to foster happiness and productivity among workers could be to limit unhappiness. Psychological disorders such as job burnout and anxiety could also be reduced by limiting the negative experiences of software developers. Simultaneously, a baseline assessment of (un)happiness and knowledge about how developers experience it are missing. In this paper, we broaden the understanding of unhappiness among software developers in terms of (1) the software developer population distribution of (un)happiness, and (2) the causes of unhappiness while developing software. We conducted a large-scale quantitative and qualitative survey, incorporating a psychometrically validated instrument for measuring (un)happiness, with 2220 developers, yielding a rich and balanced sample of 1318 complete responses. Our results indicate that software developers are a slightly happy population, but the
Civic grassroots groups have proven their ability to create useful and scalable software that addresses pressing social needs. Although software engineering plays a fundamental role in the process of creating civic technology, academic literature that analyses the software development processes of civic tech grassroots groups is scarce. This paper aims to advance the understanding of how such groups tackle the different activities in their software development processes. In this study, we followed the formation of two projects in a civic tech group (Code for Ireland), seeking to understand how their development processes evolved over time and how the group carried out its work in creating new technology. Our preliminary findings show that such groups are capable of setting up systematic software engineering processes that address software specification, development, validation, and evolution. While the group was able to deliver software according to self-specified quality standards, it faced challenges in requirements specification, stakeholder engagement, and reorienting from development to product delivery. Software engineering methods and tools can effectively support the future of ci
Machine learning (ML) components are being added to more and more critical and impactful software systems, but the software development process of real-world production systems from prototyped ML models remains challenging with additional complexity and interdisciplinary collaboration challenges. This poses difficulties in using traditional software lifecycle models such as waterfall, spiral, or agile models when building ML-enabled systems. In this research, we apply a Systems Engineering lens to investigate the use of V-Model in addressing the interdisciplinary collaboration challenges when building ML-enabled systems. By interviewing practitioners from software companies, we established a set of 8 propositions for using V-Model to manage interdisciplinary collaborations when building products with ML components. Based on the propositions, we found that despite requiring additional efforts, the characteristics of V-Model align effectively with several collaboration challenges encountered by practitioners when building ML-enabled systems. We recommend future research to investigate new process models, frameworks and tools that leverage the characteristics of V-Model such as the sy
Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an approach that addresses these challenges by framing patch description generation as a machine translation task. In PATCHEXPLAINER, we leverage explicit representations of critical elements, historical context, and syntactic conventions. Moreover, the translation model in PATCHEXPLAINER is designed with an awareness of description similarity. Particularly, the model is explicitly trained to recognize and incorporate similarities present in patch descriptions clustered into groups, improving its ability to generate accurate and consistent descriptions across similar patches. The dual objectives maximize similarity and accurately predict affiliating groups. Our experimental results on a large dataset of real-world software patches show that PATCHEXPLAINER consistently outperforms existing metho