Empirical research in reverse engineering and software protection is crucial for evaluating the efficacy of methods designed to protect software against unauthorized access and tampering. However, conducting such studies with professional reverse engineers presents significant challenges, including limited access to professionals and the cost of their involvement. This paper explores the use of students as participants in empirical reverse engineering experiments, examining their suitability and the necessary training; the design of appropriate challenges; strategies for ensuring the rigor and validity of the research and its results; ways to maintain students' privacy, motivation, and voluntary participation; and data collection methods. We present a systematic literature review of existing reverse engineering experiments and user studies, a discussion of related work from the broader domain of software engineering that applies to reverse engineering experiments, an extensive discussion of our own experience running experiments in the context of a master-level software hacking and protection course, and recommendations based on this experience. Our findings aim to guide future empirical studies
Generative AI models such as Large Language Models (LLMs) have rapidly emerged and demonstrated their utility across various activities, including Requirements Engineering (RE). Ensuring the quality and accuracy of LLM-generated output is critical, with prompt engineering serving as a key technique to guide model responses. However, existing literature provides limited guidance on how prompt engineering can be leveraged specifically for RE activities. The objective of this study is to explore the applicability of existing prompt engineering guidelines for the effective usage of LLMs within RE. To achieve this goal, we began by conducting a systematic review of primary literature to compile a non-exhaustive list of prompt engineering guidelines. Then, we conducted interviews with RE experts to present the extracted guidelines and gain insights on the advantages and limitations of their application within RE. Our literature review indicates a shortage of prompt engineering guidelines for domain-specific activities, specifically for RE. Our proposed mapping contributes to addressing this shortage. We conclude our study by identifying an important future line of research within th
Software engineering researchers from countries with smaller economies, particularly non-English-speaking ones, represent valuable minorities within the software engineering community. As researchers from Poland, we represent such a country. We analyzed the ICSE FOSE (Future of Software Engineering) community survey through reflexive thematic analysis to present our viewpoint on key software community issues. We believe that the main problem is the growing research-industry gap, which particularly impacts smaller communities and small local companies. Based on this analysis and our experiences, we present a set of recommendations for improvements that would enhance software engineering research and industrial collaborations in smaller economies.
The design of effective programming languages, libraries, frameworks, tools, and platforms for data engineering strongly depends on their ease and correctness of use. Anyone who ignores that it is humans who use these tools risks building tools that are useless, or worse, harmful. To ensure our data engineering tools are based on solid foundations, we performed a systematic review of common programming mistakes in data engineering. We focus on programming beginners (students) by analyzing both the limited literature specific to data engineering mistakes and general programming mistakes in languages commonly used in data engineering (Python, SQL, Java). Through analysis of 21 publications spanning from 2003 to 2024, we synthesized these complementary sources into a comprehensive classification that captures both general programming challenges and domain-specific data engineering mistakes. This classification provides an empirical foundation for future tool development and educational strategies. We believe our systematic categorization will help researchers, practitioners, and educators better understand and address the challenges faced by novice data engineers.
Large language models for code are advancing fast, yet our ability to evaluate them lags behind. Current benchmarks focus on narrow tasks and single metrics, which hide critical gaps in robustness, interpretability, fairness, efficiency, and real-world usability. They also suffer from inconsistent data engineering practices, limited software engineering context, and widespread contamination issues. To understand these problems and chart a path forward, we combined an in-depth survey of existing benchmarks with insights gathered from a dedicated community workshop. We identified three core barriers to reliable evaluation: the absence of software-engineering-rich datasets, overreliance on ML-centric metrics, and the lack of standardized, reproducible data pipelines. Building on these findings, we introduce BEHELM, a holistic benchmarking infrastructure that unifies software-scenario specification with multi-metric evaluation. BEHELM provides a structured way to assess models across tasks, languages, input and output granularities, and key quality dimensions. Our goal is to reduce the overhead currently required to construct benchmarks while enabling a fair, realistic, and future-proo
Peer review is the main mechanism by which the software engineering community assesses the quality of scientific results. However, the rapid growth of paper submissions in software engineering venues has outpaced the availability of qualified reviewers, creating a growing imbalance that risks constraining and negatively impacting the long-term growth of the Software Engineering (SE) research community. Our vision of the Future of the SE research landscape involves a more scalable, inclusive, and resilient peer review process that incorporates additional mechanisms for: 1) attracting and training newcomers to serve as high-quality reviewers, 2) incentivizing more community members to serve as peer reviewers, and 3) cautiously integrating AI tools to support a high-quality review process.
Software engineering researchers repeatedly argue that the impact of their research on industrial practice, while desired and intended, is rarely achieved. We believe that a possible explanation of this phenomenon is the opposition of "caring about" and "caring for", based on the ethics of care. Indeed, while software engineering is collaborative and hence builds on interpersonal relations, researchers tend to care about "industrial impact" and "practitioners" in abstract terms, but rarely care for specific individuals working in specific contexts facing specific challenges. In this position paper, we advocate for the adoption of ethics of care in software engineering and discuss the implications of this adoption for researchers and conference organizers.
Software Engineering (SE) faces simultaneous pressure from AI automation (reducing code production costs) and hardware-energy constraints (amplifying failure costs). We position that SE must redefine itself around human discernment (intent articulation, architectural control, and verification) rather than code construction. This shift introduces accountability collapse as a central risk and requires fundamental changes to research priorities, educational curricula, and industrial practices. We argue that Software Engineering, as traditionally defined around code construction and process management, is no longer sufficient. Instead, the discipline must be redefined around intent articulation, architectural control, and systematic verification. This redefinition shifts Software Engineering from a production-oriented field to one centered on human judgment under automation, with profound implications for research, practice, and education.
Research in software engineering is essential for improving development practices, leading to reliable and secure software. Leveraging the principles of quantum physics, quantum computing has emerged as a new computational paradigm that offers significant advantages over classical computing. As quantum computing progresses rapidly, its potential applications across various fields are becoming apparent. In software engineering, many tasks involve complex computations where quantum computers can greatly speed up the development process, leading to faster and more efficient solutions. With the growing use of quantum-based applications in different fields, quantum software engineering (QSE) has emerged as a discipline focused on designing, developing, and optimizing quantum software for diverse applications. This paper aims to review the role of quantum computing in software engineering research and the latest developments in QSE. To our knowledge, this is the first comprehensive review on this topic. We begin by introducing quantum computing, exploring its fundamental concepts, and discussing its potential applications in software engineering. We also examine various QSE techniques th
Over twenty years ago, the Software Engineering (SE) research community became involved with Evidence-Based Software Engineering (EBSE). EBSE aims to inform industrial practice with the best evidence from rigorous research, preferably from systematic literature reviews (SLRs). Since then, SE researchers have conducted many SLRs, perfected their SLR procedures, proposed alternative ways of presenting their results (such as Evidence Briefings), and profusely discussed how to conduct research that impacts practice. Nevertheless, there is still a feeling that SLRs' results are not reaching practitioners. Something is missing. In this vision paper, we introduce Evidence to Decision (EtD) frameworks from the health sciences, which propose gathering experts in panels to assess the existing best evidence about the impact of an intervention on all relevant outcomes and make structured recommendations based on it. The insight we can leverage from EtD frameworks is not their structure per se but all the relevant criteria for making recommendations to practitioners from SLRs. Furthermore, we provide a worked example based on an SE SLR. We also discuss the challenges the SE research and pr
The aerospace industry operates at the frontier of technological innovation while maintaining high standards regarding safety and reliability. In this environment, with an enormous potential for reuse and adaptation of existing solutions and methods, Knowledge-Based Engineering (KBE) has been applied for decades. The objective of this study is to identify and examine state-of-the-art knowledge management practices in the field of aerospace engineering. Our contributions include: 1) A SWARM-SLR of over 1,000 articles with qualitative analysis of 164 selected articles, supported by two aerospace engineering domain expert surveys. 2) A knowledge graph of over 700 knowledge-based aerospace engineering processes, software, and data, formalized in the interoperable Web Ontology Language (OWL) and mapped to Wikidata entries where possible. The knowledge graph is represented on the Open Research Knowledge Graph (ORKG) and on an aerospace Wikibase, for reuse and continuation of structuring aerospace engineering knowledge exchange. 3) Our resulting intermediate and final artifacts of the knowledge synthesis, available as a Zenodo dataset. This review sets a precedent for structured, semantic-
By treating data and models as the source code, Foundation Models (FMs) become a new type of software. Mirroring the concept of the software crisis, the increasing complexity of FMs makes an FM crisis a tangible concern in the coming decade, calling for new theories and methodologies from the field of software engineering. In this paper, we outline our vision of introducing Foundation Model (FM) engineering, a strategic response to the anticipated FM crisis grounded in principled engineering methodologies. FM engineering aims to mitigate potential issues in FM development and application through the introduction of declarative, automated, and unified programming interfaces for both data and model management, reducing the complexities involved in working with FMs by providing a more structured and intuitive process for developers. Through the establishment of FM engineering, we aim to provide a robust, automated, and extensible framework that addresses the imminent challenges and uncovers new research opportunities for the software engineering field.
The discussion around AI-Engineering, that is, Software Engineering (SE) for AI-enabled systems, cannot ignore a crucial class of software systems that are increasingly becoming AI-enhanced: those used to enable or support the SE process, such as Computer-Aided SE (CASE) tools and Integrated Development Environments (IDEs). In this paper, we study the energy efficiency of these systems. As AI becomes seamlessly available in these tools and, in many cases, is active by default, we are entering a new era with significant implications for energy consumption patterns throughout the Software Development Lifecycle (SDLC). We focus on advanced Machine Learning (ML) capabilities provided by Large Language Models (LLMs). Our proposed approach combines Retrieval-Augmented Generation (RAG) with Prompt Engineering Techniques (PETs) to enhance both the quality and energy efficiency of LLM-based code generation. We present a comprehensive framework that measures real-time energy consumption and inference time across diverse model architectures ranging from 125M to 7B parameters, including GPT-2, CodeLlama, Qwen 2.5, and DeepSeek Coder. These LLMs, chosen for practical reasons, are sufficient to
Heterogeneity has historically existed in the organization of software engineering (SE) research, i.e., the funded research model and the hands-on model, and it has made software engineering a thriving interdisciplinary field over the last 50 years. However, the funded research model has recently become dominant in SE research, indicating that this heterogeneity is seriously and systematically threatened. In this essay, we first explain why heterogeneity is needed in the organization of SE research, then present the current trend in SE research, as well as its consequences and potential futures. The choice is in our hands, and we urge our community to seriously consider maintaining heterogeneity in the organization of software engineering research.
Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured Red-Green-Refactor process, existing LLM-based approaches typically use tests as auxiliary inputs rather than as enforceable process constraints. We present an AI-native TDD framework that operationalizes classical TDD principles as structured prompt-level and workflow-level governance mechanisms. Extracted principles are formalized in a machine-readable manifesto and distributed across planning, generation, repair, and validation stages within a layered architecture that separates model proposal from deterministic engine authority. The system enforces phase ordering, bounded repair loops, validation gates, and atomic mutation control to improve stability and reproducibility. We describe the architecture and discuss encoding software engineering discipline directly into prompt orchestration, which we believe offers a promising direction for reliable LLM-assisted development.
The ability of Generative AI (GAI) technology to automatically check, synthesize and modify software engineering artifacts promises to revolutionize all aspects of software engineering. Using GAI for software engineering tasks is consequently one of the most rapidly expanding fields of software engineering research, with over a hundred LLM-based code models having been published since 2021. However, the overwhelming majority of existing code models share a major weakness: they are exclusively trained on the syntactic facet of software, significantly lowering their trustworthiness in tasks dependent on software semantics. To address this problem, a new class of "Morescient" GAI is needed that is "aware" of (i.e., trained on) both the semantic and static facets of software. This, in turn, will require a new generation of software observation platforms capable of generating large quantities of execution observations in a structured and readily analyzable way. In this paper, we present a vision and roadmap for how such "Morescient" GAI models can be engineered, evolved and disseminated according to the principles of open science.
Realisation of significant advances in capabilities of sensors, computing, timing, and communication enabled by quantum technologies is dependent on engineering highly complex systems that integrate quantum devices into existing classical infrastructure. A systems engineering approach is considered to address the growing need for quantum-secure telecommunications that overcome the threat to encryption caused by maturing quantum computation. This work explores a range of existing and future quantum communication networks, specifically quantum key distribution network proposals, to model and demonstrate the evolution of quantum key distribution network architectures. Leveraging Orthogonal Variability Modelling and Systems Modelling Language as candidate modelling languages, the study creates traceable artefacts to promote modular architectures that are reusable for future studies. We propose a variability-driven framework for managing fast-evolving network architectures with respect to increasing stakeholder expectations. The result contributes to the systematic development of viable quantum key distribution networks and supports the investigation of similar integration challenges re
The study of behavioral and social dimensions of software engineering (SE) tasks characterizes behavioral software engineering (BSE); however, the increasing significance of human-AI collaboration (HAIC) brings new directions in BSE by presenting new challenges and opportunities. This PhD research focuses on decision-making (DM) for SE tasks and collaboration within human-AI teams, aiming to promote responsible software engineering through a cognitive partnership between humans and AI. The goal of the research is to identify the challenges and nuances in HAIC from a cognitive perspective, and to design and optimize collaboration and partnership within human-AI teams in ways that enhance collective intelligence and promote better, responsible DM in SE through human-centered approaches. The research addresses HAIC and its impact on individual-, team-, and organizational-level aspects of BSE.
This paper explores how generative AI can help automate and improve key steps in systems engineering. It examines AI's ability to analyze system requirements based on INCOSE's "good requirement" criteria, identifying well-formed and poorly written requirements. The AI does not just classify requirements but also explains why some do not meet the standards. By comparing AI assessments with those of experienced engineers, the study evaluates the accuracy and reliability of AI in identifying quality issues. Additionally, it explores AI's ability to classify functional and non-functional requirements and generate test specifications based on these classifications. Through both quantitative and qualitative analysis, the research aims to assess AI's potential to streamline engineering processes and improve learning outcomes. It also highlights the challenges and limitations of AI, ensuring its safe and ethical use in professional and academic settings.
Large Language Models (LLMs) are increasingly integrated into software applications, giving rise to a broad class of prompt-enabled systems, in which prompts serve as the primary 'programming' interface for guiding system behavior. Building on this trend, a new software paradigm, promptware, has emerged, which treats natural language prompts as first-class software artifacts for interacting with LLMs. Unlike traditional software, which relies on formal programming languages and deterministic runtime environments, promptware is based on ambiguous, unstructured, and context-dependent natural language and operates on LLMs as runtime environments, which are probabilistic and non-deterministic. These fundamental differences introduce unique challenges in prompt development. In practice, prompt development remains largely ad hoc and relies heavily on time-consuming trial-and-error, a challenge we term the promptware crisis. To address this, we propose promptware engineering, a new methodology that adapts established Software Engineering (SE) principles to prompt development. Drawing on decades of success in traditional SE, we envision a systematic framework encompassing prompt requiremen