Powered by recent advances in code-generating models, AI assistants like GitHub Copilot promise to change the face of programming forever. But what is this new face of programming? We present the first grounded theory analysis of how programmers interact with Copilot, based on observing 20 participants (with a range of prior experience using the assistant) as they solve diverse programming tasks across four languages. Our main finding is that interactions with programming assistants are bimodal: in acceleration mode, the programmer knows what to do next and uses Copilot to get there faster; in exploration mode, the programmer is unsure how to proceed and uses Copilot to explore their options. Based on our theory, we provide recommendations for improving the usability of future AI programming assistants.
Though many tools are available to help programmers working on change tasks, and several studies have been conducted to understand how programmers comprehend systems, little is known about the specific kinds of questions programmers ask when evolving a code base. To fill this gap we conducted two qualitative studies of programmers performing change tasks to medium- to large-sized programs. One study involved newcomers working on assigned change tasks to a medium-sized code base. The other study involved industrial programmers working on their own change tasks on code with which they had experience. The focus of our analysis has been on what information a programmer needs to know about a code base while performing a change task and also on how they go about discovering that information. Based on this analysis we catalog and categorize 44 different kinds of questions asked by our participants. We also describe important context for how those questions were answered by our participants, including their use of tools.
Traditionally, programmers receive a range of training on programming languages and methodologies, but they rarely receive training on software energy consumption. Yet, the popularity of mobile devices and cloud computing requires increased awareness of software energy consumption. On mobile devices, battery life often limits computation. Under the demands of cloud computing, datacenters struggle to reduce energy consumption through virtualization and datacenter-infrastructure-management systems. Efficient software energy consumption is increasingly becoming an important nonfunctional requirement for programmers. However, are programmers knowledgeable enough about software energy consumption? Do they base their implementation decisions on popular beliefs? Researchers surveyed more than 100 programmers regarding their knowledge of software energy consumption. They found that the programmers had limited knowledge of energy efficiency, lacked knowledge of the best practices to reduce software energy consumption, and were often unsure about how software consumes energy. These results highlight the need for better training and education on energy consumption and efficiency.
Computer programmers break apart large programs into smaller coherent pieces. Each of these pieces (functions, subroutines, modules, or abstract datatypes) is usually a contiguous piece of program text. The experiment reported here shows that programmers also routinely break programs into one kind of coherent piece which is not contiguous. When debugging unfamiliar programs, programmers use program pieces called slices, which are sets of statements related by their flow of data. The statements in a slice are not necessarily textually contiguous, but may be scattered through a program.
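To make the notion of a slice concrete, the following is a minimal sketch in Python. It assumes a toy straight-line program (no branches or loops) in which each statement is modelled as the variable it defines plus the variables it uses. The `PROGRAM` data and the `backward_slice` function are illustrative assumptions, not material from the experiment.

```python
from typing import List, Set, Tuple

# Each statement: (source text, variable it defines, variables it uses).
# This toy program is a hypothetical example.
Statement = Tuple[str, str, Set[str]]

PROGRAM: List[Statement] = [
    ("1: total = 0",           "total", set()),
    ("2: count = 0",           "count", set()),
    ("3: x = read()",          "x",     set()),
    ("4: total = total + x",   "total", {"total", "x"}),
    ("5: count = count + 1",   "count", {"count"}),
    ("6: avg = total / count", "avg",   {"total", "count"}),
]


def backward_slice(program: List[Statement], index: int, variable: str) -> List[str]:
    """Return the statements that can affect `variable` just after
    statement `index` executes, following data flow backwards.

    Assumes straight-line code: every statement runs exactly once, in order.
    """
    relevant = {variable}  # variables whose values we still need to explain
    kept: List[str] = []
    for text, defined, used in reversed(program[: index + 1]):
        if defined in relevant:
            kept.append(text)
            relevant.discard(defined)  # this definition explains the variable
            relevant |= used           # but now its inputs need explaining
    return list(reversed(kept))


# Slice on `total` after statement 4: statement 2 is skipped,
# so the slice is not textually contiguous.
print(backward_slice(PROGRAM, 3, "total"))
```

In this toy case the slice for `total` contains statements 1, 3, and 4 but not statement 2, which illustrates how the statements of a slice can be scattered through the program text.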
In 1995, Boehm predicted that by 2005, there would be "55 million performers" of "end user programming" in the United States. The original context and method which generated this number had two weaknesses, both of which we address. First, it relies on undocumented, judgment-based factors to estimate the number of end user programmers based on the total number of end users; we address this weakness by identifying specific end user sub-populations and then estimating their sizes. Second, Boehm's estimate relies on additional undocumented, judgment-based factors to adjust for rising computer usage rates; we address this weakness by integrating fresh Bureau of Labor Statistics (BLS) data and projections as well as a richer estimation method. With these improvements to Boehm's method, we estimate that in 2012 there will be 90 million end users in American workplaces. Of these, we anticipate that over 55 million will use spreadsheets or databases (and therefore may potentially program), while over 13 million will describe themselves as programmers, compared to BLS projections of fewer than 3 million professional programmers. We have validated our improved method by generating estimates for 2001 and 2003, then verifying that our estimates are consistent with existing estimates from other sources.
This volume contains the papers presented at the second workshop on Empirical Studies of Programmers. They represent a variety of approaches and topics covering the research in this area. All the chapters present research that bears on programmers. Together with the first volume edited by Elliot Soloway and Sitharama Iyengar, these chapters contribute to a growing knowledge base about how programmers go about their task and how they progress from novice to expert levels.
Interpreting compiler errors and exception messages is challenging for novice programmers. Presenting examples of how other programmers have corrected similar errors may help novices understand and correct such errors. This paper introduces HelpMeOut, a social recommender system that aids the debugging of error messages by suggesting solutions that peers have applied in the past. HelpMeOut comprises IDE instrumentation to collect examples of code changes that fix errors; a central database that stores fix reports from many users; and a suggestion interface that, given an error, queries the database for a list of relevant fixes and presents these to the programmer. We report on implementations of this architecture for two programming languages. An evaluation with novice programmers found that the technique can suggest useful fixes for 47% of errors after 39 person-hours of programming in an instrumented environment.
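The three-part architecture described above (collecting fixes, storing them centrally, and querying them for a given error) can be sketched roughly as follows. This is a minimal illustration only: the class `FixDatabase`, the `normalize` helper, and the frequency-based ranking are all hypothetical assumptions, since the abstract does not specify how HelpMeOut matches or ranks fixes.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Fix = Tuple[str, str]  # (code before the fix, code after the fix)


def normalize(error_message: str) -> str:
    """Hypothetical normalization: mask quoted identifiers and numbers so
    that errors differing only in names or line numbers share one key."""
    masked = re.sub(r"'[^']*'", "'<id>'", error_message)
    return re.sub(r"\d+", "<n>", masked)


class FixDatabase:
    """Toy stand-in for a central store of fix reports from many users."""

    def __init__(self) -> None:
        self._fixes: Dict[str, Counter] = {}

    def report_fix(self, error_message: str, before: str, after: str) -> None:
        """Record that changing `before` into `after` resolved the error."""
        key = normalize(error_message)
        self._fixes.setdefault(key, Counter())[(before, after)] += 1

    def suggest(self, error_message: str, limit: int = 3) -> List[Fix]:
        """Return the most frequently reported fixes for a similar error.
        Ranking by frequency is an assumption made for this sketch."""
        counts = self._fixes.get(normalize(error_message), Counter())
        return [fix for fix, _ in counts.most_common(limit)]


db = FixDatabase()
db.report_fix("cannot find symbol 'foo'", "foo(1)", "bar(1)")
db.report_fix("cannot find symbol 'baz'", "int x = baz", "int baz = 0; int x = baz")
db.report_fix("cannot find symbol 'qux'", "foo(1)", "bar(1)")
print(db.suggest("cannot find symbol 'zzz'"))
```

A real system would need a richer notion of similarity between errors and between code changes; the sketch only shows how the collection, storage, and suggestion roles fit together.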
Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.
In the process of learning a computer language, beginning programmers may develop mental models for the language. A mental model refers to the user's conception of the “invisible” information processing that occurs inside the computer between input and output. In this study, 30 undergraduate students learned BASIC through a self-paced, mastery manual and simultaneously had hands-on access to an Apple II computer. After instruction, the students were tested on their mental models for the execution of each of nine BASIC statements. The results show that beginning programmers—although able to perform adequately on mastery tests in program generation—possessed a wide range of misconceptions concerning the statements they had learned. This paper catalogs beginning programmers' conceptions of “what goes on inside the computer” for each of nine BASIC statements.
Designing PCR and sequencing primers are essential activities for molecular biologists around the world. This chapter assumes acquaintance with the principles and practice of PCR, as outlined in, for example, refs. 1–4.
Question and Answer (Q&A) websites, such as Stack Overflow, use social media to facilitate knowledge exchange between programmers and fill archives with millions of entries that contribute to the body of knowledge in software development. Understanding the role of Q&A websites in the documentation landscape will enable us to make recommendations on how individuals and companies can leverage this knowledge effectively. In this paper, we analyze data from Stack Overflow to categorize the kinds of questions that are asked, and to explore which questions are answered well and which ones remain unanswered. Our preliminary findings indicate that Q&A websites are particularly effective at code reviews and conceptual questions. We pose research questions and suggest future work to explore the motivations of programmers that contribute to Q&A websites, and to understand the implications of turning Q&A exchanges into technical mini-blogs through the editing of questions and answers.
We present a process model to explain bugs produced by novices early in a programming course. The model was motivated by interviews with novice programmers solving simple programming problems. Our key idea is that many programming bugs can be explained by novices inappropriately using their knowledge of step-by-step procedural specifications in natural language. We view programming bugs as patches generated in response to an impasse reached by the novice while developing a program. We call such patching strategies bug generators. Several of our bug generators describe how natural language preprogramming knowledge is used by novices to create patches. Other kinds of bug generators are also discussed. We describe a representation both for novice natural language preprogramming knowledge and novice fragmentary programming knowledge. Using these representations and the bug generators, we evaluate the model by analyzing four interviews with novice programmers.
Source code authorship attribution is a significant privacy threat to anonymous code contributors. However, it may also enable attribution of successful attacks from code left behind on an infected system, or aid in resolving copyright, copyleft, and plagiarism issues in the programming fields. In this work, we investigate machine learning methods to de-anonymize source code authors of C/C++ using coding style. Our Code Stylometry Feature Set is a novel representation of coding style found in source code that reflects coding style from properties derived from abstract syntax trees. Our random forest and abstract syntax tree-based approach attributes more authors (1,600 and 250) with significantly higher accuracy (94% and 98%) on a larger data set (Google Code Jam) than has been previously achieved. Furthermore, these novel features are robust, difficult to obfuscate, and can be used in other programming languages, such as Python. We also find that (i) the code resulting from difficult programming tasks is easier to attribute than easier tasks and (ii) skilled programmers (who can complete the more difficult tasks) are easier to attribute than less skilled programmers.
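As a rough illustration of what "properties derived from abstract syntax trees" might look like, the sketch below extracts two simple syntactic features using Python's standard `ast` module. This is an assumption-laden stand-in: the study targets C/C++, the abstract does not enumerate the actual Code Stylometry Feature Set, and the function names `ast_node_frequencies` and `max_depth` are hypothetical.

```python
import ast
from collections import Counter


def ast_node_frequencies(source: str) -> Counter:
    """Count how often each AST node type occurs in the source.
    Node-type frequencies are one plausible kind of syntactic style feature."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def max_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the depth of the deepest node below `node` (root has depth 0)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return depth
    return max(max_depth(child, depth + 1) for child in children)


sample = "for i in range(3):\n    total = i\n"
features = ast_node_frequencies(sample)
print(features["For"], features["Name"], max_depth(ast.parse(sample)))
```

Feature vectors of this kind could then be passed to a classifier such as a random forest, which is the model family the abstract names.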
Of all of the revolutionary technological innovations of the 20th century, none is as widely recognized, as celebrated, or as profoundly influential as the invention of the electronic digital computer. But like all great social and technological developments, the revolution of the twentieth century didn't just happen. It had to be made to happen, and made to happen by people, not impersonal processes. In The Computer Boys Take Over, Nathan Ensmenger describes the emergence of a new breed of technical specialists -- programmers, systems analysts, and data processing managers -- who built their careers around the powerful new technology of electronic computing. It was these largely anonymous specialists who built the systems that transformed the novel technology of electronic computing from a scientific curiosity into the most powerful and ubiquitous technology of the modern era. Known collectively as whiz kids, hackers, and gurus, they were alternately admired for their technical prowess and despised for their eccentric mannerisms and the disruptive potential of the technologies they developed. As the systems that they built and maintained became central to the operations of our modern computerized society, they became the focus of a series of critiques of the social and organizational impact of computerization. To many of their contemporaries, it seemed the computer boys were taking over, not just in the corporate setting, but also in government, politics, and society in general. Ensmenger follows the rise of the computer boys as they struggled to establish a role for themselves within traditional organizational, professional, and academic hierarchies. Was programming a black art, a legitimate science, or an industrial discipline? Were specialists more like scientists, engineers, managers, or clerical workers? What was the appropriate relationship between technical expertise and other, more traditional forms of social, political, and organizational power?
In telling the story of these influential but unrecognized revolutionaries, Ensmenger provides a nuanced social history of the computerization of modern society that highlights the many ways in which even the most complex technologies are nevertheless fundamentally human constructions.
Under normal instructional circumstances, some youngsters learn programming in BASIC or LOGO much better than others. Clinical investigations of novice programmers suggest that this happens in part because different students bring different patterns of learning to the programming context. Many students disengage from the task whenever trouble occurs, neglect to track closely what their programs do by reading back the code as they write it, try to repair buggy programs by haphazardly tinkering with the code, or have difficulty breaking problems down into parts suitable for separate chunks of code. Such problems interfere with students making the best of their own learning capabilities: students often invent programming plans that go beyond what they have been taught directly. Instruction designed to foster better learning practices could help students to acquire a repertoire of programming skills, perhaps with spinoffs having to do with “learning to learn.”
Programming is related to several fields of technology, and many university students are studying the basics of it. Unfortunately, they often face difficulties already in the basic courses. This work studies the difficulties in learning programming in order to support developing learning materials for basic programming courses. The difficulties have to be recognized to be able to aid learning and teaching in an effective way. An international survey of opinions was organized for more than 500 students and teachers. This paper analyses the results of the survey. The survey provides information on the difficulties experienced and perceived when learning and teaching programming. The survey results also provide a basis for recommendations for developing learning materials and approaches.
Debugging is notoriously difficult and extremely time consuming. Researchers have therefore invested a considerable amount of effort in developing automated techniques and tools for supporting various debugging tasks. Although potentially useful, most of these techniques have yet to demonstrate their practical effectiveness. One common limitation of existing approaches, for instance, is their reliance on a set of strong assumptions on how developers behave when debugging (e.g., the fact that examining a faulty statement in isolation is enough for a developer to understand and fix the corresponding bug). In more general terms, most existing techniques just focus on selecting subsets of potentially faulty statements and ranking them according to some criterion. By doing so, they ignore the fact that understanding the root cause of a failure typically involves complex activities, such as navigating program dependencies and rerunning the program with different inputs. The overall goal of this research is to investigate how developers use and benefit from automated debugging tools through a set of human studies. As a first step in this direction, we perform a preliminary study on a set of developers by providing them with an automated debugging tool and two tasks to be performed with and without the tool. Our results provide initial evidence that several assumptions made by automated debugging techniques do not hold in practice. Through an analysis of the results, we also provide insights on potential directions for future work in the area of automated debugging.
A study by an ITiCSE 2001 working group ("the McCracken Group") established that many students do not know how to program at the conclusion of their introductory courses. A popular explanation for this incapacity is that the students lack the ability to problem-solve. That is, they lack the ability to take a problem description, decompose it into sub-problems and implement them, then assemble the pieces into a complete solution. An alternative explanation is that many students have a fragile grasp of both basic programming principles and the ability to systematically carry out routine programming tasks, such as tracing (or "desk checking") through code. This ITiCSE 2004 working group studied the alternative explanation, by testing students from seven countries, in two ways. First, students were tested on their ability to predict the outcome of executing a short piece of code. Second, students were tested on their ability, when given the desired function of a short piece of near-complete code, to select the correct completion of the code from a small set of possibilities. Many students were weak at these tasks, especially the latter task, suggesting that such students have a fragile grasp of skills that are a prerequisite for problem-solving.
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
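The geometric idea, that bias is captured by a direction and can be removed from gender-neutral words, can be sketched as a projection. The example below is a minimal illustration under stated assumptions: the three-dimensional vectors are made up, and estimating the direction by averaging difference vectors of definitional pairs is a simplification chosen here, since the abstract does not say how the direction is computed.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def unit(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def gender_direction(embedding: Dict[str, Vector],
                     pairs: List[Tuple[str, str]]) -> Vector:
    """Estimate a bias direction as the normalized sum of difference
    vectors of definitional pairs (an assumption for this sketch)."""
    dim = len(next(iter(embedding.values())))
    total = [0.0] * dim
    for female, male in pairs:
        diff = [a - b for a, b in zip(embedding[female], embedding[male])]
        total = [t + d for t, d in zip(total, diff)]
    return unit(total)


def neutralize(vector: Vector, direction: Vector) -> Vector:
    """Remove the component of `vector` that lies along `direction`."""
    projection = dot(vector, direction)
    return [a - projection * d for a, d in zip(vector, direction)]


# Made-up toy embedding; real embeddings have hundreds of dimensions.
toy = {
    "she": [1.0, 0.0, 0.2], "he": [-1.0, 0.0, 0.2],
    "queen": [0.9, 0.5, 0.1], "king": [-0.9, 0.5, 0.1],
    "receptionist": [0.4, 0.1, 0.8],
}
g = gender_direction(toy, [("she", "he"), ("queen", "king")])
neutral = neutralize(toy["receptionist"], g)
print(dot(toy["receptionist"], g), dot(neutral, g))
```

In this sketch only the gender-neutral word is projected; a definitional word such as queen is left untouched, so its association with the direction is maintained.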
We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.