Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
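The single-step attack with a small random step described above (often referred to as R+FGSM) can be summarized in a few lines. The following is a minimal numpy sketch; the logistic-regression toy model, step sizes, and clipping are illustrative assumptions standing in for a trained network and the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable classifier: logistic regression on 10 features.
# (Illustrative stand-in for a neural network; any model exposing the
# gradient of its loss w.r.t. the input would do.)
w, b = rng.normal(size=10), 0.1

def loss_grad(x, y):
    """Gradient of the logistic loss w.r.t. the input x (label y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w

def random_step_fgsm(x, y, eps=0.3, alpha=0.15):
    """Single-step attack preceded by a small random step (R+FGSM-style)."""
    x_prime = x + alpha * np.sign(rng.normal(size=x.shape))    # escape the non-smooth vicinity
    x_adv = x_prime + (eps - alpha) * np.sign(loss_grad(x_prime, y))
    return np.clip(x_adv, x - eps, x + eps)                    # stay inside the eps-ball around x

x0, y0 = rng.normal(size=10), 1
print(random_step_fgsm(x0, y0))
```

With alpha set to zero this reduces to the standard fast gradient sign step.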
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
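The robust-optimization view described above is usually written as a saddle-point problem, with the inner maximization approximated by projected gradient steps (a first-order adversary). The notation below follows standard conventions and is a sketch rather than text from the abstract.

```latex
% Adversarial training as robust optimization (saddle point over an l_inf ball):
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\, \max_{\|\delta\|_{\infty}\le\epsilon} L(\theta,\, x+\delta,\, y) \,\Big]

% Inner maximization approximated by projected gradient ascent; Pi projects onto the eps-ball:
\delta^{t+1} \;=\; \Pi_{\|\delta\|_{\infty}\le\epsilon}
  \Big( \delta^{t} + \alpha\, \mathrm{sign}\!\big( \nabla_{\delta} L(\theta,\, x+\delta^{t},\, y) \big) \Big)
```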
A power grid is a complex system connecting electric power generators to consumers through power transmission and distribution networks across a large geographical area. System monitoring is necessary to ensure the reliable operation of power grids, and state estimation is used in system monitoring to best estimate the power grid state through analysis of meter measurements and power system models. Various techniques have been developed to detect and identify bad measurements, including interacting bad measurements introduced by arbitrary, nonrandom causes. At first glance, it seems that these techniques can also defeat malicious measurements injected by attackers. In this article, we expose an unknown vulnerability of existing bad measurement detection algorithms by presenting and analyzing a new class of attacks, called false data injection attacks, against state estimation in electric power grids. Under the assumption that the attacker can access the current power system configuration information and manipulate the measurements of meters at physically protected locations such as substations, such attacks can introduce arbitrary errors into certain state variables without being detected by existing algorithms. Moreover, we look at two scenarios, where the attacker is either constrained to specific meters or limited in the resources required to compromise meters. We show that the attacker can systematically and efficiently construct attack vectors in both scenarios to change the results of state estimation in arbitrary ways. We also extend these attacks to generalized false data injection attacks, which can further increase the impact by exploiting measurement errors typically tolerated in state estimation. We demonstrate the success of these attacks through simulation using IEEE test systems, and also discuss the practicality of these attacks and the real-world constraints that limit their effectiveness.
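Under the standard linearized (DC) measurement model, a short calculation shows why an attack vector lying in the column space of the measurement matrix evades residual-based bad data detection. The notation below is the conventional state-estimation formulation, not text from the article itself.

```latex
% DC state estimation: measurements z, state x, measurement matrix H, weights W.
\hat{x} = (H^{\top} W H)^{-1} H^{\top} W z,
\qquad r = \| z - H\hat{x} \| \;\;\text{(residual compared against a detection threshold)}

% False data injection: pick any c, inject a = Hc, so the observed measurements become z_a = z + Hc.
\hat{x}_a = (H^{\top} W H)^{-1} H^{\top} W (z + Hc) = \hat{x} + c,
\qquad
z_a - H\hat{x}_a = z + Hc - H(\hat{x} + c) = z - H\hat{x}
```

The residual, and with it any residual-based bad data test, is unchanged, while the state estimate is shifted by an arbitrary vector c.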
BACKGROUND: The scope of the terrorist attacks of September 11, 2001, was unprecedented in the United States. We assessed the prevalence and correlates of acute post-traumatic stress disorder (PTSD) and depression among residents of Manhattan five to eight weeks after the attacks. METHODS: We used random-digit dialing to contact a representative sample of adults living south of 110th Street in Manhattan. Participants were asked about demographic characteristics, exposure to the events of September 11, and psychological symptoms after the attacks. RESULTS: Among 1008 adults interviewed, 7.5 percent reported symptoms consistent with a diagnosis of current PTSD related to the attacks, and 9.7 percent reported symptoms consistent with current depression (with "current" defined as occurring within the previous 30 days). Among respondents who lived south of Canal Street (i.e., near the World Trade Center), the prevalence of PTSD was 20.0 percent. Predictors of PTSD in a multivariate model were Hispanic ethnicity, two or more prior stressors, a panic attack during or shortly after the events, residence south of Canal Street, and loss of possessions due to the events. Predictors of depression were Hispanic ethnicity, two or more prior stressors, a panic attack, a low level of social support, the death of a friend or relative during the attacks, and loss of a job due to the attacks. CONCLUSIONS: There was a substantial burden of acute PTSD and depression in Manhattan after the September 11 attacks. Experiences involving exposure to the attacks were predictors of current PTSD, and losses as a result of the events were predictors of current depression. In the aftermath of terrorist attacks, there may be substantial psychological morbidity in the population.
Wireless networks are built upon a shared medium that makes it easy for adversaries to launch jamming-style attacks. These attacks can be easily accomplished by an adversary emitting radio frequency signals that do not follow an underlying MAC protocol. Jamming attacks can severely interfere with the normal operation of wireless networks and, consequently, mechanisms are needed that can cope with jamming attacks. In this paper, we examine radio interference attacks from both sides of the issue: first, we study the problem of conducting radio interference attacks on wireless networks, and second we examine the critical issue of diagnosing the presence of jamming attacks. Specifically, we propose four different jamming attack models that can be used by an adversary to disable the operation of a wireless network, and evaluate their effectiveness in terms of how each method affects the ability of a wireless node to send and receive packets. We then discuss different measurements that serve as the basis for detecting a jamming attack, and explore scenarios where each measurement by itself is not enough to reliably classify the presence of a jamming attack. In particular, we observe that signal strength and carrier sensing time are unable to conclusively detect the presence of a jammer. Further, we observe that although by using packet delivery ratio we may differentiate between congested and jammed scenarios, we are nonetheless unable to conclude whether poor link utility is due to jamming or the mobility of nodes. The fact that no single measurement is sufficient for reliably classifying the presence of a jammer is an important observation, and necessitates the development of enhanced detection schemes that can remove ambiguity when detecting a jammer. To address this need, we propose two enhanced detection protocols that employ consistency checking. The first scheme employs signal strength measurements as a reactive consistency check for poor packet delivery ratios, while the second scheme employs location information to serve as the consistency check. Throughout our discussions, we examine the feasibility and effectiveness of jamming attacks and detection schemes using the MICA2 Mote platform.
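The consistency-check idea can be illustrated with a toy decision rule: a poor packet delivery ratio alone is ambiguous, but combined with a signal-strength check it separates jamming from ordinary poor links. The thresholds and function below are illustrative assumptions, not values from the paper or the MICA2 experiments.

```python
def diagnose_link(pdr, rssi_dbm, pdr_floor=0.2, rssi_strong_dbm=-70):
    """Toy consistency check combining packet delivery ratio (PDR) and signal strength.

    A low PDR by itself cannot distinguish jamming from congestion, mobility,
    or a weak link; a strong received signal alongside a very poor PDR is the
    inconsistency that suggests a jammer.  Thresholds here are illustrative.
    """
    if pdr >= pdr_floor:
        return "link ok / congested"           # delivery is acceptable or merely degraded
    if rssi_dbm >= rssi_strong_dbm:
        return "jamming suspected"             # strong signal, yet almost nothing gets through
    return "weak link or neighbor moved away"  # low PDR explained by low signal strength

print(diagnose_link(pdr=0.05, rssi_dbm=-60))   # -> jamming suspected
print(diagnose_link(pdr=0.05, rssi_dbm=-95))   # -> weak link or neighbor moved away
```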
With the rapid spread of machine learning techniques, sharing and adopting public machine learning models have become very popular, which gives attackers many new opportunities. In this paper, we propose a trojaning attack on neural networks. Because the models are not intuitive for humans to understand, the attack is stealthy. Deploying trojaned models can have severe consequences, including endangering human lives (in applications such as autonomous driving). We first invert the neural network to generate a general trojan trigger, and then retrain the model with reverse-engineered training data to inject malicious behaviors into the model. The malicious behaviors are activated only by inputs stamped with the trojan trigger. Our attack does not need to tamper with the original training process, which usually takes weeks to months; instead, it takes minutes to hours to apply. It also does not require the datasets used to train the model, which in practice are usually not shared due to privacy or copyright concerns. We use five different applications to demonstrate the power of our attack, and perform a detailed analysis of the factors that affect it. The results show that our attack is highly effective and efficient: the trojaned behaviors can be triggered with nearly 100% probability without affecting the model's test accuracy on normal inputs, and even with better accuracy on public datasets, and attacking a complex neural network model takes only a small amount of time. Finally, we also discuss possible defenses against such attacks.
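As a rough illustration of the trigger-generation step only, the sketch below runs gradient ascent on one internal neuron's activation of a tiny random network, restricted to a small masked patch of the input. The network, neuron choice, mask, and step size are all assumptions for illustration, and the retraining step that actually implants the malicious behavior is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny stand-in "model": one hidden layer over a flattened 8x8 input.  In the
# real attack this would be the published network being trojaned.
W1 = rng.normal(scale=0.3, size=(16, 64))
b1 = np.zeros(16)

# Only a 3x3 patch in the bottom-right corner may change: the trigger region.
mask = np.zeros((8, 8)); mask[5:, 5:] = 1.0; mask = mask.ravel()

target_neuron = 7       # internal neuron whose activation the trigger should drive up
x = np.zeros(64)        # start from a blank trigger
lr = 0.5

for _ in range(100):
    # Gradient ascent on the chosen neuron's pre-activation (its gradient w.r.t.
    # the input is simply W1[target_neuron]), restricted to the masked patch.
    x = np.clip(x + lr * W1[target_neuron] * mask, 0.0, 1.0)

activation = max(float(W1[target_neuron] @ x + b1[target_neuron]), 0.0)
print("trigger-patch activation of neuron", target_neuron, "=", round(activation, 3))
```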
As mobile ad hoc network applications are deployed, security emerges as a central requirement. In this paper, we introduce the wormhole attack, a severe attack in ad hoc networks that is particularly challenging to defend against. The wormhole attack is possible even if the attacker has not compromised any hosts, and even if all communication provides authenticity and confidentiality. In the wormhole attack, an attacker records packets (or bits) at one location in the network, tunnels them (possibly selectively) to another location, and retransmits them there into the network. The wormhole attack can form a serious threat in wireless networks, especially against many ad hoc network routing protocols and location-based wireless security systems. For example, most existing ad hoc network routing protocols, without some mechanism to defend against the wormhole attack, would be unable to find routes longer than one or two hops, severely disrupting communication. We present a new, general mechanism, called packet leashes, for detecting and thus defending against wormhole attacks, and we present a specific protocol, called TIK, that implements leashes.
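A temporal leash, the simpler of the two kinds of leash, amounts to a single inequality: with loosely synchronized clocks, the receiver can bound how far a packet may have travelled and reject packets whose implied distance exceeds the radio range. The tolerance and example values below are illustrative assumptions rather than TIK's actual parameters.

```python
C = 299_792_458.0  # propagation speed (m/s)

def within_temporal_leash(t_send, t_receive, max_range_m, clock_error_s=1e-7):
    """Toy temporal-leash check for wormhole detection.

    The sender timestamps the packet when it leaves; the receiver bounds the
    distance the packet may have travelled by C * (elapsed time + clock error)
    and rejects it if that bound exceeds the allowed radio range.
    """
    upper_bound_m = C * ((t_receive - t_send) + clock_error_s)
    return upper_bound_m <= max_range_m

# A neighbor roughly 150 m away passes; a packet tunneled roughly 900 m through a wormhole fails.
print(within_temporal_leash(t_send=0.0, t_receive=0.5e-6, max_range_m=300.0))  # True
print(within_temporal_leash(t_send=0.0, t_receive=3.0e-6, max_range_m=300.0))  # False
```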
Deep neural networks are vulnerable to adversarial examples, which raises security concerns about these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.
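The momentum iterative method accumulates a normalized gradient across iterations and steps along its sign. The sketch below uses a toy differentiable loss in place of a deep network, and the decay factor, step size, and perturbation budget are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": logistic loss of a linear classifier, standing in
# for a neural network's cross-entropy loss and its input gradient.
w, b = rng.normal(size=20), 0.0

def loss_grad(x, y):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w

def momentum_iterative_attack(x, y, eps=0.3, n_iter=10, mu=1.0):
    """Momentum iterative FGSM-style attack within an L-infinity ball of radius eps."""
    alpha = eps / n_iter           # per-iteration step size
    g = np.zeros_like(x)           # accumulated (momentum) gradient
    x_adv = x.copy()
    for _ in range(n_iter):
        grad = loss_grad(x_adv, y)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)   # L1-normalized gradient plus momentum
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv

x0, y0 = rng.normal(size=20), 1
print(momentum_iterative_attack(x0, y0)[:5])
```

Setting mu to zero recovers a plain iterative gradient-sign baseline without momentum.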
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
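The substitute-training loop can be sketched with a handful of queries: label a small seed set by querying the black-box oracle, fit a local model, and grow the set by perturbing points along the substitute's own gradient (Jacobian-based augmentation). The oracle, seed data, epoch count, and step size below are toy assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def oracle(X):
    """Black-box target model: returns labels only (the rule inside is unknown to the attacker)."""
    return (X[:, 0] + 0.5 * X[:, 1] - 0.2 > 0).astype(int)

# 1. Small initial collection of inputs, labeled by querying the oracle.
X = rng.normal(size=(20, 2))
y = oracle(X)

substitute = LogisticRegression()
lam = 0.1                                      # augmentation step size (assumed)

for epoch in range(4):                         # substitute-training epochs
    substitute.fit(X, y)
    # 2. Jacobian-based dataset augmentation: push each point in the direction
    #    that most changes the substitute's output for its current label.
    w = substitute.coef_[0]
    direction = np.sign(w) * np.where(y[:, None] == 1, 1.0, -1.0)
    X_new = X + lam * direction
    X = np.vstack([X, X_new])
    y = np.concatenate([y, oracle(X_new)])     # label the new points with more oracle queries

# The substitute now mimics the oracle closely and can be used to craft
# transferable adversarial examples with any white-box attack.
X_test = rng.normal(size=(500, 2))
print("agreement with oracle:", (substitute.predict(X_test) == oracle(X_test)).mean())
```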
Deep learning is at the heart of the current rise of artificial intelligence. In the field of computer vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Although deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has recently led to a large influx of contributions in this direction. This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision. We review the works that design adversarial attacks, analyze the existence of such attacks, and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in real-world scenarios. Finally, drawing on the reviewed literature, we provide a broader outlook of this research direction.
In security-sensitive applications, the success of machine learning depends on a thorough vetting of its resistance to adversarial data. In one pertinent, well-motivated attack scenario, an adversary may attempt to evade a deployed system at test time by carefully manipulating attack samples. In this work, we present a simple but effective gradient-based approach that can be exploited to systematically assess the security of several widely used classification algorithms against evasion attacks. Following a recently proposed framework for security evaluation, we simulate attack scenarios that exhibit different risk levels for the classifier by increasing the attacker's knowledge of the system and her ability to manipulate attack samples. This gives the classifier designer a better picture of the classifier's performance under evasion attacks, and allows him to perform a more informed model selection (or parameter setting). We evaluate our approach on the relevant security task of malware detection in PDF files, and show that such systems can be easily evaded. We also sketch some countermeasures suggested by our analysis.
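For a classifier with a differentiable discriminant function, such as an RBF-kernel SVM, the evasion attack reduces to gradient descent on that discriminant starting from a malicious sample. The synthetic data, kernel parameter, and step size below are illustrative assumptions, and the paper's manipulation budget and extra density (mimicry) term are omitted.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic two-class data: class +1 ("malicious") and class -1 ("benign").
X = np.vstack([rng.normal(loc=+2.0, size=(100, 2)), rng.normal(loc=-2.0, size=(100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

def decision_gradient(x):
    """Gradient of the RBF-SVM decision function g(x) = sum_i a_i k(x, x_i) + b."""
    sv, a = clf.support_vectors_, clf.dual_coef_[0]           # a_i = y_i * alpha_i
    diff = x - sv                                              # shape (n_sv, 2)
    k = np.exp(-gamma * (diff ** 2).sum(axis=1))               # RBF kernel values
    return (a * k) @ (-2.0 * gamma * diff)

# Start from a malicious point and descend the discriminant until the label flips.
x_adv = np.array([2.0, 2.0])
step = 0.05
for _ in range(500):
    if clf.decision_function(x_adv[None])[0] < 0:              # now classified as benign
        break
    x_adv = x_adv - step * decision_gradient(x_adv)

print("final score:", clf.decision_function(x_adv[None])[0], "point:", x_adv)
```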
CONTEXT: The terrorist attacks of September 11, 2001, represent an unprecedented exposure to trauma in the United States. OBJECTIVES: To assess psychological symptom levels in the United States following the events of September 11 and to examine the association between postattack symptoms and a variety of indices of exposure to the events. DESIGN: Web-based epidemiological survey of a nationally representative cross-sectional sample using the Posttraumatic Stress Disorder (PTSD) Checklist and the Brief Symptom Inventory, administered 1 to 2 months following the attacks. SETTING AND PARTICIPANTS: Sample of 2273 adults, including oversamples of the New York, NY, and Washington, DC, metropolitan areas. MAIN OUTCOME MEASURES: Self-reports of the symptoms of PTSD and of clinically significant nonspecific psychological distress; adult reports of symptoms of distress among children living in their households. RESULTS: The prevalence of probable PTSD was significantly higher in the New York City metropolitan area (11.2%) than in Washington, DC (2.7%), other major metropolitan areas (3.6%), and the rest of the country (4.0%). A broader measure of clinically significant psychological distress suggests that overall distress levels across the country, however, were within expected ranges for a general community sample. In multivariate models, sex, age, direct exposure to the attacks, and the amount of time spent viewing TV coverage of the attacks on September 11 and the few days afterward were associated with PTSD symptom levels; sex, the number of hours of television coverage viewed, and an index of the content of that coverage were associated with the broader distress measure. More than 60% of adults in New York City households with children reported that 1 or more children were upset by the attacks. CONCLUSIONS: One to 2 months following the events of September 11, probable PTSD was associated with direct exposure to the terrorist attacks among adults, and the prevalence in the New York City metropolitan area was substantially higher than elsewhere in the country. However, overall distress levels in the country were within normal ranges. Further research should document the course of symptoms and recovery among adults following exposure to the events of September 11 and further specify the types and severity of distress in children.
Machine-learning (ML) algorithms are increasingly utilized in privacy-sensitive applications such as predicting lifestyle choices, making medical diagnoses, and facial recognition. In a model inversion attack, recently introduced in a case study of linear classifiers in personalized medicine by Fredrikson et al., adversarial access to an ML model is abused to learn sensitive genomic information about individuals. Whether model inversion attacks apply to settings outside theirs, however, is unknown. We develop a new class of model inversion attack that exploits confidence values revealed along with predictions. Our new attacks are applicable in a variety of settings, and we explore two in depth: decision trees for lifestyle surveys as used on machine-learning-as-a-service systems and neural networks for facial recognition. In both cases confidence values are revealed to those with the ability to make prediction queries to models. We experimentally show attacks that are able to estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other and, in the other context, show how to recover recognizable images of people's faces given only their name and access to the ML model. We also initiate experimental exploration of natural countermeasures, investigating a privacy-aware decision tree training algorithm that is a simple variant of CART learning, as well as revealing only rounded confidence values. The lesson that emerges is that one can avoid these kinds of MI attacks with negligible degradation to utility.
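Given prediction-query access that returns class confidences, the inversion attack is essentially gradient ascent on the target class's confidence over the input space. The sketch below inverts a softmax-regression model trained on the scikit-learn digits data, which stands in for the facial-recognition network; the target class, step size, and iteration count are assumed.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# A small multi-class model stands in for the face-recognition network:
# each digit class plays the role of one person's identity.
digits = load_digits()
X = digits.data / 16.0                       # pixel values scaled to [0, 1]
model = LogisticRegression(max_iter=2000).fit(X, digits.target)

target_class = 3                             # the "name" whose representative image we reconstruct
W = model.coef_                              # shape (n_classes, n_pixels)

def confidence_grad(x, c):
    """Gradient of log P(class = c | x) for the softmax model: W_c - sum_k p_k W_k."""
    p = model.predict_proba(x[None])[0]
    return W[c] - p @ W

# Gradient ascent from a blank image, maximizing the target class's confidence.
x = np.zeros(64)
for _ in range(300):
    x = np.clip(x + 0.1 * confidence_grad(x, target_class), 0.0, 1.0)

print("reconstructed-image confidence:", model.predict_proba(x[None])[0][target_class])
```

The resulting 8x8 image is a class-representative reconstruction, the analogue of recovering a recognizable face from only a name and model access.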
This paper examines how monitoring power consumption signals might breach smart-card security. Both simple power analysis and differential power analysis attacks are investigated. The theory behind these attacks is reviewed. Then, we concentrate on showing how power analysis theory can be applied to attack an actual smart card. We examine the noise characteristics of the power signals and develop an approach to model the signal-to-noise ratio (SNR). We show how this SNR can be significantly improved using a multiple-bit attack. Experimental results against a smart-card implementation of the Data Encryption Standard demonstrate the effectiveness of our multiple-bit attack. Potential countermeasures to these attacks are also discussed.
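The core of a differential power analysis is a difference-of-means test: traces are partitioned by a predicted intermediate value under each key guess, and only the correct guess produces a consistent bias. The synthetic traces, leakage model, S-box, and the multiple-bit (Hamming-weight) partition below are toy assumptions, not measurements from a real smart card.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: a fixed nonlinear 4-bit S-box, and simulated power traces that
# leak the Hamming weight of the S-box output at one time sample, under noise.
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW = np.array([bin(v).count("1") for v in range(16)])
SECRET_KEY = 11
N_TRACES, TRACE_LEN, LEAK_SAMPLE = 2000, 50, 20

plaintexts = rng.integers(0, 16, size=N_TRACES)
traces = rng.normal(scale=1.0, size=(N_TRACES, TRACE_LEN))
traces[:, LEAK_SAMPLE] += 0.4 * HW[SBOX[plaintexts ^ SECRET_KEY]]   # data-dependent leakage

def dpa_peak(key_guess):
    """Multiple-bit DPA statistic: difference of mean traces between groups whose
    predicted S-box output (under this key guess) has high vs. low Hamming weight."""
    predicted_hw = HW[SBOX[plaintexts ^ key_guess]]
    hi, lo = predicted_hw >= 3, predicted_hw <= 1
    diff = traces[hi].mean(axis=0) - traces[lo].mean(axis=0)
    return np.max(np.abs(diff))

peaks = [dpa_peak(k) for k in range(16)]
print("recovered key:", int(np.argmax(peaks)), " true key:", SECRET_KEY)
```

Grouping on several predicted bits at once, rather than a single bit, is what improves the effective signal-to-noise ratio of the difference-of-means statistic.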
Security is important for many sensor network applications. A particularly harmful attack against sensor and ad hoc networks is known as the Sybil attack [6], where a node illegitimately claims multiple identities. This paper systematically analyzes the threat posed by the Sybil attack to wireless sensor networks. We demonstrate that the attack can be exceedingly detrimental to many important functions of the sensor network such as routing, resource allocation, misbehavior detection, etc. We establish a classification of different types of the Sybil attack, which enables us to better understand the threats posed by each type, and better design countermeasures against each type. We then propose several novel techniques to defend against the Sybil attack, and analyze their effectiveness quantitatively.
An integral part of modeling the global view of network security is constructing attack graphs. Manual attack graph construction is tedious, error-prone, and impractical for attack graphs larger than a hundred nodes. In this paper we present an automated technique for generating and analyzing attack graphs. We base our technique on symbolic model checking algorithms, letting us construct attack graphs automatically and efficiently. We also describe two analyses to help decide which attacks would be most cost-effective to guard against. We implemented our technique in a tool suite and tested it on a small network example, which includes models of a firewall and an intrusion detection system.
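The construction can be illustrated with a toy explicit-state search (the technique in the paper itself is symbolic model checking): states record the attacker's privileges on each host, exploits have preconditions and postconditions, and the attack graph keeps only the transitions that lie on a path to the attacker's goal. The hosts, exploits, and goal below are invented for illustration.

```python
from collections import deque

# Toy network model: the attacker starts with user access on "web"; exploits grant
# new privileges whenever their preconditions hold.  All names are illustrative.
EXPLOITS = [
    {"name": "web_local_privesc", "pre": {("web", "user")}, "post": ("web", "root")},
    {"name": "db_remote_login",   "pre": {("web", "user")}, "post": ("db", "user")},
    {"name": "db_local_privesc",  "pre": {("db", "user")},  "post": ("db", "root")},
    {"name": "fw_bypass_to_db",   "pre": {("web", "root")}, "post": ("db", "user")},
]
INITIAL = frozenset({("web", "user")})
GOAL = ("db", "root")

def build_attack_graph():
    """Forward search over privilege states, then keep only edges on paths to the goal."""
    edges, frontier, seen = [], deque([INITIAL]), {INITIAL}
    while frontier:
        state = frontier.popleft()
        for ex in EXPLOITS:
            if ex["pre"] <= state and ex["post"] not in state:
                nxt = frozenset(state | {ex["post"]})
                edges.append((state, ex["name"], nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    # Backward pruning: keep an edge only if its target can still reach a goal state.
    useful = {t for _, _, t in edges if GOAL in t}
    changed = True
    while changed:
        changed = False
        for s, _, t in edges:
            if t in useful and s not in useful:
                useful.add(s)
                changed = True
    return [(s, name, t) for s, name, t in edges if t in useful]

for src, exploit, dst in build_attack_graph():
    print(sorted(src), "--", exploit, "->", sorted(dst))
```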
BACKGROUND: People who are not present at a traumatic event may also experience stress reactions. We assessed the immediate mental health effects of the terrorist attacks on September 11, 2001. METHODS: Using random-digit dialing three to five days after September 11, we interviewed a nationally representative sample of 569 U.S. adults about their reactions to the terrorist attacks and their perceptions of their children's reactions. RESULTS: Forty-four percent of the adults reported one or more substantial stress symptoms; 91 percent had one or more symptoms to at least some degree. Respondents throughout the country reported stress syndromes. They coped by talking with others (98 percent), turning to religion (90 percent), participating in group activities (60 percent), and making donations (36 percent). Eighty-five percent of parents reported that they or other adults in the household had talked to their children about the attacks for an hour or more; 34 percent restricted their children's television viewing. Thirty-five percent of children had one or more stress symptoms, and 47 percent were worried about their own safety or the safety of loved ones. CONCLUSIONS: After the September 11 terrorist attacks, Americans across the country, including children, had substantial symptoms of stress. Even clinicians who practice in regions that are far from the recent attacks should be prepared to assist people with trauma-related symptoms of stress.
Recent research has revealed that the output of deep neural networks (DNNs) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 67.97% of the natural images in the Kaggle CIFAR-10 test dataset and 16.04% of the ImageNet (ILSVRC 2012) test images can be perturbed to at least one target class by modifying just one pixel, with 74.03% and 22.91% confidence on average, respectively. We also show the same vulnerability on the original CIFAR-10 dataset. Thus, the proposed attack explores a different take on adversarial machine learning in an extremely limited scenario, showing that current DNNs are also vulnerable to such low-dimensional attacks. In addition, we illustrate an important application of DE (or, broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness.
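A one-pixel attack can be driven directly by an off-the-shelf differential evolution optimizer searching over (row, column, new value). The toy "classifier" below, which just compares the brightest pixel in each half of a small image, stands in for a DNN, and all sizes and bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

H = W = 8

def predict_proba(img):
    """Toy two-class 'classifier': softmax over the brightest pixel in each image half."""
    logits = 5.0 * np.array([img[:, : W // 2].max(), img[:, W // 2 :].max()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

# A clean image whose left half is brighter, so the true class is 0.
image = np.zeros((H, W)); image[:, : W // 2] = 0.6
true_class = int(np.argmax(predict_proba(image)))

def objective(params):
    """Confidence of the true class after changing a single pixel (to be minimized)."""
    r, c, v = int(round(params[0])), int(round(params[1])), params[2]
    perturbed = image.copy()
    perturbed[r, c] = v
    return predict_proba(perturbed)[true_class]

result = differential_evolution(objective, bounds=[(0, H - 1), (0, W - 1), (0.0, 1.0)],
                                maxiter=30, seed=0)
r, c, v = int(round(result.x[0])), int(round(result.x[1])), result.x[2]
adv = image.copy(); adv[r, c] = v
print("pixel", (r, c), "set to", round(v, 2),
      "-> predicted class:", int(np.argmax(predict_proba(adv))))
```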
Risk of first heart attack was found to be related inversely to energy expenditure reported by 16,936 Harvard male alumni, aged 35-74 years, of whom 572 experienced heart attacks in 117,680 person-years of followup. Stairs climbed, blocks walked, strenuous sports played, and a composite physical activity index all opposed risk. Men with index below 2000 kilocalories per week were at 64% higher risk than classmates with higher index. Adult exercise was independent of other influences on heart attack risk, and peak exertion as strenuous sports play enhanced the effect of total energy expenditure. Notably, alumni physical activity supplanted student athleticism assessed in college 16-50 years earlier. If it is postulated that varsity athlete status implies selective cardiovascular fitness, such selection alone is insufficient to explain lower heart attack risk in later adult years. Ex-varsity athletes retained lower risk only if they maintained a high physical activity index as alumni.
Distributed denial-of-service (DDoS) is a rapidly growing problem. The multitude and variety of both the attacks and the defense approaches are overwhelming. This paper presents two taxonomies for classifying attacks and defenses, and thus provides researchers with a better understanding of the problem and the current solution space. The attack classification criteria were selected to highlight commonalities and important features of attack strategies that define challenges and dictate the design of countermeasures. The defense taxonomy classifies the body of existing DDoS defenses based on their design decisions; it then shows how these decisions dictate the advantages and deficiencies of proposed solutions.