The current excitement about artificial intelligence (AI), particularly machine learning (ML), is palpable and contagious. The expectation that AI is poised to "revolutionize," perhaps even take over, humanity has elicited prophetic visions and concerns from some luminaries.1-4 There is also a great deal of interest in the commercial potential of AI, which is attracting significant sums of venture capital and state-sponsored investment globally, particularly in China.5 McKinsey, for instance, predicts the potential commercial impact of AI in several domains, envisioning markets worth trillions of dollars.6 All this is driven by the sudden, explosive, and surprising advances AI has made in the last 10 years or so. AlphaGo, autonomous cars, Alexa, Watson, and other such systems in game playing, robotics, computer vision, speech recognition, and natural language processing are indeed stunning advances. But, as with earlier AI breakthroughs, such as expert systems in the 1980s and neural networks in the 1990s, there is also considerable hype and a tendency to overestimate the promise of these advances, as market research firm Gartner and others have noted about emerging technology.7 It is quite understandable that many chemical engineers are excited about the potential of AI, and ML in particular,8 in applications such as catalyst design.9-11 It might seem that this prospect offers a novel approach to challenging, long-standing problems in chemical engineering using AI. However, the use of AI in chemical engineering is not new—it is, in fact, a 35-year-old ongoing program with some remarkable successes along the way. This article is aimed broadly at chemical engineers who are interested in the prospects for AI in our domain, as well as at researchers new to this area. The objectives of this article are threefold. First, to review the progress we have made so far, highlighting past efforts that contain valuable lessons for the future. 
Second, drawing on these lessons, to identify promising current and future opportunities for AI in chemical engineering. To avoid getting caught up in the current excitement and to assess the prospects more carefully, it is important to take such a longer and broader view, as a "reality check." Third, since AI is going to play an increasingly dominant role in chemical engineering research and education, it is important to recount and record, however incompletely, certain early milestones for historical purposes. It is apparent that chemical engineering is at an important crossroads. Our discipline is undergoing an unprecedented transition—one that presents significant challenges and opportunities in modeling and automated decision-making. This transition has been driven by the convergence of cheap and powerful computing and communications platforms, tremendous progress in molecular engineering, the ever-increasing automation of globally integrated operations, tightening environmental constraints, and business demands for speedier delivery of goods and services to market. One important outcome of this convergence is the generation, use, and management of massive amounts of diverse data, information, and knowledge, and this is where AI, particularly ML, will play an important role. AI is not a single method but a collection of several branches. Some of these are application-focused, such as game playing and vision. Others are methodological, such as expert systems and ML—the two branches that are most directly and immediately applicable to our domain, and hence the focus of this article. These are also the branches that have been investigated the most in the last 35 years by AI researchers in chemical engineering. While the current "buzz" is mostly around ML, the expert system framework holds important symbolic knowledge representation concepts and inference techniques that could prove useful in the years ahead as we strive to develop more comprehensive solutions that go beyond the purely data-centric emphasis of ML. 
Many tasks in these different branches of AI share certain common features. They all require pattern recognition, reasoning, and decision-making under complex conditions. And they often deal with ill-defined problems, noisy data, model uncertainties, combinatorially large search spaces, nonlinearities, and the need for speedy solutions. But such features are also found in many problems in process systems engineering (PSE)—in synthesis, design, control, scheduling, optimization, and risk management. So, some of us thought, in the early 1980s, that we should examine such problems from an AI perspective.15-17 Just as it is today, the excitement about AI at that time was centered on expert systems. It was palpable and contagious, with high expectations for AI's near-term potential.18-20 Hundreds of millions of dollars were invested in AI start-ups as well as within large companies. AI spurred the development of special-purpose hardware, called Lisp machines (e.g., Symbolics Lisp machines). Promising proof-of-concept systems were demonstrated in many domains, including chemical engineering (see below). In this phase, it was expected that AI would have a significant impact on chemical engineering in the near future. However, unlike optimization and model predictive control, AI did not quite live up to its early promise. So, what happened? Why was AI not more impactful? Before addressing this question, it is necessary to examine the different phases of AI, as I classify them, in chemical engineering. While major efforts to develop AI methods for chemical engineering problems started in the early 1980s, it is remarkable that some researchers (for instance, Gary Powers, Dale Rudd, and Jeff Siirola) were investigating AI in PSE as early as the late 1960s and early 1970s.21 In particular, the Adaptive Initial DEsign Synthesizer (AIDES) system, developed by Siirola and Rudd22 for process synthesis, represents a significant development. 
This was arguably the first system that employed AI methods such as means-ends analysis, symbolic manipulation, and linked data structures in chemical engineering. Phase I, the Expert Systems Era (from the early 1980s through the mid-1990s), saw the first broad effort to exploit AI in chemical engineering. Expert systems, also called knowledge-based systems, rule-based systems, or production systems, are computer programs that mimic the problem-solving of humans with expertise in a given domain.23, 24 Expert problem-solving typically involves large amounts of specialized knowledge, called domain knowledge, often in the form of rules of thumb, called heuristics, usually learned and refined over years of problem-solving experience. The amount of knowledge manipulated is often vast, and the expert system rapidly narrows down the search by recognizing patterns and by using the appropriate heuristics. The architecture of these systems was inspired by the stimulus–response model of cognition from psychology and the pattern-matching-and-search model of symbolic computation, which originated in Emil Post's work in symbolic logic. Building on this work, Simon and Newell in the late 1960s and 1970s devised the production system framework, an important conceptual, representational, and architectural breakthrough for developing expert systems.25-27 The crucial insight here was that one needs to, and one can, separate domain knowledge from its order of execution, that is, from search or inference, thereby achieving the necessary computational flexibility to address ill-structured problems. In contrast, conventional programs consist of a set of statements whose order of execution is predetermined. Therefore, if the execution order is not known or cannot be anticipated a priori, as in the case of medical diagnosis, for example, this approach will not work. 
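The production-system idea described above, domain knowledge stored as rules separate from an inference engine that decides execution order at run time, can be illustrated with a minimal forward-chaining sketch. The rules and facts below are invented for illustration only; they do not come from any actual expert system:

```python
# Minimal production-system sketch: domain knowledge (rules) is plain data,
# kept separate from the inference engine that decides execution order.
# Rules and facts are hypothetical, for illustration only.

rules = [
    # (name, conditions that must all hold, fact to conclude)
    ("R1", {"temperature_rising", "coolant_low"}, "high_pressure"),
    ("R2", {"high_pressure", "valve_closed"}, "risk_of_rupture"),
    ("R3", {"risk_of_rupture"}, "open_relief_valve"),
]

def forward_chain(initial_facts):
    """Fire any rule whose conditions are satisfied, repeatedly,
    until no new facts can be derived. The execution order is not
    predetermined; it emerges from the facts at hand."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"temperature_rising", "coolant_low", "valve_closed"})
print(sorted(derived))  # includes "high_pressure" and "open_relief_valve"
```

Adding a new rule requires only appending to `rules`; the inference loop is untouched, which is exactly the incremental-knowledge-addition property discussed in the text.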
Expert systems programming alleviated this problem by making a clear distinction between the knowledge base and the search or inference strategy. This not only allowed for flexible execution but also facilitated the incremental addition of knowledge without distorting the overall program structure. This rule-based knowledge representation and architecture are intuitive and relatively easy to understand, and they make it easy to generate explanations of the system's decisions. This new approach facilitated the development of a number of impressive expert systems, starting with MYCIN, an expert system for diagnosing infectious diseases28 developed at Stanford University during 1972–82. This led to other successful systems of this era, such as PROSPECTOR (for mineral prospecting29) and R1 (for configuring VAX computers30). These systems inspired the first expert system application in chemical engineering, CONPHYDE, developed in 1983 by Bañares-Alcántara, Westerberg, and Rychner at Carnegie Mellon16 for predicting thermophysical properties of complex fluid mixtures. CONPHYDE was implemented using the Knowledge Acquisition System (KAS) that had been used for PROSPECTOR. It was quickly followed by DECADE, in 1985, again from the same CMU researchers,17 for catalyst design. There was other remarkable early work in process synthesis, design, modeling, and diagnosis as well. In synthesis and design, for instance, important conceptual advances were made by Stephanopoulos and his students, starting with Design-Kit,31 and, in modeling, MODEL.LA, a language for developing process models.32 In process fault diagnosis, Davis33 and Kramer,34, 35 and their groups, made important contributions in the same period. My group developed causal model-based diagnostic expert systems,36 a departure from the heuristics-based approach that was the dominant theme of the time. 
We also demonstrated the potential of learning expert systems, an unusual idea at that time, as automated learning in expert systems was not in vogue.37 The need for causal models in AI, a topic that has emerged as very important now,38 was also recognized in those early years.39 This period also saw expert system work commencing in Europe,40 particularly for conceptual design support. An important large-scale program in this era was the Abnormal Situation Management (ASM) consortium, funded at $17 million by the National Institute of Standards and Technology's Advanced Technology Program and by the leading oil companies, under the leadership of Honeywell.41 Three academic groups, led by Davis (Ohio State), Vicente (University of Toronto), and myself at Purdue, were also involved in the consortium. This program was the forerunner of the current Clean Energy Smart Manufacturing Innovation Institute, funded in 2016.42 The first course on AI in PSE was developed and taught at Columbia University in 1986, and it was subsequently offered at Purdue University for many years. The earlier offerings had an expert systems emphasis but, as ML advanced, the course evolved in later years to include topics such as clustering, neural networks, statistical classifiers, graph-based models, and genetic algorithms. In 1986, Stephanopoulos published an article43 titled "Artificial Intelligence in Process Engineering," in which he discussed the potential of AI in process engineering and outlined a research program to realize it. Coincidentally, in the same issue, I had an article with the same title, which described the Columbia course.44 In my article, I discussed topics from the course, and it mirrored what Stephanopoulos had outlined as the research program. 
(Curiously, we did not know each other at that time and had written our articles independently, yet with the same title, at the same time, with almost the same content, and had submitted them to the same journal for the same issue!) The first AIChE session on AI was organized by Gary Powers (CMU) at the annual meeting held in Chicago in 1985. The first national meeting on AI in process engineering was held in 1987 at Columbia University, co-organized by Venkatasubramanian, Stephanopoulos, and Davis, and sponsored by the National Science Foundation, the American Association for Artificial Intelligence, and Air Products. The first international conference, Intelligent Systems in Process Engineering (ISPE'95), sponsored by the Computer Aids for Chemical Engineering (CACHE) Corporation, was co-organized by Stephanopoulos, Davis, and Venkatasubramanian and held at Snowmass, CO, in July 1995. The CACHE Corporation had also organized an Expert Systems Task Force in 1985, under the leadership of Stephanopoulos, to develop tools for the instruction of AI in chemical engineering.45 The task force published a series of monographs on AI in process engineering during 1989–1993. Despite impressive successes, the expert system approach did not quite take off, as it suffered from serious drawbacks. It took a great deal of effort, time, and money to develop a credible expert system for industrial applications. Furthermore, it was also difficult and expensive to maintain and update the knowledge base as new information came in or the target application changed, such as in the retrofitting of a chemical plant. This approach did not scale well for practical applications (more on this in the sections "Lack of impact of AI during Phases I and II" and "Are things different now for AI to have impact?"). As the excitement about expert systems waned in the 1990s due to these practical difficulties, interest in another AI technique was picking up greatly. 
This was the beginning of Phase II, the Neural Networks Era, roughly from 1990 onward. This was a crucial shift from the top-down design paradigm of expert systems to the bottom-up paradigm of neural nets that acquired knowledge automatically from large amounts of data, thus easing the maintenance and development of models. It all started with the reinvention of the backpropagation algorithm by Rumelhart, Hinton, and Williams in 1986 for training feedforward neural networks to learn hidden patterns in input–output data. It had been proposed earlier, in 1974, by Paul Werbos as part of his Ph.D. thesis at Harvard. It is essentially an algorithm for implementing gradient descent search, using the chain rule in calculus, to propagate errors back through the network to adjust the strength (i.e., weights) of connections between nodes iteratively, to make the network learn the patterns. While the idea of neural networks had been around since 1943 from the work of McCulloch and Pitts, and was further developed by Rosenblatt, Minsky, and Papert in the 1960s, these earlier models were limited in scope as they could not handle problems with nonlinearity. The key breakthrough this time was the ability to solve nonlinear function approximation and nonlinear classification problems in an automated manner using the backpropagation learning algorithm. The typical structure of a feedforward neural network from this era is shown in Figure 1, with its input, hidden, and output layers of neurons, and their associated signals, weights and biases. The figure also shows examples of nonlinear function approximation and nonlinear classification problems such networks were able to solve provided enough data were available.46 (a) Architecture of a feedforward neural network. (b) Examples of nonlinear function approximation and classification problems. 
Adapted from: https://medium.com/@curiousily/tensorflow-for-hackers-part-iv-neural-network-from-scratch-1a4f504dfa8; https://neustan.wordpress.com/2015/09/05/neural-networks-vs-svm-where-when-and-above-all-why/; http://mccormickml.com/2015/08/26/rbfn-tutorial-part-ii-function-approximation/

This novel automated nonlinear modeling ability spurred a tremendous amount of work in a variety of domains including chemical engineering.47 Researchers made substantial progress on addressing challenging problems in modeling,48, 49 fault diagnosis,50-55 control,56, 57 and product design.58 In particular, the recognition of the connection between the autoencoder architecture and nonlinear principal component analysis by Kramer,48 and the recognition of the nature of the basis function approximation of neural networks through the WaveNet architecture by Bakshi and Stephanopoulos49 are outstanding contributions. There were hundreds of articles in our domain during this phase and only some of the earliest and key articles are highlighted here. While this phase was largely driven by neural networks, researchers also made progress on expert systems (such as the ASM consortium) and genetic algorithms at that time. For instance, we proposed59 directed evolution of engineering polymers in silico using genetic algorithms. This led in subsequent years60 to the multiscale model-based informatics framework called Discovery Informatics61 for materials design. 
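The backpropagation procedure described earlier, gradient descent that uses the chain rule to push output errors back through the network and adjust the weights, can be sketched for a tiny feedforward network. The network size, learning rate, iteration count, and the XOR task are illustrative choices, not from the article:

```python
import numpy as np

# Minimal feedforward net (2 inputs, 4 hidden units, 1 output) trained by
# backpropagation on XOR, a classic nonlinear classification problem that
# single-layer models of the 1960s could not solve. All sizes and the
# learning rate are illustrative.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse():
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return float(((out - y) ** 2).mean())

loss_before = mse()
lr = 1.0
for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: the chain rule gives the error signal at each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates of weights and biases
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

loss_after = mse()
print(loss_before, "->", loss_after)  # the squared error drops sharply
```

The hidden layer is what gives the network its nonlinear classification ability; removing it (or the sigmoid) reduces the model to a linear classifier, which cannot represent XOR.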
The discovery informatics framework led to the successful development of materials design systems using directed evolution in several industrial applications, such as gasoline additives,62 formulated rubbers,63 and catalyst design.64 During this period, researchers were also beginning to realize the challenges and opportunities in multiscale modeling using informatics techniques.65, 66 Other important advances not using neural networks included research into frameworks and architectures for building AI systems, such as blackboard architectures, integrated problem-solving-and-learning systems, and cognitive architectures. Architectures such as Prodigy and Soar are examples of this work.67 Similarly, there was progress in process synthesis and in design,68 domain-specific representations and languages,32, 69 domain-specific compilers,70 ontologies,71, 72 modeling environments,32, 73 molecular structure search engines,74 automatic reaction network generators,64 and chemical entities extraction systems.74 These references by no means constitute a comprehensive list. All this work, and others along similar lines, performed some two decades ago is still relevant and useful today in the modern era of data science. Building such systems using modern tools presents major opportunities. Despite the surprising success of neural networks in many practical applications, some especially challenging problems in vision, natural language processing, and speech understanding remained beyond the capabilities of the neural nets of this era. 
Researchers realized that one would need neural nets with many more hidden layers to tackle such problems, but training these networks was computationally prohibitive. So, the field was more or less stuck for about a decade or so, until a breakthrough emerged for training deep neural networks, thus launching the current phase, which is discussed below.

Lack of Impact of AI During Phases I and II

In spite of all this effort over two decades, AI was not as impactful in chemical engineering as we had expected. In hindsight, it is clear why this was the case. First, the problems we were attacking are challenging even today. Second, we lacked the powerful hardware and programming environments needed to address such challenging problems. Third, we were limited by data: what data existed was sparse, and collecting more was very expensive. There were, of course, other challenges in Phases I and II as well. While we made progress on the conceptual issues, such as knowledge representation and inference methods for problems in synthesis, design, and diagnosis, we could not overcome the difficulties and costs involved in building and maintaining practical applications. In addition, there was no compelling business need, as it were; there was no "killer app," in the sense that many of the pressing problems in process engineering, in that period, could be addressed more readily by optimization and by model predictive control. As algorithms and computing power improved over the years, these methods performed well on problems for which we could formulate and solve mathematical models. That left the problems for which such models are difficult to develop (e.g., diagnosis, risk analysis, and materials design) or almost impossible to generate (e.g., speech recognition): problems that required symbolic computational frameworks, and lots of data, both of which were not available during this period. This lack of practical success led to two "AI winters," one at the end of the Expert Systems era and the other at the end of the Neural Networks era, for AI research both in computer science and in the application domains. This outcome is not surprising in hindsight. In chemical engineering, it has typically taken about three decades for a technology to mature and have impact, from discovery to widespread deployment. For instance, for a technology such as process simulation to achieve broad market impact, it took about three decades from the time computer simulation of chemical processes was first proposed; the timelines were similar in optimization, as for linear and nonlinear programming, and for model predictive control. In comparison, during Phases I and II, AI as a discipline was itself only about three decades old. It was simply too early to expect practical impact. This analysis suggests that one could expect impact only decades later, around the present time. While predicting such timelines and impact is an inexact exercise, this estimate seems about right given the current progress of AI. As it turned out, for those of us who started working on AI in the early 1980s, we were early, as far as impact is concerned, but it was intellectually challenging and rewarding to work on these problems. 
Many of the ideas from those phases, such as developing hybrid AI methods and causal model-based AI systems, are still relevant, as I discuss later. The progress of AI over the last decade or so has been very impressive, and the advances are largely real: they have been commercialized, and consumers have started to use and benefit from systems such as Alexa, while more intelligent products and services for a variety of applications are beginning to appear and to work reliably. It is instructive to make the following comparison. In 1985, arguably the most powerful computer was the Cray-2 supercomputer; computational power was scarce and expensive, and only a few organizations could afford it. The multimillion-dollar machine required a dedicated facility and a staff to maintain it. Today, a smartphone is more powerful than that supercomputer; at a few hundred dollars, it is a bargain that fits in one's pocket. There have also been great advances in the development of algorithms and in programming languages and environments: models that we had to program in Lisp for weeks can now be implemented in a day with a few lines of code in a modern ML library. We have also made great progress in data storage and communication. The other crucial development is the availability of tremendous amounts of data, in many domains, which made the stunning advances in ML possible (more on this below). All this suggests that the pieces are finally in place: after incubating for the last 30 years, AI is beginning to deliver on its expected promise, making these stunning advances possible. As a result, the "AI spring" is here. Another reason the promise is here now is that many of the problems that could be solved using optimization and model predictive control have largely been solved; for further impact, one has to go up the value chain, and that means taking on challenging decision-making problems that require knowledge-intensive solutions. Looking back over the past few decades, one would note two highly visible milestones in AI game playing. One is the defeat of Garry Kasparov in chess by IBM's Deep Blue in 1997, and the other is the surprising defeat of Lee Sedol in Go by DeepMind's AlphaGo in 2016. The AI advances that made these victories possible are now poised to have an impact that goes well beyond game playing. In my view, we entered Phase III, the era of Deep Learning and Data Science, around the mid-2000s. This new phase was made possible by important advances in deep neural nets and statistical ML. These are the techniques behind the recent AI successes in game playing, natural language processing, robotics, and vision. Unlike the neural nets of the 1990s, which typically had only one hidden layer of neurons, deep neural nets have many hidden layers; such an architecture has the potential to learn hierarchical features for complex pattern recognition. However, such networks were difficult to train using the backpropagation or gradient descent algorithm alone. 
The breakthrough came by combining new training strategies with considerable improvements in processing power, in the form of graphics processing units (GPUs). In particular, an operation called convolution, incorporated into deep neural networks, made automatic feature extraction possible; convolution is a standard operation in the domain of signal processing for extracting features from a noisy signal. In such convolutional networks, the details of the architecture, such as the type and number of layers and filters, are tuned during training from a very large data set; this is a crucial advance, as the appropriate features that lead to a successful classification are learned by the neural network itself rather than being hand-crafted. Another important architectural advance was the recurrent neural network (RNN). A feedforward neural network has no memory of past inputs; the only input it considers is the current one, regardless of what it has been shown before. This is not appropriate for problems that involve sequential information, such as time series data, where what comes next typically depends on what has come before. For instance, to predict the next word in a sentence, one needs to know which words came before it. Recurrent networks address such problems by taking as their input not only the current example, but also what they have perceived previously; the output depends on what has come before, so the network behaves as if it has memory. This was further improved by another architectural innovation called the long short-term memory (LSTM) network. A typical LSTM unit has an input gate, an output gate, and a forget gate. The cell remembers values over time, and the gates regulate the flow of information into and out of the cell. LSTM networks are well suited for making predictions on time series data, since there can be lags of unknown duration between important events in a time series. While the key advances here are in the architecture and training of large-scale neural networks, the other important development is reinforcement learning (RL), a framework for learning a sequence of actions to achieve a goal, such as winning a game. It is a learning paradigm in which an agent learns by interacting with its environment: the actions it takes are rewarded or punished depending on their consequences, and the agent adjusts its behavior accordingly. The idea is similar to the one used in training a pet, where one rewards the pet with a treat if it performs the trick correctly and withholds the treat if it does not; by doing this many times, one is essentially teaching it the desired patterns of behavior. This kind of reinforcement learning is essentially approximate dynamic programming in modern ML terms. For this approach to work well on complex problems, such as the game of Go, one needs millions of trials and millions of games, amounting to thousands of years' worth of human expertise acquired over a period of days or months. As stunning as this is, one should note that the game-playing domain has the advantage that it can generate almost unlimited training data through self-play. This is typically not the case in science and engineering, where one is data-limited even in this big-data era. But this limitation might be overcome if the source of the data is a computer simulation, as in some materials design applications. 
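The reward-driven trial-and-error loop described above can be sketched with tabular Q-learning, one of the simplest RL algorithms and a direct descendant of dynamic programming. The corridor environment and every parameter value here are invented for illustration; they have no connection to AlphaGo or any system mentioned in the text:

```python
import random

# Tabular Q-learning sketch: an agent learns, by trial and error, to walk
# right along a 5-cell corridor to reach a goal that yields a reward.
# Environment and parameters (alpha, gamma, epsilon) are illustrative.

N_STATES = 5
ACTIONS = (+1, -1)                        # move right or left
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # step size, discount, exploration
random.seed(0)

for _ in range(500):                      # training episodes
    s = 0
    while s != N_STATES - 1:              # episode ends at the goal cell
        # epsilon-greedy: mostly exploit the best-known action,
        # occasionally explore a random one
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0    # reward only at the goal
        # Q-learning update: a stochastic form of dynamic programming,
        # bootstrapping from the best value of the next state
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right in every non-goal state
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # typically [1, 1, 1, 1]
```

The discount factor gamma is what makes distant rewards propagate backward through the state space; this is the same value-propagation idea that, at vastly larger scale and with neural-network function approximation, underlies game-playing systems.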
For the discussion that follows, it is important to note how supervised learning differs from the other dominant paradigms, unsupervised learning and reinforcement learning. In supervised learning, the system learns the relationship between input and output given a set of input–output pairs. In the other paradigm, unsupervised learning, only a set of inputs is given, with no labels (i.e., no corresponding outputs). The system is expected to discover the patterns in the data on its own; hence, it is unsupervised. One could argue that learning for