Detecting, analyzing, and predicting power outages is crucial for grid risk assessment and disaster mitigation. Numerous outages occur each year, exacerbated by extreme weather events such as hurricanes. Existing outage data are typically reported at the county level, limiting their spatial resolution and making it difficult to capture localized patterns. However, they offer excellent temporal granularity. In contrast, nighttime light satellite image data provide significantly higher spatial resolution and enable a more comprehensive spatial depiction of outages, enhancing the accuracy of assessing the geographic extent and severity of power loss after disaster events. However, these satellite data are only available on a daily basis. Integrating spatiotemporal visual and time-series data sources into a unified knowledge representation can substantially improve power outage detection, analysis, and predictive reasoning. In this paper, we propose GeoOutageKG, a multimodal knowledge graph that integrates diverse data sources, including nighttime light satellite image data, high-resolution spatiotemporal power outage maps, and county-level time-series outage reports in the U.S. We des
We develop LENORI, a Large Event Number of Outages Resilience Index measuring distribution system resilience with the number of forced line outages observed in large extreme events. LENORI is calculated from standard utility outage data. The statistical accuracy of LENORI is ensured by taking the logarithm of the outage data. A related Average Large Event Number of Outages metric ALENO is also developed, and both metrics are applied to a distribution system to quantify the power grid strength relative to the extreme events stressing the grid. The metrics can be used to track resilience and quantify the contributions of various types of hazards to the overall resilience.
Orthogonal time frequency space (OTFS) modulation is widely acknowledged as a prospective waveform for future wireless communication networks. To provide insights for practical system design, this paper analyzes the outage probability of OTFS modulation with finite blocklength. To begin with, we present the system model and formulate the analysis of outage probability for OTFS with finite blocklength as an equivalent problem of calculating the outage probability with finite blocklength over parallel additive white Gaussian noise (AWGN) channels. Subsequently, we apply the equivalent noise approach to derive a lower bound on the outage probability of OTFS with finite blocklength under both average power allocation and water-filling power allocation strategies. Finally, the lower bounds of the outage probability are determined using the Monte-Carlo method for the two power allocation strategies. The impact of the number of resolvable paths and coding rates on the outage probability is analyzed, and the simulation results are compared with the theoretical lower bounds.
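The two power allocation strategies named in this abstract can be compared with a small Monte-Carlo sketch. This is only an illustration under stated assumptions, not the paper's method: it uses the infinite-blocklength capacity criterion rather than the finite-blocklength equivalent noise approach, it assumes unit noise power and independent exponentially distributed channel power gains (Rayleigh fading), and the function names `water_filling`, `equal_power`, and `outage_probability` are hypothetical.

```python
import math
import random


def water_filling(gains, total_power):
    """Water-filling over parallel channels with power gains `gains` (noise power 1)."""
    # Inverse gains, sorted so the strongest channel comes first.
    inv = sorted(1.0 / g for g in gains if g > 0)
    level = 0.0
    # Try using the k strongest channels, dropping the weakest until the
    # water level lies above the inverse gain of every active channel.
    for k in range(len(inv), 0, -1):
        level = (total_power + sum(inv[:k])) / k
        if level > inv[k - 1]:
            break
    return [max(level - 1.0 / g, 0.0) if g > 0 else 0.0 for g in gains]


def equal_power(gains, total_power):
    """Average power allocation: the same power on every channel."""
    return [total_power / len(gains)] * len(gains)


def outage_probability(num_channels, snr, rate, trials, allocate, rng):
    """Fraction of fading draws whose average rate falls below `rate` (bits/channel use)."""
    outages = 0
    for _ in range(trials):
        # Assumption: i.i.d. exponential power gains (Rayleigh fading).
        gains = [rng.expovariate(1.0) for _ in range(num_channels)]
        powers = allocate(gains, snr * num_channels)
        achieved = sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains)) / num_channels
        if achieved < rate:
            outages += 1
    return outages / trials
```

Calling `outage_probability` twice with the same seed, once with `water_filling` and once with `equal_power`, evaluates both strategies on identical channel draws, so any difference comes from the allocation alone.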
Despite significant anecdotal evidence regarding the vulnerability of the U.S. power infrastructure, there is a dearth of longitudinal and nation-level characterization of the spatial and temporal patterns in the frequency and extent of power outages. A data-driven national-level characterization of power outage vulnerability is particularly essential for understanding the urgency and formulating policies to promote the resilience of power infrastructure systems. Recognizing this, we retrieved 179,053,397 county-level power outage records with a 15-minute interval across 3,022 US counties during 2014-2023 to capture power outage characteristics. We focus on three dimensions--power outage intensity, frequency, and duration--and develop multiple metrics to quantify each dimension of power outage vulnerability. The results show that in the past ten years, the vulnerability of the U.S. power system has consistently increased. Counties experienced an average of 999.4 outages over the decade, affecting an average of more than 540,000 customers per county, with disruptions occurring approximately every week. Coastal areas, particularly in California, Florida and New Jersey, faced more f
Thunderstorm-driven power outages are difficult to predict because most storms do not cause damage, convective processes occur rapidly and chaotically, and the available public data are noisy and incomplete. Severe convective storms now account for a large and rising share of U.S. weather losses, yet thunderstorm-induced outages remain understudied. We develop a 48-hour early-warning model for summer thunderstorm-related outages in Michigan using only open-source outage (EAGLE-I) and weather (METAR) data. Relative to prior work, we (i) rely solely on public data, (ii) preserve convective extremes from a sparse station network via parameter-specific kriging and causal spatiotemporal features, and (iii) use a multi-level LSTM-based architecture evaluated on event-centric peak metrics. The pipeline builds rolling and k-NN inverse-distance aggregates to capture moisture advection, wind shifts, and pressure drops. A two-stage design uses a logistic gate followed by a long short-term memory (LSTM) regressor to filter routine periods and limit noise exposure. Evaluation focuses on state-level peaks of at least 50,000 customers without power, using hits, misses, false alarms, and peak-cond
Cloud services are omnipresent and critical cloud service failure is a fact of life. In order to retain customers and prevent revenue loss, it is important to provide high reliability guarantees for these services. One way to do this is by predicting outages in advance, which can help in reducing the severity as well as time to recovery. It is difficult to forecast critical failures due to the rarity of these events. Moreover, critical failures are ill-defined in terms of observable data. Our proposed method, Outage-Watch, defines critical service outages as deteriorations in the Quality of Service (QoS) captured by a set of metrics. Outage-Watch detects such outages in advance by using the current system state to predict whether the QoS metrics will cross a threshold and initiate an extreme event. A mixture of Gaussians is used to model the distribution of the QoS metrics for flexibility, and an extreme event regularizer helps in improving learning in the tail of the distribution. An outage is predicted if the probability of any one of the QoS metrics crossing the threshold changes significantly. Our evaluation on a real-world SaaS company dataset shows that Outage-Watch significantly outperfor
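The threshold-crossing probability mentioned in this abstract has a closed form once a Gaussian mixture is given. The sketch below is a minimal illustration and not the Outage-Watch implementation: in the described method the mixture parameters would be predicted from the system state, whereas here they are plain arguments, and `flag_outage` with its `min_jump` parameter is a hypothetical stand-in for "changes significantly".

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def exceed_probability(weights, means, stds, threshold):
    """P(X > threshold) when X follows a Gaussian mixture with the given components."""
    return sum(
        w * (1.0 - normal_cdf(threshold, m, s))
        for w, m, s in zip(weights, means, stds)
    )


def flag_outage(prev_prob, curr_prob, min_jump=0.2):
    """Hypothetical rule: flag when the crossing probability rises by at least `min_jump`."""
    return (curr_prob - prev_prob) >= min_jump
```

A mixture lets a heavy upper tail be represented by a low-weight component with a large mean, which a single Gaussian cannot capture.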
Transmission line outage rates are fundamental to power system reliability analysis. Line outages are infrequent, occurring only about once a year, so outage data are limited. We propose a Bayesian hierarchical model that leverages line dependencies to better estimate outage rates of individual transmission lines from limited outage data. The Bayesian estimates have a lower standard deviation than estimating the outage rates simply by dividing the number of outages by the number of years of data, especially when the number of outages is small. The Bayesian model produces more accurate individual line outage rates, as well as estimates of the uncertainty of these rates. Better estimates of line outage rates can improve system risk assessment, outage prediction, and maintenance scheduling.
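The contrast between the naive estimate (outages divided by years) and a Bayesian estimate can be sketched with a conjugate Gamma-Poisson model. This is a deliberately simplified stand-in, not the hierarchical model of the abstract: it ignores line dependencies, pools all lines into one prior, and the `prior_strength` parameter (the weight of the prior in equivalent years of data) is a hypothetical choice.

```python
import math


def posterior_rates(counts, years, prior_strength=5.0):
    """Return (posterior mean, posterior std) of the annual outage rate for each line.

    Assumes counts[i] ~ Poisson(rate_i * years[i]) with a shared Gamma(alpha, beta)
    prior whose mean equals the pooled rate across all lines.
    """
    pooled_rate = sum(counts) / sum(years)
    alpha = pooled_rate * prior_strength  # prior shape
    beta = prior_strength                 # prior rate, in equivalent years of data
    results = []
    for n, t in zip(counts, years):
        shape = alpha + n   # conjugate update of the Gamma shape
        rate_param = beta + t  # conjugate update of the Gamma rate
        results.append((shape / rate_param, math.sqrt(shape) / rate_param))
    return results
```

For a line with zero recorded outages the naive estimate is exactly zero, while the posterior mean stays positive and is pulled toward the pooled rate; lines with many outages are pulled toward it from above.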
In recent years, increasingly unpredictable and severe global weather patterns have frequently caused long-lasting power outages. Building resilience, the ability to withstand, adapt to, and recover from major disruptions, has become crucial for the power industry. To enable rapid recovery, accurately predicting future outage numbers is essential. Rather than relying on simple point estimates, we analyze extensive quarter-hourly outage data and develop a graph conformal prediction method that delivers accurate prediction regions for outage numbers across the states for a time period. We demonstrate the effectiveness of this method through extensive numerical experiments in several states affected by extreme weather events that led to widespread outages.
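The idea of replacing a point estimate with a prediction region can be illustrated with plain split conformal prediction. This sketch is not the graph conformal method of the abstract: it treats a single location, uses the absolute residual as the nonconformity score, and assumes that a point forecaster and a held-out calibration set of (predicted, actual) pairs already exist. All function names are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample conformal quantile of the calibration residuals."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for the requested coverage.
        return math.inf
    return sorted(residuals)[rank - 1]


def prediction_interval(point_forecast, calibration_pairs, alpha=0.1):
    """Interval intended to cover the true outage number with probability >= 1 - alpha."""
    residuals = [abs(actual - predicted) for predicted, actual in calibration_pairs]
    q = conformal_quantile(residuals, alpha)
    # Outage numbers cannot be negative, so the lower end is clipped at zero.
    return max(point_forecast - q, 0.0), point_forecast + q
```

The coverage guarantee of this construction relies on the calibration and test points being exchangeable, which is a strong assumption for outage time series during extreme weather.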
Visible Light Communication (VLC) is a promising solution to address the growing demand for wireless data, leveraging the widespread use of light-emitting diodes (LEDs) as transmitters. However, its deployment is challenged by link blockages that cause connectivity outages. Optical reconfigurable intelligent surfaces (ORISs) have recently emerged as a solution to mitigate these disruptions. This work considers a multi-user VLC system and investigates the optimal association of ORISs to LEDs and users to minimize the outage probability while limiting the number of ORISs used. Numerical results from our proposed optimization algorithm demonstrate that using ORISs can reduce the outage probability by up to 85% compared to a no-ORIS scenario.
Cloud systems have become increasingly popular in recent years due to their flexibility and scalability. Each time cloud computing applications and services hosted on the cloud are affected by a cloud outage, users can experience slow response times, connection issues or total service disruption, resulting in a significant negative business impact. Outages usually comprise several concurring events or source causes, and therefore understanding the context of outages is a very challenging yet crucial first step toward mitigating and resolving outages. In current practice, on-call engineers with in-depth domain knowledge have to manually assess and summarize outages when they happen, which is time-consuming and labor-intensive. In this paper, we first present a large-scale empirical study investigating the way on-call engineers currently deal with cloud outages at Microsoft, and then present and empirically validate a novel approach (dubbed Oasis) to help the engineers in this task. Oasis is able to automatically assess the impact scope of outages as well as to produce human-readable summarization. Specifically, Oasis first assesses the impact scope of an outage by aggregating
Visible light communication (VLC) is a technology that complements radio frequency (RF) to fulfill the ever-increasing demand for wireless data traffic. The ubiquity of light-emitting diodes (LEDs), exploited as transmitters, increases the VLC market penetration and positions it as one of the most promising technologies to alleviate the spectrum scarcity of RF. However, VLC deployment is hindered by blockage causing connectivity outages in the presence of obstacles. Recently, optical reconfigurable intelligent surfaces (ORISs) have been considered to mitigate this problem. While prior works exploit ORISs for data or secrecy rate maximization, this paper studies the optimal placement of mirrors and ORISs, and the LED power allocation, for jointly minimizing the outage probability while keeping the lighting standards. We describe an optimal outage minimization framework and present solvable heuristics. We provide extensive numerical results and show that the use of ORISs may reduce the outage probability by up to 67% with respect to a no-mirror scenario and provide a gain of hundreds of kbit/J in optical energy efficiency with respect to the presented benchmark.
Measuring Internet outages is important to allow ISPs to improve their services, users to choose providers by reliability, and governments to understand the reliability of their infrastructure. Today's active outage detection provides good accuracy with tight temporal and spatial precision (around 10 minutes and IPv4 /24 blocks), but cannot see behind firewalls or into IPv6. Systems using passive methods can see behind firewalls, but usually relax spatial or temporal precision, reporting on whole countries or ASes at 5-minute precision, or /24 IPv4 blocks with 25-minute precision. We propose Durbin, a new approach to passive outage detection that adapts spatial and temporal precision to each network it studies, thus providing good accuracy and wide coverage with the best possible spatial and temporal precision. Durbin observes data from Internet services or network telescopes. Durbin studies /24 blocks to provide fine spatial precision, and we show it provides good accuracy even for short outages (5 minutes) in 600k blocks with frequent data sources. To retain accuracy for the 400k blocks with less activity, Durbin uses a coarser temporal precision of 25 minutes. Including short o
Phasor measurement units (PMUs) create ample real-time monitoring opportunities for modern power systems. Among them, line outage detection and identification remains a crucial but challenging task. Current works on outage identification succeed with full PMU deployment and single-line outages. Performance, however, degrades for multiple-line outages with partial system observability. We propose a novel framework for multiple-line outage identification using partial nodal voltage measurements. Using an alternating current (AC) power flow model, phase angle signatures of outages are extracted and used to group lines into minimal diagnosable clusters. Identification is then formulated as an underdetermined sparse regression problem solved by lasso. Tested on the IEEE 39-bus system with 25% and 50% PMU coverage, the proposed identification method is 93% and 80% accurate for single- and double-line outages. Our study suggests that the AC power flow is better at capturing outage patterns and sacrificing some precision could yield substantial improvement in identification accuracy. These findings could contribute to the development of future control schemes that help power systems resist and recove
This paper analyzes the outage probability of orthogonal time frequency space (OTFS) modulation under a lossy communication scenario. First, we introduce the channel model and the vector-form representation of OTFS used in this paper. Then, we derive an exact expression of the OTFS outage probability in lossy communication scenarios, using Shannon's lossy source-channel separation theorem. Because the channel is time-varying, calculating the exact outage probability is computationally expensive. Therefore, this paper derives a lower bound of the outage probability, which can be calculated relatively easily. Thus, given the distortion requirement and the number of resolvable paths, we can obtain a performance limit under the optimal condition as a reference. Finally, the experimental results of outage probability are obtained by the Monte-Carlo method and compared with the theoretical results calculated by the closed-form expression of the lower bound.
Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. Because timely operation is essential, we prove that the optimal parameters can be learned with convergence guarantees by leveraging the statistical and physical properties of voltage data. We evaluate our approach using many representative distribution grids and real load profiles with 17 outage configurations. The results show
This paper proposes a general analytical approach to derive the outage probability of hybrid automatic repeat request with incremental redundancy (HARQ-IR) over correlated fading channels in closed form. Unlike prior analyses, this work accounts for channel correlation, which is one of the main reasons the outage analysis becomes involved. Using the conditional Mellin transform, the outage probability of HARQ-IR over correlated Rayleigh fading channels is exactly expressed as a mixture of outage probabilities of HARQ-IR over independent Nakagami fading channels, where the weights are negative multinomial probabilities. A straightforward application of this result is asymptotic outage analysis, in which the asymptotic outage probability is obtained in a concise form and yields further insights. The asymptotic outage probability possesses some special properties that ease optimal power allocation and rate selection of HARQ-IR. Finally, numerical results are presented for validation and discussion.
Understanding the exact fault location in the post-event analysis is the key to improving the accuracy of outage management. Unfortunately, the fault location is not generally well documented during the restoration process, creating a big challenge for post-event analysis. By utilizing various data source systems, including outage management system (OMS) data, asset geospatial information system (GIS) data, and vehicle location data, this paper creates a novel method to pinpoint the outage location accurately to create additional insights for distribution operations and performance teams during the post-event analysis.
Expansion planning problems refer to the monetary and unit investment needed for energy production or storage. An inherent element in these problems is stochasticity in various aspects, such as the generation output of the units, climate change, or the frequency and duration of grid outages. For the latter in particular, outage modeling must be considered carefully when designing systems with distributed generation at their core, such as microgrids. In most studies so far, a single statistical distribution is used, such as a Poisson Process. However, by taking a closer look at the real outage data provided by the state of NY, it is observed that the outages do not seem to come from the same distribution. In some years, there is a huge spike in the average duration per outage and this is because of catastrophic events. Therefore, in this study we propose and test an alternative modeling for outage events. This alternative scheme will be based on the premise that outages can be broadly classified into two categories: regular and severe. Under this taxonomy, it can still be assumed that each type of event follows a Poisson Process but outages, in general, follow a
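The regular/severe taxonomy described in this abstract can be sketched as two independent Poisson processes simulated side by side. This is only an illustration under assumptions not taken from the source: outage durations are drawn from exponential distributions, time is measured in fractions of one year, and the function name, rates, and mean durations are hypothetical.

```python
import random


def simulate_year(rng, regular_rate, severe_rate, regular_mean_hours, severe_mean_hours):
    """Simulate one year of outages as (start time in years, category, duration in hours)."""
    events = []
    categories = (
        ("regular", regular_rate, regular_mean_hours),
        ("severe", severe_rate, severe_mean_hours),
    )
    for label, rate, mean_hours in categories:
        # A Poisson process has exponentially distributed inter-arrival times.
        t = rng.expovariate(rate)
        while t < 1.0:
            # Assumption: exponential durations with a category-specific mean.
            duration = rng.expovariate(1.0 / mean_hours)
            events.append((t, label, duration))
            t += rng.expovariate(rate)
    events.sort()
    return events
```

With a low `severe_rate` and a large `severe_mean_hours`, most simulated years contain only regular outages, while an occasional year shows a sharp jump in average duration per outage, which is the pattern the abstract attributes to catastrophic events.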
Outage probability and capacity of a class of block-fading MIMO channels are considered with partial channel distribution information. Specifically, neither the channel nor its distribution is known, but the latter is known to belong to a class of distributions where each member is within a certain distance (uncertainty) from a nominal distribution. Relative entropy is used as a measure of distance between distributions. Compound outage probability, defined as the min-max outage probability (min over the transmit signal distribution, max over the channel distribution class), is introduced and investigated. This generalizes the standard outage probability to the case of partial channel distribution information. Compound outage probability characterization (via one-dimensional convex optimization), its properties and approximations are given. It is shown to have two-regime behavior: when the nominal outage probability decreases (e.g. by increasing the SNR), the compound outage first decreases linearly down to a certain threshold (related to relative entropy distance) and then only logarithmically (i.e. very slowly), so that no significant further decrease is possible. The compound outage depends on t
In this paper, the outage probability and outage-based beam design for multiple-input multiple-output (MIMO) interference channels are considered. First, closed-form expressions for the outage probability in MIMO interference channels are derived under the assumption of Gaussian-distributed channel state information (CSI) error, and the asymptotic behavior of the outage probability as a function of several system parameters is examined by using the Chernoff bound. It is shown that the outage probability decreases exponentially with respect to the quality of CSI measured by the inverse of the mean square error of CSI. Second, based on the derived outage probability expressions, an iterative beam design algorithm for maximizing the sum outage rate is proposed. Numerical results show that the proposed beam design algorithm yields better sum outage rate performance than conventional algorithms such as interference alignment developed under the assumption of perfect CSI.