<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article"><?xmltex \makeatother\@nolinetrue\makeatletter?>
  <front>
    <journal-meta><journal-id journal-id-type="publisher">NHESS</journal-id><journal-title-group>
    <journal-title>Natural Hazards and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">NHESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Nat. Hazards Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1684-9981</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/nhess-23-1665-2023</article-id><title-group><article-title>Probabilistic and machine learning methods for uncertainty quantification in power outage prediction due to extreme events</article-title><alt-title>Probabilistic and machine learning methods</alt-title>
      </title-group><?xmltex \runningtitle{Probabilistic and machine learning methods}?><?xmltex \runningauthor{P. Arora and L. Ceferino}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Arora</surname><given-names>Prateek</given-names></name>
          <email>prateek40.a@gmail.com</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Ceferino</surname><given-names>Luis</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Civil and Urban Engineering Department, New York University, Brooklyn, NY 11201, USA</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Center for Urban Science and Progress, New York University, Brooklyn, NY 11201, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Prateek Arora (prateek40.a@gmail.com)</corresp></author-notes><pub-date><day>3</day><month>May</month><year>2023</year></pub-date>
      
      <volume>23</volume>
      <issue>5</issue>
      <fpage>1665</fpage><lpage>1683</lpage>
      <history>
        <date date-type="received"><day>22</day><month>September</month><year>2022</year></date>
           <date date-type="rev-request"><day>10</day><month>October</month><year>2022</year></date>
           <date date-type="rev-recd"><day>27</day><month>March</month><year>2023</year></date>
           <date date-type="accepted"><day>31</day><month>March</month><year>2023</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2023 Prateek Arora</copyright-statement>
        <copyright-year>2023</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023.html">This article is available from https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023.html</self-uri><self-uri xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023.pdf">The full text article is available as a PDF file from https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d1e96">Strong hurricane winds damage power grids and cause cascading power failures. Statistical and machine learning models have been proposed to predict the extent of power disruptions due to hurricanes.
Existing outage models use inputs including power system information, environmental parameters, and demographic parameters. This paper reviews the existing power outage models, highlighting their strengths and limitations. Existing models were developed and validated with data from a few utility companies and regions, limiting the extent of their applicability
across geographies and hurricane events.
Here, we train and validate these existing outage models with power outage data from multiple regions and hurricanes, including hurricanes Harvey (2017), Michael (2018), and Isaias (2020), in 1910 US cities. The dataset includes outages from 39 utility companies in Texas, 5 in Florida, 5 in New Jersey, and 11 in New York. We discuss the limited ability of state-of-the-art machine learning models to (1) make bounded outage predictions, (2) extrapolate predictions to high winds, and (3) account for physics-informed outage uncertainties at low and high winds.
For example, we observe that existing models can predict more outages than there are customers (in 19.8 % of cities, with an average overprediction ratio of 5.2) and do not capture the outage variance well at high winds, especially above 70 m s<inline-formula><mml:math id="M1" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. Our findings suggest that power outage models need further development to properly represent hurricane-induced outages.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e120">Hurricanes can cause significant damage to the power distribution systems, resulting in large power failures and losses of billions of US dollars <xref ref-type="bibr" rid="bib1.bibx59" id="paren.1"/>. Strong winds from hurricanes can destroy the exposed overhead distribution lines in a power grid and cause cascading power failures.
For example, Hurricane Isaias (2020) damaged old power infrastructure and caused more than 2 million power outages across the US. More than a million outages occurred in New Jersey (<uri>https://www.nytimes.com/2020/08/04/nyregion/isaias-ny.html</uri>, last access: 21 September 2022) even though Hurricane Isaias had transitioned to a tropical storm when it hit New Jersey, reducing its sustained winds to 25 m s<inline-formula><mml:math id="M2" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx37" id="paren.2"/>.
To address this issue, the US Department of Energy (DOE) has prioritized investments in power infrastructure resilience <xref ref-type="bibr" rid="bib1.bibx49" id="paren.3"/>. The US Senate passed the Grid Security Research and Development Act (2020), with a budget of 573 million US dollars to be spent over 2020–2025 to improve the grid's ability to withstand shocks and rapidly recover from disruptions
<xref ref-type="bibr" rid="bib1.bibx14" id="paren.4"/>.</p>
      <p id="d1e151">Hurricane-induced power interruptions can cause billions of dollars in losses and long-lasting impacts on vulnerable communities. Power outages caused by storms can last from several hours to weeks or even months (<ext-link xlink:href="https://www.reuters.com/business/environment/over-397000-still-without-power-florida-after-hurricane-ian-2022-10-04/">https://www.reuters.com/business/environment/</ext-link>, last access: 16 February 2023). Large-scale blackouts demonstrate the vulnerability of the power grid to hurricanes, e.g., (1) 8.1 million homes lost power during Superstorm Sandy (2012) <xref ref-type="bibr" rid="bib1.bibx58" id="paren.5"/>, (2) 1.7 million consumers in the southeastern United States lost power in the aftermath of Hurricane Michael (2018) <xref ref-type="bibr" rid="bib1.bibx17" id="paren.6"/>, and (3) Hurricane Ida (2021) was responsible for 1.2 million electrical outages <xref ref-type="bibr" rid="bib1.bibx2" id="paren.7"/>. Critical infrastructure<?pagebreak page1666?> systems such as hospitals and fire departments are especially vulnerable because they need power restored within a few hours of an outage to respond to the disaster <xref ref-type="bibr" rid="bib1.bibx10 bib1.bibx11" id="paren.8"/>.</p>
      <p id="d1e169">Utilities must first assess the vulnerabilities in their power system infrastructure to enhance their resilience to hurricanes.
Researchers have developed machine learning models that predict the extent of
hurricane-induced power outages, helping utilities evaluate these
vulnerabilities. These outage models use inputs including hurricane winds, power system information, environmental information, and demographic information. Outage prediction models can help utilities plan and place their resources before and during an extreme event so that emergency crews can rapidly restore failed power distribution systems <xref ref-type="bibr" rid="bib1.bibx3" id="paren.9"/>. These models can also reveal existing vulnerabilities, so that utilities can plan grid hardening before a hurricane damages the power grid <xref ref-type="bibr" rid="bib1.bibx50" id="paren.10"/>.</p>
      <p id="d1e178"><xref ref-type="bibr" rid="bib1.bibx39" id="text.11"/> developed a negative binomial generalized linear model (GLM) to predict power outages in North and South Carolina. This model used hurricane parameters such as maximum wind speed and the duration of wind speeds over 20 m s<inline-formula><mml:math id="M3" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>; environmental parameters including land cover, tree type, soil drainage properties, and precipitation; and utility information on the number of transformers and customers. The model was developed for a single utility, which limited its use in other regions. The model also included a storm indicator, which made it inapplicable to hurricanes with characteristics different from those in the training data. <xref ref-type="bibr" rid="bib1.bibx40" id="text.12"/> also presented an accelerated failure time model to estimate power outage duration. Next, <xref ref-type="bibr" rid="bib1.bibx41" id="text.13"/> investigated the spatial correlation of power outages through spatial generalized linear mixed models (GLMMs) but did not observe significant improvements in power outage prediction.</p>
      <p id="d1e202"><xref ref-type="bibr" rid="bib1.bibx28" id="text.14"/> also developed a negative binomial GLM to predict outages in the Gulf Coast based on extensive information on hurricane parameters, additional environmental indicators (e.g., precipitation, soil moisture, tree type, land cover), and system information (e.g., number of poles, number of transformers). This model did not include utility- or storm-specific indicators and instead used only generalizable features (e.g., wind speed, precipitation), making it applicable to any hurricane. <xref ref-type="bibr" rid="bib1.bibx27" id="text.15"/> also developed generalized additive models (GAMs) with the same input features as the GLMs.
GAMs showed improved accuracy over GLMs in power outage predictions because they can effectively model highly non-linear relationships between the input parameters and outages; e.g., precipitation and soil moisture can have non-linear effects on power outages <xref ref-type="bibr" rid="bib1.bibx27" id="paren.16"/>.</p>
      <p id="d1e213"><xref ref-type="bibr" rid="bib1.bibx23" id="text.17"/> and <xref ref-type="bibr" rid="bib1.bibx54" id="text.18"/> used decision tree models, namely classification and regression trees (CART) and Bayesian additive regression trees (BART), with additional topological and soil parameters to better capture the variability of power outages. Decision trees provide a flexible way to represent the non-linear relationship between input parameters and outages. More recently, researchers have used random forest <xref ref-type="bibr" rid="bib1.bibx5" id="paren.19"/>, a decision-tree-based machine learning method that is robust to outliers and noise, to predict power outages caused by storms. Random forest regression extends decision tree methods by fitting many decision trees in parallel, each to a bootstrap sample of the data, and averaging their predictions, which captures non-linearity and achieves high predictive accuracy. <xref ref-type="bibr" rid="bib1.bibx48" id="text.20"/> calibrated a random forest model to outage data from the Gulf Coast. <xref ref-type="bibr" rid="bib1.bibx48" id="text.21"/> used six input parameters to capture the damaging effects of trees on power lines: 3 s gust wind speed, duration of strong winds, soil moisture at different depths, the number of customers served, and tree-trimming practices.
Later, <xref ref-type="bibr" rid="bib1.bibx24" id="text.22"/> used only publicly available data to develop a hurricane outage prediction model that is independent of utility-specific input parameters, using random forest regression with 3 s gust wind speed, duration of strong winds, and the number of customers served.</p>
      <p id="d1e233"><xref ref-type="bibr" rid="bib1.bibx42" id="text.23"/> improved the accuracy of power outage predictions with random forest models by including information on tree species. <xref ref-type="bibr" rid="bib1.bibx62" id="text.24"/> used quantile regression forests <xref ref-type="bibr" rid="bib1.bibx38" id="paren.25"/> to predict power outages at different quantiles, providing prediction intervals. <xref ref-type="bibr" rid="bib1.bibx24" id="text.26"/> and <xref ref-type="bibr" rid="bib1.bibx43" id="text.27"/> developed a two-stage zero-inflated power outage prediction model to better account for zero outages. The first stage of such a model is a classification that predicts whether a location has outages. The second stage is a random forest regression that predicts the number of outages at the locations classified as having outages. <xref ref-type="bibr" rid="bib1.bibx66" id="text.28"/> used a random forest model with lidar-derived tree height data to predict power outages. <xref ref-type="bibr" rid="bib1.bibx57" id="text.29"/> developed a three-stage power outage prediction model to further improve the accuracy of power outage predictions. The first stage of the model is a binary classification to predict the location of outages; the intermediate second stage clusters outage locations into low, moderate, and large numbers of outages to address the strong right skewness of non-zero outage data points; and the third stage predicts the number of outages.</p>
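<p>The two-stage zero-inflated structure described above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes scikit-learn's <monospace>RandomForestClassifier</monospace> and <monospace>RandomForestRegressor</monospace> as the two stages, and the class name <monospace>TwoStageOutageModel</monospace> and its parameters are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


class TwoStageOutageModel:
    """Hypothetical two-stage (zero-inflated) outage model.

    Stage 1 classifies each location as outage / no outage.
    Stage 2 regresses the outage count, trained only on locations with outages.
    """

    def __init__(self, n_estimators=200, random_state=0):
        self.classifier = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state)
        self.regressor = RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        has_outage = y > 0
        # Stage 1: binary classification on all locations.
        self.classifier.fit(X, has_outage)
        # Stage 2: regression on the non-zero outage counts only.
        self.regressor.fit(X[has_outage], y[has_outage])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        predicted_outage = self.classifier.predict(X).astype(bool)
        counts = np.zeros(len(X))
        # Locations classified as "no outage" keep a predicted count of zero.
        if predicted_outage.any():
            counts[predicted_outage] = self.regressor.predict(X[predicted_outage])
        return counts
```
</preformat>
<p>Training the regressor only on non-zero counts keeps the many zero-outage locations from pulling the predicted counts toward zero; the classifier alone decides where outages occur.</p>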
      <p id="d1e257">Previously, researchers have used more complex power outage prediction models, namely neural networks, kernel methods such as support vector machines, and other tree-ensemble methods such as AdaBoost, which can model non-linear relationships between input parameters and outages <xref ref-type="bibr" rid="bib1.bibx73" id="paren.30"/>. <xref ref-type="bibr" rid="bib1.bibx34" id="text.31"/> employed AdaBoost to predict weather-related power outages. <xref ref-type="bibr" rid="bib1.bibx34" id="text.32"/> trained a separate model for each city for daily use and did not cover extreme weather outages. <xref ref-type="bibr" rid="bib1.bibx20" id="text.33"/> and <xref ref-type="bibr" rid="bib1.bibx19" id="text.34"/> used power grid component-level data with support vector machines. <xref ref-type="bibr" rid="bib1.bibx56" id="text.35"/> ranked power grid components (feeder failures, cables, joints, terminators, and<?pagebreak page1667?> transformers) based on their vulnerability to extreme weather events. <xref ref-type="bibr" rid="bib1.bibx29" id="text.36"/> used a neural network to predict the failure of power grid components for pre-storm planning. Such models require specialized high-resolution power grid component-level data for each city, which are not accessible given the data protocols of utility companies. <xref ref-type="bibr" rid="bib1.bibx61" id="text.37"/> used Twitter (<uri>https://twitter.com</uri>; last access: 13 January 2023) data to predict real-time outages. <xref ref-type="bibr" rid="bib1.bibx33" id="text.38"/> used repair log data, employing natural language processing with a recurrent neural network, to predict real-time outage durations. However, tweets and repair logs become available only after the hurricane has affected a city, so they cannot be used to predict outages for pre-event planning ahead of a storm.
Hence, data availability limits the applicability of these methods at a large scale for power outage predictions from extreme events.</p>
      <p id="d1e294">GLM <xref ref-type="bibr" rid="bib1.bibx39" id="paren.39"/>, GAM <xref ref-type="bibr" rid="bib1.bibx27" id="paren.40"/>, and random-forest-based power outage prediction models <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx43 bib1.bibx57" id="paren.41"/> provide outage predictions at a coarser resolution than component-level models. However, these models are mostly based on open-source, publicly available data and can be generalized at a larger scale to coastal cities in the United States. Hurricane-caused outages are mostly at the transmission level, which is responsible for city-wide outages <xref ref-type="bibr" rid="bib1.bibx6" id="paren.42"/> (<uri>https://poweroutage.us/faq</uri>, last access: 13 January 2023), rather than at the customer meter level. Therefore, predicting city-wide outages can still guide utilities in arranging crews and emergency backup power ahead of a storm. Hence, in this paper, we focus on GLM, GAM, and random-forest-based power outage prediction models.
Because utilities generally do not make power outage data publicly available, previous models were primarily calibrated to data from a few regions. For example, <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx40 bib1.bibx41" id="text.43"/> developed outage prediction models for North and South Carolina. <xref ref-type="bibr" rid="bib1.bibx24" id="text.44"/>, <xref ref-type="bibr" rid="bib1.bibx48" id="text.45"/>, and <xref ref-type="bibr" rid="bib1.bibx57" id="text.46"/> developed outage prediction models for the Gulf Coast. This paper addresses this gap by calibrating and validating existing models with extensive city-level outage data from New Jersey, New York, Florida, and Texas.
Thus, we investigate how power outage models generalize across the United States and focus on publicly available input variables to make our calibrated models widely applicable.</p>
      <p id="d1e325">In this paper, Sect. 2 describes the input features, data sources, and data preprocessing used in the model development for power outage prediction. Section 3 explains the selection of important and uncorrelated input features for model development. GLM, GAM, and random forest power outage models are described in Sects. 4, 5, and 6, respectively. Section 7 describes the results for calibrated models and compares performance with the previous models in the literature. Section 8 highlights the limitations of existing state-of-the-art power outage prediction models to (1) make bounded outage predictions, (2) extrapolate for high winds, and (3) account for physics-informed uncertainties at low and high winds. Section 9 summarizes the findings of this paper.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data description</title>
      <p id="d1e336">We acquired power outage data from PowerOutage (<uri>https://poweroutage.us/</uri>, last access: 21 September 2022), an organization that tracks and records utility outages at the city level across the US. Utilities' automatic outage reporting points can be disrupted during hurricanes, which can result in errors in outage counts (<uri>https://poweroutage.us/faq</uri>, last access: 13 January 2023). However, PowerOutage regularly receives updates from utilities
to keep the outage counts close to the actual outages. The data cover the power outages caused by Hurricane Isaias (2020) for 11 utilities in New York and 5 in New Jersey, by Hurricane Michael (2018) for 5 utilities in Florida, and by Hurricane Harvey (2017) for 39 utilities in Texas. Our dataset contains about 3.6 million outages in total. Figure <xref ref-type="fig" rid="Ch1.F1"/> shows outages caused by Hurricane Isaias in New Jersey in 2020. Figures S1, S2, and S3 in the Supplement present power outages across New York due to Isaias, Florida due to Michael, and Texas due to Harvey, respectively.</p>
      <p id="d1e350">Previously, <xref ref-type="bibr" rid="bib1.bibx39" id="text.47"/> developed outage models at the zip code level and for smaller 1 km <inline-formula><mml:math id="M4" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1 km grid cells. However, the final selected model was at the zip code level, as aggregating input parameters at the 1 km <inline-formula><mml:math id="M5" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1 km grid level led to larger errors in outage predictions.
Since then, outage models have been developed at coarser grids.
For example, <xref ref-type="bibr" rid="bib1.bibx28 bib1.bibx27" id="text.48"/>, <xref ref-type="bibr" rid="bib1.bibx23" id="text.49"/>, <xref ref-type="bibr" rid="bib1.bibx54" id="text.50"/>, and <xref ref-type="bibr" rid="bib1.bibx48" id="text.51"/> developed the models for 2.5 km <inline-formula><mml:math id="M6" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 3.7 km grid cells. <xref ref-type="bibr" rid="bib1.bibx62" id="text.52"/> developed outage models at the zip code level, <xref ref-type="bibr" rid="bib1.bibx43" id="text.53"/> predicted outages at the resolution of census tracts, and <xref ref-type="bibr" rid="bib1.bibx57" id="text.54"/> predicted outages at 5 km <inline-formula><mml:math id="M7" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 5 km grid cells.</p>
      <p id="d1e407">We calibrated outage models at city-level resolution, comparable to the most recent models by <xref ref-type="bibr" rid="bib1.bibx43" id="text.55"/> and <xref ref-type="bibr" rid="bib1.bibx57" id="text.56"/>. A model with finer resolution could be developed given higher-resolution power outage data and input parameters.
Here, we used data with reported outages for 1910 cities in
New York, New Jersey, Texas, and Florida.
Following the previous literature, we use the covariates listed in Table <xref ref-type="table" rid="Ch1.T1"/>.
We obtained model inputs at the city level for all covariates. The subsequent subsections describe the availability, resolution, and methods used to obtain each variable at the city level, as well as the data uncertainties that could inherently influence the accuracy of power outage predictions. The dataset thus has 1910 data points (one per city), which we divided into training and test sets with a ratio of <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:mn mathvariant="normal">80</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula>.</p><?xmltex \hack{\newpage}?>
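<p>For concreteness, the 80:20 split of the 1910 cities yields 1528 training and 382 test cities. The sketch below assumes a simple random split with a fixed seed; the text does not state whether the split was random or stratified, and the function name <monospace>split_train_test</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def split_train_test(n_samples, test_fraction=0.2, seed=42):
    """Randomly split row indices into training and test sets (80:20 by default)."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility (assumed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return shuffled[n_test:], shuffled[:n_test]


train_idx, test_idx = split_train_test(1910)
print(len(train_idx), len(test_idx))  # 1528 382
```
</preformat>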
<?pagebreak page1668?><sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Response variable</title>
      <p id="d1e439">We focused on two response variables: the number of outages, which equals the number of customers without power in a city, and the fraction of customers without power. GLM and GAM use Poisson and negative binomial distributions to model outage counts because these distributions describe discrete, non-negative variables. Random forests can model the fraction of households without power in a city, which is important for comparing impact levels across cities.</p>
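<p>The choice of response variable matters because count distributions have no upper bound. The sketch below, which assumes a Poisson count model and a hypothetical city with 1000 customers and a predicted mean of 950 outages, uses <monospace>scipy.stats.poisson.sf</monospace> to show that a count model assigns non-zero probability to more outages than customers, whereas the fraction of customers without power is bounded by construction. Function names are illustrative.</p>
<preformat preformat-type="code">
```python
from scipy.stats import poisson


def fraction_without_power(outages, customers):
    """Fraction of a city's customers without power (within [0, 1] for observed data)."""
    return outages / customers


def prob_exceeds_customers(mean_outages, customers):
    """P(Y > customers) for Y ~ Poisson(mean_outages).

    A Poisson (or negative binomial) count model has unbounded support, so this
    probability is positive even though observed outages cannot exceed customers.
    """
    return float(poisson.sf(customers, mean_outages))


# Hypothetical city: 1000 customers and a predicted mean of 950 outages.
print(fraction_without_power(950, 1000))  # 0.95
print(prob_exceeds_customers(950, 1000))  # roughly 0.05
```
</preformat>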

      <?xmltex \floatpos{t}?><fig id="Ch1.F1"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e444">Power outages in the aftermath of Hurricane Isaias (2020) in New Jersey at the city level. © OpenStreetMap.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f01.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Hurricane parameters</title>
      <p id="d1e461">The hurricane parameters considered in this study are the 3 s gust wind speed and the duration of strong winds over 20 m s<inline-formula><mml:math id="M9" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, as in previous research <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx28 bib1.bibx27 bib1.bibx23 bib1.bibx24 bib1.bibx57" id="paren.57"/>. The overhead distribution system is the component of a power grid most vulnerable to high hurricane winds <xref ref-type="bibr" rid="bib1.bibx49" id="paren.58"/>. Distribution lines and poles are often close to trees and lack considerable setbacks, so trees uprooted by strong winds often propagate damage to distribution lines. The poles are designed to sustain wind speeds of around 20 m s<inline-formula><mml:math id="M10" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx32" id="paren.59"/>. Thus, winds above 20 m s<inline-formula><mml:math id="M11" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> can cause substantial damage to electric poles. The 3 s gust wind speed and the duration of strong winds for the three hurricanes in the dataset were calculated with the complete wind profile model for tropical cyclones by <xref ref-type="bibr" rid="bib1.bibx13" id="text.60"/>. We determined the wind speed for each city at its centroid. Figure <xref ref-type="fig" rid="Ch1.F2"/> illustrates the variation of 3 s wind gusts in New Jersey during Hurricane Isaias. Because wind speed is an important driver of outages, any approximation in wind speed estimates could lead to errors in outage predictions. Section
S1 in the Supplement demonstrates that wind speeds at the city centroid are a reasonable estimate for determining city-wide outages.</p>
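<p>Given a modeled wind time series at a city centroid, the two hurricane covariates reduce to a peak value and a threshold-exceedance duration. The sketch below is illustrative only: it assumes an evenly spaced series (hourly by default), counts time steps strictly above 20 m s<sup>−1</sup>, and uses hypothetical function names; it does not reproduce the wind profile model itself, and which wind series (gust or sustained) enters the duration is an assumption.</p>
<preformat preformat-type="code">
```python
import numpy as np


def strong_wind_duration(wind_speed, time_step_hours=1.0, threshold=20.0):
    """Hours during which the wind speed (m s-1) strictly exceeds `threshold`."""
    wind_speed = np.asarray(wind_speed, dtype=float)
    return float(np.count_nonzero(wind_speed > threshold) * time_step_hours)


def peak_gust(gust_series):
    """Peak 3 s gust (m s-1) over the storm passage."""
    return float(np.max(gust_series))


# Hypothetical hourly 3 s gust series at one city centroid (m s-1).
gusts = [12.0, 18.0, 22.0, 27.0, 31.0, 26.0, 19.0, 15.0]
print(strong_wind_duration(gusts), peak_gust(gusts))  # 4.0 31.0
```
</preformat>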

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e517">Distribution of 3 s wind gust at the city level (mean: 36.95 m s<inline-formula><mml:math id="M12" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, standard deviation: 0.41 m s<inline-formula><mml:math id="M13" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>) across New Jersey before the arrival of Hurricane Isaias. © OpenStreetMap.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f02.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Land cover data</title>
      <p id="d1e558">Power grid patterns vary across land use classes, resulting in different outage mechanisms. For example, rural areas can suffer larger power outages because they have radial grid patterns, in which component failures can propagate further than in cities with gridded patterns <xref ref-type="bibr" rid="bib1.bibx52" id="paren.61"/>. We obtained the National Land Cover Data (NLCD) from the Multi-Resolution Land Characteristics Consortium, maintained by the United States Geological Survey (USGS) (<uri>https://www.mrlc.gov/viewer/</uri>, last access: 21 September 2022). The NLCD have inaccuracies in thematic pixel classification <xref ref-type="bibr" rid="bib1.bibx68" id="paren.62"/> that could introduce uncertainties in the land cover type; however, evaluating the effect of these inaccuracies on power outages is beyond the scope of this paper. The NLCD are available in raster format with a resolution of 30 m <inline-formula><mml:math id="M14" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 30 m. USGS classifies the original land cover data into 20 classes, which we reclassified into nine major classes to match previous power outage models: developed area, water area, barren land, forest area, scrub area, grasslands, pasture land, cultivated cropland, and wetlands. We used the spatial analyst tools in ArcGIS (a geographic information system software) <xref ref-type="bibr" rid="bib1.bibx21" id="paren.63"/> to clip the 30 m <inline-formula><mml:math id="M15" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 30 m land cover raster for each city.<?pagebreak page1669?> We then used zonal analysis within ArcGIS to determine the percentage of each city's area covered by the nine major land cover classes.</p>
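<p>The reclassification and zonal percentages can be illustrated outside ArcGIS with a short NumPy sketch on a raster already clipped to one city. The grouping of the 20 NLCD codes into the nine classes below is an assumption (in particular, perennial ice/snow, code 12, is placed with water); equal-area 30 m pixels are assumed, and any code outside the legend is treated as no data. Names are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np

# Assumed mapping of the 20 NLCD class codes to the nine classes used in the text.
NLCD_TO_GROUP = {
    11: "water", 12: "water",  # open water; perennial ice/snow (assumed grouping)
    21: "developed", 22: "developed", 23: "developed", 24: "developed",
    31: "barren",
    41: "forest", 42: "forest", 43: "forest",
    51: "scrub", 52: "scrub",
    71: "grassland", 72: "grassland", 73: "grassland", 74: "grassland",
    81: "pasture",
    82: "cultivated",
    90: "wetland", 95: "wetland",
}
GROUPS = ["developed", "water", "barren", "forest", "scrub",
          "grassland", "pasture", "cultivated", "wetland"]


def land_cover_percentages(city_raster):
    """Percentage of valid (equal-area) pixels falling in each of the nine groups."""
    codes = np.asarray(city_raster).ravel()
    codes = codes[np.isin(codes, list(NLCD_TO_GROUP))]  # drop no-data pixels
    if codes.size == 0:
        raise ValueError("no valid land cover pixels")
    counts = dict.fromkeys(GROUPS, 0)
    values, freqs = np.unique(codes, return_counts=True)
    for value, freq in zip(values, freqs):
        counts[NLCD_TO_GROUP[int(value)]] += int(freq)
    return {group: 100.0 * counts[group] / codes.size for group in GROUPS}


# Hypothetical clipped raster for one city; 0 marks pixels outside the city.
city = np.array([[21, 21, 22, 41, 41],
                 [11, 90, 82, 0, 0]])
print(land_cover_percentages(city)["developed"])  # 37.5
```
</preformat>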
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Precipitation and soil moisture data</title>
      <p id="d1e596">Precipitation and soil moisture have been used extensively in power outage models, e.g., <xref ref-type="bibr" rid="bib1.bibx28" id="text.64"/>, <xref ref-type="bibr" rid="bib1.bibx48" id="text.65"/>, and <xref ref-type="bibr" rid="bib1.bibx43" id="text.66"/>.
These parameters have non-linear effects on power outages, as deviations both below and above typical values can result in more outages.
Poles and overhead distribution lines in the vicinity of trees are susceptible to falling trees during strong hurricane winds. Wet soil conditions from high precipitation and soil moisture increase the likelihood that strong hurricane winds uproot trees and electric poles <xref ref-type="bibr" rid="bib1.bibx28 bib1.bibx48" id="paren.67"/>. Conversely, persistent drought conditions, e.g., low precipitation in the months before a hurricane, can weaken tree roots because of gaps in the soil layer,
making trees more susceptible to strong winds <xref ref-type="bibr" rid="bib1.bibx43" id="paren.68"/>.</p>
      <p id="d1e614">Precipitation and soil moisture data are available from the variable infiltration capacity (VIC) model of the North American Land Data Assimilation System Phase 2 (NLDAS2) <xref ref-type="bibr" rid="bib1.bibx72 bib1.bibx71" id="paren.69"/>.
However, the limited temporal resolution of the parameters required to compute soil moisture and precipitation could introduce errors in the final estimates of these variables <xref ref-type="bibr" rid="bib1.bibx67" id="paren.70"/>. Evaluating the limitations of the VIC model outputs from NLDAS2 <xref ref-type="bibr" rid="bib1.bibx72 bib1.bibx71" id="paren.71"/> for soil moisture and precipitation is beyond the scope of this study.
Precipitation and soil moisture are available hourly since 1979 at a resolution of 0.125<inline-formula><mml:math id="M16" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> <inline-formula><mml:math id="M17" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.125<inline-formula><mml:math id="M18" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>. We used nearest-neighbor interpolation to obtain soil moisture and precipitation at each city centroid by taking the value at the nearest grid point.</p>
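<p>On a regular 0.125° latitude–longitude grid, nearest-neighbor interpolation amounts to rounding the centroid's offset from the grid origin to the nearest cell index. The sketch assumes the NLDAS2 grid layout (cell centers starting at 25.0625° N and 124.9375° W, 224 rows by 464 columns) and a 2-D field indexed as (latitude, longitude); these constants and the function names are assumptions for illustration.</p>
<preformat preformat-type="code">
```python
import numpy as np

# Assumed NLDAS2 grid: cell centers from 25.0625 N and 124.9375 W at 0.125 deg spacing.
LAT0, LON0, STEP = 25.0625, -124.9375, 0.125
N_LAT, N_LON = 224, 464


def nearest_grid_index(lat, lon):
    """Row and column of the grid point nearest to (lat, lon), clipped to the grid."""
    row = int(np.clip(np.rint((lat - LAT0) / STEP), 0, N_LAT - 1))
    col = int(np.clip(np.rint((lon - LON0) / STEP), 0, N_LON - 1))
    return row, col


def value_at_centroid(field, lat, lon):
    """Nearest-neighbor value of a 2-D (lat, lon) field at a city centroid."""
    row, col = nearest_grid_index(lat, lon)
    return field[row, col]


# Hypothetical centroid near Brooklyn, NY.
print(nearest_grid_index(40.65, -73.95))  # (125, 408)
```
</preformat>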
      <p id="d1e652">Soil moisture from NLDAS2 is available for three depths: 0–10, 10–40, and 40–100 cm. We calculated daily soil moisture at each depth by averaging the hourly values. Because soil types differ across regions, typical soil moisture levels vary geographically. We therefore normalized soil moisture by converting it to percentiles, which express deviations from typical conditions on a common scale across geographies.
We fit Pearson type III distributions to the daily soil moisture time series of each of the three layers, estimating the distribution parameters by maximum likelihood estimation (MLE) <xref ref-type="bibr" rid="bib1.bibx31" id="paren.72"/>. We then evaluated the percentile of each daily value from the fitted distribution.
We denote the soil moisture percentiles for the three soil layers at 0–10, 10–40, and 40–100 cm depth as CDF1, CDF2, and CDF3, respectively.</p>
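      <p>The normalization can be sketched in Python as below. This is an illustration under stated assumptions, not the code used in this study: to stay dependency-free, it estimates the Pearson type III parameters by the method of moments instead of MLE (an MLE fit, e.g., with SciPy's <monospace>scipy.stats.pearson3.fit</monospace>, would change only the parameter estimates), evaluates the distribution as a shifted (and, for negative skew, reflected) gamma distribution, and falls back to the normal limit when the skew is close to zero. All function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import statistics


def daily_means(hourly):
    """Average consecutive blocks of 24 hourly values into daily values."""
    if len(hourly) % 24:
        raise ValueError("expected a whole number of days of hourly data")
    return [statistics.fmean(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]


def lower_reg_gamma(a, x, eps=1e-14, max_iter=10000):
    """Regularized lower incomplete gamma P(a, x): series below a + 1, continued fraction above."""
    if not x > 0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if a + 1.0 > x:
        # Power series: P = x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))
        term = 1.0 / a
        total = term
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if eps * total > abs(term):
                break
        return math.exp(log_prefix) * total
    # Continued fraction for Q = 1 - P (modified Lentz method)
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if tiny > abs(d):
            d = tiny
        c = b + an / c
        if tiny > abs(c):
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if eps > abs(delta - 1.0):
            break
    return 1.0 - math.exp(log_prefix) * h


def fit_pearson3_moments(values):
    """Method-of-moments estimates (mean, sd, skew): a stand-in for the MLE fit."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    skew = statistics.fmean(((v - mean) / sd) ** 3 for v in values)
    return mean, sd, skew


def pearson3_cdf(x, mean, sd, skew):
    """CDF of a Pearson type III distribution written as a shifted, signed-scale gamma."""
    if abs(skew) > 0.05:
        a = 4.0 / skew ** 2          # gamma shape
        beta = sd * skew / 2.0       # signed gamma scale (negative for negative skew)
        xi = mean - 2.0 * sd / skew  # location: lower bound if skew is positive, upper if negative
        p = lower_reg_gamma(a, (x - xi) / beta)
        return p if beta > 0 else 1.0 - p
    return statistics.NormalDist(mean, sd).cdf(x)  # near-zero skew: normal limit


def soil_moisture_percentiles(daily):
    """Percentile of each daily value under a Pearson III fitted to the whole series."""
    mean, sd, skew = fit_pearson3_moments(daily)
    return [pearson3_cdf(v, mean, sd, skew) for v in daily]
```
</preformat>
      <p>Applied to the daily series of each layer, <monospace>soil_moisture_percentiles</monospace> would yield the values denoted CDF1, CDF2, and CDF3 above.</p>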
      <p id="d1e658">Precipitation data are represented in the form of the standardized precipitation index (SPI) <xref ref-type="bibr" rid="bib1.bibx70 bib1.bibx25 bib1.bibx9" id="paren.73"/>. SPI for the months before the storm's impact can also serve as a proxy for the dryness or wetness of the soil. In this study, we calculated SPI for accumulation periods of 1, 3, 6, and 12 months by summing the hourly precipitation over each period.
SPI is computed in three steps. First, we fit a Pearson type III distribution to the accumulated precipitation time series using MLE. Second, we compute the percentile of each accumulated value from the fitted distribution. Third, we map this percentile to the corresponding quantile of the standard normal distribution (i.e., we apply the inverse standard normal CDF), which gives the SPI for that accumulation period. Figure <xref ref-type="fig" rid="Ch1.F3"/> illustrates the spatial variation of the 1-month SPI in New Jersey before the arrival of Hurricane Isaias.</p>
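      <p>The three SPI steps can be illustrated with the Python sketch below. It assumes the hourly precipitation has already been grouped by calendar month and fits a single distribution to all n-month totals (operational SPI products often fit each calendar month separately). To keep the example short and dependency-free, the percentile step uses empirical mid-rank plotting positions in place of the fitted Pearson type III percentiles; substituting a fitted CDF changes only that function. Function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import statistics


def monthly_totals(hourly_by_month):
    """Sum the hourly precipitation recorded in each month."""
    return [sum(hours) for hours in hourly_by_month]


def rolling_totals(monthly, n):
    """n-month accumulated precipitation ending at each month (the first n - 1 months are dropped)."""
    return [sum(monthly[i - n + 1:i + 1]) for i in range(n - 1, len(monthly))]


def empirical_percentiles(values):
    """Mid-rank plotting positions rank / (N + 1): a stand-in for the fitted Pearson III CDF."""
    n = len(values)
    out = []
    for v in values:
        below = sum(1 for u in values if v > u)
        ties = values.count(v)
        out.append((below + (ties + 1) / 2) / (n + 1))  # always strictly between 0 and 1
    return out


def spi(monthly, n):
    """SPI for an n-month accumulation period: percentile -> standard normal quantile."""
    totals = rolling_totals(monthly, n)
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p) for p in empirical_percentiles(totals)]
```
</preformat>
      <p>For example, <monospace>spi(monthly, 6)</monospace> applied to a list of monthly totals returns one 6-month SPI value per month from the sixth month onward; positive values indicate wetter-than-typical periods and negative values drier ones.</p>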
      <p id="d1e667">We also included the precipitation expected over the 7 d following landfall, as heavy rain can cause flooding that results in clustered outages <xref ref-type="bibr" rid="bib1.bibx43" id="paren.74"/>. The soil moisture percentiles and SPI values are taken from the day before the hurricane affects the power system.
Using these pre-event values allows the outage predictions to provide early warning, giving utilities and community members time to take precautionary steps before strong hurricane winds reach the area.</p>
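      <p>A small Python sketch of how these features could be aligned in time, assuming each variable is stored as a daily series in a dictionary keyed by date. Whether the 7 d window starts on the landfall day is our assumption, and the helper names are hypothetical.</p>
      <preformat preformat-type="code">
```python
from datetime import date, timedelta


def pre_event_value(daily_series, impact_day):
    """Value (e.g., CDF1 or SPI6) on the day before the hurricane affects the power system."""
    return daily_series[impact_day - timedelta(days=1)]


def precip_7d(daily_precip, landfall_day):
    """Total precipitation over the 7 days starting on the landfall day (assumed window)."""
    return sum(daily_precip[landfall_day + timedelta(days=k)] for k in range(7))


# Usage with made-up daily values keyed by date.
start = date(2000, 1, 1)
precip = {start + timedelta(days=k): float(k) for k in range(10)}
print(precip_7d(precip, date(2000, 1, 3)))  # 35.0
```
</preformat>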

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e675">Distribution of the 1-month SPI (mean: 0.88, standard deviation: 0.93) across cities in New Jersey before the arrival of Hurricane Isaias. © OpenStreetMap.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f03.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Root zone depth</title>
      <p id="d1e692">The effective root zone depth is the depth of soil from which plants and trees can effectively extract water and nutrients for growth (<uri>http://www.wood-database.com</uri>, last access: 21 September 2022). The deeper the effective root zone, the less likely trees are to fail in strong hurricane winds <xref ref-type="bibr" rid="bib1.bibx43" id="paren.75"/>. We include root zone depth as an input for outage prediction because it may indicate the hazard that falling trees pose to power lines. Root zone data are available from the United States<?pagebreak page1670?> Department of Agriculture (USDA) Gridded Soil Survey Geographic (gSSURGO) database <xref ref-type="bibr" rid="bib1.bibx60" id="paren.76"/> as raster data at 30 m <inline-formula><mml:math id="M19" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 30 m resolution. The city-level root zone depth is the average of the raster values within each city, computed with the spatial analysis tools in ArcGIS. Because outage data are available only at the city scale, we could not account for variations in root zone depth within a city, which limits the model's ability to capture within-city variation in tree root strength.</p>
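      <p>The city-level averaging corresponds to a zonal mean. The Python sketch below assumes the city boundary has already been rasterized onto the soil grid as a boolean mask and that missing cells carry a no-data flag; both the mask representation and the <monospace>NODATA</monospace> value are hypothetical, and the sketch does not reproduce the ArcGIS tool itself.</p>
      <preformat preformat-type="code">
```python
NODATA = -9999.0  # hypothetical no-data flag of the raster


def zonal_mean(raster, in_city, nodata=NODATA):
    """Mean of the raster cells that belong to the city, skipping no-data cells.

    raster:  2-D list of root zone depths, one value per 30 m x 30 m cell.
    in_city: 2-D list of booleans with the same shape, True where the cell belongs to the city.
    """
    values = [
        v
        for row_v, row_in in zip(raster, in_city)
        for v, inside in zip(row_v, row_in)
        if inside and v != nodata
    ]
    if not values:
        raise ValueError("no valid raster cells inside the city")
    return sum(values) / len(values)


depth = [[100.0, 120.0],
         [NODATA, 80.0]]
mask = [[True, True],
        [True, False]]
print(zonal_mean(depth, mask))  # 110.0
```
</preformat>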
</sec>
<sec id="Ch1.S2.SS6">
  <label>2.6</label><title>Percentage treed area</title>
      <p id="d1e719">In 2012, the USDA created the National Insect and Disease Risk Maps <xref ref-type="bibr" rid="bib1.bibx36" id="paren.77"/>, which include the area covered by trees as raster data at 240 m <inline-formula><mml:math id="M20" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 240 m resolution. We used these raster data to calculate the percentage of each city's area covered by trees with the spatial analysis tools in ArcGIS (<xref ref-type="bibr" rid="bib1.bibx21" id="altparen.78"/>). Figure <xref ref-type="fig" rid="Ch1.F4"/> illustrates the distribution of percent treed area in New Jersey.</p>
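      <p>A Python sketch of the percentage computation, assuming the tree raster stores the tree-covered fraction of each cell and that the fraction of each cell lying inside the city boundary has been precomputed. Weighting by this overlap keeps cells cut by the city boundary from counting in full. Both input encodings and the function name are hypothetical.</p>
      <preformat preformat-type="code">
```python
def percent_treed(tree_fraction, overlap):
    """Percentage of a city's area covered by trees.

    tree_fraction[i][j]: fraction (0 to 1) of raster cell (i, j) covered by trees
                         (assumed encoding; a binary tree / no-tree raster also works).
    overlap[i][j]:       fraction (0 to 1) of that cell lying inside the city boundary.
    All cells are assumed to have the same area (240 m x 240 m).
    """
    tree_area = 0.0
    city_area = 0.0
    for f_row, w_row in zip(tree_fraction, overlap):
        for f, w in zip(f_row, w_row):
            tree_area += f * w  # tree-covered area inside the city, in cell units
            city_area += w      # city area, in cell units
    if city_area == 0:
        raise ValueError("the city does not overlap the raster")
    return 100.0 * tree_area / city_area


trees = [[1.0, 0.0],
         [0.5, 1.0]]
share = [[1.0, 1.0],
         [1.0, 0.5]]
print(round(percent_treed(trees, share), 2))  # 57.14
```
</preformat>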

      <?xmltex \floatpos{t}?><fig id="Ch1.F4"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e739">Distribution of percent treed area (mean: 73.45 %, standard deviation: 23.04 %) across cities in New Jersey. © OpenStreetMap.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f04.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS7">
  <label>2.7</label><title>Elevation</title>
      <p id="d1e757">Previous studies have found that hurricane wind speeds (and thus damage) vary with surface topography <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx45 bib1.bibx23 bib1.bibx54 bib1.bibx43" id="paren.79"/>. Varying elevations could also introduce variations in precipitation <xref ref-type="bibr" rid="bib1.bibx47" id="paren.80"/> and wind speeds <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx45" id="paren.81"/> within a city. Thus, we used the median and mean elevation at each city centroid, obtained with nearest-neighbor interpolation, as topographic variables to capture elevation differences across cities.
We obtained these parameters from the 30 arcsec resolution digital elevation model (DEM) developed by the USGS as part of the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) <xref ref-type="bibr" rid="bib1.bibx15" id="paren.82"/>.
Note that the city-level resolution of the available outage data limits our ability to account for varying elevations within a city.
Future studies with high-resolution outage data could account for these within-city variations in elevation.</p>
</sec>
<sec id="Ch1.S2.SS8">
  <label>2.8</label><title>Density data</title>
      <p id="d1e780">Demographic data are available from the American Community Survey (ACS) (<uri>https://www.census.gov/programs-surveys/acs</uri>, last access: 21 September 2022). The ACS collects demographic data for each US census tract. The ACS started data collection in 2010, and we used data from 2019. We used population density
because it indicates the number of distribution poles and other system components exposed to winds <xref ref-type="bibr" rid="bib1.bibx6" id="paren.83"/>.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e792">Parameters used to build the power outage prediction models. All variables are rescaled to the city level. Parameters are grouped into categories separated by horizontal lines. To minimize correlation across parameters, we selected at most one variable from each category.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">Previous</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Feature</oasis:entry>
         <oasis:entry colname="col2">Abbreviation</oasis:entry>
         <oasis:entry colname="col3">Data source</oasis:entry>
         <oasis:entry colname="col4">applications</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Outages</oasis:entry>
         <oasis:entry colname="col2">Outages</oasis:entry>
         <oasis:entry colname="col3"><uri>https://poweroutage.us/</uri> (last access: 21 September 2022)</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">3 s gust wind speed</oasis:entry>
         <oasis:entry colname="col2">Vmax<inline-formula><mml:math id="M40" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">Hurricane parameters<inline-formula><mml:math id="M41" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">1</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M42" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">9</mml:mn><mml:mo>-</mml:mo><mml:mn mathvariant="normal">18</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Duration of strong winds</oasis:entry>
         <oasis:entry colname="col2">Duration</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent developed area</oasis:entry>
         <oasis:entry colname="col2">Developed</oasis:entry>
         <oasis:entry colname="col3">National Land Cover Data<inline-formula><mml:math id="M43" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M44" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">9</mml:mn><mml:mo>-</mml:mo><mml:mn mathvariant="normal">18</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent water area</oasis:entry>
         <oasis:entry colname="col2">Water</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent barren area</oasis:entry>
         <oasis:entry colname="col2">Barren</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent forest area</oasis:entry>
         <oasis:entry colname="col2">Forest</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent scrub area</oasis:entry>
         <oasis:entry colname="col2">Scrub</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent grassland area</oasis:entry>
         <oasis:entry colname="col2">Grassland</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent pasture area</oasis:entry>
         <oasis:entry colname="col2">Pasture</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent crops cultivated area</oasis:entry>
         <oasis:entry colname="col2">Crops</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Percent wetlands area</oasis:entry>
         <oasis:entry colname="col2">Wetlands</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Standardized precipitation index 1 month</oasis:entry>
         <oasis:entry colname="col2">SPI1</oasis:entry>
         <oasis:entry colname="col3">NLDAS2 <inline-formula><mml:math id="M45" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M46" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">11</mml:mn><mml:mo>-</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Standardized precipitation index 3 months</oasis:entry>
         <oasis:entry colname="col2">SPI3</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Standardized precipitation index 6 months</oasis:entry>
         <oasis:entry colname="col2">SPI6<inline-formula><mml:math id="M47" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Standardized precipitation index 12 months</oasis:entry>
         <oasis:entry colname="col2">SPI12</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil moisture first layer</oasis:entry>
         <oasis:entry colname="col2">CDF1<inline-formula><mml:math id="M48" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">NLDAS2<inline-formula><mml:math id="M49" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M50" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">11</mml:mn><mml:mo>-</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil moisture second layer</oasis:entry>
         <oasis:entry colname="col2">CDF2</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Soil moisture third layer</oasis:entry>
         <oasis:entry colname="col2">CDF3</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">7 d precipitation</oasis:entry>
         <oasis:entry colname="col2">Precip</oasis:entry>
         <oasis:entry colname="col3">NLDAS2<inline-formula><mml:math id="M51" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M52" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">9</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Root zone depth</oasis:entry>
         <oasis:entry colname="col2">Rzone<inline-formula><mml:math id="M53" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">Gridded Soil Survey<inline-formula><mml:math id="M54" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">5</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M55" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">15</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Percent treed area</oasis:entry>
         <oasis:entry colname="col2">Trees<inline-formula><mml:math id="M56" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">NIDRM<inline-formula><mml:math id="M57" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M58" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">15</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mean elevation</oasis:entry>
         <oasis:entry colname="col2">Mean_Ele</oasis:entry>
         <oasis:entry colname="col3">GMTED2010<inline-formula><mml:math id="M59" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">7</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M60" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Median elevation</oasis:entry>
         <oasis:entry colname="col2">Median_Ele</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Population density</oasis:entry>
         <oasis:entry colname="col2">Pop_Den<inline-formula><mml:math id="M61" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">American Community Survey<inline-formula><mml:math id="M62" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">8</mml:mn></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M63" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mn mathvariant="normal">14</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d1e795"><inline-formula><mml:math id="M21" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">1</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx13" id="text.84"/>. <inline-formula><mml:math id="M22" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula> <uri>https://www.mrlc.gov/viewer/</uri>, last access: 21 September 2022. <inline-formula><mml:math id="M23" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx72" id="text.85"/>. <inline-formula><mml:math id="M24" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">4</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx71" id="text.86"/>. <inline-formula><mml:math id="M25" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">5</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx60" id="text.87"/>. <inline-formula><mml:math id="M26" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx36" id="text.88"/>. <inline-formula><mml:math id="M27" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">7</mml:mn></mml:msup></mml:math></inline-formula>  <xref ref-type="bibr" rid="bib1.bibx15" id="text.89"/>. <inline-formula><mml:math id="M28" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">8</mml:mn></mml:msup></mml:math></inline-formula> <uri>https://www.census.gov/programs-surveys/acs</uri> (last access: 21 September 2022).  <inline-formula><mml:math id="M29" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">9</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx39" id="text.90"/>. 
<inline-formula><mml:math id="M30" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx41" id="text.91"/>. <inline-formula><mml:math id="M31" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">11</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx28" id="text.92"/>. <inline-formula><mml:math id="M32" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">12</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx27" id="text.93"/>. <inline-formula><mml:math id="M33" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">13</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx48" id="text.94"/>. <inline-formula><mml:math id="M34" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">14</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx23" id="text.95"/>. <inline-formula><mml:math id="M35" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">15</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx43" id="text.96"/>. <inline-formula><mml:math id="M36" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">16</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx57" id="text.97"/>. <inline-formula><mml:math id="M37" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">17</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx66" id="text.98"/>. <inline-formula><mml:math id="M38" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">18</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx65" id="text.99"/>. 
<inline-formula><mml:math id="M39" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">19</mml:mn></mml:msup></mml:math></inline-formula> Variables finally selected for model development after performing feature selection. </p></table-wrap-foot><?xmltex \gdef\@currentlabel{1}?></table-wrap>

</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Model development: feature selection</title>
      <p id="d1e1672">Machine learning models with high-dimensional input data can be hard to train, especially when datasets are sparse, as is the case for infrastructure failures.
Input features can also be correlated, leading to larger generalization errors.
That is, the model may fit the training data well (i.e., with small errors) yet show significant errors when tested on additional data.
Correlated features can also lead to a flawed understanding of the relationship between the inputs and the predicted outages
<xref ref-type="bibr" rid="bib1.bibx63" id="paren.100"/>.</p>
      <p id="d1e1678">Feature selection, also called variable selection, is an essential step in machine learning model development to select relevant variables and discard redundant and highly correlated ones
<xref ref-type="bibr" rid="bib1.bibx7" id="paren.101"/>.
We performed feature selection for outage prediction in two steps. First, we performed forward selection with linear regression <xref ref-type="bibr" rid="bib1.bibx35" id="paren.102"/> to obtain an initial ranking of feature importance (Fig. <xref ref-type="fig" rid="Ch1.F5"/>).
A linear model might not be the best model for forecasting power outages; however, it can provide initial insights into how outages depend on each input feature.
We started with an empty feature set and added features one at a time.
At each step, we added the variable that led to the largest increase in <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>.
Our results show that, as expected, wind speed and the duration of strong winds have the largest effect on power outages. We found that precipitation and soil moisture are important for outage prediction even in a linear regression, suggesting that their relevance could be even higher in non-linear models. We also found that population density is critical for outage prediction, which could be explained by a positive correlation between population density and transformer density, as described in <xref ref-type="bibr" rid="bib1.bibx40" id="text.103"/>.</p>
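      <p>The forward-selection loop can be sketched in plain Python as follows. This is an illustrative re-implementation, not the code used in this study: it fits ordinary least squares with an intercept through the normal equations (a library least-squares routine would be numerically preferable), ranks candidate features by in-sample <italic>R</italic><sup>2</sup> as described above, and assumes the features are not exactly collinear. Function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols_r2(columns, y):
    """R^2 of an ordinary least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(features, y):
    """Greedy forward selection ranked by R^2.

    features: dict mapping a feature name to its list of values.
    Returns [(feature, R^2 after adding it), ...] in the order features were added.
    """
    selected, history = [], []
    remaining = dict(features)
    while remaining:
        base = [features[s] for s in selected]
        scores = {name: ols_r2(base + [col], y) for name, col in remaining.items()}
        best = max(scores, key=scores.get)  # largest R^2 after adding this feature
        selected.append(best)
        history.append((best, scores[best]))
        del remaining[best]
    return history
```
</preformat>
      <p>The returned history gives both the ranking and the cumulative <italic>R</italic><sup>2</sup>, so the increment contributed by each feature is the difference between consecutive entries.</p>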

      <?xmltex \floatpos{t}?><fig id="Ch1.F5"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e1705">Forward selection: ranking of input parameters by their importance in explaining the variability in outages. Feature descriptions are shown in Table <xref ref-type="table" rid="Ch1.T1"/>.</p></caption>
        <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f05.png"/>

      </fig>

      <p id="d1e1717">In the second step, we analyzed the correlations between the input parameters. Figure S4 shows the correlation coefficient for each pair of variables.
We found that input features within the same category in Table <xref ref-type="table" rid="Ch1.T1"/> are highly correlated.
For example, the maximum wind speed and duration of strong winds, which are at the top of the ranking in forward selection (Fig. <xref ref-type="fig" rid="Ch1.F5"/>), have a correlation coefficient of 0.89 (Fig. S4).
Hence, we kept only maximum wind speed as an input feature, since it ranks higher than the duration of strong winds in the forward selection.
We conducted a similar analysis for each category listed in Table <xref ref-type="table" rid="Ch1.T1"/> to select the input variable with the strongest predictive power in that category.
We did not include parameters from the elevation and land cover categories because of their low importance in our results: they contribute less than <inline-formula><mml:math id="M65" display="inline"><mml:mn mathvariant="normal">1</mml:mn></mml:math></inline-formula> % to <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>.
Finally, we selected the following seven variables, which are used throughout the rest of the paper:
<list list-type="bullet"><list-item>
      <p id="d1e1747">3 s gust wind speed</p></list-item><list-item>
      <p id="d1e1751">7 d precipitation</p></list-item><list-item>
      <p id="d1e1755">standardized precipitation index 6 months</p></list-item><list-item>
      <p id="d1e1759">soil moisture first layer</p></list-item><list-item>
      <p id="d1e1763">population density</p></list-item><list-item>
      <p id="d1e1767">percent treed area</p></list-item><list-item>
      <p id="d1e1771">root zone depth.</p></list-item></list></p><?xmltex \hack{\newpage}?>
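      <p>The second selection step can be sketched in Python as below, given the forward-selection history (feature and cumulative <italic>R</italic><sup>2</sup>) and a mapping from each feature to its Table 1 category. The correlation threshold used to flag redundant pairs (0.8) is a hypothetical choice, since the text reports correlations such as 0.89 without stating a cutoff; the 1 % minimum <italic>R</italic><sup>2</sup> gain mirrors the criterion above, although measuring a category's contribution as the gain of its best-ranked feature in the forward-selection sequence is our assumption. Function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlated_pairs(features, category, threshold=0.8):
    """Within-category feature pairs whose |r| exceeds the (hypothetical) threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if category[a] == category[b]:
                r = pearson_r(features[a], features[b])
                if abs(r) > threshold:
                    pairs.append((a, b, r))
    return pairs


def one_per_category(history, category, min_gain=0.01):
    """Keep the best-ranked feature of each category.

    history: [(feature, cumulative R^2)] in forward-selection order.
    A category is dropped when its best-ranked feature adds less than min_gain to R^2.
    """
    chosen, seen, previous = [], set(), 0.0
    for name, r2 in history:
        gain, previous = r2 - previous, r2
        cat = category[name]
        if cat in seen:
            continue  # a better-ranked feature of this category was already considered
        seen.add(cat)
        if gain >= min_gain:
            chosen.append(name)
    return chosen
```
</preformat>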
</sec>
<?pagebreak page1672?><sec id="Ch1.S4">
  <label>4</label><title>Generalized linear models</title>
      <p id="d1e1783">Generalized linear models (GLMs) are a generalization of ordinary linear regression. GLMs allow us to use a flexible link function to relate a linear model (of the input variables) to the response variable <xref ref-type="bibr" rid="bib1.bibx16" id="paren.104"/>.
Unlike ordinary linear regressions, GLMs do not
assume homoscedasticity, i.e., that the variance of the response variable is constant across values of the input variables. Homoscedasticity does not hold for the number of customers without power: this output variable is a non-negative count, and when damage to power infrastructure is negligible (e.g., a weak storm), both its mean and its variance should approach zero <xref ref-type="bibr" rid="bib1.bibx16" id="paren.105"/>.</p>
      <p id="d1e1792">In addition, GLMs can represent the data with distributions other than the normal distribution assumed in ordinary linear regression. Outage counts have a lower bound of zero, which a normal distribution cannot capture. Thus, previous researchers have used the following distributions to represent outages with GLMs.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Poisson GLMs</title>
      <p id="d1e1802">Poisson regression models, a category of GLMs, are applicable to non-negative count data with independent observations. Outages are modeled as a Poisson random variable:
            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M67" display="block"><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>;</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mi mathvariant="normal">!</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M68" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> is the number of outages in a city.
The Poisson distribution is described by the parameter <inline-formula><mml:math id="M69" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula>, the mean number of outages in a city.
A log link connects the parameter <inline-formula><mml:math id="M70" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> to the input variables, which ensures that <inline-formula><mml:math id="M71" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> is positive:
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M72" display="block"><mml:mrow><mml:mi mathvariant="normal">ln</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:mi>X</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M73" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> is the vector of regression coefficients, estimated from historical outages caused by extreme events, typically through maximum likelihood estimation (MLE).
MLE finds the value of <inline-formula><mml:math id="M74" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> that maximizes the likelihood, i.e., the probability of observing the data under the model.
Readers can refer to <xref ref-type="bibr" rid="bib1.bibx16" id="text.106"/> for more information on MLE for GLMs. We use the <italic>glm</italic> function of the R <italic>stats</italic> package (<uri>https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm</uri>, last access: 21 September 2022) to fit the Poisson GLM to our power outage data.</p>
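      <p>To illustrate Eqs. (1) and (2), the following minimal Python sketch fits a Poisson GLM by maximum likelihood using iteratively reweighted least squares (IRLS). R's glm also uses IRLS by default, but this sketch uses numpy and is not the code used in this study. The synthetic data, the single input feature, and the function name fit_poisson_glm are hypothetical. The true coefficients are chosen only so that the estimates can be checked.</p>

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of ln(mu) = X @ beta (Eq. 2) by IRLS.

    X must already include a column of ones for the intercept.
    """
    mu = y + 0.5                     # starting values that avoid log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # IRLS weights for a Poisson model with log link are mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)             # the log link keeps mu positive
        if tol > np.max(np.abs(beta_new - beta)):
            return beta_new
        beta = beta_new
    return beta


# Synthetic example with one hypothetical hazard feature
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 1.2 * x))
beta_hat = fit_poisson_glm(X, y)
print(beta_hat)  # close to the true coefficients [0.5, 1.2]
```

      <p>At convergence, the score equations X<sup>T</sup>(y − μ) = 0 hold. This is the defining property of the Poisson maximum likelihood estimate with a log link.</p>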
      <p id="d1e1922">The variance in a Poisson distribution is equal to the mean, i.e.,  Var<inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow></mml:math></inline-formula>.
Thus, the variance grows as <inline-formula><mml:math id="M76" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> increases.
However, previous research has found that the variance of historical outage counts is substantially larger than the mean <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx28 bib1.bibx27" id="paren.107"/>, a phenomenon known as overdispersion in Poisson regressions <xref ref-type="bibr" rid="bib1.bibx16" id="paren.108"/>.
Overdispersion may arise from dependence among the observed counts, especially when events occur in clusters <xref ref-type="bibr" rid="bib1.bibx16" id="paren.109"/>.
The Poisson distribution describes counts of independent events, e.g., customers without power <xref ref-type="bibr" rid="bib1.bibx40" id="paren.110"/>.
In contrast, multiple outages in a city can result from the failure of the same power grid component <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx28" id="paren.111"/>.
Thus, outage counts are not independent.</p>
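      <p>A small Monte Carlo sketch in Python illustrates the effect of dependence. For illustration only, it assumes that the number of failed grid components in a city is Poisson distributed and that each failure causes a geometrically distributed cluster of outages. All parameter values are hypothetical. Independent outages give a variance-to-mean ratio near 1, whereas clustered outages with the same mean are strongly overdispersed.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_cities = 20000
lam = 3.0           # hypothetical mean number of failed grid components per city
mean_cluster = 10   # hypothetical mean number of outages caused by one failure

# Independent outages: a Poisson count, so the variance equals the mean
independent = rng.poisson(lam * mean_cluster, n_cities)

# Dependent outages: each failed component causes a whole cluster of outages
failures = rng.poisson(lam, n_cities)
clustered = np.array([rng.geometric(1.0 / mean_cluster, k).sum() for k in failures])

for name, y in [("independent", independent), ("clustered", clustered)]:
    print(name, "mean:", round(y.mean(), 1), "variance/mean:", round(y.var() / y.mean(), 1))
```

      <p>For such a compound Poisson count with cluster size S, the variance-to-mean ratio is E[S²]/E[S]. For these parameters, this ratio equals 19.</p>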
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Negative binomial GLMs</title>
      <?pagebreak page1673?><p id="d1e1972">The negative binomial GLM is a hierarchical model that can account for overdispersion in power outage count predictions <xref ref-type="bibr" rid="bib1.bibx16" id="paren.112"/>. Negative binomial GLMs are based on a Poisson–gamma mixture distribution; i.e., the outage count <inline-formula><mml:math id="M77" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> is distributed as a Poisson random variable whose mean is itself random:
            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M78" display="block"><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>;</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="italic">τ</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="italic">τ</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi>y</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mi mathvariant="normal">!</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where the mean of the Poisson distribution is the product of <inline-formula><mml:math id="M79" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M80" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula>. The additional random variable <inline-formula><mml:math id="M81" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula> accounts for the extra variance; it has a mean equal to 1 and follows a gamma distribution:
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M82" display="block"><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">τ</mml:mi><mml:mo>;</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:msup><mml:mi mathvariant="italic">τ</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="italic">τ</mml:mi><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M83" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> is the overdispersion parameter and <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo>(</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the gamma function; thus, the variance of <inline-formula><mml:math id="M85" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula> equals <inline-formula><mml:math id="M86" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>.
Marginalizing over the random variable <inline-formula><mml:math id="M87" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula> yields
            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M88" display="block"><mml:mrow><?xmltex \hack{\hbox\bgroup\fontsize{8.7}{8.7}\selectfont$\displaystyle}?><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>;</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi mathvariant="italic">μ</mml:mi><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mi>y</mml:mi></mml:msup><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi mathvariant="italic">μ</mml:mi><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mfenced><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><?xmltex \hack{$\egroup}?></mml:mrow></mml:math></disp-formula>
          which is a negative binomial distribution with variance <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>+</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mi mathvariant="italic">μ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>.
This variance exceeds that of the Poisson GLM by an additional term proportional to <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">μ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>. Thus, negative binomial GLMs can represent substantially larger variances.
The mean <inline-formula><mml:math id="M91" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> is linked to the input variables as in Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>). Then, <inline-formula><mml:math id="M92" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M93" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> are estimated through MLE using the <italic>glm</italic> function in R (<uri>https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm</uri>, last access: 21 September 2022).</p>
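      <p>The Poisson–gamma result can be checked numerically. The sketch below evaluates Eq. (5) in log space using Python's standard library. For illustrative values of μ and k (not fitted values), it confirms that the probabilities sum to 1, that the mean equals μ, and that the variance equals μ + kμ². The function name nb_pmf is hypothetical.</p>

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial pmf of Eq. (5), evaluated in log space for stability."""
    r = 1.0 / k
    p = mu / (mu + r)
    log_pmf = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
               + y * math.log(p) + r * math.log1p(-p))
    return math.exp(log_pmf)


mu, k = 12.0, 0.5     # illustrative values, not fitted ones
ys = range(2000)      # truncation of the infinite support; the tail is negligible here
probs = [nb_pmf(y, mu, k) for y in ys]
total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
print(total, mean, var, mu + k * mu ** 2)   # var matches mu + k * mu**2 = 84
```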
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Zero-inflated GLMs</title>
      <p id="d1e2371">Researchers have also developed zero-inflated outage prediction models to improve statistical performance for imbalanced data, e.g., when many data points have no outages <xref ref-type="bibr" rid="bib1.bibx43 bib1.bibx57" id="paren.113"/>.
The zero-inflated model makes predictions at two levels <xref ref-type="bibr" rid="bib1.bibx43 bib1.bibx57" id="paren.114"/>. The first level, which can be a logistic regression or a decision tree model, determines whether there is at least one power outage <xref ref-type="bibr" rid="bib1.bibx26" id="paren.115"/>. It predicts “0” in case of no outages and “1” in case of at least one outage. The second level is a regression model that predicts the number of outages for the cases where the first level predicted “1”.
In this paper, we do not fit zero-inflated models because our data are balanced; i.e., we observe at least one outage in each city.</p>
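      <p>Zero-inflated models are not fitted here, but the two-level prediction logic described above can be sketched in a few lines of Python. The classifier probabilities, the count predictions, and the 0.5 threshold are hypothetical inputs. In practice, they would come from a fitted logistic regression (or decision tree) and a fitted count regression.</p>

```python
import numpy as np


def two_level_predict(p_any_outage, count_pred, threshold=0.5):
    """Two-level (zero-inflation style) prediction.

    Level 1: a classifier's probability of at least one outage is turned into
    "1" (at least one outage) or "0" (no outage) with a hypothetical threshold.
    Level 2: the count regression is used only where level 1 predicts "1".
    """
    p_any_outage = np.asarray(p_any_outage, dtype=float)
    count_pred = np.asarray(count_pred, dtype=float)
    first_level = (p_any_outage >= threshold).astype(int)
    return np.where(first_level == 1, count_pred, 0.0)


# Hypothetical outputs of a logistic regression (level 1) and a count model (level 2)
p_any = [0.10, 0.70, 0.95]
counts = [4.0, 12.0, 150.0]
print(two_level_predict(p_any, counts).tolist())   # [0.0, 12.0, 150.0]
```

      <p>This hard-classification scheme follows the description above. A zero-inflated Poisson mixture would instead predict the expected count (1 − π)μ, where π is the probability of a structural zero.</p>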
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Generalized additive models</title>
      <p id="d1e2393">GLMs assume a linear relationship between the logarithm of the mean number of outages and the input parameters (Eq. 2). However, previous research has shown that these relationships are non-linear <xref ref-type="bibr" rid="bib1.bibx27" id="paren.116"/>. Such relationships can be modeled with non-parametric extensions of GLMs <xref ref-type="bibr" rid="bib1.bibx74" id="paren.117"/>.
Generalized additive models (GAMs) capture non-linear relationships using smoothing functions
          <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M94" display="block"><mml:mrow><mml:mi mathvariant="normal">ln</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:munder><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M95" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> is the mean number of outages in a city, <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:math></inline-formula> is an individual input parameter, <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is the intercept, and <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are the smoothing functions for each input parameter.
Examples of smoothing functions include regression splines, B-splines, and P-splines. Splines of any order can be used to fit GAMs, but accuracy improves negligibly beyond the quartic degree <xref ref-type="bibr" rid="bib1.bibx74" id="paren.118"/>.
Thus, we used quartic-order polynomials for all input variables except maximum wind speed.
For this variable, we reduced the polynomial order to 1. This guarantees a monotonic relationship between wind speed and outages (increasing for a positive fitted coefficient), as expected from the structural behavior of infrastructure under extreme loads.
We estimated the GAM parameters by MLE through iteratively reweighted least squares (IRLS) <xref ref-type="bibr" rid="bib1.bibx69" id="paren.119"/>, using the <italic>mgcv</italic> package in R.
The Poisson GAM assumes that the number of outages follows the Poisson distribution in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), with the link function given by Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>).
Similarly, negative binomial GAMs assume the Poisson–gamma mixture distribution of Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) for the number of outages, with the link function given by Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>).</p>
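      <p>The polynomial basis described above can be illustrated with a simplified Python sketch. Each input is expanded into polynomial terms (degree 4, or degree 1 for maximum wind speed), and a Poisson model with a log link is fitted by IRLS. This is an unpenalized approximation for illustration only. By default, mgcv fits penalized smooths and selects their smoothness automatically, so this sketch does not reproduce the models in this study. The synthetic data and the input names vmax and spi6 are hypothetical.</p>

```python
import numpy as np


def poly_basis(x, degree):
    """Columns z, z**2, ..., z**degree of the standardized input z."""
    z = (x - x.mean()) / x.std()
    return np.column_stack([z ** d for d in range(1, degree + 1)])


def design_matrix(features, degrees):
    """Intercept column plus one polynomial block f_j(x_j) per input, as in Eq. (6)."""
    n = len(next(iter(features.values())))
    blocks = [np.ones((n, 1))]
    for name, x in features.items():
        blocks.append(poly_basis(x, degrees[name]))
    return np.hstack(blocks)


def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Unpenalized Poisson maximum likelihood fit with a log link via IRLS."""
    mu = y + 0.5
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if tol > np.max(np.abs(beta_new - beta)):
            return beta_new
        beta = beta_new
    return beta


# Synthetic data with hypothetical inputs named after Fig. 6 (Vmax, SPI6)
rng = np.random.default_rng(2)
n = 3000
vmax = rng.uniform(20.0, 60.0, n)
spi6 = rng.normal(0.0, 1.0, n)
y = rng.poisson(np.exp(-1.0 + 0.06 * vmax + 0.8 * np.sin(spi6)))

# Degree 1 for wind speed (monotonic term), degree 4 for the other input
X = design_matrix({"vmax": vmax, "spi6": spi6}, {"vmax": 1, "spi6": 4})
beta = fit_poisson_irls(X, y)
print(X.shape, beta.round(3))
```

      <p>The wind-speed block has a single linear term, so its fitted effect on ln(μ) is monotonic by construction. The effect is increasing when the fitted coefficient is positive, as in this example.</p>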
</sec>
<sec id="Ch1.S6">
  <label>6</label><title>Random forests</title>
      <p id="d1e2537">Random forest regressions <xref ref-type="bibr" rid="bib1.bibx5" id="paren.120"/> are non-parametric ensembles of decision trees that do not assume any underlying probability distribution for the response variable. Tree-based methods are easy to build and are powerful machine learning tools. At each node, a decision tree splits the data according to a criterion on the value of a single input variable.
For regression trees, candidate binary splits are evaluated at each node for every input variable (Fig. <xref ref-type="fig" rid="Ch1.F6"/>) <xref ref-type="bibr" rid="bib1.bibx30" id="paren.121"/>.
At each step, the split with the largest reduction in the sum of squared errors is selected.
Splitting stops once no further split improves the fit of the regression.
To make a prediction, the tree follows the splitting criteria from the root to a final leaf node. Its output is the average of the training responses in that leaf,
          <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M99" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mtext>average</mml:mtext><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the final leaf node (Fig. <xref ref-type="fig" rid="Ch1.F6"/>).</p>
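      <p>The split search and the leaf prediction of Eq. (7) can be sketched in Python with numpy. The toy data below are hypothetical, with two columns standing in for Vmax and SPI6 as in Fig. 6. The function names are illustrative, and the sketch performs a single split rather than growing a full tree.</p>

```python
import numpy as np


def best_split(X, y):
    """Search every feature and threshold for the binary split with the
    largest reduction in the sum of squared errors (SSE)."""
    parent_sse = ((y - y.mean()) ** 2).sum()
    best = (None, None, 0.0)  # (feature index, threshold, SSE reduction)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # both sides stay non-empty
            right = X[:, j] > t
            left = ~right
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[right] - y[right].mean()) ** 2).sum())
            if parent_sse - sse > best[2]:
                best = (j, t, parent_sse - sse)
    return best


def leaf_prediction(X, y, j, t, x_new):
    """Eq. (7): the prediction is the mean response of the training points
    that fall in the same leaf as x_new."""
    same_leaf = (X[:, j] > t) == (x_new[j] > t)
    return y[same_leaf].mean()


# Toy data with two hypothetical inputs (columns: Vmax, SPI6) and outage counts
X = np.array([[30, 0.1], [35, -0.5], [40, 0.3], [55, 1.2], [60, -0.2], [65, 0.8]], dtype=float)
y = np.array([2, 3, 2, 40, 35, 50], dtype=float)
j, t, gain = best_split(X, y)
print(j, t, round(gain, 2))   # splits on column 0 (Vmax) at 40.0
print(leaf_prediction(X, y, j, t, np.array([58.0, 0.0])))   # mean of 40, 35 and 50
```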

      <?xmltex \floatpos{t}?><fig id="Ch1.F6"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e2607">Example of a simple decision tree that predicts outages from two input features, maximum wind speed (Vmax) and precipitation (SPI6). The root node (<inline-formula><mml:math id="M101" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>) splits on Vmax. If the wind speed is greater than <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:mi>V</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, a point belongs to the right interior node <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>I</mml:mi><mml:mn mathvariant="normal">2</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>; otherwise, it belongs to the left node. Similarly, the interior nodes are further divided into leaf nodes <inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> using values of SPI6.</p></caption>
        <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f06.png"/>

      </fig>

      <p id="d1e2688">Random forests “grow” a large number of decision trees in parallel and fit each tree to a new bootstrap sample of the training data, a procedure known as bagging <xref ref-type="bibr" rid="bib1.bibx5" id="paren.122"/>. Bagging draws the training points with replacement. In addition, random forests consider only a random selection of the input features at each split.
The final output is the average of the outputs of all trees. The random selection of features yields weakly correlated trees, which reduces the variance of the averaged predictions <xref ref-type="bibr" rid="bib1.bibx30" id="paren.123"/>.</p>
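      <p>The following compact numpy sketch shows the two sources of randomness. Each tree is fitted to a bootstrap sample, and only a random subset of the features is considered for splitting. To keep the sketch short, each tree is a single split (a stump), so drawing the feature subset per tree is the same as drawing it per split. The synthetic data, the number of trees, and the max_features value are hypothetical, and this is not scikit-learn's implementation.</p>

```python
import numpy as np


def fit_stump(X, y, features):
    """Depth-1 regression tree: best single split among the candidate features."""
    j0 = features[0]
    best = (j0, X[0, j0], y.mean(), y.mean(), np.inf)  # fallback: no useful split
    for j in features:
        for t in np.unique(X[:, j])[:-1]:
            right = X[:, j] > t
            y_left, y_right = y[~right], y[right]
            sse = ((y_left - y_left.mean()) ** 2).sum() + ((y_right - y_right.mean()) ** 2).sum()
            if best[4] > sse:
                best = (j, t, y_left.mean(), y_right.mean(), sse)
    return best[:4]


def fit_forest(X, y, n_trees=100, max_features=1, seed=0):
    """Bagging plus random feature selection, with stumps as the base trees."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                        # bootstrap sample (with replacement)
        cols = rng.choice(p, size=max_features, replace=False)   # random subset of features
        forest.append(fit_stump(X[rows], y[rows], cols))
    return forest


def predict_forest(forest, x):
    """The forest prediction is the average of the individual tree predictions."""
    return float(np.mean([right if x[j] > t else left for j, t, left, right in forest]))


rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, (300, 3))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0.0, 1.0, 300)   # the third input is pure noise
forest = fit_forest(X, y)
high = predict_forest(forest, np.array([0.9, 0.9, 0.5]))
low = predict_forest(forest, np.array([0.1, 0.1, 0.5]))
print(round(high, 2), round(low, 2))
```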
      <p id="d1e2698">Random forest models can generally capture non-linear relationships between the input parameters and the output predictions. However, a random forest is not easily interpretable because it aggregates many decision trees.
In this paper, we use the scikit-learn library in Python to fit the random forest model. We also use its GridSearchCV module <xref ref-type="bibr" rid="bib1.bibx51" id="paren.124"/> to tune the hyperparameters and select the model with the lowest cross-validation error.</p>
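      <p>GridSearchCV evaluates every candidate hyperparameter value by k-fold cross-validation and keeps the candidate with the best mean score (R² by default for regressors). To avoid external dependencies, the sketch below reproduces that logic in numpy for a single hyperparameter, the maximum depth of one regression tree, rather than tuning a full random forest with scikit-learn. The data and the grid are hypothetical. The sketch also compares the training R² of a deep tree with its cross-validation R², the overfitting signal discussed in Sect. 7.3.</p>

```python
import numpy as np


def grow(X, y, depth, max_depth, min_leaf=5):
    """Greedy regression tree; each leaf stores the mean response (Eq. 7)."""
    if depth == max_depth or 2 * min_leaf > len(y):
        return y.mean()
    best_sse, best = ((y - y.mean()) ** 2).sum(), None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            right = X[:, j] > t
            if min_leaf > right.sum() or min_leaf > (~right).sum():
                continue
            y_left, y_right = y[~right], y[right]
            sse = ((y_left - y_left.mean()) ** 2).sum() + ((y_right - y_right.mean()) ** 2).sum()
            if best_sse > sse:
                best_sse, best = sse, (j, t, right)
    if best is None:                      # no admissible split improves the fit: stop here
        return y.mean()
    j, t, right = best
    return (j, t,
            grow(X[~right], y[~right], depth + 1, max_depth, min_leaf),
            grow(X[right], y[right], depth + 1, max_depth, min_leaf))


def predict(tree, x):
    while isinstance(tree, tuple):        # internal node: (feature, threshold, left, right)
        j, t, left, right = tree
        tree = right if x[j] > t else left
    return tree


def r2(y, pred):
    return 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()


def cv_r2(X, y, max_depth, k=5, seed=0):
    """Mean R^2 over k folds, the score GridSearchCV averages for each candidate."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for m, f in enumerate(folds) if m != i])
        tree = grow(X[train], y[train], 0, max_depth)
        scores.append(r2(y[test], np.array([predict(tree, x) for x in X[test]])))
    return float(np.mean(scores))


rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, (150, 2))
y = np.sin(6.0 * X[:, 0]) + 0.5 * rng.normal(size=150)

grid = [1, 2, 4, 8]                       # hypothetical max_depth grid
cv = {d: cv_r2(X, y, d) for d in grid}
best_depth = max(cv, key=cv.get)
deep_tree = grow(X, y, 0, 8)
train_r2 = r2(y, np.array([predict(deep_tree, x) for x in X]))
print(cv, best_depth, round(train_r2, 2))
```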
</sec>
<?pagebreak page1674?><sec id="Ch1.S7">
  <label>7</label><title>Application of existing models</title>
      <p id="d1e2712">In this section, we discuss the statistical performance of different outage models. We first train the models on training data and then compare the <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> metrics on hold-out test data.
We use alternative <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> metrics because traditional ones, such as the coefficient of determination, have several limitations for count variables, as discussed in the Appendix.
We also compare the performance of the models developed for our generalized dataset, which covers the southeast, southwest, and northeast regions of the US, with the results of previous models applied to individual regions.</p>
<sec id="Ch1.S7.SS1">
  <label>7.1</label><title>Generalized linear models</title>
      <p id="d1e2744">We trained Poisson and negative binomial GLMs to predict the outage counts. The predictions are based on the seven input features mentioned in the feature selection section. All the input features are significant at a <inline-formula><mml:math id="M110" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> value of <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. We compared the statistical performance of the Poisson and negative binomial GLMs (Table <xref ref-type="table" rid="Ch1.T2"/>).
We have a total of 1528 training data points, seven input variables, and an intercept. Thus, each model has 1528 − 8 = 1520 residual degrees of freedom.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e2777">Statistical performance measurements for generalized linear models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Residual</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry colname="col2">deviance</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Poisson GLM</oasis:entry>
         <oasis:entry colname="col2">6 038 042</oasis:entry>
         <oasis:entry colname="col3">0.20</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Negative binomial GLM</oasis:entry>
         <oasis:entry colname="col2">1949</oasis:entry>
         <oasis:entry colname="col3">0.29</oasis:entry>
         <oasis:entry colname="col4">0.69</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{2}?></table-wrap>

      <p id="d1e2879">The residual deviance of the Poisson GLM is much higher than its residual degrees of freedom, which indicates large overdispersion <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx28 bib1.bibx16" id="paren.125"/>.
Thus, using this new outage dataset, we confirm that the variance in historical outages largely exceeds the mean value.
The negative binomial GLM has a much lower residual deviance (1949) than the Poisson model, close to the 1520 residual degrees of freedom. This indicates that a negative binomial GLM handles overdispersion in power outage predictions more satisfactorily.</p>
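      <p>The ratio of residual deviance to residual degrees of freedom quantifies the overdispersion diagnosis. This ratio is close to 1 for a model whose variance assumption fits the data. The short Python computation below uses the residual deviances in Table 2 and the 1520 residual degrees of freedom.</p>

```python
# Residual deviances reported in Table 2
n_obs, n_params = 1528, 8          # seven input variables plus an intercept
df_resid = n_obs - n_params        # 1520
residual_deviance = {"Poisson GLM": 6_038_042, "Negative binomial GLM": 1949}

for model, dev in residual_deviance.items():
    # A ratio far above 1 signals overdispersion relative to the assumed distribution
    print(f"{model}: deviance / df = {dev / df_resid:.2f}")
```

      <p>The ratios are about 3972 for the Poisson GLM and 1.28 for the negative binomial GLM.</p>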
      <p id="d1e2886"><inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>  in Table <xref ref-type="table" rid="Ch1.T2"/> is a measure of the deviance explained by the fitted model compared to the null model. The null model predicts the average of observed outages (<inline-formula><mml:math id="M115" display="inline"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula>) for all the cities irrespective of the input parameters.  <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> quantifies the amount of overdispersion explained by the additional variability introduced as a parameter in Eqs. (4) and (5) for the negative binomial fitted model. The <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is higher for the negative binomial GLM, also suggesting the negative binomial's better statistical performance.
The <inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> for the negative binomial GLM is <inline-formula><mml:math id="M119" display="inline"><mml:mn mathvariant="normal">0.69</mml:mn></mml:math></inline-formula>, which means the model can capture <inline-formula><mml:math id="M120" display="inline"><mml:mn mathvariant="normal">69</mml:mn></mml:math></inline-formula> % of variability by considering the additional level of uncertainty in the form of Poisson–gamma mixture given by Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) for outage counts.
The reported value of <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> for the negative binomial GLM in this paper is comparable to the values presented by <xref ref-type="bibr" rid="bib1.bibx28" id="text.126"/>, i.e., <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn></mml:mrow></mml:math></inline-formula>. However, our <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is lower than the values reported in previous literature, i.e., <inline-formula><mml:math id="M124" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 0.6. This lower <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> may be due to the use of fewer parameters, e.g., 7 in this study versus 20 in <xref ref-type="bibr" rid="bib1.bibx27" id="text.127"/>, because <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> never decreases when predictors are added to a nested model.
For example, we get a value of 0.48 for <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> when all the input variables in Table <xref ref-type="table" rid="Ch1.T1"/> are included, but we considered fewer parameters to avoid correlated features and enhance the generalization of these models. We may also get a lower value of <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> because we have a generalized dataset covering different regions in the US, and previous models were applied to data from smaller regions.</p>
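      <p>For reference, R<sup>2</sup><sub>DEV</sub> can be computed from deviances as 1 − D(model)/D(null). The numpy sketch below uses the standard Poisson deviance and the negative binomial deviance for a fixed k. Evaluating the null model with the same k is our assumption. The counts and predictions are hypothetical, and R<sup>2</sup><sub>ψ</sub> (defined in the Appendix) is not reproduced here.</p>

```python
import numpy as np


def _ylogy_over_mu(y, mu):
    # y * ln(y / mu), with the convention 0 * ln(0) = 0
    safe_y = np.where(y > 0, y, 1.0)
    return np.where(y > 0, y * np.log(safe_y / mu), 0.0)


def poisson_deviance(y, mu):
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    return 2.0 * np.sum(_ylogy_over_mu(y, mu) - (y - mu))


def nb_deviance(y, mu, k):
    """Negative binomial deviance for a fixed overdispersion parameter k."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    r = 1.0 / k
    return 2.0 * np.sum(_ylogy_over_mu(y, mu) - (y + r) * np.log((y + r) / (mu + r)))


def r2_dev(deviance_fn, y, mu_model, **kwargs):
    """1 - D(model) / D(null); the null model predicts mean(y) for every city."""
    y = np.asarray(y, dtype=float)
    mu_null = np.full(y.shape, y.mean())
    return 1.0 - deviance_fn(y, mu_model, **kwargs) / deviance_fn(y, mu_null, **kwargs)


# Hypothetical observed counts and model predictions
y_obs = [0, 3, 10, 25, 60]
mu_hat = [1.0, 4.0, 9.0, 22.0, 65.0]
print(r2_dev(poisson_deviance, y_obs, mu_hat), r2_dev(nb_deviance, y_obs, mu_hat, k=0.5))
```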
</sec>
<sec id="Ch1.S7.SS2">
  <label>7.2</label><title>Generalized additive models</title>
      <p id="d1e3082">We also trained the Poisson and negative binomial GAMs. As with the GLMs, the GAMs are trained with the seven input features mentioned before.
All the input features are significant at a <inline-formula><mml:math id="M129" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> value of <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> for both models.
As for the GLMs, each model has 1520 residual degrees of freedom.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3"><?xmltex \currentcnt{3}?><label>Table 3</label><caption><p id="d1e3113">Statistical performance measurements for generalized additive models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Residual</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry colname="col2">deviance</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Poisson GAM</oasis:entry>
         <oasis:entry colname="col2">3 565 948</oasis:entry>
         <oasis:entry colname="col3">0.53</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Negative binomial GAM</oasis:entry>
         <oasis:entry colname="col2">1866</oasis:entry>
         <oasis:entry colname="col3">0.62</oasis:entry>
         <oasis:entry colname="col4">0.99</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{3}?></table-wrap>

      <p id="d1e3215">The residual deviance of the Poisson GAM (Table <xref ref-type="table" rid="Ch1.T3"/>) is <inline-formula><mml:math id="M133" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 41 % lower than that of the Poisson GLM (Table <xref ref-type="table" rid="Ch1.T2"/>). This improvement does little to overcome the large overdispersion, as the deviance remains far higher than the degrees of freedom.
However, <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> shows a larger performance improvement, increasing from 0.20 for the Poisson GLM (Table <xref ref-type="table" rid="Ch1.T2"/>) to 0.53 for the Poisson GAM (Table <xref ref-type="table" rid="Ch1.T3"/>).</p>
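      <p>The size of these changes follows directly from Tables 2 and 3. The Python arithmetic below computes the deviance reduction of the Poisson GAM relative to the Poisson GLM and the GAM's ratio of deviance to residual degrees of freedom.</p>

```python
# Poisson residual deviances from Tables 2 and 3
glm_deviance = 6_038_042
gam_deviance = 3_565_948
df_resid = 1520

reduction = 1 - gam_deviance / glm_deviance
print(f"deviance reduction: {reduction:.1%}")                # 40.9%
print(f"GAM deviance / df: {gam_deviance / df_resid:.0f}")   # 2346
```

      <p>The reduction is about 41 %. The ratio of about 2346 shows that the Poisson GAM remains strongly overdispersed.</p>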
      <p id="d1e3248">The negative binomial GAM has a residual deviance of 1866 (Table <xref ref-type="table" rid="Ch1.T3"/>), of the same order as the degrees of freedom, which indicates that the negative binomial GAM can handle overdispersion. Additionally, the non-linear shapes of the spline functions (Eq. <xref ref-type="disp-formula" rid="Ch1.E6"/>) in GAMs improve the outage predictions. The <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> of the negative binomial GAM increases substantially to 0.62 (Table <xref ref-type="table" rid="Ch1.T3"/>) from 0.29 for the negative binomial GLM (Table <xref ref-type="table" rid="Ch1.T2"/>). The observed value of 0.99 for <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is similar to the value reported by <xref ref-type="bibr" rid="bib1.bibx27" id="text.128"/> for their GAM-based power outage model. To our knowledge, values of <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> for GAMs have not been reported in previous literature, so further comparisons
with previous work were not possible.</p><?xmltex \hack{\newpage}?>
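      <p>The deviance-based R<sup>2</sup> can be sketched in a few lines. The snippet below assumes the common definition R<sup>2</sup><sub>DEV</sub> = 1 − D(model)/D(null), where the null model predicts the sample mean for every city. It also assumes the NB2 negative binomial deviance with a fixed overdispersion parameter alpha (variance μ + αμ<sup>2</sup>). The parameterization of Eq. (5) in this paper may differ. The data, the value of alpha, and the helper names nb_unit_deviance and deviance_r2 are hypothetical.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def nb_unit_deviance(y, mu, alpha):
    """Per-observation NB2 deviance (Var = mu + alpha * mu**2) for a fixed alpha."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    r = 1.0 / alpha
    # y * log(y / mu) is defined as 0 when y == 0 (its limit); the dummy ones
    # avoid evaluating log(0) or dividing by zero in that branch.
    pos = y > 0
    ratio = np.where(pos, y, 1.0) / np.where(pos, mu, 1.0)
    term1 = np.where(pos, y * np.log(ratio), 0.0)
    term2 = (y + r) * np.log((y + r) / (mu + r))
    return 2.0 * (term1 - term2)


def deviance_r2(y, mu, alpha):
    """R2_DEV = 1 - D(model) / D(null); the null model predicts mean(y) everywhere."""
    y = np.asarray(y, dtype=float)
    d_model = nb_unit_deviance(y, mu, alpha).sum()
    d_null = nb_unit_deviance(y, np.full_like(y, y.mean()), alpha).sum()
    return 1.0 - d_model / d_null


# Hypothetical example: counts drawn from an NB2 model whose mean grows with wind.
rng = np.random.default_rng(0)
alpha = 0.5                                  # hypothetical overdispersion
r = 1.0 / alpha
wind = rng.uniform(10, 60, size=500)         # hypothetical wind speeds (m/s)
mu_true = np.exp(1.0 + 0.06 * wind)          # hypothetical mean outage counts
# numpy's negative_binomial(n, p) has mean n(1-p)/p, which equals mu_true here.
y = rng.negative_binomial(r, r / (r + mu_true))
print(f"R2_DEV using the true mean: {deviance_r2(y, mu_true, alpha):.2f}")
```
      </preformat>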
</sec>
<?pagebreak page1675?><sec id="Ch1.S7.SS3">
  <label>7.3</label><title>Random forest</title>
      <p id="d1e3309">We calibrated the random forest model to predict the fraction of customers without power.
We tuned the hyperparameters of the random forest using the GridSearch tool in Python <xref ref-type="bibr" rid="bib1.bibx51" id="paren.129"/> with cross-validation. The tuned hyperparameters yielded a mean cross-validation <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.52. However, fitting the random forest with these hyperparameters on the training data gave a training <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.84. This gap between the training <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and the cross-validation <inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> indicates potential overfitting. We therefore reduced the maximum tree depth, which gave a cross-validation <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.48 and a training <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.63. Finally, the model achieved an <inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.48 on the hold-out test data.
The selected random forest model consists of 500 randomly grown trees. Figure <xref ref-type="fig" rid="Ch1.F7"/> compares the predicted and observed fractional outages for the tuned random forest (RF).
RF does not generalize well to the outages caused by Hurricane Isaias. Isaias brought relatively low wind speeds (<inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">36</mml:mn></mml:mrow></mml:math></inline-formula> m s<inline-formula><mml:math id="M146" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>) to New Jersey but still caused outages for <inline-formula><mml:math id="M147" display="inline"><mml:mn mathvariant="normal">60</mml:mn></mml:math></inline-formula> % of customers in 213 of the 565 New Jersey cities. This effect is visible in Fig. <xref ref-type="fig" rid="Ch1.F7"/>, where RF underestimates the fractional outages in the severely affected New Jersey cities.</p>
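      <p>One way to implement the tuning procedure described above is scikit-learn's GridSearchCV, which runs a grid search with k-fold cross-validation. The sketch below compares the mean cross-validation R<sup>2</sup>, the training R<sup>2</sup>, and the hold-out R<sup>2</sup> that are used above to diagnose overfitting. It is not the configuration used in this study. The data are synthetic, the parameter grid is hypothetical, and the forest uses 100 trees (rather than 500) so that the example runs quickly.</p>
      <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
n = 400
# Hypothetical inputs: wind gust (m/s), 7-day precipitation (mm), population density.
X = np.column_stack([
    rng.uniform(10, 60, n),
    rng.uniform(0, 200, n),
    rng.lognormal(7, 1, n),
])
# Hypothetical fraction of customers without power, clipped to [0, 1].
frac = np.clip(0.015 * (X[:, 0] - 10) + 0.001 * X[:, 1] + rng.normal(0, 0.1, n), 0, 1)

X_train, X_test, y_train, y_test = train_test_split(X, frac, test_size=0.2, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid={"max_depth": [3, 6, None], "min_samples_leaf": [1, 5]},  # hypothetical grid
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)
best = search.best_estimator_  # refit on the full training set (refit=True by default)

cv_r2 = search.best_score_                 # mean cross-validation R2
train_r2 = best.score(X_train, y_train)    # training R2
test_r2 = best.score(X_test, y_test)       # hold-out R2
print(f"CV R2 = {cv_r2:.2f}, training R2 = {train_r2:.2f}, test R2 = {test_r2:.2f}")
# A training R2 well above the CV R2 flags overfitting; restricting max_depth
# usually narrows that gap at some cost in training fit.
```
      </preformat>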

      <?xmltex \floatpos{t}?><fig id="Ch1.F7"><?xmltex \currentcnt{7}?><?xmltex \def\figurename{Figure}?><label>Figure 7</label><caption><p id="d1e3429">Random forest outage predictions on the 20 % holdout test data.</p></caption>
          <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f07.png"/>

        </fig>

      <p id="d1e3438">The <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of the random forest model cannot be compared directly with the <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> statistics calculated for the GLM and GAM models, because the random forest predicts the fraction of customers without power, whereas the GLM and GAM models predict outage counts. Moreover, the <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> statistics cannot be calculated for the random forest, because its predictions do not assume an underlying probability distribution.</p>
      <p id="d1e3490">We present the variable importance in the random forest model in Fig. <xref ref-type="fig" rid="Ch1.F8"/>.
We calculated the variable importance by training a base model with all input features and a set of reduced models. Each reduced model was obtained by retraining the random forest after removing one feature from the base model.
We ranked the variables by the increase in mean squared error from the base (full) model to the corresponding reduced model. The variable whose removal causes the largest increase is the most important.
We present the normalized importance factors in decreasing order of importance (Fig. <xref ref-type="fig" rid="Ch1.F8"/>).</p>
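      <p>The drop-one importance procedure can be sketched as follows. Each reduced model is a fresh clone of the random forest, retrained without one feature. The increases in MSE are normalized by the largest one, as in Fig. 8. This sketch makes three assumptions: the MSE increase is measured on a held-out validation split, the data are synthetic, and the helper name drop_column_importance is hypothetical.</p>
      <preformat preformat-type="code">
```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def drop_column_importance(model, X_train, y_train, X_val, y_val, names):
    """Increase in validation MSE when each feature is removed and the model
    is retrained, normalized by the largest increase (hypothetical helper)."""
    base = clone(model).fit(X_train, y_train)
    base_mse = mean_squared_error(y_val, base.predict(X_val))
    increase = {}
    for j, name in enumerate(names):
        keep = [k for k in range(X_train.shape[1]) if k != j]  # drop column j
        reduced = clone(model).fit(X_train[:, keep], y_train)
        mse = mean_squared_error(y_val, reduced.predict(X_val[:, keep]))
        increase[name] = mse - base_mse
    top = max(increase.values())
    ranked = sorted(increase.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value / top) for name, value in ranked]


# Hypothetical data: wind dominates, precipitation matters less, "noise" is irrelevant.
rng = np.random.default_rng(1)
n = 600
wind = rng.uniform(10, 60, n)
precip = rng.uniform(0, 200, n)
noise = rng.normal(size=n)
X = np.column_stack([wind, precip, noise])
y = 0.012 * (wind - 10) + 0.001 * precip + rng.normal(0, 0.05, n)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
importance = drop_column_importance(rf, X_tr, y_tr, X_va, y_va, ["Vmax", "precip", "noise"])
for name, score in importance:
    print(f"{name:7s} {score:.2f}")
```
      </preformat>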
      <p id="d1e3497">We found that maximum wind
speed is the most important parameter in the random forest model (Fig. <xref ref-type="fig" rid="Ch1.F8"/>), consistent with our findings from the simple linear regression in Fig. <xref ref-type="fig" rid="Ch1.F5"/>.
Precipitation is the second most important variable, with a relative importance of 0.33, possibly because trees are more easily uprooted from wetter soil. Population density is the third most important variable in outage predictions, likely because it acts as a proxy for the density of transformers exposed to winds in each city.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8"><?xmltex \currentcnt{8}?><?xmltex \def\figurename{Figure}?><label>Figure 8</label><caption><p id="d1e3506">Random forest variable importance in decreasing order of importance. All importance factors are normalized by the highest value, i.e., the factor for Vmax.</p></caption>
          <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f08.png"/>

        </fig>

      <p id="d1e3515">Random forest and negative binomial GAMs show the best performance among the models considered in predicting the power outages caused by hurricanes. We use the mean squared error (MSE) <xref ref-type="bibr" rid="bib1.bibx64" id="paren.130"/> to compare the performance of these models. The MSE is given as
            <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M152" display="block"><mml:mrow><mml:mi mathvariant="normal">MSE</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:mo movablelimits="false">∑</mml:mo><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M153" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> is the observed value, <inline-formula><mml:math id="M154" display="inline"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula> is the predicted value, and <inline-formula><mml:math id="M155" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the total number of observations.
We report the MSE of both the negative binomial GAM and the RF model to compare their performance. To compare the MSE values at the same scale, i.e., as fractional outages, we rescale the negative binomial GAM predictions by the total number of customers in each city. The MSE is 45.13 for the negative binomial GAM and 0.06 for RF. This direct comparison should be interpreted with caution, because the fraction-based RF model and the count-based negative binomial GAM are optimized for different response variables. The high MSE of the negative binomial GAM arises from its overestimation of outages, which we discuss next.</p>
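      <p>The effect of the rescaling on Eq. (8) can be seen in a small sketch with invented numbers for four cities. These are not data from this study. Dividing the count predictions by the number of customers puts both models on the fraction scale. A single small city whose predicted outages are about 16 times its customer count then dominates the count model's MSE. The helper name mse is illustrative.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def mse(observed, predicted):
    """Mean squared error, Eq. (8): mean of (y - y_hat)**2."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


# Hypothetical test-set values for four cities (not data from this study).
customers = np.array([106, 5_000, 20_000, 80_000])
observed_out = np.array([60, 2_000, 9_000, 30_000])    # observed outage counts
gam_counts = np.array([1_700, 2_500, 8_000, 35_000])   # count-model predictions
rf_fraction = np.array([0.40, 0.45, 0.42, 0.40])       # fraction-model predictions

observed_frac = observed_out / customers
gam_frac = gam_counts / customers   # rescale counts to fractions of customers

print(f"MSE (GAM, rescaled): {mse(observed_frac, gam_frac):.3f}")
print(f"MSE (RF):            {mse(observed_frac, rf_fraction):.3f}")
```
      </preformat>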
</sec>
</sec>
<?pagebreak page1676?><sec id="Ch1.S8">
  <label>8</label><title>Limitations of state-of-the-art outage models</title>
      <p id="d1e3590">The statistical and machine learning models discussed in the previous sections can predict power outages in hurricane-stricken cities. Here, we discuss the limitations of these state-of-the-art models for power outage prediction.</p>
<sec id="Ch1.S8.SS1">
  <label>8.1</label><title>GLM and GAM's predictions are unbounded</title>
      <p id="d1e3600">GLM and GAM regressions predict the mean number of outages in a city. Their predictions have a lower bound of zero, because both the Poisson and negative binomial distributions describe non-negative outage counts.
However, there is no upper bound on the predicted number of outages. Hence, GLMs and GAMs can predict more outages than the number of customers, overestimating power outages.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F9"><?xmltex \currentcnt{9}?><?xmltex \def\figurename{Figure}?><label>Figure 9</label><caption><p id="d1e3605">Outage predictions on the 20 % holdout test data using the negative binomial GAM outage model.
Black dots represent the cities with predicted outages larger than the number of customers. Grey dots represent the cities with predicted outages less than or equal to the number of customers.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f09.png"/>

        </fig>

      <p id="d1e3614">For illustration, we present the power outage predictions of the negative binomial GAM on the 20 % hold-out test data (Fig. <xref ref-type="fig" rid="Ch1.F9"/>). For 76 of the 382 cities (19.8 %) in the test data, the predicted outages exceed the number of customers, and the overestimation can be substantial.
For example, for Rockleigh, New Jersey, the model predicted outages as high as 16 times the number of customers.
Among the cities with overestimated outages, the average ratio of predicted outages to the number of customers was 5.2.
These cities had smaller populations, with an average of 5962 (e.g., Rockleigh had only 106 customers). In contrast, cities without overestimation had an average population of 30 058.
Modelers could impose an upper bound on the predictions using the total number of customers as the maximum possible outage.
However, this adjustment would be inconsistent with the assumptions of the Poisson–gamma mixture model (Eq. <xref ref-type="disp-formula" rid="Ch1.E5"/>) and the GAM link function (Eq. <xref ref-type="disp-formula" rid="Ch1.E6"/>).</p>
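      <p>The overestimation statistics above can be computed as in this sketch. They are the share of cities whose predicted outages exceed their customer count and the mean predicted-to-customer ratio among those cities. The five cities and their values are hypothetical, and the helper name overestimation_summary is illustrative. The final truncation with np.minimum shows the post hoc upper bound discussed above, which is not part of the fitted model.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def overestimation_summary(predicted, customers):
    """Share of cities whose predicted outages exceed their customer count,
    and the mean predicted-to-customer ratio among those cities."""
    predicted = np.asarray(predicted, dtype=float)
    customers = np.asarray(customers, dtype=float)
    ratio = predicted / customers
    over = ratio > 1.0
    share = over.mean()
    mean_ratio = ratio[over].mean() if over.any() else float("nan")
    return share, mean_ratio, over


# Hypothetical predictions for five cities (not data from this study).
customers = np.array([106, 2_500, 6_000, 30_000, 45_000])
predicted = np.array([1_696, 5_000, 3_000, 12_000, 20_000])

share, mean_ratio, over = overestimation_summary(predicted, customers)
print(f"{share:.0%} of cities overestimated; mean ratio among them = {mean_ratio:.1f}")

# Post hoc truncation at the customer count: this caps the mean prediction but
# is not part of the fitted negative binomial model.
capped = np.minimum(predicted, customers)
```
      </preformat>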
</sec>
<sec id="Ch1.S8.SS2">
  <label>8.2</label><title>Random forest's lack of extrapolability for high winds</title>
      <p id="d1e3631">Random forest predictions are averages of the outages in the training data (Eq. <xref ref-type="disp-formula" rid="Ch1.E7"/>). Thus, unlike those of GLMs and GAMs, random forest predictions are bounded by the minimum and maximum power outages in the training data. Based on simple physics, one would expect higher wind speeds to cause more damage and more outages. To understand the influence of wind speed on power outages in the random forest regression, we plotted the partial dependence of the fraction of customers without power on wind speed. The partial dependence, <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx48" id="paren.131"/>, of the input variable <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is given by
            <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M158" display="block"><mml:mrow><mml:mi>g</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M159" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the total number of observations, <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> are the variables other than <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the outage prediction (Eq. <xref ref-type="disp-formula" rid="Ch1.E7"/>) for the <inline-formula><mml:math id="M163" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th data point. To plot the partial dependence, we varied <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (here, wind speed) over a range of values while keeping <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (the input parameters other than wind speed) at their observed values.
For each value of <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, we averaged the predicted outages over all observations in the training data and plotted this average against wind speed.</p>
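      <p>Eq. (9) translates directly into code. This minimal sketch computes a one-dimensional partial dependence for any fitted scikit-learn regressor. For each grid value, it overwrites one column of a copy of the training matrix with that value and averages the predictions. The synthetic wind and precipitation data and the helper name partial_dependence_1d are hypothetical. scikit-learn also provides sklearn.inspection.partial_dependence, which performs the same averaging when called with method="brute".</p>
      <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, feature, grid):
    """Eq. (9): for each grid value, set column `feature` to that value for all
    N rows, keep the other columns as observed, and average the predictions."""
    X_mod = np.array(X, dtype=float, copy=True)  # leave the caller's X untouched
    g = []
    for value in grid:
        X_mod[:, feature] = value
        g.append(model.predict(X_mod).mean())
    return np.array(g)


# Hypothetical training data: wind speed (m/s) and precipitation (mm).
rng = np.random.default_rng(3)
n = 500
wind = rng.uniform(10, 60, n)
precip = rng.uniform(0, 200, n)
X = np.column_stack([wind, precip])
y = np.clip(0.02 * (wind - 15) + rng.normal(0, 0.05, n), 0, 1)  # hypothetical fractions
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

grid = np.linspace(10, 60, 11)
pd_wind = partial_dependence_1d(rf, X, feature=0, grid=grid)
for w, g in zip(grid, pd_wind):
    print(f"wind {w:4.0f} m/s: mean predicted fraction {g:.2f}")
```
      </preformat>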
      <p id="d1e3827">For this assessment, we trained the random forest model on a reduced dataset, with only New York and New Jersey, and on a complete dataset, including Florida and Texas.
We present the partial dependence of wind speed in Fig. <xref ref-type="fig" rid="Ch1.F10"/>, also including the distribution of wind speeds in the training data.
Hurricanes of category 3 or higher bring wind speeds above 40 m s<inline-formula><mml:math id="M167" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> that can significantly damage electric poles <xref ref-type="bibr" rid="bib1.bibx4" id="paren.132"/>.
However, such winds are observed far less often in inland cities, especially in the northern United States, because storms often weaken as they move to higher latitudes and over land.
For example, only tropical storms with wind speeds of less than 33 m s<inline-formula><mml:math id="M168" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> have impacted New York City in the past 20 years (<uri>https://coast.noaa.gov/hurricanes/#map</uri>, last access: 21 September 2022). Superstorm Sandy (2012), for instance, transitioned to a tropical storm before impacting New York City (<uri>https://coast.noaa.gov/hurricanes/#map</uri>, last access: 21 September 2022). According to the ASCE 7–10 wind hazard maps (<uri>https://hazards.atcouncil.org/</uri>, last access: 21 September 2022), a wind speed of 43 m s<inline-formula><mml:math id="M169" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> has a return period of 100 years for New York City. Thus, outage data for winds above 43 m s<inline-formula><mml:math id="M170" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> in New York and New Jersey are very unlikely, as is also evident from Fig. <xref ref-type="fig" rid="Ch1.F10"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F10"><?xmltex \currentcnt{10}?><?xmltex \def\figurename{Figure}?><label>Figure 10</label><caption><p id="d1e3897"><bold>(a)</bold> Partial dependence of power outages on wind speed. <bold>(b)</bold> Distribution of wind speed in the complete and reduced datasets. The random forest model does not extrapolate to wind speeds and outages outside the range of the training data.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f10.png"/>

        </fig>

      <?pagebreak page1677?><p id="d1e3912">These limited outage datasets have strong implications for the validity and extrapolability of outage models based on random forest regressions.
Under the reduced dataset, i.e., with only New Jersey and New York, predicted outages increase as the wind speed increases from <inline-formula><mml:math id="M171" display="inline"><mml:mn mathvariant="normal">20</mml:mn></mml:math></inline-formula> to 40 m s<inline-formula><mml:math id="M172" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.
However, the predicted fraction of outages reaches a maximum of 0.58 at a wind speed of 40 m s<inline-formula><mml:math id="M173" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> and does not increase at higher wind speeds. The random forest model cannot extrapolate to higher winds, which limits its ability to predict outages for major hurricanes.</p>
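      <p>The bound on random forest predictions can be illustrated with a hypothetical "reduced" training set. In it, winds stop just below 40 m/s while the synthetic outage fraction keeps rising with wind. Every split threshold on wind lies below the largest training wind. Queries at 40, 60, and 80 m/s (at a fixed, hypothetical precipitation) therefore land in the same leaves and receive identical predictions, none of which can exceed the largest training target. The data are synthetic and chosen only to show the mechanism.</p>
      <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n = 400
# Hypothetical "reduced" training set: winds only between 15 and 40 m/s.
wind = rng.uniform(15, 40, n)
precip = rng.uniform(0, 200, n)
X_train = np.column_stack([wind, precip])
# Hypothetical outage fractions that keep rising with wind (no plateau in truth).
y_train = np.clip(0.025 * (wind - 15) + rng.normal(0, 0.05, n), 0, 1)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Query increasingly strong winds at a fixed precipitation of 100 mm.
query_winds = np.array([30.0, 40.0, 60.0, 80.0])
X_query = np.column_stack([query_winds, np.full(query_winds.shape, 100.0)])
pred = rf.predict(X_query)
for w, p in zip(query_winds, pred):
    print(f"wind {w:.0f} m/s: predicted fraction {p:.2f}")

# Each prediction averages training targets, so it cannot exceed their maximum;
# winds at or beyond the largest training wind follow identical tree paths.
print("max training fraction:", round(float(y_train.max()), 2))
```
      </preformat>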
      <p id="d1e3946">Under the complete dataset, the results improve by including outages in Florida and Texas.
These states experience higher winds; e.g., their 100-year return-period winds are <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">70</mml:mn></mml:mrow></mml:math></inline-formula> m s<inline-formula><mml:math id="M175" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, in contrast to 43 m s<inline-formula><mml:math id="M176" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> in New York (<uri>https://hazards.atcouncil.org/</uri>, last access: 21 September 2022).
The reduced dataset (with only New York and New Jersey) did not have any data points with wind speeds above 40 m s<inline-formula><mml:math id="M177" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.
In contrast, the complete dataset (including Florida and Texas) had 88 cities (<inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:mn mathvariant="normal">4.6</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula> of data points) with winds greater than 40 m s<inline-formula><mml:math id="M179" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> and 29 cities (<inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.5</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula> of data points) with winds above 70 m s<inline-formula><mml:math id="M181" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.
With the complete dataset, the random forest predictions reach a maximum value of 0.76 for winds of 75 m s<inline-formula><mml:math id="M182" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. While these results show an improvement,
they also show that the data are still insufficient for the random forest model to follow the physics of infrastructure failure. In particular, the model cannot extrapolate its predictions to high winds that cause outages close to 100 %.</p>
</sec>
<sec id="Ch1.S8.SS3">
  <label>8.3</label><title>Lack of physics-based variance shapes</title>
      <p id="d1e4065">In catastrophic storms, we expect large outages, close to 100 %, with higher certainty, e.g., Hurricane Ida (2021) in Louisiana <xref ref-type="bibr" rid="bib1.bibx18" id="paren.133"/> and Hurricane Fiona (2022) in Puerto Rico <xref ref-type="bibr" rid="bib1.bibx55" id="paren.134"/>.
Structural models for power poles estimate failure probabilities close to 95 % for winds of 70 m s<inline-formula><mml:math id="M183" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib1.bibx50 bib1.bibx4" id="altparen.135"/>).
Thus, the physics of infrastructure failure suggests that the variance in outages should be smaller for catastrophic winds.
To evaluate whether existing models follow these principles from the physics of power infrastructure failure, we quantified the variance of the predictions for Jersey City by varying the wind speed while keeping the other input variables unchanged (Fig. <xref ref-type="fig" rid="Ch1.F11"/>).</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F11" specific-use="star"><?xmltex \currentcnt{11}?><?xmltex \def\figurename{Figure}?><label>Figure 11</label><caption><p id="d1e4093">Prediction ranges of power outages as a function of wind speed for <bold>(a)</bold> the negative binomial GAM and <bold>(b)</bold> random forest.
For the GAM, the mean and standard deviation are based on Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>). For random forest, we use the quantile random forest to determine percentiles and assume a normal distribution to obtain intervals comparable to the mean plus and minus one standard deviation. Black lines indicate the mean outage predictions. Blue dashed lines indicate the 1 standard deviation interval of the outage predictions.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f11.png"/>

        </fig>

      <p id="d1e4110">As discussed previously,
negative binomial GAMs capture the variability of outages better than Poisson models.
Thus, we focus on the former and estimate the variance according to Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>).
Figure <xref ref-type="fig" rid="Ch1.F11"/>a shows the mean and 1 standard deviation interval of the outage predictions for varying wind speeds.
We normalized the GAM's predictions to fractions to compare them with the random forest model.
The GAM's predictions can exceed 1, as discussed previously, but we truncated the <inline-formula><mml:math id="M184" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis at 1 for comparison purposes.
Through the link function, the mean outage increases with wind speed. Because the variance is a function of the squared mean outage (Eq. <xref ref-type="disp-formula" rid="Ch1.E5"/>), the variance also grows with wind speed. For example, for a wind of 40 m s<inline-formula><mml:math id="M185" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, the standard deviation is 0.02, whereas for a wind of 70 m s<inline-formula><mml:math id="M186" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, it is 15 times higher, with a value of 0.3.
Thus, the variance increases as the predicted outage fraction approaches 1.
In fact, the variance of the negative binomial GAM is unbounded and grows without limit as the mean increases.</p>
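      <p>The growth of the GAM's standard deviation with the mean can be sketched under the assumption that Eq. (5) takes the common NB2 form Var(Y) = μ + αμ<sup>2</sup>. The parameterization used in this paper may differ. The city size and alpha below are hypothetical, and the sketch shows only the qualitative behavior. The standard deviation of the outage fraction is zero at zero mean, increases with the mean, and keeps increasing past a fraction of 1. The helper name nb_fraction_std is illustrative.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def nb_fraction_std(mean_fraction, customers, alpha):
    """Standard deviation of the outage fraction Y / C when Y is negative
    binomial (NB2) with mean mu = mean_fraction * C and Var = mu + alpha * mu**2."""
    mu = np.asarray(mean_fraction, dtype=float) * customers
    return np.sqrt(mu + alpha * mu**2) / customers


customers = 100_000   # hypothetical city size
alpha = 0.25          # hypothetical overdispersion parameter
fractions = np.array([0.0, 0.05, 0.5, 1.0, 2.0])
sd = nb_fraction_std(fractions, customers, alpha)
for p, s in zip(fractions, sd):
    print(f"mean fraction {p:.2f}: std {s:.3f}")
```
      </preformat>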
      <p id="d1e4152">The random forest model predicts only the mean outage, so it cannot evaluate variances.
However, the quantile regression forest (QRF) <xref ref-type="bibr" rid="bib1.bibx44" id="paren.136"/> can predict outages at different quantiles, and we use it to quantify the variance.
The QRF uses the observations recorded in the leaf nodes to estimate
prediction intervals.
These intervals are fully data-driven, because random forests do not assume any underlying probability distribution for the predicted outages <xref ref-type="bibr" rid="bib1.bibx1" id="paren.137"/>.
We present the random forest prediction intervals in Fig. <xref ref-type="fig" rid="Ch1.F11"/>b. The random forest had a standard deviation of 0.45 for high winds (<inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">70</mml:mn></mml:mrow></mml:math></inline-formula> m s<inline-formula><mml:math id="M188" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>), departing from the near-zero value expected for catastrophic winds.
Additional data could improve these random forest variance estimates.
However, as mentioned earlier, infrastructure failure data are sparse.</p>
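      <p>scikit-learn does not include a quantile regression forest. However, the QRF idea can be sketched on top of a fitted RandomForestRegressor through its apply method, which returns the leaf index of each sample in every tree. This minimal sketch assumes Meinshausen-style leaf weights and a weighted empirical distribution. It is not the implementation used in this study, and the synthetic data and the helper name qrf_quantiles are hypothetical. Following the approach described for Fig. 11, a normal-equivalent standard deviation is taken as half the distance between the 15.87th and 84.13th percentiles.</p>
      <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def qrf_quantiles(rf, X_train, y_train, x_query, quantiles):
    """Quantile regression forest sketch: each training point gets weight
    1/|leaf| in every tree where it shares the query's leaf, averaged over
    trees; quantiles come from the weighted empirical distribution."""
    train_leaves = rf.apply(X_train)                      # (n_train, n_trees)
    query_leaves = rf.apply(np.atleast_2d(x_query))[0]    # (n_trees,)
    same_leaf = train_leaves == query_leaves              # (n_train, n_trees)
    leaf_sizes = same_leaf.sum(axis=0)                    # training points per query leaf
    weights = (same_leaf / leaf_sizes).mean(axis=1)       # weights sum to 1
    order = np.argsort(y_train)
    y_sorted = np.asarray(y_train)[order]
    cdf = np.cumsum(weights[order])
    # Smallest observed y whose cumulative weight reaches each quantile.
    idx = np.searchsorted(cdf, np.asarray(quantiles) * cdf[-1])
    return y_sorted[np.minimum(idx, len(y_sorted) - 1)]


# Hypothetical data: outage fraction rises sigmoidally with wind speed (m/s),
# with the largest spread at intermediate winds.
rng = np.random.default_rng(11)
n = 800
wind = rng.uniform(0, 80, n)
X = wind.reshape(-1, 1)
true_mean = 1 / (1 + np.exp(-(wind - 45) / 6))
y = np.clip(true_mean + rng.normal(0.0, 0.25 * true_mean * (1 - true_mean) + 0.01, n), 0, 1)
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=10, random_state=0).fit(X, y)

sd_equiv = {}
for w in (5.0, 45.0, 75.0):
    lo, med, hi = qrf_quantiles(rf, X, y, [w], [0.1587, 0.5, 0.8413])
    sd_equiv[w] = (hi - lo) / 2   # normal-equivalent standard deviation
    print(f"wind {w:.0f} m/s: median {med:.2f}, sd-equivalent {sd_equiv[w]:.2f}")
```
      </preformat>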
      <p id="d1e4185">Moreover, structural models predict no damage to power infrastructure at wind speeds lower than 10 m s<inline-formula><mml:math id="M189" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx4" id="paren.138"/>.
Thus, we expect outage predictions close to 0 with a high degree of certainty.
Negative binomial (and Poisson) GAMs handle this case well, because the variance is zero when the mean outage is zero (Eqs. <xref ref-type="disp-formula" rid="Ch1.E1"/> and <xref ref-type="disp-formula" rid="Ch1.E5"/>).
In contrast, we found that the random forest had a standard deviation of 0.66 at zero wind speed. This shows that random forests are also limited in representing physics-informed variance at low wind speeds.</p>
      <?pagebreak page1678?><p id="d1e4207">The standard deviation of the predicted outages remains high across wind speeds, and precipitation is the second most important variable in the RF model (Fig. <xref ref-type="fig" rid="Ch1.F8"/>). We therefore explore the relationship between outage fraction and precipitation using the partial dependence in Eq. (<xref ref-type="disp-formula" rid="Ch1.E9"/>) to understand the non-linearity in outages.
Figure <xref ref-type="fig" rid="Ch1.F12"/> shows the partial dependence plot of precipitation. The plot is non-linear, with the fraction of outages increasing from 0.3 to 0.4 over the range of observed precipitation for the historical storms. However, as with wind speed (Fig. <xref ref-type="fig" rid="Ch1.F11"/>), precipitation explains only limited variability in the outages in this case (Fig. <xref ref-type="fig" rid="Ch1.F12"/>). Moreover, at zero wind speed and zero precipitation, the RF model predicts non-zero outages.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F12"><?xmltex \currentcnt{12}?><?xmltex \def\figurename{Figure}?><label>Figure 12</label><caption><p id="d1e4225">Partial dependence plot of fractional outages with precipitation.</p></caption>
          <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://nhess.copernicus.org/articles/23/1665/2023/nhess-23-1665-2023-f12.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S9" sec-type="conclusions">
  <label>9</label><title>Conclusions and future research</title>
      <p id="d1e4243">This paper summarized existing power outage prediction models: (a) GLMs and (b) GAMs based on Poisson and negative binomial distributions, and (c) random forest regressions.
Power outages depend on several factors, including hurricane, environmental, and demographic conditions.
To examine the existing models, we used power outage data with a total of 3.6 million outages for Hurricane Isaias (2020) in New York and New Jersey, Hurricane Harvey (2017) in Texas, and Hurricane Michael (2018) in Florida.
We combined the outages from these states to develop a generalized power outage model across different regions. This improves on previous efforts, which calibrated outage models only to a particular region or utility company in the US. We conducted a feature selection to avoid multicollinearity among input variables and calibrated the state-of-the-art outage models using seven input parameters: 3 s wind gust speed, 7 d precipitation after the storm, standard precipitation index for the 6 months before the storm, soil moisture for a depth between 0 and 10 cm, population density, the percent area covered by trees, and trees' root zone depth.</p>
      <p id="d1e4246">First, we found that Poisson regressions are unsuitable for modeling outages. The variance of historical outages is larger than their mean, i.e., the data are overdispersed. The overdispersion was evident from the residual deviances of 6 038 042 and 3 565 948 for the Poisson GLM and GAM, respectively, which far exceed the 1520 degrees of freedom.
Negative binomial regressions accounted for these larger variances better than Poisson regressions, yielding residual deviances of 2078 and 1813 for the GLM and GAM, respectively.
We also showed that GAMs model the non-linear effects of the predictors better than GLMs. For the negative binomial models,
<inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M191" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> increased substantially, to 0.62 and 0.99, respectively, for the GAM, compared with 0.29 and 0.69 for the GLM.
The random forest could also capture this non-linear behavior, reaching an <inline-formula><mml:math id="M192" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.48 in cross-validation.</p>
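<p>The overdispersion check above amounts to dividing each residual deviance by the residual degrees of freedom. Ratios much larger than 1 indicate overdispersion. The short Python sketch below applies this common rule of thumb, which is a heuristic rather than a formal test, to the deviances reported in this paper.</p>

```python
# Residual deviances reported above, all with 1520 degrees of freedom.
DOF = 1520
deviances = {
    "Poisson GLM": 6_038_042,
    "Poisson GAM": 3_565_948,
    "Negative binomial GLM": 2078,
    "Negative binomial GAM": 1813,
}

def dispersion_ratio(deviance, dof):
    """Residual deviance per degree of freedom; values near 1 suggest no overdispersion."""
    return deviance / dof

for name, dev in deviances.items():
    print(f"{name}: {dispersion_ratio(dev, DOF):.2f}")
```

<p>The Poisson ratios are roughly 3972 and 2346. The negative binomial ratios are about 1.37 and 1.19.</p>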
      <?pagebreak page1679?><p id="d1e4286">However, each model has its own strengths and weaknesses in predicting outages. Poisson and negative binomial estimates are unbounded and can overestimate power outages.
For example, the negative binomial regression predicted more outages than the number of customers for <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:mn mathvariant="normal">19.8</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula> of the cities in the test data. The overprediction ratio (predicted outages divided by the actual number of customers) was 5.2.
Random forest predictions are hard to calibrate for extreme winds because outage data for such winds are scarce. Only 1.5 % of our observations had wind speeds greater than <inline-formula><mml:math id="M194" display="inline"><mml:mn mathvariant="normal">70</mml:mn></mml:math></inline-formula> m s<inline-formula><mml:math id="M195" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. As a result, we found that the random forest could not extrapolate outages to high winds.
The negative binomial GAM did not capture the low uncertainty expected in outage predictions at high winds. Instead, the standard deviation of its predictions grew 15-fold as the wind speed increased from <inline-formula><mml:math id="M196" display="inline"><mml:mn mathvariant="normal">40</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M197" display="inline"><mml:mn mathvariant="normal">70</mml:mn></mml:math></inline-formula> m s<inline-formula><mml:math id="M198" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.
Similarly, the random forest did not capture the low uncertainty expected at low winds. Overestimated power outages could lead to prioritizing a less affected city and allocating more resources to it than required. Crews have limited mobility during a disaster, so such misallocations can prolong outages and delay restoration efforts <xref ref-type="bibr" rid="bib1.bibx49" id="paren.139"/>. In general, erroneous or highly uncertain outage estimates can result in non-optimal resource placement, because optimal resource allocation algorithms take predicted outages as input <xref ref-type="bibr" rid="bib1.bibx6" id="paren.140"/>.</p>
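<p>The bound violation described above can be checked directly by comparing each city's predicted outages with its number of customers. The Python sketch below uses a hypothetical function name and invented example numbers. It returns two values: the fraction of cities whose predictions exceed the customer count, and the mean ratio of predicted outages to customers over those cities. The text does not state whether the ratio of 5.2 reported above is a mean or another summary, so averaging is an assumption of the sketch.</p>

```python
def overprediction_summary(predicted, customers):
    """Fraction of cities whose predicted outages exceed their customers,
    and the mean ratio predicted/customers over those cities (hypothetical helper)."""
    ratios = [p / c for p, c in zip(predicted, customers) if p > c]
    fraction = len(ratios) / len(customers)
    mean_ratio = sum(ratios) / len(ratios) if ratios else 0.0
    return fraction, mean_ratio

# Hypothetical predictions from an unbounded count model and customer counts per city.
pred = [1200.0, 300.0, 5000.0, 80.0]
cust = [1000, 900, 1000, 100]
print(overprediction_summary(pred, cust))
```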
      <p id="d1e4352">For future research, we suggest modeling power outages with beta <xref ref-type="bibr" rid="bib1.bibx22" id="paren.141"/> and binomial regressions <xref ref-type="bibr" rid="bib1.bibx16" id="paren.142"/>.
Testing their performance fell outside this paper's scope. However, the fundamental properties of the beta and binomial distributions can help overcome the limitations identified above.
For example, beta and binomial regressions are bounded from above, unlike negative binomial GLMs and GAMs.
Thus, beta or binomial GAMs can model the fraction of outages in a city. In the case of the beta distribution, which is defined between 0 and 1, this works directly. In the case of binomial regressions, it works after normalizing the number of outages by the maximum number of customers. Also, beta and binomial GAMs can extrapolate outages to extreme (low and high) wind speeds. They can model outages that increase monotonically with the environmental parameters while remaining within physical bounds.
Finally, the variances of beta and binomial GAMs approach zero as the observed outage fraction approaches 0 or 1, better representing the physics of power infrastructure failures.</p>
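<p>The variance argument above can be illustrated with standard mean–variance relations. This sketch assumes the common NB2 parameterization, under which a negative binomial count with mean μ and overdispersion factor k has variance μ + kμ². That variance grows with μ. For n customers each failing with probability p, the binomial outage fraction has variance p(1 − p)/n. A beta-distributed fraction with mean p and precision φ has variance p(1 − p)/(1 + φ). The Python sketch below evaluates all three on the fraction scale, using hypothetical values of n, k, and φ.</p>

```python
def nb_variance(mu, k):
    """Negative binomial (NB2, assumed) variance of an outage count with mean mu."""
    return mu + k * mu ** 2

def binomial_fraction_variance(p, n):
    """Variance of the outage fraction when n customers each fail with probability p."""
    return p * (1.0 - p) / n

def beta_variance(p, phi):
    """Variance of a beta-distributed outage fraction with mean p and precision phi."""
    return p * (1.0 - p) / (1.0 + phi)

n = 1000  # hypothetical number of customers in a city
for p in (0.0, 0.5, 0.99, 1.0):
    nb_frac = nb_variance(p * n, k=0.5) / n ** 2  # NB variance rescaled to the fraction scale
    print(p, nb_frac, binomial_fraction_variance(p, n), beta_variance(p, phi=20.0))
```

<p>On the fraction scale, the binomial and beta variances vanish at p = 0 and p = 1. In contrast, the negative binomial variance grows with p and is largest when all customers are expected to lose power.</p>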
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <?xmltex \currentcnt{A}?><label>Appendix A</label><?xmltex \opttitle{$R^{{2}}$ parameter}?><title><inline-formula><mml:math id="M199" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> parameter</title>
      <p id="d1e4383">The <inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> parameter, a goodness-of-fit measure, is used to compare and select among different models. When computed on unseen (test) data, it quantifies how well the fitted model predicts new observations.
          <disp-formula id="App1.Ch1.S1.E10" content-type="numbered"><label>A1</label><mml:math id="M201" display="block"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi mathvariant="normal">RSS</mml:mi><mml:mi mathvariant="normal">TSS</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e4419">The <inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> parameter in Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E10"/>) represents the fraction of variability explained by the fitted model relative to the null model. The null model predicts the average of the observed outages (<inline-formula><mml:math id="M203" display="inline"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula>) for all cities, irrespective of the input parameters.
In Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E10"/>), TSS is the total sum of squares, i.e., the sum of the squared differences between the observed values of the response variable and their average:
          <disp-formula id="App1.Ch1.S1.E11" content-type="numbered"><label>A2</label><mml:math id="M204" display="block"><mml:mrow><mml:mi mathvariant="normal">TSS</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e4482">RSS is the residual sum of squares, i.e., the sum of the squared differences between the observed values of the response variable and the values predicted by the fitted model:
          <disp-formula id="App1.Ch1.S1.E12" content-type="numbered"><label>A3</label><mml:math id="M205" display="block"><mml:mrow><mml:mi mathvariant="normal">RSS</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula></p>
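<p>Equations (A1)–(A3) translate directly into code. The Python sketch below computes R² for hypothetical observed and predicted outage counts. It takes the null-model prediction ȳ as the mean of the observed values passed in.</p>

```python
def r_squared(observed, predicted):
    """R^2 = 1 - RSS/TSS, with the null model predicting the mean of `observed`."""
    y_bar = sum(observed) / len(observed)
    tss = sum((y - y_bar) ** 2 for y in observed)                   # Eq. (A2)
    rss = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))  # Eq. (A3)
    return 1.0 - rss / tss

# Hypothetical observed and predicted outages for four cities.
obs = [10, 20, 30, 40]
pred = [12, 18, 33, 37]
print(r_squared(obs, pred))
```

<p>On test data, this R² can be negative if the fitted model predicts worse than the constant null model.</p>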
<sec id="App1.Ch1.S1.SS1">
  <label>A1</label><?xmltex \opttitle{$R^{{2}}_{\mathrm{DEV}}$ parameter}?><title><inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> parameter</title>
      <p id="d1e4541">We detect overdispersion by checking whether the residual deviance is larger than the degrees of freedom. The degrees of freedom are defined as the number of data points in the training data minus the number of input parameters.
We estimate the deviance as
            <disp-formula id="App1.Ch1.S1.E13" content-type="numbered"><label>A4</label><mml:math id="M207" display="block"><mml:mrow><mml:mi mathvariant="normal">Deviance</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="normal">LL</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">sat</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="normal">LL</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">fit</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where LL(sat) is the maximum achievable log-likelihood, attained by the saturated model, and LL(fit) is the log-likelihood of the fitted model. Since LL(sat) is never smaller than LL(fit), the deviance is non-negative.
Simplified forms of Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E13"/>) for the distributions considered here are given below.
<list list-type="bullet"><list-item>
      <p id="d1e4584"><italic>Poisson</italic>. Residual deviance <xref ref-type="bibr" rid="bib1.bibx8" id="paren.143"/> for the <inline-formula><mml:math id="M208" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th observation for Poisson GLM and GAM is<disp-formula id="App1.Ch1.S1.E14" content-type="numbered"><label>A5</label><mml:math id="M209" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi mathvariant="normal">sign</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mfenced close="" open="["><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mfenced close="" open="{"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi mathvariant="normal">log</mml:mi><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msup><mml:mfenced close="]" open=""><mml:mfenced close="}" 
open=""><mml:mrow><mml:mo>-</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mfenced></mml:mfenced><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula><mml:math id="M210" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the observed and predicted outages for the <inline-formula><mml:math id="M212" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th city.
The deviance of the model is the sum of the squared residual deviances of all observations:<disp-formula id="App1.Ch1.S1.E15" content-type="numbered"><label>A6</label><mml:math id="M213" display="block"><mml:mrow><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula></p></list-item><list-item>
      <p id="d1e4825"><italic>Negative binomial</italic>. Residual deviance for the <inline-formula><mml:math id="M214" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th observation for a negative binomial GLM is<disp-formula id="App1.Ch1.S1.E16" content-type="numbered"><label>A7</label><mml:math id="M215" display="block"><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi mathvariant="normal">sign</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mfenced close="" open="["><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mfenced open="{" close=""><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi mathvariant="normal">log</mml:mi><mml:mfenced close=")" open="("><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msup><mml:mfenced open="" close="]"><mml:mfenced close="}" 
open=""><mml:mrow><mml:mo>-</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">ln</mml:mi><mml:mfenced close="]" open="["><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced></mml:mrow></mml:mfenced></mml:mfenced><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>and deviance for the fitted model is estimated by Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E15"/>).</p></list-item></list>
For GLMs and GAMs, a <italic>pseudo-</italic><inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, denoted as <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx8" id="paren.144"/>, is also defined to compare the statistical performance based on model deviance. Similar to the definition of <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> measures the reduction in deviance of the fitted model when compared with the null model. The value of <inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is given by
            <disp-formula id="App1.Ch1.S1.E17" content-type="numbered"><label>A8</label><mml:math id="M221" display="block"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          <inline-formula><mml:math id="M222" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the deviance of the fitted model, defined in Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E13"/>). For the null model, the prediction is always <inline-formula><mml:math id="M223" display="inline"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> (the average of the observed outages in the training data).
<inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the deviance of the null model, obtained by replacing LL(fit) in Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E13"/>) with LL(null).</p>
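<p>The deviance-based quantities of Eqs. (A4)–(A8) can be computed directly from observed and predicted counts. The Python sketch below implements the Poisson and negative binomial residual deviances, the model deviance, and R²_DEV. The counts are hypothetical. The sketch also assumes, for simplicity, the same overdispersion factor k for the fitted and null negative binomial models.</p>

```python
import math

def xlogy_ratio(y, ratio):
    # Returns y * log(ratio), using the convention 0 * log(0) = 0 (saturated model).
    return 0.0 if y == 0 else y * math.log(ratio)

def poisson_unit_deviance(y, mu):
    # Squared residual deviance of one observation for a Poisson model, cf. Eq. (A5).
    return 2.0 * (xlogy_ratio(y, y / mu) - (y - mu))

def nb_unit_deviance(y, mu, k):
    # Squared residual deviance of one observation for a negative binomial model
    # with overdispersion factor k, cf. Eq. (A7).
    a = 1.0 / k
    return 2.0 * (xlogy_ratio(y, y / mu) - (y + a) * math.log((y + a) / (mu + a)))

def residual_deviance(y, mu, unit):
    # d_i = sign(y - mu) * sqrt(unit deviance); max() guards against tiny negative round-off.
    return math.copysign(math.sqrt(max(unit(y, mu), 0.0)), y - mu)

def model_deviance(ys, mus, unit):
    # Eq. (A6): sum of the squared residual deviances.
    return sum(residual_deviance(y, mu, unit) ** 2 for y, mu in zip(ys, mus))

def r2_dev(ys, mus, unit):
    # Eq. (A8): the null model predicts the mean observed count for every city.
    y_bar = sum(ys) / len(ys)
    return 1.0 - model_deviance(ys, mus, unit) / model_deviance(ys, [y_bar] * len(ys), unit)

# Hypothetical observed and predicted outage counts for three cities.
observed = [0, 2, 5]
predicted = [0.5, 2.0, 4.0]
print(r2_dev(observed, predicted, poisson_unit_deviance))
print(r2_dev(observed, predicted, lambda y, mu: nb_unit_deviance(y, mu, k=0.3)))
```

<p>Both values lie between 0 and 1 here because the fitted predictions track the observed counts more closely than the constant null prediction does. As k approaches 0, the negative binomial unit deviance converges to the Poisson one.</p>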
      <p id="d1e5182">The value of <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> cannot decrease when predictors are added to a model. Additional predictors cannot increase the residual deviance of a model fitted to the same training data. Also, <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">DEV</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> lies between 0 and 1, and values closer to 1 indicate a better fit of the model <xref ref-type="bibr" rid="bib1.bibx8" id="paren.145"/>.</p>
</sec>
<?pagebreak page1680?><sec id="App1.Ch1.S1.SS2">
  <label>A2</label><?xmltex \opttitle{$R^{2}_{{k}}$ parameter}?><title><inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> parameter</title>
      <p id="d1e5235"><inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> measures the reduction in overdispersion achieved by a fitted negative binomial regression model relative to the null model <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx28" id="paren.146"/>.
            <disp-formula id="App1.Ch1.S1.E18" content-type="numbered"><label>A9</label><mml:math id="M229" display="block"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>k</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula>
          <inline-formula><mml:math id="M230" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> is the overdispersion factor (Eq. <xref ref-type="disp-formula" rid="Ch1.E5"/>) for the fitted model, and <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is the overdispersion factor (Eq. <xref ref-type="disp-formula" rid="Ch1.E5"/>) for the null model. Models with low overdispersion will have a low value of <inline-formula><mml:math id="M232" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>. The <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> will be closer to 1 for a model with less overdispersion.</p>
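<p>Equation (A9) is straightforward once k and k<sub>0</sub> are known. In practice, both would come from the fitted negative binomial regressions. The Python sketch below instead assumes that Eq. (5) takes the common NB2 form Var(y) = μ + kμ². Under that assumption, it estimates k<sub>0</sub> for the intercept-only null model by the method of moments. The outage counts and the fitted value k = 0.4 are hypothetical.</p>

```python
from statistics import mean, variance

def moment_overdispersion(counts):
    # Method-of-moments k for an intercept-only model, assuming the NB2 form
    # Var(y) = mu + k * mu**2; clipped at zero if the counts are underdispersed.
    mu = mean(counts)
    return max((variance(counts) - mu) / mu ** 2, 0.0)

def r2_k(k_fit, k_null):
    # Eq. (A9): fraction of the null model's overdispersion removed by the predictors.
    return 1.0 - k_fit / k_null

# Hypothetical outage counts for five cities and a hypothetical fitted k.
counts = [0, 5, 20, 100, 300]
k0 = moment_overdispersion(counts)  # about 2.21 for these counts
print(k0, r2_k(0.4, k0))
```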
</sec>
</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d1e5331">Power outage data are available from PowerOutage (<uri>https://poweroutage.us/</uri>, last access: 21 September 2022; <xref ref-type="bibr" rid="bib1.bibx53" id="altparen.147"/>). The data on Land Cover are available from the Multi-Resolution Land Characteristics Consortium (<uri>https://www.mrlc.gov/viewer/</uri>, last access: 21 September 2022; <xref ref-type="bibr" rid="bib1.bibx46" id="altparen.148"/>). Precipitation and soil moisture data are available from the National Land Data Assimilation System Phase 2 (Xia et al., 2012; Xia, 2012). The soil data are available from the Soil Survey Staff. The percent tree cover data are available from the National Insect and Disease Risk Maps (Krist et al., 2014). Elevation data were obtained from the USGS (Danielson and Gesh, 2011). Additional analyses are available from the authors upon reasonable request.</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d1e5346">The supplement related to this article is available online at: <inline-supplementary-material xlink:href="https://doi.org/10.5194/nhess-23-1665-2023-supplement" xlink:title="pdf">https://doi.org/10.5194/nhess-23-1665-2023-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e5355">PA reviewed the existing models for power outage predictions during hurricanes, under LC's supervision. PA and LC collected and curated the data for outages and input parameters and fitted the power outage models for predicting outages.  PA drafted the paper with contributions and revisions from LC.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e5361">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e5367">Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><notes notes-type="sistatement"><title>Special issue statement</title>

      <p id="d1e5373">This article is part of the special issue “Advances in machine learning for natural hazards risk assessment”. It is not associated with a conference.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e5379">We acknowledge the financial support by the NYU Tandon School of Engineering Fellowship. Additionally, this research was also supported by the Coalition for Disaster Resilient Infrastructure Fellowship Grant 210924669. The authors are grateful for their generous support.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e5384">This research has been supported by the NYU Tandon School of Engineering Fellowship. This research has been additionally supported by the Coalition for Disaster Resilient Infrastructure Fellowship (grant no. 201924669).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e5390">This paper was edited by Vitor Silva and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><?xmltex \def\ref@label{{Ahsanullah et~al.(2014)}}?><label>Ahsanullah et al.(2014)</label><?label Ahsanullah2014NormalDistribution?><mixed-citation>Ahsanullah, M., Kibria, B. M. G., and Shakil, M.: Normal Distribution,
7–50, <uri>https://link.springer.com/chapter/10.2991/978-94-6239-061-4_2</uri> (last access: 21 September 2022), 2014.</mixed-citation></ref>
      <ref id="bib1.bibx2"><?xmltex \def\ref@label{{{AJOT}(2021)}}?><label>AJOT(2021)</label><?label AJOT2021HurricaneAJOT.COM?><mixed-citation>AJOT: Hurricane Ida caused at least 1.2 million electricity customers to
lose power <inline-formula><mml:math id="M234" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> AJOT.COM, <uri>https://ajot.com/news/hurricane-ida-caused-at-least-1.2-million-electricity-customers-to-lose-power</uri> (last access: 21 September 2022),
2021.</mixed-citation></ref>
      <ref id="bib1.bibx3"><?xmltex \def\ref@label{{Arab et~al.(2016)}}?><label>Arab et al.(2016)</label><?label Arab2016ElectricEconomics?><mixed-citation>Arab, A., Khodaei, A., Khator, S. K., and Han, Z.: Electric Power Grid
Restoration Considering Disaster Economics, IEEE Access, 4, 639–649,
<ext-link xlink:href="https://doi.org/10.1109/ACCESS.2016.2523545" ext-link-type="DOI">10.1109/ACCESS.2016.2523545</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx4"><?xmltex \def\ref@label{{Bjarnadottir et~al.(2013)}}?><label>Bjarnadottir et al.(2013)</label><?label Bjarnadottir2013HurricaneClimate?><mixed-citation>Bjarnadottir, S., Li, Y., and Stewart, M. G.: Hurricane Risk Assessment of
Power Distribution Poles Considering Impacts of a Changing Climate, J. Infrastruct. Sys., 19, 12–24,
<ext-link xlink:href="https://doi.org/10.1061/(asce)is.1943-555x.0000108" ext-link-type="DOI">10.1061/(asce)is.1943-555x.0000108</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx5"><?xmltex \def\ref@label{{Breiman(2001)}}?><label>Breiman(2001)</label><?label L2001RandomForests?><mixed-citation>
Breiman, L.: Random forests, Mach. Learn., 45, 5–32, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx6"><?xmltex \def\ref@label{{Brown(2002)}}?><label>Brown(2002)</label><?label Brown2002ElectricReliability?><mixed-citation>Brown, R. E.: Electric Power Distribution Reliability,
<ext-link xlink:href="https://doi.org/10.1201/9780849375682" ext-link-type="DOI">10.1201/9780849375682</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx7"><?xmltex \def\ref@label{{Cai et~al.(2018)}}?><label>Cai et al.(2018)</label><?label Cai2018FeaturePerspective?><mixed-citation>Cai, J., Luo, J., Wang, S., and Yang, S.: Feature selection in machine
learning: A new perspective, Neurocomputing, 300, 70–79,
<ext-link xlink:href="https://doi.org/10.1016/j.neucom.2017.11.077" ext-link-type="DOI">10.1016/j.neucom.2017.11.077</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx8"><?xmltex \def\ref@label{{Cameron and Windmeijer(1996)}}?><label>Cameron and Windmeijer(1996)</label><?label Cameron1996R-squaredUtilization?><mixed-citation>Cameron, A. C. and Windmeijer, F. A.: R-squared measures for count data
regression models with applications to health-care utilization,
J. Bus. Econ. Stat., 14, 209–220,
<ext-link xlink:href="https://doi.org/10.1080/07350015.1996.10524648" ext-link-type="DOI">10.1080/07350015.1996.10524648</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx9"><?xmltex \def\ref@label{{Casey(2016)}}?><label>Casey(2016)</label><?label Casey2016TheStates?><mixed-citation>Casey, S.: The United States, The Ashgate Research Companion to the Korean
War, 49–60 pp., <ext-link xlink:href="https://doi.org/10.1007/978-1-349-08679-5_5" ext-link-type="DOI">10.1007/978-1-349-08679-5_5</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx10"><?xmltex \def\ref@label{{Ceferino et~al.(2018)}}?><label>Ceferino et al.(2018)</label><?label CeferinoCasualtyApp2018?><mixed-citation>Ceferino, L., Kiremidjian, A., and Deierlein, G.: Regional Multiseverity
Casualty Estimation Due to Building Damage following a Mw 8.8 Earthquake
Scenario in Lima, Peru, Earthq. Spectra, 34, 1739–1761,
<ext-link xlink:href="https://doi.org/10.1193/080617EQS154M" ext-link-type="DOI">10.1193/080617EQS154M</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx11"><?xmltex \def\ref@label{{Ceferino et~al.(2020)}}?><label>Ceferino et al.(2020)</label><?label Ceferino2020?><mixed-citation>Ceferino, L., Mitrani-Reiser, J., Kiremidjian, A., Deierlein, G., and
Bambarén, C.: Effective plans for hospital system response to
earthquake emergencies, Nat. Commun., 11, 1–12,
<ext-link xlink:href="https://doi.org/10.1038/s41467-020-18072-w" ext-link-type="DOI">10.1038/s41467-020-18072-w</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx12"><?xmltex \def\ref@label{{Chapman(2000)}}?><label>Chapman(2000)</label><?label Chapman2000AssessingExposure?><mixed-citation>Chapman, L.: Assessing topographic exposure, Meteorol. Appl., 7,
335–340, <ext-link xlink:href="https://doi.org/10.1017/S1350482700001729" ext-link-type="DOI">10.1017/S1350482700001729</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx13"><?xmltex \def\ref@label{{Chavas et~al.(2015)}}?><label>Chavas et al.(2015)</label><?label Chavas2015AStructure?><mixed-citation>Chavas, D. R., Lin, N., and Emanuel, K.: A model for the complete radial
structure of the tropical cyclone wind field. Part I: Comparison with
observed structure, J. Atmos. Sci., 72, 3647–3662,
<ext-link xlink:href="https://doi.org/10.1175/JAS-D-15-0014.1" ext-link-type="DOI">10.1175/JAS-D-15-0014.1</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx14"><?xmltex \def\ref@label{{Congress.gov(2020)}}?><label>Congress.gov(2020)</label><?label Congress.gov2020H.R.5760Congress?><mixed-citation>Congress.gov: H.R.5760 – 116th Congress (2019-2020): Grid Security Research
and Development Act <inline-formula><mml:math id="M235" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> Congress.gov <inline-formula><mml:math id="M236" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> Library of Congress,
<uri>https://www.congress.gov/bill/116th-congress/house-bill/5760</uri>  (last access: 21 September 2022),
2020.</mixed-citation></ref>
      <ref id="bib1.bibx15"><?xmltex \def\ref@label{{Danielson and Gesch(2011)}}?><label>Danielson and Gesch(2011)</label><?label Danielson2011Global20111073?><mixed-citation>Danielson, J. J. and Gesch, D. B.: Global multi-resolution terrain elevation data
2010 (GMTED2010): U.S. Geological Survey Open-File Report 2011–1073,
<uri>https://www.usgs.gov/publications/global-multi-resolution-terrain-elevation-data-2010-gmted2010</uri>  (last access: 21 September 2022),
2011.</mixed-citation></ref>
      <ref id="bib1.bibx16"><?xmltex \def\ref@label{{Dunn and
Smyth(2018)}}?><label>Dunn and
Smyth(2018)</label><?label Dunn2018GeneralizedHttps://link.springer.com/book/10.1007/978-1-4419-0118-7?><mixed-citation>Dunn, P. K. and Smyth, G. K.: Generalized Linear Models With Examples in R,
<uri>https://link.springer.com/book/10.1007/978-1-4419-0118-7</uri>  (last access: 21 September 2022), 2018.</mixed-citation></ref>
      <ref id="bib1.bibx17"><?xmltex \def\ref@label{{{EIA.GOV}(2018)}}?><label>EIA.GOV(2018)</label><?label EIA.GOV2018U.S.Analysis?><mixed-citation>EIA.GOV: U.S. Energy Information Administration – EIA – Independent
Statistics and Analysis,
<uri>https://www.eia.gov/todayinenergy/detail.php?id=37332</uri> (last access: 21 September 2022), 2018.</mixed-citation></ref>
      <ref id="bib1.bibx18"><?xmltex \def\ref@label{{Elamrouss(2021)}}?><label>Elamrouss(2021)</label><?label Elamrouss202175CNN?><mixed-citation>Elamrouss, A.: 75% of power outages reported in Louisiana after Hurricane
Ida have been restored, governor says <inline-formula><mml:math id="M237" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> CNN,
<uri>https://www.cnn.com/2021/09/09/us/hurricane-ida-aftermath-louisiana-thursday/index.html</uri> (last access: 21 September 2022),
2021.</mixed-citation></ref>
      <ref id="bib1.bibx19"><?xmltex \def\ref@label{{Eskandarpour and
Khodaei(2018)}}?><label>Eskandarpour and
Khodaei(2018)</label><?label Eskandarpour2018LeveragingPredictions?><mixed-citation>Eskandarpour, R. and Khodaei, A.: Leveraging accuracy-uncertainty tradeoff in
SVM to achieve highly accurate outage predictions,
IEEE T. Power Syst., 33, 1139–1141, <ext-link xlink:href="https://doi.org/10.1109/TPWRS.2017.2759061" ext-link-type="DOI">10.1109/TPWRS.2017.2759061</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx20"><?xmltex \def\ref@label{{Eskandarpour et~al.(2018)}}?><label>Eskandarpour et al.(2018)</label><?label Eskandarpour2018ArtificialEvents?><mixed-citation>Eskandarpour, R., Khodaei, A., Paaso, A., and Abdullah, N. M.: Artificial
Intelligence Assisted Power Grid Hardening in Response to Extreme Weather
Events, Cornell University, <ext-link xlink:href="https://doi.org/10.48550/arxiv.1810.02866" ext-link-type="DOI">10.48550/arxiv.1810.02866</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx21"><?xmltex \def\ref@label{{{ESRI}(2019)}}?><label>ESRI(2019)</label><?label ESRI2019ArcGIS10.8?><mixed-citation>ESRI: ArcGIS Desktop: Release 10.8, Tech. rep., Environmental Systems
Research Institute, Redlands, CA, <uri>https://www.arcgis.com/index.html</uri> (last access: 21 September 2022), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx22"><?xmltex \def\ref@label{{Ferrari and Cribari-Neto(2010)}}?><label>Ferrari and Cribari-Neto(2010)</label><?label Ferrari2010BetaProportions?><mixed-citation>Ferrari, S. L. and Cribari-Neto, F.: Beta Regression for Modelling Rates and
Proportions, J. Appl. Stat., 31, 799–815,
<ext-link xlink:href="https://doi.org/10.1080/0266476042000214501" ext-link-type="DOI">10.1080/0266476042000214501</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx23"><?xmltex \def\ref@label{{Guikema et~al.(2010)}}?><label>Guikema et al.(2010)</label><?label Guikema2010PrestormSystems?><mixed-citation>Guikema, S. D., Quiring, S. M., and Han, S. R.: Prestorm Estimation of
Hurricane Damage to Electric Power Distribution Systems, Risk Anal., 30,
1744–1752, <ext-link xlink:href="https://doi.org/10.1111/j.1539-6924.2010.01510.x" ext-link-type="DOI">10.1111/j.1539-6924.2010.01510.x</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx24"><?xmltex \def\ref@label{{Guikema et~al.(2014)}}?><label>Guikema et al.(2014)</label><?label Guikema2014PredictingPlanning?><mixed-citation>Guikema, S. D., Nateghi, R., Quiring, S. M., Staid, A., Reilly, A. C., and Gao,
M.: Predicting Hurricane Power Outages to Support Storm Response Planning,
IEEE Access, 2, 1364–1373, <ext-link xlink:href="https://doi.org/10.1109/ACCESS.2014.2365716" ext-link-type="DOI">10.1109/ACCESS.2014.2365716</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx25"><?xmltex \def\ref@label{{Guttman(1998)}}?><label>Guttman(1998)</label><?label Guttman1998ComparingIndex?><mixed-citation>Guttman, N. B.: Comparing the Palmer drought index and the standardized
precipitation index, J. Am. Water Resour. As.,
34, 113–121, <ext-link xlink:href="https://doi.org/10.1111/j.1752-1688.1998.tb05964.x" ext-link-type="DOI">10.1111/j.1752-1688.1998.tb05964.x</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx26"><?xmltex \def\ref@label{{Hall(2000)}}?><label>Hall(2000)</label><?label Hall2000Zero-InflatedStudy?><mixed-citation>Hall, D. B.: Zero-Inflated Poisson and Binomial Regression with Random
Effects: A Case Study, Biometrics, 56, 1030–1039,
<ext-link xlink:href="https://doi.org/10.1111/J.0006-341X.2000.01030.X" ext-link-type="DOI">10.1111/J.0006-341X.2000.01030.X</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx27"><?xmltex \def\ref@label{{Han et~al.(2009a)}}?><label>Han et al.(2009a)</label><?label Han2009ImprovingModels?><mixed-citation>Han, S. R., Guikema, S. D., and Quiring, S. M.: Improving the predictive
accuracy of hurricane power outage forecasts using generalized additive
models, Risk Anal., 29, 1443–1453,
<ext-link xlink:href="https://doi.org/10.1111/j.1539-6924.2009.01280.x" ext-link-type="DOI">10.1111/j.1539-6924.2009.01280.x</ext-link>, 2009a.</mixed-citation></ref>
      <ref id="bib1.bibx28"><?xmltex \def\ref@label{{Han et~al.(2009b)}}?><label>Han et al.(2009b)</label><?label Han2009EstimatingRegion?><mixed-citation>Han, S. R., Guikema, S. D., Quiring, S. M., Lee, K. H., Rosowsky, D., and
Davidson, R. A.: Estimating the spatial distribution of power outages during
hurricanes in the Gulf coast region,
Reliab. Eng. Syst. Safe., 94, 199–210, <ext-link xlink:href="https://doi.org/10.1016/j.ress.2008.02.018" ext-link-type="DOI">10.1016/j.ress.2008.02.018</ext-link>, 2009b.</mixed-citation></ref>
      <ref id="bib1.bibx29"><?xmltex \def\ref@label{{Haseltine and Eman(2017)}}?><label>Haseltine and Eman(2017)</label><?label Haseltine2017PredictionLearning?><mixed-citation>Haseltine, C. and Eman, E. E. S.: Prediction of power grid failure using
neural network learning, Proceedings – 16th IEEE International Conference on
Machine Learning and Applications, ICMLA 2017, 2017–December, 505–510,
<ext-link xlink:href="https://doi.org/10.1109/ICMLA.2017.0-111" ext-link-type="DOI">10.1109/ICMLA.2017.0-111</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx30"><?xmltex \def\ref@label{{Hastie et~al.(2001)}}?><label>Hastie et al.(2001)</label><?label HastieSpringerPrediction?><mixed-citation>Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, <uri>https://link.springer.com/book/10.1007/978-0-387-84858-7</uri> (last access: 21 September 2022),
2001.</mixed-citation></ref>
      <ref id="bib1.bibx31"><?xmltex \def\ref@label{{Hosking and Wallis(1997)}}?><label>Hosking and Wallis(1997)</label><?label hosking_wallis_1997?><mixed-citation>Hosking, J. R. M. and Wallis, J. R.: Regional Frequency Analysis: An Approach
Based on L-Moments, Cambridge University Press, 191–209,
<ext-link xlink:href="https://doi.org/10.1017/CBO9780511529443" ext-link-type="DOI">10.1017/CBO9780511529443</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx32"><?xmltex \def\ref@label{{IEEE(2007)}}?><label>IEEE(2007)</label><?label 2007NationalC2-2007?><mixed-citation>IEEE: National Electrical Safety Code, ANSI/IEEE Standard C2-2007, 552, <uri>https://law.resource.org/pub/us/cfr/ibr/004/ieee.c2.2007.pdf</uri> (last access: 21 September 2022),  2007.</mixed-citation></ref>
      <ref id="bib1.bibx33"><?xmltex \def\ref@label{{Jaech et~al.(2018)}}?><label>Jaech et al.(2018)</label><?label Jaech2018Real-TimeOutages?><mixed-citation>Jaech, A., Zhang, B., Ostendorf, M., and Kirschen, D. S.: Real-Time Prediction
of the Duration of Distribution System Outages,
<uri>http://arxiv.org/abs/1804.01189</uri>  (last access: 20 January 2023), 2018.</mixed-citation></ref>
      <ref id="bib1.bibx34"><?xmltex \def\ref@label{{Kankanala et~al.(2014)}}?><label>Kankanala et al.(2014)</label><?label Kankanala2014Adaboost+:Systems?><mixed-citation>Kankanala, P., Das, S., and Pahwa, A.: Adaboost+: An ensemble learning
approach for estimating weather-related outages in distribution systems,
IEEE T. Power Syst., 29, 359–367,
<ext-link xlink:href="https://doi.org/10.1109/TPWRS.2013.2281137" ext-link-type="DOI">10.1109/TPWRS.2013.2281137</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx35"><?xmltex \def\ref@label{{Kohavi and John(1997)}}?><label>Kohavi and John(1997)</label><?label Kohavi1997WrappersSelection?><mixed-citation>Kohavi, R. and John, G. H.: Wrappers for feature subset selection, Artif. Intell., 97, 273–324, <ext-link xlink:href="https://doi.org/10.1016/S0004-3702(97)00043-X" ext-link-type="DOI">10.1016/S0004-3702(97)00043-X</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx36"><?xmltex \def\ref@label{{Krist et~al.(2014)}}?><label>Krist et al.(2014)</label><?label KristJr.201420132027Assessment?><mixed-citation>Krist Jr., F. J., Ellenwood, J. R., Woods, M. E., Mcmahan, A. J., Cowardin,
J. P., Ryerson, D. E., Sapio, F. J., Zweifler, M. O., and Romero, S. A.:
2013–2027 National Insect and Disease Forest Risk Assessment, 87–92, <ext-link xlink:href="https://doi.org/10.2737/SRS-GTR-209" ext-link-type="DOI">10.2737/SRS-GTR-209</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx37"><?xmltex \def\ref@label{{Latto et~al.(2021)}}?><label>Latto et al.(2021)</label><?label Latto2021NationalIsaias?><mixed-citation>Latto, A., Hagen, A., and Berg, R.: National Hurricane Center Tropical Cyclone
Report. Hurricane Isaias, 1–32 pp., <uri>https://www.nhc.noaa.gov/data/tcr/AL092020_Isaias.pdf</uri>  (last access: 21 September 2022),  2021.</mixed-citation></ref>
      <ref id="bib1.bibx38"><?xmltex \def\ref@label{{Li and Peng(2011)}}?><label>Li and Peng(2011)</label><?label Li2011QuantileData?><mixed-citation>Li, R. and Peng, L.: Quantile Regression for Left-Truncated Semicompeting
Risks Data, Biometrics, 67, 701–710,
<ext-link xlink:href="https://doi.org/10.1111/j.1541-0420.2010.01521.x" ext-link-type="DOI">10.1111/j.1541-0420.2010.01521.x</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx39"><?xmltex \def\ref@label{{Liu et~al.(2005)}}?><label>Liu et al.(2005)</label><?label Liu2005NegativeHurricanes?><mixed-citation>Liu, H., Davidson, R. A., Rosowsky, D. V., and Stedinger, J. R.: Negative
Binomial Regression of Electric Power Outages in Hurricanes,
J. Infrastruct. Syst., 11, 258–267,
<ext-link xlink:href="https://doi.org/10.1061/(asce)1076-0342(2005)11:4(258)" ext-link-type="DOI">10.1061/(asce)1076-0342(2005)11:4(258)</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx40"><?xmltex \def\ref@label{{Liu et~al.(2007)}}?><label>Liu et al.(2007)</label><?label Liu2007StatisticalStorms?><mixed-citation>Liu, H., Davidson, R. A., and Apanasovich, T. V.: Statistical for<?pagebreak page1682?>ecasting of
electric power restoration times in hurricanes and ice storms, IEEE
T. Power Syst., 22, 2270–2279,
<ext-link xlink:href="https://doi.org/10.1109/TPWRS.2007.907587" ext-link-type="DOI">10.1109/TPWRS.2007.907587</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx41"><?xmltex \def\ref@label{{Liu et~al.(2008)}}?><label>Liu et al.(2008)</label><?label Liu2008SpatialStorms?><mixed-citation>Liu, H., Davidson, R. A., and Apanasovich, T. V.: Spatial generalized linear
mixed models of electric power outages due to hurricanes and ice storms,
Reliab. Eng. Syst. Safe., 93, 897–912,
<ext-link xlink:href="https://doi.org/10.1016/j.ress.2007.03.038" ext-link-type="DOI">10.1016/j.ress.2007.03.038</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx42"><?xmltex \def\ref@label{{Maderia(2015)}}?><label>Maderia(2015)</label><?label Maderia2015ImportanceOutages?><mixed-citation>Maderia, C. M.: Importance of Tree Species and Precipitation for Modeling
Hurricane-induced Power Outages,
<uri>https://oaktrust.library.tamu.edu/handle/1969.1/155728</uri>  (last access: 21 September 2022), 2015.</mixed-citation></ref>
      <ref id="bib1.bibx43"><?xmltex \def\ref@label{{McRoberts et~al.(2018)}}?><label>McRoberts et al.(2018)</label><?label McRoberts2018ImprovingFactors?><mixed-citation>McRoberts, D. B., Quiring, S. M., and Guikema, S. D.: Improving Hurricane
Power Outage Prediction Models Through the Inclusion of Local Environmental
Factors, Risk Anal., 38, 2722–2737, <ext-link xlink:href="https://doi.org/10.1111/risa.12728" ext-link-type="DOI">10.1111/risa.12728</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx44"><?xmltex \def\ref@label{{Meinshausen(2006)}}?><label>Meinshausen(2006)</label><?label Meinshausen2006QuantileForests?><mixed-citation>
Meinshausen, N.: Quantile Regression Forests, J. Mach. Learn.
Res., 7, 983–999, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx45"><?xmltex \def\ref@label{{Miller et~al.(2013)}}?><label>Miller et al.(2013)</label><?label Miller2013Topographic2003?><mixed-citation>Miller, C., Gibbons, M., Beatty, K., and Boissonnade, A.: Topographic speed-up
effects and observed roof damage on Bermuda following Hurricane Fabian
(2003), Weather Forecast., 28, 159–174,
<ext-link xlink:href="https://doi.org/10.1175/WAF-D-12-00050.1" ext-link-type="DOI">10.1175/WAF-D-12-00050.1</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx46"><?xmltex \def\ref@label{{MRLC(2023)}}?><label>MRLC(2023)</label><?label mrlc2023?><mixed-citation>MRLC: All NLCD Land Cover 2019 CONUS Land Cover, MRLC [data set], <uri>https://www.mrlc.gov/viewer/</uri> (last access: 21 September 2022), 2023.</mixed-citation></ref>
      <ref id="bib1.bibx47"><?xmltex \def\ref@label{{Napoli et~al.(2019)}}?><label>Napoli et al.(2019)</label><?label Napoli2019VariabilityRegion?><mixed-citation>Napoli, A., Crespi, A., Ragone, F., Maugeri, M., and Pasquero, C.: Variability
of orographic enhancement of precipitation in the Alpine region, Sci.
Rep., 9, 13352, <ext-link xlink:href="https://doi.org/10.1038/S41598-019-49974-5" ext-link-type="DOI">10.1038/S41598-019-49974-5</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx48"><?xmltex \def\ref@label{{Nateghi et~al.(2014)}}?><label>Nateghi et al.(2014)</label><?label Nateghi2014PowerModels?><mixed-citation>Nateghi, R., Guikema, S., and Quiring, S. M.: Power Outage Estimation for
Tropical Cyclones: Improved Accuracy with Simpler Models, Risk Anal., 34,
1069–1078, <ext-link xlink:href="https://doi.org/10.1111/risa.12131" ext-link-type="DOI">10.1111/risa.12131</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx49"><?xmltex \def\ref@label{{{National Academies of Sciences, Engineering, and
Medicine}(2017)}}?><label>National Academies of Sciences, Engineering, and
Medicine(2017)</label><?label NationalAcademiesofSciences2017EnhancingSystem?><mixed-citation>National Academies of Sciences, Engineering, and Medicine: Enhancing the
Resilience of the Nation's Electricity System, The National Academies Press, 170 pp., <ext-link xlink:href="https://doi.org/10.17226/24836" ext-link-type="DOI">10.17226/24836</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx50"><?xmltex \def\ref@label{{Ouyang and
Due{\~{n}}as-Osorio(2014)}}?><label>Ouyang and
Dueñas-Osorio(2014)</label><?label Ouyang2014Multi-dimensionalSystems?><mixed-citation>Ouyang, M. and Dueñas-Osorio, L.: Multi-dimensional hurricane resilience
assessment of electric power systems, Struct. Saf., 48, 15–24,
<ext-link xlink:href="https://doi.org/10.1016/j.strusafe.2014.01.001" ext-link-type="DOI">10.1016/j.strusafe.2014.01.001</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx51"><?xmltex \def\ref@label{{Pedregosa et~al.(2011)}}?><label>Pedregosa et al.(2011)</label><?label scikit-learn?><mixed-citation>
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.:
Scikit-learn: Machine Learning in {P}ython, J. Mach.
Learn. Res., 12, 2825–2830, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx52"><?xmltex \def\ref@label{{Petersen(1982)}}?><label>Petersen(1982)</label><?label Petersen1982ElectricityAreas?><mixed-citation>Petersen, H. C.: Electricity Consumption in Rural Vs. Urban Areas,
Western J. Agr. Econ., 7, 13–18,
<uri>http://econpapers.repec.org/RePEc:ags:wjagec:32417</uri>  (last access: 21 September 2022), 1982.</mixed-citation></ref>
      <ref id="bib1.bibx53"><?xmltex \def\ref@label{{PowerOutage.us(2023)}}?><label>PowerOutage.us(2023)</label><?label PowerOutage.us2023?><mixed-citation>PowerOutage.us: Electric customers
without power, PowerOutage.us [data set], <uri>https://poweroutage.us/</uri> (last access: 21 September 2022), 2023.</mixed-citation></ref>
      <ref id="bib1.bibx54"><?xmltex \def\ref@label{{Quiring et~al.(2011)}}?><label>Quiring et al.(2011)</label><?label Quiring2011ImportanceOutages?><mixed-citation>Quiring, S. M., Zhu, L., and Guikema, S. D.: Importance of soil and elevation
characteristics for modeling hurricane-induced power outages, Nat.
Hazards, 58, 365–390, <ext-link xlink:href="https://doi.org/10.1007/s11069-010-9672-9" ext-link-type="DOI">10.1007/s11069-010-9672-9</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx55"><?xmltex \def\ref@label{{Rivera et~al.(2022)}}?><label>Rivera et al.(2022)</label><?label Rivera2022PuertoReuters?><mixed-citation>Rivera, I., Mckay, R., and Disavino, S.: Puerto Rico power grid no match for
Fiona; residents unsurprised <inline-formula><mml:math id="M238" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> Reuters,
<ext-link xlink:href="https://www.reuters.com/business/environment/some-13-million-customers-without-power-puerto-rico-after-hurricane-fiona-2022-09-20/">https://www.reuters.com/business/environment</ext-link> (last access: 21 September 2022),
2022.</mixed-citation></ref>
      <ref id="bib1.bibx56"><?xmltex \def\ref@label{{Rudin et~al.(2012)}}?><label>Rudin et al.(2012)</label><?label Rudin2012MachineGrid?><mixed-citation>Rudin, C., Waltz, D., Anderson, R., Boulanger, A., Salleb-Aouissi, A., Chow,
M., Dutta, H., Gross, P., Huang, B., Ierome, S., Isaac, D. F., Kressner, A.,
Passonneau, R. J., Radeva, A., and Wu, L.: Machine learning for the New York
City power grid, IEEE T. Pattern Anal., 34, 328–345, <ext-link xlink:href="https://doi.org/10.1109/TPAMI.2011.108" ext-link-type="DOI">10.1109/TPAMI.2011.108</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx57"><?xmltex \def\ref@label{{Shashaani et~al.(2018)}}?><label>Shashaani et al.(2018)</label><?label Shashaani2018Multi-StageOutages?><mixed-citation>Shashaani, S., Guikema, S. D., Zhai, C., Pino, J. V., and Quiring, S. M.:
Multi-Stage Prediction for Zero-Inflated Hurricane Induced Power Outages,
IEEE Access, 6, 62432–62449, <ext-link xlink:href="https://doi.org/10.1109/ACCESS.2018.2877078" ext-link-type="DOI">10.1109/ACCESS.2018.2877078</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx58"><?xmltex \def\ref@label{{Sheppard and DiSavino(2012)}}?><label>Sheppard and DiSavino(2012)</label><?label Sheppard2012SuperstormReuters?><mixed-citation>Sheppard, D. and DiSavino, S.: Superstorm Sandy cuts power to 8.1 million
homes <inline-formula><mml:math id="M239" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> Reuters,
<ext-link xlink:href="https://www.reuters.com/article/us-storm-sandy-powercuts/superstorm-sandy-cuts-power-to-8-1-million-homes-idUSBRE89T10G20121030">https://www.reuters.com/article/us-storm-sandy-powercuts</ext-link> (last access: 21 September 2022),
2012.</mixed-citation></ref>
      <ref id="bib1.bibx59"><?xmltex \def\ref@label{{Smith(2020)}}?><label>Smith(2020)</label><?label Smith2020U.S.0209268?><mixed-citation>Smith, A. B.: U.S. Billion-dollar Weather and Climate Disasters, 1980–present (NCEI Accession 0209268), National Centers for Environmental
Information, <ext-link xlink:href="https://doi.org/10.25921/STKW-7W73" ext-link-type="DOI">10.25921/STKW-7W73</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx60"><?xmltex \def\ref@label{{{Soil Survey Staff}()}}?><label>Soil Survey Staff()</label><?label datasetSoilSurveyStaffGriddedCommons?><mixed-citation>Soil Survey Staff: Gridded Soil Survey Geographic Database (gSSURGO) <inline-formula><mml:math id="M240" display="inline"><mml:mo>|</mml:mo></mml:math></inline-formula> Ag
Data Commons, <uri>https://data.nal.usda.gov/dataset/gridded-soil-survey-geographic-database-gssurgo</uri> (last access: 21 September 2022).</mixed-citation></ref>
      <ref id="bib1.bibx61"><?xmltex \def\ref@label{{Sun et~al.(2018)}}?><label>Sun et al.(2018)</label><?label Sun2018CyberState-of-the-art?><mixed-citation>Sun, C. C., Hahn, A., and Liu, C. C.: Cyber security of a power grid:
State-of-the-art, Int. J. Elec. Power, 99, 45–56, <ext-link xlink:href="https://doi.org/10.1016/j.ijepes.2017.12.020" ext-link-type="DOI">10.1016/j.ijepes.2017.12.020</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx62"><?xmltex \def\ref@label{{Tonn et~al.(2016)}}?><label>Tonn et al.(2016)</label><?label Tonn2016HurricaneRisk?><mixed-citation>Tonn, G. L., Guikema, S. D., Ferreira, C. M., and Quiring, S. M.: Hurricane
Isaac: A Longitudinal Analysis of Storm Characteristics and Power Outage
Risk, Risk Anal., 36, 1936–1947, <ext-link xlink:href="https://doi.org/10.1111/risa.12552" ext-link-type="DOI">10.1111/risa.12552</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx63"><?xmltex \def\ref@label{{Verleysen and Fran{\c{c}}ois(2005)}}?><label>Verleysen and François(2005)</label><?label Verleysen2005ThePrediction?><mixed-citation>Verleysen, M. and François, D.: The curse of dimensionality in data
mining and time series prediction, Lect. Notes Comput. Sci., 3512,
758–770, <ext-link xlink:href="https://doi.org/10.1007/11494669_93" ext-link-type="DOI">10.1007/11494669_93</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx64"><?xmltex \def\ref@label{{Wallach and Goffinet(1989)}}?><label>Wallach and Goffinet(1989)</label><?label Wallach1989MeanModels?><mixed-citation>Wallach, D. and Goffinet, B.: Mean squared error of prediction as a criterion
for evaluating and comparing system models, Ecol. Modell., 44,
299–306, <ext-link xlink:href="https://doi.org/10.1016/0304-3800(89)90035-5" ext-link-type="DOI">10.1016/0304-3800(89)90035-5</ext-link>, 1989.</mixed-citation></ref>
      <ref id="bib1.bibx65"><?xmltex \def\ref@label{{Wanik et~al.(2015)}}?><label>Wanik et al.(2015)</label><?label Wanik2015StormUSA?><mixed-citation>Wanik, D. W., Anagnostou, E. N., Hartman, B. M., Frediani, M. E., and Astitha,
M.: Storm outage modeling for an electric distribution network in
Northeastern USA, Nat. Hazards, 79, 1359–1384,
<ext-link xlink:href="https://doi.org/10.1007/s11069-015-1908-2" ext-link-type="DOI">10.1007/s11069-015-1908-2</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx66"><?xmltex \def\ref@label{{Wanik et~al.(2017)}}?><label>Wanik et al.(2017)</label><?label Wanik2017UsingUtilities?><mixed-citation>Wanik, D. W., Parent, J. R., Anagnostou, E. N., and Hartman, B. M.: Using
vegetation management and LiDAR-derived tree height data to improve outage
predictions for electric utilities, Electr. Pow. Syst. Res., 146,
236–245, <ext-link xlink:href="https://doi.org/10.1016/j.epsr.2017.01.039" ext-link-type="DOI">10.1016/j.epsr.2017.01.039</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx67"><?xmltex \def\ref@label{{Wei et~al.(2013)}}?><label>Wei et al.(2013)</label><?label Wei2013ImprovementSimulation?><mixed-citation>Wei, H., Xia, Y., Mitchell, K. E., and Ek, M. B.: Improvement of the Noah land surface model for warm season processes: Evaluation of water and energy flux simulation, Hydrol. Process., 27, 297–303, <ext-link xlink:href="https://doi.org/10.1002/HYP.9214" ext-link-type="DOI">10.1002/HYP.9214</ext-link>,
2013.</mixed-citation></ref>
      <ref id="bib1.bibx68"><?xmltex \def\ref@label{{Wickham et~al.(2021)}}?><label>Wickham et al.(2021)</label><?label Wickham2021ThematicStates?><mixed-citation>Wickham, J., Stehman, S. V., Sorenson, D. G., Gass, L., and Dewitz, J. A.:
Thematic accuracy assessment of the NLCD 2016 land cover for the
conterminous United States, Remote Sens. Environ., 257,
<ext-link xlink:href="https://doi.org/10.1016/J.RSE.2021.112357" ext-link-type="DOI">10.1016/J.RSE.2021.112357</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx69"><?xmltex \def\ref@label{{Wood(2017)}}?><label>Wood(2017)</label><?label Wood2017GeneralizedEdition?><mixed-citation>Wood, S. N.: Generalized Additive Models: An Introduction with R, second
edition, 1–476 pp.,
<ext-link xlink:href="https://doi.org/10.1201/9781315370279" ext-link-type="DOI">10.1201/9781315370279</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx70"><?xmltex \def\ref@label{{Wu et~al.(2007)}}?><label>Wu et al.(2007)</label><?label Wu2007AppropriateSeasons?><mixed-citation>Wu, H., Svoboda, M. D., Hayes, M. J., Wilhite, D. A., and Wen, F.: Approp<?pagebreak page1683?>riate
application of the Standardized Precipitation Index in arid locations and dry
seasons, Int. J. Climatol., 27, 65–79,
<ext-link xlink:href="https://doi.org/10.1002/joc.1371" ext-link-type="DOI">10.1002/joc.1371</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx71"><?xmltex \def\ref@label{{Xia(2012)}}?><label>Xia(2012)</label><?label XiaNLDASV002?><mixed-citation>Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B., Wood, E., Luo, L., Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D., Koren, V., Duan, Q., Mo, K., Fan, Y., and Mocko, D.: NCEP/EMC (2014), NLDAS VIC Land Surface Model L4 Hourly 0.125 <inline-formula><mml:math id="M241" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.125 degree V002, edited by: Mocko, D., NASA/GSFC/HSL, Greenbelt, Maryland, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC), <ext-link xlink:href="https://doi.org/10.5067/ELBDAPAKNGJ9" ext-link-type="DOI">10.5067/ELBDAPAKNGJ9</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx72"><?xmltex \def\ref@label{{Xia et~al.(2012)}}?><label>Xia et al.(2012)</label><?label Xia2012Continental-scaleProducts?><mixed-citation>Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B., Wood, E., Luo, L.,
Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D., Koren, V., Duan,
Q., Mo, K., Fan, Y., and Mocko, D.: Continental-scale water and energy flux
analysis and validation for the North American Land Data Assimilation System
project phase 2 (NLDAS-2): 1. Intercomparison and application of model
products, J. Geophys. Res.-Atmos., 117, D03109,
<ext-link xlink:href="https://doi.org/10.1029/2011JD016048" ext-link-type="DOI">10.1029/2011JD016048</ext-link>, 2012.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx73"><?xmltex \def\ref@label{{Xie et~al.(2020)}}?><label>Xie et al.(2020)</label><?label Xie2020AResilience?><mixed-citation>Xie, J., Alvarez-Fernandez, I., and Sun, W.: A review of machine learning
applications in power system resilience, IEEE Pow. Ener. Soc.
Ge., 2020–August, <ext-link xlink:href="https://doi.org/10.1109/PESGM41954.2020.9282137" ext-link-type="DOI">10.1109/PESGM41954.2020.9282137</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx74"><?xmltex \def\ref@label{{Yee(2012)}}?><label>Yee(2012)</label><?label Yee2012PackageModels?><mixed-citation>Yee, T. W.: Package “VGAM” (Vector generalized linear and additive models), <uri>http://www.springer.com/series/692</uri>  (last access: 21 September 2022), 2012.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Probabilistic and machine learning methods for uncertainty quantification in power outage prediction due to extreme events</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Ahsanullah et al.(2014)</label><mixed-citation>
      
Ahsanullah, M., Kibria, B. M. G., and Shakil, M.: Normal Distribution,
7–50, <a href="https://link.springer.com/chapter/10.2991/978-94-6239-061-4_2" target="_blank"/> (last access: 21 September 2022), 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>AJOT(2021)</label><mixed-citation>
      
AJOT: Hurricane Ida caused at least 1.2 million electricity customers to
lose power | AJOT.COM, <a href="https://ajot.com/news/hurricane-ida-caused-at-least-1.2-million-electricity-customers-to-lose-power" target="_blank"/> (last access: 21 September 2022),
2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Arab et al.(2016)</label><mixed-citation>
      
Arab, A., Khodaei, A., Khator, S. K., and Han, Z.: Electric Power Grid
Restoration Considering Disaster Economics, IEEE Access, 4, 639–649,
<a href="https://doi.org/10.1109/ACCESS.2016.2523545" target="_blank">https://doi.org/10.1109/ACCESS.2016.2523545</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Bjarnadottir et al.(2013)</label><mixed-citation>
      
Bjarnadottir, S., Li, Y., and Stewart, M. G.: Hurricane Risk Assessment of
Power Distribution Poles Considering Impacts of a Changing Climate, J. Infrastruct. Sys., 19, 12–24,
<a href="https://doi.org/10.1061/(asce)is.1943-555x.0000108" target="_blank">https://doi.org/10.1061/(asce)is.1943-555x.0000108</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Breiman(2001)</label><mixed-citation>
      
Breiman, L.: Random forests, Mach. Learn.,  45, 5–32, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Brown(2002)</label><mixed-citation>
      
Brown, R. E.: Electric Power Distribution Reliability,
<a href="https://doi.org/10.1201/9780849375682" target="_blank">https://doi.org/10.1201/9780849375682</a>), 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Cai et al.(2018)</label><mixed-citation>
      
Cai, J., Luo, J., Wang, S., and Yang, S.: Feature selection in machine
learning: A new perspective, Neurocomputing, 300, 70–79,
<a href="https://doi.org/10.1016/j.neucom.2017.11.077" target="_blank">https://doi.org/10.1016/j.neucom.2017.11.077</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Cameron and Windmeijer(1996)</label><mixed-citation>
      
Cameron, A. C. and Windmeijer, F. A.: R-squared measures for count data
regression models with applications to health-care utilization,
J. Bus. Econ. Stat., 14, 209–220,
<a href="https://doi.org/10.1080/07350015.1996.10524648" target="_blank">https://doi.org/10.1080/07350015.1996.10524648</a>, 1996.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Casey(2016)</label><mixed-citation>
      
Casey, S.: The United States, The Ashgate Research Companion to the Korean
War, 49–60 pp., <a href="https://doi.org/10.1007/978-1-349-08679-5_5" target="_blank">https://doi.org/10.1007/978-1-349-08679-5_5</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Ceferino et al.(2018)</label><mixed-citation>
      
Ceferino, L., Kiremidjian, A., and Deierlein, G.: Regional Multiseverity
Casualty Estimation Due to Building Damage following a Mw 8.8 Earthquake
Scenario in Lima, Peru, Earthq. Spectra, 34, 1739–1761,
<a href="https://doi.org/10.1193/080617EQS154M" target="_blank">https://doi.org/10.1193/080617EQS154M</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Ceferino et al.(2020)</label><mixed-citation>
      
Ceferino, L., Mitrani-Reiser, J., Kiremidjian, A., Deierlein, G., and
Bambarén, C.: Effective plans for hospital system response to
earthquake emergencies, Nat. Commun., 11, 1–12,
<a href="https://doi.org/10.1038/s41467-020-18072-w" target="_blank">https://doi.org/10.1038/s41467-020-18072-w</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Chapman(2000)</label><mixed-citation>
      
Chapman, L.: Assessing topographic exposure, Meteorol. Appl., 7,
335–340, <a href="https://doi.org/10.1017/S1350482700001729" target="_blank">https://doi.org/10.1017/S1350482700001729</a>, 2000.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Chavas et al.(2015)</label><mixed-citation>
      
Chavas, D. R., Lin, N., and Emanuel, K.: A model for the complete radial
structure of the tropical cyclone wind field. Part I: Comparison with
observed structure, J. Atmos. Sci., 72, 3647–3662,
<a href="https://doi.org/10.1175/JAS-D-15-0014.1" target="_blank">https://doi.org/10.1175/JAS-D-15-0014.1</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Congress.gov(2020)</label><mixed-citation>
      
Congress.gov: H.R.5760 – 116th Congress (2019-2020): Grid Security Research
and Development Act | Congress.gov | Library of Congress,
<a href="https://www.congress.gov/bill/116th-congress/house-bill/5760" target="_blank"/>  (last access: 21 September 2022),
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Danielson and Gesh(2011)</label><mixed-citation>
      
Danielson, J. and Gesh, D.: Global multi-resolution terrain elevation data
2010 (GMTED2010): U.S. Geological Survey Open-File Report 2011–1073,
<a href="https://www.usgs.gov/publications/global-multi-resolution-terrain-elevation-data-2010-gmted2010" target="_blank"/>  (last access: last access: 21 September 2022),
2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Dunn and
Smyth(2018)</label><mixed-citation>
      
Dunn, P. K. and Smyth, G. K.: Generalized Linear Models With Examples in R,
<a href="https://link.springer.com/book/10.1007/978-1-4419-0118-7" target="_blank"/>  (last access: last access: 21 September 2022), 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>EIA.GOV(2018)</label><mixed-citation>
      
EIA.GOV: U.S. Energy Information Administration – EIA – Independent
Statistics and Analysis,
<a href="https://www.eia.gov/todayinenergy/detail.php?id=37332" target="_blank"/> (last access: last access: 21 September 2022), 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Elamrouss(2021)</label><mixed-citation>
      
Elamrouss, A.: 75% of power outages reported in Louisiana after Hurricane
Ida have been restored, governor says | CNN,
<a href="https://www.cnn.com/2021/09/09/us/hurricane-ida-aftermath-louisiana-thursday/index.html" target="_blank"/> (last access: last access: 21 September 2022),
2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Eskandarpour and
Khodaei(2018)</label><mixed-citation>
      
Eskandarpour, R. and Khodaei, A.: Leveraging accuracy-uncertainty tradeoff in
SVM to achieve highly accurate outage predictions,
IEEE T. Power Syst., 33, 1139–1141, <a href="https://doi.org/10.1109/TPWRS.2017.2759061" target="_blank">https://doi.org/10.1109/TPWRS.2017.2759061</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Eskandarpour et al.(2018)</label><mixed-citation>
      
Eskandarpour, R., Khodaei, A., Paaso, A., and Abdullah, N. M.: Artificial
Intelligence Assisted Power Grid Hardening in Response to Extreme Weather
Events, Cornell University, <a href="https://doi.org/10.48550/arxiv.1810.02866" target="_blank">https://doi.org/10.48550/arxiv.1810.02866</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>ESRI(2019)</label><mixed-citation>
      
ESRI: ArcGIS Desktop: Release 10.8, Tech. rep., Environmental Systems
Research Institute, Relands, CA, <a href="https://www.arcgis.com/index.html" target="_blank"/> (last access: 21 September 2022), 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Ferrari and Cribari-Neto(2010)</label><mixed-citation>
      
Ferrari, S. L. and Cribari-Neto, F.: Beta Regression for Modelling Rates and
Proportions, 31, 799–815,
<a href="https://doi.org/10.1080/0266476042000214501" target="_blank">https://doi.org/10.1080/0266476042000214501</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Guikema et al.(2010)</label><mixed-citation>
      
Guikema, S. D., Quiring, S. M., and Han, S. R.: Prestorm Estimation of
Hurricane Damage to Electric Power Distribution Systems, Risk Anal., 30,
1744–1752, <a href="https://doi.org/10.1111/j.1539-6924.2010.01510.x" target="_blank">https://doi.org/10.1111/j.1539-6924.2010.01510.x</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Guikema et al.(2014)</label><mixed-citation>
      
Guikema, S. D., Nateghi, R., Quiring, S. M., Staid, A., Reilly, A. C., and Gao,
M.: Predicting Hurricane Power Outages to Support Storm Response Planning,
IEEE Access, 2, 1364–1373, <a href="https://doi.org/10.1109/ACCESS.2014.2365716" target="_blank">https://doi.org/10.1109/ACCESS.2014.2365716</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Guttman(1998)</label><mixed-citation>
      
Guttman, N. B.: Comparing the palmer drought index and the standardized
precipitation index, J. Am. Water Resour. As.,
34, 113–121, <a href="https://doi.org/10.1111/j.1752-1688.1998.tb05964.x" target="_blank">https://doi.org/10.1111/j.1752-1688.1998.tb05964.x</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Hall(2000)</label><mixed-citation>
      
Hall, D. B.: Zero-Inflated Poisson and Binomial Regression with Random
Effects: A Case Study, Biometrics, 56, 1030–1039,
<a href="https://doi.org/10.1111/J.0006-341X.2000.01030.X" target="_blank">https://doi.org/10.1111/J.0006-341X.2000.01030.X</a>, 2000.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Han et al.(2009a)</label><mixed-citation>
      
Han, S. R., Guikema, S. D., and Quiring, S. M.: Improving the predictive
accuracy of hurricane power outage forecasts using generalized additive
models, Risk Anal., 29, 1443–1453,
<a href="https://doi.org/10.1111/j.1539-6924.2009.01280.x" target="_blank">https://doi.org/10.1111/j.1539-6924.2009.01280.x</a>, 2009a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Han et al.(2009b)</label><mixed-citation>
      
Han, S. R., Guikema, S. D., Quiring, S. M., Lee, K. H., Rosowsky, D., and
Davidson, R. A.: Estimating the spatial distribution of power outages during
hurricanes in the Gulf coast region,
Reliab. Eng. Syst. Safe., 94, 199–210, <a href="https://doi.org/10.1016/j.ress.2008.02.018" target="_blank">https://doi.org/10.1016/j.ress.2008.02.018</a>, 2009b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Haseltine and Eman(2017)</label><mixed-citation>
      
Haseltine, C. and Eman, E. E. S.: Prediction of power grid failure using
neural network learning, Proceedings – 16th IEEE International Conference on
Machine Learning and Applications, ICMLA 2017, 2017–December, 505–510,
<a href="https://doi.org/10.1109/ICMLA.2017.0-111" target="_blank">https://doi.org/10.1109/ICMLA.2017.0-111</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Hastie et al.(2001)</label><mixed-citation>
      
Hastie, T., Tibshirani, R., and Friedman, J.: Springer Series in Statistics
The Elements of Statistical Learning Data Mining, Inference, and Prediction, <a href="https://link.springer.com/book/10.1007/978-0-387-84858-7" target="_blank"/> (last access: 21 September 2022),
2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Hosking and Wallis(1997)</label><mixed-citation>
      
Hosking, J. R. M. and Wallis, J. R.: Regional Frequency Analysis: An Approach
Based on L-Moments, Cambridge University Press, 191–209,
<a href="https://doi.org/10.1017/CBO9780511529443" target="_blank">https://doi.org/10.1017/CBO9780511529443</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>IEEE(2007)</label><mixed-citation>
      
IEEE: National Electrical Safety Code, ANSI/IEEE Standard C2-2007, 552, <a href="https://law.resource.org/pub/us/cfr/ibr/004/ieee.c2.2007.pdf" target="_blank"/> (last access: 21 September 2022),  2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Jaech et al.(2018)</label><mixed-citation>
      
Jaech, A., Zhang, B., Ostendorf, M., and Kirschen, D. S.: Real-Time Prediction
of the Duration of Distribution System Outages,
<a href="http://arxiv.org/abs/1804.01189" target="_blank"/>  (last access: 20 January 2023), 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Kankanala et al.(2014)</label><mixed-citation>
      
Kankanala, P., Das, S., and Pahwa, A.: Adaboost+: An ensemble learning
approach for estimating weather-related outages in distribution systems,
IEEE T. Power Syst., 29, 359–367,
<a href="https://doi.org/10.1109/TPWRS.2013.2281137" target="_blank">https://doi.org/10.1109/TPWRS.2013.2281137</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Kohavi and John(1997)</label><mixed-citation>
      
Kohavi, R. and John, G. H.: Wrappers for feature subset selection, Artif. Intell., 97, 273–324, <a href="https://doi.org/10.1016/S0004-3702(97)00043-X" target="_blank">https://doi.org/10.1016/S0004-3702(97)00043-X</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Krist et al.(2014)</label><mixed-citation>
      
Krist Jr., F. J., Ellenwood, J. R., Woods, M. E., Mcmahan, A. J., Cowardin,
J. P., Ryerson, D. E., Sapio, F. J., Zweifler, M. O., and Romero, S. A.:
2013–2027 National Insect and Disease Forest Risk Assessment, 87–92, <a href="https://doi.org/10.2737/SRS-GTR-209" target="_blank">https://doi.org/10.2737/SRS-GTR-209</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Latto et al.(2021)</label><mixed-citation>
      
Latto, A., Hagen, A., and Berg, R.: National Hurricane Center Tropical Cyclone
Report. Hurricane Isaias, 1–32 pp., <a href="https://www.nhc.noaa.gov/data/tcr/AL092020_Isaias.pdf" target="_blank"/>  (last access: 21 September 2022),  2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Li and Peng(2011)</label><mixed-citation>
      
Li, R. and Peng, L.: Quantile Regression for Left-Truncated Semicompeting
Risks Data, Biometrics, 67, 701–710,
<a href="https://doi.org/10.1111/j.1541-0420.2010.01521.x" target="_blank">https://doi.org/10.1111/j.1541-0420.2010.01521.x</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Liu et al.(2005)</label><mixed-citation>
      
Liu, H., Davidson, R. A., Rosowsky, D. V., and Stedinger, J. R.: Negative
Binomial Regression of Electric Power Outages in Hurricanes,
J. Infrastruct. Syst., 11, 258–267,
<a href="https://doi.org/10.1061/(asce)1076-0342(2005)11:4(258)" target="_blank">https://doi.org/10.1061/(asce)1076-0342(2005)11:4(258)</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Liu et al.(2007)</label><mixed-citation>
      
Liu, H., Davidson, R. A., and Apanasovich, T. V.: Statistical forecasting of
electric power restoration times in hurricanes and ice storms, IEEE
Trans. Power Syst., 22, 2270–2279,
<a href="https://doi.org/10.1109/TPWRS.2007.907587" target="_blank">https://doi.org/10.1109/TPWRS.2007.907587</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Liu et al.(2008)</label><mixed-citation>
      
Liu, H., Davidson, R. A., and Apanasovich, T. V.: Spatial generalized linear
mixed models of electric power outages due to hurricanes and ice storms,
Reliab. Eng. Syst. Safe., 93, 897–912,
<a href="https://doi.org/10.1016/j.ress.2007.03.038" target="_blank">https://doi.org/10.1016/j.ress.2007.03.038</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Maderia(2015)</label><mixed-citation>
      
Maderia, C. M.: Importance of Tree Species and Precipitation for Modeling
Hurricane-induced Power Outages,
<a href="https://oaktrust.library.tamu.edu/handle/1969.1/155728" target="_blank"/>  (last access: 21 September 2022), 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>McRoberts et al.(2018)</label><mixed-citation>
      
McRoberts, D. B., Quiring, S. M., and Guikema, S. D.: Improving Hurricane
Power Outage Prediction Models Through the Inclusion of Local Environmental
Factors, Risk Anal., 38, 2722–2737, <a href="https://doi.org/10.1111/risa.12728" target="_blank">https://doi.org/10.1111/risa.12728</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Meinshausen(2006)</label><mixed-citation>
      
Meinshausen, N.: Quantile Regression Forests, J. Mach. Learn.
Res., 7, 983–999, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Miller et al.(2013)</label><mixed-citation>
      
Miller, C., Gibbons, M., Beatty, K., and Boissonnade, A.: Topographic speed-up
effects and observed roof damage on Bermuda following Hurricane Fabian
(2003), Weather Forecast., 28, 159–174,
<a href="https://doi.org/10.1175/WAF-D-12-00050.1" target="_blank">https://doi.org/10.1175/WAF-D-12-00050.1</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>MRLC(2023)</label><mixed-citation>
      
MRLC: All NLCD Land Cover 2019 CONUS Land Cover, MRLC [data set], <a href="https://www.mrlc.gov/viewer/" target="_blank"/> (last access: 21 September, 2022), 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Napoli et al.(2019)</label><mixed-citation>
      
Napoli, A., Crespi, A., Ragone, F., Maugeri, M., and Pasquero, C.: Variability
of orographic enhancement of precipitation in the Alpine region, Sci.
Rep., 9, 13352, <a href="https://doi.org/10.1038/S41598-019-49974-5" target="_blank">https://doi.org/10.1038/S41598-019-49974-5</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Nateghi et al.(2014)</label><mixed-citation>
      
Nateghi, R., Guikema, S., and Quiring, S. M.: Power Outage Estimation for
Tropical Cyclones: Improved Accuracy with Simpler Models, Risk Anal., 34,
1069–1078, <a href="https://doi.org/10.1111/risa.12131" target="_blank">https://doi.org/10.1111/risa.12131</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>National Academies of Sciences, Engineering, and
Medicine(2017)</label><mixed-citation>
      
National Academies of Sciences, Engineering, and Medicine: Enhancing the
Resilience of the Nation's Electricity System, Enhancing the Resilience of
the Nation's Electricity System, The National Academies Press, 170 pp., <a href="https://doi.org/10.17226/24836" target="_blank">https://doi.org/10.17226/24836</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Ouyang and
Dueñas-Osorio(2014)</label><mixed-citation>
      
Ouyang, M. and Dueñas-Osorio, L.: Multi-dimensional hurricane resilience
assessment of electric power systems, Struct. Saf., 48, 15–24,
<a href="https://doi.org/10.1016/j.strusafe.2014.01.001" target="_blank">https://doi.org/10.1016/j.strusafe.2014.01.001</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Pedregosa et al.(2011)</label><mixed-citation>
      
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.:
Scikit-learn: Machine Learning in {P}ython, J. Mach.
Learn. Res., 12, 2825–2830, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Petersen(1982)</label><mixed-citation>
      
Petersen, H. C.: Electricity Consumption in Rural Vs. Urban Areas,
Western J. Agr. Econ., 07, 13–18,
<a href="http://econpapers.repec.org/RePEc:ags:wjagec:32417" target="_blank"/>  (last access: 21 September 2022), 1982.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>PowerOutage.us(2023)</label><mixed-citation>
      
PowerOutage.us: Electric customers
without power, PowerOutage.us [data set], <a href="https://poweroutage.us/" target="_blank"/> (last access: 21 September 2022), 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Quiring et al.(2011)</label><mixed-citation>
      
Quiring, S. M., Zhu, L., and Guikema, S. D.: Importance of soil and elevation
characteristics for modeling hurricane-induced power outages, Nat.
Hazards, 58, 365–390, <a href="https://doi.org/10.1007/s11069-010-9672-9" target="_blank">https://doi.org/10.1007/s11069-010-9672-9</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Rivera et al.(2022)</label><mixed-citation>
      
Rivera, I., Mckay, R., and Disavino, S.: Puerto Rico power grid no match for
Fiona; residents unsurprised | Reuters,
<a href="https://www.reuters.com/business/environment/some-13-million-customers-without-power-puerto-rico-after-hurricane-fiona-2022-09-20/" target="_blank">https://www.reuters.com/business/environment</a> (last access: 21 September 2022),
2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Rudin et al.(2012)</label><mixed-citation>
      
Rudin, C., Waltz, D., Anderson, R., Boulanger, A., Salleb-Aouissi, A., Chow,
M., Dutta, H., Gross, P., Huang, B., Ierome, S., Isaac, D. F., Kressner, A.,
Passonneau, R. J., Radeva, A., and Wu, L.: Machine learning for the New York
City power grid, IEEE T. Pattern Anal., 34, 328–345, <a href="https://doi.org/10.1109/TPAMI.2011.108" target="_blank">https://doi.org/10.1109/TPAMI.2011.108</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Shashaani et al.(2018)</label><mixed-citation>
      
Shashaani, S., Guikema, S. D., Zhai, C., Pino, J. V., and Quiring, S. M.:
Multi-Stage Prediction for Zero-Inflated Hurricane Induced Power Outages,
IEEE Access, 6, 62432–62449, <a href="https://doi.org/10.1109/ACCESS.2018.2877078" target="_blank">https://doi.org/10.1109/ACCESS.2018.2877078</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Sheppard and DiSavino(2012)</label><mixed-citation>
      
Sheppard, D. and DiSavino, S.: Superstorm Sandy cuts power to 8.1 million
homes | Reuters,
<a href="https://www.reuters.com/article/us-storm-sandy-powercuts/superstorm-sandy-cuts-power-to-8-1-million-homes-idUSBRE89T10G20121030" target="_blank">https://www.reuters.com/article/us-storm-sandy-powercuts</a> (last access: 21 September 2022),
2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Smith(2020)</label><mixed-citation>
      
Smith, A. B.: U.S. Billion-dollar Weather and Climate Disasters, 1980–present (NCEI Accession 0209268), National Centers for Environmental
Information, <a href="https://doi.org/10.25921/STKW-7W73" target="_blank">https://doi.org/10.25921/STKW-7W73</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Soil Survey Staff()</label><mixed-citation>
      
Soil Survey Staff: Gridded Soil Survey Geographic Database (gSSURGO) | Ag
Data Commons, <a href="https://data.nal.usda.gov/dataset/gridded-soil-survey-geographic-database-gssurgo" target="_blank"/> (last access: 21 September 2022).

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Sun et al.(2018)</label><mixed-citation>
      
Sun, C. C., Hahn, A., and Liu, C. C.: Cyber security of a power grid:
State-of-the-art, Int. J. Elec. Power, 99, 45–56, <a href="https://doi.org/10.1016/j.ijepes.2017.12.020" target="_blank">https://doi.org/10.1016/j.ijepes.2017.12.020</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Tonn et al.(2016)</label><mixed-citation>
      
Tonn, G. L., Guikema, S. D., Ferreira, C. M., and Quiring, S. M.: Hurricane
Isaac: A Longitudinal Analysis of Storm Characteristics and Power Outage
Risk, Risk Anal., 36, 1936–1947, <a href="https://doi.org/10.1111/risa.12552" target="_blank">https://doi.org/10.1111/risa.12552</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Verleysen and François(2005)</label><mixed-citation>
      
Verleysen, M. and François, D.: The curse of dimensionality in data
mining and time series prediction, Lect. Notes Comput. Sci., 3512,
758–770, <a href="https://doi.org/10.1007/11494669_93" target="_blank">https://doi.org/10.1007/11494669_93</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Wallach and Goffinet(1989)</label><mixed-citation>
      
Wallach, D. and Goffinet, B.: Mean squared error of prediction as a criterion
for evaluating and comparing system models, Ecol. Modell., 44,
299–306, <a href="https://doi.org/10.1016/0304-3800(89)90035-5" target="_blank">https://doi.org/10.1016/0304-3800(89)90035-5</a>, 1989.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Wanik et al.(2015)</label><mixed-citation>
      
Wanik, D. W., Anagnostou, E. N., Hartman, B. M., Frediani, M. E., and Astitha,
M.: Storm outage modeling for an electric distribution network in
Northeastern USA, Nat. Hazards, 79, 1359–1384,
<a href="https://doi.org/10.1007/s11069-015-1908-2" target="_blank">https://doi.org/10.1007/s11069-015-1908-2</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Wanik et al.(2017)</label><mixed-citation>
      
Wanik, D. W., Parent, J. R., Anagnostou, E. N., and Hartman, B. M.: Using
vegetation management and LiDAR-derived tree height data to improve outage
predictions for electric utilities, Electr. Pow. Syst. Res., 146,
236–245, <a href="https://doi.org/10.1016/j.epsr.2017.01.039" target="_blank">https://doi.org/10.1016/j.epsr.2017.01.039</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Wei et al.(2013)</label><mixed-citation>
      
Wei, H., Xia, Y., Mitchell, K. E., and Ek, M. B.: Improvement of the Noah land surface model for warm season processes: Evaluation of water and energy flux simulation, Hydrol. Process., 27, 297–303, <a href="https://doi.org/10.1002/HYP.9214" target="_blank">https://doi.org/10.1002/HYP.9214</a>,
2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Wickham et al.(2021)</label><mixed-citation>
      
Wickham, J., Stehman, S. V., Sorenson, D. G., Gass, L., and Dewitz, J. A.:
Thematic accuracy assessment of the NLCD 2016 land cover for the
conterminous United States, Remote Sens. Environ., 257,
<a href="https://doi.org/10.1016/J.RSE.2021.112357" target="_blank">https://doi.org/10.1016/J.RSE.2021.112357</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Wood(2017)</label><mixed-citation>
      
Wood, S. N.: Generalized additive models: An introduction with R, second
edition, Generalized Additive Models: An Introduction with R, Second
Edition, 1–476 pp.,
<a href="https://doi.org/10.1201/9781315370279/GENERALIZED-ADDITIVE-MODELS-SIMON-WOOD" target="_blank">https://doi.org/10.1201/9781315370279/GENERALIZED-ADDITIVE-MODELS-SIMON-WOOD</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Wu et al.(2007)</label><mixed-citation>
      
Wu, H., Svoboda, M. D., Hayes, M. J., Wilhite, D. A., and Wen, F.: Appropriate
application of the Standardized Precipitation Index in arid locations and dry
seasons, Int. J. Climatol., 27, 65–79,
<a href="https://doi.org/10.1002/joc.1371" target="_blank">https://doi.org/10.1002/joc.1371</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Xia(2012)</label><mixed-citation>
      
Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B., Wood, E., Luo, L., Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D., Koren, V., Duan, Q., Mo, K., Fan, Y., and Mocko, D.: NCEP/EMC (2014), NLDAS VIC Land Surface Model L4 Hourly 0.125  ×  0.125 degree V002, edited by: Mocko, D., NASA/GSFC/HSL, Greenbelt, Maryland, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC), <a href="https://doi.org/10.5067/ELBDAPAKNGJ9" target="_blank">https://doi.org/10.5067/ELBDAPAKNGJ9</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>Xia et al.(2012)</label><mixed-citation>
      
Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B., Wood, E., Luo, L.,
Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D., Koren, V., Duan,
Q., Mo, K., Fan, Y., and Mocko, D.: Continental-scale water and energy flux
analysis and validation for the North American Land Data Assimilation System
project phase 2 (NLDAS-2): 1. Intercomparison and application of model
products, J. Geophys. Res.-Atmos., 117, 3109,
<a href="https://doi.org/10.1029/2011JD016048" target="_blank">https://doi.org/10.1029/2011JD016048</a>, 2012.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Xie et al.(2020)</label><mixed-citation>
      
Xie, J., Alvarez-Fernandez, I., and Sun, W.: A review of machine learning
applications in power system resilience, IEEE Pow. Ener. Soc.
Ge., 2020–August, <a href="https://doi.org/10.1109/PESGM41954.2020.9282137" target="_blank">https://doi.org/10.1109/PESGM41954.2020.9282137</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>Yee(2012)</label><mixed-citation>
      
Yee, T. W.: Package “VGAM” (Vector generalized linear and additive models), <a href="http://www.springer.com/series/692" target="_blank"/>  (last access: 21 September 2022), 2012.

    </mixed-citation></ref-html>--></article>
