Satellite Hydrology Observations as Operational Indicators of Forecasted Fire Danger across the Contiguous United States

Abstract

Traditional methods for assessing fire danger often depend on meteorological forecasts, which have reduced reliability beyond ~10 days. Recent studies have demonstrated long lead-time correlations between pre-fire-season hydrological variables, such as soil moisture, and later fire occurrence or area burned, yet the potential value of these relationships for operational forecasting has not been studied. Here, we use soil moisture data refined by remote sensing observations of terrestrial water storage from NASA's GRACE mission, together with vapor pressure deficit from NASA's AIRS mission, to generate monthly predictions of fire danger at scales commensurate with regional management. We test the viability of these predictors within nine US Geographic Area Coordination Centers (GACCs) using regression models specific to each GACC. Results show that the model framework improves interannual wildfire burned area prediction relative to climatology for all GACCs. This demonstrates the value of hydrological information for extending operational forecast ability into the months preceding wildfire activity.


Introduction
Fires are a key disturbance globally, acting as a catalyst for terrestrial ecosystem change and contributing significantly to both carbon emissions (Page et al., 2002) and changes in surface albedo (Randerson et al., 2006). Furthermore, the socioeconomic impact of fires includes human casualties as well as approximately $21 billion in property losses from 1995 to 2015 (USD 2015; NatCatSERVICE, accessed October 2017). Several studies have shown that in the Western US, fires have demonstrated a positive trend in annual area burned that will likely continue into the future (Littell et al., 2010; Stavros et al., 2014b). In response to increasing annual area burned and detrimental losses, the US Forest Service has increased funding for active fire management from 16% to 52% of its total budget, funds that would otherwise have been spent on land management and research (USFS, 2015). These increased costs translate directly into increased USFS information needs, because any intra- or interannual early warning helps decrease the cost of preparing for, managing, and, when necessary, suppressing fires.
The severe consequences of wildfires motivate the need for capabilities to map fire potential on timescales ranging from days to months. Operational fire management agencies rely on two primary sources of information to predict fire danger: meteorological forecasts and expert judgment (e.g., https://www.predictiveservices.nifc.gov/outlooks/outlooks.htm; accessed 28 November 20). Fire danger forecasts are generally reported in the form of qualitative categories (e.g., normal, below-normal, and above-normal). Such categories are used by the US National Interagency Fire Center (NIFC) to allocate fire management resources across jurisdictional boundaries (e.g., state or national) when local response capabilities are exhausted. These qualitative metrics are derived from many information layers, including fire danger indices.
Fire danger indices (e.g., the US National Fire Danger Rating System, NFDRS; Bradshaw et al., 1983) typically use meteorological input (Abatzoglou & Brown, 2012; Holden & Jolly, 2011) that is sometimes not available with the long lead time needed for regional, transboundary fire management planning. Gridded meteorological data often have several limitations: the data are interpolated between weather stations (Daly et al., 2008), developed by combining spatial and temporal attributes of different climate data and validated with weather stations (Abatzoglou, 2013; Abatzoglou and Brown, 2012), or provided by meteorological reanalysis, i.e., numerical weather prediction models that assimilate weather station data (Kalnay et al., 1996; Roads et al., 1999). These weather stations are sometimes far removed from the location of interest and are not always good estimates of local climate, especially in complex topography. Moreover, forecasts beyond 10 days for a given landscape location have low skill (Bauer et al., 2015). These limitations of current operational fire danger systems create the need for additional information that could improve predictions of fire danger at monthly intervals and help allocate resources across the country as the active fire season progresses and resources become strained. This added information could enable less subjective and more accurate fire danger forecasts for larger areas and for timescales of a month or longer.
A number of previous studies have demonstrated relationships between fire and hydrological indicators (Parks et al., 2014; Shabbar et al., 2011; Westerling et al., 2002; Xiao and Zhuang, 2007). Vapor pressure deficit (VPD), specifically, has been shown to be an indicator of fire danger (Abatzoglou and Williams, 2016; Seager et al., 2015; Williams et al., 2014) and is considered a viable proxy for evapotranspiration demand and plant water stress during drought (Behrangi et al., 2015; Weiss et al., 2012). VPD is defined as the difference between the amount of moisture the air can hold when saturated and the amount of moisture actually in the air. Behrangi et al. (2016) show that VPD at monthly time scales has the advantage of capturing the onset of meteorological droughts earlier than other variables such as precipitation, which could be helpful in developing fire-danger forecast models. More recently, a study using model-assimilated observations of terrestrial water storage from NASA's GRACE mission to assess pre-fire-season surface soil moisture conditions (January-April) demonstrated skill in predicting both the number of fires and fire burned area in the following May-April period (Jensen et al., 2017).
The goal of this work is to investigate the utility of remotely sensed hydrology observations for predicting fire danger, defined as the amount of area likely to burn given an ignition, at spatial and temporal scales commensurate with regional and global fire management decision-making.
Specifically, the objective is to investigate the utility of satellite-observed vapor pressure deficit (VPD) from NASA's AIRS mission and surface soil moisture (SSM) from a numerical data assimilation of terrestrial water storage from NASA's GRACE mission as indicators for predicting monthly fire danger across the United States from 2002 to 2016 at the scale of the Geographic Area Coordination Centers (GACCs) (Figure 1). To meet this objective, we test the hypothesis that burned area varies monthly as a function of previous months' soil water availability (SSM) and evaporative demand (VPD).

Datasets
For the purpose of this study, four input datasets were used (Figure 1): 1) Monthly VPD was generated from the AIRS near-surface air temperature (Tmean) and relative humidity (RH), Version 6 (Aumann et al., 2003; Goldberg et al., 2003). Refer to Behrangi et al. (2016) for the formulation based on monthly air temperature (Tmean) and dewpoint temperature (Tdmean), as well as the reliability of this formulation for monthly VPD derivation. The data have 0.5-degree spatial resolution and are available since September 2002.
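The exact monthly VPD formulation follows Behrangi et al. (2016); as an illustration only, a common Magnus-type approximation of the same quantity from mean air temperature and dewpoint temperature can be sketched as follows (the coefficients 6.112, 17.67, and 243.5 are standard Magnus constants and are not necessarily those used in that reference):

```python
import math

def saturation_vapor_pressure(t_celsius):
    """Magnus/Tetens approximation of saturation vapor pressure (hPa)."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))

def vpd_from_temp_dewpoint(tmean_c, tdmean_c):
    """VPD (hPa) = e_s(Tmean) - e_a, where the actual vapor pressure
    e_a equals the saturation vapor pressure at the dewpoint."""
    return saturation_vapor_pressure(tmean_c) - saturation_vapor_pressure(tdmean_c)

# Hot, dry month (Tmean = 30 C, Tdmean = 10 C) -> large VPD;
# saturated air (Tmean = Tdmean) -> VPD of zero.
vpd_dry = vpd_from_temp_dewpoint(30.0, 10.0)
vpd_saturated = vpd_from_temp_dewpoint(20.0, 20.0)
```

At saturation, the dewpoint equals the air temperature and the deficit vanishes, which is a quick sanity check for any VPD implementation.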
2) Monthly surface soil moisture data were produced at the NASA Goddard Space Flight Center (GSFC) using the Catchment Land Surface Model (CLSM), a physically based land surface model, and assimilated ground- and space-based meteorological observations (Tapley et al., 2004; Houborg et al., 2012; Reager et al., 2015; Zaitchik et al., 2008). The SSM data are available since April 2004.
3) The Global Fire Emissions Database version 4 (GFED-4s) provided wildfire burned area at 0.25-degree spatial resolution. GFED-4s is primarily derived from MODIS from 2001 to present and is reported as the fraction of a cell burned in a given month (van der Werf et al., 2017). GFED data are available since 1997. In this study, we excluded agricultural fires by masking out agricultural regions as classified by the 2011 National Landcover Database (NLCD 2011) (Homer et al., 2015). For consistency, all datasets were converted using linear interpolation into monthly, 0.25-degree spatial resolution products, which were then used to perform the model training and analysis for the period 2003 through 2016.
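The text specifies only "linear interpolation" for the common 0.25-degree grid; one way this can be done (a minimal sketch assuming regular latitude/longitude grids, with function and variable names of our own) is bilinear interpolation applied along each axis in turn:

```python
import numpy as np

def regrid_bilinear(field, src_lats, src_lons, dst_lats, dst_lons):
    """Linearly interpolate a 2-D field from a coarse regular grid
    (e.g., 0.5 deg) onto a finer one (e.g., 0.25 deg)."""
    # Interpolate along longitude for every source latitude row...
    tmp = np.array([np.interp(dst_lons, src_lons, row) for row in field])
    # ...then along latitude for every destination longitude column.
    out = np.array([np.interp(dst_lats, src_lats, tmp[:, j])
                    for j in range(tmp.shape[1])]).T
    return out

# Toy example: a 0.5-degree field that varies only with latitude,
# regridded onto a 0.25-degree grid.
src_lats = np.arange(30.0, 32.0, 0.5)
src_lons = np.arange(-110.0, -108.0, 0.5)
field = np.outer(src_lats, np.ones_like(src_lons))
dst_lats = np.arange(30.0, 31.75, 0.25)
dst_lons = np.arange(-110.0, -108.25, 0.25)
fine = regrid_bilinear(field, src_lats, src_lons, dst_lats, dst_lons)
```

Because the toy field varies only with latitude, each interpolated value simply equals its destination latitude, which makes the result easy to verify by hand.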

Analysis
GACCs are geopolitical boundaries that represent similar fire-weather types and are used to allocate fire management resources across the contiguous United States (CONUS) (Figure 1). In this study, we predict anomalous monthly burned area using a linear regression model; a separate model is developed for each GACC and for each month in a climatological sense. All fire events for a given GACC and month of the year are pooled as a single population for model training. For example, all fires occurring in the Northern Rockies GACC during the month of February in 2004, 2005, 2006, and so on through 2016 are placed into a single population. Each monthly, 0.25-degree fire burned area observation has a matched SSM and VPD observation at the corresponding time and grid location. These sets are then used to train the model, and various time lags are imposed between the independent variables (SSM and VPD) and the dependent variable (burned area) in order to maximize predictive skill.
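The pooling and lagged-regression step described above can be sketched as follows; this is an illustration under simplifying assumptions (the record starts in January, grids are already matched, and the function names are ours), not the authors' code:

```python
import numpy as np

def pool_month(burned, vpd_anom, ssm_anom, month, vpd_lag, ssm_lag):
    """Pool all grid-cell observations of one calendar month across years.

    Arrays have shape (n_months, n_cells); the record is assumed to start
    in January, so time index t corresponds to calendar month t % 12.
    Returns a design matrix X (intercept, lagged VPD, lagged SSM) and
    a target vector y (burned area)."""
    rows_y, rows_x = [], []
    for t in range(max(vpd_lag, ssm_lag), burned.shape[0]):
        if t % 12 == month:
            rows_y.append(burned[t])
            rows_x.append(np.stack([vpd_anom[t - vpd_lag],
                                    ssm_anom[t - ssm_lag]], axis=1))
    y = np.concatenate(rows_y)
    X = np.column_stack([np.ones(len(y)), np.concatenate(rows_x)])
    return X, y

def fit_lagged_model(X, y):
    """Ordinary least squares fit; returns (intercept, VPD, SSM) coefficients."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Synthetic check: burned area built from 1-month-lag VPD and 2-month-lag SSM.
rng = np.random.default_rng(0)
n_months, n_cells = 36, 10
vpd = rng.normal(size=(n_months, n_cells))
ssm = rng.normal(size=(n_months, n_cells))
burned = np.zeros((n_months, n_cells))
for t in range(2, n_months):
    burned[t] = 0.5 + 2.0 * vpd[t - 1] + 3.0 * ssm[t - 2]

X, y = pool_month(burned, vpd, ssm, month=1, vpd_lag=1, ssm_lag=2)
coeffs = fit_lagged_model(X, y)   # recovers approximately (0.5, 2.0, 3.0)
```

Trying each (VPD lag, SSM lag) combination and keeping the one with the highest skill corresponds to the lag search described in the text.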
Each GACC uses the "best" prior VPD-SSM combination for all months. The "best" model was identified for each GACC by selecting the lagged model with the highest weighted Nash-Sutcliffe efficiency (Ew):

Ew = Σj fj · Ej

where fj is the mean historical fraction of annual area burned in month j, and Ej is the Nash-Sutcliffe efficiency (E) for month j. E (Nash and Sutcliffe, 1970) is a metric that measures the skill of the model against the skill of the long-term mean value (i.e., the climatological mean), defined as:

E = 1 − [ Σi (ABobs,i − ABsim,i)² ] / [ Σi (ABobs,i − ABmean)² ]

where the sum runs over the n observations for a given month j, ABobs,i is the observed area burned, ABsim,i is the model-simulated area burned, and ABmean is the mean area burned in month j over the climatological record. E can range between −∞ and 1. An E of zero indicates that the model performs exactly as well as the mean of observations over the entire record; if E exceeds 0, the model performs better than the mean of observations, and if E falls below zero, the mean of observations is a better predictor than the model simulations. An E of 1 represents a perfect prediction by the model.
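Both skill metrics follow directly from their definitions; a minimal sketch (function names are ours):

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """E = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    E = 1 is a perfect prediction; E = 0 matches the skill of the mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def weighted_nash_sutcliffe(monthly_E, monthly_fraction):
    """Ew = sum_j f_j * E_j, where f_j is the mean historical fraction of
    annual area burned in month j (the f_j sum to 1 across the year)."""
    return float(np.dot(monthly_fraction, monthly_E))

# A perfect simulation scores 1; predicting the observed mean scores 0.
e_perfect = nash_sutcliffe([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
e_mean = nash_sutcliffe([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

The weighting by fj means a month contributing a large share of the annual burned area dominates Ew even if other months are poorly predicted, which matches the operational emphasis on peak fire months.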
We constructed a forecasting method that relies on the model prediction of burned area, as opposed to the burned area climatology, only if the model demonstrated skill for a given month. The estimation of Ew for each GACC and each monthly model ensures that months with higher predictive skill are assigned a higher weight in the combined time series; months exhibiting more historical wildfire activity are likewise assigned a higher weight.
The model is then defined as follows:

ABs = ABc + ABa, where ABa = β0 + β1·VPDA + β2·SSMA if Ej > 0 (otherwise ABa = 0, so ABs = ABc)

ABs is the simulated area burned for a given month, ABc is the climatological area burned (the mean area burned in that month), and VPDA and SSMA are the anomalous VPD and SSM one, two, or three months prior to the wildfire month. Different combinations of prior VPD and SSM observations were tested to identify a single, reliable VPD-SSM model per GACC for the entire year.
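The fallback-to-climatology rule can be sketched as a small piecewise forecast function (a sketch with our own names, assuming the regression coefficients and per-month E values have already been estimated):

```python
import numpy as np

def forecast_burned_area(ab_clim, vpd_anom, ssm_anom, coeffs, monthly_E):
    """ABs = ABc + ABa, with the anomaly term applied only in calendar
    months whose Nash-Sutcliffe skill E_j exceeds zero; otherwise the
    forecast falls back to climatology (ABs = ABc)."""
    b0, b1, b2 = coeffs
    out = []
    for j, (abc, v, s) in enumerate(zip(ab_clim, vpd_anom, ssm_anom)):
        if monthly_E[j % 12] > 0:          # model is skillful this month
            out.append(abc + b0 + b1 * v + b2 * s)
        else:                              # climatology is the better predictor
            out.append(abc)
    return np.array(out)

# Month 0 has skill (E = 0.5), month 1 does not (E = -0.1):
pred = forecast_burned_area(
    ab_clim=[10.0, 20.0],
    vpd_anom=[1.0, 1.0],
    ssm_anom=[2.0, 2.0],
    coeffs=(0.0, 1.0, 1.0),
    monthly_E=[0.5, -0.1] + [0.0] * 10,
)
# pred[0] = 10 + 0 + 1 + 2 = 13.0; pred[1] falls back to climatology (20.0)
```

The E > 0 gate guarantees the combined forecast can never be worse than climatology on the training record, since months without demonstrated skill simply reproduce the climatological mean.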

Results
Figure 2 shows the hydrologic variable combination used to develop the best model of burned area anomalies using the monthly Nash-Sutcliffe (E), the weighted Nash-Sutcliffe (Ew), and the fraction of annual area burned for each month, while Table 1 shows the best variable combination for each GACC. There are some notable patterns, though few without exceptions. For example, Northern California, the Northern Rockies, and the Northwest all have the same peak month (August) for area burned, while also having significant fractions of evergreen cover (Figure 1). Area burned in the Great Basin also peaks in August; however, it does not have substantial evergreen landcover, although at this spatial scale we cannot determine whether that is where fires occur. The models with the highest relative predictive ability throughout the year (denoted by the weighted Nash-Sutcliffe) are generally in the GACCs with substantial landcover dominated by fuel-limited systems (herbaceous and shrublands): Great Basin, Southern California, Rocky Mountains, Northwest, and Northern Rockies; the Southwest also has heavy herbaceous cover, but relatively low predictability throughout the year. Similarly, the Northern Rockies, Northwest, Rocky Mountains, and Great Basin all have high predictability in their peak burned area month and are all substantially covered by herbaceous vegetation, but the Southwest does not. One robust pattern is that the Great Basin, the Southwest, and Southern California all rely on 1-month-lead soil moisture in their predictive models and all have substantial shrubland cover. Notably, the Eastern, Northern Rockies, Rocky Mountains, Southern California, and Southern GACCs all have bimodal burned area distributions, but no similar landcover characteristics to explain the pattern. Figure 3 shows two example cases of model predictions based on hydrological variables. We show results for our best- and worst-performing GACCs in order to capture the range of model skill in different fire climate regions. For our best-performing GACC, the Northern Rockies, we see consistent agreement between peaks in the dominant hydrologic variable (VPD) and fire burned area, suggesting the dominant role of VPD in burned area prediction for that GACC (Table 1). These strong relationships between hydrology and wildfire occurrence in the Northern Rockies confirm the findings of previous studies (Littell et al., 2009; Westerling et al., 2011). For our worst-performing GACC, the Southern, the two hydrologic variables are seemingly much more connected to each other, and it is less clear what drives the pattern of monthly area burned.
To evaluate the model predictions against the observations, we calculated two Nash-Sutcliffe coefficients (Table 1). For all GACCs, the model forecasts wildfire activity with higher accuracy than the climatology, but the improvement varies by GACC. The results reveal that the Rocky Mountains and Northern Rockies GACCs have the best model performance (E of 0.82 and 0.64, respectively), while the Southwest and Southern California (E of 0.34 and 0.35, respectively) show the weakest. Consistent with their time series, the model does not improve on the climatology to a great extent in the Eastern and Southern GACCs; in all other regions, the improvement of the simulation over the climatology is substantial. The key difference between the overall evaluation metric (ES-C) and the time series is that the time series demonstrate the variability of predictive ability from month to month.
Figure 4 shows the time series of wildfire burned area observations (blue), simulations (red), and climatology (yellow) for the nine GACCs from 2003 through 2016. The performance of the model varies by location and month. In general, the models capture interannual variability for most GACCs, and in some months the simulation agrees with the observations more closely than the climatology does. In the Southern GACC, model performance is relatively similar to the climatology, and both the simulation and the climatology agree closely with the observations. The Northern Rockies and Rocky Mountains show the highest agreement between model and observations in higher-than-normal fire years. Specifically, in the Northern Rockies, the model detects the expected burned area for the above-normal fire activity years 2003, 2006, 2007, and 2012; and in the Rocky Mountains GACC, the years 2006, 2008, 2009, 2011, 2015, and 2016 show high agreement between simulations and observations. The model also detects higher-than-normal fire activity in Northern California (2012, 2014, and 2015), the Northwest (2006, 2007, 2012, 2014, and 2015), the Great Basin (2006, 2007, 2012, and 2013), and the Eastern GACC (2004 and 2012). Lastly, the simulation outperforms the climatology slightly for Southern California and the Southwest; however, neither the model nor the climatology detects interannual fire activity in these regions with high accuracy.
Lastly, models were also built using only VPD or only SSM to determine the relative influence of each variable on burned area within each GACC (Table 1, ES,VPD and ES,SSM). For some GACCs, the influence of a variable appears to be associated with the relative fraction of landcover influenced by that variable. For example, the Northern Rockies is roughly half evergreen forest and half herbaceous (Figure 1); evergreen forests typically need to dry out to sustain combustion (high VPD in the month prior), while herbaceous communities typically need wet conditions in prior months to grow fuels (high SSM two months prior). Similarly, the Northwest is roughly half evergreen (high VPD two months prior) and half shrub (high SSM three months prior). The Rocky Mountains are mostly herbaceous and shrubland (high SSM three months prior) but have some evergreen (high VPD one month prior). In Northern California, landcover is mostly evergreen (high VPD one month prior) with some shrub (high soil moisture two months prior). The other GACCs have less obvious relationships between landcover and hydrology.

Discussion and Conclusion
Wildfire activity results in billions of dollars of losses every year (USD 2015; NatCatSERVICE, accessed October 2017). Forecasting wildfire activity could therefore substantially reduce the damages associated with wildfire burned area. Historical wildfire prediction models have limitations, including the mismatch in scale between fire danger models and common applications, as well as the unreliability of meteorological data in remote regions. As such, current operational wildfire forecasts beyond 10 days rely heavily on subjective expert knowledge to predict expected area burned. Thus, the aim of this study was to predict area burned in the different geographic regions (GACCs) of the United States. There are some notable patterns in predictive model development across GACCs, largely driven by fractional landcover and mesoscale climate (Table 1). The Great Basin, the Southwest, and Southern California GACCs all have substantial shrubland cover and share the same soil moisture predictor (1-month lead). This could be a function of the shallow rooting of shrubs, and it was the only landcover pattern that was not contradicted by mesoscale climatic influence. For example, the Great Basin, Southern California, Rocky Mountains, Northwest, and Northern Rockies models have the highest predictive ability throughout the year (Ew) and have substantial landcover dominated by fuel-limited systems (grasslands and shrublands). Fuel-limited systems typically rely on pre-fire-season conditions to grow the fuels that carry fire, thus influencing the total burned area (Stavros et al., 2014a; Swetnam and Betancourt, 1998). Although the Southwest also has heavy grassland cover, it has relatively low predictability throughout the year; it is also the GACC most influenced by the Southwest Monsoon, which can have a variable onset that affects the fire season (Grissino Mayer and Swetnam, 2000). The southwest monsoon also explains why the Northern Rockies, Northwest, Rocky Mountains, and Great Basin
all have high predictability in their peak burned area month, but the Southwest (also substantially covered by grasslands) does not. Further substantiating the claim that mesoscale climate affects model predictability is the fact that Southern California has a bimodal distribution of fire area burned throughout the year. According to Jin et al. (2014), there are two different kinds of fires in Southern California (summer fires driven by hot and dry conditions, and fall fires driven by Santa Ana winds), and each has different climatic conditions explaining the number of fires and burned area.
Beyond climate and landcover, humans play a significant role in the predictability of area burned (Balch et al., 2017). This explains the bimodal fire distributions found in the Eastern, Northern Rockies, Rocky Mountains, and Southern GACCs. Most of the fires in the Eastern and Southern GACCs are prescribed burns, which can happen throughout the year (as denoted by the relatively flat, though slightly bimodal, distributions of percent annual area burned by month; Table 1). There is also a notable decoupling of the relationship between hydrologic variables and burned area (Figure 4) in the Southern GACC, which has mostly anthropogenic fire starts, as compared to the Northern Rockies, which has mostly lightning-caused ignitions when burned area peaks in fall (Figure 2). This also explains why the simulation performs close to the climatology (Figure 3), with only minor improvements in Nash-Sutcliffe relative to other GACCs (Table 1). Notably, the GACCs with a strong bimodal distribution perform less well than those without one; however, all GACCs with bimodal distributions (Figure 2) contain substantial croplands (which were excluded from the analysis) where agricultural burning occurs independent of hydrologic conditions (Figure 1).
Mesoscale climate (e.g., monsoons) and anthropogenic influences on fire regimes likely weaken the direct relationships between hydrologic variables and burned area. Specifically, the GACCs that are more influenced by mesoscale climate (Southern California and the Southwest) or by anthropogenic burning (Southern and Eastern) did not show a clear association between the relative influence of a hydrologic variable and the relative fractions of landcover, unlike the Northern Rockies, Northwest, Northern California, or Rocky Mountains.
In general, this work demonstrates how leading data on hydrologic variables that can be measured by satellite (i.e., not limited by proximity to in situ infrastructure) can be used to forecast fire danger one month in advance. In all geographic regions, the models improved over climatology (Table 2) and demonstrated the ability to capture interannual variability (Figure 2). Future work should

Nat. Hazards Earth Syst. Sci. Discuss., https://doi.org/10.5194/nhess-2019-129. Manuscript under review for journal Nat. Hazards Earth Syst. Sci. Discussion started: 2 May 2019. © Author(s) 2019. CC BY 4.0 License.
Finally, ABs is compared to ABc by comparing two Nash-Sutcliffe (E) values of the entire time series. The first E is measured using the 2003-2016 monthly time series of model predictions and observations (Esimulated,observation). The second E is computed using the 2003-2016 monthly time series of climatology and observations (Eclimatology,observation). If Esimulated,observation exceeds Eclimatology,observation, the model is more accurate than the climatology; if Eclimatology,observation is greater than Esimulated,observation, then the climatology is more accurate in forecasting wildfire activity.

Figure 1 :
Figure 1: Snapshot of August 2010 of the datasets used in relation to the Geographic Area Coordination Centers (GACCs).