Data-based wildfire risk model for Mediterranean ecosystems. Case study of the Concepción Metropolitan Area in Central Chile

Wildfire risk is latent in Chilean metropolitan areas characterized by the strong presence of Wildland-Urban Interfaces (WUI). The Metropolitan Area of Concepción (CMA) is one of the most representative examples of that dynamic. Wildfire risk in the CMA was addressed by establishing a model of five categories (Near Zero, Low, Medium, High, and Very High) that represent discernible thresholds in fire occurrence, using geospatial data and satellite images describing the anthropic and biophysical factors that trigger fires. These data were used to build a fire hazard model with machine learning algorithms, including Principal Component Analysis and Kohonen Self-Organizing Maps, in two experimental scenarios: only native forest and only forestry plantation. The model was validated using fire spots obtained from the government forestry organization. The results indicated that 12.3% of the CMA's surface area has a high or very high risk of forest fire, 29.4% has a medium risk, and 58.3% has a low or very low risk. Lastly, the main observed drivers that have deepened this risk are discussed: first, the evident proximity between expanding urban areas and exotic forestry plantations, and second, climate change, which threatens to trigger more severe and larger wildfires because of human activities.


In the last few decades, the world has seen an increasing trend in wildfires affecting large populations (Moritz et al., 2012), generally attributed to atmospheric warming fueled by anthropogenic climate change (Spies et al., 2014) and extreme weather events (Stott, 2016), which create a riskier environment. However, wildfire hazard is a product of interlinked socio-environmental processes, including the proximity between the Wildland-Urban Interface (WUI) and urban areas (Kumagai et al., 2004; Kolden and Henson, 2019; Goldman, 2018; Sarricolea et al., 2018); unregulated extractive economic activities in fire-prone landscapes (Castree, 2008; Spies et al., 2014; Freudenburg, 1992; Gago and Mezzadra, 2017); traditional cultural practices in construction, forestry, or agriculture that increase the availability of flammable material (Harari, 2013; Frene and Nuñez, 2010); and the traditional "slash and burn" practice of clearing land (Shahriar et al., 2019:1). The analysis of this hazard must therefore consider biophysical factors such as altitude, slope, climate conditions, solar radiation, and vegetation cover (Chuvieco et al., 2004; Chuvieco et al., 2011). Likewise, windy and dry conditions combined with steep slopes lead to rapid fire spread and burn large areas of forest within a short time (Shahriar et al., 2019:2).
Identifying and managing fire hazards is part of a political agenda rather than a solely biophysical phenomenon (Pyne, 2009; Doerr and Santín, 2016; Change, 2017). The experiences with fire in underdeveloped countries are radically different from those in developed countries, which have controlled burns, a strict forestry policy, and solid territorial planning, and which usually take advantage of the ecological benefits of fire for ecosystems and livelihoods (Hutto, 2008; González, 2005; González-Mathiesen and March, 2018; Adams, 2013).
Risk and vulnerability mapping, which usually identifies categories of wildfire likelihood, is one of the most used tools in research. The use of risk categories is considered a useful method to provide understandable information for policymaking and decision making, as attested by the style of the "Summary for Policymakers", a document regularly delivered with the publication of the assessment reports of the IPCC (Intergovernmental Panel on Climate Change) that contains many examples of categorically organized information (IPCC, 2014 and. However, feeding the predictive models with precise data on land cover changes and accurate meteorological data, or tracing in real time the human activities that could start a wildfire, remains a 2009). Here, intensive land use changes interact with the replacement of native land cover by plantations, urban sprawl, and socio-environmental conflicts associated with forest property (Andersson et al., 2016; Nahuelhual et al., 2012; Altamirano et al., 2013; Heilmayr et al., 2016; McWethy et al., 2018; Cid, 2015; Schulz et al., 2010) that lead to a characteristic environment prone to wildfire occurrence.

The Concepción Metropolitan Area (CMA) is a conspicuous example of wildfire activity in this region of Chile.
Available studies suggest that wildfires will become more frequent and aggressive, given the changing climate conditions in the CMA (Castillo et al., 2003; CONAF, 2017, 2018; Sarricolea et al., 2020; CR2, 2020), following global trends (Moritz et al., 2012). One of those changes is related to more frequent droughts (Fernández et al., 2018), which coincides with recent findings that attribute part of the precipitation decrease to anthropogenic sources (Boisier et al., 2016), impacting the lives, crops, and neighborhoods of more than a million people (Gonzalez et al., 2018; de la Barrera et al., 2018; CONAF, 2018; Araya-Muñoz et al., 2017; Cid, 2015).
In this work, a model for wildfire risk mapping in the CMA (~36.7° S, Fig. 1) was applied and validated, and an updated categorical map at relatively high spatial resolution was delivered. This model aims to support urban planning and further studies of wildfire hazards. The paper is organized as follows: Section II describes the study area, materials, and methods; Section III presents and analyzes the results; Section IV corresponds to the discussion; and in Section V we conclude while suggesting avenues for future work.

Located in Chile's Biobío administrative region, the CMA (~36.7° S and ~73° W, Figure 1) is the third-largest urban area, with a total population of over 1 million (INE, 2021). The CMA has many interconnected small urban centers (Rojas Quezada et al., 2009) that are expanding their surface every day, mostly for housing development and industry (Rojas Quezada et al., 2013). It also has a variety of important biodiversity hotspots (Smith Ramírez, 2000) and wetlands (Martínez Poblete, 2014).

A Mediterranean climate with warm, dry summers and cold, wet winters characterizes this region (Sarricolea et al., 2020), with an average temperature of 12.4 °C and an annual rainfall of 1,332 mm, 70% of which is concentrated between May and August (BCN, 2017). The CMA has one of the largest Wildland-Urban Interfaces (WUI) in the country (Ruiz et al., 2017).
Several economic activities have developed in the CMA and its surroundings since its foundation in the 18th century. However, today the region is mainly known for timber production and export from plantations of exotic fast-growing species (Torres et al., 2015). With more than 46,697 hectares of forestry plantations in 2016, the

Previous work on wildfire hazard mapping indicates the need to include a number of spatially distributed factors that contribute to the susceptibility of the landscape, such as slope, orientation, and the effect of insolation (You et al. 2017). These types of models organize space into categories as a result of weighted sums of contributing factors. In this case, available research portrays wildfires as products of human activities, topographic characteristics, land cover, and climate (CONAF, 2017, de la Barrera et al. 2018; Ubeda and . While most approaches to mapping wildfire hazards have been based upon frameworks tested in other regions, data-driven approaches are still under-utilized. This hazard modeling takes advantage of available national databases and satellite products, ingested into machine learning algorithms to produce maps that allow spatially distributed identification and assessment of wildfire hazards. The model combines Principal Component Analysis (PCA) and Kohonen self-organizing maps (SOM) to classify locations into five categories: near zero, low, medium, high, and very high. The following subsections present the steps to compile the analyzed database, the model development, and the experiments performed.
Input data for the modeling corresponded to a 12-variable geodatabase that included several descriptors related to wildfire spot recurrence, such as topographic features, land cover characteristics, built environment descriptors, and climatic indices (Table 1). Wildfire spot locations were used as reference coordinates to produce a raster that counted the number of spots within each 900 m pixel. The center coordinates of each pixel are the locations used in the compiled geodatabase. The 900 m spatial resolution corresponds to a trade-off between the representation of the different input databases, which range from 30 m to 5 km.
Locations of wildfire spots used for the geodatabase correspond to the period 2008-2019 and are available from the Chilean Forest Service (CONAF, Corporación Nacional Forestal). The forest fire database is constructed from information collected by CONAF brigades and private forestry companies in peri-urban and rural (forestry) areas. Forest fire detection is carried out in three ways: a) fixed terrestrial detection (observation towers), b) mobile terrestrial detection (surveillance), and c) aerial detection (Tapia and Castillo, 2014). Detected fires are georeferenced by GPS and subsequently added to a GIS. The minimum area for a forest fire to be mapped is 10 m².
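The gridding step described above can be illustrated with a minimal sketch. It assumes spot coordinates are already projected in metres; the grid origin, the function names, and the toy coordinates are hypothetical and are not taken from the authors' code.

```python
from collections import Counter

PIXEL_SIZE_M = 900.0  # pixel size stated in the text


def count_spots_per_pixel(spots, x_min, y_min, pixel_size=PIXEL_SIZE_M):
    """Count fire spots per pixel and return {(col, row): count}.

    spots: iterable of (x, y) projected coordinates in metres (assumed).
    x_min, y_min: lower-left corner of the grid (hypothetical origin).
    """
    counts = Counter()
    for x, y in spots:
        col = int((x - x_min) // pixel_size)
        row = int((y - y_min) // pixel_size)
        counts[(col, row)] += 1
    return counts


def pixel_center(col, row, x_min, y_min, pixel_size=PIXEL_SIZE_M):
    """Return the centre coordinate of a pixel, used as its location key."""
    return (x_min + (col + 0.5) * pixel_size,
            y_min + (row + 0.5) * pixel_size)


# Toy example with made-up coordinates (metres)
spots = [(100.0, 100.0), (850.0, 20.0), (950.0, 100.0), (1000.0, 1900.0)]
counts = count_spots_per_pixel(spots, x_min=0.0, y_min=0.0)
center = pixel_center(0, 0, x_min=0.0, y_min=0.0)
```

Pixels that never receive a spot simply do not appear in `counts`, which corresponds to zero recurrence for that location.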

Geodatabase Compilation
Spot locations were also used to determine the distances to the closest streams, urban centers, and major roads, which were then averaged at the 900 m pixel size and assigned to the corresponding location in the geodatabase.
These vector data, including the stream network, were retrieved from the map portal of the Centre for Sustainable Urban Development (CEDEUS). Elevations of the study area were retrieved from the ASTERGDEM version 2, a digital elevation model produced at 30 m pixel size using stereo-correlation techniques applied to scenes from the ASTER sensor of the Terra satellite (Abrams et al., 2010). Three land cover characteristics were included in the database: a raster land cover map, a Normalized Difference Infrared Index (NDII), and a Normalized Difference Vegetation Index (NDVI). The land cover map was derived from two Landsat images obtained from the Earth Explorer platform (https://earthexplorer.usgs.gov). The images were corrected geometrically, radiometrically, and atmospherically (Chuvieco et al., 2002; Heilmayr et al., 2016). A supervised maximum likelihood classification (Chuvieco et al., 2002) was used to classify native forest, scrub, pasture/cropland, urban areas, exotic plantations, water bodies, bare soil, and burned areas. We used approximately 700 training points for each classified image, acquired from two sources: a) the cadastre of the native plant resources of Chile (CONAF et al., 2015) and b) Google Earth (specifically its "time slider").
The NDII and NDVI data entered into the geodatabase correspond to pixel-wise linear trend maps for each index. All these raster maps were aggregated by simple averaging to a 900 m pixel size and assigned to the nearest center coordinate in the geodatabase.
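As a rough illustration of the index, trend, and aggregation steps, the sketch below uses the usual normalized-difference definitions (NDVI from near-infrared and red reflectance, NDII from near-infrared and shortwave-infrared reflectance), an ordinary least-squares slope as the linear trend, and block averaging. The band choice, function names, and toy values are assumptions; the text does not state which Landsat bands or fitting routine were used.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); None if undefined."""
    total = a + b
    return (a - b) / total if total != 0 else None


def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def ndii(nir, swir):
    """NDII from near-infrared and shortwave-infrared reflectance."""
    return normalized_difference(nir, swir)


def linear_trend(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def block_average(raster, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [raster[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Toy values: one pixel's NDVI over three years, and a 2 x 2 raster
slope = linear_trend([2008, 2009, 2010], [0.6, 0.5, 0.4])
coarse = block_average([[1.0, 2.0], [3.0, 4.0]], factor=2)
```

Going from 30 m to 900 m pixels would correspond to `factor=30` in this simplified scheme, provided both grids are aligned.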

Climatic descriptors included average summertime potential solar radiation, a temperature index, and a precipitation index. The ASTERGDEM was used to compute average summertime potential direct solar radiation with the insol package for the R programming language, which implements algorithms presented by Corripio (2003 and references therein). The same procedure described for elevations was implemented to add these data to the geodatabase.

The climatic data used for these calculations were taken from the CR2MET product, a gridded climatology at 5 km pixel size and daily to monthly frequency produced by the Chilean Center for Climate Resilience Research (CR2), covering the period 1979-2016. CR2MET was produced using a statistical downscaling of the ERA-Interim reanalysis supplemented by topographic data, land surface temperatures retrieved from satellites, and instrumental observations (Alvarez-Garreton et al., 2018). The database also included linear trends of skin temperatures retrieved from the 0.05º (~5 km) daytime monthly land surface temperature product MODC11C6, version 6, derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra satellite (Wan et al. 2015), accessed through the GIOVANNI tool (Geospatial Interactive Online Visualization ANd aNalysis Infrastructure) at NASA's Goddard Earth Sciences Data and Information Services Center (Acker and Leptoukh, 2007). Within the geodatabase, trends in Tx90p, CDD, and skin temperatures were added to the closest location falling within the respective 5 km pixel.
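The text names the indices Tx90p and CDD without defining them. The sketch below assumes the widely used ETCCDI meanings: CDD as the longest run of consecutive days with precipitation below 1 mm, and Tx90p as the percentage of days whose maximum temperature exceeds a 90th-percentile threshold. The threshold is passed in as a single number, which is a simplification of the calendar-day percentiles used in the full ETCCDI definition; function names and data are hypothetical.

```python
def consecutive_dry_days(daily_precip_mm, wet_threshold_mm=1.0):
    """Longest run of consecutive days with precipitation below the threshold."""
    longest = current = 0
    for p in daily_precip_mm:
        if p < wet_threshold_mm:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def tx90p(daily_tmax, threshold_p90):
    """Percentage of days with maximum temperature above a given threshold.

    threshold_p90 is assumed to be a precomputed 90th percentile of a
    reference period (simplified to one value for all calendar days).
    """
    hot_days = sum(1 for t in daily_tmax if t > threshold_p90)
    return 100.0 * hot_days / len(daily_tmax)


# Toy series (made-up values)
cdd = consecutive_dry_days([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 2.0])
hot_share = tx90p([20.0, 25.0, 30.0, 35.0], threshold_p90=28.0)
```

Computing each index per year and fitting a slope across years would then give the kind of trend value that the geodatabase stores per 5 km pixel.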

Implementing a data-driven approach allows for the determination of discernible susceptibility thresholds according to the records available, which here are derived from observed spot recurrence. Thus, one of the first tasks in this research was to study the 900 m pixel map to determine whether there were detectable differences in spot recurrence. The categories were defined using a geometric sequence of the form: The 5 categories (C) were then computed by grouping recurrences (i.e., fires per year in each pixel) within the 2008-2019 period. R100% represents the maximum value in the study area, assumed to be 100% recurrence.
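Because the equation itself is not reproduced in the text, the following is only a hypothetical illustration of how geometric thresholds relative to R100% could split recurrences into five categories. The common ratio of 0.5, the function names, and the toy maximum are assumptions and should not be read as the authors' actual sequence.

```python
STUDY_YEARS = 12  # 2008-2019 inclusive
CATEGORIES = ["Near Zero", "Low", "Medium", "High", "Very High"]


def geometric_thresholds(r_max, ratio=0.5, n_categories=5):
    """Upper bound of each category as r_max * ratio**k (ASSUMED form).

    r_max plays the role of R100%, the maximum recurrence in the study area.
    """
    return [r_max * ratio ** k for k in range(n_categories - 1, -1, -1)]


def categorize(recurrence, thresholds):
    """Return the first category whose upper bound is not exceeded."""
    for name, upper in zip(CATEGORIES, thresholds):
        if recurrence <= upper:
            return name
    return CATEGORIES[-1]


# Toy example: a pixel with 24 spots in 12 years, hypothetical maximum of 4 fires/year
thresholds = geometric_thresholds(r_max=4.0)
recurrence = 24 / STUDY_YEARS
category = categorize(recurrence, thresholds)
```

With a ratio below one, most of the recurrence range falls into the upper categories while the lower categories cover narrow bands near zero, which is the general behaviour a geometric spacing produces.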

Model implementation, validation, and experiments
Based upon the known locations showing different recurrence categories, the modeling involved the development of a supervised classification scheme meant to determine the recurrence probability in the whole study area. To do so, two procedures were applied to the compiled geodatabase. First, a Principal Component Analysis (PCA) was used to reduce dimensionality from the eleven descriptors (excluding spot recurrence) to a new set of uncorrelated variables, called principal components or PCs, which maximize the explained variance while reducing redundancy among similar variables from the original database (Demšar et al., 2013). In this procedure, land cover classes were ingested using binary encoding, effectively enlarging the database to 18 descriptors. Afterwards, the PCs explaining most of the variance were used as input to a supervised classification using a Kohonen self-organizing map (SOM) algorithm (Kohonen, 1990). A SOM is a class of neural network that reveals the structure of a dataset by competitive learning. The supervised classification was implemented as an iterative process in which a random selection of locations from the recurrence categories was presented to the SOM, using the corresponding PCA output as descriptors. During a given iteration, the algorithm selected 50 locations per category and classified the rest of the study area by comparison with those locations. With the output of all iterations, the model calculates a simple probability to determine into which category a given location falls most often, and assigns the respective value. The SOM's network size and number of iterations were determined following recommendations by Kohonen (1990) and Vesanto (2000).
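The iterative sampling and voting logic can be sketched as follows. This is not the authors' implementation: the SOM is replaced by a deliberately simple nearest-centroid stand-in on a single score so that the example stays short, and all names and data are hypothetical. Only the resampling of labelled locations and the final "most frequent category" rule mirror the description above.

```python
import random
from collections import Counter


def make_nearest_centroid(scores):
    """Stand-in for the SOM step (NOT a SOM): nearest class centroid on one score."""
    def classify_once(sample):
        centroids = {cat: sum(scores[i] for i in ids) / len(ids)
                     for cat, ids in sample.items()}
        return {loc: min(centroids, key=lambda c: abs(centroids[c] - s))
                for loc, s in scores.items()}
    return classify_once


def vote_over_iterations(classify_once, labelled, n_iterations=100,
                         per_category=50, seed=0):
    """Resample labelled locations, classify, and keep the most frequent category.

    labelled: {category: [location_id, ...]} with known recurrence categories.
    Returns {location_id: (category, share_of_iterations)}.
    """
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_iterations):
        sample = {cat: rng.sample(ids, min(per_category, len(ids)))
                  for cat, ids in labelled.items()}
        for loc, cat in classify_once(sample).items():
            votes.setdefault(loc, Counter())[cat] += 1
    result = {}
    for loc, counter in votes.items():
        cat, n = counter.most_common(1)[0]
        result[loc] = (cat, n / sum(counter.values()))
    return result


# Toy data: one PC score per location (made-up values)
scores = {0: 0.0, 1: 0.1, 2: 0.2, 3: 5.0, 4: 5.1, 5: 5.2, 6: 0.15, 7: 4.9}
labelled = {"Low": [0, 1, 2], "High": [3, 4, 5]}
result = vote_over_iterations(make_nearest_centroid(scores), labelled,
                              n_iterations=100, per_category=2)
```

The share returned alongside each category is the "simple probability" in this sketch: the fraction of iterations in which a location received its most frequent category.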
Evaluation and validation of model output included using the MCD14ML product, a MODIS standard quality testing the sensitivity of the study area to different scenarios and thus two extreme situations were compared.
A first model configuration assumed all non-urban areas to be covered by native vegetation, while a second scenario considered all non-urban areas to be plantation. These two scenarios were ingested into the SOM model by using the corresponding weights of the PCA to recalculate the score of each location relative to the selected PCs.
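A minimal sketch of this scenario projection is given below, assuming the land cover classes listed earlier are binary encoded and that the PCA means, standard deviations, and loadings are kept fixed. All numeric values, the single continuous descriptor, and the function names are made up for illustration.

```python
LAND_COVER_CLASSES = ["native forest", "scrub", "pasture/cropland", "urban areas",
                      "exotic plantations", "water bodies", "bare soil", "burned areas"]


def one_hot(land_cover):
    """Binary encoding of a land cover class."""
    return [1.0 if land_cover == c else 0.0 for c in LAND_COVER_CLASSES]


def apply_scenario(land_cover, replacement):
    """Replace every non-urban class with `replacement`; urban stays urban."""
    return land_cover if land_cover == "urban areas" else replacement


def pc_scores(features, means, stds, loadings):
    """Project one standardized feature vector onto fixed PCA loadings."""
    z = [(x - m) / s for x, m, s in zip(features, means, stds)]
    return [sum(w * v for w, v in zip(component, z)) for component in loadings]


# Hypothetical PCA parameters: 1 continuous descriptor (elevation) + 8 binary ones
means = [200.0] + [0.125] * 8
stds = [100.0] + [0.5] * 8
loadings = [
    [0.5, 0.5, 0.0, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0],  # PC1 (made-up weights)
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],   # PC2 (made-up weights)
]

elevation, observed_cover = 300.0, "scrub"
native = pc_scores([elevation] + one_hot(apply_scenario(observed_cover, "native forest")),
                   means, stds, loadings)
plantation = pc_scores([elevation] + one_hot(apply_scenario(observed_cover, "exotic plantations")),
                       means, stds, loadings)
```

Because the loadings are not refitted, any difference between the two score vectors comes only from the changed land cover encoding, which is what allows the two scenarios to be compared within the same PC space.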

Analysis of geodatabase components
Spatial patterns of forest fire recurrence during the period under study were associated with the peri-urban areas of the main cities of the CMA, since the quadrants with a recurrence of more than 20 fires were found less than 650 m from urban centers and highways. This suggests that the causes of these forest fires are mostly anthropogenic. In contrast, the areas that did not record fire outbreaks during the study period were associated with remote locations with an average elevation of 250 m, 8.1 km from urban centers and 1.5 km from highways. When the data of the geodatabase are inspected according to the categories determined from spot recurrence (equation (1)), several patterns emerge (Figure 2). A first finding is that the Medium (M) to Very High (VH) categories tend to present much less spread, with almost no outliers. It is also noticeable that the Near Zero (NZ) and Low (L) categories spread over about the same range, suggesting that low recurrences do not conform to a distinguishable pattern of recurrence and that they instead correspond to random events. Variables associated with vegetation characteristics, i.e., trends in NDII (t-NDII) and in NDVI (t-NDVI), tend to show that the VH category is mostly associated with negative trends, deepening the decreasing tendency from H. high recurrence. Figure 3 shows the land use data pertaining to each category. The most striking pattern is that 100% of VH occurs over plantations, about 70% for H, and nearly 45% for M. In addition, the progressive importance of urban land use connected with plantations as the recurrence increases suggests that the connection between these two land uses explains most of the damaging effects of wildfires in the CMA. As already seen in the distribution of other variables, L and NZ present a partitioning similar to that of the whole study area ("All" in Figures 2 and 3), further attesting to a random pattern of low recurrence.
This way, the analysis of input variables tends to indicate that there is a relatively consistent pattern of landscape conditions that allows certain locations to record fires more recurrently than others.

Six principal components (PCs) explain about 71% of the geodatabase variance within the study area. Since the PCA was applied to the whole CMA, these results represent the relationships including zones with zero fire spots. Although no PC within these 6 accounts for more than 20%, certain patterns emerge that suggest the procedure has been able to suppress redundancies in the database (Figure 4).
SOM output compared with CONAF and MODIS data indicates that this data-driven model is skilled in predicting an increase in spot density according to the corresponding category ( for M, and just 12.3% for high and very high recurrence. The model also predicts that spot recurrence is a phenomenon that may affect almost the whole study area (Figure 6A).

The native scenario tends to show more pixels in the medium category than the plantation scenario. The native scenario also sees an increase in the VH category. On the other hand, the plantation scenario tends to show an increase in the L and H categories, while reducing NZ. Although these differences are not extreme, they attest to a different dynamic depending on the prevalent land cover. Both models show clustering patterns in which very low and low values are associated with higher-elevation sectors within the CMA, which in turn have the lowest insolation

Discussion
In Mediterranean Central Chile, the land cover changes that characterize the current landscape organization resulted mostly from the government subsidies granted by Law Decree 701 for Forestry Development (DL701) in 1974 (Nahuelhual et al. 2012; INFOR, 2017:25). This policy favored plantations of exotic fast-growing species across the region, with staggering consequences: in 1974, the surface area of forestry plantations was 480,000 ha; during the 1990s it was close to 2 million ha (Aguayo et al. 2009), and it reached nearly 5 million ha in 2015 (INFOR, 2017:49). This ten-fold increase in plantations motivated by public policy contrasts with the little attention paid to restoring native forests, which have historically contributed to the local population's livelihoods (Reyes and Nelson, 2014; Frene and Núñez, 2010), forcing rural and indigenous communities to compete with the plantations for the use of the land and inciting environmental conflicts (INDH, 2015).
Concomitant with the spread of plantations across the region, an increase in the recurrence and magnitude of fire disturbances in the WUI has been observed, due to the blurred border between land covers or the substitution of certain land uses for others (Goldman, 2018; Ruiz et al. 2017; Ladislao et al. 2007). According to CONAF, over 35 million hectares of vegetation are vulnerable to fires, including grasslands and shrubland (20 million), native forest (13 million), and exotic plantations (2.1 million) (Castillo et al. 2003). Of this vegetation, over 50 thousand hectares are burned annually in approximately 5,900 wildfires. Under these political and economic conditions, land cover change appears to be a critical factor contributing to wildfire risk, and its conflictive evolution has built a double-pressure scenario that shows no sign of changing (Ubeda and Andersson et al. 2016). Likewise, the urban expansion promoted by the National Policy of Urban Development (NPUD) of 1979 has deregulated the land use market (Brites, 2017), fostering the urbanization of agricultural lands, wetlands, and forests (IDB-ECLAC, 2015; Vilar del Hoyo et al., 2011; Hidalgo et al. 2018).
The machine learning model developed in this work shows that more than 90% of the CMA may be subject to some degree of fire risk, with about 40% having at least a medium probability of recurrence. Wildfire hotspot density is well represented by the model, which suggests that it could be a powerful decision-making tool for the public sector (i.e., national government, municipalities) and the private sector (universities, timber companies, real estate developers). Hotspot density is concentrated on roads (1.3 km), leaving far behind the water streams (5.5 km) and urban areas (6.6 km). This is consistent with the literature that assigns the major responsibility for fire recurrence to the presence of human infrastructure and human activities (Harari, 2013; Doerr and Santin, 2016), and with CONAF's previous reports (CONAF 2017;. However, the anthropic factor is not the only one that counts: as Barbati et al. (2013) noted, the distance from the nearest water body is a determinant of short-term fire recurrence in Mediterranean countries, among other landscape factors (slope roughness, exposure, pre-fire dominant forest type). Additionally, the proximity to roads and maximum temperature dynamics, both variables severely altered by human activities, tend to organize the randomness observed in this model. This high random component in the occurrence of events is associated with a lower wildfire hazard, which reveals that after a random appearance, the recurrence increases according to the conditions of each zone. This idea of a random distribution of low and almost zero recurrences has been around for a long time, and the literature reports similar results from GIS models in Sardinia, Italy (Ricotta and Di Vito, 2014), California, USA (Minnich and Chou, 1997), and Spain (Chuvieco et al., 2011:49 Úbeda and Sarricolea, 2016), press reports (CIPER, 2018), and official reports (CONAF, 2017; CONAF-BIRF, 1999).
Other studies suggest the same relevance of landscape drivers for Mediterranean countries (Darques, 2015; Pausas et al., 2008; Turco et al., 2016). By examining the features of the model presented here, it is possible to propose two, not necessarily mutually exclusive, possibilities that may explain the relatively weak contribution of forest plantations to fire risk. The first is that anthropogenic activities may become more important at the local rather than the regional scale. For instance, whereas the model shows that the urban boundary is overwhelmingly associated with categories M to VH, if one zooms out to the whole of Mediterranean Central Chile, cities become small spots. The second possibility is a "saturation effect", in the sense that plantations now occupy such a significant surface area within the CMA that their influence is already permanent in the current fire regime, meaning that any data-driven treatment sees plantations as a constant and thus attributes a small contribution to them. This is why the native vs. plantation experiments are important: they indicate that the plantation scenario tends to reduce the areas with near zero recurrence relative to the native scenario, although the difference is marginal, likely because of the "saturation effect".

The results of the model are thus relevant, as they serve to accumulate and analyze historical, cartographical, and other types of data, leading to a better understanding of the controls and drivers of fire activity in the CMA at high resolution. However, the model can be substantially improved with near real-time (NRT) information from terrestrial platforms (e.g., vehicles, towers, cranes), airborne platforms (e.g., aircraft, unmanned aerial vehicles (UAVs), helicopters), or space-borne platforms (e.g., satellites) using electromagnetic sensors (Van Ackere et al. 2019), leading us to a truly smart metropolitan area (Costa et al., 2020). In that sense, it becomes necessary to put more effort in the future into extending the timeframe of the present study, as Chuvieco et al. (2011:54)

Future scenarios for the CMA are filled with uncertainty, especially for climate change and associated impacts.
Projections from the work of Araya-Muñoz et al. (2017) indicated that the most relevant hazards for the CMA will be wildfires, water scarcity, and heat stress. Likewise, droughts are becoming more recurrent (Garreaud et al., 2020; Fernández et al., 2018). As the model suggests, climatic indicators play a role in fire recurrence, which allows us to infer that changes in those indicators will lead to increases in wildfire hazard for the CMA. What is changing fast are the climate conditions, creating riskier scenarios globally. Therefore, there is an opportunity to improve or make mandatory nature-based solutions, controlled burns for a social-ecological transformation (Otero and Nielsen, 2017), the ecological restoration of soils, wetlands, and forests, REDD++, and the 20x20 initiative, promoting the carbon emission market for carbon sequestration (Wright et al. 2000). For example, the Pinus radiata plantations in Chile and Australia have a potential average net annual rate of CO2 accumulation of 4.5 tons (IPCC, 1996), sequestering greenhouse gas emissions faster at a lower cost, returning the investments quickly, and mitigating some of the impacts of climate change (Pawson et al. 2013).

The results indicated that 12.3% of the CMA's surface area has a high and very high risk of a forest fire, 29.4% has a medium risk, and only 58.3% has a low and very low risk. This calls for reflection on the importance of spatial planning with a resilient focus on wildfires, according to the recurrence of these phenomena in these settings as they are increasingly more forced in the WUI, urban residential areas and industrial or port areas.
These maps and this model are of vital importance for the Chilean government emergency agencies (CONAF-ONEMI) as well as for the city governments within the CMA. They are also relevant for understanding how these phenomena affect the Mediterranean ecosystems to which the CMA belongs, and therefore should be beneficial for researchers in other latitudes working on similar ecosystems: California, Australia, Italy, Spain, Portugal, etc.
This study aimed to establish the relationship between the natural and the anthropogenic factors associated with infrastructure that has been proven to generate wildfires, providing valuable data about the model itself and its capacity for fire risk prediction from a holistic viewpoint. Nevertheless, it will be challenging to generate further studies of the change associated with the inclusion of meteorological data about this Mediterranean ecosystem, as the indexes used to define the thresholds tend to put medium risk areas in higher ranges. Also, climate change (CC) has been an international challenge that most countries must address to fulfill the Paris agreements of December 2015 (COP21). In particular, Chile has committed, among other policies, to planting 100,000 ha with mostly native forests for carbon sequestration. Similarly, the 20x20 initiative and the National Restoration Landscape Program are pursuing CC mitigation through soil organic carbon sequestration and the reduction of forest fire occurrences in Chile. This program aims to restore 100,000 ha of degraded native forest into forest plantations and restore 400,000 ha of degraded land for agriculture and cattle ranching through the System of Incentives for Recuperation of Degraded Soil.

Code availability: Computer code for the model is available on request.
Data availability: All databases used in this research are free to access from the links included in the paper.

Author contributions: Edilia Jaque Castillo designed the study in consultation with Rodrigo Fuentes Robles and Carolina G. Ojeda. Alfonso Fernández developed, coded, and ran the model, aided by all co-authors. Rodrigo Fuentes Robles processed and analyzed remote sensing data. All co-authors participated in statistical analyses, discussion of results, and writing.