Using stochastic space-time models to map extreme precipitation in southern Portugal

The topographic characteristics and spatial climatic diversity are significant in the South of continental Portugal where the rainfall regime is typically Mediterranean. Direct sequential cosimulation is proposed for mapping an extreme precipitation index in southern Portugal using elevation as auxiliary information. The analysed index (R5D) can be considered a flood indicator because it provides a measure of medium-term precipitation total. The methodology accounts for local data variability and incorporates space-time models that allow capturing long-term trends of extreme precipitation, and local changes in the relationship between elevation and extreme precipitation through time. Annual gridded datasets of the flood indicator are produced from 1940 to 1999 on 800 m×800 m grids by using the space-time relationship between elevation and the index. Uncertainty evaluations of the proposed scenarios are also produced for each year. The results indicate that the relationship between elevation and extreme precipitation varies locally and has decreased through time over the study region. In wetter years the flood indicator exhibits the highest values in mountainous regions of the South, while in drier years the spatial pattern of extreme precipitation has much less variability over the study region. The uncertainty of extreme precipitation estimates also varies in time and space, and in earlier decades is strongly dependent on the density of the monitoring stations network. The produced maps will be useful in regional and local studies related to climate change, desertification, land and water resources management, hydrological modelling, and flood mitigation planning. Correspondence to: A. C. Costa (ccosta@isegi.unl.pt)


Introduction
Information on the spatial variability of extreme precipitation is important for river basins management, flood hazards protection, studies related to climate change, erosion modelling and other applications for hydrological impact modelling.It is long recognized that topography and other geographical factors are responsible for considerable spatial heterogeneity of the precipitation distribution at the sub-regional scale (e.g., Martínez-Cob, 1996;Daly, 2006).A comprehensive review on the complex relationship between precipitation, airflow and physiographic features of mountainous regions is presented by Johansson and Chen (2003), and Smith and Barstad (2004).Accordingly, it is commonly accepted that interpolation techniques that make use of the relationship between existing station data and explanatory physiographic variables (e.g., elevation or distance to the coastline) have the potential to better represent the actual climate spatial patterns, especially in mountainous areas and in regions with complex atmospheric influences (Prudhomme and Reed, 1998;Daly, 2006).
Published by Copernicus Publications on behalf of the European Geosciences Union.The number of studies on the interpolation of extreme precipitation indicators is limited.The works of Faulkner and Prudhomme (1998) and Prudhomme andReed (1998, 1999) focused on an index of extreme rainfall, named RMED -median of the annual maximum rainfall, while Hundecha and Bárdossy (2005) and Perry and Hollis (2005) describe the mapping of several extreme precipitation indices.Faulkner and Prudhomme (1998), Prudhomme and Reed (1999), and Perry and Hollis (2005) used techniques that combine distance weighting methods and regression to produce gridded datasets of extreme precipitation indices, whereas Hundecha and Bárdossy (2005) used kriging with external drift to interpolate daily precipitation observations on a 5 km×5 km grid and calculated the extreme precipitation indices, afterwards, on derived grids of 5, 10, 25 and 50 km 2 .Interpolation usually leads to a smoothing of the distribution inferred by the observations and thus to a loss of variance.For example, it is well known that kriging is locally accurate in the minimum error variance sense, but does not provide representations of spatial variability given the "smoothing" effect of kriging (Goovaerts, 1997).The smoothing effect in precipitation data is undesirable considering the modelling of floods or other extreme hydrological processes (Haberlandt, 2007).To overcome this limitation, geostatistical stochastic simulation has become a widely accepted procedure to reproduce the spatial distribution and uncertainty of highly variable phenomena in geosciences (e.g., Franco et al., 2006;Bourennane et al., 2007).Geostatistical simulation methods describe local data variability based on many, equally probable, realizations of the phenomenon, consistent with the data and its statistical characteristics.
The accuracy and uncertainty of gridded data sets is difficult to assess because the field that is being estimated is unknown between data points.Spatial interpolation errors are interdependent functions of the station-network distribution, the efficacy of the interpolation procedure, and the real (but unknown) spatial distribution of the underlying climatic field (Willmott and Matsuura, 2006).Unlike traditional interpolation methods (e.g., cokriging), geostatistical simulation procedures aim at reproducing the spatial uncertainty of the attribute under study.The series of simulated maps can be post-processed and the spatial uncertainty summarized (Goovaerts, 1997).For example, the uncertainty at an unsampled location can be evaluated through spread measures, such as the variance, derived from the corresponding local histogram.
As other southern European regions, the rainfall regime in southern Portugal is Mediterranean, and so highly variable in both the spatial and temporal dimensions.One particularly relevant feature of the rainfall regime in this area is the occurrence of short but very intensive rainfall events that may lead to significant damages, by causing flash floods that affect small drainage basins (Ramos and Reis, 2002).Fragoso and Gomes (2008) concluded that the most southern region (Algarve) is the one where episodes of heavy rainfall are most frequent and which exhibits the strongest torrential character.
In this work, the R5D index was chosen to characterize the extreme precipitation regime.This indicator is defined as the highest consecutive 5-day precipitation total (in millimetres) per year, and so provides a measure of medium-term precipitation total.For the interpolation and uncertainty assessment of extreme precipitation in the southern region of continental Portugal, we explore the application of direct sequential cosimulation, which allows incorporating covariates such as elevation.
The choice of cosimulation follows the premises that elevation and precipitation may interact differently in space and during drier and wetter periods (Goovaerts, 2000).Furthermore, there are evidences of a trend towards a drier climate in southern Europe as a result of increased evapotranspiration and a relatively slow decrease of rainfall amounts and precipitation frequency (Cubasch et al., 1996;Kostopoulou and Jones, 2005;Vicente-Serrano and Cuadrat-Prats, 2007).Moreover, a negative trend in March precipitation has been detected in southern Portugal (Corte-Real et al., 1998).Accordingly, the methodology not only accounts for local data variability by using stochastic simulation procedures, but also incorporates space-time models that allow capturing long-term trends of extreme precipitation, and local correlations between elevation and precipitation through time.
The current work has three objectives: (i) to assess the space-time relationship between elevation and extreme precipitation in southern Portugal; (ii) to use this relationship to produce annual gridded datasets of a flood indicator (R5D) from 1940 to 1999; and (iii) to provide an uncertainty evaluation of the produced scenarios.
The R5D index was computed using high quality daily precipitation observations measured at 105 monitoring stations with data within the period 1940/1999.The direct sequential cosimulation was performed for generating one map per year, using 800 m×800 m grids and elevation as exhaustive secondary information.
The methodology is briefly introduced in Sect.2, and the study region and data are described in Sect.3. The main results are presented and discussed in Sect.4, including the relationship between elevation and the flood indicator, the space-time patterns of the index in 1940/1999, and the uncertainty evaluation of the produced maps.Finally, Sect. 5 states the major conclusions.

Methods
This section briefly introduces the kriging techniques and describes the reasoning of the geostatistical stochastic simulation algorithms implemented.Interested readers should refer to geostatistical textbooks (e.g., Isaaks and Srivastava, 1989;Goovaerts, 1997) for detailed descriptions of univariate and multivariate geostatistical methods.For a thorough description of the direct sequential simulation, and cosimulation, algorithms the reader is referred to Soares (2001).
Geostatistical estimators, known as kriging, provide statistically unbiased estimates of surface values from a set of observations at recorded locations, using the estimated spatial (and temporal) covariance model of the observed data.
Consider the two dimensional problem of estimating a primary variable z at an unsampled location u 0 .Let {z(u α ), α=1, . . ., n} be the set of primary data measured at n locations u α .Most of geostatistics is based on the assumption that the set of unknown values is a set of spatially dependent random variables, hence each measurement z(u α ) is a particular realization of the random variable Z(u α ).Kriging uses a linear combination of neighbouring observations to estimate the unknown value at the unsampled location u 0 .This problem can be expressed in terms of random variables as: (1) The optimal kriging weights λ α are determined by solving the kriging equations that result from minimizing the estimation variance while ensuring unbiased estimation of Z(u 0 ) by Ẑ (u 0 ).
Kriging methods require a stationarity assumption, expressed in two parts.First, the mean of the process is assumed constant and invariant with spatial location (first order stationarity).Second, the variance of the difference between two values is assumed to depend only on the distance h between the two points, and not on their location u (second order stationarity).Stationarity assumptions on kriging are traditionally accounted for by using local search neighbourhoods so that the dependence on stationarity becomes local (Goovaerts, 1997).
When developing the kriging equations the model of spatial covariances, or variogram (inverse function of the spatial covariances), is assumed known.This is a key function of geostatistics and characterizes the variability of the spatial (and temporal) patterns of physical phenomena.
where N (h) is the number of pairs of data locations a vector h apart.
In bounded models (e.g., spherical and exponential), variogram functions increase with distance until they reach a maximum, named sill, at an approximate distance known as the range.The range is the distance h at which the spatial (or temporal) correlation vanishes, i.e. observations separated by a distance larger than the range are spatially (or temporally) independent observations.

Simple kriging
The simple kriging estimate of z(u 0 ) is: where m is the stationary mean of Z(u), and λ SK α (u 0 ) is the weight assigned to datum z(u α ) within a search neighbourhood that comprises n(u) samples.

Collocated cokriging
Consider now the situation where the set of primary data {z(u α ), α=1, . . ., n} is complemented by secondary data available at all estimation grid nodes and denoted by y(u).The collocated cokriging estimate is (Goovaerts, 2000): where m Z and m Y are the global means of the primary and secondary variables, Z(u) and Y (u), respectively.Note that only the secondary datum collocated at the location u 0 being estimated is retained for estimation.Soares (2001) describes the sequence of the direct sequential simulation (DSS) algorithm of a continuous variable as follows:

Direct sequential cosimulation
1. Randomly select the spatial location of a node z(u 0 ) in a regular grid of nodes to be simulated; 2. Estimate local mean and variance identified with simple kriging estimator (Eq. 3) and kriging variance.Sample from the global histogram a value z s (u i ) centred in the estimated local mean and variance.
3. Return to step 1) until all nodes have been visited by the random path.
Soares (2001) also extended the DSS algorithm for the joint simulation of different variables, thus named direct sequential cosimulation (coDSS) algorithm.Instead of simulating all variables simultaneously, it simulates each variable in turn conditioned to the previous simulated variable.
The coDSS algorithm uses collocated simple cokriging to estimate local means and variances, incorporating the secondary information and the relationship between secondary and primary variables.In this study, the collocated cokriging was applied with a Markov-type approximation (Goovaerts, 1997) for cross-continuity model.Hence, only the primary variable variogram model and a correlation model between primary and secondary data were required.
To reproduce the spatial distribution and uncertainty of the flood indicator, 100 equiprobable simulated realizations were generated through the coDSS algorithm on 800 m×800 m grids, one for each year, using different space-time continuity and correlation models for each decade.These models are briefly described in the following section and further detailed in Sect. 4. Elevation was used as secondary information.

Space-time continuity models
Simulated images were generated by the coDSS algorithm using for each decade a different space-time variogram model of the primary variable (R5D index), and a different correlation model between primary and exhaustive secondary data (elevation).This strategy allows accounting for The relationship between elevation and extreme precipitation, described by correlation models, was also assessed locally to allow accounting for changes in correlation across the study area.First, for each decade, local correlations were calculated using a search neighbourhood centred at each station's location (further details from this stage are described in Sect.4.2).To reproduce the spatial distribution of the relationship between elevation and extreme precipitation, the second stage used the DSS algorithm to interpolate the local correlations.In this stage, 50 equiprobable simulated realizations were generated through the DSS algorithm for each decade on 800 m×800 m grids.The correlation models used later with the coDSS algorithm were determined by computing the mean of the distribution of the 50 simulated values at each grid node, by decade.

Study region and data
The study domain refers to the South of continental Portugal (Fig. 1) and includes the Algarve region, in the far South, and most of the Alentejo region (limited in the North by the Tejo River).The study region has different physiographic characteristics.In the far South, the relief is dominated by the two main Algarve's mountains: Monchique on the West, and Caldeirão on the East.In contrast, the Alentejo region is characterized mainly by vast flat to rolling country, the peneplain, where the average altitude is approximately 200 m.The São Mamede mountain ridge, the highest in the Alentejo region with an altitude of 1000 m, lies in the extreme North-East.
Daily rainfall series from 105 stations distributed irregularly over the study region were collected (Fig. 1).Most of them were extracted from the National System of Water Resources Information (SNIRH -Sistema Nacional de Informac ¸ão de Recursos Hídricos) database (http://snirh.inag.pt), and three of them were compiled from the European Climate Assessment dataset (http://eca.knmi.nl).Each station series data was previously quality controlled and studied for homogeneity (e.g.Wijngaard et al., 2003;Costa and Soares, 2006;Costa et al., 2008).The length of the series is highly variable.The extreme precipitation index (R5D) was calculated for each station using all quality data available within the 1940/99 period (Fig. 2).The selected stations have less than 16% of the days missing in each year and most stations' data do not have any missing records.The R5D index is defined as the highest consecutive 5-day precipitation total (in millimetres) per year, and can be considered a flood indicator (Frich et al., 2002).Elevation data were taken from a digital elevation model (DEM) with a grid resolution of 20 m×20 m and resampled to an 800 m×800 m grid mesh.The topographic variable derived is defined as the elevation of the nearest grid point to the meteorological station location, sometimes named smoothed elevation.

Space-time continuity of extreme precipitation
Variograms were calculated according to Eq. 2. The variogram models are chosen and their parameters are estimated taking into account the experimental data values of variograms.The objective is to capture the spatial pattern of physical phenomenon in the variogram model rather than getting a best fit of a second moment (Goovaerts, 1997).In this study, we chose exponential models that capture the major spatial features of the attribute under study within each decade.The selected models were subjectively fitted to the experimental semivariogram values, not only by giving more importance to the smaller lags and the ones computed from more data pairs, but also by taking into account physical knowledge of the area and phenomenon.Considering the results of a thorough analysis on directional variograms, only the omnidirectional semivariograms for the spatial dimension were retained.Consequently, the spatial variability is assumed to be identical in all directions (i.e.isotropic) within each decade.
The parameters for each exponential variogram of the R5D index, used in the coDSS algorithm, are summarized in Table 1.In what concerns the temporal component, there are no relevant tendencies.However, the range of the models' spatial component shows a strong increase in the spatial continuity of the flood indicator on the last two decades.For illustration purposes, the exponential model fitted to the experimental semivariogram of the 1960s decade is shown in Fig. 3.

Relationship between extreme precipitation and elevation
An exploratory evaluation of the relationship between the extreme precipitation index and elevation was made by averaging the R5D index at each station, and by computing afterwards the Pearson's correlation coefficient between those values and elevation, not only measured by the actual stations altitude, but also by the stations grid point elevation (smoothed elevation).This regional analysis shows that the correlation is slightly stronger for the smoothed elevation (0.49) than the actual one (0.43).Other studies on this subject concluded likewise (e.g., Diodato, 2005).
A plot of elevation against R5D index values is given in Fig. 4. The coefficient of determination, r 2 , is small (0.10) and so there is little evidence of a (global) linear relationship between elevation and precipitation, as expected (e.g., Lloyd, 2005).Moreover, the correlation for elevation against R5D index values is not constant through time (Fig. 5), but rather shows a negative trend during the study period, although not statistically significant.The number of stations used in the computation of these correlations ranges from 19 (in 1940) to 93 (in 1991 and 1992), and is always less than 40 before 1980.Because of the sparse coverage of meteorological stations in some areas, especially until the 1980s decade (Table 2), the local relationship between elevation and extreme precipitation was assessed by decade.
The coDSS algorithm uses a different correlation model between the R5D index and elevation within each decade.In order to determine these models, first, the relationship between elevation and precipitation was assessed locally by computing, for each decade, Pearson's correlation coefficients using stations' data falling within a circle centred at each station's location.As in earlier decades meteorological stations are scarce, larger radii were used (Table 2).Sec- ond, the DSS algorithm was applied to interpolate the local correlations by decade.This procedure used a single space-time spherical variogram model of the local correlations, where the spatial dimension was modelled as isotropic, the estimated range of the spatial dimension was 110 000 m, the range of the temporal dimension was 6 decades, and the estimated sill was 0.053.Finally, the correlation models were determined by computing the mean of the distribution of 50 simulated values at each grid node, by decade (Fig. 6).
The estimated correlation between elevation and the R5D index ranges from very weak (−0.21) to moderately strong (0.72) across the region and through decades.In the 1940s decade, correlations range from −0.18 to 0.65 across the study region.The correlation model of this decade shows an unrealistic pattern of correlations on the West of the region, North of Algarve, caused by the scarce number of available stations.Accordingly, the variability associated with estimated correlations over this area is also high.Nevertheless, they correspond to moderate values of approximately 0.50, and so the contribution of elevation to the estimation of the flood indicator will also be moderate.The correlation model of the 1950s decade exhibits a more realistic pattern, with correlations ranging from −0.17 to 0.67.The model of the 1960s decade also shows a realistic pattern of correlations, although the West and North-East areas present high variability.Similarly, the 1970s decade model is realistic but with very high variability within the North-East region.Because of the higher density of stations, the correlation models of the 1980s and 1990s decades exhibit much less variability than the previous ones.
These results suggest that using elevation as a secondary variable in estimation will increase the accuracy of estimates in some locations, i.e. those mountainous areas where correlations are large (e.g. on the West of Algarve and on the North-East of the study region).In contrast, univariate interpolators (e.g., DSS) are likely to provide estimates as accurate as those provided by coDSS, with less computational effort, in places where correlations are small.

Extreme precipitation scenarios
Using the coDSS algorithm, a set of 100 equiprobable simulated realizations of the R5D index was computed at each simulated grid node, by year.For illustration purposes, two equiprobable realizations, for 1945 and 1949, are shown in Fig. 7.The space-time inference was performed by means of computing the mean of those distributions.Uncertainty was assessed by means of computing the standard-deviation (STD) and the coefficient of variation (CV) of the distribution of the 100 simulated values at each simulated grid node, by year.For illustration purposes, the flood indicator maps of six years are shown in Fig. 8, while Fig. 9 presents their uncertainty evaluation measured by the coefficient of variation (CV).
Although the correlation model for the 1940s decade seemed unrealistic, a visual inspection of all produced scenarios for those years reveals a rather realistic spatial pattern of extreme precipitation.On the other hand, the lack of precipitation data on the north-eastern area makes the estimates more uncertain.For that reason, the extreme precipitation values were possibly less accurately estimated over that area in several years.In fact, as expected, one of the regions where the distribution of extreme precipitation shows greater variability, thus more uncertainty, is in the northern part of the maps, corresponding to regions less densely sampled in most years.This is especially evident in the maps of the standard-deviation.Moreover, the scenarios for the last two decades of the twentieth century show less uncertainty than the previous ones because the availability of stations over the study region is considerably greater (Fig. 2).
Fig. 10 shows estimated local probabilities of the coefficient of variation of the scenarios produced for R5D to be greater than or equal to 25%.These probabilities were evaluated as the proportion of the sixty estimated values of CV (computed at each grid cell throughout the years) that exceed that threshold.
Only a few stations are located at medium (>400 m) and high elevations (Table 2), thus greater uncertainty would be expected at those regions.However, the uncertainty in the mountainous regions of the South is often small (Fig. 10), because of the use of elevation as secondary exhaustive information in the spatial interpolation procedure of R5D.
Other common spatial patterns can be observed.In wetter years, the flood indicator exhibits the highest values in mountainous regions of Algarve, especially over the Monchique mountains, due to the greater influence of altitude there than in the rest of the study domain.For this reason, high values also appear over North-East areas in several years.In drier years, the spatial pattern of extreme precipitation is much smoother as estimates have less variability over the study domain.As expected from the space-time continuity analysis (Sect.4.1), the spatial patterns of extreme precipitation are becoming more homogenous over time.This is especially noticeable in the maps of the last two decades of the twentieth century.
The precipitation regimes are of a different nature in northern and southern regions of Portugal: in the north, the precipitation regime has an orographic origin, whereas in the south it is mainly associated with vertical motions induced by cyclogenic activity (Trigo and DaCamara, 2000).In southern Portugal, summer precipitation, almost close to zero during this season, is sometimes associated with local convective activity.These storms can occur with a large degree of independence from the weather circulation type that characterizes the Iberian circulation for that specific day (Trigo and DaCamara, 2000).Changes in the North Atlantic Oscillation (NAO) are likely to be responsible for much of the observed change in the frequency of above 90th percentile winter precipitation between the 1960s and 1990s over Europe (Scaife et al., 2008).Moreover, the NAO-rainfall relationships tend to be stronger during the wet seasons of the last decades of the twentieth century in southern Portugal (Goodess and Jones, 2002;Trigo et al., 2004).Accordingly, changes in the NAO are likely to be responsible for the increase of the spatial continuity of R5D in southern Portugal, which is especially pronounced during the last two decades of the twentieth century.
To illustrate the usefulness of the annual gridded datasets produced for the R5D index, probability maps of extreme precipitation were computed as follows.First, the regional histogram and its basic statistics were calculated using the R5D values from maps corresponding to the climate normal 1961/1990.For example, the median was equal to 105 mm, the third quartile was 125 mm, and the 95th percentile was 160 mm.Second, the probability is approximated by the relative frequency computed at each grid node for 1940/1999.Frequencies correspond to the number of times that R5D values are equal or over a fixed threshold.For example, the probability map corresponding to the 105 mm threshold (Fig. 11) shows that the mountainous regions of Algarve and North-East, and also the West coast have high probability of extreme rainfall events.On the other hand, the probability map for 125 mm (Fig. 11) shows that the heaviest mediumterm rainfall events occur at the Algarve region, especially over the Monchique mountains, as expected.Probability maps such as these might be useful to identify regions at risk of erosion caused by extreme precipitation events.

Conclusions
The study was performed in the South of continental Portugal where the topographic characteristics and spatial climatic diversity are significant, although the region's relief is not very complex if compared to other European study regions (e.g., Prudhomme and Reed, 1999;Perry and Hollis, 2005).The correlation maps of elevation and R5D values indicate that, as the relationship varies locally, the benefits in using eleva- tion data to inform estimation will also vary locally.Therefore, using elevation data in the estimation process is likely to be beneficial, especially in mountainous areas.The estimates accuracy also varies in time, and in earlier decades is strongly dependent on the density of the stations network.
Nevertheless, the flood indicator values at locations between meteorological stations have been estimated to a good degree of accuracy in most years, producing detailed and representative maps of extreme precipitation space-time patterns over the South of Portugal for the 1940/1999 period.Therefore, the produced gridded datasets provide a consistent set of data that allows further investigations on the spatial and temporal variability and trends in the South Portugal climate, over 60 years.In fact, the usefulness of the datasets was illustrated by probability maps of extreme precipitation.
The methodology considered here has clear theoretical advantages over alternative ones, although no single method is optimal for all regions and time periods (e.g., Isaaks and Srivastava, 1989;Lloyd, 2005).Among the sequential algorithms of stochastic simulation, one advantage of direct sequential simulation and cosimulation is the use of original variables instead of transformed ones: Gaussian (sequential Gaussian simulation) or indicator (sequential indicator simulation).The direct sequential cosimulation proved to be a valuable technique to deepen the knowledge on the spacetime patterns of extreme precipitation over the study domain, because it allowed incorporating elevation information into prediction and also used all quality data available for every station.Furthermore, this approach accounted for local data variability and made possible uncertainty evaluations of the proposed scenarios.

Fig. 1 .
Fig. 1.Elevation of the study region in the South of Portugal and stations' locations.

Fig. 2 .
Fig. 2. Distribution of the number of available stations by year.

Fig. 3 .
Fig. 3. Spatial and temporal components of the experimental semivariogram of the 1960s decade data of R5D with the exponential model fitted.

Fig. 6 .
Fig. 6.Local correlation models between elevation and R5D values for each decade.

Fig. 10 .
Fig. 10.Probability of the uncertainty of the R5D index scenarios, measured by the coefficient of variation, to be greater than or equal to 25%.

Table 1 .
Parameters of the exponential variogram models for the R5D index, by decade.

Table 2 .
Distribution of weather stations (with records within each decade) by elevation classes, and radii of the search neighbourhoods used to calculate the local correlations.