Natural Hazards and Earth System Sciences Occurrence Analysis of Daily Rainfalls through Non-homogeneous Poissonian Processes

A stochastic model based on a non-homogeneous Poisson process, characterised by a time-dependent intensity of rainfall occurrence, is employed to explain seasonal effects of daily rainfalls exceeding prefixed threshold values. The data modelling has been performed with a partition of observed daily rainfall data into a calibration period for parameter estimation and a validation period for checking on occurrence process changes. The model has been applied to a set of rain gauges located in different geographical areas of Southern Italy. The results show a good fit for time-varying intensity of rainfall occurrence process by 2-harmonic Fourier law and no statistically significant evidence of changes in the validation period for different threshold values .


Introduction
Investigations on daily rainfall time series may contribute at various levels to the assessment of the existence of temporal and spatial variability in the rainfall field characteristics.The different models that can be used may be classified into four categories (Onof et al., 2000).The first category is formed by meteorological models, which involve complex sets of differential equations in order to represent the complete physical processes controlling precipitation and other weather variables.The second one is represented by stochastic models which describe the spatial evolution of the rainfall process in a scale-independent approach, like the multi-scaling models (Lovejoy and Schertzer, 1990;Gupta and Waymire, 1994).A third one is formed by statistical models (single or multisite temporal), able to represent spatial non-stationarity and temporal trend (Stern and Coe, 1984).Finally, the fourth Correspondence to: E. Ferrari (ferrari@dds.unical.it)category concerns temporal and spatial-temporal stochastic process models which try to summarize the main rainfall features by a few parameters, providing an understanding of the hierarchical structure of the rainfall process.In particular, these models follow a point process approach based upon the use of Poisson and Poisson-cluster processes for explaining the joint or separate arrival and magnitude processes of daily rainfalls (Waymire and Gupta, 1981a, b, c;Kavvas and Delleur, 1981;Yevjevich and Dyer, 1983;Rodriguez-Iturbe et al., 1984, 1987;Onof et al., 2000).
This latter approach has been used for a long time as a tool for the stochastic modelling of rainfall, combining in different ways the basic Poisson process of storm arrivals with the Neyman-Scott or the Bartlett-Lewis point processes.In more recent developments, rainfall variables, such as the duration and intensity of a rainy event, have been treated as correlated variables or, alternatively, correlation has been introduced using independent superposed point processes to represent the different kinds of storms in a region (Cowpertwait et al., 2007;Leonard et al., 2008).The statistical properties of the superposed process are normally obtained by aggregation of the properties of each independent process.In other words, spatial-temporal models can be flexibly structured with continuous distributions of storm types, thus providing a parameter configuration that can be efficiently fitted to a large range of historical rainfall records (Cowpertwait, 2010).Point process rainfall models are generally fitted to the observed sample properties using derived moments and model functions.A good fit is usually assessed by comparing observed properties of data not used to estimate the model parameters with equivalent properties simulated using the fitted model (Entekhabi et al., 1989;Cowpertwait et al., 2007).This procedure, if successfully extended to extreme values, can certify the model suitability, due to the sensitivity of the distributions of the extremes to any departures in the good fitting of the depth distributions.
B. Sirangelo et al.: Daily rainfalls through non-homogeneous Poissonian processes Alternatively, in order to reproduce the temporal and/or spatial non-stationary features in both the analysis and the simulation of rainfall variability, the generalized linear models (GLMs) can be applied (Chandler and Wheater, 2002;Wheater et al., 2005).These models explain daily rainfall variability by using probability distributions in which the probability of rainfall occurrence is modelled separately from the amount of rain if non-zero.By developing appropriate methods of spatial-temporal disaggregation of daily rainfalls, GLMs can also be linked to the single-site models to provide multi-site sequences incorporating sub-daily temporal structure.
With reference only to the temporal variability of the rainfall process, the stochastic point process models, described above, are widely used for at-site analysis of storm occurrences, in order to explain the intermittent feature of rainfalls and simulate inter-storm periods.Nevertheless, the stochastic models often hypothesize temporal stationary processes.In this case, the apparent seasonal periodicity exhibited by the frequency of the rainfall phenomena can be modelled as a stationary stochastic process in short temporal intervals, in which the behaviour can be considered reliably homogeneous.
Another approach of functional type, aimed to cope with seasonal and trend variability of the storm occurrences, is based on temporally non-homogeneous stochastic point processes with time depending parameters (Sirangelo, 1994;Pujol et al., 2007).In this case, great attention has to be paid to the parsimony of the models, as regards the number of parameters and the bias introduced into the generation of synthetic series, and to the influence of threshold values in extracting a peak storm database from daily rainfalls data.
In this work, the procedure adopted for explaining the occurrence process of daily rainfalls greater than the prefixed threshold values is a non-homogeneous Poisson model, with temporal variation of intensity expressed by using Fourier series.The model has been applied to a set of rain gauges located in different geographical areas of Southern Italy.The proposed procedure constitutes the first step of a more comprehensive peak-over-threshold (POT) analysis, in which the occurrence and exceedance of rainy events globally depend on rainfall threshold values.

Non-homogeneous Poisson model for storm occurrence
The storm occurrence process, as successive clustered events, has been explained by several researchers as the result of specific atmospheric conditions able to generate climatic perturbations (Waymire and Gupta, 1981a, b, c;Smith and Karr, 1983;Chang et al., 1984;Onof et al., 2000).This process is generally non-homogeneous in time, as shown for a typical rain gauge of Southern Italy where a marked reduction of occurrence intensity related to daily rainfalls during spring-summer period is detected (Fig. 1).Moreover, the statistical features of the storm occurrence process can be influenced by the threshold chosen for extracting exceedance values from the basic rainfall time series.Both the problems of choosing threshold value and selecting the criteria for retaining peaks represent some of the main difficulties classically associated with the POT analysis.This approach, a kind of compromise between annual maxima frequency and classical time series modelling, requires the analysis of both the magnitude and the arrival time of storm peaks, neglecting the autocorrelation structure of the basic data.The higher rainfall values of different sets of consecutive rainy days are generally assumed as rainfall peaks, thus, practically assuring the independence required for the Poisson process.For the choice of threshold value, which directly influences the distribution of peaks, some statistical tests and practical guidelines are available (Lang et al., 1999).Adopting a non-homogeneous Poisson process, the occurrence probability of n events in a time period [t,t + t] is given by: The mean number of events t,t+ t depends on both time t and interval t: where the law λ(.)represents the temporal variation of the intensity function of the occurrence process.For a homogeneous Poisson process the function λ(.) is constant, therefore, it holds t,t+ t = λ • t.
Actually, the temporal variation of intensity implies changes in some properties of the homogeneous Poisson processes.In particular, the times between consecutive events, which are independent and exponentially distributed in the case of homogeneous process, are distributed with the following conditional probability density: With this position, the waiting times form a Markov chain with transition density (Snyder, 1975): characterised by the following joint probability density: (5)

Temporal variation of rainfall occurrence intensity
Meteorological phenomena show statistically significant seasonal and daily features due to the revolution of the Earth around itself and the sun.Thus, the temporal variation of rainfall occurrence intensity λ(t) can be expressed through the Fourier series as a function of period P = 1 year: where c k (t) and s k (t) indicate the functions: To ensure parsimony of the model, Fourier series has to be limited to an optimal number of harmonics, n h , matched with the goal of the proposed procedure: By integrating the Eq. ( 2) with the expressions ( 7) and ( 8) one obtains: For time intervals equal to 1 yr, independently on the time value t, the mean number of events t,t+ t becomes: Moreover, it is well known that a non-homogeneous Poisson process for rainfall occurrences can be transformed into a homogeneous and unit one through a suitable transformation of the time variable (Cox and Isham, 1980).The transformation is based on the inverse function of Eq. ( 2), i.e.: Operatively, starting from the representation of the function t,t+ t in which t = 0 and t = w varying from 0 to P (Fig. 2), a subdivision of the interval 0; 0,P into n s equal subperiods is carried out on the vertical axis.Therefore, it is possible to determine the corresponding partition of the period P on the horizontal axis, thus, providing n s time subperiods.

Parameters estimation
The evaluation of the optimal number of harmonics n h in Eq. ( 8) plays a crucial role in parameters estimation of λ(t) function.In technical literature, cross-validation techniques are mostly adopted (Picard and Cook, 1984;Cressie, 1993;Balachandrudu et al., 2009), but a limit of these methods consists of the assumption that the residuals, between sample and theoretical values, are independent and Gaussian distributed.These hypotheses need further studying (Cressie, 1993).Consequently in this work, besides the application of cross-validation techniques and, in particular, the evaluation of Cross Validation Error (CVE) (Balachandrudu et al., 2009), we propose a more robust procedure, which is based on the property that a non-homogeneous Poisson process can be transformed into a homogeneous one (Sect.2.1).
In detail, for each rain gauge and for a prefixed number of harmonics n h , the cross-validation approach concerns the following steps: a1. the whole data sample is partitioned into N not overlapped subsamples; b1. a single subsample is withheld and the remaining subsamples are used to estimate the parameters a 0 and a k , b k , with k = 1, ..., n h , of Eq. ( 8) by using least squares method; c1. the mean-squared difference between the theoretical and sample values related to the withheld subsample is computed; d1. the point (b1) and (c1) are repeated for each subsample.
The average of the squared differences over the number of withheld subsamples provides the cross validated square error; its square root represents the CVE.
The optimal number of harmonics n h is characterised by the minimum value of the CVE.Instead, the procedure based on the transformation of a non-homogeneous process into a homogeneous one consists of the following steps, again for each rain gauge and for a prefixed number of harmonics n h : a2.evaluation of the parameters a 0 and a k , b k , with k = 1, ..., n h , of Eq. ( 8) by using least squares method and considering the whole data sample; b2.subdivision of the period P into n s subperiods through the partition of the segment 0; 0,P (Fig. 2); c2.application of two Hald tests (1952), aimed at verifying that the number of occurrences, related to each subperiod, may be derived from the same homogeneous Poisson process (see Appendix A1).
In this case, the optimal number of harmonics n h coincides with the minimum number which satisfies the statistical tests concerning the point (c2).
As regards the application of the least squares method to both the procedures (points a1 and a2), firstly a sample estimation of the occurrence intensity may be obtained.For this scope, considering P = 1 yr and consequently n P years of rainfall observations, each year is subdivided into n T very short time interval of size t (three or five days); in this way n m,p represents the number of sample occurrences related to the m th interval of the p th year (Fig. 3).Therefore, the sample evaluation of occurrence intensity is obtained as: and applying the least squares method the parameters of Eq. ( 8) are evaluated as: where w c,m is the temporal centre of the m th time interval t.
Once partitioned the period P into n s subperiods (Fig. 2), on the basis of the function t,t+ t which depends on the number of harmonics n h (Eq.9), it has been indicated as n (n h ) i,j the sample occurrences related to the j th sub-period (j = 1, . . ., n s ) into the i th year (i = 1, . . ., n P ).Considering the sum of the occurrences: for each of the joint set of sub-periods n s , it can be hypothesized that variable S (n h ) j is Poisson-distributed.The application of Hald tests (see Appendix A1) is aimed at verifying that all the S (n h ) j variables derive from the same homogeneous Poisson process.
The procedures described in this section can be used for modelling the temporal variability of intensity referred to storm occurrences of daily rainfall heights exceeding any threshold value T , defined as: where µ is the expected value of daily rainfall, σ is the standard deviation and K = 1, 2, 3, ... If K = 0, it is assumed T = 0.1 mm.

Dataset
The procedure has been applied to 8 daily rainfall series (Table 1) located in Calabria region (Southern Italy), out of which 4 belong to the western zone and 4 to the eastern zone of the region (Fig. 4).The choice of the database has been based on length and completeness of the time series.
In particular, the time series have been selected on the basis of availability of at least 50 yr in the time period 1921-1986, chosen as the calibration period, and of all the years of observation in the subsequent validation period 1987-2001, where the daily rainfall occurrence process variability is under hypothesis.The mean number of observation years of the dataset during the period 1921-2001 is 71.9, with a maximum of 80 yr (Cosenza rain gauge), while the mean values of daily rainfall frequency are 0.305 and 0.230, respectively, for the western and eastern sets of rain gauges.

Calibration of the non-homogeneous Poisson model
For each prefixed threshold value (K = 0, 1, 2, 3), the parameters estimation of the non-homogeneous Poisson model for different numbers of harmonics has been obtained with reference to the observed data during the period 1921-1986 (calibration period) under the hypothesis of unchanged daily rainfall process.The values of the parameters a 0 , a (n h ) k and b (n h ) k have been obtained by means of least squares method through Eq. ( 13) by considering t equal to 5 days (Fig. 3 and Table 2).For each threshold, Hald tests showed that n h = 2 satisfies the hypothesis of homogeneity for the transformed occurrence process by using n s = 10 (see Appendix A2).
The result n h = 2 is also confirmed by the application of the cross-validation approach.As examples, Fig. 5 illustrates the CVE evaluation related to Cosenza and Crotone rain gauges, for which the data sample was partitioned into N = 10 subsamples.
For each analysed rain gauge and threshold value, Fig. 6 shows the evaluation of a 0 2, which represents the expected value of intensity λ(t) on the whole period P (see Eqs. 8 and 13).For K = 0 (i.e.considering the occurrences of daily rainfall heights greater than 0.1 mm) there are no significant differences between the two zones.Focusing attention on each threshold K ≥ 1, it can be noticed that the western zone is characterised by higher values with respect to the eastern  zone (about twice), that is the frequency of heavy daily rainfalls is greater in the western part of the region.
As an example, the fit of λ(t) theoretical distribution through the Fourier series with 2 harmonics on sampling occurrence process referred to each threshold value is shown in Fig. 7 for Cosenza and Crotone rain gauges, as the representative stations of western and eastern zones, respectively.Analysing the diagrams, it can be noted that, except for K = 0, in the winter period the occurrence intensity λ(t) of Cosenza is about twice as high as the Crotone one, confirming what has been shown in Fig. 6.

Check on occurrence process changes
In order to test the hypothesis that the daily rainfall occurrence process preserves the same behaviour in more recent time periods, the theoretical distribution of intensity function λ(t), evaluated for each threshold value in the calibration period 1921-1986 (Sect. 3.1) (Sect. 3.1), has been also adopted for the 15-yr observation data collected in the validation period 1987-2001.
For each rain gauge, by using the Monte Carlo approach, 1000 synthetic generations of a 15-yr period of daily rainfall occurrences have been carried out and, for every simulation, sample λ(t) has been estimated.This procedure has been adopted because of the complexity of determining analytical  statistical confidence intervals referred to the sample intensity λ(t).
Sample intensity, theoretical function of the calibration period and 95% confidence interval, evaluated by Monte Carlo approach, are reported, as an example, for one rain gauge of each zone (Cosenza and Crotone) in Figs.8-9.The differences between the sample and theoretical λ(t) appear not important, as more than 95% of the sample occurrence intensities are inside the statistical bounds evaluated by synthetic generation.
A further test has been performed for each rain gauge: considering the different threshold values, the root-meansquare error (RMSE), evaluated between the theoretical λ(t) and the sample of recorded data, and the corresponding 95% one-tailed confidence interval, evaluated from the RMSE values between the sample λ(t) of each synthetic series and the theoretical one, were obtained and reported in Fig. 10.Analysing these results, it may be observed that the western zone is characterised by differences, between sample λ(t) of the 1987-2001 period and theoretical λ(t), which are higher than those concerning the eastern zone.This different behaviour between western and eastern zones emerges by considering threshold values K ≥ 1.However, for each zone and threshold, these deviations are statistically acceptable, as all the sample RMSEs are inside their 95% confidence interval.
As a conclusion, the hypothesis that the occurrence process of the observed rainfall, referred to the validation period, derives from the same λ(t) theoretical distribution of the calibration period cannot be rejected at 5% significance level.

Conclusions
In this paper, a non-homogeneous Poisson model has been adopted for the stochastic interpretation of the seasonal variability concerning the daily rainfall occurrence process in 8 rain gauges selected among the longest data series in two different zones of the Calabria region (Southern Italy).
The procedure applied to a calibration time interval  shows that a Fourier series with 2 harmonics represents a good fit for explaining the variability of the occurrence intensity function λ(t) for all the rain gauges, considering different threshold values (POT analysis).This result is obtained by carrying out both a classical approach concerning cross-validation and a more robust technique related to the property that a non-homogeneous Poisson process can be transformed into a homogeneous one.
The theoretical distribution so obtained has been adopted to verify possible changes of λ(t) function for the validation period (1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001), by using the Monte Carlo approach to generate synthetic series of rainfall occurrences.The results showed that the differences between the sample and the theoretical λ(t) behaviour do not appear significant.Moreover, a statistical test based on RMSE has shown that the hypothesis of the same λ(t) theoretical distribution for calibration and validation periods cannot be rejected at 5% significance level.Therefore, there is no statistically significant evidence of rainfall occurrence process changes for more recent periods in the analysed area.
Further applications of the non-homogeneous Poisson model will concern the joint analyses of the storm occurrence process with the precipitation height marks, interpreted by using a temporally homogeneous model in suitable sub-year intervals.

Fig. 2 .
Fig. 2. Subdivision of the period P into n s subperiods (in this example n s = 10).

Fig. 3 .
Fig. 3. Subdivision of the period P (1 yr) into n T short time intervals of size t and localization of w c,m (m = 1, ..., n T ).

Fig. 4 .
Fig. 4. Daily rainfall series of Calabria region used in the present study.

Fig. 5 .
Fig. 5. Cross validated error (CVE) values estimated for each threshold value and the number of harmonics for (a) Cosenza and (b) Crotone rain gauges.The minimum CVE values occur for n h = 2.

Fig. 6 .
Fig. 6.Evaluation of a 0 2 for each analysed rain gauge and threshold values.

Table 1 .
Main statistics of daily rainfall occurrence process for the selected rain gauges.
Fig. 1.Example of time variability of the intensity function of daily rainfall occurrences.

Table 2 .
Parameters estimation for non-homogeneous Poisson model through least square method applied to daily rainfall occurrence process (data belong to the calibration period 1921-1986; Fourier series with n h = 2 harmonics).