Multimodel SuperEnsemble technique for quantitative precipitation forecasts in Piemonte region

The Multimodel SuperEnsemble technique is a powerful post-processing method for the estimation of weather forecast parameters reducing direct model output errors. It has been applied to real time NWP, TRMM-SSM/I based multi-analysis, Seasonal Climate Forecasts and Hurricane Forecasts. The novelty of this approach lies in the methodology, which differs from ensemble analysis techniques used elsewhere. Several model outputs are put together with adequate weights to obtain a combined estimation of meteorological parameters. Weights are calculated by least-square minimization of the difference between the model and the observed field during a so-called training period. Although it can be applied successfully on the continuous parameters like temperature, humidity, wind speed and mean sea level pressure, the Multimodel SuperEnsemble gives good results also when applied on the precipitation, a parameter quite difficult to handle with standard post-processing methods. Here we present our methodology for the Multimodel precipitation forecasts, involving a new accurate statistical method for bias correction and a wide spectrum of results over Piemonte very dense non-GTS weather station network.


Introduction
Quantitative precipitation forecast (QPF) is one of the more difficult tasks in the weather forecasts, and any attempt in evaluating the uncertainty of the precipitation forecast contributes to a better characterization of the weather effects on the territory.The uncertainty in the QPF evaluation propagates down to the hydrological evaluation of the discharge Correspondence to: D. Cane (daniele.cane@arpa.piemonte.it)forecast in small and medium-sized catchments that are typical of the Mediterranean area.
From a given forecast dataset obtained from different models, many ensemble techniques can be applied, like the "poor man" ensemble (Eq.1), a simple average of the models (requiring no training period), or a bias corrected ensemble (Eq.2).
where N is the number of models, a i are the SuperEnsemble weights, F i is the forecast value, F i is the mean forecast value in the training period and O is the mean observation in the training period.The calculation of the parameters a i is given by the minimisation of the mean square deviation where T is the training period length (days).By derivation Published by Copernicus Publications on behalf of the European Geosciences Union.
D. Cane and M. Milelli: Multimodel SuperEnsemble technique for quantitative precipitation forecasts we obtain a set of N equations: We then solve these equations using the Gauss-Jordan method.
In this paper, we will focus on the quantitative precipitation forecast, and in detail, we firstly examine the ensemble properties (Sect.2), then we introduce a modification to the Multimodel technique (Sect.3) that we consider appropriate for this variable.In Sect. 4 we describe the results obtained with or without this correction (Sect.4.1), the comparison among Multimodel SuperEnsemble, Ensemble, and Poor Man Ensemble (Sect.4.2) and the performance of the operational version of the technique which is running daily at ARPA Piemonte (Sect.4.3).Eventually in Sect. 5 we draw some conclusion.

Ensemble evaluation
The ensemble approach and the related weighting techniques can be easily applied on variables like temperature or wind speed (Cane andMilelli, 2005, 2006), but a certain attention must be put in applying Multimodel SuperEnsemble on a variable like precipitation.There are in fact theoretical and empirical evidences to support the fact that precipitation, as the atmospheric turbulence, has a so-called multi-fractal behaviour (see for instance Lovejoy et al., 1996).This implies that similar features are observed in precipitation fields on a continuum of spatial scales from the very small (centimetres) to the very large (kilometres).This also implies that the numerical models bring along the quantitative precipitation forecast not only their intrinsic error, but also the uncertainty due to the stochastic properties of the precipitation field.Therefore, it is important to study firstly the characteristics of the data ensemble itself, without the application of any weight.
A standard technique for the evaluation of the reliability of an ensemble (that is whether the observed probability distribution is well represented by the ensemble) is the rank histogram (see for example Anderson, 1996or Hamill, 2001), obtained by counting the rank of observations with respect to the ensemble member values.Precipitation forecast has to be treated with care, because the value is often equal to zero (i.e.no precipitation is forecasted or observed).In these cases, we can find a certain number of equal forecasts and the ranking of the observation can be among one of these equal values.Considering the example in Fig. 1: the rank of the observation can be assigned, is not unique.If we choose one of these ranks systematically, we could have an uncorrect calculation of the rank histogram.We then applied a random choice to all the cases where the rank is not uniquely determined, in order not to force the data towards an artificial distribution.
Arpa Piemonte manages a wide non-GTS weather station network.For this work, we used the data collected in the period March 2006-August 2008 from 342 stations.The data were averaged over the 13 warning areas designed by ARPA Piemonte in collaboration with the Civil Protection Department (Fig. 2) on a 6-h basis up to +72 h and for each of them the maximum values of observed precipitation has been assigned.Each warning areas so identified contains on average 26 stations, with a minimum of 11 and a maximum of 39.The chosen spatial scale is practical for the use in an alert system in medium-and large-scale catchments, but it is too much coarse for the discharge calculations in the smallest catchments.Finer resolution calculation are still under investigation.
The models used in this research work are the ECMWF IFS global model (resolution: 0.25 • ) and the 0.0625 • resolution limited area models of the COSMO Consortium covering North-Wester Italy: COSMO-I7 (developed by USAM, ARPA-SIM, ARPA Piemonte), COSMO-EU (Deutscher Wetterdienst) and .It has to be highlighted that our operational implementation (see Sect. 4.3) is based on the forecasts given by the ECMWF model and by the Italian COSMO model only (see www. cosmo-model.orgfor a more comprehensive overview of the Consortium activities and developments).The model forecasts are assigned to the same warning areas by taking the average and maximum values of the gridpoints falling into the given area.(ECMWF model: ∼5 points/warning area; COSMO models: ∼56 points/warning area) In Fig. 3 we show the rank histograms of 8 runs of the 4 models in the period September 2006-August 2008 over all the Piemonte warning areas as described in Fig. 3.The ensemble is in general over-dispersed, that is the ensemble spread is too large, but it has to be pointed out that the inclusion of correct negatives (i.e.days of correct zero precipitation forecast) may artificially change the properties of the ensemble.
If we remove all the correct negatives, although keeping the false alarms and misses, we obtain a different rank histogram (Fig. 4), telling us more information: the ensemble usually overestimates the observation (the rank histogram peaks at low ranks), but there is a significant amount of underestimated forecasts (comprising misses), as evidenced by the higher value of the last rank column.These features are more accentuated for the maxima forecasts over the warning areas.We evaluated also the scatter plots of the ensemble mean Root Mean Square Error vs the ensemble spread (spread-skill diagram) in Fig. 5: the linear relationship between these two values is quite clear, and in particular the correlation coefficient for average values is quite high.
The rank histograms tell us that 88% of observations for average values and 80% for maxima already fall into the uncalibrated ensemble, while the scatter plot allows us to search for a relationship between the ensemble means to optimize the ensemble mean value.In both cases, the plots have been obtained adding up in forecast time (up to +72 h) and in space (over the 13 warning areas, see Fig. 5).The message that is evident from this preliminary analysis is that the ensemble has skill, but is not calibrated, therefore some statistical postprocessing is necessary.The chosen method, as explained  previously, is the Multimodel SuperEnsemble (Eq. 3) which produces a deterministic but calibrated value from a given ensemble.

Multimodel evaluation of precipitation
Precipitation fields are quite difficult to handle with postprocessing techniques, although they are of great interest for operational NWP: the use of Multimodel technique on precipitation requires great care and we developed some simple correction to the basic method to obtain a better precipitation forecast.
D. Cane and M. Milelli: Multimodel SuperEnsemble technique for quantitative precipitation forecasts  the forecast time.We are using instead a moving window for the calculation of the weights: for each day, point and forecast time we evaluate the availability of models and then we calculated weights in the past with the given model number and dataset.This technique is more expensive than a fixed weight calculation, but gives better results and is more flexible for an operational use (Cane and Milelli, 2006).
2. In case of very scarce forecasted precipitation, it is possible to have a spurious bias from the method.Let we consider for example 8 models, 2 of them forecasting 1-2 mm for a given forecast time and the others 0 mm: looking at (Eq. 3), the result is a non-zero precipitation even if probably a zero precipitation should be the best forecast.We then developed a bias correction methodology based on contingency table evaluation during the training period.For each model we calculate the False Alarm Rate (FAR) and Hit Rate (HRR) and, treating them as independent probabilities, we evaluate the Multimodel FAR and HRR by multiplying the Z models having precipitation indices with the reverse indices of the other N-Z models: The Multimodel SuperEnsemble and Ensemble are set to 0 if FAR MM >= HRR MM , that is to say, if the probability of an incorrect rain forecast is higher than the probability of a correct forecast.

Bias-corrected vs. uncorrected Multimodel
Firstly we compared the behaviour of the original Su-perEnsemble with respect to the one with the correction introduced in Sect.3. In Fig. 6 we report the verification indices (Wilks, 1995) for Bias-corrected and uncorrected Multimodel on average and maximum values, calculated in the period September 2006-August 2008 using 8 runs of the 4 models from +12 h to +36 h.The training period is obtained with a moving window of 365 days.The bias correction has a neutral to positive impact: in particular, there is a significant improvement for low thresholds, with a reduction of the bias and of the FAR, while for high thresholds the two methods basically do not differ.This analysis is more evident for the maximum values.The impact on the lower thresholds is expected because the rain/norain limit has probably the largest uncertainty in the models (and this permits to have some improvement), while in case of a structured rain event, all the models can describe it (even though there are differences in the values) because the synoptic forcing is more clear.

Multimodel SuperEnsemble, Ensemble, and Poor Man Ensemble
In the following step of the work we compared Multimodel SuperEnsemble (Eq.3), Ensemble (Eq.2), and Poor Man Ensemble (Eq. 1) in the period September 2006-August 2008 using 8 runs of the 4 models from +12 h to +36 h.The bias correction as described before (Sect.3) is applied for each of them.As expected (Krishnamurti et al., 1999), in this case there is a clear difference between the three techniques (Fig. 7  a very good reduction of the BIAS, very close to 1.There are few differences among SuperEnsemble and Ensemble for maximum values (we guess that the use of weights is not so helpful in this case because the bias reduction applied with the ensemble is the biggest contribution to the error reduction), while for average values the SuperEnsemble shows a good reduction of FAR, with a lower HRR with respect to Ensemble.

Operational forecasts: a comparison with the Direct Model Outputs (DMO)
Eventually, we analysed the results of our operational suite, obtained with the Multimodel SuperEnsemble applied on the ECMWF IFS global model and on the COSMO-I7 limited area model in the period September 2006-August 2008 using 4 runs of the 2 models from +12 h to +36 h.Again, the contingency table correction has been applied.The postprocessed data perform always better than the original models, with a sensitive bias reduction both on average and maximum values (Fig. 8).ECMWF shows a lower FAR for maxima, but this is clearly due to the strong under-biasing due to the different horizontal resolution of the model in comparison to the others.ECMWF gives a smoother precipitation field and it is less prone to double-penalty errors that penalize the LAMs.For this reason the use of ECMWF together with the LAMs introduces a small added value also in maximum forecasts, because it usually underestimates values, but it is more reliable in time and space.
We could say that Multimodel SuperEnsemble is able to catch the best behaviour from each model: the lowest FAR for average values, with a BIAS slightly lower than 1; for maximum values a BIAS comparable with COSMO-I7, but better HRR and FAR.
Figure 9 shows the same comparison, for fixed significant thresholds, as a function of the season.SuperEnsemble is able to provide a stable-in-time performance, almost always better than the original models: in fact the BIAS is stably close to 1, the HRR is slightly lower for the average values and higher for maximum values and the FAR is significantly lower than the models (for maximum values only COSMO-I7 has to be taken into account due to the strong under-biasing of ECMWF).
The strong positive bias peak shown by the DMOs in Fig. 9 for DJF 2008 is probably due to a significant decrease of precipitation observed during the period with respect to the www.nat-hazards-earth-syst-sci.net/10/265/2010/Nat.Hazards Earth Syst.Sci., 10, 265-273, 2010 We are now working on Multimodel SuperEnsemble error evaluation methods, including bootstrap, and on ensemble calibration methods, in order to keep a probabilistic point of view.We are trying to evaluate the correct PDF of the Multimodel forecasts starting from the observed PDF conditioned to the forecasts, combining Multimodel SuperEnsemble and a BMA-like ensemble dressing: the first results are encouraging and we are planning to apply the so-found probabilistic precipitation forecasts as the input of our hydrologic chain, in order to evaluate the uncertainties in the discharge calculations.The use of finer spatial resolution (for example: at station locations) will be necessary for this application.
Moreover we are working on the downscaling of Regional Climatic Models with a Multimodel SuperEnsemble technique: in this case we use as training period the control runs and we apply the weights calculated in this period to the model scenarios.

Figure 1 .
Figure 1.Example of calculation of the rank of observation with respect to an ensemble of forecasts: the observation can not be assigned univocally.

Figure 2 .
Figure 2. A drawing of the 13 warning areas in Piemonte region.Blue numbers represent the observed (or forecast) 6-hours precipitation (mm) averaged over the catchment, while red numbers are the maximum precipitation observed (or forecast) in the catchment.

Fig. 1 .Figure 1 .
Fig. 1.Example of calculation of the rank of observation with respect to an ensemble of forecasts: the observation can not be assigned univocally.

Figure 2 .
Figure 2. A drawing of the 13 warning areas in Piemonte region.Blue numbers represent the observed (or forecast) 6-hours precipitation (mm) averaged over the catchment, while red numbers are the maximum precipitation observed (or forecast) in the catchment.

Fig. 2 .
Fig. 2. A drawing of the 13 warning areas in Piemonte region.Blue numbers represent the observed (or forecast) 6-h precipitation (mm) averaged over the catchment, while red numbers are the maximum precipitation observed (or forecast) in the catchment.

Figure 3 :Figure 3 :
Figure 3: Rank histogram of the observation with respect to the ensemble average precipitation (a) and maximum precipitation (b) over the warning areas.

Fig. 3 .
Fig. 3. Rank histogram of the observation with respect to the ensemble forecasts for average precipitation (a) and maximum precipitation (b) over the warning areas.

Figure 4 :Figure 4 :
Figure 4: Corrected rank histogram of the observation with respect to the ensemble forecasts for average precipitation (a) and maximum precipitation (b) over the warning areas.

Fig. 4 .
Fig. 4. Corrected rank histogram of the observation with respect to the ensemble forecasts for average precipitation (a) and maximum precipitation (b) over the warning areas.

Figure 5 :Figure 5 :
Figure 5: Ensemble mean Root Mean Square Error vs ensemble spread for precipitation (a) and maximum precipitation (b) over the warning areas.

Fig. 5 .
Fig. 5. Ensemble mean Root Mean Square Error vs ensemble spread for average precipitation (a) and maximum precipitation (b) over the warning areas.

Fig. 6 .
Fig. 6.Verification indices for Bias-corrected and uncorrected Multimodel as a function of the threshold (mm) for +12 h/+36 h forecasts.Average values (left) and maximum values (right) BIAS, HRR and FAR.

Fig. 7 .
Fig. 7. Verification indices for Multimodel Ensemble, SuperEnsemble and Poor Man Ensemble as a function of the threshold (mm) for +12 h/+36 h forecasts.Average values (left) and maximum values (right) BIAS, HRR and FAR.
): Poor Man Ensemble shows a quite high BIAS, resulting in a strong FAR.SuperEnsemble and Ensemble show Verification indices for Multimodel SuperEnsemble ECMWF IFS 00:00 UTC and 12:00 UTC, and COSMO-I7 00:00 UTC and 12:00 UTC runs as a function of the threshold (mm) for +12 h/+36 h forecasts.Average values (left) and maximum values (right) BIAS, HRR and FAR.