The validation service of the hydrological SAF geostationary and polar satellite precipitation products

The development phase (DP) of the EUMETSAT Satellite Application Facility for Support to Operational Hydrology and Water Management (H-SAF) led to the design and implementation of several precipitation products, after 5 yr (2005–2010) of activity. Presently, five precipitation estimation algorithms based on data from passive microwave and infrared sensors, on board geostationary and sun-synchronous platforms, function in operational mode at the H-SAF hosting institute to provide near real-time precipitation products at different spatial and temporal resolutions. In order to evaluate the precipitation product accuracy, a validation activity has been in place since the beginning of the project. A Precipitation Product Validation Group (PPVG) works in parallel with the development of the estimation algorithms with two aims: to provide the algorithm developers with indications to refine algorithms and products, and to evaluate the error structure to be associated with the operational products. In this paper, the framework of the PPVG is presented: (a) the characteristics of the ground reference data available to H-SAF (i.e. radar and rain gauge networks), (b) the validation strategy agreed upon among the eight European countries participating in the PPVG, and (c) the steps of the validation procedures. The quality of the reference data is discussed, and the efforts for its improvement are outlined, with special emphasis on the definition of a ground radar quality map and on the implementation of a suitable rain gauge interpolation algorithm. The work done during the H-SAF development phase has led the PPVG to converge on a common validation procedure among the members, taking advantage of the experience acquired by each of them in the validation of H-SAF products.
The methodology is presented here, indicating the main steps of the validation procedure (ground data quality control, spatial interpolation, upscaling of radar data vs. satellite grid, statistical score evaluation, case study analysis). Finally, an overview of the results is presented, focusing on the monthly statistical indicators, which describe the satellite product performance over different seasons and areas.
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
The European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) Satellite Application Facility on Support to Operational Hydrology and Water Management (H-SAF, http://hsaf.meteoam.it; Mugnai et al., 2013b) was initiated in 2005, and aims to provide remote-sensing estimates of relevant hydrological parameters: rain rate and cumulated rainfall, soil moisture at the surface and in the root zone, snow cover and snow water equivalent. The H-SAF project involves experts from 11 EUMETSAT members or cooperating states (Austria, Belgium, Bulgaria, Finland, France, Germany, Hungary, Italy, Poland, Slovakia and Turkey), and from the European Centre for Medium-Range Weather Forecasts. H-SAF is hosted by the Italian Air Force National Meteorological and Climatological Service (CN-MCA).
As stated in the H-SAF DP project plan, H-SAF has two main objectives: (1) to provide new satellite-derived products (precipitation, snow parameters and soil moisture) from existing and future satellites with sufficient time and space resolution to satisfy the needs of operational hydrology, and (2) to perform independent validation in order to assess the usefulness of the new products for fighting against floods, landslides, and avalanches, and evaluating water resources.
The H-SAF operational goal highlights the need to provide products with a reliable measure of their accuracy, so that potential users are aware of the advantages and drawbacks of using the H-SAF products in their operational activities. To this aim, a large effort is devoted within H-SAF to the estimation of the error structure for the different satellite products. This type of activity, normally related to the development of any remote-sensing retrieval technique, is often called "validation". The satellite product is compared with a reference field from sensors other than those involved in the product build-up, and a measure of the discrepancy is assumed as an error of the product. Three validation groups have been established within H-SAF according to the product typology: precipitation, soil moisture and snow.
The present work focuses on the activities of the Precipitation Products Validation Group (PPVG), established at the very beginning of H-SAF. It presents a summary of the first validation results and outlines the development of a common validation algorithm.
The PPVG gathers experts from the national meteorological and hydrological institutes and/or research institutions from the contributing countries of Belgium, Bulgaria, Germany, Hungary, Italy, Poland, Slovakia and Turkey. The PPVG is in charge of collecting ground data from the national institutions of the participating countries, performing ground data pre-processing (quality control, up-scaling/down-scaling), comparing ground fields and satellite products at the proper scales, and computing statistical quality indicators (Nurmi, 2003; Ebert, 2007). Finally, validation reports are delivered regularly to EUMETSAT and published on the H-SAF webpage.
The validation of precipitation products is particularly challenging, given the highly variable nature of the precipitation fields over a wide range of spatial and temporal scales (Zawadzki, 1975; Kursinski and Mullen, 2008). This makes it difficult to set up a reliable, spatially and temporally continuous reference field, suitable to be matched to the satellite estimates: ground weather radar (Chandrasekar et al., 2008; Capacci and Porcù, 2009; Lábó, 2012; Rinollo et al., 2013) and rain gauge networks (Dinku et al., 2007; Sohn et al., 2010) are mainly used to provide rainfall reference fields for validation studies. A number of studies, however, point out that care should be taken in comparing satellite and ground-based precipitation estimates for validation purposes. A representativeness error is introduced when comparing areal instantaneous data (from satellites) with point cumulated values (from rain gauges) (Zawadzki, 1975; Kitchen and Blackall, 1992; Habib et al., 2009), and this error is not negligible (Porcù et al., 2014). Intrinsic discrepancies between satellite and ground radar estimates are also to be expected due to the different points of view of the two sensors (Habib and Krajewski, 2002; Chandrasekar et al., 2008; Rinollo et al., 2013). To cope with these difficulties, satellite missions devoted to precipitation studies have developed their own validation structures, such as the Tropical Rainfall Measuring Mission (TRMM) (Wolff et al., 2005) and the Global Precipitation Measurement (GPM) Mission (Schwaller et al., 2011).
The paper is structured as follows. In Sect. 2 an overview of the PPVG and its components is presented, while Sect. 3 summarizes the satellite products validated by the PPVG. Section 4 introduces the ground data characteristics and the pre-processing tools developed by the PPVG. Section 5 presents the up-scaling strategies for matching products and ground reference, and Sect. 6 introduces the statistical scores and summarizes some preliminary results. In Sect. 7 a subset of the validation results is presented, while conclusions are drawn and the perspectives of future work are outlined in Sect. 8.

Overview of the PPVG activity
The PPVG is a multidisciplinary group composed of hydrologists, meteorologists and precipitation ground data experts, under the coordination of the Italian Civil Protection (DPC). Each PPVG member is directly involved in the product validation activities, relying on about 4100 rain gauges and 59 meteorological radars (see Fig. 1). Since the beginning of the project, a twofold validation strategy has been defined: a systematic (monthly-scale) evaluation of statistical indicators (multi-categorical and continuous) and case study analysis. These two components are considered complementary in assessing the accuracy of the instantaneous and cumulated satellite products. Monthly analysis of statistical skill indicators helps in identifying the existence of discrepancies, while selected case studies are useful in identifying the roots of such discrepancies.
The heterogeneity of the H-SAF region, due to climatology, land cover, orography, and types of ground observations available for each country, represents an important resource for the PPVG, as it allows the group to investigate different aspects of the satellite product accuracy, but it has also required the definition of and agreement on a common validation methodology among different countries. This common validation methodology has been defined and applied by all the PPVG members, in order to make the statistical results obtained by the different institutes comparable and to provide an overall picture of the satellite products' performance.
Each institute participating in the PPVG selects the ground data considered most reliable and representative of the precipitation field in its own country. This implies that the ground precipitation reference is not just the composite of the national operational ground networks, but is derived from ground data selected purposely for satellite precipitation product validation.
The main steps of the common validation methodology are:
- ground data selection, error analysis and quality control for radar and rain gauges,
- spatial interpolation,
- upscaling of radar data vs. satellite grid,
- statistical score evaluation,
- case study analysis.
During the DP each PPVG member developed its own locally implemented validation software, following the common validation methodology. As the project progressed, during the First Continuous Development Phase (CDOP-1) (2010-2012), the need for an improvement in the validation quality and consistency led to the definition of a unified validation software called the "common validation code", to be used by all the member institutions. It is currently in use for validation with radar data, and under testing for validation with rain gauge data.

Satellite precipitation products
Five H-SAF satellite-based precipitation algorithms and associated products have been validated by the PPVG, and the results of the validation activity are presented and discussed in Sect. 7. These five precipitation products/algorithms are listed in Table 1, which provides for each product the base name acronym, a brief algorithm product description, the list of satellite data used, and the space and time resolutions of the products (not necessarily matching those of the sensors used). Hereafter, a short description of these algorithms and products is provided. For a detailed description, the reader is referred to the companion paper by Mugnai et al. (2013b), which presents the complete set of precipitation algorithms/products developed and used, or under development, within the H-SAF CDOP-2 phase, and describes in some detail the six products developed during the DP and the CDOP-1 (of which five have been validated).
Two of the validated products are based on passive microwave (MW) measurements taken from radiometers onboard different sun-synchronous near-polar-orbiting low-Earth-orbit (LEO) satellites: PR-OBS-1 (developed by CNR-ISAC) utilizes the Special Sensor Microwave Imager/Sounder (SSMIS) conically scanning radiometers flown onboard satellites of the US Defense Meteorological Satellite Program (DMSP), while PR-OBS-2 (also developed by CNR-ISAC) utilizes the coupled Advanced Microwave Sounding Unit A (AMSU-A) and Microwave Humidity Sounder (MHS) cross-track scanning radiometers that are flown onboard the US National Oceanic and Atmospheric Administration (NOAA) Polar-orbiting Operational Environmental Satellites (POES), referred to as NOAA-18 and NOAA-19, as well as on EUMETSAT's two Meteorological Operational satellites MetOp-A/B. Furthermore, there are two combined IR-MW precipitation products, PR-OBS-3 and PR-OBS-4 (both developed by CNR-ISAC), which utilize infrared (IR) measurements taken by the Spinning Enhanced Visible and Infrared Imager (SEVIRI) instrument onboard the geostationary (GEO) Meteosat Second Generation (MSG) satellites in combination with the MW-only precipitation estimates PR-OBS-1 and PR-OBS-2. Finally, there is an accumulation-based product, PR-OBS-5 (developed by CN-MCA), which cumulates precipitation on the SEVIRI grid presently obtained from PR-OBS-3.
PR-OBS-1 relies on a physically based Bayesian MW precipitation retrieval algorithm that was developed according to a new methodology called the Cloud Dynamics and Radiation Database (CDRD) (Sanò et al., 2013; Smith et al., 2013; Casella et al., 2013; see also Mugnai et al., 2013a). Note that the version of PR-OBS-1 that has undergone validation is a preliminary version which uses a subset of dynamical-thermodynamic-hydrological parameter constraints in addition to the multispectral MW brightness temperatures (TBs) measured by available satellite-borne radiometers to retrieve instantaneous precipitation at 30 km ground resolution, which is about four times coarser than the 13.2 × 15.5 km² resolution (consistent with the SSMIS high-frequency window channel resolution) of the present version of the algorithm. PR-OBS-2 is based on an artificial neural network (ANN) algorithm. The version which has undergone validation was originally inspired by the ANN-based precipitation retrieval algorithm developed by Surussavadee and Staelin (2008a, b), which was trained through a database generated from cloud-resolving model (CRM) simulations of several precipitation events around the globe. Within H-SAF, a new version of the algorithm has been recently developed, optimized for the European/Mediterranean Basin area by means of a newly developed optimal three-layer ANN trained using the same 60 CRM simulations and the same radiative transfer code used for the CDRD algorithm of PR-OBS-1. This new version of the algorithm is called the Passive microwave Neural-network Precipitation Retrieval (PNPR) algorithm and is described by Mugnai et al. (2013b). The PR-OBS-2 product spatial resolution is defined according to the variable MHS sensor resolution, which varies from 16 × 16 km² (circular) at nadir to approximately 27 × 53 km² (elliptic) at scan edge.
PR-OBS-3 provides an instantaneous rain intensity product at the temporal (15 min) and spatial (∼ 8 km 2 over the H-SAF area) resolution of the MSG SEVIRI, using the blended-satellite rapid-update technique originally developed at the US Naval Research Laboratory (NRL), and therefore referred to as the NRL Technique (NRLT). The NRLT is based on a real-time, underlying collection of time- and space-matched IR TBs at 10.8 µm from GEO satellites and rain intensity estimates from MW satellite sensors (Turk et al., 2000; Turk and Miller, 2005; Torricella et al., 2007). The NRLT for PR-OBS-3 is fed by PR-OBS-1 and PR-OBS-2 MW estimates.
PR-OBS-4 is based on the precipitation rate merging technique called the CPC MORPHing technique (CMORPH), which was developed at NOAA's Climate Prediction Center (CPC) (Joyce et al., 2004). CMORPH generates synthetic MW rain fields at any time between two successive MW observations using the rain estimates for these MW observations and the advection vectors, calculated with GEO IR data, to connect these estimates in space and time. Within H-SAF the CMORPH method uses the rain rate fields from PR-OBS-1 and PR-OBS-2, while the morphed rain fields are produced on a pre-assigned grid having 8 km spatial resolution and a 30 min sampling time.
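The morphing idea can be illustrated with a minimal sketch. It assumes a single, uniform advection vector expressed as an integer pixel shift per time step, periodic boundaries (a side effect of `np.roll`), and a linear time weighting between the forward- and backward-propagated fields. The function name `morph` and all parameters are illustrative; this is not the operational PR-OBS-4 code.

```python
import numpy as np

def morph(rain_t0, rain_t1, shift_per_step, n_steps, k):
    """Estimate the rain field at step k (0..n_steps) between two MW overpasses.

    rain_t0, rain_t1 : 2-D rain-rate fields at the two MW observation times
    shift_per_step   : (dy, dx) integer pixel displacement per time step
                       (assumed uniform over the domain; in practice the
                       advection vectors are derived from GEO IR data)
    """
    dy, dx = shift_per_step
    # Propagate the earlier field forward in time by k steps.
    forward = np.roll(rain_t0, (k * dy, k * dx), axis=(0, 1))
    # Propagate the later field backward in time by the remaining steps.
    back = n_steps - k
    backward = np.roll(rain_t1, (-back * dy, -back * dx), axis=(0, 1))
    # Weight each field by its temporal proximity to the requested step.
    w = k / n_steps
    return (1.0 - w) * forward + w * backward

# Usage: a single rain cell moving one pixel per step along x.
t0 = np.zeros((3, 8)); t0[1, 1] = 4.0
t1 = np.zeros((3, 8)); t1[1, 5] = 4.0
halfway = morph(t0, t1, (0, 1), n_steps=4, k=2)
```

With consistent advection, the forward and backward fields coincide and the cell is simply displaced; when they disagree, the time weighting blends the two estimates.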
PR-OBS-5 provides a cumulated precipitation product on the ground, which is based on a procedure that uses as input the precipitation intensities generated by PR-OBS-3 (and soon by PR-OBS-4). The product is generated for each SEVIRI pixel, but 3-4 SEVIRI neighbouring pixels are convoluted in such a way that the actual PR-OBS-5 spatial resolution is 30 km. Nevertheless, the sampling is still made at ∼ 5 km intervals, roughly consistent with the SEVIRI pixel grid over Europe. The product is generated every 3 h, and provides the cumulated precipitation over 3, 6, 12, and 24 h prior to the reference time (i.e. nominal time).
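The accumulation over the four windows can be sketched as follows. The sketch assumes rain rates in mm h⁻¹ sampled every 15 min (the PR-OBS-3 sampling) and a simple rectangular time integration; the function name `accumulate` and the integration rule are assumptions made for illustration, and the spatial smoothing over neighbouring pixels is not included.

```python
import numpy as np

def accumulate(rates_mm_h, step_min=15.0, windows_h=(3, 6, 12, 24)):
    """Cumulate rain rates over windows ending at the last sample.

    rates_mm_h : array with time along axis 0 (oldest first), in mm/h;
                 may be a 1-D series or a stack of 2-D fields
    Returns a dict {window length in hours: cumulated precipitation in mm}.
    """
    rates = np.asarray(rates_mm_h, dtype=float)
    totals = {}
    for hours in windows_h:
        n = int(round(hours * 60.0 / step_min))  # samples in the window
        if n > rates.shape[0]:
            raise ValueError(f"not enough samples for a {hours} h window")
        # Rectangular integration: each sample represents step_min minutes.
        totals[hours] = rates[-n:].sum(axis=0) * step_min / 60.0
    return totals

# Usage: 24 h of constant 2 mm/h rain sampled every 15 min.
totals = accumulate(np.full(96, 2.0))
```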

Ground data description and pre-processing
Ground data used for validation by the PPVG are derived from about 4100 rain gauges and 59 meteorological radars, belonging to the eight involved countries. National rain gauge networks differ in instrument density, temporal sampling and instrument type. Radar data, on the other hand, differ in space and time resolution, antenna scan mode, pre-processing algorithms and rainfall retrieval technique. Moreover, the countries also differ in orography (which has strong effects on radar visibility and clutter, on the reliability of rain gauge interpolated measurement in the case of low spatial density, and on the precipitation structure itself), coastal and sea areas, and precipitation climatology. All such factors cause the reliability of ground data to vary from area to area, also affecting the validation results.
For all these reasons, it is considered important to evaluate the quality index maps to be associated with the reference ground data.The quality index, which is a function of position and time, summarizes into a number between 0 and 1 all the information useful for defining the reliability of the ground data with which it is associated.
Theoretical and empirical approaches are under development by the PPVG for quality evaluation of radar and rain gauge data: for radar data the index is already defined, and it is presented in Sect. 4.4, while the index for rain gauge data is still under study. Even for the radar data, however, the quality index is not yet used operationally, and will be ingested in the common validation code. Currently each country applies its own quality filter to select reliable data.

Characteristics of the rain gauge national networks
Most of the gauges used in the national networks by the PPVG partners are of the tipping bucket type, which is the most common device used worldwide to gather long-term rain rate ground measurements. Several sources of uncertainty in the measurements are well known, but difficult to mitigate. First, very light rain rates (1 mm h⁻¹ and less) can be estimated incorrectly due to the long time it takes for the rain to fill the bucket (Tokay et al., 2003). On the other hand, high rain rates (above 50 mm h⁻¹) are usually underestimated, due to the loss of water during the tipping of the buckets (Duchon and Biddle, 2010). Wind can also greatly reduce the size of the effective catching area, as rain does not fall vertically, resulting in a rain rate underestimation assessed quantitatively at about 15 % for an average event (Duchon and Essenberg, 2001).
Further errors occur in the case of solid precipitation (snow or hail), when frozen particles are collected by the funnel but are not measured by the buckets, resulting in a temporal shift of the measurements, since the melting (and thus the measurement) can take place several hours (or days, depending on the environmental conditions) after the precipitation event (Leitinger et al., 2010; Sugiura et al., 2003). All these errors can be mitigated and reduced, but in general not eliminated, by careful maintenance of the instrument and/or the use of longer cumulation intervals.
In Table 2, the main characteristics of the PPVG rain gauges are reported. A key feature of a rain gauge network is the instrument density: it expresses the capability of a network to detect small-scale precipitation patterns, especially in the case of convective rain, dominant during warm months at mid-latitudes. The distance between each rain gauge and the nearest neighbour, averaged over all the instruments considered in the network, is assumed as a measure of the rain gauge density, hereafter referred to as the average minimum distance (AMD). The AMD for the kth rain gauge is defined as $\mathrm{AMD}_k = \min_{j \neq k} |x_k - x_j|$, where $x_j$ is the position vector of the jth rain gauge. Instrument number and network AMD are reported in Table 3 for all the national networks. The AMD ranges between 7 km (for Bulgaria, where only three river basins are considered) and 27 km (for Turkey). These numbers should be compared with the decorrelation distance for precipitation patterns at mid-latitudes. Usually the decorrelation distance is defined as the minimum distance between two measurements at which the Pearson correlation coefficient is reduced to $e^{-1}$. In Fig. 2, the correlation coefficient between two hourly measurements as a function of the mutual distance is shown for 2009. These plots, obtained for the Italian rain gauge network, but representative of mid-latitude precipitation, show that the decorrelation distance varies from about 10 km in warm months (when small-scale convection dominates) to 50 km in cold months, when stratiform and long-lasting precipitation mostly occurs.
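As an illustration of the AMD definition, the following sketch computes the nearest-neighbour distance of each gauge and its network average. It assumes gauge positions already projected onto a plane and expressed in km; the function names are illustrative and not part of the common validation code.

```python
import numpy as np

def nearest_neighbour_distances(xy_km):
    """Distance from each gauge to its nearest neighbour (AMD_k)."""
    xy = np.asarray(xy_km, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]      # pairwise position differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise distance matrix
    np.fill_diagonal(dist, np.inf)              # exclude the case j == k
    return dist.min(axis=1)

def network_amd(xy_km):
    """Average minimum distance of the whole network."""
    return float(nearest_neighbour_distances(xy_km).mean())

# Usage: three gauges at the corners of a 3-4-5 right triangle (km).
gauges = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
amd_k = nearest_neighbour_distances(gauges)
amd = network_amd(gauges)
```

The pairwise distance matrix grows with the square of the number of gauges, which is acceptable for networks of a few thousand instruments; a spatial index would be preferable for much larger networks.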
Table 3 also reports the type of data pre-processing carried out by each institute during the first H-SAF phase, before the matching with satellite products. As mentioned, the PPVG decided to homogenize the data pre-processing within a common validation code, which will be used in the H-SAF second operational phase.
The wide range of AMD values reported in [...] algorithms generally depends on AMD, terrain physiography and precipitation climatology.

Rain gauge spatial interpolation
Rain gauge measurements used for validation derive from networks having different geographical distributions, densities and quality. After a first phase of the project, when each partner used its own approach (see Table 3), the PPVG decided to apply a common interpolation strategy to all the rain gauge national networks. In order to obtain a regular field for comparison with satellite products, the rain gauge measurements are interpolated onto a unique European grid, with grid cell size of 5 km (similar to the SEVIRI resolution). The spatial interpolation of the measurements of a rapidly variable quantity (such as the rainfall rate) is problematic because it is difficult to model the relationship between the rain rate at a given grid point and those measured by the nearby gauges. Other approaches, such as single gauge nearest neighbour or weighted average of gauges within a satellite instantaneous field of view (IFOV), attempted by the PPVG over the years, present similar shortcomings, since they assume that a single gauge correctly represents the rain field within a satellite IFOV, which is several tens of km² wide. The PPVG finally concluded that the advantages of using interpolation outweigh the drawbacks. Three different interpolation techniques have been proposed and tested here: the Barnes method (Barnes, 1964), Ordinary Kriging (based on the works following Krige, 1951) and the Random Generator of Spatial Interpolation from uncertain Observations (GRISO). The GRISO (Pignone et al., 2010) is an improved Kriging-based technique implemented by the International Centre on Environmental Monitoring (CIMA Research Foundation). The GRISO technique preserves the values observed at the rain gauge location, allowing for a dynamical definition of the covariance structure associated with each rain gauge by the interpolation procedure. Each correlation structure may depend both on the rain gauge location and on the accumulation time considered. GRISO may also provide probabilistic maps of the variance of the interpolation and the probability of rain/no-rain areas. The comparison of these three different spatial interpolation techniques was performed on a data set of 50 hourly measurements (referring to six meteorological events in different seasons throughout the year 2009) from 340 rain gauges located in Tuscany, central Italy (Porcù et al., 2014). The original network density was gradually reduced to sub-networks with AMD ranging between 7.5 and 27.5 km (spanning the densities of the densest and coarsest H-SAF networks). The ability of each technique to reconstruct the original rain gauge measurement field is evaluated by comparing the interpolated field obtained from each reduced sub-network to the one obtained with the same interpolation technique, but considering the whole rain gauge network. A subset of statistical scores (POD, FAR, RMSE) was calculated as a function of the network density, and compared among the three different methods. The results show a better performance of the GRISO method, which was then adopted (Fig. 3). Note that this analysis aims at selecting the interpolator that is most stable with respect to AMD variations, and not at evaluating the performance of the different techniques.
In particular, the POD comparison shows the ability of the GRISO technique to better reconstruct the original rain gauge information even for the coarsest grids. The reconstruction of the original rain field is also better using GRISO, as the RMSE plot shows. These results led the PPVG to adopt the GRISO technique as a common spatial interpolation of rain gauge data in the H-SAF area.
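The three scores used in this comparison can be sketched as follows, using their standard contingency-table definitions: POD is the fraction of reference rain cells that are also detected in the tested field, and FAR is the fraction of detected rain cells that are not confirmed by the reference. The rain/no-rain threshold of 0.25 mm h⁻¹ and the function name `scores` are assumptions made for illustration; the text does not state the threshold actually used.

```python
import numpy as np

def scores(estimate, reference, threshold=0.25):
    """POD, FAR and RMSE of an estimated field against a reference field.

    threshold : rain/no-rain threshold in the field units (assumed value)
    """
    est = np.asarray(estimate, dtype=float).ravel()
    ref = np.asarray(reference, dtype=float).ravel()
    est_rain = est >= threshold
    ref_rain = ref >= threshold
    hits = int(np.sum(est_rain & ref_rain))
    misses = int(np.sum(~est_rain & ref_rain))
    false_alarms = int(np.sum(est_rain & ~ref_rain))
    # Scores are undefined (NaN) when their denominator is zero.
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = (false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    rmse = float(np.sqrt(np.mean((est - ref) ** 2)))
    return {"POD": pod, "FAR": far, "RMSE": rmse}

# Usage: field from a reduced sub-network vs. field from the full network.
result = scores([1.0, 0.0, 2.0, 0.0], [1.0, 1.0, 0.0, 0.0])
```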

Characteristics of the national radar networks
The inventory of radar data, networks and products used by the PPVG has shown that all the institutes declare their radar systems to be well maintained and periodically checked (see Fig. 1). All radars have Doppler capability, which means that ground clutter can be effectively removed from the radar data measurements. However, not all of them have dual polarization, which would be important to correct rain path attenuation.
The characteristics of the national radar networks are summarized in Table 4. The number of scanned plan position indicator (PPI) maps ranges between 4 and 15, with an average of around 10 for all countries.
Radar-based rainfall products are obtained after processing the measured radar reflectivity at different elevations (Rinollo et al., 2013). After each elevation, the PPI products and the constant altitude PPI (CAPPI) products are calculated. The institutes involved in the PPVG use mostly CAPPI products for calculation of rainfall intensities, except for Hungary, which uses the CMAX data (maximum radar reflectivity in each pixel column among all of the radar elevations). However, the other countries chose different elevations for the CAPPI product, which provides the basis for rain rate estimations. Moreover, the countries apply different techniques for radar data composition. The composition technique is important in areas covered by more than one radar. Also, the geographical projection varies from one country to the other.

Quality evaluation of radar data
All radars available to the PPVG are regularly maintained and calibrated, which is a good indicator of the continuous supervision of radar data quality: only the radar data passing the quality control of the owner institute are used by the PPVG for validation activities. However, each country has its own criteria to evaluate the data quality, depending on the radar characteristics and main sources of error in the radar measurements. Moreover, the rainfall rates are computed with different algorithms, so that the estimation of radar data quality provided by the different countries is not homogeneous. To mitigate this problem, the PPVG has defined a surface rain intensity (SRI) product and quality index directly from the available radar raw data, in order to unify the precipitation field and quality index generation.
It is well known that there is no unique way to evaluate radar quality or to deal with radar error sources. However, it is possible to provide a theoretical definition of data quality that might require a specific set-up for every radar system (Vulpiani et al., 2012).
Quantitative precipitation estimation from ground-based weather radars is a cumbersome task, as it is affected by several error sources. They might be classified into five main classes (Wilson and Brandes, 1979), among them:
- distance-related effects, i.e. distance from the radar, height of measurement, beam broadening, nonuniform beam filling,
- propagation-related effects, i.e. attenuation, anomalous propagation,
- inversion effects, i.e. microphysical variability resulting in inappropriate Z-R relations.
The effects taken into account in the elaboration of the radar quality information inside the PPVG are clutter, beam blocking, distance from the radar, and attenuation.
The height of measurement is also taken into account by correcting the estimations of the mean vertical profile of reflectivity.After such correction, the quality index associated with the height of measurement is considered equal to 1.
For every value of the measured reflectivity Z(r, azimuth, elevation), the associated quality index is expressed as the product of the partial quality indices,
$$Q = Q_{\mathrm{clutter}} \cdot Q_{\mathrm{vis}} \cdot Q_{\mathrm{range}} \cdot Q_{\mathrm{atten}},$$
where $Q_{\mathrm{clutter}}$(r, azimuth, elevation) is the partial quality index associated with ground clutter, calculated as the convolution of different parameters (static clutter map, radial velocity, texture of differential reflectivity, texture of co-polar correlation coefficient and texture of differential phase shift).
The data with $Q_{\mathrm{clutter}} < 0.60$ are rejected. Figure 4 shows an example of a radar image, the corresponding $Q_{\mathrm{clutter}}$ map and the clutter-filtered image. $Q_{\mathrm{vis}}$(r, azimuth, elevation) is the partial quality index associated with partial beam blocking, calculated as 1 − PBB, where PBB is the partial beam blocking proposed by Bech et al. (2003):
$$\mathrm{PBB} = \frac{y\sqrt{a^2 - y^2} + a^2 \arcsin(y/a) + \pi a^2/2}{\pi a^2},$$
where y is the difference between the height of the terrain and the height of the centre of the radar beam (h), and a is the radius of the beam cross section. The height of the centre of the radar beam h at a distance r can be written as (Doviak and Zrnić, 1993)
$$h = \sqrt{r^2 + (k_e R)^2 + 2 r k_e R \sin\theta} - k_e R + H_0,$$
where R is the Earth's radius, θ the antenna elevation, $H_0$ the radar antenna height and $k_e = 4/3$ (assuming the wave propagation of the standard atmosphere). If $Q_{\mathrm{vis}}$ is above 0.3 (PBB below 0.7), the partial beam blocking effect is corrected as in Tabary (2007) and $Q_{\mathrm{vis}}$ for the corrected data is reset to 1 (Fig. 5).
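The beam geometry behind the visibility index can be sketched numerically. The sketch assumes the commonly quoted forms of the 4/3 effective Earth radius beam height (Doviak and Zrnić, 1993) and of the partial beam blocking fraction (Bech et al., 2003), a mean Earth radius of 6371 km, and a beam cross section radius `a` supplied by the caller. Function names are illustrative, and the correction of Tabary (2007) is not implemented.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def beam_height_km(r_km, elev_deg, antenna_height_km=0.0, ke=4.0 / 3.0):
    """Height of the beam centre at range r for a standard atmosphere."""
    re = ke * EARTH_RADIUS_KM                   # effective Earth radius
    theta = np.radians(elev_deg)
    return (np.sqrt(r_km ** 2 + re ** 2 + 2.0 * r_km * re * np.sin(theta))
            - re + antenna_height_km)

def partial_beam_blocking(y, a):
    """Blocked fraction of a circular beam cross section of radius a.

    y : terrain height minus beam-centre height (same units as a);
        values outside [-a, a] are clipped (beam fully clear or blocked)
    """
    y = np.clip(np.asarray(y, dtype=float), -a, a)
    return ((y * np.sqrt(a ** 2 - y ** 2) + a ** 2 * np.arcsin(y / a)
             + np.pi * a ** 2 / 2.0) / (np.pi * a ** 2))

def q_vis(y, a):
    """Partial quality index for visibility, 1 - PBB (before correction)."""
    return 1.0 - partial_beam_blocking(y, a)

# Usage: terrain reaching exactly the beam centre blocks half of the beam.
half_blocked = partial_beam_blocking(0.0, 1.0)
h_100km = beam_height_km(100.0, 0.0)   # about 0.59 km at zero elevation
```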
$Q_{\mathrm{range}}(r)$ is the partial quality index associated with the beam broadening with the distance r from the radar. It is expressed following Friedrich et al. (2006) as a function of r between two limits, where $r_{\max}$ can be set to 150 km and $r_{\min} = \Delta r/2$ ($\Delta r$ is the radar range resolution). Figure 6 shows the quality associated with the range distance for the sample radar image.
$Q_{\mathrm{atten}}$(r, azimuth, elevation) is the partial quality index associated with the path-integrated attenuation when the beam passes through rain (see Vulpiani et al., 2008). It is evaluated as a function of the path-integrated attenuation PIA between two limits, $\mathrm{PIA}_{\min} = 1$ dB and $\mathrm{PIA}_{\max} = 5$ dB, where PIA is computed from the radar reflectivity Z (expressed in mm⁶ m⁻³). The specific attenuation is considered equal to zero above the freezing level height, and also in the 500 m immediately below it.
A median filter is applied to Z before evaluating the attenuation, in order to filter out unrealistic values. Figure 6 shows the path-integrated attenuation for the sample radar image.
As already stated above, the overall quality is the product of the partial qualities. Figure 7 shows the overall quality index for the sample radar image.
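The combination of the partial indices can be sketched as follows. The product form, the 0.60 clutter rejection threshold, the reset of the visibility index to 1 above 0.3, and the limits (150 km, 1 dB, 5 dB) come from the text. The shape of the range and attenuation indices is an assumption: a linear decrease from 1 at the lower limit to 0 at the upper limit is used here as a placeholder. All names are illustrative.

```python
import numpy as np

def linear_ramp_down(x, x_min, x_max):
    """1 below x_min, 0 above x_max, linear in between (assumed shape)."""
    x = np.asarray(x, dtype=float)
    return np.clip((x_max - x) / (x_max - x_min), 0.0, 1.0)

def overall_quality(q_clutter, q_vis, r_km, pia_db, range_res_km=1.0,
                    r_max_km=150.0, pia_min_db=1.0, pia_max_db=5.0):
    """Overall quality index as the product of the partial indices.

    Rejected bins (q_clutter < 0.60) are returned as NaN.
    """
    q_clutter = np.asarray(q_clutter, dtype=float)
    q_vis = np.asarray(q_vis, dtype=float)
    # Beam blocking is corrected when q_vis > 0.3, and the index reset to 1.
    q_vis = np.where(q_vis > 0.3, 1.0, q_vis)
    # Range index between r_min = range resolution / 2 and r_max.
    q_range = linear_ramp_down(r_km, range_res_km / 2.0, r_max_km)
    # Attenuation index between PIA_min and PIA_max.
    q_atten = linear_ramp_down(pia_db, pia_min_db, pia_max_db)
    q = q_clutter * q_vis * q_range * q_atten
    return np.where(q_clutter < 0.60, np.nan, q)

# Usage: a clean bin, a cluttered bin and a heavily blocked, distant bin.
q = overall_quality(q_clutter=[1.0, 0.5, 1.0], q_vis=[0.9, 0.9, 0.2],
                    r_km=[0.5, 10.0, 75.25], pia_db=[0.0, 0.0, 3.0])
```

Returning NaN for rejected bins keeps the array shape unchanged, so that a quality threshold can later be applied as a simple mask on the radar field.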
The quality index described above was agreed upon among all the PPVG members and the implementation of its ingestion in the common validation procedure is in progress. A preliminary impact study of the introduction of the quality index in the validation of satellite products using radar data has pointed out that introducing this quality information as a filter has a substantial impact on the statistical score evaluation, and even influences whether the precipitation products meet the user requirements (Rinollo et al., 2013). In fact, the test cases considered for this impact study showed an improvement in statistical indicators such as the fractional standard error and the relative RMSE by a factor even greater than 2 when the threshold on the quality index is increased from 0.0 to 0.8. Thus, the introduction of a filter based on the quality index can help to avoid a marked overestimation of the product error.

Up-scaling of ground data vs. satellite native grid and time matching
From the beginning of the project, the PPVG decided to validate each satellite product on its native grid in order to evaluate the accuracy of the product as it is available to the users, and to avoid remapping and local smoothing. Thus, the radar data, which have resolutions higher than all the H-SAF satellite products, are always up-scaled to a product's native grid. For the interpolated rain gauge data, instead, when the resolution of the satellite product is comparable to 5 km (PR-OBS-3, PR-OBS-4 and PR-OBS-5), nearest-neighbour matching is performed, while for coarser satellite product resolutions (PR-OBS-1, PR-OBS-2) the interpolated rain gauge data are up-scaled. The PPVG members that do not use interpolation (see Table 3) simply average the values measured by the rain gauges within the given satellite IFOV.

Microwave-based products
PR-OBS-1 is based on data from SSMIS conical scanners, while PR-OBS-2 is based on data from the AMSU cross-track scanner. The conical scanners provide images where each IFOV is observed with the same viewing angle, which implies a constant optical path in the atmosphere and a homogeneous impact of the polarization effects (see Kunkee et al., 2008). Conical scanners provide constant resolution across the image, though changing with frequency. In SSMIS, the IFOV has a constant elliptical dimension, with the major axis elongated along the viewing direction and the minor axis along scan, approximately 3/5 of the major. Its size is dictated by the antenna diameter (actually, the antenna is slightly elliptical, to partially compensate for panoramic distortion), but also by the portion of the antenna effectively illuminated. As to the footprint, i.e. the area subtended as a consequence of the bi-dimensional sampling rate, the sampling distance along the satellite motion, i.e. from scan line to scan line, is invariably 12.5 km, determined by the satellite velocity on the ground and the scan rate.

The AMSU/MHS cross-track scanners provide images with constant angular sampling across tracks, which implies that the IFOV elongates as the beam moves from nadir toward the edge of the scan. The elongation is such that:
- for AMSU-A the IFOV at nadir is 48 × 48 km²; at the edge of the 2250 km swath it is 80 × 150 km²;
- for MHS the IFOV at nadir is 16 × 16 km²; at the edge it is 27 × 53 km².
PR-OBS-2 follows the scanning geometry and IFOV resolution of the MHS scan, so that each pixel along the scan has a precipitation value representative of an elliptical region. Please refer to Bennartz (2000) for the analytical expressions of AMSU-A and MHS radiometer IFOV resolutions.
In both cases (conical and cross-track scanners) the sensor measurement refers to an elliptic area and the measured value can be interpreted as the weighted average of the values in the ellipse, with a two-dimensional Gaussian function approximating the antenna pattern, as sketched in Fig. 8. The same applies for the derived rainfall product, so that the 2-D Gaussian function is used in the validation to weigh the ground data measurements falling into the ellipse, and the obtained weighted average is compared with the product value corresponding to the ellipse. Figure 9 shows an example of a microwave-based product image (PR-OBS-1 detecting rainfall over Hungary).

Table 5. Classes for instantaneous rain rate.
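The Gaussian-weighted up-scaling described above can be sketched as follows. This is a minimal illustration, not the PPVG common code: it assumes that Ex and Ey are the full widths at half peak of the Gaussian (as in the caption of Fig. 8), that offsets are expressed in a local planar frame aligned with the ellipse axes, and that the ellipse boundary lies at the half-peak contour. The function names are hypothetical.

```python
import math

# Conversion from full width at half peak to Gaussian standard deviation
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(dx, dy, ex, ey):
    """2-D Gaussian weight at offset (dx, dy) from the IFOV centre.

    ex, ey: full widths at half peak along x and y (same units as dx, dy).
    """
    sx = ex * FWHM_TO_SIGMA
    sy = ey * FWHM_TO_SIGMA
    return math.exp(-0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2))


def upscale_to_ifov(samples, ex, ey):
    """Gaussian-weighted mean of the ground rain rates inside the IFOV ellipse.

    samples: iterable of (dx, dy, rain_rate), offsets relative to the IFOV centre.
    Returns None when no sample falls inside the ellipse.
    """
    num = 0.0
    den = 0.0
    for dx, dy, rate in samples:
        # Assumed cut-off: keep samples inside the ellipse with semi-axes ex/2 and ey/2
        if (2.0 * dx / ex) ** 2 + (2.0 * dy / ey) ** 2 > 1.0:
            continue
        w = gaussian_weight(dx, dy, ex, ey)
        num += w * rate
        den += w
    return num / den if den > 0.0 else None


# Two samples inside a 20 x 30 km ellipse and one far outside it
ifov_value = upscale_to_ifov([(0.0, 0.0, 4.0), (10.0, 0.0, 1.0), (50.0, 0.0, 100.0)], 20.0, 30.0)
```

Normalising by the sum of the weights keeps the result a true average even when only part of the ellipse is covered by ground data.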

Infrared-based products
The infrared-based products PR-OBS-3, PR-OBS-4 and PR-OBS-5 have higher spatial resolution compared to that of microwave products. The radar data, in this case, are up-scaled by simply averaging the rain rates of the radar cells contained in the satellite pixel. Regarding the interpolated rain gauge data, the resolution of the interpolation grid is nearly the same as that of IR-based satellite products. Thus, the satellite pixels and the interpolated rain gauge field grid points are matched following a nearest-neighbour approach. In both cases, errors due to the displacement between satellite and ground data are neglected. Figure 10 shows an example of an IR-based product image (PR-OBS-4 detecting a rainfall nucleus over Belgium), together with the corresponding up-scaled ground data.
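The two matching rules for IR-based products can be sketched as below. This is an illustrative outline under simple assumptions: coordinates are planar, the radar cells belonging to a satellite pixel have already been selected, and missing radar cells (marked as None) are skipped. The function names are hypothetical.

```python
def average_radar_cells(radar_rates):
    """Mean rain rate of the radar cells contained in one satellite pixel.

    Cells flagged as None (no data) are ignored; returns None if nothing is left.
    """
    valid = [r for r in radar_rates if r is not None]
    return sum(valid) / len(valid) if valid else None


def nearest_grid_value(pixel_xy, grid_points):
    """Value of the interpolated rain gauge grid point closest to the pixel centre.

    grid_points: list of ((x, y), value) in the same planar coordinates as pixel_xy.
    """
    px, py = pixel_xy
    _, value = min(
        grid_points,
        key=lambda g: (g[0][0] - px) ** 2 + (g[0][1] - py) ** 2,
    )
    return value


radar_mean = average_radar_cells([1.0, 3.0, None])
gauge_value = nearest_grid_value(
    (1.0, 1.0),
    [((0.0, 0.0), 0.5), ((1.2, 0.9), 2.0), ((5.0, 5.0), 9.0)],
)
```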

Temporal matching
Once the ground reference rain maps are obtained and remapped onto the proper satellite grid, temporal matching is needed in order to compute the statistical indicators. All the satellite products (apart from PR-OBS-5) are to be regarded as "instantaneous" measures: the satellite sensors measure the radiance upwelling from the actual IFOV in a very short time, thus the rain rate inferred by the estimation techniques has to be referred properly to the exact time of observation.
The validation using rain gauges forces us to compare such instantaneous measures with time-integrated measures, over different time intervals (see Table 2). For PR-OBS-1 and PR-OBS-2, each overpass is compared with the rain gauge map cumulated over the time interval that contains the satellite overpass time. PR-OBS-3 and PR-OBS-4, based on geostationary IR data, provide several instantaneous estimates each hour (four data files for PR-OBS-3 and two for PR-OBS-4): in this case, an hourly cumulated value is estimated by averaging the measurements within the validation hour, and it is compared with the corresponding rain gauge value. Since the nominal acquisition time of the SEVIRI sensor is at 12, 27, 42 and 57 min of each hour, a weighted average of the five slots is performed to compute hourly cumulated PR-OBS-3 rain amounts. PR-OBS-5 provides cumulated precipitation and is matched with the rain gauge values over the cumulation intervals which correspond in time. For radar validation, an image every 5 min (sometimes 10 or 15 min) is normally available. Thus, every satellite instantaneous product is compared with the closest-in-time up-scaled radar image, while the cumulated PR-OBS-5 product is validated using cumulated radar products (in some cases gauge-adjusted) having the same cumulation time, and referring to the same time span.
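The two temporal matching rules can be illustrated with the sketch below. It is a hedged outline only: the maximum accepted time gap (`max_gap`) is a hypothetical parameter, and since the weights of the hourly average are not given in the text, equal weights are used unless others are supplied.

```python
from datetime import datetime, timedelta


def closest_radar_time(satellite_time, radar_times, max_gap=timedelta(minutes=15)):
    """Radar timestamp nearest to the satellite observation time.

    Returns None if there are no radar images or the nearest one is
    further away than max_gap (an assumed tolerance).
    """
    if not radar_times:
        return None
    best = min(radar_times, key=lambda t: abs(t - satellite_time))
    return best if abs(best - satellite_time) <= max_gap else None


def hourly_amount(rain_rates_mm_h, weights=None):
    """Hourly amount (mm) as the weighted mean of instantaneous rates (mm/h) over 1 h."""
    if weights is None:
        weights = [1.0] * len(rain_rates_mm_h)  # assumed: equal weights
    return sum(w * r for w, r in zip(weights, rain_rates_mm_h)) / sum(weights)


# Radar images every 5 min from 12:00 to 12:55, satellite slot at 12:27
start = datetime(2012, 6, 1, 12, 0)
radar_times = [start + timedelta(minutes=5 * i) for i in range(12)]
matched = closest_radar_time(datetime(2012, 6, 1, 12, 27), radar_times)
amount = hourly_amount([2.0, 4.0, 0.0, 2.0])
```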

Statistical scores and case studies
Once the ground data are up-scaled onto satellite grids and the temporal matching is applied, the validation is performed on satellite-ground data pairs. The statistical scores are evaluated on a monthly basis for "land", "sea" and "coast" pixels in each country of the PPVG.
Precipitation below 0.25 mm h⁻¹ for rain intensity products and 1 mm for cumulated rainfall products is classified as no-rain. For the measurements above this threshold, precipitation classes are introduced. Three precipitation classes (Table 5) are defined for instantaneous rain rate products, five precipitation classes for cumulated products (Table 6).
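A possible way to assign a rain rate to a class is sketched below. Only the 0.25 mm h⁻¹ no-rain threshold comes from the text; the other class edges are placeholders and are not the values of Table 5, which is not reproduced here.

```python
from bisect import bisect_right

NO_RAIN_THRESHOLD = 0.25  # mm/h, from the text (rain intensity products)
# Lower edges of the rain classes; values after the first are placeholders,
# NOT the Table 5 boundaries.
CLASS_EDGES = [NO_RAIN_THRESHOLD, 1.0, 10.0]


def rain_class(rate, edges=CLASS_EDGES):
    """Return 0 for no-rain and 1..len(edges) for the rain classes."""
    return bisect_right(edges, rate)


classes = [rain_class(r) for r in (0.0, 0.2, 0.25, 0.9, 1.0, 50.0)]
```

With `bisect_right`, a value exactly equal to an edge falls into the class that starts at that edge, so 0.25 mm h⁻¹ counts as rain, consistent with "below 0.25 mm h⁻¹" being no-rain.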
Moreover, rain rate probability distribution functions are computed on a monthly basis to evaluate the capability of the satellite products to describe the range of precipitation rates.
Each institute calculates statistics over its country area following the common validation procedure. Overall statistics for the entire H-SAF area are calculated by the DPC, as coordinating institute, using the up-scaled ground data and statistical scores provided by the participating members. Some examples of the validation results are reported in Sect. 7.
Each institute, in addition to the common validation methodology, developed a more specific validation methodology based on the local knowledge and experience. This activity is focused on case study analysis. Each institute decides whether to use ancillary data such as lightning data, SEVIRI images, the output of numerical weather prediction and nowcasting products.
The main steps for the case studies are:
- description of the meteorological event,
- comparison between ground data and satellite products,
- preparation of the ground data for satellite product developers.
The case study analysis highlights the behaviour of the satellite products in specific situations (convective or stratiform precipitation, snow over land, coastal effects, etc.), providing useful support to the developers for further improvements to the algorithms.
Examples of continuous and multi-categorical statistical scores evaluated for one year of data are reported in the following section.

Validation results
The analysis presented hereafter was performed on one year of data (July 2011-June 2012), aggregated at the seasonal and annual scale, and focuses on PR-OBS-1, PR-OBS-2 and PR-OBS-3. The seasonal aggregation is done as follows: July-August (summer 2011), September-October-November (autumn 2011), December-January-February (winter 2011-2012), March-April-May (spring 2012) and June (summer 2012). The continuous statistical indicators are computed only over the IFOV where at least one rain value (satellite product or reference field) is > 0.25 mm h⁻¹, to avoid the contribution of the dominant amount of zero-zero samples.
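The continuous indicators and the sample selection rule can be sketched as follows. The formulas used are the usual definitions of these scores (ME as the mean of satellite minus reference, MB as the ratio of the summed satellite to the summed reference rain rates) and are assumed here rather than quoted from the PPVG common code. The function name is hypothetical.

```python
import math


def continuous_scores(sat, ref, threshold=0.25):
    """ME, SD, MAE, RMSE and MB over pairs where at least one value exceeds the threshold."""
    pairs = [(s, r) for s, r in zip(sat, ref) if s > threshold or r > threshold]
    n = len(pairs)
    if n == 0:
        return None
    errors = [s - r for s, r in pairs]
    me = sum(errors) / n                                     # mean error
    mae = sum(abs(e) for e in errors) / n                    # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)         # root mean square error
    sd = math.sqrt(sum((e - me) ** 2 for e in errors) / n)   # standard deviation of the error
    sum_ref = sum(r for _, r in pairs)
    mb = sum(s for s, _ in pairs) / sum_ref if sum_ref > 0 else float("nan")
    return {"N": n, "ME": me, "SD": sd, "MAE": mae, "RMSE": rmse, "MB": mb}


# The first and last pairs are dropped because both values are below 0.25 mm/h
scores = continuous_scores([0.0, 1.0, 3.0, 0.1], [0.0, 2.0, 1.0, 0.2])
```

With these definitions RMSE² = ME² + SD², which gives a quick consistency check on tabulated values.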
The validation results of the PR-OBS-1 product show a yearly RMSE ≈ 2.1 mm h⁻¹ and MAE = 1.2 mm h⁻¹ obtained in comparison with both radar and rain gauge data (Table 7). There is an overall tendency to overestimate with respect to the radar rates (ME = 0.35 mm h⁻¹) and to underestimate with respect to the rain gauge rates (ME = -0.28 mm h⁻¹) at the European scale.
Similar results are obtained for PR-OBS-2, based on AMSU-A and MHS data (Table 8). Yearly statistical scores show a better agreement with reference rain rates: RMSE = 1.2 mm h⁻¹ (using radar as a ground reference) and 1.6 mm h⁻¹ (using rain gauges as a ground reference), and MAE = 0.7 mm h⁻¹ (radar) and 1 mm h⁻¹ (rain gauges). In this case, an underestimation with respect to both radar and rain gauge precipitation fields is observed (ME < 0). MW-based products reached the best performances during the winter period, meaning that the cold atmosphere and the frozen surfaces did not affect the product performance significantly, and the filters introduced in the algorithms to discriminate snow-covered surfaces are working properly. A further reason could be the higher rain rates and larger variability of rain patterns found during summer: this makes the error indicators grow more rapidly than in cold seasons, when lighter rain rates and less variability of precipitation intensity occur.
The overall seasonal tendency is confirmed by the countries' statistical evaluation. Figure 13 shows that the worst results, in terms of RMSE, are obtained in summer over all the countries using both radar and rain gauges as references. Similar behaviour is observed for different statistical indices: ME, MAE, SD and MB, for both PR-OBS-1 and PR-OBS-2.
Multi-categorical statistics were also performed on the same validation period, both with radar and rain gauge data. Contingency tables are obtained by dividing precipitation events into four classes, as reported in Table 5. The tables classify in each column the events detected by the radar/rain gauges falling into each class, while each row reports the rain rate classification of the satellite product. The percentages shown in a given column are computed with respect to the total number of satellite samples and represent how the satellite product classifies the events assigned to that class by the radar/rain gauges. The ideal condition should be 100 % of events in the main diagonal of the table.
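A contingency table of this kind can be sketched as below. The normalisation is an interpretation of the description above: each column is divided by the number of samples that the reference assigns to that class, so that a perfect product gives 100 % on the main diagonal. Class indices are assumed to be integers from 0 (no-rain) upwards, and the function name is hypothetical.

```python
def contingency_percentages(sat_classes, ref_classes, n_classes):
    """Column-normalised contingency table in percent.

    Rows: class assigned by the satellite product.
    Columns: class assigned by the reference (radar or rain gauges).
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for s, r in zip(sat_classes, ref_classes):
        counts[s][r] += 1
    col_totals = [sum(counts[i][j] for i in range(n_classes)) for j in range(n_classes)]
    return [
        [100.0 * counts[i][j] / col_totals[j] if col_totals[j] else 0.0
         for j in range(n_classes)]
        for i in range(n_classes)
    ]


# Two classes (0 = no-rain, 1 = rain): one reference rain event is missed by the satellite
table = contingency_percentages([0, 0, 1, 1, 0], [0, 0, 1, 1, 1], 2)
```

In this layout the off-diagonal cells of the first row are the rain events that the satellite product classifies as no-rain, i.e. the misses.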
Rain intensity distribution in the contingency table demonstrates that both algorithms are able to discriminate rain from no-rain events. More than 90 % (91-94 %) of no-rain events are correctly identified by PR-OBS-1 (Tables 9 and 10) and 97 % by PR-OBS-2 (Tables 11 and 12). However, the percentages are also very high in the other cells of the first row in all the tables, indicating that a large number of rain pixels are missed by the satellite products. Both satellite products tend to underestimate rain rate classes, especially when compared with rain gauges. PR-OBS-2 seems to resolve low intensity classes better, with higher percentages in the first two cells of the main diagonal, while PR-OBS-1 is more effective in classifying higher rain rate classes.
The main statistical scores were also evaluated for the combined IR/MW product PR-OBS-3. In Table 13, seasonal and annual values of the considered continuous indicators are reported for PR-OBS-3, compared with radar and rain gauge precipitation estimations. As for the MW products, the best performance is obtained for cold months, and the analysis of the ME and the MB confirms the general rain intensity underestimation already highlighted for PR-OBS-1 (when referred to rain gauges) and generally for PR-OBS-2. The rain rate value distribution within the contingency tables for PR-OBS-3 (see Tables 14 and 15) demonstrates an ability of the product to discriminate rain/no-rain conditions comparable to that of the MW products, and the underestimation problem is still evident.
The overall tendency is that MW-based products show better scores than IR/MW-based products: it means that the MW information is not always correctly maintained by the blended algorithm, especially during time periods not covered by MW sensor overpasses.
In comparing these results, one has also to remember that all comparisons are performed with respect to the native satellite grids, and the IFOV size is very different between MW-based products and the combined IR/MW one. Thus, ground data are treated in very different ways (Gaussian up-scaling for PR-OBS-1 and PR-OBS-2, simple up-scaling or nearest-neighbour for PR-OBS-3). Finally, note that continuous statistical scores evaluated for PR-OBS-3 using rain gauge data as a reference are better than those of the MW-based products. This could be an effect of the hourly precipitation integration adopted to validate PR-OBS-3 with rain gauge data, which differs from the approach used for MW-based products.

Conclusions and future plans
This paper documents the efforts of a group of precipitation experts belonging to eight European countries to work together in setting up an unprecedented continental-scale validation exercise, aiming to assess the error structure of the instantaneous and cumulated satellite precipitation products generated by H-SAF. The PPVG relies on about 4100 rain gauges and 59 meteorological radars to derive reference ground data for the validation, and carries out all the steps of an agreed upon validation procedure, from the ground data pre-processing to the final computation of the error indicators.
Since 2007, monthly statistical scores (continuous and multi-categorical) have been regularly evaluated for all the satellite precipitation products over land, sea, and coastal areas following a common validation methodology applied to the space and time resolution of the satellite products. Each year more than 450 000 satellite-ground data pairs for PR-OBS-1 and PR-OBS-2, and nearly 100 000 000 data pairs for PR-OBS-3, are processed for product evaluation, and around twenty to thirty case studies representing the main meteorological events which have crossed the European area are analysed using different ground data, satellite products, lightning detection and numerical models.
Moreover, a ground data service within the project was set up by the PPVG: radar and rain gauge data, up-scaled onto satellite native grids, are available to developers for special testing and possible calibration of new product versions.
Since the beginning of the project, the first objective of the PPVG has been to perform the validation activities in order to highlight the main characteristics (weaknesses and strengths) of the satellite products, and to give useful feedback to the precipitation product developers. The intense collaboration between the PPVG and the developers has led to a parallel improvement in the validation methodology and satellite precipitation product performance. Examples of validation results are presented in this paper, highlighting the general characteristics of the products in terms of seasonal behaviour and the product capability in classifying precipitation rates correctly. The second objective, introduced during CDOP-1, the operational phase of the project (started in 2010), is to implement a validation service working on the statistics of the previous month. The efforts undertaken for this goal will result in the delivery of an improved validation common code that, by ingesting the raw ground reference data, performs all the steps of the validation procedure, including the ground data quality index evaluation, leading to the calculation of statistical indicators. The improved common code will also include the interpolation tool (for rain gauge data) presented in Sect. 4.2, and the quality map of the ground data (for both radar and rain gauges), to be used to convey the validation results more effectively.
It is foreseen that during CDOP-2, with the generation of satellite products for the MSG full disk and collaboration with international groups and programmes such as the International Precipitation Working Group (IPWG) and the Global Precipitation Measurement (GPM) mission, PPVG activity will be extended to available sites in Africa with experts from non-European countries (in particular, from Africa and the Americas). By the same token, during CDOP-2 the H-SAF validation infrastructure will be used for validation and quality assessment of precipitation products developed by or shared with other SAFs, such as Climate Monitoring (CM-SAF). The PPVG also plans to use data from satellite-borne radars (from the TRMM precipitation radar over the African area and then from the dual-frequency precipitation radar onboard the GPM core satellite over both the H-SAF and African areas) as a reference for the validation activity.
An intercomparison study of H-SAF MW and combined MW/IR precipitation products with TRMM products and ground measurements was recently started, with the objective of identifying a validation strategy for H-SAF precipitation products on the MSG full disk.

Nat. Hazards Earth Syst. Sci., 14, 871-889, 2014, www.nat-hazards-earth-syst-sci.net/14/871/2014/

Fig. 1. H-SAF radar (upper) network is composed of the national radar networks of Belgium, Germany, Hungary, Italy, Poland, Slovakia and Turkey, and the H-SAF rain gauge network (lower) is composed of the national rain gauge networks of Belgium, Bulgaria, Germany, Italy, Poland, Slovakia and Turkey (maps by M. Barbani, DPC).

- point-like measurement (rain gauge) spatial interpolation,
- ground data up-scaling onto satellite native grids,
- temporal comparison between precipitation products,
- statistical score (continuous and multi-categorical) computation and evaluation,
- case study analysis.

Fig. 2. Correlation coefficient between rain gauge pairs as a function of the distances between the rain gauges. Colours refer to the months of the year 2009.

Fig. 3. Comparison between statistical scores of the ability to reconstruct the original rain gauge measurement field for the different interpolation techniques: Barnes, Kriging and GRISO. Scores presented here are: POD (left), FAR (centre), and RMSE (right).

Fig. 7. Reflectivity measured by radar "Il Monte", on 21 June 2009, at 14:00 UTC, elevation 0.4 (left), and overall quality map associated with it (right). The dominant component in quality is the range distance.

Fig. 8. Gaussian filter (left), section of Gaussian filter (right). Ex and Ey represent the full width at half peak respectively in the x and y directions.

Fig. 9. Precipitation rate map of PR-OBS-1 (top), precipitation rate map from the Hungarian radar network up-scaled at the PR-OBS-1 grid resolution (centre), Hungarian radar map at its original resolution (bottom) of 11 June 2009 at 15:30 UTC. We can see that the radar rain intensities up-scaled onto satellite grids are smoothed and the convective cells are aggregated. The PR-OBS-1 detects the convective spots well, even though an intensity overestimation and false alarms are observed in the southeastern part. Note that the grey area is a no-data area not covered by the satellite path.

Fig. 10. Satellite (PR-OBS-4 on the left) and radar (on the right) images observed at 01:00 UTC on 26 May 2009. The radar image on the right is the result of the up-scaling of the Wideumont, Belgium, radar data onto the PR-OBS-4 grid. It is possible to observe here that the PR-OBS-4 product was able to detect the main precipitation zone. However, the area with the high precipitation rates appears to be shifted to the northeast, surrounded by a more extended precipitation zone than in the radar case. The small and big circles indicate the area respectively with a radius of 160 km and 240 km.

Fig. 11. The seasonal RMSE of PR-OBS-1 evaluated by the PPVG. The product reaches the best performances during the winter period in all the countries.

Table 1. H-SAF precipitation algorithms/products that have been validated by the PPVG.

Table 2. Summary of the rain gauge characteristics. Two rain gauge types are present: tipping bucket (TP) and weighing (W). Only 300 out of 2000 gauges are heated. ** Information not available at the moment; a value of about 300 mm h⁻¹ can be assumed for tipping bucket rain gauges.

Table 3. Number and density of rain gauges within the H-SAF validation group. * The number of rain gauges could vary from day to day due to operational efficiency within a maximum range of 10-15 %. ** Only in the Wallonia region. *** Only in three river basins. **** Only covering the western part of Anatolia.

Table 4. Characteristics of the national radar networks.

Table 6. Classes for cumulated rain.

Table 7. Continuous indicators for PR-OBS-1: NS (number of considered satellite product samples); NR (number of reference field samples); ME (mean error); SD (standard deviation of ME); MAE (mean absolute error); RMSE (root mean square error); MB (multiplicative bias).

Table 8. Continuous indicators for PR-OBS-2: NS (number of considered satellite product samples); NR (number of reference field samples); ME (mean error); SD (standard deviation of ME); MAE (mean absolute error); RMSE (root mean square error); MB (multiplicative bias).

Table 9. Contingency table for the multi-categorical statistics for PR-OBS-1 as compared with radar-derived rain fields.

Table 10. Contingency table for the multi-categorical statistics for PR-OBS-1 as compared with rain gauge-derived rain fields.

Table 11. Contingency table for the multi-categorical statistics for PR-OBS-2 as compared with radar-derived rain fields.

Table 12. Contingency table for the multi-categorical statistics for PR-OBS-2 as compared with rain gauge-derived rain fields.

Table 13. Continuous indicators for PR-OBS-3: NS (number of considered satellite product samples); NR (number of reference field samples); ME (mean error); SD (standard deviation of ME); MAE (mean absolute error); RMSE (root mean square error); MB (multiplicative bias).

Table 14. Contingency table for the multi-categorical statistics for PR-OBS-3 as compared with radar-derived rain fields.