Modelling soil erosion at European scale : towards harmonization and reproducibility

Introduction Conclusions References


Introduction
Soil erosion by rainfall and runoff is one of the main soil threats in Europe (Boardman and Poesen, 2006).Soil erosion costs a surprisingly large amount (Pimentel et al., 1995;Crosson, 1995;Telles et al., 2011;García-Ruiz, 2010).In many regions soil erosion affects soil quality, reducing soil nutrient content, with a consequent increasing in food production costs (Pimentel, 2006;Lal, 1998).
The loss in productivity may be significant; the upper part of the soil, which is the most fertile layer, is also the most prone to erosion.Recent research also showed that nutrient and carbon cycling are significantly altered by mobilization and deposition of soil (Quinton et al., 2010).An eroded soil may lose 75-80 % of its carbon content, with consequent release of carbon to the atmosphere (Morgan, 2005).
Several authors consider soil erosion as a natural hazard (Rawat et al., 2011;Berger and Rey, 2004;Gares et al., 1994;Mather, 1982) and intrinsically entangled with many other natural hazards (Markantonis et al., 2012).Soil erosion is connected with muddy floods in rural areas (Verstraeten and Poesen, 1999).In the presence of pronounced soil erosion Published by Copernicus Publications on behalf of the European Geosciences Union.within a drainage basin, local floods can take the form of muddy floods, covering streets and even floors inside houses with a cover of mud (Verstraeten and Poesen, 1999).Soil erosion is also part of a system of multiple interacting processes operating in a complex hierarchy.Mass movements may play an important role within soil erosion processes due to their capacity to remove and expose large parts of slopes in a relatively short time (Van Beek, 2002;Bosco and Sander, 2014).At the same time, high soil erosion rates can also lead to an increase of landslide susceptibility (Pradhan et al., 2012) especially because of the reduced capacity of the soil to support a good vegetation cover.Landslides may also directly occur on gullies created by surface erosion processes (Pla Sentis, 1997).It is also true that concentrated surface runoff and gully erosion may occur within depressions which could also be old landslide scars (Valentin et al., 2005;Mazaeva et al., 2013).Forest fires can also significantly affect soil erosion (Terranova et al., 2009;Di Leo et al., 2013).Their effects on the catchment hydrology may cause an increasing of soil erosion by water (Vafeidis et al., 2007).Wildfires can totally remove the protective surface layers (litter and vegetation cover) and increase the hydrophobic condition at or below the ground surface with a consequent increase of the soil erosion rate (Di Leo et al., 2013).Under these circumstances, the mentioned natural hazards might be interlinked with soil erosion rates even close to the maximum potential erosion (i.e. the erosion where the bare soil is exposed and in the absence of mitigating management practices -see also Sect. 2.3).
Soil erosion can also cause water pollution and siltation, loss of organic matter and reduction in water-holding capacity (Boardman and Poesen, 2006).The protection of soil resources has therefore been recognized as an important objective of environmental policy (CEC, 2006); this requires a correct assessment of erosion rates and their geographical distribution.
It is impractical to measure soil loss across whole landscapes.Directly measuring soil erosion by water across large areas using experimental plots, soil erosion markers (e.g.Caesium 137) or sampling river sediment load is technically and logistically difficult and financially expensive.Also, techniques such as remote sensing present some limits when applied at regional scale.The cost and availability of remotely sensed data on large areas with adequate resolution and frequency (especially on arable land where spectral patterns are extremely time-dependant) constitute a limit in applying this technique (Boardman, 2007).Furthermore, a standardized operational erosion assessment using satellite data is hampered by the complexity of soil erosion processes and the regional differences (Vrieling, 2006).
Therefore, research is needed to improve methods for estimating soil erosion rates using modelling approaches, upon which mitigation strategies can be assessed and implemented.Several models exist to predict soil erosion rates by water.These models differ greatly in terms of complexity, inputs, spatial and temporal scale.Heterogeneity of the models also affects the processes they represent, the manner in which these processes are represented and the types of output information they provide (de Vente et al., 2013).Many efforts have been made to describe soil erosion processes within models so as to achieve a better predictability and a more effective identification of the involved parameters.
Although models for assessing erosion on large areas have been developed (e.g.Gobin et al., 2004;Kirkby et al., 2008), required input data with sufficient accuracy may not always be available for large spatial extents (Jones et al., 2003) and applications outside the spatial domain in which erosion models have been tested could be problematic (Favis-Mortlock, 1998).It is recognized that the data needed to drive these models are scarce or do not exist even for the less demanding empirical models, if applied at regional scale.There is thus a need to integrate already available research methods with new generalization and integration techniques for better predicting the rate and extent of soil erosion at wider scales.
This paper presents an improved approach for modelling soil erosion at regional scale.The semantic array programming paradigm (de Rigo, 2012a, b) has recently been applied in integrated environmental modelling to support scalable generalization techniques (Bosco et al., 2013;Estreguil et al., 2014).This paradigm and a strong effort toward computational reproducibility (Bosco et al., 2014(Bosco et al., , 2011b) ) are at the basis of the proposed modelling architecture.An extended version of the Revised Universal Soil Loss Equation (RUSLE), Renard et al. (1997) will be implemented.The model is applied to predict soil erosion rates by water in Europe by exclusively relying on publicly available wide-scale data sets along with a series of locally reliable empirical relationships.

Physically based and empirical models
Distinct modelling approaches can lead to significantly different soil erosion rates even when the same model is applied within the same region (Shen et al., 2009).The way the model is implemented (i.e. with the selection of different algorithms when available), the use of data sets with different resolution or accuracy (Merritt et al., 2003) and the provenance of a given data set (Buneman et al., 2000;Simmhan et al., 2005) can play a key role on the output.When incomplete or missing, these pieces of information may affect the assessment of the actual accuracy of data to be used as input, therefore weakening -or in some circumstances even compromising -the application of theoretically accurate models (Saltelli et al., 2010).
While physically based models can in principle offer scientifically sound methods for deriving soil erosion rates from a plethora of detailed input data, their practical suitability for regional-or continental-scale assessment is controversial (Bras et al., 2003).The enormous gap between the type and accuracy of the needed input parameters and the actual avail-ability of harmonized, verifiable large-scale data sets limits the applicability of such models (Stroosnijder, 2005).In theory, when working with physically based models, possibly all the requested parameters are measurable and then could be considered as "known".In practice, often the parameters have to be calibrated against observed data (Beck et al., 1995;Wheater et al., 1993).This calibration adds nonnegligible uncertainty in the parameters' values.The heterogeneity, variability and uncertainty associated with input parameter values and their interpolation in spatial or temporal domains outside the observed ones should be considered as key factors (Saltelli et al., 2010;Jetten et al., 2003) which may partially explain why lumped regression-based models can perform better than more complex physically based models (Bosco et al., 2013;de Vente et al., 2013).
If at watershed scale a trend is observed (Daniel et al., 2011) to complement or replace physically based models with machine-learning techniques (which are advanced empirical modelling techniques), at regional scale the adaptation of widely adopted empirical models and their improvement with the same techniques could play a meaningful role.Regional-scale approximations with robust empirical modelling could provide useful -even if necessarily less accurate -support for risk assessors involved in decision-making processes over wide spatial extents.The main limit of such approach is that empirical models do not necessarily model the right process and should only be used for the range of conditions they were developed for (Hessel, 2002;de Vente et al., 2013).
Computational science is emerging as one of the central topics within environmental modelling (Casagrandi and Guariso, 2009).To overcome the above problems, reproducible computational methods based on free software and data are increasingly needed (Stallman, 2005(Stallman, , 2009;;Peng, 2011).This may also help to reuse in a controlled way empirical equations for compensating the lack of detailed data.

Data set
Electronic archives are an important data source for the scientific community.The added value and criteria for the selection of electronic archives are the accessibility of large volumes of data, their spatial coverage, their ability to preserve and harmonize historical data (Panagos et al., 2011) and often their free availability.
E-OBS is a European daily gridded observational data set for precipitation and air temperature that covers the period 1950-2012.E-OBS is based on the largest available pan-European precipitation data set, and its interpolation methods were chosen after careful evaluation of a number of alternatives.The gridded data are delivered on four spatial resolutions and for the presented activities the 0.25 • regular latlong grid resolution has been used.Another added value of the E-OBS data set is the daily estimates of interpolation uncertainty, provided as standard error.
Information on soil surface texture and rock fragment cover were derived from SGDBE and HWSD.The 1 : 1 000 000 SGDBE data set contains a list of Soil Typological Units (STU) representing nature and properties of European soils.The STUs are grouped in Soil Mapping Units (SMU) to form soil associations because of the difficulty in delineating the STUs at the database scale.HWSD is a 30 arcsec raster database containing over 16 000 soil mapping units.It combines soil information worldwide.
The CLC database provides consistent localized geographical information on land cover in Europe.The 1 : 100 000 database, having a minimum mapping unit (MMU) of 25 ha and minimum width of linear elements of 100 m (EEA, 2007) has been used in the 100 m gridded version (CLC 2006 version 15).The database version for the year 2006 (CLC2006) has been produced by integrating the data on land cover changes for the period 2000-2006 with the map of land cover for the year 2000 (CLC2000).
The SRTM is still probably the most complete and reliable freely available high-resolution digital topographic database of the Earth.Interferometric radar data were acquired and processed to digital topographic data at 1 arcsec resolution.The SRTM is available as 90 × 90 m gridded data set.In this study, version 4 of the database is used.

Modelling architecture
A modelling architecture based on an extended version of the RUSLE (e-RUSLE) is here proposed and applied to evaluate soil erosion by water at regional scale.The RUSLE model (derived from USLE, Wischmeier and Smith, 1978) has been selected because of its flexibility and low data demand, compared with other models.Furthermore, the family of models based on the USLE provides long-term average soil loss estimates and has been applied all over the world in different environments and various climatic conditions (e.g.Kinnell, 2010;Lu et al., 2004;Angima et al., 2003;Bosco et al., 2009).We have already discussed the (still open) challenges to obtain an appropriate parametrization for physically based models to run at the continental scale.Their failure to produce better results than achieved using the USLE/RUSLE family of models (Tiwari et al., 2000) encourages the use of the USLE/RUSLE model in applications for which it was not designed (Kinnell, 2010).Although even the e-RUSLE parametrization is not trivial, in particular considering the spatial heterogeneity to tackle in a pan-European application, the e-RUSLE array-based structure will be shown to offer a potential strategy to integrate uneven but locally accurate spatial information on key quantities -here illustrated for the case of erosivity.
The proposed architecture inherits from the RUSLE the ability to be easily linked to other related natural resources.For example, some effects of forest resources and generally of the vegetation component within land cover are straightforward to assess (de Rigo and Bosco, 2011).Furthermore, approximated rapid assessments of the impact of disturbances (e.g.wildfires, de Rigo et al., 2013a;Di Leo et al., 2013) may be performed by exploiting the RUSLE modular architecture which easily allows potential and actual erosion rates to be estimated for different environmental conditions by simply considering different arrays of layers.The further seamless integration of multiple estimates in the erosivity component (see Sect. 2.3.1)supports a more robust adaptability to the heterogeneous conditions which are typical at continental scale.This flexibility exemplifies the potential of e-RUSLE within integrated natural resources modelling and management (INRMM, de Rigo, 2012c).
The RUSLE retains from the USLE (based on empirical correlation) some limitations.Within the model there are no factors directly representing physical processes (i.e.runoff, infiltration).The RUSLE only predicts soil losses caused by sheet and rill erosion, not by (ephemeral) gully erosion.Another fundamental lack is linked to the absence of sediment deposition estimation that could lead to overestimation in soil erosion rates.However, the USLE multiplicative structure (Ferro, 2010) is well suited for transforming the modelled quantities into other correlated ones by simply adding custom factors.As an example, for overcoming the absence of sediment deposition calculation, Mitasova et al. (1996) replaced the LS (slope length) factor with a new index considering the spatial distribution of areas with topographic potential for soil erosion and sediment deposition.
The e-RUSLE preserves the structure of the RUSLE adding to its array of multiplicative factors one more factor for better considering the effect of stoniness on soil erodibility.An array of local estimations of rain erosivity (here based on empirical equations) has also been introduced to mitigate the extrapolation uncertainty associated with each single equation.The array-based estimation of erosivity is proposed to be an ensemble of multiple estimations from partly independent modules (empirical equations) aggregated by a similarity analysis so as to also increase the design diversity (de Rigo, 2013).
The model has been applied at a 1 km resolution for the whole of Europe.The resolution depends on data availability.Due to the use of the 1 : 1 000 000 ESGDB database for the calculation of the soil erodibility factor, it was not possible create a map having a better resolution.
The methodology relies on the paradigm of Semantic Array Programming (de Rigo, 2012a, b) which allows the multi-dimensional structure of the mathematical and computational model to be explicitly and concisely exploited.This is achieved by semantically enhancing the chain of involved data-transformation modelling (D-TM) modules so as to better focus on a compact, modular integration of the arrays of data and geospatial layers.

Applying the semantic array programming paradigm
Although the impact of the computational aspects in environmental modelling is steadily growing (Casagrandi and Guariso, 2009), they are not rarely undervalued (Merali, 2010) and the mitigation of the software-driven component of uncertainty in complex modelling might surprisingly be understated while focusing on more traditional sources of uncertainty (Cerf, 2012;de Rigo, 2013).
Part of the complication in computational models (affecting even their maintainability and readiness to constantly evolve) may be mitigated (McGregor, 2006).Compared to other computational approaches, array programming (AP) understands large arrays of data as if they were a single logical piece of information.For example, a continental-scale gridded layer may be managed by AP languages as if it were a single variable instead of a large matrix of elements.As a consequence, a disciplined use of AP (Iverson, 1980) may allow nontrivial data-processing to be expressed with terse expressions (Taylor, 2003) within a simpler control flow.
However, the powerful abstraction and conciseness of AP -without the additional use of a disciplined semantics-aware implementation -might offer a weak support for checking the correctness of some extremely terse algorithms, when used with a particular set of data (de Rigo, 2012b).This is why the e-RUSLE computational modelling methodology (de Rigo and Bosco, 2011; Bosco and de Rigo, 2013) followed the paradigm of Semantic Array Programming (SemAP, de Rigo, 2012a, b) by combining concise implementation of the model with its conceptual subdivision in semantically enhanced abstract modules.
SemAP is an extension of array programming (Iverson, 1980).It inherits from AP its programmatic goal: AP originated to ease and improve the translation of mathematical notation as algorithmic implementation.As Iverson (1980) suggested, "the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation".This may be achieved by describing a model with a precise vector-based mathematical notation, which may ease the implementation of complex algorithms by also moving the "mathematical reasoning directly into the source code, where the mathematical description is actually expressed in a completely formalized and reproducible way" (de Rigo, 2012a).
It is worthy recalling two main aspects which characterize SemAP as a specialization of AP: -the modularization of sub-models and autonomous tasks, paying attention to their concise generalization and the potential reusability in other contexts; -the use of terse array-based constraints (SemAP semantic checks, de Rigo, 2012d) to emphasize the focus on the coherent flow of the information and data among modules -which are often nontrivial in computational science.The SemAP semantic constraints natively apply to AP variables irrespective of their size (e.g.large arrays such as continental-scale geospatial layers).The semantic coherence of the information entered in and returned by each module is checked locally instead of relying on external assumptions.This may be essential especially when different modules rely on different expertise.
This way, even the essential implementation details within each module (for example, the implementation of the erosivity layer in the e-RUSLE as a climatic-driven composition of an array of local empirical relationships) may be at least partially decoupled from the overall modelling architecture.Ideally, atomic modules might easily be replaced by more complex compositions of arrays of sub-modules and data, without implying a major change in the modelling architecture.For example, the same methodology exploited for the erosivity layer was also exploited in Bosco et al. (2013) for estimating landslide susceptibility.The effort towards increasing the reproducibility in soil erosion modelling (de Rigo and Bosco, 2011;Bosco et al., 2011a, b) focused our attention mainly on the use of public available databases, free scientific software tools and libraries and we introduced some reproducible techniques in applying the model and its submodels.An innovative ensemble model based on climatic similarity (de Rigo et al., 2013b;Bosco et al., 2013) is applied to estimate rain erosivity from multiple available empirical relationships.
SemAP array-based semantic constraints (de Rigo, 2012d) have been exploited in the model implementation.Some of them are exemplified hereinafter as active links ::constraint:: 1 . 1 The mathematical notation ::constraint:: refers to the online taxonomy of array-based semantic constraints which defines the Semantic Array Programming paradigm (http://mastrave.org/doc/mtv_m/check_is, de Rigo, 2012d).

Extended RUSLE model (e-RUSLE)
The RUSLE model is designed to predict only soil loss by sheet and rill erosion.Sediment deposition processes or concentrated overland flow erosion (ephemeral gully erosion) are not considered in the equation.The model uses different factors representing the effect of topography, land cover, climatic erosivity, management practice and soil erodibility for modelling soil erosion.The RUSLE uses a system of mathematical equations for computing soil erosion.The e-RUSLE retains all the equations of his predecessor implementing an extra factor for the computation of soil stoniness interaction.
The basic equation of the extended RUSLE is as follows: Given the multiplicative structure, all layers are expected to be defined in a given grid cell c without missing values (::nanless::3 ) in order for the soil loss to be computable in c.

Rainfall erosivity factor
The intensity of rainfall is one of the main factors driving soil water erosion processes.The Rainfall Erosivity Factor (R) is a measure of precipitation's erosivity.Wischmeier (1959) identified a composite parameter, EI30, as the best indicator of rain erosivity.
The rainfall erosivity factor has been implemented in numerous soil erosion models.AGNPS (Young et al., 1989), WATEM/SEDEM (Van Rompaey et al., 2001), USPED (Mitasova et al., 1996), SEMMED (de Jong et al., 1999) and MMF (Morgan et al., 1984) all implement the R factor as the relationship between rainfall energy and intensity developed by Wischmeier and Smith (1978).The rainfall erosivity factor has been widely applied all over the world and it is considered as an important factor for soil erosion assessments under climate change scenarios (e.g.changes in rainfall distribution and intensity, Diodato, 2004a).Despite its frequent use, it retains some limitations.The main weakness of the R factor is in not explicitly considering runoff and this highly influences the capacity of the model to account for event erosion C. Bosco et al.: Modelling soil erosion at European scale (Kinnell, 2010) and seasonal effects.Within the R factor an empirical relationship tends to exist between runoff amount and E, and between peak runoff rate and I30.However, the EI30 is not able to deal with the effect of runoff on soil loss at all well at some locations (Foster et al., 1982;Kinnell, 2010).In some cases the RUSLE estimates average event soil losses reasonably well, but in other cases, particularly when soils have a low runoff coefficient, it overestimates low event soil losses and underestimates high event soil losses (Nearing, 1998).Arguably, the failure to include direct consideration of runoff in the R factor is to a large extent responsible for this result (Kinnell, 2010).
The scarcity of accurate data sets for assessing soil water erosion rates at regional scale motivated the introduction of a climatic-based ensemble model to estimate rainfall erosivity.The climatic layers have been computed using GNU R (R Development Core Team, 2011) and GNU Octave (Eaton et al., 2008) with the Mastrave modelling library (de Rigo, 2012a, b).The ensemble is an unsupervised datatransformation model applied to climatic data to reconstruct rain erosivity.
The R factor has been computed using the E-OBS database as data source.Seven empirical equations have been selected from the literature in order for the erosivity to be correlated with climatic information.Spatially distributed climatic information (such as average annual precipitation, Fournier modified index, monthly rainfall for days with ≥ 10.0 mm (see Table 1b) has been computed from the daily reconstructed (E-OBS) patterns of precipitation in Europe (years 1980-2009).
The selected equations (Table 1c) refer to climate-erosivity regressions which have been validated in four geographical areas.Many other elementary relationships exist between climate and erosivity.The selected ones fulfil a series of expertbased criteria such as their reproducibility using the available data sets, the quality of literature and case studies, the climatic coverage of large European regions and the good regression performance validated in their spatial extent.
As an example, although linear in the parameters' regression, the empirical approach (Eq.R d1 , region A 3 in Table 1c) proposed by de Santos Loureiro and de Azevedo Coutinho (2001) received wide acceptance (Onyando et al., 2005;Taveira-Pinto et al., 2009;Ranzi et al., 2012).The relationship has been tested in Italy (Diodato, 2004b) where it provided estimates more stable than the ones provided by other widely used empirical equations (in the limited validation set, the estimates of R d1 did not show rank reversals when compared to the measured erosivity).The relationship with the original parameters has been used also in Spain (López-Vicente et al., 2008).
The rationale for not limiting the estimation of the R factor to the use of one preferred regression-based equation lies on the strengths and limitations that the empirical nature of those simplified equations have in different geographic and climatic conditions.The input variables selected for each em-pirical equation proved valuable in describing the local distribution of rain erosivity within that equation's spatial domain.While extrapolating the validity of the selected equations may be reasonable in order to cover other climatically similar areas, European regions with different climate could behave quite differently.For these regions, not only the equation's parameters but even the equation's input variables might poorly correlate with the observed soil erosion patterns.This justifies the use of multiple empirical equation families with parametrizations covering the climatic diversity of the European regions (Table 1c).The selected equations allow a pan-European reconstruction of the estimated rain erosivity to be carried on with the E-OBS data of daily precipitation.
The required integration exploited the array structure of the aforementioned quantities (semantic array programming).In particular, the array of regressors (R i , seven dimensions, Table 1c) and corresponding validated areas (A i , four dimensions, Table 1c), as well as the array of covariates (C j , 26 dimensions, Table 1b) have been used.A possible modelling strategy could have been to cluster the pan-European spatial extent A EU in climatic polygons (subsets) associated with corresponding empirical equations.The clustering could have been performed by extrapolating a given equation R i up to cover the polygon whose climate is closer to the climate observed in the validated area A i .However, this approach would have generated unnecessary artefacts along the polygon boundaries.Moreover, within each polygon only one estimate would have been exploited (one-to-one approach).
The proposed strategy considers each estimate R i as covering the whole of Europe with a spatially varying degree of reliability (many-to-one approach).This qualitative reliability may be described as a fuzzy set-membership and is based on the Relative Distance Similarity (RDS) algorithm as implemented by the Mastrave modelling library and is applied for each equation R i to compare the climatic spatial information of each cell in A EU with the corresponding values in A i .The RDS index has been successfully used in environmental fuzzy ensemble applications (de Rigo et al., 2013b;Bosco et al., 2013).It defines the relative distance between two values C j 1 and C j 2 of a given nonnegative covariate.The relative distance is a dimensionless ::possibility::4 between 0 (maximum dissimilarity) and 1 (maximum similarity) and is simply the ratio between the minimum and the maximum value: ).The behaviour of each empirical equation outside its definition domain was also assessed to prevent meaningless out-of-range values to degrade the ensemble estimation.Therefore, for both the inputs (covariates) and the output (erosivity estimates) of the regressors R i the RDS index has been computed and then aggregated cautiously considering the minimum index.This may be defined Fournier-Ferro index Country/region ISO3166 Definition Reference

Bollinne et al. (1979)
A 2 Bavaria, DE-BY A 4 Sicily, Italy IT-82 Ferro et al. (1999) here as where δC j is half of the measurement accuracy of the covariates and δR i is the half of the tolerance of the erosivity estimates.The is a statistical operator with which the relative distances along each dimension of the covariates are aggregated in the RDS index.Among the many possibilities, a simple median has been selected here.Median is also a typical robust statistical operator frequently used for ensem-ble models.The weighted median (de Rigo, 2012c) of the seven empirical models has here been used (using RDS i c as weights) for generating the final R factor layer.

Soil erodibility factor
The soil erodibility factor (K) "represents the effects of soil properties and soil profile characteristics on soil loss" (Renard et al., 1997).The K factor has been largely used within soil erosion models (e.g.PERFECT (Littleboy et al., 1992), AGNPS and USPED) and it is usually determined experimentally using runoff plots.For predicting the K factor information on soil properties is needed.Since the availability of such data is limited at the European scale, we applied a simplified equation calibrated on a world-wide data set of measured K values (Römkens et al., 1986;Renard et al., 1997) C. Bosco et al.: Modelling soil erosion at European scale using the percentage (::proportion::5 ) of sand, silt and clay present within the HWSD and ESGDB data sets.For the volcanic soils the value proposed by van der Knijff et al. (2000) was assigned.Further details on the implementation of this factor and the resulting map are available in Bosco and de Rigo (2013).
Rock fragments have a major effect on soil erosion rates as they alter soil properties such as water-holding capacity, soil erodibility, rooting volume or bulk density, influencing the hydrological response of a soil as well as its degradation and productivity (Poesen and Lavee, 1994).
In estimating the K factor only the effects of rock fragment within the soil profile are considered.The presence of rock fragments at the soil surface and within the soil profile require special considerations that led to the introduction of the stoniness correction factor within our model (Sect.2.3.5).

Topographic factor
The effect of topography within the model is simply based on the ::angle::6 of slope.It is accounted for by the L and S factors.Either slope length or slope steepness substantially affect sheet and rill erosion estimated by the model.L and S factors have been determined using the same approach and equations applied by Bosco et al. (2008).
The S factor has been evaluated using Nearing's formula (Nearing, 1997) that provides more reliable results for steep slopes (up to 50 %).In a precautionary way, due to the lack of information regarding the capacity of the applied equation to predict the S factor on slopes over 50 %, the cells having larger slope gradient values were removed from the map.
The LS factor of the RUSLE model is, similarly to the R factor, present within the architecture of many different soil erosion models (AGNPS, PERFECT, MUSLE; Williams, 1975).Our approach is based on the formulation of Moore and Burch (1986) which considers, within the calculation process, the concept of specific catchment area accounting for flow convergence and divergence through this term of the equation (Moore et al., 1991;Mitasova and Brown, 2002).Specific catchment area (upslope contributing area per unit length of contour) is one of the most commonly used terrain attributes in hydrological modelling (Erskine et al., 2006) and it gives to the LS factor stronger physical basis making it suitable for soil erosion modelling.
For considering local limits capable to affect our approach, we also assumed surface runoff concentrating in less than 300 m.This value has been selected after analysing the available literature (Renard et al., 1997;Engel and Mohtar, 1999) and considering both the hyper-dissected characteristic of the majority of the territories in South and Central Europe and the more coarse dissection of Northern Europe.

Cover and management factor
Vegetative canopy, by changing the impact and intensity of rainfall, the resistance to water flow and the amount of water available for transporting the sediments, influences soil and water losses (Rousseva, 2003).The cover management (C) factor represents, as a dimensionless ::proportion:: ∈ [0, 100 %], the influence of terrain cover, cropping and management practices to mitigate soil erosion.Numerous soil erosion models implement the C factor in their architecture (e.g.AGNPS, ANSWERS (Beasley et al., 1980) or WA-TEM/SEDEM).The C factor represents the relation between the soil loss in certain agricultural or cover conditions and the erosion that would occur in an area under clean-tilled continuous fallow conditions (Renard et al., 1997).
The cover management factor is probably, among the different erosion factors, the most important (Morgan and Nearing, 2010).However aggressive the climate, whatever the slope or slope length, whatever the soil type, erosion will be slight or absent with a good soil cover.
The calculation of the C factor is difficult due to its dependence on many different parameters such as the land cover, the protection offered by the vegetative canopy cover, the impact of soil moisture on reduction of runoff from lowintensity rainfall or the reduction of soil erosion due to surface roughness.Due to the difficulties in processing all the necessary parameters on a wide scale, values from the literature have been used for calculating the cover management factor in Europe (Angeli et al., 2004;DeCaro, 2007;Diodato et al., 2011;Morgan, 2005;Šúri et al., 2002;Cebecauer and Hofierka, 2008;Wischmeier and Smith, 1978;Bazzoffi, 2007;Marker et al., 2008).The implementation details and related C factor map are available in Bosco and de Rigo (2013).The calculation of the cover management factor has been processed using the third level of the CLC database.Table 2 (from Bosco and de Rigo, 2013) summarizes the applied land cover management values, as derived from the CLC classes.

Stoniness correction factor
Soil stoniness is known to have a strong influence on erosion rates (Poesen et al., 1994).Rock fragments in the soil top layers affect soil water erosion processes in various ways, both directly and indirectly.
In recent years, there has been a growing interest in soils containing considerable amounts of rock fragments (Cerdan et al., 2010).These soils are widespread and in particular are present in the Mediterranean area where they can occupy more than 60 % of the land (Poesen and Lavee, 1994).
The RUSLE model considers stoniness indirectly within the K and the C factor.Regarding the K factor, as already mentioned, only the effects of rock fragments within the soil profile are considered.For the C factor stoniness is taken into account in calculating the Surface Cover Sub-factor.Poesen and Ingelmo-Sanchez (1992) describing the negative relation between rock fragment cover (Rc) and relative interrill sediment yield (IR), found a relation between these two parameters: where b is a coefficient indicating the effectiveness of the rock cover (Rc, ::proportion:: ∈ [0, 1]) in reducing interrill soil loss.
The authors found an experimental value for the coefficient b of 0.02 if the rock fragments are partly embedded in the sealed topsoil and a value of 0.04 if the fragments are placed on the soil surface.These values are close to values reported by Box (1981) and Collinet and Valentin (1984) ranging from 0.0256 to 0.058.Equation ( 3) was applied within the model for calculating the stoniness correction factor, in order to consider the negative effect of rock fragments on soil erosion by water.The coefficient b in Eq. ( 3) was set equal to 0.04 as proposed by Weltz et al. (1987) and Poesen et al. (1994).Unfortunately, detailed information on soil stoniness at the European scale is scarce.The only reliable information at European scale that can be used to derive the stoniness correction factor is the volumetric rock fragment content of the soils contained in the ESGDB database.The volumetric content percentage of rock fragments in the top soil and the cover percentage of rock fragments at the soil surface are two different parameters (Poesen and Lavee, 1994).As a first approximation and due to the limited available data on soil stoniness at the Euro-pean scale, we assumed that the rock fragment cover equals the volumetric rock fragment content as suggested in Govers et al. (2006).The values of volumetric rock fragment content of the soils were used for calculating the Rc parameter in Eq. (3).

Human practices factor
By definition, the P factor is the ratio of soil loss with a specific support practice to the corresponding loss with upslope and downslope tillage (Renard, 1997).It represents how surface and management practices like terracing, stripcropping or contouring affect erosion rates.For areas where there are no support practices or without any information on this aspect, the P factor is set equal to 1. Due to the spatial scale the model was applied and because of the consequent lack of information on land management practices, the P factor was considered everywhere equal to 1.

Plausibility check
The validation of soil erosion estimates at the continental scale is problematic.The common validation procedures are not technically and financially applicable for large spatial extents.Nonetheless, some validation options are still applicable.
To validate the 1 km 2 European map of soil erosion by water, we applied a qualitative method based on visual interpretation (see Fig. 1).The methodology is based on a visual and categorical comparison between modelled and observed erosion rates.Also, country-level aggregated statistics have been compared among the proposed results and pre-existing European erosion maps, providing a further plausibility check.
A procedure employing high-resolution Google Earth (Google Earth.Mountain View, CA: Google Inc., 2009) images and pictures as data for a plausibility check was applied.High-resolution data from Google Earth is a relatively slightly tapped new source of validation data for many research fields.The resolution of the images allows for a visual qualitative estimation of soil erosion phenomena.The coverage of high-resolution Google Earth images has rapidly increased during the last years.
By overlaying the soil erosion by water map of Europe and the selected points in Google Earth, a visual plausibility check, inspired on the erosion/deposition categories for field validation of Warren et al. (2005), and also of Berry et al. (2003) and Kapalanga (2008), was carried out.A buffer of 3 × 3 km 2 around 85 selected points was analysed by the authors, with over 700 cells at 1 km × 1 km.For each cell, a visual assessment relied on high-resolution images.Overall, more than 10 000 visual plausibility checks were made (see Fig. 1).A statistical overview of the results is shown in Fig. 2a and Table 3.The full results of the plausibility check, also containing the comments of the evaluators, are available in Bosco et al. (2014).The cumulated percentage of points with a quality equal to a given class (among five classes from very accurate to inaccurate) is shown.A bootstrap analysis (with 10 000 runs) was performed on the data in order to assess uncertainty.In Fig. 2a, boxplots are associated with each quality class, highlighting the quartiles and the whiskers with 5 and 95 % of occurrence.
The land cover appears as rarely very accurate.However, inaccuracy is mostly moderate.The inaccuracy may be explained by classification errors in the CLC 2006 and its temporal gap with the remote sensing and photographic informa-tion in Google Earth.The e-RUSLE estimates appear as very accurate for more than one in four cells.The quality decrease seems to be at least partially correlated with land cover uncertainty.Nevertheless, the percentage of land cover cells with at least moderately accurate information is higher than the corresponding percentage in erosion estimates.This highlights other nonnegligible sources of uncertainty in the e-RUSLE model.The plausibility check underlines that erosion in 63 % of the cells is accurate or very accurate (this is a conservative assessment including 95 % of bootstrap variability (BV); median is higher than 72 %) and for 83 % (95 % BV) is at least moderately accurate (the median value is 90 %).Completely inaccurate estimates appear as rare.
The model results were also compared with other regional soil erosion assessments carried out by local experts using similar or alternative methodologies (Fig. 2b).The soil erosion maps (Fig. 4) provided by different countries through the EIONET-SOIL network and an extensive data set compiled by Cerdan et al. (2010) were used.
EIONET is a partnership of the European Environment Agency (EEA) involving more than 350 national institutes and approximately 1000 experts.The aim of the network is to collect, organize and disseminate data related to the European environment.Most of the EIONET maps were calculated by applying soil erosion models (mainly the USLE or one of its derivatives) using readily available highly accurate national data sets.The estimates are not seamless between different countries due to the autonomous assessment at country level.The data related to Germany and Poland refer only to agricultural lands.Cerdan et al. (2010) compiled an extensive database of soil erosion rates based on erosion plots in Europe under natural rainfall.The data were gathered by the authors from the literature and personal communication.The data, covering 19 countries, were collected on 85 experimental sites.Cerdan et al. (2010) implemented geostatistical techniques for interpolating the plot results to the whole European territory.
The average estimates and the standard deviation of the soil erosion rates at country level are heterogeneous.The maximum values also corroborate the expectation of an asymmetric distribution (with positive skewness) which may be linked to the distribution of precipitation intensity over large regions (due to varying orography, distance inland, latitude, . . . ) and of factors such as the one describing the cover management (whose range of values spans over 3 orders of magnitude, see Bosco and de Rigo, 2013).Right-tailed distributions are associated with high variance of the quantiles close to the maximum one.Therefore, the greater discrepancy between the values of EIONET-max and e-RUSLEmax when compared to the discrepancy between the corresponding standard deviations is not surprising.This however increases the sensitivity of the average to the modelling stochastic fluctuations of higher quantiles (median estimates would have been more stable: unfortunately, they are not available for EIONET and Cerdan et al. (2010), along with any more detailed quantile distribution).
As underlined by Lovejoy and Schertzer (2007), a growing body of evidence suggests that "geofields are scaling (have power law dependencies on spatial scale, resolution), over wide ranges".This may also suggest the scaling of the topography and other surface fields (e.g.soil erosion) to be "significant because the geophysical processes responsible for them (including orographic, erosional and hydrological processes) are strongly nonlinearly coupled so that the scaling in one is strong evidence for scaling in another" (Lovejoy and Schertzer, 2007).
Applying this hypothesis to the soil erosion distribution over European countries would allow a more articulated comparison to be done between e-RUSLE and the available information (EIONET and Cerdan et al., 2010).This conjecture originated a long debate between lognormality versus power low (Pareto distribution) -among others -for assessing the most appropriate spatial distribution of scaling geofields (Lovejoy and Schertzer, 2007).Here, we assess both the lognormal and generalized Pareto distribution (with the horizontal shift parameter set to 0, since soil erosion lies in R+) fitted by using the mean and standard deviation values.
As shown in Fig. 3, the true quantiles of the e-RUSLE estimations appear as reasonably reconstructed by both distributions for the higher values.Regarding the lower quantiles, the Pareto distribution generally outperforms the lognormal one.For some of the analysed countries (Germany, the Netherlands and Poland), the lower quantiles are poorly reconstructed.However, provided the distance between lower and higher erosion values exceeds 1 order of magnitude, risk assessors and policy makers are likely to focus more on high erosion rates whereas the inaccuracy of low quantiles might locally be relevant.Both distributions highlight the discrepancy of e-RUSLE estimates in Austria and the Netherlands.In Italy, e-RUSLE appears in line with EIONET results while higher than the reconstructed distribution of Cerdan et al. (2010).The low soil erosion values reported by EIONET in the Austrian alpine area are highly correlated with the presence of forest areas and other vegetation types (e.g.see the maps in Kempeneers et al., 2012;Martin et al., 2010).The corresponding values of the USLE cover management factor appear to have been set close to zero.The protective role of vegetation is essential even in the values of C factor we propose.However, in mountainous areas with high precipitation intensity vegetation may not completely prevent erosion.This different assumption may also be corroborated in Austria by considering the composition of its forests.In particular, deciduous tree species (whose erosion protection in winter is different from that of evergreen taxa) show a considerable cover in Austrian forests (e.g.Larix decidua L. and Fagus sylvatica).
The different methods applied in this paper indicate that the modified RUSLE satisfactorily estimates soil erosion rates.Some uncertainties still remain, but considering that most of the uncertainty can be attributed to a low input data quality, it can be expected that the model's results will improve with improving data quality.

Results and discussion
The resulting soil erosion map is shown in Fig. 4. The wellknown role of natural vegetation in mitigating soil erosion (Cerdan et al., 2010;de Rigo and Bosco, 2011;Maetens et al., 2012) may be observed by comparing the presented map with pan-European forest maps (e.g.Kempeneers et al., 2012) and vegetation maps (e.g.Martin et al., 2010, derived from CLC 2006).Brittany, northern Portugal and western Norway show high soil erosion rates that seem to be related to the pattern of the interpolated rainfalls.Especially in northern Portugal and Norway, the positive relationship between erosion rates and slope length (Cerdan et al., 2010), that increases the runoff rates, appears to be enhanced by the intense precipitation pattern.
Since some essential factors in the e-RUSLE (C, K factors and stoniness) are derived from categorical information, the uncertainty associated with the corresponding classification  (with possibly high misclassification discontinuities) may be propagated in the final erosion map (Fig. 4).
For actually improving the estimation of the C factor, which is still a weakness within the model, it may be necessary to develop new techniques or equations improving the collaboration between soil erosion scientists and remote sensing experts.Although good relationships have obtained since the 1980s between C factor and band ratios of NIR to red reflection (Cihlar, 1987;Stephens and Cihlar, 1982) and the studies of the C factor estimation using remote sensing techniques have progressed, improvements are still needed (Zhang et al., 2011).Further analysis with detailed forest types and tree species distribution maps seem to be necessary to increase the accuracy of the C factor (de Rigo and Bosco, 2011;Geißler et al., 2012).
The RUSLE model generally tends to overestimate soil loss (De Jong et al., 1986).Jointly with the weakness of data for some of the model's parameters and the use of coarse spatial data (e.g.E-OBS) along with data having sub-optimal resolution (e.g.SRTM), the application of the model can lead to noticeably uncertain soil erosion rates in certain areas.For example, the lack of appropriate data sets for soil stoniness can locally lead to an overestimation of the erosion rate (e.g.northern Portugal).However, the precise delimitation of such issues is very difficult as field investigations for validation are required, which is beyond the scope of this paper.
Readers should also be aware that because of the scale of the input data, results (soil erosion index) provide an overview of the soil erosion susceptibility in the landscape rather than an accurate estimation of the actual rate of soil erosion for a specific location.The soil erosion indicator adopted in this paper is the estimated soil loss (t ha −1 yr −1 ) as described in detail by Huber et al. (2008).
The classification scheme used for measuring the soil erosion rates (Fig. 4) is based on the one applied in the PE-SERA Project (Kirkby et al., 2008).From eight different categories for measuring soil erosion rates by water we derived seven classes by aggregating the 0.5-1 ha −1 yr −1 and 1-2 t ha −1 yr −1 into one single class 0.5-2 t ha −1 yr −1 for an easier qualitative classification of the soil erosion map.
The threshold above which soil erosion should be regarded as a serious problem is controversial, the soil formation processes and rates differing substantially throughout Europe.In Switzerland, the tolerable soil erosion rate is generally 1 t ha −1 yr −1 , which can increase to 2 t ha −1 yr −1 for some soil types (Schaub and Prasuhn, 1998).Verheijen et al. (2009) report an upper limit of 1.4 t ha −1 yr −1 , while 2 t ha −1 yr −1 is the threshold in Norway for considering the soil loss as tolerable (Srebotnjak et al., 2010).Our visualization using the 2 t ha −1 yr −1 threshold for low erosion rates is intended to help highlight the areas with higher rates of erosion by water.
The map shows that approximately 14 % of the European territory is characterized by a significant soil erosion rate (moderate/high level), which is in line with previous estima-tions that 15-16 % of Europe's land area is affected by soil erosion (Cerdan et al., 2010;EEA, 2003).
It is clear from the map that soil erosion by water is a major problem in many parts of Europe.The average rate of soil erosion by water across the EU-28 is 2.76 t ha −1 yr −1 , excluding Cyprus (CY), Greece (GR) and Malta (MT).Just over 7 % of cultivated land (arable and permanent cropland) in the EU-25 (excluding GR, CY and MT) is estimated to suffer from moderate to severe erosion (i.e.OECD definition of > 11 t ha −1 yr −1 ).This corresponds approximately to the entire area of Bulgaria.In comparison, only 2 % of permanent grasslands and pasture in the EU-25 is estimated to suffer from moderate to severe erosion.This demonstrates the importance of maintaining permanent vegetation cover as a mechanism to combat soil erosion.
Several countries appear as not affected by notable soil erosion.Others, mainly in the southern part of Europe, are particularly susceptible to erosion, showing a soil erosion rate much higher than the European average.However, such values can be misleading: erosion rates in many areas can be considerably higher, even in those countries having a low average.The opposite is also true for countries with higher values.
Considering the European ecozones (based on the FAO ecological zoning FAO, 2001FAO, , 2012)), the mountain system shows a mean soil erosion rate 2-3 times higher than the average (from 4.06 t ha −1 yr −1 of the subtropical mountain system to 7.8 t ha −1 yr −1 of the boreal mountain system).
As already mentioned, there is a high probability for some of the model results to be overestimated.The R factor uncertainty, the coarse resolution of the layers used for calculating the K factor, the CLC 2006 misclassification and the presence of areas having a stoniness value much higher than the value indicated by the underlying soil database (e.g.northern Scotland) can be at the basis of many of the uncertain estimations.Further investigations are suggested on the key role of land cover changes and misclassifications (CORINE Land Cover 2006 is found currently accurate in no more than 69 % of the sampled cells, bootstrap p ≤ 0.05) and of forests and vegetation, especially in mountainous areas with intense precipitation.
Another limit of the proposed approach is that the model does not consider erosion processes such as channel or gully erosion, that locally may cause very high soil losses (Poesen et al., 2003;Mathys et al., 2003;Collinet and Zante, 2005), tillage erosion that in Europe may have a similar rate as soil erosion by water (van Oost et al., 2009) or soil loss due to root and tuber crop harvesting (Poesen et al., 2001;Ruysschaert et al., 2005).
Nevertheless, the proposed architecture is designed to be least data demanding while being able to scale up to the continental scale.The fuzzy ensemble application here exemplified for the rain erosivity layer highlights the e-RUSLE approach, which recommends a modular, semantically aware selection of a multiplicity of available estimates for im-proving the reliability of critical components.This approach would easily allow statistical resampling analysis on the ensemble uncertainty to be performed.Furthermore, a subset of layers are suitable to be used for assessing the potential soil erosion by water under climate change scenarios.

Conclusions
An estimation of pan-European soil erosion by water using a modified version of the RUSLE model has been carried out by merging existing empirical rainfall-erosivity equations within a climatic ensemble model based on the relativedistance similarity and by adding a new factor for better considering soil stoniness.
Highly heterogeneous availability of data and knowledge is typical at the continental scale (de Rigo, 2015).The lack of high-resolution pan-European data sets, the nonuniform availability of validated and least data demanding relationships for erosivity in the different European climates and the limitations inherited from the RUSLE architecture lead to a considerable level of uncertainty.
Some of the individual factors can also be interdependent, which results in an even greater impact on the model results (van der Knijff et al., 1999).
As a consequence, quantitative assessment using the model should not be undertaken without the right awareness.The provided estimates cautiously model the erosion rates in the absence of mitigating management practices -which should be regarded as a main factor for limiting the impact of erosion, as well as land use policies.It is necessary to have in mind that the main objective of the present paper is not the production of a new soil erosion map of Europe but to contribute to soil erosion research introducing new techniques and algorithms for improving soil erosion estimation at the regional scale.
Anyway, our spatially distributed assessment of soil erosion, carried out using the e-RUSLE, even considering all the limits of our approach, helps to identify areas in Europe where to concentrate efforts for preventing soil degradation.The new model can support European policy making due to the adaptability of its architecture.As an example, DG-ESTAT of the European Commission, developing a set of Agri-Environmental indicators in order to track the integration of environmental concerns into the Common Agricultural Policy (CAP), exploited our map for estimating the soil erosion at NUTS3 level (Eurostat, 2013).
Improvement in these erosion estimates lies in better climate, soil and land cover data potentially available from national archives.Also, a better resolution of the digital topographic database could strengthen the obtained results.SRTM data at 30 m resolution are currently being publicly released by NASA.Data are already available for Africa; data for Europe should come in the next months.
In particular, land cover requires frequent updating, because changes in land use have a major impact on erosion rates.There is the potential to do this through the analysis of remotely sensed images.
where all factors are ::nonnegative:: 2 values referring to a given spatial grid cell c and are the average for a certain set of years Y = y 1 , . .., y i , . .., y n Y of the corresponding yearly values: Er c,Y = average annual soil loss (t ha −1 yr −1 ); R c,Y = rainfall erosivity factor (MJ mm ha −1 h −1 yr −1 ); K c,Y = soil erodibility factor (t ha h ha −1 MJ −1 mm −1 ); L c,Y = slope length factor (dimensionless); S c,Y = slope steepness factor (dimensionless); C c,Y = cover management factor (dimensionless); St c,Y = stoniness correction factor (dimensionless); P c,Y = support practice aimed at erosion control (dimensionless).

Table 1 .
Climatic information: auxiliary variables (A) and covariates (B) based on precipitation patterns in a given spatial grid cell c; P day,c and P m,c respectively refer to the precipitation in c for the day [day] and the month [m].The values are computed considering years y in a set of n Y years.In C, a list of empirical equations is provided for estimating the rainfall erosivity (EI 30 ). of days P 10 m,c = 1 nY month(day) ≡ m P day,c P day, c ≥ 10 mm 12 with daily rain ≥ 10 mm [

Figure 1 .
Figure 1.The e-RUSLE application to pan-European scale produces estimates of soil erosion by water which are aggregated at 1 km × 1 km spatial cell.These estimates also depend on the quality of input land cover information.An expert-based plausibility check was performed over random clusters of 3 × 3 km 2 grid cells (Level 1) by integrating Google Earth and Bing higher-resolution information which also includes (Level 2) high-resolution images -Google Street View images; georeferenced crowd-sourced pictures.The plausibility of Corine Land Cover 2006 was also checked to investigate the influence of land cover misclassification (even due to land cover changes occurred since 2006).Image from Bosco et al. (2014); Bing Maps, © 2013 Microsoft Corporation; Google Street View, © 2013 Google Inc., Mountain View, CA.

Figure 3 .
Figure 3. Quantiles (per country) of e-RUSLE estimated soil erosion by water.Boxplots: box with quartiles 25 %, 50 % (red line), 75 %; whiskers with 5 and 95 % quantiles.True quantiles are compared with the reconstructed distribution under the hypothesis of lognormal and extended Pareto distribution.The hypothesized distributions are also compared with the ones corresponding to the available statistics from EIONET and Cerdan et al. (2010).

Table 2 .
The cover management factor is computed by processing the third level of the 15th version of the Corine Land Cover 2006 (CLC) database(European Environment Agency, 2011).A unique C factor value is associated (aiming at describing the European average conditions) with each of the 44 CLC classes.The table is fromBosco and de Rigo (2013), where further information on the computed cover management factor is provided.Due to the impossibility of calculating the C factor for the whole European continent applying the original equations of the RUSLE model, the contribution of stoniness in reducing soil erosion by water was not really considered.For avoiding overestimation in many different areas of the European continent, a new factor has been added for calculating the contribution of stoniness in mitigating soil erosion by water (stoniness correction factor).

Table 3 .
Results of the bootstrap analysis (based on 10 000 runs) of the expert-based plausibility check carried out over the 85 randomly selected clusters of 3 × 3 km 2 grid cells (fromBosco et al., 2014).During the plausibility check more than 700 cells and around 10 000 images were assessed.The table reports the cumulated distribution of cells which are at least moderately accurate.