Data assimilation with an improved particle filter and its application in the TRIGRS landslide model

Particle filters have become a popular algorithm in data assimilation for their ability to handle nonlinear or non-Gaussian state-space models, but they have significant disadvantages. In this work, an improved particle filter algorithm is proposed. To overcome the particle degeneration and improve particles’ efficiency, the processes of particle resampling and particle transfer are updated. In this improved algorithm, particle propagation and the resampling method are ameliorated. The new particle filter is applied to the Lorenz-63 model, and its feasibility and effectiveness are verified using only 20 particles. The root-mean-square difference (RMSD) of estimations converges to stable when there are more than 20 particles. Finally, we choose a peristaltic landslide model and carry out an assimilation experiment of 20 days. Results show that the estimations of states can effectively correct the running offset of the model and the RMSD is convergent after 3 days of assimilation.


Introduction
Mountainous areas all over the world suffer frequent landslide disasters.Works of landslide monitoring, analysis and forecasting are crucial.Many numerical modeling methods of slope evolution, such as discontinuous deformation analysis (DDA) (Shi, 1992;Jing et al., 2001;Ma et al., 2011) and the distinct element methods (DEMs) (Lorig and Hobbs, 1990;Marcato et al., 2007;Li et al., 2012), have been proposed and developed recently.Iverson proposed a mathematical model that uses Richards' equation to evaluate effects of landslides in response to rainfall infiltration (Iverson, 2000).The Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability (TRIGRS) model is a raster-based model and depends on time of transient rainfall infiltration (Baum et al., 2008).Jiang adopted the ensemble Kalman filter landslide movement model in relation to hydrological factors, which introduced data assimilation (DA) to landslide evaluation (Jiang et al., 2016).
DA is a common approach to estimating optimal states in dynamic systems.With DA algorithms and operators, DA merges different scales of observations into dynamic models to take advantage of all the information.Many DA algorithms have been developed and improved in recent years, and particle filters (PFs) are a popular algorithm for their ability to handle nonlinear and non-Gaussian distributed models (Arulampalam et al., 2002;Moradkhani et al., 2005).The application and improvement of PFs has been researched recently in DA and other fields.Salamon and Feyen (2009) applied the residual resampling particle filter (RRPF) to assess parameter, precipitation and predictive uncertainty in a rainfall-runoff model.Thirel et al. (2013) assimilated snow-covered areas in physical distributed hydrological models and MODIS satellite data to improve pan-European flood forecasts.Mattern et al. (2013) carried out assimilation experiments for a three-dimensional biological ocean model and satellite observations and verified the feasibility of biological state estimation with sequential importance resampling (SIR) for realistic models.
However, large computational complexity and particle degradation or collapse are still disadvantages of PFs.To solve these problems, some resampling algorithms have been proposed.One improvement is adding an item related to observations to make the proposal density dependent on future observations; accordingly most particles can situate into the range of observation error (van Leeuwen, 2010).This method can achieve good results using only 10-20 particles in high-dimensional assimilation experiments.But the number of key particles is reduced when the system variance is larger than the observed variance, and the values of added items are uncertain.Another improvement is to replace the duplicating process by generating a Halton sequence in residual resampling (Zhang et al., 2013).The disordered particle sets are turned into ordered sets and too few particles can hardly describe the posterior probability density function (PDF) better.
In Sect.2, a new resampling approach is proposed to improve the above method, maintaining both particle diversity and efficiency.The new algorithm formula and implementation process are listed in the text.To predict the safety factor of peristaltic landslide, a simulation experiment, applied to the Lorenz-63 model using different numbers of particles, ranging between 10 and 200, is explained in Sect.3, which demonstrates that the new method shows efficiency and sensitivity to the number of particles.Finally, a rainfall infiltration landslide model case is analyzed.We choose an experimental landslide model with a 10×10 size grid, of which the side length of each grid cell is 10 m.The improved assimilation algorithm is applied to the TRIGRS program to evaluate the change of factor of safety (FS) in the experimental model.

Improvements to residual resampling particle filtering
In sequential importance sampling, the state vector is represented by a set of particles: where x is the state vector with initial PDF p(x 0 ), k is the subscript of time steps, ε k−1 is system noise with zero mean at step k−1 and f (•) is the model operator.Initial N particles are sampled from p(x 0 ).The observation equation is where z k is the observation vector at step k, and h(•) is the observation operator.Weights of particles are calculated by Eq. ( 3) and normalized to obtain w i k by Eq. ( 4) where i is the index of particle number, p(z k |x i k ) is the likelihood of observation and q( Residual resampling is a way to solve particle degeneracy, which is an unavoidable problem in PFs.To keep most particles effective, low-weight particles are removed and highweight particles are duplicated.But with recursive progress the particle sets can hardly represent the prior PDF due to declining particle diversity.
Some improvements to the residual resampling algorithm are proposed in this paper.Firstly, in the process of particle transferring, we choose where J k is a coefficient like the "gain" in an extended Kalman filter: in which A k , B k are the linearization parameters of f (•) and h(•), respectively: D k/k is the estimation variance of state x k at step k.This process is equal to translating particles close to observations.But the value of J k is difficult to determine because the variance of state estimation To simplify the calculation, suppose that the translated particles are a series of virtual observations about the state at step k.Write the particle set as and replace D k−1/k−1 with the variance of particles.To keep the value of D k−1/k−1 unchanged before and after translation, we choose the posterior particles at step k − 1: Secondly, using the method of Zhang et al. (2013) to compute accumulative copy times (ACTs), each parent particle with high weight regenerates a set of new particles.Differently, instead of duplicating or generating a Halton sequence, it generates a series of normally distributed particles: where ACT i is the ACT of the ith particle, and the mean and variance are related to the value of the parent.Accordingly, the resampled particle set is composed of some different particle sets that obey normal distribution.Assuming that the j th particle of x i k is written as x ij k , the formula (3) can be written as Briefly, the improved RRPF in this section can be implemented by the following steps: -Step 1. Draw initial particles {x i 0 } from the prior PDF p(x 0 ).
-Step 2. Compute the mean and variance of posterior particles at step k-1.
-Step 3. Using the new method in this section, compute the gains of particles.
-Step 4. Transfer the particles close to the observation: -Step 5.In residual resampling, each particle generates a set of normally distributed progeny particles, and all progeny sets make up the resampled particle set: when ACT i = 0, X iACT i k is an empty set.
a measure to assess the accuracy of calculation is the root mean square difference (RMSD), which is defined as where T is the period of assimilation, and Xt and X obs t are the assimilated value and the observation of state at time t, respectively.
3 Application to the Lorenz-63 model We choose the Lorenz-63 model as an example to test the improved algorithm (Baines, 2008).
where the constants σ , ρ and β are system parameters proportional to the Prandtl number, Rayleigh number and certain physical dimensions of the layer itself.Parameters are given by dt = 0.01, σ = 10, ρ = 28, β = 8/3, the observation error σ obs = √ 2 and the model transmission error based on time interval σ mod = 2 √ t.Initialize the filter with the starting point, which is set to (x 0 , y 0 , z 0 ) = (1.50887,−1.531271, 25.46091).The truth is obtained by the formula of the model recursively.Observations are generated from the truth by adding a disturbance every 40 steps, with 1000 recurring steps, and assimilating the observation with the model when observation exists at the current step and moving to the next step when there is no observation.
Figure 1 shows the results of the x component using the new PF with 20 particles.Note that the new PF procedure is close to the truth with much fewer particles, which is more efficient than the standard PF procedure with hundreds of particles.Compute the confidence interval with the 95 % level using the posterior particles every step.Figure 2 shows that the intervals contain observations at almost all the steps at which observations exist.That means particle sets after translation are closer to observations and true states.The evolution of all particles is displayed in Fig. 3, in which most particles are very close to observations except for several ones at moments when the state changed obviously.The RMSD sequence is shown in Fig. 4; it tends to be stable when the number of particles is more than 20.This means the improved algorithm only needs no fewer than 20 particles.In this experiment, the FS is applied to assimilation.It is calculated as follows: in which c is soil cohesion, α is slope angle, φ is soil friction angle, ϕ is the groundwater pressure head depending on depth Z and time t, γ W is groundwater unit weight, and γ S is soil unit weight at saturation.An example of the 10 × 10 grid TRIGRS model is set to be the background, and each grid is a square with a length  The value is relatively high when the particle number is fewer than 20 and tends to be stable when higher than 20. of 10 m.The simulated observations are generated from the FS by adding a disturbance with normal distribution N(0.2, 0.3).Due to the difficulties of determining the parameter ϕ, the soil friction angle and its high sensitivity to results, we now generate a set of particles ϕ i k to form ϕ, in which k and i are indices of step and particle number, respectively.The input model variance of ϕ is 2 and observation variance of FS is 0.3.At each step, ϕ and FS will be updated, and the updated parameters continue to participate in the next step of operation as initial parameters.The number of particles is set to 20 in the PF program.Figure 5 shows the model running results and the assimilated results of the FS running for 5 days, 10 days, 15 days and 20 days.In the model running results, the value of FS is smaller and decreases rapidly, while in the assimilated results the change is relatively gentle.To evaluate the distribution variation in ϕ, we propose that the estimation of ϕ is calculated as formula in which w i k is calculated using Eqs.( 18) and ( 19). Figure 6 shows the distribution variation in ϕ running for 5 days, 10 days, 15 days and 20 days.Actually, the estimation of ϕ uses the same method and particles of the estimation of FS. Figure 6 shows the distribution variation in ϕ running for 5 days, 10 days, 15 days and 20 days.The change of ϕ estimation in a single cell is illustrated in Fig. 7, considering the middle unit, grid cell (5, 5).To assess estimations of all grid cells, the RMSD of the whole grid of points to measure the estimated error is modified to where N p is the total number of grid points, and i and j are the indices of the row and column number, respectively.The RMSD curve with assimilating days is shown in Fig. 8, which suggests the value is large on the first 2 days of initialization, fluctuates in the next days and is steady when there are no observations.

Conclusion and discussion
The problems of particle degeneration and efficient expression of posterior PDF are long-term difficulties that affect the performance of particle filters.Many resampling methods can improve effectiveness of particles, but they still need a large number of samples resulting in a large amount of computation.
In this study, we propose two approaches to improve the particle filter process.Firstly, for the problem of particle degeneration, new Gaussian-distributed offspring particles are generated for each mother particle.This avoids particle duplication and maintains particle diversity.Secondly, in order to improve the propagating efficiency of a priori particles into a posteriori particles, an additional item is added that is similar to the Kalman gain at the step of particle propagation, which greatly reduces the number of particles required.It uses only dozens of particles to achieve good results.A simulation experiment of the Lorenz-63 model is carried out to validate the feasibility of these methods.The TRIGRS landslide model is first proposed for application to the assimilation system.Results show that the assimilation process can make the estimation close to observations, which proves the feasibility of applying the improved particle filter to the landslide model.
However, some disadvantages are still present.Grid cells are independent of each other in TRIGRS, and this leads to the FS estimations possibly being greater than the actual values.Therefore, the FS estimations only provide a reference for the actual values.The experiment needs improvement.

Figure 1 .
Figure 1.Results of the new PF for the Lorenz-63 model of the x component.The red crosses are observations, the black line is the true state and the blue line represents the new PF results.

Figure 2 .
Figure 2. The 95 % confidence interval computed by posterior particles.The green dashed lines denote the upper and lower limits of the interval and the red crosses are observations.

Figure 3 .
Figure 3.The evolution of posterior particles in time.The green dashed lines show the traces of all particles; the red crosses denote the observations.

Figure 4 .
Figure4.RMSD of the estimation with respect to particle numbers.The value is relatively high when the particle number is fewer than 20 and tends to be stable when higher than 20.

Figure 5 .
Figure 5. Model results and assimilation results of FS.The maps in the first row are the model running for 5, 10, 15 and 20 days, and those in the second row are the assimilation results.The horizontal and vertical coordinates in each graph are the grid numbers of each cell.

Figure 6 .
Figure 6.The distribution variation in groundwater pressure head (ϕ) with assimilated time.The horizontal and vertical coordinates in each graph are the grid numbers of each cell.

Figure 7 .
Figure 7.The changing line of the groundwater pressure head (ϕ) estimation of grid cell (5, 5) with assimilation time.The value grows with the evolution of the landslide.

Figure 8 .
Figure 8. RMSD line of all grids depending on assimilation time.The TRIGRS model is assimilated with observations in the first 20 days, and results of days 21-30 are model running results without observations assimilated.