Data-efficient Random Forest model for avalanche forecasting

Fast downslope release of snow (an avalanche) is a serious hazard to people living in snow-bound mountains. Released snow can gain enough momentum on its downslope path to kill people, uproot trees, dislodge rocks and destroy buildings. The threat is reduced directly by building control structures that mechanically support the snowpack and reduce or deflect avalanche flow, but on large terrains it is economically infeasible to protect every high-risk site this way. Predicting and avoiding avalanches is therefore the only feasible way to reduce the threat, yet snow stability data sufficient for accurate forecasting is generally unavailable and difficult to collect. Forecasters instead infer snow stability from their knowledge of local weather, terrain and sparsely available snowpack observations. This inference is vulnerable to human bias, so machine learning models are used to find patterns in past data and generate outputs that quantify and minimise the uncertainty in the forecasting process. These techniques, however, require long historical records of avalanches, which are difficult to obtain. In this paper we propose a data-efficient Random Forest model to address this problem. The model generates a descriptive forecast that exposes its reasoning and reveals patterns which are difficult to observe manually. It advances the field by being inexpensive and convenient for operational forecasting owing to its data efficiency, ease of automation and ability to explain its decisions.


Introduction
In snow-bound areas avalanches cause loss of life and property worldwide; avalanche deaths are estimated at 250 per year (Schweizer et al., 2015). Government and private agencies are funded to reduce the avalanche threat to important activities and property, e.g. road and rail transport, construction and army movement. This effort and research funding has led to the development of several techniques for reducing avalanche threat. For a specific site (< 1 km²), the threat is reduced by building control structures, modifying nearby terrain, or using explosives to trigger avalanches in a controlled way (Fuchs et al., 2007).
Using such techniques at every risk site over a large area is economically infeasible, so avalanche forecasting is used to plan passive risk-reduction measures, and individuals can use the forecast to plan their activities in snow-bound areas. Avalanche forecasting aims to identify the location of snowpack weaknesses and their triggering risk. Observing snowpack stability at high spatio-temporal resolution over large terrain is difficult, so stability at most risk sites is deduced from secondary observable data, e.g. meteorological and snowpack parameters from a similar representative site, terrain parameters of the site, and expected changes to the snowpack caused by weather. Snow stability varies strongly with terrain features. The deduction of snow stability from secondary data has not been mathematically formulated, so forecasters rely on their intuition of local terrain and snowpack patterns to estimate stability and collect more information to minimise uncertainty (LaChapelle, 1980; Schweizer et al., 2008). Numerical and statistical models are important tools for adding objectivity to this process. Numerical models simulate the snowpack and weather processes that contribute significantly to avalanche hazard. CROCUS and SNOWPACK give accurate snow-profile simulations at the microscale (< 1 km²) for sites where meteorological data is available (Vionnet et al., 2012; Lehning et al., 1999). Meteorological sensors cannot be set up at every risk site, so interpolated meteorological data from numerical weather models such as SAFRAN is used as input to SNOWPACK (Lehning et al., 1999). The SNOWPACK output tells forecasters about slopes where snowpack stability is changing due to numerically modelled snowpack processes: weak-layer formation due to temperature gradients, surface or deep wetting, compaction and refreezing (Lehning et al., 1999).
A limitation of this model chain is its inability to account for some contributory processes, e.g. wind loading, and its accuracy can be seriously affected by errors in the interpolated meteorological data.
Statistical models take input from a specific site and treat it as representative of conditions over a larger region (mesoscale, ~10 km²) to estimate the avalanche threat. These models link weather and snowpack variables to avalanche threat using avalanche occurrences from historical data (Buser, 2009; Gassner et al., 2001). Information from multiple (possibly redundant) sources, e.g. wind-loading indexes, local terrain features, location-specific snowfall patterns and numerical snowpack simulations, can be included in these models (Pozdnoukhov et al., 2011); this makes them more robust to errors in individual parameters than numerical models. Forecasts of numerical models can be improved by feeding their results into statistical models.
Machine learning has been used for tasks where procedures cannot be precisely formulated but humans perform well, e.g. handwriting and speech recognition (Liang and Hu, 2015). Machine learning models are not used to automate the avalanche forecasting process; instead they support the forecaster's judgement by providing information from past data relevant to the forecasted day. In this paper we build a machine learning model using the random forest technique. The model gives interpretable data-mining outputs and is convenient for operational applications due to its data efficiency and automation. The nearest neighbours model is a frequently used statistical model for avalanche forecasting (Buser, 2009; Gassner et al., 2001; Singh et al., 2014). It estimates the threat using a set of historical days most similar to the forecasted day. It cannot directly model the inductive reasoning process used by avalanche forecasters, which may cause data inefficiency. The models in (Buser, 2009; Gassner et al., 2001; Singh et al., 2014) required at least 7 years of training data.
Decision trees and expert systems have been used to model complex patterns which are missed by nearest neighbours (Rosenthal et al., 2001; Schweizer and Föhn, 1996; Hendrikx et al., 2014). These techniques can use expert knowledge by modelling known forecasting rules. Unfortunately, individual trees are sensitive to small changes in the data and unable to learn complex decision boundaries without overfitting (Hastie et al., 2009). Expert systems can be designed to satisfy all the criteria listed below, but considerable human effort and expertise is required to build them. Pozdnoukhov et al. (2008) use support vector machines (SVMs) on high-dimensional feature vectors for avalanche forecasting. They use feature vectors from multiple data sources for geospatial forecasting (Pozdnoukhov et al., 2011); these vectors include several features representing slope, elevation, snow drift, snow stability and meteorological parameters.
An SVM may be difficult for a forecaster to interpret; in (Pozdnoukhov et al., 2011) it is proposed to explore the support vectors for interpreting model outputs. Some features used in the authors' implementation currently require manual effort to record.
A model satisfying the following criteria can be made operational at a low cost:
1. It can be trained to give acceptable performance using a small amount of historical data. This makes it useful for regions where long and reliable historical records are unavailable.
2. It can forecast using only data collected from automated sensors, so data with high spatio-temporal resolution can be used from a grid of sensors.
3. It can explain the reasoning used to arrive at its conclusion and gives numerical estimates justifying that reasoning. The explanation should not require significant forecasting experience to interpret.
4. It gives forecast skill scores acceptable for operational use: high-risk days should be detected with a low rate of false positives.
We use an ensemble learning technique, Random Forest, in which an ensemble of decision trees gives the prediction. A random forest ensemble can learn complex decision boundaries and is resistant to overfitting (Breiman, 2001). Decision trees have an interpretable output, and trees from the ensemble have been used to build a descriptive forecast. Our model satisfies all four criteria above, so it overcomes some of the problems encountered in operational use of the models we surveyed. The model can be deployed at sites where data for three winters is available. In future work we will explore transfer learning techniques to reduce the data requirement further.

Random Forest
Random forest is an ensemble learning method (Opitz and Maclin, 1999). Individual decision trees have weak performance due to overfitting and high variance; a random forest uses a collection of decision trees to improve the prediction. Each tree is trained on a random dataset derived from the training data using a process called bagging, which keeps the individual trees largely uncorrelated (Breiman, 1996). The output of the collection for a data point is the mean output of the trees at that point. The ensemble model is partially interpretable and depends on few parameters. Some useful properties of the model are (Breiman and Friedman, 1984):
1. A method for ranking feature importance.
2. Robustness to outliers and missing values.
3. The ability to handle both discrete and continuous features without special pre-processing.
4. A training process that can be highly parallelised by training trees on separate threads.
5. State-of-the-art accuracy on various tasks (Rogez et al., 2008).

Decision Trees
A decision tree describes a flow-chart-like process for classifying data points. Each non-leaf node in the tree defines a test on the data point; each leaf node defines a classification. To classify a point we apply the test at the root node; depending on the result we move to one of its child nodes. If that child is a non-leaf node, the same process is repeated to move to a subsequent child node. This continues until a leaf node is reached, and that leaf node gives the classification of the data point.
We demonstrate this through an example (cf. Figure 1). Suppose the parameters for some day are as shown in the figure. The test at the root is logically equivalent to testing whether there were any avalanches in the past two days; since there were none, we move to the left child node as directed by the arrows. We apply the test at the left child node, which returns false since snowfall was 48 cm in the past 24 hours, so we move to its right child node, which classifies the day as moderate risk.
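The walk described above can be sketched as a small nested structure. The node tests and the 30 cm threshold below are illustrative stand-ins, not the actual tree of Figure 1; only the 48 cm snowfall and the absence of recent avalanches come from the example in the text.

```python
# A toy decision tree mirroring the classification walk described above.
# Node tests and thresholds are illustrative, not the tree in Figure 1.
tree = {
    "test": lambda day: day["avalanches_past_2_days"] > 0,
    "true": {"label": "high risk"},                       # test passes
    "false": {                                            # test fails: left child
        "test": lambda day: day["snowfall_24h_cm"] < 30,  # false for 48 cm
        "true": {"label": "low risk"},
        "false": {"label": "moderate risk"},
    },
}

def classify(node, day):
    """Walk from the root to a leaf, applying each node's test."""
    while "label" not in node:
        node = node["true"] if node["test"](day) else node["false"]
    return node["label"]

day = {"avalanches_past_2_days": 0, "snowfall_24h_cm": 48}
print(classify(tree, day))  # moderate risk
```

The same walk applies to any tree of the ensemble; only the tests at the nodes differ.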

Training Decision Trees
Training algorithms for decision trees proceed by splitting the training dataset on a feature value such that the resulting datasets are more homogeneous in their target variable. This splitting continues recursively on the split datasets until a termination criterion is reached which specifies that a dataset is sufficiently homogeneous. The recursive splitting process naturally defines the decision tree: each node corresponds to a dataset, and the split recorded at a node is the split chosen by the training process.
The algorithm starts by splitting the entire training dataset, writes the split into the root node and adds two child nodes (left child C1 and right child C2) to the root. The split datasets correspond to the child nodes and are split further, adding more child nodes to C1 and C2. This process is repeated recursively until sufficiently homogeneous datasets are reached. In this paper we use the C4.5 algorithm as implemented in Scikit-learn, a Python machine learning library (Bressert, 2012; Quinlan, 1993). C4.5 splits on the attribute with the highest normalised information gain. In later sections we use the Gini coefficient to measure the homogeneity of target variables. The Gini coefficient of observations y_1, y_2, …, y_n is defined by:

G(y_1, y_2, …, y_n) = ( Σ_i Σ_j |y_i − y_j| ) / ( 2 Σ_i Σ_j y_j ) = ( Σ_i Σ_j |y_i − y_j| ) / ( 2n Σ_j y_j )   Eq (1)

If all values are almost equal, G approaches 0; if a few values dominate all others, G approaches 1.
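Eq (1) can be checked numerically. A minimal sketch (the normalisation with the factor 2 is assumed so that G stays in [0, 1], matching the limits stated above):

```python
def gini(values):
    """Gini coefficient of Eq (1): sum_ij |y_i - y_j| / (2 * n * sum_j y_j)."""
    n = len(values)
    total = sum(values)
    if total == 0:
        return 0.0
    pairwise = sum(abs(yi - yj) for yi in values for yj in values)
    return pairwise / (2 * n * total)

print(gini([5, 5, 5, 5]))        # 0.0 (all values equal)
print(gini([0] * 99 + [100]))    # 0.99 (one value dominates, G -> 1)
```

The quadratic pairwise sum is fine for illustration; for long series the coefficient is usually computed from sorted values in O(n log n).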

Random Forest Training
Trees are trained on subspaces of the dataset. A subspace is formed by drawing a random sample with replacement from the training set and then selecting a random subset of features from the drawn sample. To build the ensemble, a user-specified number of decision trees are trained and stored in memory, each tree on an independent random subspace of the training data.

The input parameters used for forecasting the risk on a day are summarised in Table 1 and Table 2. While the parameters in Table 1 can be observed in an automated mode, those in Table 2 are derived from them, together with avalanche occurrence data, to represent the events of the past few days. The prediction for a day can be made automatically if the avalanche occurrences of past days are known; these can be detected automatically using infrasonic sensors and radars.
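The subspace construction described above (bootstrap rows, then a random feature subset) can be sketched as follows; the function name and toy data are illustrative, not part of the paper's implementation:

```python
import random

def random_subspace(X, y, n_features):
    """Draw one bootstrap sample of rows (with replacement), then keep a
    random subset of feature columns: the subspace one tree trains on."""
    n_rows = len(X)
    row_idx = [random.randrange(n_rows) for _ in range(n_rows)]
    feat_idx = sorted(random.sample(range(len(X[0])), n_features))
    X_sub = [[X[i][j] for j in feat_idx] for i in row_idx]
    y_sub = [y[i] for i in row_idx]
    return X_sub, y_sub, feat_idx

# Each of the N trees in the ensemble gets its own independent subspace:
X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
y = [0, 1, 0]
X_sub, y_sub, feats = random_subspace(X, y, n_features=2)
```

In practice this is what scikit-learn's RandomForestClassifier does internally via its bootstrap and max_features options.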

Model performance and parameters
We define the confusion matrix of a classifier C on a labelled dataset D by its entries:

a_ij = number of examples of class i in D classified as class j by C   Eq (2)

where class 1 denotes avalanche days and class 0 non-avalanche days, so a_11 counts correctly warned avalanche days (hits), a_10 missed avalanche days, a_01 false alarms and a_00 correctly identified non-avalanche days. This matrix is used for the performance analysis of classifiers. From it we derive the probability of detection (POD), the false alarm rate (FAR) and the Heidke skill score:

HSS = 2(a_11 a_00 − a_10 a_01) / [ (a_11 + a_01)(a_01 + a_00) + (a_11 + a_10)(a_10 + a_00) ]

The dataset is heavily skewed towards non-avalanche days [Table 4]. Training a classifier on this dataset will bias it towards forecasting non-avalanche days. One solution is to use cost-corrected classifiers with a higher cost assigned to minority examples. Another is to discard majority-class data randomly, or to synthetically generate more minority-class data so that the class sizes become equal; this approach can lead to an overfitted classifier when datasets are highly skewed.
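The measures above follow directly from the matrix entries. A minimal sketch; the FAR here follows this paper's usage (fraction of non-avalanche days incorrectly warned), and the example counts echo the 29 avalanche and 52 non-avalanche days of the season discussed later:

```python
def skill_scores(a11, a10, a01, a00):
    """POD, FAR and Heidke skill score from a 2x2 confusion matrix.
    a11: avalanche days correctly warned (hits)
    a10: avalanche days missed
    a01: non-avalanche days warned (false alarms)
    a00: non-avalanche days correctly not warned"""
    pod = a11 / (a11 + a10)
    # FAR as used in this paper: false alarms over all non-avalanche days.
    far = a01 / (a01 + a00)
    hss = (2 * (a11 * a00 - a10 * a01) /
           ((a11 + a01) * (a01 + a00) + (a11 + a10) * (a10 + a00)))
    return pod, far, hss

pod, far, hss = skill_scores(a11=21, a10=8, a01=15, a00=37)
print(pod, far, hss)  # POD ~ 0.72, FAR ~ 0.29
```

Note that some of the avalanche literature instead defines FAR as false alarms over all warnings; the two differ and should not be compared directly.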
To reduce skewness we remove from the training and testing datasets all days on which avalanches are unlikely due to a lack of sufficient standing snow: we discard examples where the snow height is less than 50 cm [Table 4]. This filtering step removes poor examples which cause overfitting of trees. See Table 4 for the justification of the threshold choice and summary statistics of the dataset. When training the decision trees of the ensemble, the classes are weighted by their proportion in the filtered dataset.

Parameter Tuning
We tune the following model parameters:
D = maximum depth allowed for a tree added to the ensemble.
N = number of trees used in the ensemble.
The model output is the estimated probability of an avalanche given the input parameters for the day defined in Table 1. To get a binary classification and validate the model using the scores in Table 3, a threshold is applied to this probability. We use the statistics from Table 5 and Table 4 to find an acceptable FAR vs POD trade-off for operational forecasting. For example, a FAR of 0.3 and a POD of 0.7 imply that of 52 non-avalanche days approximately 15 were misclassified as avalanche days, and of 29 avalanche days approximately 21 were classified correctly [refer to Table 6]. The model therefore detected 21 of the 29 avalanche days in the entire season by warning on 36 days. Table 6 gives such operational performance metrics of the model for various values of FAR.
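The FAR vs POD trade-off can be explored by sweeping the probability threshold. A minimal sketch; the probabilities and labels below are made up for illustration:

```python
def pod_far(probs, labels, threshold):
    """Binarise predicted avalanche probabilities at `threshold` and return
    (POD, FAR): POD = detected avalanche days / all avalanche days,
    FAR = warned non-avalanche days / all non-avalanche days."""
    warn = [p >= threshold for p in probs]
    hits   = sum(w and l for w, l in zip(warn, labels))
    misses = sum((not w) and l for w, l in zip(warn, labels))
    fa     = sum(w and (not l) for w, l in zip(warn, labels))
    cn     = sum((not w) and (not l) for w, l in zip(warn, labels))
    return hits / (hits + misses), fa / (fa + cn)

# Lowering the threshold raises POD at the cost of a higher FAR:
probs  = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
print(pod_far(probs, labels, 0.5))  # POD 2/3, FAR 1/3
print(pod_far(probs, labels, 0.3))  # POD 1.0, FAR 2/3
```

Each threshold yields one (FAR, POD) point; an operational threshold is chosen from this curve as described above.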

Comparisons with similar models
We compared the model with similar models on its skill scores, selection of features, data efficiency and descriptive forecasting [Table 7]. Our model uses less data, gives informative descriptive forecasts and has acceptable skill. Sufficient historical avalanche data is unavailable for most places, so a data-efficient model is required for forecasting.

Descriptive Forecasting
A descriptive forecast includes information for analysing the avalanche threat. Examples of descriptive forecasts from some frequently used models: (a) nearest neighbours models list similar days and their attributes (Singh et al., 2014; Purves et al., 2003); (b) expert systems list the applicable rules (Schweizer and Föhn, 1996).

Decision tree visualisation and results
A path from the root to a leaf node in a decision tree can be interpreted as a sequence of conditions, and forecasting rules can be defined by these condition sequences. The descriptive forecast is generated by visualising the trees predicting the highest avalanche threat probability. The visualised trees give rule-based forecasting logic and the strength of its predictive value. In our experiments the trees show non-trivial logic which may be difficult to discover otherwise.
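Turning root-to-leaf paths into readable rules can be sketched as below. The tree structure and field names are hypothetical; only the thresholds (79.5, 134.5, 5.0) echo the rule discussed in the following example:

```python
def extract_rules(node, path=()):
    """Yield (conditions, leaf label) for every root-to-leaf path.
    Nodes are dicts: leaves carry 'label'; internal nodes carry
    'feature', 'threshold' and 'true'/'false' children."""
    if "label" in node:
        yield list(path), node["label"]
        return
    cond = f"{node['feature']} > {node['threshold']}"
    yield from extract_rules(node["true"], path + (cond,))
    yield from extract_rules(node["false"], path + (f"NOT {cond}",))

# Hypothetical tree echoing the heuristic discussed in the text:
tree = {"feature": "standing_snow", "threshold": 79.5,
        "true": {"feature": "snowfall_10d", "threshold": 134.5,
                 "true": {"feature": "wind_10d", "threshold": 5.0,
                          "true": {"label": "high risk"},
                          "false": {"label": "moderate risk"}},
                 "false": {"label": "low risk"}},
        "false": {"label": "low risk"}}

for conditions, label in extract_rules(tree):
    print(" AND ".join(conditions), "=>", label)
```

For trees trained with scikit-learn, the same information can be obtained with sklearn.tree.export_text or plot_tree.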
We show here a sample descriptive forecast for the BG axis on 1-Feb-17, a day classified as high risk with a predicted avalanche probability of 0.54. Ten decision trees which predicted probabilities greater than or equal to 0.9 were visualised. Most of the visualised trees show that snowfall in the past 10 days together with high wind speed caused the risk. The tree in Figure 9 demonstrates this reasoning pattern. Following the reasoning path from the root to the red leaf node we obtain the following heuristic, satisfied on the present day: if standing snow > 79.5 and snowfall in the past 10 days > 134.5 and average wind speed in the past 10 days > 5.0, then the risk of avalanche is high (> 90%). Such reasoning is known to experienced forecasters; in this case the model gives numerical estimates for that intuition. Trees can also suggest patterns which are difficult for forecasters to observe manually. Figure 10 demonstrates such a pattern, visualised for the descriptive forecast of 28-Mar-17. The temperature bounds show that snow melt may be causing the high threat. We check this hypothesis by additional data mining: statistics from a filtered database containing only days which satisfy these bounds are compared with the same statistics from the original database [Table 8]. Other features correlated with the temperature bounds may be causing the actual threat. To rule this out we made a simple univariate analysis, examining the variables with significantly different distributions in the filtered and original sets. Of these we believe only snow height could lead to significant changes in hazard levels. To analyse the effect of snow height we built another filtered dataset from the original data containing only days where the snow height is greater than the mean snow height of the temperature-filtered data. Statistics from these three datasets are compared in Table 8: snowfall causes significantly higher risk when the temperature bounds are satisfied.
In our analysis the contribution of temperature was much higher than that of standing snow height [see Table 8]. The higher triggering risk after snowfall at these temperatures is likely due to the formation of melt-freeze crusts and the higher density of snow at higher temperatures. When the rule is satisfied and no snowfall occurs, the risk is still higher than on days when the mean snow height is much greater, which suggests significant melting instability.
The temperature bounds can similarly be used in database queries to retrieve other important information from past data: past high-risk slopes which triggered under such temperature conditions, the sizes of the avalanches triggered, and the stability and snow-profile data collected under similar temperature conditions can all be searched from the filtered databases.

Discussion
The model gives acceptable forecast accuracy for the triggering risk from fresh snowfall or other natural causes. In 51 warnings it detected 25 out of an average of 29 avalanche days per winter [Table 6]. On average half of the warnings of natural triggering are true; this precision is reasonable given the difficulty of predicting natural avalanches, and the false alarms can indicate untriggered snow instability. The descriptive forecast can provide more information about the nature of these instabilities and their probable locations.
The rule seems to predict melt avalanches; such a simple yet effective rule, in terms of temperature only, is difficult for a forecaster to find. The data mining results in Table 8 show that snowfall when the rule is satisfied leads to a higher triggering probability. This is due to a combination of factors: the formation of melt-freeze crusts and the higher density of fresh snow at higher temperatures. The fresh snow bonds poorly with the crust, and due to its higher density it is also more likely to slip from the crust.
When the rule is satisfied and no snowfall occurs, the risk is still higher than on days when the mean snow height is much greater, again suggesting significant melting instability. Such complicated reasoning was captured by the model without any significant feature-engineering effort. One explanation of the data efficiency is that decision trees model such reasoning while the ensemble accounts for the different causes of avalanches.
The variables responsible for avalanche threat differ between situations, so in avalanche datasets the important variables causing the threat vary across the sample space. Nearest neighbour models cannot adapt to this variation in feature importance: they use the same distance metric to forecast in every neighbourhood of the sample space. The trees in the ensemble consider different features important, so this method can account for the differences in the important variables.
The trees whose splitting features match the important features for the input day give higher probability outputs than the other trees.
Prediction is made using only parameters which can be measured automatically, so such models can use data from a dense sensor grid to improve performance. If additional parameters are required to improve the forecasting process, only a short record of these new parameters is needed to train an updated model. Data efficiency therefore implies that the economic returns from setting up and updating a sensor grid can be obtained in a reasonable time period. We expect the following approaches to be promising for further improving data efficiency:
1. Use of transfer learning techniques.
2. Inclusion of numerical snowpack simulation data into the input features.
3. Tuning of the algorithm for avalanche forecasting: changing the bagging and feature-splitting procedures to account for the differing importance of various situations in forecasting.

Conclusions
The requirement for long-term training data is a significant obstacle to the operational use of machine learning models for avalanche forecasting. Data efficiency can reduce the cost of training a new model for a location, or of retraining an existing model to use different data. This paper demonstrates the use of the Random Forest technique for avalanche forecasting on a dataset from the BG axis. The model shows significantly higher data efficiency than the current operational models surveyed. This is likely due to the ability of decision trees to model specific avalanche forecasting knowledge and of the ensemble to model the stochastic properties of the data. The model gives acceptable forecast skill while using a small amount of training data (three winters). Future work can explore reducing the data requirements further by using transfer learning techniques and tuning specialised for avalanche forecasting. The data used by the model for prediction on a day can be collected automatically, so forecasts can be generated automatically.
High volumes of automatically collected data from a dense sensor grid can be used for generating forecasts. Data efficiency and automated forecasting make the model economical for operational forecasting applications.
Descriptive forecasting by visualising decision trees can give the reasons for avalanche threat and support forecasters' judgement by providing numerical estimates and a qualitative analysis of the situation. The variable combinations causing the threat, and the risk probabilities given the variable ranges, can be read directly from the decision trees. Further data mining can be done using these ranges and variables to find high-risk slopes and the types of instability.

Code Availability
Not permitted

Data Availability
Not permitted

Sample Availability
Not applicable to our work

Video Supplement
None

Appendices
None; the tables in the text are attached after the references.

Author contributions
Manesh Chawla: implementation and validation of the model; data exploration and visualisation; writing the manuscript.
Amreek Singh: help with writing Sections 2 and 3; suggestions for model improvement; review of the manuscript.

Competing Interests
The authors were employed by the Snow and Avalanche Study Establishment, a laboratory of the Defence Research and Development Organisation.

Disclaimer
The information contained in this paper is true and complete to the best of our knowledge. The authors disclaim any liability in connection with the use of this information.

Acknowledgements
We are thankful to field workers of Snow and Avalanche Study Establishment for collecting avalanche data and maintaining the field equipment in difficult weather conditions.
The model code was written using the scikit-learn Python library; all graphs were made using Matplotlib.