Data-driven automated predictions of the avalanche danger level for dry-snow conditions in Switzerland
- 1WSL Institute for Snow and Avalanche Research SLF, Davos, Switzerland
- 2Swiss Data Science Center, Zurich, Switzerland
Abstract. Even today, the assessment of avalanche danger is by and large a subjective, yet data-based decision-making process. Human experts analyze heterogeneous data volumes, diverse in scale, and draw conclusions about the avalanche scenario based on their experience. Nowadays, modern machine learning methods and the rise in computing power in combination with physical snow cover modelling open up new possibilities for developing decision support tools for operational avalanche forecasting. Therefore, we developed a fully data-driven approach to predict the regional avalanche danger level, the key component in public avalanche forecasts, for dry-snow conditions in the Swiss Alps. Using a large data set of more than 20 years of meteorological data measured by a network of automated weather stations, which are located at the elevation of potential avalanche starting zones, and snow cover simulations driven with these input weather data, we trained two random forest (RF) classifiers. The first classifier (RF #1) was trained relying on the forecast danger levels published in the avalanche bulletin. Given the uncertainty related to a forecast danger level as a target variable, we trained a second classifier (RF #2) relying on a quality-controlled subset of danger level labels. We optimized the RF classifiers by selecting the best set of input features combining meteorological variables and features extracted from the simulated profiles. The accuracy of the danger level predictions ranged between 74 % and 76 % for RF #1, and between 72 % and 78 % for RF #2, with both models achieving better performance than previously developed methods. To assess the accuracy of the forecast, and thus the quality of our labels, we relied on nowcast assessments of avalanche danger by well-trained observers. The performance of both models was similar to the accuracy of the current experience-based Swiss avalanche forecasts (estimated at 76 %).
The models performed consistently well throughout the Swiss Alps, thus in different climatic regions, albeit with some regional differences. A prototype model based on the RF classifiers was tested in a semi-operational setting by the Swiss avalanche warning service during the winter 2020-2021. The promising results suggest that the model has the potential to become a valuable, supplementary decision support tool for avalanche forecasters when assessing avalanche hazard.
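As a rough illustration of the workflow the abstract describes — a random forest classifier trained on meteorological and simulated snow-cover features, with danger levels as labels — here is a minimal, fully synthetic sketch. All data, feature names, and parameter values are invented for illustration; this is not the paper's actual model, feature set, or configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a station-day feature matrix: a handful of
# meteorological / snow-cover features, labelled with a danger level 1-4.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))        # e.g. new-snow depth, wind speed, stability index, ...
y = rng.integers(1, 5, size=1000)     # forecast danger level used as the label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)
print(f"test accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```

On this random data the accuracy is of course near chance; the sketch only shows the shape of the pipeline (features in, danger-level class out).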
Cristina Pérez-Guillén et al.
Status: closed
- RC1: 'Comment on nhess-2021-341', Pascal Hagenmuller, 04 Jan 2022
Overall comment :
The paper tackles an interesting problem: providing decision-aid tools to avalanche forecasters based on modern simulation tools and large quantities of data. The authors used random forests to reproduce the regional avalanche danger (human forecast) on a four-level scale in dry conditions from meteorological data measured at automatic weather stations and the corresponding simulated snow conditions. They evaluated their algorithm on two winter seasons (2018-2020) and showed that the model is able to predict the danger level chosen by the avalanche forecasters with an accuracy of about 75%. The avalanche danger is not directly measurable and the forecasted avalanche danger cannot be considered a perfect ground truth. This considerably limits the capacity of this approach. However, to assess the quality of this accuracy, the authors elaborated interesting evaluation strategies based on different data sources: the nowcast of the local danger and a subset containing verified regional danger data.
Overall, the paper is very interesting and tackles a relevant problem for the snow and avalanche community. The main methodology remains relatively simple and was already applied to different avalanche hazard data, but the authors provide a deep analysis of their results to understand their algorithm's behavior. In particular, they try to overcome the difficulty that their target variable (the forecasted avalanche danger) is an imperfect ground truth of the avalanche danger. The text is well written and easy to follow. The figures are of high quality. The paper is quite long but a reduction would be at the cost of completeness. My comments mainly concern minor clarifications of the methodology or statements/findings that should be qualified. I have only two major comments that should be addressed before publication.
- In the paper, the algorithm was trained on the winter seasons 1997-1998 to 2017-2018 and evaluated on the latest two winters 2018-2019 and 2019-2020 (line 215-221). The paper findings are thus only based on these two particular years, which may exhibit specific avalanche situations. I do not understand why the authors have not repeated their evaluation by extracting any two successive years from their data set and using the rest of the data for training the random forest. Therefore, I am not completely convinced that some of the presented results (some of them based on tiny differences in the evaluation scores) are perfectly robust given the high inter-annual variability of snow conditions.
- The input meteorological and snow data is not forecasted but derived from measurements at AWS. This is somehow expressed in section 2.1, but it became clear to me only when it is discussed at the end of the paper (line 536-540): the predicted avalanche danger is a nowcast and not a forecast. I think this should be stated more clearly in the abstract and in the methodology, as the reader can easily be confused by « the prediction of the nowcast of the forecast ». Besides, the authors mention in the abstract (line 18-19) that a prototype was used during one winter by the Swiss avalanche warning service. However, there is no further mention of this in the paper (except the same statement in the conclusion). This is not the main scope of the paper, but it is legitimate to ask how the nowcast was used/accepted by the warning service.
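The evaluation scheme suggested in the first major comment — holding out every pair of consecutive winters in turn and training on the rest — can be sketched as follows. The data here is a purely synthetic stand-in (random features, random labels, invented season range); only the season-wise splitting logic is the point.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy stand-in data: one row per station-day, tagged with its winter season.
rng = np.random.default_rng(0)
seasons = np.repeat(np.arange(1998, 2021), 50)   # 23 seasons x 50 samples
X = rng.normal(size=(len(seasons), 5))
y = rng.integers(1, 5, size=len(seasons))        # danger levels 1-4

scores = []
for first in range(1998, 2020):                  # every pair of consecutive seasons
    test_mask = np.isin(seasons, [first, first + 1])
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[~test_mask], y[~test_mask])
    scores.append(accuracy_score(y[test_mask], clf.predict(X[test_mask])))

print(f"accuracy over {len(scores)} two-season folds: "
      f"{np.mean(scores):.2f} +/- {np.std(scores):.2f}")
```

The spread of the fold scores would directly quantify the inter-annual robustness the reviewer asks about.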
Minor comments:
- L.3-4 « based on their experience ». Not only. I guess the forecasters also follow some general guidelines as for instance, picking the right level in the EAWS bavarian matrix.
- L.13 « the accuracy ». This term should be defined in the abstract or replaced by plain text, e.g. « the danger level was correctly predicted in 72 % of all cases ». Besides, the danger scale data is highly unbalanced, therefore accuracy might not be the best indicator of the algorithm performance (as explained and shown later in the paper). For instance, I can reach an accuracy of 60% by always predicting level 3 in Belledonne (France).
- L.14 « better than previously developed methods ». Remove. I think this is a bit slippery to compare to previous methods as the data, the evaluation strategy, etc. may be different.
- L.16-17 « the accuracy of the current experienced-based Swiss avalanche forecasts ». I would say « agreement » instead of accuracy as we cannot certainly consider the local nowcast as a perfect ground truth too.
- L.23 « predicting stability in time and space ». Generally, the avalanche size is supposed to be also a characteristic of the avalanche danger.
- L.28 « expert judgement » and general guidelines.
- L.47 « the only solution is to used avalanche detection systems ». No, it is not the only solution, it is « another » solution. One may also take into account the uncertainty in the human-based observations.
- L.68 « intrinsically noisy ». Could you please develop/explain this statement or give some references.
- L.68 « danger level is the most relevant component for communicating the avalanche hazard ». Replace by « an important component ». Indeed, depending on the target public (e.g. mountain guides), the information pyramid of the avalanche bulletin might be different (e.g. avalanche problems on top).
- L.69 « dry-snow conditions ». It might be not clear to every reader how you define dry-snow conditions. Here, I expected that you set a threshold on liquid water content. That is not the case. As far as I have understood there is always an avalanche danger level for dry snow conditions in the avalanche bulletin but sometimes there is also a wet avalanche danger scale when it is higher than the dry one. Is that correct? Please explain it somewhere in the introduction.
- L.98 and elsewhere « 1700 CET » check with the editor how you should write time in this journal. « 17:00 CET » ?
- Figure 2. It appears that there can be more than one station per forecast region. How do you deal with that?
- L.120 « the reliability, which is the trust … as 0.9». I do not understand the number. Provide precise definition.
- L.147 and 148 « accuracy ». Replace by « agreement ».
- L.163 « level was corrected ». You mean corrected during the morning update ? Clarify.
- L.168 « High: (0.3%) » incorrect parenthesis
- Section 4.1. Which hyper parameters did you optimize ? Number of trees, depth of the trees ? And what are their final values ?
- L.257. Explain with plain text how the feature importance is computed by scikit-learn.
- L.273-274. Why did you chose 30 features since you already reached the performance plateau for 20 features ?
- L.295 « This result highlights the impact of using better-balanced training data in RF #2 and less noisy labels ». I am not convinced by this statement. Indeed, you have already indirectly balanced your data set by weighting the different classes by 1 / frequency.
- L.308-314. I am wondering if the observed bias is not linked to how you weight the different classes. Do you use the same weight for both D and D_tidy even though they do not contain the same frequency of danger levels? Please clarify how it is done.
- L.317 « The performance of both models improved when tested against the best possible test data ». Misleading statement (for RF #2), to be changed. Indeed, you explain correctly that the RF performs best on the set of data it was partially trained on; no link with data quality for RF #2.
- Section 5.3. Reading this section raised a question on the methodology. Is the training done on all stations together (any station-day adds a line to the data set) or is there one RF per station? Clarify in the methods and maybe discuss these two approaches.
- L.404-406. The impact of a slight distribution difference of the danger level on the overall accuracy might be quantified and I doubt that it is the reason for the geographical differences.
- Figure 10. Recall on the figure or in the legend the « sense » of Delta. E.g. Delta_ elevation = station elevation - bulletin elevation limit.
- Table 3. Add the distribution of increasing, equal and decreasing danger level for each level.
- L420-430. Add the unit « m » when giving numbers for Delta_elevation.
- L.455 « intrinsically noisier ». Again, give a justification when you state that earlier in the text.
- L.475 « RF #2 performs better on D_tidy ». Not the point here and not a justification of what is stated just before. RF #2 performs better on D_tidy compared to RF #1 because it is trained on D_tidy (the test subset).
- L.485 « cost sensitive learning ». I am wondering whether this is somehow not equivalent to duplicating the minority classes and the following statement « reflecting the positive impact of balancing the training ratio » seems over-stated (no proof).
- L.527. « phenomenon » . Avalanche danger is not a phenomenon.
- Section 6.5. Clarify if the described studies apply also only to dry snow conditions.
- Conclusion. Mention the fact that for the moment it is only a nowcast tool.
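The 1 / frequency class weighting the comments above refer to (L.295, L.308-314, L.485) corresponds to scikit-learn's "balanced" class-weight mode, w_c = n_samples / (n_classes * n_c). A minimal sketch with purely hypothetical danger-level frequencies (the real class ratios in the paper's data set differ):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical danger-level counts, roughly mimicking the skew of a forecast
# archive (levels 2-3 common, level 4 rare). Not the paper's actual numbers.
y = np.array([1] * 20 + [2] * 40 + [3] * 35 + [4] * 5)

# "balanced" mode: w_c = n_samples / (n_classes * n_c),
# i.e. proportional to 1 / class frequency.
weights = compute_class_weight("balanced", classes=np.array([1, 2, 3, 4]), y=y)
for cls, w in zip([1, 2, 3, 4], weights):
    print(cls, round(w, 3))   # rare classes receive large weights
```

Passing `class_weight="balanced"` directly to `RandomForestClassifier` applies these same weights inside the impurity computation, which is why the reviewer notes that the training set is already implicitly balanced.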
Pascal Hagenmuller
- AC1: 'Reply on RC1', Cristina Pérez-Guillén, 27 Jan 2022
- RC2: 'Comment on nhess-2021-341', Karsten Müller, 05 Jan 2022
Review of Data-driven automated predictions of the avalanche danger level for dry-snow conditions in Switzerland
### General comments
The paper presents the development of a machine-learning model capable of assessing the avalanche danger level based on input data from automatic weather stations and a snowpack model in the Swiss Alps. The models are trained using a large data set of forecasted danger levels and a filtered subset of "re-assessed" danger levels from local nowcasts.
Compared to previous studies, the presented paper uses a much larger and well-refined data set. The trained machine-learning models achieve performances comparable to human forecasters throughout the region of the Swiss Alps. Previous studies either had poorer performance or were more limited in their spatial extent. The topic is of scientific interest and value for avalanche researchers, forecasting services and stakeholders. The topic is within the scope of NHESS.
The authors present their study in a clear manner. The manuscript is well written and structured.
The abstract provides a good summary of the goals, methods and conclusions of the presented study.
Tables and figures are of high quality and readability, contributing to the good overall impression of the paper. The language is precise and understandable. The paper is long. However, it combines the fields of avalanche forecasting and machine learning using the Random Forest algorithm and needs to (and does) explain both concepts to a reader potentially unfamiliar with one or both of them. I therefore only have minor suggestions on how to shorten it - see specific comments.
It is not clear from this paper how you apply or intend to apply the model in a forecasting setting, since it is trained and run on input data measured and modeled at an automatic weather station. I also miss a discussion on how to apply the models in an operational setting and the expected benefits in supporting the human avalanche forecaster - see specific comments.
### Specific comments
l-171 Your models are trained on station data. That means they require a measurement and a subsequent SNOWPACK model output to be applied. Thus, RF#1 and RF#2 as described in this paper only provide a hindcast or nowcast.
In order to be used operationally, your models need to be run with input data from weather prediction models and the corresponding output from SNOWPACK at the location of IMIS stations. As far as I can see this is not addressed in your paper. Please add or reference information on how this is or could be done. I expect that the transition from the spatial resolution of the weather model to the station site (especially in mountainous terrain) poses some scaling issues which might have an effect on performance/accuracy. This should be addressed in the discussion, e.g. in connection to section 6.3.
l-207 Why do you only filter by elevation and not by aspect? I assume you do not filter by aspect because most (all used?) IMIS stations are on a flat field and thus cannot be assigned an aspect. Please add a short explanation.
l-216 It seems legitimate to use the most recent winter seasons as test data. However, it should be ensured and stated that these do not exhibit any special avalanche conditions not or barely seen during previous winters. Have you considered/tested a random draw from all data with an equal amount from each month as an alternative? If yes, what was the effect on model accuracy?
l-275 "Note that this last step..." - what do you mean by this sentence? It is not clear to me to which "last step" you refer and what the effect on model performance is. Could you clarify?
l-355 While the section "Exemplary case studies" is useful for the reader in order to get an overview over potential model outcomes in relation to published avalanche forecasts, it is not necessary for the understanding of the paper. Considering that the paper is already very long, I suggest to move this section and Fig.8 to the Appendix or provide it as supplementary material.
l-328 What is the "daily averaged accuracy"? Is it the average of the predictions from RF#1 and RF#2 or is it the average of the results from all stations within a forecasting region with regard to Dforecast for that region?
l-405 The last two sentences in this section should be revised. I understand it such that performance was lower because the danger levels (1 and 3) - that have highest prediction performance - are less common in these regions. However, I had to read it several times to understand what you mean.
It would also be interesting to know if you could identify common traits for stations/sites that had a high accuracy (e.g. >0.8): specific elevations, typical snow or weather conditions?
l-540 see comment for l-171
l-573 Your features include several stability indices and information on weak layers. Does that mean the provided stability information from SNOWPACK is not good enough to detect/predict persistent weak layers or the stability related to them?
l-591 Could you discuss the intended operational application of the models and their main benefits to the human forecaster in more depth? I could imagine that the models would be useful in deciding when to increase or decrease the danger level and to assess the spatial or temporal extent of a given danger level.
l-602 It would also be interesting to know in the discussion what your expectations on model performance are. I would argue that your results are as good as they can get. You state that a human forecaster has an average accuracy of 76%. You use the assessment by the human forecaster as your labels. Thus, the model inherits human mistakes and biases. For RF#2 these biases are somewhat corrected for, or at least replaced by biases or mistakes in human-assessed nowcasts.
l-603 It is not clear from your paper that your model "predicts" avalanche danger. I read it such that your model can be used to validate or quality control a published forecast once data has been measured at an IMIS station.
l-610 see comment for l-573
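Both referees flag that plain accuracy is a weak indicator on a heavily imbalanced danger-level distribution (cf. RC1's always-predict-3 remark at L.13 and the regional accuracy discussion at l-405). A minimal illustration with hypothetical class frequencies, contrasting plain accuracy with scikit-learn's balanced accuracy:

```python
from collections import Counter
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical, heavily imbalanced danger-level sequence: level 3 dominates.
y_true = [3] * 60 + [2] * 25 + [1] * 10 + [4] * 5
y_pred = [3] * 100                      # trivial "always predict 3" baseline

print(Counter(y_true))                  # shows the class imbalance
print(accuracy_score(y_true, y_pred))            # 0.6 -- looks decent
print(balanced_accuracy_score(y_true, y_pred))   # 0.25 -- exposes the baseline
```

Balanced accuracy averages the per-class recall, so a constant predictor scores 1/n_classes regardless of how skewed the labels are, which makes it a more honest headline metric here.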
### Technical comments
l-141 "...which jointly account for more than 75% of the cases." Change to "which jointly account for 77% of the cases."
Fig.3 ideally, the y-axes of the DL proportion [%] plots for Dforecast would have the same maximum value - currently these are 50% and 40%.
l-311 "...the two models...", missing "s"
l-318 remove one "particularly"
l-422 spelling "Eq. 1"
l-463 Split this sentence in two.
l-474 "...only the 10%..." - remove "the"
l-581 Change to "..., predicting high probabilities for both danger levels."
l-587 remove one "the" at the end of the line
- AC2: 'Reply on RC2', Cristina Pérez-Guillén, 27 Jan 2022