the Creative Commons Attribution 4.0 License.
A Comparative Analysis of Machine Learning Algorithms for Snowfall Prediction Models in South Korea
Abstract. Heavy snowfall is a natural disaster that causes extensive damage in South Korea, so predicting snowfall and establishing countermeasures are crucial for reducing that damage. In this study, meteorological and geographic data from the past 30 years were collected, and four machine learning algorithms were applied as regression models to predict heavy snowfall: multiple linear regression (MLR), support vector regression (SVR), random forest regressor (RFR), and eXtreme gradient boosting (XGB). Grid search and five-fold cross-validation were used to improve learning performance. Model performance was evaluated by comparing the observed and predicted data across several statistical criteria. The RFR model predicted snowfall more accurately than the other models (R² = 0.64). This result demonstrates the possibility of using the RFR model for heavy snowfall prediction. The study can aid the national government, local governments, and public institutions in developing strategies to respond to heavy snowfall in the fields of facilities, roads, and transportation.
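The tuning procedure named in the abstract (grid search scored by five-fold cross-validation) can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the one-feature ridge regression stands in for the four models of the study, the data are synthetic, and every name (`k_fold_indices`, `fit_ridge_1d`, `grid_search_cv`, the `alphas` grid) is hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form one-feature ridge fit (stand-in model); returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar


def r2_score(y_true, y_pred):
    """Coefficient of determination, the headline criterion in the abstract."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def grid_search_cv(xs, ys, alphas, k=5):
    """Score every candidate alpha by its mean R² over k held-out folds."""
    folds = k_fold_indices(len(xs), k)
    results = {}
    for alpha in alphas:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            slope, intercept = fit_ridge_1d(
                [xs[i] for i in train], [ys[i] for i in train], alpha
            )
            preds = [slope * xs[i] + intercept for i in held_out]
            scores.append(r2_score([ys[i] for i in held_out], preds))
        results[alpha] = mean(scores)
    best = max(results, key=results.get)
    return best, results


# Synthetic stand-in data (NOT the study's observations): one predictor, one target.
rng = random.Random(42)
xs = [rng.uniform(-10, 5) for _ in range(100)]
ys = [-0.8 * x + 3.0 + rng.gauss(0, 1.0) for x in xs]

best_alpha, cv_scores = grid_search_cv(xs, ys, alphas=[0.0, 1.0, 10.0, 100.0])
print("best alpha:", best_alpha)
print("mean CV R² per alpha:", cv_scores)
```

Each candidate is scored only on folds it was not fitted on, so the chosen setting reflects out-of-sample skill rather than training fit. The same loop applies to any model with tunable hyperparameters.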
Status: closed
- RC1: 'Comment on nhess-2022-118', Anonymous Referee #1, 25 Apr 2022
This manuscript compared the skill of four machine learning algorithms, including multiple linear regression (MLR), support vector regression (SVR), random forest regressor (RFR), and eXtreme gradient boosting (XGB) for snowfall estimation in South Korea. Meteorological data (minimum temperature, maximum temperature, precipitation, and relative humidity) from 1991–2020 during the winter season (October to April) collected from the automated synoptic observing system, and geographic data (latitude, longitude, and altitude) were used as the input variables and the measured snow depth was used as the output variable for machine learning model training. The results indicate the RFR performs the best among the four machine learning algorithms with an R2 of 0.64.
The work is interesting; however, its main drawback is that it is too basic and simple. A great deal of similar work has been carried out in previous studies, some of which the authors summarize (Lines 55-97). In the introduction, the authors only mention these previous works and do not point out the problem that remains to be solved in the current work (i.e., the motivation of this study). In other words, if the paper is only a simple imitation of previous studies, it is not innovative.
Other comments:
Line 41-45: add references.
Line 81: where is the reference of “Liang et al. (2015)”?
Section 2.1: only meteorological data were used in the study. Given the limited spatial coverage of the stations, why did the authors not consider other large-scale data such as remote sensing data or model-based (reanalysis) data?
Fig. 1: this figure lacks longitude and latitude information. Moreover, its quality can be improved, e.g., you can use the legend information to represent the stations but do not need to list all the station names.
Line 129: where is the reference of “Ainiyah et al., 2016”?
Line 130: where is the reference of “Mallick et al., 2021”?
Line 140: should be seven inputs and one output?
Fig. 2: isn't the average temperature excluded due to the high collinearity issue?
Line 153: it is better to add a section entitled “2.2 Machine Learning Methods” before “2.2 MLR”. Moreover, there are numerous machine learning methods, why did you select the four methods?
Line 198: delete “Tianqi”.
Line 215: MSE and RMSE play the same role in the evaluation. You can keep only RMSE.
Line 255-256: add units for MAE, MSE, and RMSE.
Table 4: add units for MAE, MSE, and RMSE.
Fig. 5: add units for snowfall.
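The points about MSE, RMSE, and units can be illustrated with a short sketch. The centimetre unit and all values below are assumptions made for illustration and are not taken from the manuscript. If snow depth is in cm, MAE and RMSE are in cm while MSE is in cm². Because RMSE is the square root of MSE, the two always rank models in the same order.

```python
import math


def regression_errors(observed, predicted):
    """Return MAE, MSE, and RMSE for paired observed/predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n   # same unit as the data (cm here)
    mse = sum(r * r for r in residuals) / n    # squared unit (cm^2 here)
    rmse = math.sqrt(mse)                      # back to the data unit (cm here)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}


# Hypothetical snow depths in cm (made-up values, not from the study).
observed_cm = [0.0, 2.0, 5.0, 10.0]
model_a_cm = [1.0, 2.0, 4.0, 8.0]
model_b_cm = [0.0, 5.0, 5.0, 14.0]

errors_a = regression_errors(observed_cm, model_a_cm)
errors_b = regression_errors(observed_cm, model_b_cm)

for name, err in (("A", errors_a), ("B", errors_b)):
    print(f"model {name}: MAE = {err['MAE']:.2f} cm, "
          f"MSE = {err['MSE']:.2f} cm^2, RMSE = {err['RMSE']:.2f} cm")

# The square root is monotonic, so MSE and RMSE give the same model ranking.
same_ranking = (errors_a["MSE"] < errors_b["MSE"]) == (errors_a["RMSE"] < errors_b["RMSE"])
print("MSE and RMSE agree on the ranking:", same_ranking)
```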
Citation: https://doi.org/10.5194/nhess-2022-118-RC1
- AC2: 'Reply on RC1', Sang-Guk Yum, 27 Jun 2022
- AC1: 'Comment on nhess-2022-118', Sang-Guk Yum, 26 Apr 2022
I hope all is well with you. I have found a minor typo in the authors' affiliations in the submitted manuscript. In the current preprint version, the author information reads as follows:
Moon-Soo Song1, Hong-Sik Yun1, Jae-Joon Lee2, and Sang-Guk Yum3
1Post-doctorate, Ph.D., Interdisciplinary Program in Crisis, Disaster and Risk Management, Sungkyunkwan University, Suwon, 16419, Korea
2Professor, Ph.D., School of Civil, Architectural Engineering & Landscape Architecture, Sungkyunkwan University, Suwon, 16419, Korea
3Professor, Ph.D., Department of Civil Engineering, College of Engineering, Gangneung-Wonju National University, Gangneung, 25457, Korea
Please correct the authors' affiliations as follows:
Moon-Soo Song1, Hong-Sik Yun2, Jae-Joon Lee1, and Sang-Guk Yum3
1Post-doctorate, Ph.D., Interdisciplinary Program in Crisis, Disaster and Risk Management, Sungkyunkwan University, Suwon, 16419, Korea
2Professor, Ph.D., School of Civil, Architectural Engineering & Landscape Architecture, Sungkyunkwan University, Suwon, 16419, Korea
3Professor, Ph.D., Department of Civil Engineering, College of Engineering, Gangneung-Wonju National University, Gangneung, 25457, Korea
I hope it can be fixed in this review or revision stage.
Citation: https://doi.org/10.5194/nhess-2022-118-AC1
- RC2: 'Comment on nhess-2022-118', Anonymous Referee #2, 24 May 2022
The authors compared four machine learning algorithms, and derived an optimal model to predict heavy snowfall; however, the method and analysis process used are general and lack novelty.
- In Chapter 1, Introduction, the authors only mention previous research and do not describe the contributions and novelty of this research.
- In Chapter 2, Materials and methods, the authors should explain why the four machine learning algorithms (MLR, SVR, RFR, and XGB) were chosen over the many other machine learning algorithms available.
- In Chapter 4, Discussion and Conclusions, the authors should present a specific discussion of the applicability of the optimal model in connection with the RCP scenario and heavy snowfall disaster management. Furthermore, a few sentences may confuse readers (for example, lines 269-272). The authors need to clarify those sentences and proofread the manuscript before submission.
- Furthermore, additional explanation of panels a, b, c, and d of Figure 5 should be added to make them clear to readers.
- Please add a more relevant and up-to-date literature review.
Citation: https://doi.org/10.5194/nhess-2022-118-RC2
- AC3: 'Reply on RC2', Sang-Guk Yum, 27 Jun 2022
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
874 | 308 | 51 | 1,233 | 34 | 38 |