A Comparative Analysis of Machine Learning Algorithms for Snowfall Prediction Models in South Korea

Song, Moon-Soo; Yun, Hong-Sik; Lee, Jae-Joon; Yum, Sang-Guk

doi:https://doi.org/10.5194/nhess-2022-118

22 Apr 2022

Status: this discussion paper is a preprint. It has been under review for the journal Natural Hazards and Earth System Sciences (NHESS). The manuscript was not accepted for further review after discussion.

Abstract. Heavy snowfall is a natural disaster that causes extensive damage in South Korea. Therefore, it is crucial to predict snowfall occurrence and establish countermeasures to reduce the damage caused by heavy snowfall. In this study, meteorological and geographic data from the past 30 years were collected, and four machine learning algorithms were applied as regression models to predict heavy snowfall: multiple linear regression (MLR), support vector regression (SVR), random forest regressor (RFR), and eXtreme gradient boosting (XGB). Grid search and five-fold cross-validation were used to improve learning performance. Model performance was evaluated by comparing the observed and predicted data, and the four algorithms were then compared against each other. Across the statistical criteria considered, the RFR model predicted snowfall more accurately than the other models (R² = 0.64). This result demonstrates the possibility of using the RFR model for heavy snowfall prediction. The proposed study can aid the government, local governments, and public institutions in developing strategies to respond to heavy snowfall in the fields of facilities, roads, and transportation.
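The abstract names grid search and five-fold cross-validation but this page does not show how they were set up. The following is a minimal, standard-library-only sketch of the general technique, not the authors' implementation. It makes several assumptions: a toy one-feature ridge regression stands in for the four models, the data are synthetic, and the function names, penalty grid, and seeds are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(x, y, lam):
    """Fit y ~ a + b*x with an L2 penalty lam on the slope (toy stand-in model)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / (sxx + lam)
    return my - b * mx, b


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cv_score(x, y, lam, folds):
    """Mean held-out R2 over the folds for one hyperparameter value."""
    scores = []
    for held_out in folds:
        test = set(held_out)
        train = [i for i in range(len(x)) if i not in test]
        a, b = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], lam)
        y_hat = [a + b * x[i] for i in held_out]
        scores.append(r2_score([y[i] for i in held_out], y_hat))
    return mean(scores)


def grid_search(x, y, grid, k=5):
    """Score every candidate on the same k folds and return the best one."""
    folds = k_fold_indices(len(x), k)
    results = {lam: cv_score(x, y, lam, folds) for lam in grid}
    best = max(results, key=results.get)
    return best, results


# Synthetic data (assumption): a noisy linear relation, not snowfall observations.
rng = random.Random(42)
x = [i / 10 for i in range(100)]
y = [2.0 * xi + 1.0 + rng.gauss(0, 1) for xi in x]

best_lam, cv_results = grid_search(x, y, grid=[0.0, 1.0, 10.0, 100.0, 1000.0])
print(best_lam, round(cv_results[best_lam], 3))
```

Every candidate is scored on the same five folds, so differences in the mean R² reflect the hyperparameter rather than the split. A final evaluation on data held out from the whole search would still be needed to report an unbiased score.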

Received: 08 Apr 2022 – Discussion started: 22 Apr 2022

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1: 'Comment on nhess-2022-118', Anonymous Referee #1, 25 Apr 2022

This manuscript compared the skill of four machine learning algorithms, including multiple linear regression (MLR), support vector regression (SVR), random forest regressor (RFR), and eXtreme gradient boosting (XGB) for snowfall estimation in South Korea. Meteorological data (minimum temperature, maximum temperature, precipitation, and relative humidity) from 1991–2020 during the winter season (October to April) collected from the automated synoptic observing system, and geographic data (latitude, longitude, and altitude) were used as the input variables and the measured snow depth was used as the output variable for machine learning model training. The results indicate the RFR performs the best among the four machine learning algorithms with an R² of 0.64.

The work is interesting; however, its main drawback is that it is too basic and simple. A great deal of similar work has been carried out in previous studies, some of which the authors have summarized (Line 55-97). In the introduction, the authors only mention such previous work and do not point out the problem that remains to be solved in the current work (i.e., the motivation of this study). In other words, if the paper is only a simple imitation of previous studies, it is not innovative.

Other comments:

Line 41-45: add references.

Line 81: where is the reference of “Liang et al. (2015)”?

Section 2.1: only meteorological data were used in the study. Given the limited spatial coverage of the stations, why did the authors not consider other large-scale data, such as remote sensing data or model-based (reanalysis) data?

Fig. 1: this figure lacks longitude and latitude information. Moreover, its quality can be improved, e.g., you can use the legend information to represent the stations but do not need to list all the station names.

Line 129: where is the reference of “Ainiyah et al., 2016”?

Line 130: where is the reference of “Mallick et al., 2021”?

Line 140: should be seven inputs and one output?

Fig. 2: isn't the average temperature excluded due to the high collinearity issue?

Line 153: it is better to add a section entitled “2.2 Machine Learning Methods” before “2.2 MLR”. Moreover, there are numerous machine learning methods, why did you select the four methods?

Line 198: delete “Tianqi”.

Line 215: MSE and RMSE play the same role in the evaluation. You can keep only RMSE.

Line 255-256: add units for MAE, MSE, and RMSE.

Table 4: add units for MAE, MSE, and RMSE.

Fig. 5: add units for snowfall.
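As an illustration of the comments on MSE, RMSE, and units, the sketch below computes the four criteria on made-up numbers. It is a minimal example under stated assumptions: the function name `regression_metrics` and the sample values are hypothetical, and centimetres are assumed as the snow-depth unit only to show how units propagate.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE, and R2 for paired observations and predictions."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n   # same unit as the target (cm assumed)
    mse = sum(e * e for e in errors) / n    # squared unit (cm^2 assumed)
    rmse = math.sqrt(mse)                   # back in the target unit (cm assumed)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - (mse * n) / ss_tot           # dimensionless
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up values for illustration only, not data from the manuscript.
observed = [0.0, 2.0, 5.0, 10.0]
predicted = [1.0, 2.0, 4.0, 8.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE is the square root of MSE, the two always rank models identically, which is why reporting both adds no information. RMSE has the advantage of sharing the unit of the observed variable.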

Citation: https://doi.org/10.5194/nhess-2022-118-RC1
- AC2: 'Reply on RC1', Sang-Guk Yum, 27 Jun 2022
  
  Thanks for your constructive comments on this manuscript. We have attached the response letter.
  
  Citation: https://doi.org/10.5194/nhess-2022-118-AC2
AC1: 'Comment on nhess-2022-118', Sang-Guk Yum, 26 Apr 2022

I hope all is well with you. In the submitted manuscript, I have found a minor typo in the authors' affiliations. In the current preprint version, the authors' information is as follows:

Moon-Soo Song¹, Hong-Sik Yun¹, Jae-Joon Lee², and Sang-Guk Yum³

¹Post-doctorate, Ph.D., Interdisciplinary Program in Crisis, Disaster and Risk Management, Sungkyunkwan University, Suwon, 16419, Korea

²Professor, Ph.D., School of Civil, Architectural Engineering & Landscape Architecture, Sungkyunkwan University, Suwon, 16419, Korea

³Professor, Ph.D., Department of Civil Engineering, College of Engineering, Gangneung-Wonju National University, Gangneung, 25457, Korea

Please correct the authors' affiliations as follows:

Moon-Soo Song¹, Hong-Sik Yun², Jae-Joon Lee¹, and Sang-Guk Yum³

¹Post-doctorate, Ph.D., Interdisciplinary Program in Crisis, Disaster and Risk Management, Sungkyunkwan University, Suwon, 16419, Korea

²Professor, Ph.D., School of Civil, Architectural Engineering & Landscape Architecture, Sungkyunkwan University, Suwon, 16419, Korea

³Professor, Ph.D., Department of Civil Engineering, College of Engineering, Gangneung-Wonju National University, Gangneung, 25457, Korea

I hope it can be fixed in this review or revision stage.

Citation: https://doi.org/10.5194/nhess-2022-118-AC1
RC2: 'Comment on nhess-2022-118', Anonymous Referee #2, 24 May 2022

The authors compared four machine learning algorithms, and derived an optimal model to predict heavy snowfall; however, the method and analysis process used are general and lack novelty.

In Chapter 1. Introduction, the authors only mentioned previous research and did not describe what distinguishes this research, such as its contributions and novelty.

In Chapter 2. Materials and methods, the authors should have explained why the four machine learning algorithms (MLR, SVR, RFR, and XGB) were chosen over the many other machine learning algorithms available.

In Chapter 4. Discussion and Conclusions, a specific discussion of the applicability of the optimal model, in connection with the RCP scenario and heavy snowfall disaster management, should be presented. Furthermore, a few sentences may confuse readers (for example, lines 269-272). The authors need to clarify those sentences and proofread the manuscript before submission.

Furthermore, additional explanation of a, b, c, and d of Figure 5 should be added to make them clear to readers.

Please add a more relevant and up-to-date literature review.

Citation: https://doi.org/10.5194/nhess-2022-118-RC2
- AC3: 'Reply on RC2', Sang-Guk Yum, 27 Jun 2022
  
  We have attached the response letter. Thanks for your valuable comments on the manuscript.
  
  Citation: https://doi.org/10.5194/nhess-2022-118-AC3

Viewed

Total article views: 1,420 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
982	381	57	1,420	64	90

Views and downloads (calculated since 22 Apr 2022)

Month	HTML	PDF	XML	Total
Apr 2022	142	25	7	174
May 2022	142	30	2	174
Jun 2022	46	10	0	56
Jul 2022	27	6	0	33
Aug 2022	27	14	2	43
Sep 2022	18	8	0	26
Oct 2022	26	11	2	39
Nov 2022	19	12	0	31
Dec 2022	26	4	0	30
Jan 2023	22	6	0	28
Feb 2023	24	18	1	43
Mar 2023	19	12	1	32
Apr 2023	19	7	0	26
May 2023	25	11	1	37
Jun 2023	20	5	1	26
Jul 2023	14	15	2	31
Aug 2023	11	8	1	20
Sep 2023	22	3	0	25
Oct 2023	18	15	1	34
Nov 2023	5	4	1	10
Dec 2023	19	4	2	25
Jan 2024	12	10	0	22
Feb 2024	20	12	2	34
Mar 2024	13	8	0	21
Apr 2024	23	6	5	34
May 2024	12	2	7	21
Jun 2024	25	2	2	29
Jul 2024	11	11	3	25
Aug 2024	17	8	6	31
Sep 2024	34	8	2	44
Oct 2024	13	8	0	21
Nov 2024	10	8	0	18
Dec 2024	5	4	1	10
Jan 2025	10	11	0	21
Feb 2025	12	5	0	17
Mar 2025	16	11	1	28
Apr 2025	12	11	1	24
May 2025	17	9	2	28
Jun 2025	20	11	1	32
Jul 2025	9	8	0	17

Viewed (geographical distribution)

Total article views: 1,353 (including HTML, PDF, and XML), of which 1,353 have a defined geography and 0 are of unknown origin.

Cited

Latest update: 24 Jul 2025

Short summary

In this study, emerging engineering techniques such as machine learning and deep learning were applied to predict heavy snowfall in the Korean Peninsula. More specifically, the predictive model using the RFR algorithm had the best performance, based on a comparison between the observed and predicted data. In addition, the ensemble models (RFR and XGB) performed better than the single regression models.


Cited

1 citation as recorded by Crossref.