Determination of Heavy Rain Damage-Triggering Rainfall Criteria Based on Data Mining
Abstract. Heavy rainfall over the Korean peninsula is caused mainly by typhoons and localized heavy rainfall, and it leads to severe flooding and landslide risk. The KMA (Korea Meteorological Administration) has criteria for issuing a Heavy Rain Advisory (HRA) that apply over the whole peninsula, even though each region or local government differs in its capability of disaster prevention (CDP) and in its rainfall and heavy rain damage characteristics. The aim of this study is therefore to suggest a methodology for determining Heavy rain Damage-Triggering Rainfall Criteria (HD-TRC) by which an HRA can be issued in each region. The study regions are the local governments in Gyeonggi-province, Seoul-city, and Incheon-city in Korea. HD-TRC are determined from rainfall and heavy rain damage data. Data from 2005 to 2018 are collected, and the data for the flood (rainy) season from June to September are extracted. The rainfall data are provided by the KMA, and the heavy rain damage data during disaster periods (DPs) are obtained from the statistical yearbook of natural disaster (SYND) published every year by MOIS (Ministry of Interior and Safety). The training set of 2005 to 2014 is used to obtain HD-TRC, and the test set of 2015 to 2018 is used to evaluate three criteria: HD-TRC, Advanced HD-TRC, and HRA. The analysis for determining the best criteria is performed through the following data mining processes. (1) Maximum rainfalls for durations of 1 to 24 hr (X1) and antecedent rainfalls of 1 to 7 days (X2) are obtained and used as independent variables. Heavy rain damage data are divided into damage days (1) and no-damage days (0) and used as the dependent variable (Y). Principal component analysis (PCA) is performed, and the principal components (PCs) of the independent variables are obtained as PC.X1 and PC.X2. A Risk Index (RI) is then defined as PC.X1 + PC.X2, and the RIs become the candidates for HD-TRC. The predicted damage (Ŷ) is obtained from the RIs, a confusion matrix is constructed, and the best HD-TRC is determined by evaluating classification performance. (2) However, DPs in which damage occurred also contain ‘abnormal days’ (ADs), that is, days within a DP with no rainfall or only a small rainfall amount, too small to cause damage. ADs are defined as days with rainfall below 20 mm, and five cases of ADs are defined in this study: 0, 0–5, 0–10, 0–15, and 0–20 mm. For each case, we count the total days in all DPs and the days that are ADs. The ratio of ADs to total days in the DPs is taken as the occurrence probability, or prior probability (PP), of ADs for that case, giving five PPs. The average AD for each case is also obtained and defined as the risk range (RR). We then define Advanced HD-TRC for each case using Monte Carlo simulation (MCS) linked with PP, RR, and HD-TRC. Thus, HD-TRC is determined from RI, and Advanced HD-TRC is determined for each case from PP and RR. Finally, the three criteria (HD-TRC, Advanced HD-TRC, and HRA) are compared through performance evaluation on the test set. Advanced HD-TRC shows the best performance, so the suggested methodology can be used for regional heavy rain damage warning information.
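
Process (1) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: each of PC.X1 and PC.X2 is taken to be the first principal component of the standardized variables, a day is predicted as a damage day when its RI is at or above the candidate value, and the F1 score stands in for the unspecified classification performance measure. The function names (`first_pc`, `risk_index`, `confusion`, `best_threshold`) are hypothetical.

```python
import numpy as np

def first_pc(X):
    """Scores on the first principal component of the standardized columns of X.

    Assumes every column has non-zero variance.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    axis = Vt[0]
    # The sign of a principal axis is arbitrary; orient it so that
    # larger rainfall gives a larger score (assumed convention).
    if axis.sum() < 0:
        axis = -axis
    return Z @ axis

def risk_index(X1, X2):
    """RI = PC.X1 + PC.X2, using the first PC of each group (assumption)."""
    return first_pc(X1) + first_pc(X2)

def confusion(y, y_hat):
    """Confusion-matrix counts (TP, FP, FN, TN) for 0/1 arrays."""
    tp = int(np.sum((y == 1) & (y_hat == 1)))
    fp = int(np.sum((y == 0) & (y_hat == 1)))
    fn = int(np.sum((y == 1) & (y_hat == 0)))
    tn = int(np.sum((y == 0) & (y_hat == 0)))
    return tp, fp, fn, tn

def best_threshold(ri, y):
    """Try every observed RI as a candidate criterion; keep the best F1.

    F1 is an assumed stand-in for the performance measure.
    """
    best_c, best_f1 = None, -1.0
    for c in np.unique(ri):
        y_hat = (ri >= c).astype(int)  # predicted damage for this candidate
        tp, fp, fn, _ = confusion(y, y_hat)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_c, best_f1 = float(c), f1
    return best_c, best_f1
```

Here `X1` would be a days-by-durations array of maximum rainfalls, `X2` a days-by-antecedent-windows array, and `y` the 0/1 damage labels for the training period; the returned candidate would then be applied unchanged to the test period.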