Assessment of Flood Susceptibility Using Support Vector Machine in the Belt and Road Region

Floods occur frequently all over the world. During 2000-2020, nearly half (44.9%) of global floods occurred in the Belt and Road region because of its complex geology, topography, and climate. However, the degree of flood susceptibility of each sub-region and country in the Belt and Road region remains unclear. Here, based on 11 flood condition factors, the support vector machine (SVM) model was used to generate a flood susceptibility map. Then, we introduced the flood susceptibility comprehensive index (FSCI) for the first time to quantify the flood susceptibility levels of the sub-regions and countries in the Belt and Road region. The results reveal the following. (1) The SVM model used in this study has excellent accuracy: the AUC values of the success-rate curve and prediction-rate curve were both higher than 0.9 (0.917 and 0.934, respectively). (2) The areas with the highest and high flood susceptibility account for 12.22% and 9.57% of the total study area, respectively, and these areas are mainly located in the southeastern part of Eastern Asia, almost the entirety of Southeast Asia, and South Asia. (3) Of the seven sub-regions in the Belt and Road region, Southeast Asia is most susceptible to flooding and has the highest FSCI (4.49), followed by South Asia. (4) Of the 66 countries in this region, 16 have the highest flood susceptibility level (normalized FSCI > 0.8) and 5 countries
https://doi.org/10.5194/nhess-2021-80 Preprint. Discussion started: 25 May 2021 © Author(s) 2021. CC BY 4.0 License.

Owing to improvements in recent years, the use of machine learning methods in flood susceptibility assessment has become increasingly common. Random forest (RF), artificial neural networks (ANN) (Li et al., 2013), support vector machines (SVM) (Tehrany et al., 2015b), and decision tree (DT) (Tehrany et al., 2013) are popular machine learning algorithms. These machine learning methods can solve non-linear problems better, but their accuracy is extremely dependent on the quality of the sample points. Each of the above four methods has inherent advantages and disadvantages. Thus, at present, there is no consensus on which type of model should be applied to a given scenario and which model is best (Khosravi et al., 2018a). Taking into account the characteristics of the study area and the availability of data, in this study, the SVM model was used to generate a flood susceptibility map. The excellent generalization of the SVM (Bahram et al., 2019) was one reason for selecting this method. In addition, as a machine learning method, the SVM not only avoids the subjective determination of weights that occurs in MCDA methods, but it also does not require a large number of model parameters compared with physically based models.
For the analysis of the flood susceptibility results, most studies (Zhang and Chen, 2019; Hu et al., 2017b) only semi-quantitatively assessed the proportion and distribution of the flood susceptibility classes. These assessments cannot provide a fully quantitative representation of the degree of flood susceptibility in a region, so more in-depth studies are needed to quantify the flood susceptibility level of each region. To this end, in this study, the flood susceptibility comprehensive index (FSCI) is introduced to quantify the flood susceptibility level of each country and sub-region in the study area, based on the concept and calculation method of the ecological vulnerability synthesis index (EVSI) (Tian, 2018).
In this study, we divided the Belt and Road region into 627,454 grids of 0.1°×0.1° and used each grid as a research unit to assess the flood susceptibility. Then, a flood susceptibility map of the study area was generated using the SVM model. Based on this, the main purposes of the current study are as follows: (1) to analyze the spatial pattern of the areas prone to flooding in the Belt and Road region; and (2) to evaluate the flood susceptibility levels of countries and sub-regions in the Belt and Road region by calculating the FSCI.

Study area
To strengthen the ties among the three continents of Asia, Europe, and Africa and to form a human community with a shared destiny, the Belt and Road Initiative was proposed by China in 2015 (Jiang et al., 2018). The Belt and Road region (Fig. 1) spans three continents (Asia, Africa, and Europe) and encompasses 66 countries (including Kashmir). It contains a population of about 4.4 billion people and has a combined gross domestic product (GDP) of 2.3 billion dollars, accounting for 63% and 29% of the world totals, respectively (Zhang, 2018). The terrain of the study area is highly undulating, with altitudes ranging from −438 m to 8,728 m. In addition, the landforms are complex, including mountains, hills, valleys, plateaus, and several other types of terrain (Yu et al., 2019). There are eight types of climate in this region, including both monsoon and continental climate characteristics (Zhou et al., 2020). The precipitation is spatially heterogeneous. The annual mean total precipitation in this region during 2000-2018 ranged from 0.92 mm in the southwest to 6067.71 mm in the southeast. In conclusion, the vast area and the complex geology, topography, and climate of the region create conditions that are favorable to disasters. These conditions lead to diverse, frequent, and severe natural disasters in this region.
Among these disasters, flooding is the most frequent. According to the statistics provided by the EM-DAT, of the 3483 natural disasters that occurred in the Belt and Road region from 2000 to 2020, 1438 were floods, accounting for about 41.3% of all disasters. Therefore, it is of great significance to conduct a flood susceptibility assessment in this region.

Flood inventory map
To apply a machine learning method to predict the areas where floods may occur in the future, existing flood records are needed as a training reference (Khosravi et al., 2018b). At the global scale, two flood datasets are commonly used: the International Disaster Database (EM-DAT) and the Global Active Archive of Large Flood Events. These two datasets provide various information about floods so that people can better understand the impacts of floods (Sampson et al., 2015). The flood location dataset used in this study was obtained from the Global Active Archive of Large Flood Events, Dartmouth Flood Observatory, University of Colorado (http://floodobservatory.colorado.edu/), which has been supported by the National Aeronautics and Space Administration (NASA) and used in several studies around the world (Li et al., 2019). For the Belt and Road region, this study selected 1,500 flooded points from January 2000 to March 2020 as the sample dataset. Based on the information in this sample dataset, we first extracted the areas where no flood has occurred (Fig. 1). Then, the same number of non-flooded points as flooded points was randomly selected in the non-flooded areas, and values of 1 and 0 were assigned to the flooded and non-flooded points, respectively. Finally, all of the sample points were randomly divided, with 70% used as training points and 30% used as verification points.
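
The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_samples`, the fixed random seed, and the integer cell identifiers are all hypothetical, and the study itself worked with point locations in a GIS.

```python
import random

def build_samples(flooded, candidates, train_fraction=0.7, seed=42):
    """Pair flooded points with an equal number of randomly chosen
    non-flooded points, label them 1 and 0, and split 70/30."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    non_flooded = rng.sample(candidates, len(flooded))  # same count as flooded
    samples = [(p, 1) for p in flooded] + [(p, 0) for p in non_flooded]
    rng.shuffle(samples)
    cut = round(train_fraction * len(samples))
    return samples[:cut], samples[cut:]

# Hypothetical stand-ins for grid-cell identifiers.
flooded_cells = list(range(1500))
non_flooded_area = list(range(10_000, 20_000))
train, valid = build_samples(flooded_cells, non_flooded_area)
print(len(train), len(valid))
```

With 1,500 flooded and 1,500 non-flooded points, the split yields 2,100 training points and 900 verification points.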

Flood conditioning factors
In flood susceptibility mapping, the first task is to construct a spatial database that contains the flood condition factors. However, the suitable flood condition factors vary with the characteristics of the different areas (Tehrany et al., 2013), and the same factors have very different influences in different areas (Kia et al., 2012). After comprehensive consideration of the actual characteristics of the study area, a review of relevant studies (Mahmoud and Gan, 2018; Ali et al., 2020), and the availability of data, a total of 11 factors that are closely related to flood disasters were chosen for use in this study. The selected factors are the maximum three-day precipitation (M3DP), altitude (AL), standard deviation of elevation (SDE), slope (SL), flow accumulation (FA), topographic wetness index (TWI), river density (RD), fractional vegetation cover (FVC), percentage of impervious surface (PIS), land cover (LC), and soil texture (ST). Each condition factor was converted into a grid database with a spatial resolution of 0.1°×0.1° in ArcGIS 10.6 and was normalized using the maximum normalization method. These normalized flood condition factors are shown in Fig. 2.

(a) Maximum three-day precipitation (M3DP)
In particular, heavy rainfall with a short duration has a great potential for flooding (Ali et al., 2020). Many studies (Liu et al., 2017; Huang and Zhang, 2016) have shown that M3DP has a non-negligible influence on the occurrence of floods. The possibility of flooding is considered to increase with increasing M3DP. In the current study, the M3DP factor map (Fig. 2a) was calculated using Global Precipitation Measurement (GPM) data, which record the average daily precipitation everywhere in the world from 2000 to 2018.
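
As a rough illustration of the M3DP factor and the maximum normalization step, the sketch below assumes that M3DP is the largest precipitation total over any three consecutive days in a cell's daily series, and that maximum normalization divides each value by the largest value in the layer. Both definitions are assumptions made for this example; the function names are hypothetical.

```python
def max_three_day_precip(daily):
    """Largest total over any 3 consecutive days of a daily series (mm)."""
    if len(daily) < 3:
        raise ValueError("need at least three daily values")
    return max(sum(daily[i:i + 3]) for i in range(len(daily) - 2))

def max_normalize(values):
    """Divide every value by the maximum so that the largest becomes 1
    (assumes non-negative values and a positive maximum)."""
    peak = max(values)
    return [v / peak for v in values]

# Hypothetical daily precipitation for one grid cell (mm).
daily = [0, 5, 20, 30, 10, 0, 2]
print(max_three_day_precip(daily))
print(max_normalize([60, 30, 15]))
```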

(b) Altitude (AL)
Altitude is also an important factor affecting the occurrence of flood disasters. It is usually inversely related to flood susceptibility since water flows from higher elevations to lower elevations (Mohamoud, 1992; Vojtek and Vojtekova, 2019). In general, the higher the altitude, the less prone the area is to flooding.
In this study, the altitude (

(c) Standard deviation of elevation (SDE)
The standard deviation of elevation reflects the degree of topographic variation within a certain range (Zhou et al., 2000). The undulations in the topography directly affect the gathering of the water flow, thus affecting the occurrence of floods. Generally, the susceptibility to flooding decreases as the degree of topographic undulation increases (Zhou et al., 2000). In this study, the SDE (Fig. 2c) was obtained by calculating the standard deviation of the elevations of the 25 grids (including the grid itself) in the 5 × 5 neighborhood around each grid.
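
A minimal sketch of the 5 × 5 neighborhood calculation is given below. It assumes the population standard deviation and clips the window at the raster edge; the text does not state either choice, so both are assumptions, and the function name `sde` is hypothetical.

```python
from statistics import pstdev

def sde(elev, row, col, half=2):
    """Standard deviation of elevations in the (2*half+1) x (2*half+1)
    window centered on (row, col). The window is clipped at the raster
    edge (an assumption; other edge treatments are possible)."""
    rows, cols = len(elev), len(elev[0])
    window = [
        elev[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]
    return pstdev(window)

# Hypothetical 5 x 5 elevation raster with values 0..24 (m).
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
print(sde(grid, 2, 2))
```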

(d) Slope (SL)
Slope is given high priority in flood sensitivity mapping (Zaharia et al., 2017). The magnitude of the slope has a significant influence on surface runoff, soil erosion, and vertical percolation (Samanta et al., 2018), so the slope can affect the occurrence of flooding. Generally, floods occur more frequently in low-slope areas. In contrast, high-slope areas have fast water flow, resulting in low infiltration and high runoff (Chapi et al., 2017), so floods in these areas are relatively rare. In this study, the slope (

(f) Topographic wetness index (TWI)
Several previous studies have reported that the TWI is a meaningful factor in the study of flood susceptibility (Ali et al., 2020). Its role is to quantify the topographical control over hydrological processes. In other words, the TWI measures the impact of the topography on runoff generation (Das, 2018). According to previous studies (Regmi et al., 2010), the TWI (Fig. 2f) was calculated as follows:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \quad (1)$$

where $A_s$ is the specific catchment area (m² m⁻¹) and $\beta$ is the slope gradient (an angle, expressed in radians when evaluating $\tan\beta$).
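
The TWI definition can be illustrated with a short sketch, assuming the widely used form TWI = ln(As / tan β). The slope is taken in degrees and converted to radians; the small floor on the slope, which avoids division by zero on flat cells, is an assumption of this example and not something stated in the text.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(As / tan(beta)), with As in m^2 per m and slope in degrees.
    Flat cells are floored at min_slope_deg (an assumption) so tan > 0."""
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

# Same hypothetical catchment area, gentle versus steep slope.
print(twi(100.0, 5.0), twi(100.0, 30.0))
```

For a fixed catchment area, the gentler slope gives the larger TWI, which matches the expectation that low-slope cells accumulate more water.
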
where NDVIsoil is the NDVI value of bare land and NDVIveg is the NDVI value of a fully vegetated area. The 5% NDVI value was taken as NDVIsoil.
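
The symbols defined here match the commonly used dimidiate pixel model, FVC = (NDVI − NDVIsoil) / (NDVIveg − NDVIsoil). Assuming that this is the intended form, a minimal sketch is shown below; the clipping to the range 0 to 1 and the example NDVI values are assumptions of this illustration.

```python
def fvc(ndvi, ndvi_soil, ndvi_veg):
    """Fractional vegetation cover under the dimidiate pixel model
    (assumed form). Values are clipped to [0, 1], since pixels below
    NDVIsoil or above NDVIveg would otherwise fall outside that range."""
    value = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return min(1.0, max(0.0, value))

# Hypothetical endmember values.
print(fvc(0.5, ndvi_soil=0.1, ndvi_veg=0.9))
```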

(i) Percentage of impervious surfaces (PIS)
The percentage of impervious surfaces has been used in flood risk assessment studies (Hu et al., 2017a) because it affects the occurrence of floods. Impervious surfaces impede the vertical percolation of water. In general, the larger the percentage of impervious surfaces, the more prone the area is to water accumulation, leading to flooding. The impervious surface data used in this study were obtained from the Global Human Settlement Layer (GHSL, https://ghslsys.jrc.ec.europa.eu/), and the percentage of impervious surfaces (Fig. 2i) in each grid was calculated using the Zonal Statistics tool in ArcGIS.

(j) Land cover (LC)
The type of land cover (Fig. 2j) changes the surface runoff to a certain extent, thereby affecting the occurrence of floods (Bui et al., 2019). The land cover data used in this study were obtained from LAADS DAAC (https://ladsweb.modaps.eosdis.nasa.gov/search/) and are for 2015. To quantify the impacts of the various types of land cover on floods, we used the information value method to calculate their contributions to flooding. The results (Table 2) were calculated using Eq. (3).

(k) Soil texture (ST)
Soil texture (Fig. 2k) has a relatively obvious impact on the occurrence of floods (Peng et al., 2019). Texture is a soil property that describes the relative proportions of the different grain sizes in the soil. The soil texture data used in this study were downloaded from the Food and Agriculture Organization of the United Nations (FAO, http://www.fao.org/). We used the same approach as that described above to quantify the impact of soil texture on flooding, and the results are shown in Table 2.

Information value method
The information value method is an indirect statistical method (Du et al., 2017) that is frequently used in landslide sensitivity assessment but is relatively new in flood sensitivity mapping. The purpose of the information value method is to determine the weight of each factor (Sarkar et al., 2013). Inspired by this, it was used to determine the weight of each category of land cover and soil texture in this study. The method was originally proposed by Yin and Yan (1988) and was slightly modified by Van Westen (1993) (Sarkar et al., 2013). It is expressed as follows:

$$I_i = \ln\left(\frac{N_i / S_i}{N / S}\right) \quad (3)$$

where $I_i$ is the weight of factor class i; $N_i$ is the number of floods in class i; $S_i$ is the number of pixels in class i; $N$ is the number of floods in the whole study area; and $S$ is the number of pixels in the entire study area.
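
To make the weighting concrete, the sketch below assumes the commonly cited form of the information value, the natural logarithm of the flood density in a class divided by the flood density of the whole study area. The function name and the counts are hypothetical, and the value is undefined for a class that contains no floods.

```python
import math

def information_value(n_i, s_i, n_total, s_total):
    """ln((N_i / S_i) / (N / S)): positive when class i holds more floods
    per pixel than the study area as a whole, negative when it holds fewer.
    Requires n_i > 0, since ln(0) is undefined."""
    return math.log((n_i / s_i) / (n_total / s_total))

# Hypothetical class: 30 floods on 100 pixels, against 100 floods on 1000 pixels.
print(information_value(30, 100, 100, 1000))
```

A class with three times the average flood density receives a weight of ln 3, while a class at exactly the average density receives a weight of zero.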

Correlation analysis of conditioning factors
If the variables are highly correlated, the model estimates will be distorted or difficult to obtain accurately (Zhang XD et al., 2018). Usually, several methods, such as the Pearson correlation coefficient method, the variance decomposition ratio, the conditional index, and the variance inflation factor (VIF) and tolerance, are used to quantify the correlations between factors (Khosravi et al., 2018b). In this study, we used the VIF and tolerance to measure the relationships between the 11 factors. When the VIF is greater than 10 or the tolerance is less than 0.1, the factor has a multicollinearity problem and should be eliminated. Otherwise, there is no serious collinearity between the factors.
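
For reference, the standard definitions of the two diagnostics are given below, where $R_j^2$ is the coefficient of determination obtained by regressing factor $j$ on the remaining factors. The notation is introduced here for illustration and is not taken from the source.

```latex
\mathrm{Tolerance}_j = 1 - R_j^2,
\qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^2}
```

Because the two quantities are reciprocals, the thresholds VIF > 10 and tolerance < 0.1 express the same condition. The SDE values reported in the workflow section (tolerance 0.142, VIF 7.065) agree up to rounding, since 1/7.065 ≈ 0.1415.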

Support Vector Machine (SVM)
The SVM is one of the most popular machine learning algorithms. It is a supervised binary classifier based on the structural risk minimization principle (Yao et al., 2008). Because of its nonlinear mathematical structure, the SVM can represent the complex nonlinear relationship between the inputs and outputs of a system (Li et al., 2016). Generally, there are two methods of constructing an SVM model. The first is to construct an optimum linear separating hyperplane, which is used to separate the data patterns. The second is to use a kernel function to convert the original nonlinear data pattern into a linearly separable format in a high-dimensional feature space (Yao et al., 2008). The major steps of the algorithm are as follows:
(1) Assume that $T = \{(x_i, y_i)\}$, $i = 1, 2, \ldots, n$, is the training set of known samples, where $x_i$ is the ith input data and $y_i$ is the corresponding output data.
(2) Separate the training set into two categories using an n-dimensional hyperplane to obtain the maximum interval:

$$\min \frac{1}{2}\|w\|^2 \quad (4)$$

subject to the constraints

$$y_i\left((w \cdot x_i) + b\right) \ge 1 \quad (5)$$

where $\|w\|$ is the norm of the hyperplane normal; $b$ is a scalar base; and $(\cdot)$ represents the scalar product operation.
(3) Using the Lagrange multiplier, the cost function can be defined as follows:

$$L = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \lambda_i \left( y_i\left((w \cdot x_i) + b\right) - 1 \right) \quad (6)$$

where $\lambda_i$ is the Lagrangian multiplier. By using standard procedures, the solution can be obtained by dual minimization of Eq. (6) with respect to $w$ and $b$ (Vapnik, 1995).
The selection of the kernel type is important for the performance and results of the SVM (Damaševičius, 2010). At present, the linear kernel (LN), polynomial kernel (PL), radial basis function (RBF) kernel, and sigmoid kernel (SIG) are the most commonly used kernel types. Several studies have shown that the RBF kernel performs better in geological disaster prediction, which is the reason for selecting it in this study (Kia et al., 2012; Pradhan, 2012). The RBF kernel is expressed as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \quad (7)$$

where $\gamma$ is the parameter of the kernel function. Sometimes kernel functions are parameterized using $\gamma = 1/(2\sigma^2)$, where $\sigma$ is an adjustable parameter that governs the performance of the kernel.
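
The role of γ can be seen in a small sketch of the RBF kernel, assuming the standard Gaussian form exp(−γ‖x − z‖²). The function names are hypothetical, and this is only an illustration of the kernel itself, not of the e1071 implementation used in the study.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gamma_from_sigma(sigma):
    """Alternative parameterization: gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)

# Two hypothetical normalized factor vectors.
a, b = [0.2, 0.8, 0.5], [0.3, 0.6, 0.5]
print(rbf_kernel(a, b, gamma=0.5), rbf_kernel(a, b, gamma=5.0))
```

A larger γ makes the similarity decay faster with distance, so each training point influences a smaller neighborhood of the feature space.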

Model validation method
The receiver operating characteristic (ROC) curve has been used to evaluate the performances of models in many studies (Tehrany et al., 2014; Huang et al., 2020). For each possible critical value, the ROC curve is a graphical representation of the trade-off between the false positive rate (X-axis) and the true positive rate (Y-axis) (Pourghasemi and Beheshtirad, 2015). The area under the ROC curve (AUC) is used to compare the known flood data with the derived flood probability map. The value of the AUC lies between 0.5 (a diagnostic test that cannot distinguish between floods and non-floods) and 1 (Bahram et al., 2019). Generally, the greater the AUC, the higher the accuracy of the model. The relationship between the performance of a model and the AUC can be classified into the following categories: 0.9-1 (excellent), 0.8-0.9 (very good), 0.7-0.8 (good), 0.6-0.7 (moderate), and 0.5-0.6 (poor). In this study, 70% of the chosen flood locations were used to obtain the success-rate curve and 30% of the chosen flood locations were used to obtain the prediction-rate curve, which reflect the goodness of fit and the prediction power of the SVM model, respectively (Termeh et al., 2018).
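
One simple way to compute the AUC, shown here as a hedged sketch, is the rank-based interpretation: the AUC equals the probability that a randomly chosen flooded point receives a higher predicted probability than a randomly chosen non-flooded point, with ties counted as one half. The function names are hypothetical, and assigning each boundary value to the higher category is an assumption, since the text gives only the ranges.

```python
def auc(labels, scores):
    """AUC as the share of (flooded, non-flooded) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flooded and one non-flooded point")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rating(value):
    """Map an AUC to the categories listed in the text."""
    bands = [(0.9, "excellent"), (0.8, "very good"), (0.7, "good"),
             (0.6, "moderate"), (0.5, "poor")]
    for lower, name in bands:
        if value >= lower:
            return name
    return "below chance"  # hypothetical label for values under 0.5

# Hypothetical labels and predicted flood probabilities.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
print(rating(0.917), rating(0.934))
```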

Flood Susceptibility Comprehensive Index (FSCI)
In order to reflect the degree of flood susceptibility in the study area more intuitively and comprehensively, in this study the flood susceptibility comprehensive index (FSCI) was calculated for each region and country by referring to the calculation method for the ecological vulnerability synthesis index (EVSI) (Tian, 2018). The calculation method is as follows:

$$\mathrm{FSCI} = \sum_{i=1}^{n} \frac{P_i \times A_i}{S} \quad (8)$$

where FSCI is the flood susceptibility comprehensive index of a country; $n$ is the number of flood susceptibility classes; $P_i$ is the class value of the ith flood susceptibility class; $A_i$ is the area of the ith flood susceptibility class; and $S$ is the total area of the country. In this study, the lowest, low, moderate, high, and highest flood susceptibility classes correspond to $P_i$ values of 1, 2, 3, 4, and 5, respectively.
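
The index is an area-weighted mean of the class values, which the following sketch illustrates. It assumes that the five classes cover the whole unit, so that S equals the sum of the class areas; the function name is hypothetical. The example uses the class percentages reported for the entire study area in the results (32.91%, 31.56%, 13.74%, 9.57%, and 12.22%).

```python
def fsci(class_areas):
    """Area-weighted mean class value: sum(P_i * A_i) / S.
    class_areas maps the class value P_i (1..5) to its area A_i in any
    consistent unit; S is taken as the sum of the class areas (assumption)."""
    total = sum(class_areas.values())
    return sum(p * a for p, a in class_areas.items()) / total

# Area shares (%) of the lowest..highest classes for the whole study area.
whole_region = {1: 32.91, 2: 31.56, 3: 13.74, 4: 9.57, 5: 12.22}
print(round(fsci(whole_region), 2))  # 2.37
```

The result, 2.37, matches the overall FSCI value reported for the study area.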

Workflow of flood susceptibility assessment
The workflow of the flood susceptibility assessment is illustrated in Fig. 3. First, a set of available data, containing a flood inventory map and flood condition factors, was collected from different sources. For the condition factors, the information value method was used to quantify the weights of the discrete factors (LC and ST), and the maximum value normalization method was used to normalize the information values of the two discrete factors and the original values of the continuous factors. Then, variance inflation factors (VIF) and tolerances were used to verify that there was no serious collinear relationship between the indicators. The results of the factor correlation test are shown in Table 3. As can be seen from Table 3, the SDE has the lowest tolerance (0.142) and the highest VIF (7.065). However, neither crosses its critical value (0.1 and 10, respectively), indicating that there is no serious collinearity among the 11 factors. Therefore, all 11 factors were input into the SVM model for the training and prediction steps to obtain the flood susceptibility map. After classifying the map into five classes (lowest, low, moderate, high, and highest) using the equal interval method, we calculated the FSCI of each country. Based on the FSCI, the flood susceptibilities of the countries were also classified into five levels using the equal interval method. Finally, we validated the accuracy of the SVM model and analyzed the results in terms of the spatial patterns and the flood susceptibility level of each country and region. It should be noted that the SVM model was implemented using the e1071 package in R.
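
The final classification step can be sketched as follows. The text does not say how the FSCI was normalized, so min-max scaling is assumed here; the upper bound of each interval is treated as inclusive so that the highest level corresponds to a normalized FSCI greater than 0.8, as stated in the abstract. The function names and the three example units are hypothetical.

```python
LEVELS = ["lowest", "low", "moderate", "high", "highest"]

def min_max_normalize(values):
    """Scale a dict of FSCI values to [0, 1] (assumed normalization)."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}

def equal_interval_level(x, lo=0.0, hi=1.0, n=5):
    """Assign x to one of n equal-width intervals on [lo, hi].
    A value exactly on a break falls into the lower interval."""
    breaks = [lo + (hi - lo) * k / n for k in range(1, n)]
    return LEVELS[sum(x > b for b in breaks)]

# Hypothetical FSCI values for three units.
normalized = min_max_normalize({"A": 1.5, "B": 2.5, "C": 4.5})
print({k: equal_interval_level(v) for k, v in normalized.items()})
```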

Accuracy assessment
The success-rate curve and the prediction-rate curve of the SVM model are shown in Fig. 4a and 4b, respectively. According to Fig. 4a, the AUC of the success-rate curve of the SVM model is 0.917, which indicates that the model has an excellent goodness of fit. For the prediction-rate curve (Fig. 4b), the AUC is 0.934, indicating that the SVM model also has excellent predictive power. Overall, both AUC values were greater than 0.9, which demonstrates that the results obtained in this study using the SVM model are reliable.

Classification results of flood susceptibility map
Figure 5 shows the flood susceptibility map obtained using the SVM model in this study, and Table 4 presents the area percentages of the various susceptibility levels in the Belt and Road region. According to the statistics (Table 4), the lowest flood susceptibility zone accounts for 32.91% of the study area. The low, moderate, and high flood susceptibility zones account for 31.56%, 13.74%, and 9.57% of the study area, respectively, and the highest flood susceptibility zone accounts for 12.22%. Although more than half of the study area (64.47%) is in the lowest and low flood susceptibility zones, about one fifth of the study area (21.79%) has high or highest flood susceptibility, with an area of approximately 1,103.70 × 10⁴ km², which is the focus of our attention.
In terms of the spatial distribution pattern of the susceptibility (Fig. 5), the areas with high and highest flood susceptibility are mostly distributed in the southeastern part of Eastern Asia and in almost the entirety of Southeast Asia and South Asia. Thus, Asia is the part of the study area that suffers the most from floods, which is consistent with the results of Kundzewicz et al. (2014). In addition, several coastal areas in Europe, located in the Mediterranean climate zone, also have high or highest flood susceptibilities. The northwestern part of Eastern Asia, the entirety of Central Asia, and Northern Asia mainly have low and lowest flood susceptibilities.
However, this spatial distribution pattern is somewhat different from the results of flood risk assessment  Fig. 1, we found that the flood susceptibilities determined in their study may have been overestimated, which was also mentioned by the authors. In conclusion, the results of this study represent a certain degree of improvement over the results of previous studies.


The entire Belt and Road Region
In this study, the entire region was divided into seven sub-regions: Eastern Asia, Southeast Asia, South Asia, Central Asia, Western Asia (including Egypt), Central-Eastern Europe (CEE), and Russia (  (Table 5). As can be seen from Table 5, Southeast Asia has the highest FSCI (4.49), followed by South Asia (4.17), both of which are much greater than the overall FSCI value (2.37) of the study area. This result illustrates that Southeast Asia is the most flood-prone region in the Belt and Road region, followed by South Asia. Western Asia, Central Asia, and Russia are the least likely to experience flooding, with FSCI values of 1.83, 1.67, and 1.62, respectively.
The results of the FSCI values of each country are presented in Table 6

Russia
Due to the vast size of Russia, it was analyzed as a separate region in this study. It has been pointed out that the number of floods has increased in both the Asian part of Russia (Northern Asia) and the European part (Frolova et al., 2017), so a flood susceptibility assessment for Russia is of great value. According to the results of the FSCI calculations (Table 5) moderate flood susceptibility, mainly in the southern part of its European part and in the central and southeastern parts of its Asian part, which is in good agreement with the results of Frolova et al. (2017). In terms of the flood condition factors, the results of this assessment were mainly dependent on the distribution of the M3DP. In short, Russia is not a country with a high susceptibility to flooding, but certain areas still face a threat from flooding. It should be noted that several studies (Frolova et al., 2017; Shalikovskiy and Kurganovich, 2017) have shown that the main cause of flooding in Russia is snowmelt, followed by rainfall. However, due to the limited research conditions, snowmelt was not considered in this study, and more attention should be paid to this issue in subsequent studies.

Eastern Asia
Table 5 shows that Eastern Asia has a relatively low FSCI value (2.32), with the low and lowest flood susceptibility zones accounting for 35.36% and 26.55% of the area, respectively. Still, the high and highest flood susceptibility zones account for 20.87% of the area. More interestingly, as is shown in Fig. 5, there is a clear regional variation in flood susceptibility in Eastern Asia, decreasing from southeast to northwest. The southeastern part of Eastern Asia mainly has high or highest flood susceptibilities, while the northwestern part has low or lowest flood susceptibilities. This pattern was also reported by Liu et al. (2017). The causes of this phenomenon can be analyzed from two perspectives (the factors and the climate).
From the perspective of the factors, the factors that drive flooding (M3DP, RD, LC, and ST) have high values in the southeastern part of Eastern Asia, while the values of these factors are low in the northwestern part of the region. In addition, the factors that are negatively correlated with flood probability (e.g., SDE and SL) have low values in the southeast and high values in the northwest. From the perspective of climate, the southeastern part of Eastern Asia is located in the subtropical monsoon climate zone, which is influenced by the East Asian summer monsoon (Ding et al., 2020). Therefore, it is prone to extreme rainstorms, which in turn cause floods. The northwestern part of Eastern Asia is located in the temperate continental climate zone, which is dry and experiences little precipitation, so flooding does not easily occur. These two aspects contribute to the decrease in flood susceptibility in Eastern Asia from southeast to northwest. The two countries in Eastern Asia (China and Mongolia) have low and lowest flood susceptibility levels, respectively (Table 6 and Fig. 7). However, because of the vast size of China and the high flood-proneness of its southeast, China still needs to devote more effort to dealing with floods. In conclusion, Eastern Asia faces a certain degree of flood threat, and its flood-prone areas

coast, has a high flood susceptibility level, and four countries (Turkey, Georgia, Palestine and Bahrain) have a moderate flood susceptibility level. As can be seen, the vast majority of the countries in Western Asia are less prone to flooding.

Central Asia
Central Asia is located in the core hinterland of Eurasia. It is characterized by a relatively underdeveloped economy and limited disaster preparedness (Yuan and Wang, 2015), so it is meaningful to analyze the flood susceptibility of this area. Table 5 shows that Central Asia has the second lowest FSCI value of 1.67 (higher than only Russia), with the high and highest flood susceptibility zones accounting for less than 1% of the total area of the region, making it one of the least flood-prone regions in the Belt and Road region. As is shown by the spatial distribution of the flood susceptibility (Fig. 5), almost all of Central Asia falls in the low or lowest flood susceptibility classes. The reasons for these results can be explained from two perspectives. In terms of the factors, although the AL, SL, and SDE values are low, the M3DP value is also low. Thus, there is a lack of flood-causing precipitation, leading to a low flood susceptibility throughout almost all of Central Asia. In terms of climate, Central Asia is a typical arid and semi-arid region, with a primarily temperate continental climate. As a result, precipitation is scarce here (Wang, 2019), making this area less prone to flooding. As can be seen from the FSCI results (Table 6 and

Central-Eastern Europe (CEE)
Despite significant investments in flood prevention, flooding remains a serious problem throughout Europe (Kundzewicz et al., 2014), so an assessment of the flood susceptibility in the European part of the Belt and Road region is necessary. As is shown in Table 5, the moderate flood susceptibility zone accounts for 45.46% of the CEE, followed by the low flood susceptibility zone (32.20%), and the high or highest flood susceptibility zones still account for 12.55% of the area. Therefore, the FSCI value of the CEE is also moderate (2.63), indicating that the CEE suffers from some degree of flood susceptibility.
As can be seen in Fig. 5, the CEE region mainly has a moderate flood susceptibility. However, the flood susceptibility of the CEE region decreases from south to north. The southern Mediterranean coastal region has high or highest flood susceptibilities, while the northern part has low or lowest susceptibilities. This result is consistent with the spatial distribution of the number of large floods in Europe from 1985 to 2009 reported by Kundzewicz et al. (2013). Based on the FSCI results (Table 6) (Marchi et al., 2010). Overall, the CEE countries are relatively prone to flooding, especially those near the Mediterranean coast.
represents the moderate flood susceptibility level; 4 represents the high flood susceptibility level; 5 represents the highest flood susceptibility level, and FSCI(n) represents the normalized FSCI.

The implications and limitations
The results generated by this study not only identify the flood-prone areas in the Belt and Road region but also assess the level of flood susceptibility of each country, and thus provide important information for the mitigation of damage from future floods. In addition, because research on global flood susceptibility maps is relatively rare, the successful application of the model and index system used in this study provides a reference for large-scale flood susceptibility research.
Although this study has achieved reasonable results, the following limitations still exist. (1) For the methods, the machine learning method and the statistical method are greatly affected by the quality of the samples. Due to the large size of the study area, it is impossible to record all floods, and thus the sample quality may not be high enough.
(2) The flood susceptibility maps obtained using these methods are semi-quantitative and static. They cannot provide detailed information on floods, such as flow and submergence range, which can be output by hydro-physical models (Dottori et al., 2016; Hoch and Trigg, 2019). (3) The predictions of this study did not consider the impact of future global climate change on floods, which is a popular topic in current research. (4) For the index system, due to the limitations of data availability, it is difficult to cover all of the factors that affect the occurrence of floods, e.g., flood control projects such as check dams. In future research, a more comprehensive indicator system needs to be established. Moreover, considering climate change, researchers can try to combine machine learning methods and physical models to obtain more accurate dynamic results.

Conclusions
In this study, based on 11 flood condition factors, we adopted a machine learning method (SVM) to generate a flood susceptibility map for the Belt and Road region. Based on the spatial distribution of the flood susceptibility, the areas with the highest and high flood susceptibility accounted for 12.22% and 9.57% of the total study area, respectively, and these areas are mainly distributed in the southwestern part
Author contributions. JL, YL, and YFH were responsible for the collection and processing of the dataset.