Forecasting inundation levels during typhoons involves multiple objectives: capturing variations in water level throughout the entire weather event, predicting peak water levels accurately, and predicting the time at which peak water levels occur. This paper proposes a means of forecasting inundation levels in real time using monitoring data from a water-level gauging network. ARMAX was used to construct water-level forecast models for each gauging station using input variables including cumulative rainfall and water-level data from other gauging stations in the network. Analysis of the correlation between cumulative rainfall and water-level data makes it possible to obtain the appropriate accumulation duration of rainfall and the time lags associated with each gauging station. Analyses of cross-site water levels and of cumulative rainfall enable the identification of an associate site for each gauging station, namely a site whose water levels are highly correlated with those of the target site while sharing low mutual information with the cumulative rainfall. Water-level data from the identified associate sites are used as a second input variable for the water-level forecast model of the target site. Three indices were considered in the selection of an optimal model: the coefficient of efficiency (CE), error in the stage of peak water level (ESP), and relative time shift (RTS). A multi-objective genetic algorithm was employed to derive a Pareto optimal set of models capable of performing well in the three objectives. A case study was conducted on the Xinnan area of Yilan County, Taiwan, in which optimal water-level forecast models were established for each of the four water-level gauging stations in the area. Test results demonstrate that the model best able to satisfy ESP exhibited significant time shift, whereas the models best able to satisfy CE and RTS provide accurate forecasts of inundations when variations in water level are less extreme.
Typhoons are common weather events in subtropical regions of the Pacific between July and October. Heavy rains carried in by typhoons often lead to the severe inundation of low-lying areas, which can damage property and even threaten human lives. Limited funding for the construction of flood control systems constrains the protective capacity of structural measures for disaster mitigation. When the scale of a typhoon exceeds construction design limits, non-structural means are required to prevent disasters associated with typhoons. The real-time forecasting of changes in inundation depth in the hours after a typhoon is a crucial factor in the planning of relief operations.
Considerable research has been conducted on inundation simulations and forecasting techniques, most of which can be roughly divided into two approaches: numerical simulation and black-box modeling. In numerical simulations, the various physical phenomena that occur between rainfall and inundation are examined and formulated mathematically, after which solutions are obtained by numerical methods. This approach is based on a sound theoretical foundation and enables a clear representation of the physical mechanisms associated with inundation. The accuracy of the results makes them particularly useful in the forecasting of inundation in the absence of on-site observation data. However, this type of approach requires considerable computing resources and can be very time consuming, which makes it difficult to provide forecast information in real time for immediate disaster relief actions during typhoons. Black-box modeling is implemented in an entirely different manner. The process that occurs between rainfall and inundation is regarded as a black box, and no attempt is made to understand the underlying physical mechanisms. Rather, the relationships between inputs and outputs of the system are analyzed as a means of creating a black-box model. Although this approach is unable to explain the physical phenomena, it provides an accurate representation of the relationship between inputs and outputs. Calculations can generally be completed more rapidly (Karlsson and Yakowitz, 1987), and information related to future variations in water level in inundated areas can be obtained in real time, which can be immensely helpful to decision making and disaster prevention.
A number of studies have applied black-box models to the problems of inundation or flooding. Karunanithi et al. (1994) proposed a cascade-correlation algorithm for the selection of neural network architectures and training algorithms and obtained encouraging results with regard to flow prediction. Thirumalaiah and Deo (1998) proposed the training of neural networks using a selected sequence of previous flood observations at a specific location to enable real-time flood forecasting. Toth et al. (2000) compared the advantages and limitations of the auto-regressive moving average, artificial neural network (ANN), and non-parametric nearest-neighbor method in rainfall–runoff forecasting. They concluded that time series analysis is far more accurate than simple rainfall predictions of a heuristic nature. Chang and Chen (2001) proposed a counter-propagation fuzzy-neural network capable of automatically generating rules for use in clustering input data to enable streamflow prediction. Nayak et al. (2005) employed fuzzy computation in the development of a real-time flood forecasting model. They concluded that the recursive use of a one-step-ahead forecast model to predict flow at longer lead times produces better results than those achieved using independent fuzzy models for each lead time. Chen et al. (2006) constructed a flood forecast model using an adaptive neuro-fuzzy inference system (ANFIS). Their results demonstrated that ANFIS is superior to a back-propagation neural network. Romanowicz et al. (2008) developed a data-based mechanistic methodology for the derivation of nonlinear dependence between water levels measured at gauging stations along a river. Kia et al. (2012) developed a flood model that combines various flood causative factors with ANN techniques and a geographic information system (GIS) for the modeling and simulation of flood-prone areas in the southern parts of Peninsular Malaysia. Pan et al. (2011) presented a real-time rainfall-inundation forecasting model using a hybrid neural network based on a synthetic database of inundation potential. Shiri et al. (2012) compared the performance of gene expression programming (GEP), ANFIS, and ANNs in the forecasting of daily stream flow. They concluded that the GEP model outperformed the ANN and ANFIS models. Chen et al. (2012) utilized an ANN model and an ANFIS model to correct calculations in a two-dimensional hydrodynamic model used for the prediction of storm surge height during typhoon events. Najafzadeh and Zahii (2015) proposed the use of a neuro-fuzzy-based group method of data handling as an adaptive learning network for the prediction of flow discharge in straight compound channels.
In this study, we sought to develop a method for the forecasting of inundation levels, based on data from a water-level gauging network during typhoons. We also performed a case study in which crucial model input variables were obtained by analyzing records from previous typhoons. Autoregressive moving average with exogenous inputs (ARMAX) was used to construct rainfall and water-level relationship models of the gauging stations, and three indices were defined for the evaluation of model performance. A Pareto optimal model set was identified for the three indices using a multi-objective genetic algorithm (MOGA). Predicted water levels were compared with measured data to examine the performance of the optimal models subjected to each index.
This paper is organized as follows. The environmental background of the study area is introduced in Sect. 2. In Sect. 3, we explain ARMAX and the data analysis methods used to find suitable model input variables. We also introduce the indices used for the evaluation of the models. Section 4 presents the method used to identify the Pareto optimal model set for the evaluation indices using a MOGA. Section 5 discusses the forecasting capability of the optimal models for each objective based on search results. Conclusions follow in Sect. 6.
Table 1. Water-level gauging network in Xinnan area.
Table 2. Historical typhoon events recorded by SNTIX.
Yilan County (Fig. 1) is situated in the northeastern part of Taiwan. It has a subtropical monsoon climate and is famed for its rainy weather. With over 200 rain days per year, the annual average precipitation ranges between 2000 and 2500 mm. Yilan is bordered by mountains to the west and the ocean to the east. Typhoons are common in summer and autumn. Statistically, an average of two to three typhoons hit Taiwan each year, 45 % of which make landfall in Yilan County (Pan et al., 2014). Severe inundations quickly form in low-lying areas during typhoons. Among the inundation-prone regions, the area of Xinnan is one of the worst.
Figure 1. Xinnan area in Yilan County, Taiwan.
The Xinnan area (Fig. 1) is located near the mouths of two major waterways in the county: the Meifu drainage waterway to the north and the Lanyang River to the south. The flat terrain dips to the east, and its eastern border abuts the Pacific. The average elevation in the area is only about 2 m above sea level. During typhoons, water levels in the two major waterways rise swiftly from large inflows upriver. The levees of the two waterways prevent runoff in the area from being drained out effectively, which soon leads to severe inundation. The safety and property of residents are at risk during typhoons, which underlines the need for effective disaster prevention measures.
In an attempt to better understand local inundation conditions during
typhoons, the Water Resources Agency established the Surveillance Network
for Typhoon Inundation in the Xinnan Area (SNTIX) in 2011. The network
includes four gauging stations receiving water-level data on-site in the area
and a data transmission system receiving precipitation observation data from
the QPESUMS (Quantitative Precipitation Estimation and Segregation using
Multiple Sensor; Gourley et al., 2002) of the Central Weather Bureau. Table 1
lists detailed information related to the gauging stations, the locations
of which are marked in Fig. 1. SNTIX reports local inundation levels via
radio transmission every 10 min during typhoons, while QPESUMS transmits
10 min rainfall in the area via internet connection at the same
frequency. Figure 2 presents the water levels recorded by SNTIX at gauging
stations and the QPESUMS rainfall data during Typhoon Trami in 2013. QPESUMS
was developed jointly by the Central Weather Bureau and the National Severe
Storms Laboratory (NSSL) in 2002, with a view to improving the accuracy of
quantitative rainfall forecasts. QPESUMS comprises eight Doppler radar
stations, each of which scans a radius of approximately 230 km. The system
divides Taiwan into 441
Figure 2. Rainfall and water-level data recorded by SNTIX during Typhoon Trami.
Since its implementation, SNTIX has recorded data from 10 typhoon events, as shown in Table 2. In addition to providing rainfall and water-level information at the time of the typhoon, these records can also be used to develop water-level forecast models for gauging stations.
To plan effective disaster prevention and relief operations during typhoons, it is crucial to be able to forecast the inundation levels that will develop in the following hours. In the Xinnan area, inundation develops swiftly during typhoons, so forecasting must be quick and effective in order to provide sufficient lead time for decision making and operational planning. Thus, we adopted the ARMAX black-box model for the construction of water-level forecast models for gauging stations. It should be noted that during typhoons, response plans rely more heavily on water levels than on runoff. We therefore based the forecast model in this study on the relationship between rainfall and water level rather than on the relationship between rainfall and runoff, as is common in many studies. Moreover, the rainfall and water-level data in this study were not processed in the conventional manner, in which the data are normalized by the maximum and minimum values before performing model regression, because these extreme values cannot be known while a typhoon is in progress. To enable real-time water-level forecasting during typhoons, we designed the water-level forecast model using raw rainfall and water-level data as inputs with the forecast water level of the next time step as the output.
ARMAX (Box and Jenkins, 1976) is a linear black-box model that merges the AR
model (Yule, 1927) and MA model (Slutzky, 1937) for time series analysis. It
takes into account the influence of other external variables in the
forecasting of future changes in dynamic systems. The model is as follows:
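In its conventional textbook form, an ARMAX model expresses the output at time t as a linear combination of past outputs, current and past values of the exogenous inputs, and past residuals. The following minimal sketch illustrates one-step-ahead prediction under that convention. The function name `armax_one_step`, the coefficient layout (`a`, `b`, `c`), the sign convention, and the zero input delay are assumptions made for illustration and may differ from the notation and implementation used in this study.

```python
from typing import List, Sequence


def armax_one_step(
    y: Sequence[float],
    inputs: Sequence[Sequence[float]],
    a: Sequence[float],
    b: Sequence[Sequence[float]],
    c: Sequence[float],
) -> List[float]:
    """One-step-ahead predictions for the assumed textbook ARMAX form

        y(t) + a[0]*y(t-1) + a[1]*y(t-2) + ...
            = sum_j sum_k b[j][k] * u_j(t-k) + e(t) + c[0]*e(t-1) + ...

    where u_j are the exogenous input series and e is the residual.
    """
    n = len(y)
    y_hat = [0.0] * n
    resid = [0.0] * n  # e(t), estimated as y(t) - y_hat(t)
    for t in range(n):
        pred = 0.0
        # Autoregressive part: past observed outputs.
        for i, ai in enumerate(a, start=1):
            if t - i >= 0:
                pred -= ai * y[t - i]
        # Exogenous part: current and past values of each input (zero delay).
        for u, bj in zip(inputs, b):
            for k, bjk in enumerate(bj):
                if t - k >= 0:
                    pred += bjk * u[t - k]
        # Moving-average part: past one-step residuals.
        for i, ci in enumerate(c, start=1):
            if t - i >= 0:
                pred += ci * resid[t - i]
        y_hat[t] = pred
        resid[t] = y[t] - pred
    return y_hat
```

With two input series (for example, cumulative rainfall and the water level at another station), `inputs` and `b` would each hold two entries, and the lengths of `a`, `b[0]`, `b[1]`, and `c` would be the four integers that fix the model structure.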
In this study,
Figure 3. Correlations between water level and cumulative rainfall over various durations:
Figure 4. Cross-correlations between water level and cumulative rainfall with various time lags (10 min per lag):
Table 3. Correlation coefficient (CC) between water level and cumulative rainfall, with the peak average and the associated duration of cumulative rainfall.
In this study, we set the cumulative rainfall as the first input variable.
After calculating the cumulative rainfall of various durations from 1
to 30 h, the results are subjected to correlation analysis using water-level data from the target site to derive the correlation coefficient (CC),
which is defined as
Figure 3a–d present the results of correlation analysis pertaining to water-level data from the gauging stations and cumulative rainfall of various durations. The black round dots in the figures mark the CC values averaged over the typhoon events, and the tops and bottoms of the bars indicate the maximum and minimum CC values among the events. The figures clearly show that the average CC increases with the duration of cumulative rainfall, reaches a peak, and then declines gradually. This pattern is apparent at all of the gauging stations. However, the duration of cumulative rainfall corresponding to the peak average CC varies among stations. Table 3 lists the peak average CC, the corresponding duration of cumulative rainfall, and the maximum and minimum CCs measured at each station. As can be seen, the peak average CC lies roughly between 0.7 and 0.9, which indicates that water level and cumulative rainfall are fairly strongly correlated at the stations. The table also shows that the duration of cumulative rainfall corresponding to the peak average CC is longer at stations located closer to the sea. For instance, the duration corresponding to the peak average CC at the Zhongnanxing station, which is on higher ground in the area, is 18 h, whereas the duration at the Meifu station, which is closest to the sea, is 25 h. We speculate that this might be associated with the time needed for water to accumulate and move downstream. The table also shows a slight decrease in the peak average CC for stations closer to the sea, as well as a greater difference between the maximum and minimum CC values. It is possible that this is because water levels at locations closer to the sea are influenced by ocean tides, which somewhat reduces their correlation with cumulative rainfall.
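The duration scan described above can be sketched as follows: for each candidate duration, form the trailing cumulative rainfall and compute its Pearson correlation coefficient with the water level, then keep the duration with the highest CC. This is an illustrative sketch only. The helper names are hypothetical, the treatment of the first few time steps (a shorter window) is an assumption, and per-event averaging is omitted. With 10 min time steps, durations of 1 to 30 h correspond to windows of 6 to 180 steps.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_rainfall(rain: Sequence[float], window: int) -> List[float]:
    """Trailing sum of rainfall over the last `window` steps.

    Early steps, where fewer than `window` values exist, use what is available.
    """
    out: List[float] = []
    running = 0.0
    for t, r in enumerate(rain):
        running += r
        if t >= window:
            running -= rain[t - window]
        out.append(running)
    return out


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_duration(
    rain: Sequence[float], level: Sequence[float], windows: Sequence[int]
) -> Tuple[int, float]:
    """Return the (window, CC) pair with the highest CC among the candidates."""
    scores = [(w, pearson_cc(cumulative_rainfall(rain, w), level)) for w in windows]
    return max(scores, key=lambda s: s[1])
```

For a single event, `best_duration(rain, level, range(6, 181, 6))` would scan durations of 1 to 30 h in 1 h increments.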
Figure 5. Cross-correlations of between-site water levels with various time lags (10 min per lag):
After identifying the duration of cumulative rainfall with the highest
correlation for each gauging station, we analyzed the time lags between
water levels and cumulative rainfall. We shifted back the cumulative
rainfall data one time step at a time (each time step is 10 min) and
calculated the CCs between water level and cumulative rainfall for each
station. Figure 4a–d display the results of cross-correlation
analysis for water levels and cumulative rainfall at each station. As can be
seen, the peak average CC for each station occurred at zero lag, and the
average CC decreased as the lag lengthened. This indicates that no time lag
exists between water level and cumulative rainfall. Furthermore, the figures
show that as the lag increased, not only the average but also the maximum
and minimum CCs decreased, and the difference between the maximum and
minimum CCs (
To make full use of the water-level records from the gauging stations, we
identified an associate station for each existing station and used the water
levels from the associate station as a second input variable of the forecast
models. Generally speaking, each input of a model should be highly
correlated with the output, whereas the mutual information (MI) among the
input variables should be low (Bowden et al., 2005; Talei et al., 2010; Maier
et al., 2010) in order to ensure that the information provided to the model
by the inputs is not redundant. MI is defined as
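For two discrete (or binned) variables X and Y, the textbook definition of mutual information is I(X;Y) = sum over (x, y) of p(x, y) ln[p(x, y) / (p(x) p(y))], which is zero when the variables are independent. The sketch below estimates this quantity from two sampled series using equal-width bins. The binning scheme, the default of 10 bins, and the function names are assumptions for illustration and are not necessarily the estimator used in this study.

```python
import math
from collections import Counter
from typing import List, Sequence


def _bin_indices(values: Sequence[float], bins: int) -> List[int]:
    """Assign each value to one of `bins` equal-width bins over its range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 10) -> float:
    """Histogram estimate of I(X;Y) in nats."""
    n = len(x)
    bx = _bin_indices(x, bins)
    by = _bin_indices(y, bins)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((px[i] / n) * (py[j] / n)))
    return mi
```

Histogram estimates of MI depend on the number of bins, so values are comparable only when the same binning is used for all candidate sites.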
To find an associate site whose water-level data have a high CC with
those of the target site while having a low MI with the identified cumulative
rainfall of the target site, we combined the two indices into
Table 4. Selection of associate site for the second model input based on
Table 5. Input variables for the water-level forecast models.
Table 4 lists the event-averaged CCs between water-level data from each
target site and their candidate sites, as well as the event-averaged MI of
the first input variable (i.e., identified cumulative rainfall) of the
target site and the water-level data from the candidate sites. The table
also presents the
To determine whether a time lag exists between variations in water level at the target sites and at their associate sites, we followed the previous analysis method, shifting the water-level data from the associate sites one time step at a time. We then calculated the CCs between the water-level data from the target site and the associate site for up to 30 time steps. The results in Fig. 5 show that the event-averaged CCs are all highest at zero lag. As the lag increases, the average, maximum, and minimum CCs of each station decrease, and the difference between the maximum and minimum CCs gradually increases. This is a clear indication that no time lag exists between variations in water level measured at target sites and at their associate sites. As shown in Fig. 5d, the mean CC for Meifu appears nearly constant for small time lags. Meifu station is located at the outlet of the area, close to the sea (Fig. 1), so the water level at this site is likely to be influenced by factors other than rainfall and the water level at the associate site (for example, the tidal level of the sea). As a result, the cross-site CC between Meifu and its associate site is the lowest among the four stations, as seen by comparing Fig. 5d with Fig. 5a–c. This weaker connection between the cross-site water levels might explain the nearly constant CC at small time lags. Still, while the mean CC in Fig. 5d is nearly constant for small time lags, the gradually widening gap between the maximum and minimum CCs across events suggests a zero lag between the water levels at Meifu and its associate site.
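The lag analysis used for both input variables can be sketched as a scan over shifts: for each lag, the driver series (cumulative rainfall or associate-site water level) at time t - lag is paired with the target water level at time t, and the CC is computed on the overlapping samples. The function names below are hypothetical, and the sketch covers a single event rather than the event-averaged values reported in the figures.

```python
import math
from typing import List, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_cc(
    target: Sequence[float], driver: Sequence[float], max_lag: int
) -> List[float]:
    """CC between target(t) and driver(t - lag) for lag = 0..max_lag.

    Assumes both series have equal length and max_lag < len(target) - 1.
    """
    n = len(target)
    ccs = []
    for lag in range(max_lag + 1):
        shifted_driver = driver[: n - lag]  # driver(t - lag)
        aligned_target = target[lag:]       # target(t)
        ccs.append(_pearson(shifted_driver, aligned_target))
    return ccs
```

A peak at index 0 of the returned list corresponds to the zero-lag result reported above; with 10 min time steps, `max_lag=30` covers shifts of up to 5 h.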
The above data analysis makes it possible to determine the input variables of the water-level models for each station as well as their time lags, as shown in Table 5. The first input variable is cumulative rainfall; the duration of cumulative rainfall differs among the stations, but all of the time lags are 0. The second input variable is water-level data from the associate site, for which the time lags are also 0.
The performance of each model was evaluated using the three indices below.
The Nash–Sutcliffe coefficient of efficiency (CE) was proposed by Nash and
Sutcliffe (1970) to assess the forecasting capacity of hydrological models.
It is defined as
where
Error in the stage of peak water level (ESP) is calculated by
where
Relative time shift (RTS): previous research has shown that using historical data to forecast future
changes often results in time shift errors between the forecast and measured
hydrographs (Dawson and Wilby, 1999; Jain et al., 2004; de Vos and Rientjes,
2005). To evaluate the time shift error of forecast water levels, we shifted
the forecast water-level hydrograph back by 1 to 18 time steps and then
calculated the CE values. The time step corresponding to the highest CE
value is the time shift error (
where
The determination of the prediction lead time depends on the required action
time for relief operations during typhoons, such as evacuating people from
the flooded area. In practice, a lead time of at least 3 h is desirable to ensure smooth operations. Thus,
Cross validation (Geisser, 1993) was adopted for model calibration and
typhoon event validation. For each model with a designated model structure
(i.e., the number of terms for each of the four polynomials
The three indices, CE (to assess the capacity of a model to simulate entire typhoon events), ESP (to assess peak water levels), and RTS (to determine the time at which a peak water level occurs), each provide a crucial element for disaster prevention operations during typhoons and must therefore be considered simultaneously. Unfortunately, it is difficult to weigh the importance of each element. Thus, we employed multi-objective optimization to search for models capable of performing well in all three indices.
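The three indices can be illustrated with a short sketch. CE follows the standard Nash–Sutcliffe definition, 1 minus the ratio of the squared forecast error to the variance of the observations about their mean. The forms used here for ESP (absolute peak difference relative to the observed peak) and RTS (best-fitting shift divided by the number of lead-time steps), the inclusion of a zero shift, and all function names are assumptions for illustration, since the exact expressions used in this study may differ.

```python
from typing import Sequence


def nash_sutcliffe_ce(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Standard Nash-Sutcliffe coefficient of efficiency (1.0 is a perfect fit)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    if den == 0.0:
        return float("-inf")  # CE is undefined for constant observations
    return 1.0 - num / den


def peak_stage_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Assumed ESP form: |forecast peak - observed peak| / |observed peak|."""
    return abs(max(pred) - max(obs)) / abs(max(obs))


def time_shift_error(
    obs: Sequence[float], pred: Sequence[float], max_shift: int = 18
) -> int:
    """Shift the forecast back by 0..max_shift steps; return the shift with the highest CE."""
    n = len(obs)
    best_shift, best_ce = 0, float("-inf")
    for s in range(max_shift + 1):
        if n - s < 2:
            break
        # Moving the forecast s steps earlier pairs pred(t + s) with obs(t).
        ce = nash_sutcliffe_ce(obs[: n - s], pred[s:])
        if ce > best_ce:
            best_shift, best_ce = s, ce
    return best_shift


def relative_time_shift(
    obs: Sequence[float], pred: Sequence[float], lead_steps: int = 18
) -> float:
    """Assumed RTS form: time shift error divided by the lead time in steps."""
    return time_shift_error(obs, pred, max_shift=lead_steps) / lead_steps
```

With 10 min time steps, a 3 h lead time corresponds to 18 steps, which matches the range of shifts examined in the text.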
The design goals included a larger CE and smaller ESP and RTS. Thus, we
defined the objective function as follows:
As mentioned previously, the structure of the ARMAX model is determined by
the polynomial functions
A lack of continuous relationships between the structure of the model and the objective function makes it impossible to obtain the optimal value of this problem using a gradient-based method. We therefore employed a genetic algorithm (GA) as the optimization tool, because GAs do not require gradient or Hessian information about the objective function to search for the optimal value of each design variable. Furthermore, the ability of GAs to search for global optima (Goldberg, 1989) makes this approach well suited to the identification of an ideal model.
GAs are based on Darwin's theory of natural selection. Since Holland (1973) developed a sound mathematical foundation based on this principle, GAs have been widely applied in a variety of fields to solve problems that could not otherwise be solved using conventional methods. In GAs, the individuals in a group are viewed as possible solutions to the problem under discussion. The individuals are rated according to their performance as they pertain to the objective functions and constraints. Superior performance increases the chance of passing on genes to the next generation. Through this process, the overall performance of the population gradually evolves and improves. After evolving for several generations, individuals with optimal genes (i.e., those that dominate the population) are adopted as the optimal solutions to the problem. GAs conduct optimization by assessing the performance of individuals in the population, which makes them ideally suited to solving problems with multiple objectives.
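In a multi-objective GA, individuals are typically compared through Pareto dominance rather than a single fitness value. The sketch below shows one way to express dominance for the three indices used here (larger CE is better; smaller ESP and RTS are better) and to filter a population down to its non-dominated members. The tuple layout and function names are assumptions for illustration, and the selection, crossover, and mutation operators of the MOGA itself are not shown.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, float, float]  # (CE, ESP, RTS) for one candidate model


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` in every index and better in one."""
    # Negate CE so that all three objectives are minimized.
    a_min = (-a[0], a[1], a[2])
    b_min = (-b[0], b[1], b[2])
    no_worse = all(x <= y for x, y in zip(a_min, b_min))
    strictly_better = any(x < y for x, y in zip(a_min, b_min))
    return no_worse and strictly_better


def pareto_front(models: Sequence[Scores]) -> List[Scores]:
    """Return the models that are not dominated by any other model."""
    return [
        m
        for i, m in enumerate(models)
        if not any(dominates(o, m) for j, o in enumerate(models) if j != i)
    ]
```

From such a front, the model with the highest CE, the one with the lowest ESP, and the one with the lowest RTS can each be picked with `max` or `min` on the corresponding tuple element.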
Table 6. Models selected from the Pareto optimal model set using the best
scores for each of the three objectives (
In MOGA, the first generation of models for each gauging site was produced
by randomly specifying the number of terms of the four polynomials and
The result of the MOGA is a Pareto optimal set.
All of the models in this set are non-dominated, which means that no other
model performs at least as well in all three indices (CE, ESP, and RTS) and
strictly better in at least one. For each of the three indices, we selected
the model with the best performance from the Pareto optimal set; the results
are shown in Table 6. We listed three models for each gauging station and named them
according to their location. For example, the models for Zhongnanxing station
are Z1, Z2, and Z3. Among the three model types, model type 1 achieved the
highest average CE, model type 2 achieved the lowest average ESP, and model
type 3 achieved the lowest average RTS. The table lists four integer design
variables for each model, indicating the number of terms in
Figure 6. Comparison of model predictions (3 h lead time) and measured data
at
Using data from Typhoon Saola, a water-level forecast was performed using the models of each gauging station with a lead time of 3 h. We then compared the results with the observed values, as is shown in Fig. 6a–d. The results in Fig. 6a show that the forecast water levels from the three models of the Zhongnanxing station (furthest from the sea) are roughly identical to the observed water levels, indicating that the forecast results are very accurate. No significant differences were observed among the forecast results of the three model types.
The forecast results for the other three gauging stations in Fig. 6b, c, and d show different characteristics among the three model types. Model type 2 (X2, S2, and M2) performs well in predicting peak water levels but exhibits a time shift between the measured and forecast water levels. Model type 2 emphasizes minimizing the error in peak water levels; therefore, it is likely that the water-level forecasts from this model closely follow the changes in measured water levels with a certain degree of lag. In contrast, model type 1 (Z1, X1, S1, and M1) differs little from type 3 (Z3, X3, S3, and M3) in Fig. 6b–d; both produce accurate forecasts as water levels dropped but slight time lags as water levels rose. This is particularly apparent at the Sijie station in Fig. 6c and at the Meifu station in Fig. 6d, where water levels rose swiftly. However, the time shift errors of model types 1 and 3 are still smaller than 3 h. Considering that the lead time used in these model forecasts was 3 h, any time shift error of less than 3 h means that the results retain reference value for disaster prevention operations during typhoons. It is worth noting that, as shown in Fig. 6, the predicted hydrographs fluctuate more than the measured data. The reason for this is that the model inputs include rainfall data recorded at 10 min intervals. As seen in Fig. 2, the rainfall record fluctuates throughout the event. As a result, the hydrographs predicted by the models also display some fluctuation. However, the trend of the predictions still matches the observations.
Figure 7. Comparison of model performance:
Figure 7a–c present the variation of CE, ESP, and RTS, respectively, among the models. As shown in Fig. 7a, compared with the type 2 and type 3 models, the type 1 models (Z1, X1, S1, and M1) exhibit higher mean CEs as well as smaller deviations between the maximum and the minimum. Among the four type 1 models, models Z1 and X1 achieve higher mean CE than models S1 and M1. The reason for this might be that Sijie station and Meifu station are closer to the sea, so the water levels at these sites might be influenced by other factors (such as tidal levels) not accounted for in the models. Figure 7b shows the variation of ESP among the models. The type 2 models (Z2, X2, S2, and M2) clearly display mean ESP values much lower than those of the other two model types. The deviations of ESP for type 2 models are also much smaller than those of the other types. This demonstrates the good performance of type 2 models in peak water-level prediction. As seen in Fig. 7b, type 3 models (Z3, X3, S3, and M3) perform more poorly in peak water-level prediction, as indicated by higher mean ESP as well as larger deviations. It is noted that, while type 2 models exhibit very low ESP in peak water-level prediction, they also suffer from severe time shift errors, as signified by the rather high RTS of these models shown in Fig. 7c. In contrast, type 3 models (Z3, X3, S3, and M3) exhibit much lower mean RTS than the type 2 models. In comparing the RTS performance of type 1 and type 3 models, it appears that model Z3 performs slightly better than Z1 at Zhongnanxing station, while at the other stations, the type 3 models achieve rather better RTS scores than type 1 models.
In summary, type 1 models achieve the best score on CE with moderate performance on ESP and RTS. Type 2 and type 3 models display somewhat opposite characteristics. Type 2 models exhibit good performance on predicting the peak water level but also show rather severe time shift errors. In contrast, type 3 models achieve good scores on RTS but perform poorly on ESP.
An approach integrating ARMAX and MOGA for the forecasting of inundation levels during typhoons has been proposed. The developed methodology makes use of water-level data from a network of gauging stations in conjunction with rainfall forecast data to construct ARMAX-based inundation-level forecast models at each gauging site. Suitable input variables and associated time lags in the water-level models were identified by analyzing the cross-site mutual information and cross-correlations. The performance of the models was assessed on three aspects: (1) the bulk prediction ability signified by CE, (2) the accuracy in predicting the peak water level represented by ESP, and (3) the time shift error indicated by RTS. A MOGA was employed to identify the optimal model structures by searching for a Pareto optimal set of models capable of performing well in the three indices (CE, ESP, and RTS). For each of the three indices, the model that obtained the best score was selected from the Pareto model set. Comparisons with measured water levels show that the models emphasizing ESP (model type 2) predicted the peak water levels accurately but also showed a noticeable time lag. The models emphasizing CE (model type 1) and RTS (model type 3) provided an accurate indication of variations in water levels with no lag while water levels were dropping, yet a slight time lag when water levels were rising. Comparisons of the variations of performance indices among the models indicate that, in general, type 1 models present the best performance on CE with modest ESP and RTS. Type 2 models achieve very good performance on ESP but suffer from time shift errors. In contrast, type 3 models display good performance on RTS, though they are somewhat poorer on ESP. The results show that the proposed methodology is capable of deriving optimal models, each showing good performance on one of the three indices.
All three types of models together provide thorough information in a real-time manner and are expected to be of help for disaster prevention operations during typhoons.
The data set is available at:
This research was supported by the Ministry of Science and Technology in Taiwan under grant no. MOST 105-2625-M-197-001. Support from the Water Resources Agency in Taiwan is also gratefully acknowledged.
Edited by: S. Tinti
Reviewed by: two anonymous referees