Comment on nhess-2021-77 Anonymous Referee # 2 Referee comment on " Real-time Tsunami Force Prediction by Mode Decomposition-Based Surrogate Modeling

This paper examines how tsunami impact can be predicted rapidly using mode decomposition of the results from 2D (shallow-water) propagation modelling coupled with 3D (Navier-Stokes) inundation modelling. The surrogate model with mode decomposition reproduces quite well the time-series and maps of run-up and impulses. The mini workflow of how the mode decomposition surrogate model is clearly presented in Figure 1. What is a little more unclear to me is the overall workflow and purpose of the work. The title uses the term "Real-time" which, in my world, is reserved for urgent or situation computations where the process is initiated by a real-world trigger (observational data or human intervention) and that the computations are running against a real-world clock. Is this the case? Otherwise, "Rapid" is probably a better term – that indicates fast computation.

method. Since POD enables us to systematically extract the feature quantities of the temporal and spatial distributions of risk indicators, it is considered suitable for surrogate modelling for evaluating all kinds of disaster risks. Although some studies using mode decomposition techniques have been reported in the research field of disaster science, the time-space surrogate modeling of tsunami force based on a high precision 3D tsunami simulation has not been extensively studied. 55 The present study proposes a framework for real-time tsunami force predictions by the application of the POD-based surrogate modelling of numerical simulations. Only a limited number of large-scale numerical analyses are performed for a selection of scenarios with various fault parameters to capture the distribution tendencies of the target risk indicators. After 2D shallow water simulations are performed, the results are used as input data for 3D flow simulations to evaluate the time histories of forces acting on buildings. Then, POD is applied to the evaluation results to extract the principal modes that represent the tem-60 poral and spatial characteristics of the forces caused by a target tsunami. A surrogate model can then be constructed by a linear combination of spatial modes whose coefficients are determined by means of the regression analysis and interpolation techniques. To demonstrate the applicability of the proposed framework, one of the tsunami-affected areas during the Great East Japan Earthquake of 2011 is targeted. Using the 2D simulation and the results are obtained for different input parameters, we carry out a series of 3D numerical analyses to obtain a set of time history data of the spatial distributions of tsunami force and 65 then construct a surrogate model capable of predicting the time variations of the spatial distributions of tsunami forces almost instantaneously. A surrogate model of the inundation depth is also constructed and compared with that of tsunami force.
The structure of this paper is as follows. Section 2 explains a flow and specific procedures of the proposed framework. In Section 3, the proposed framework is applied to a target city to construct the surrogate model that enables real-time predictions of tsunami forces equivalent to the 3D tsunami runup simulations. This section describes a flow and methodologies of the proposed framework. As mentioned in previous section, principal spatial modes are extracted from precomputed simulation data that are obtained from 2D-3D coupled tsunami analysis, and the surrogate models are constructed using the spatial modes for the real-time tsunami force prediction. Fig.1 shows the flowchart of the proposed method. Each parts of the proposed method, such as tsunami analyses, mode decomposition, and surrogate 75 modeling, are explained in details in the following subsections.

2D-3D coupled tsunami analyses
To collect the data necessary for the surrogate modelling explained in the previous section, a set of two sequential numerical analyses is carried out for each case with a selected set of fault parameters. A 2D tsunami analysis is first carried out for to obtain the information about the tsunami wave caused by the offshore fault. Then, the time histories of the tsunami height 80 and flow velocity on the specified boundary of the target urban area are extracted for use as input data for a 3D tsunami runup simulation. A 3D numerical analysis is performed to elaborately evaluate the spatial and temporal distributions of risk indicators within the target area. Since the risk indicators in this study include not only the inundation depth, but also the force 4 https://doi.org/10.5194/nhess-2021-77 Preprint. Discussion started: 7 May 2021 c Author(s) 2021. CC BY 4.0 License. acting on buildings, the 3D calculation requires a high-fidelity numerical analysis. In this section, only the governing equations for 2D and 3D tsunami simulations are outlined. For the details of their connection, Takase et al. (2016) can be referenced.

85
To numerically analyze tsunami wave propagation over a wide area, 2D shallow water simulations are commonly performed.
TUNAMI-N2 (Imamura (1995); Goto et al. (1997)), one of the most well-known simulation codes based, is based on the concept of the shallow water theory. In this study, we employ the TUNAMI-N2 to represent wave propagation in the offshore area. The following formats of continuity and nonlinear long wave equations are solved in the 2D analysis: where M and N are the flow rates in the x and y directions, η is the water level, D is the total water depth, g is gravitational 95 acceleration and n is the Manning roughness coefficient. To evaluate the force acting on buildings, we employ the following set of 3D Navier-Stokes and continuity equations in the analysis domain Ω ns ∈ R 3 ρ ∂u ∂t where ρ is the mass density, u is the velocity vector, σ is the stress tensor, and f is the body force vector. Also, assuming a Newtonian fluid, the stress is calculated as where p is the pressure, µ is the coefficient of viscosity, and ε(u) is the velocity gradient tensor defined as To solve the governing equations of the 3D analysis, the stabilized finite element method(SFEM) is employed in this study.
Details of the method is explained in the appendix(see Appendix A).

Mode Decomposition
Proper orthogonal decomposition (POD) (Liang et al. (2002)) is well known as a technique to extract the principal direction along which the variance of the collected data is maximized. In other words, POD is capable of grasping the characteristics 110 of data and expressing data in a lower dimension. When applying POD to data, we prepare a matrix storing the data arranged in accordance with a certain rule (hereafter, referred to as the "data matrix"). When n data are obtained for scenario case i, they are stored into a column vector denoted by x i , which has n components. When the scenario ranges from 1 to N , the data matrix can be defined as Here, a vertical line is added to each column vector to implicate that the component alignment follows the specified rule. The covariance matrix of the data matrix is defined as The eigenvectors are the principal directions, and the corresponding eigenvalue represents the variance in each principal direction. Also, larger eigenvalues have more major contributions to the data.

120
Let λ j be the j-th eigenvalue of C, and u j be the j-th eigenvector (j = 1, ..., n). Since each eigenvalue corresponds to the variance in the corresponding principal direction or mode, those with small eigenvalues have almost no significant values and provide little meaningful information, and therefore can be omitted in the sequel. To omit unnecessary modes, the contribution rate is generally introduced as an indicator to compare the magnitude of each eigenvalue with others. Specifically, the contribution rate d j of a particular mode j is defined as a percentage of each eigenvalue against the sum of all eigenvalues as When this rate is arranged in descending order, the threshold for the omission is determined according to the profile of its decreasing curve. It should be noted that the information about the omitted mode is lost and is a source of error.
Also, based on the theory of singular value decomposition, data matrix X can be expressed as where U is a matrix lining up the eigenvector u j of C in the column direction and V is the matrix lining up the eigenvector v j and matrix C = X T X in the column direction. Also, Σ is the matrix having the square roots of eigenvalues in the diagonal components, which are referred to as singular values. Here, p is the number of singular values that must be greater than zero.
Equation (11) can be substituted for C and C to have their eigenvalue decomposition as It follows that the C and C have common eigenvalues, each of which is equal to the squared singular value of X, and that U and V in Eqn. (11) correspond to the eigenvectors of C and C , respectively.
Additionally, from the relational expression of the singular value decomposition, data vector x i for case i is expressed as the 140 linear combination of the eigenvectors of the covariance matrix as where v ij are components of V with i and j indicating row and column numbers, respectively. Here, the modes are arranged in accordance with the descending order of the corresponding eigenvalues and α ik is the coefficient of the k-th mode associated with the i-th case. Once modes to be omitted are determined according to the magnitude relationships of the contribution rates, 145 etc., the number of modes r used to approximate the data is determined, so that the equation above yields the following reduced order expression: wherex i is expected to be a close approximation of the original data x i for the i-th case. In this manner, data for a certain case can be expressed as a linear combination of major modes expressing the fundamental features. Remember that each of these 150 selected modes corresponds to the eigenvector of the covariance matrix whose associated eigenvalue has relatively larger then others. As acknowledged above, error may occur in Eqn. (15) due to the truncation or mode omission.

Construction of the surrogate model
Once the reduced order expressions of all the data vectors are obtained, a surrogate model can be obtained straightforwardly.
We begin with identifying the relationship between the coefficient for each mode and the set of input parameters β , each of 155 which corresponds to a selected scenario or case. That is, coefficients α ij are approximated as functions of parameters β by interpolation or regression, which are denoted by f j (β). Then, the surrogate model can be expressed in the following equation whose independent variables are input parameters f j (β): In this study, f j (β) is determined by the interpolation with radial basis functions (RBF) (Buhmann, 1990) because it is appli-160 cable even if the parameters are not distributed in equidistant intervals and even if some defects are present in the data.
RBF interpolation for f j (β) can be expressed by the following equation: where β i contains the set of parameters for case i, w i is the weight, and φ(β, β i ) ≡ exp(−γ||β −β i || 2 ) are RBF. Here, γ is the parameter controlling the smoothness of the function. Also, by substituting the set of actual the input parameters in Eq. (17) we obtain the following simultaneous equations to determine the set of weights w i : In this study, ridge regression (Hoerl and Kennard, 1970) is used to compute weights for Eq. (18).By introducing the ridge regression, it is possible to prevent the overfitting problem. Since the accuracy of the interpolation depends on the regularization parameter used in the ridge regression and the smoothness parameter γ of the RBF interpolation, these values are determined 170 by the application of a certain cross-validation technique (Stone, 1974) combined with the Bayesian optimization (Močkus, 1975) in this study.

Application to a real disaster case
In this section, the proposed method is applied to tsunamis that have occurred. The simulation and the construction of the surrogate model are based on the tsunami induced by the 2011 earthquake that occurred off the Pacific coast of Tohoku.

Tsunami simulation
In this study, as previously mentioned, the calculation was performed in two steps, including a numerical analysis based on the 2011 earthquake that occurred off the Pacific coast of Tohoku using 2D analysis over a wide area and 3D runup analysis for the target region. To construct a surrogate model that considers uncertainty, it is first necessary to decide what uncertainty should be considered. Next, we considered the dominant factors from the occurrence of the tsunami to its runup. For example, 180 the epicenter position and magnitude are factors that need to be considered during the earthquake. During tsunami propagation, the fault model that controls the initial waveform of the tsunami and the submarine topography data need to be included. In this study, the 2011 earthquake that occurred off the Pacific coast of Tohoku was used as a basis. That is, the relevant factors in the process of the earthquake itself can be found in the accomplishments of research conducted up to this point. In concrete terms, values of these parameters. The cases that were analyzed and the names of the cases are shown in Table 2. For the slip of each small fault, there were five patterns from 0.7 to 1.4, and for the rake, there were 10 patterns with −20 • and +25 • changes in the angle, making a total of 50 cases.

Wide-area 2D tsunami analysis
To obtain the information about the tsunami wave usable for the 3D analysis, a tsunami analysis is carried out utilizing the 2D shallow water flow model. First, a wide-area 2D analysis was performed to obtain time history data for the tsunami height

Connecting the 2D and 3D analyses
Using the results of the 2D tsunami analysis over a wide area as the input conditions, a 3D tsunami analysis was performed to represent the tsunami runup in the target region. In terms of the concrete connection method, the method used in the study of Takase et al. (2016) was employed. An image of the location of 2D-3D boundary is shown in Fig. 6. The time-series data of 210 the wave height and flow velocity obtained from the 2D wide-area analysis were used as 3D input values. better to consider the effect of washing away buildings, modeling this effect is difficult. Furthermore, the main objective of this study is to construct a surrogate model of tsunami force and discuss it's performance. We therefore conclude that the simulated results roughly express the real tsunami, that use the simulated results in this study.

3D runup analyses
In order to construct a surrogate model of tsunami force, we can use the hydrodynamic force acting on buildings calculated in the 3D simulation. It is however difficult to quantify pointwise tsunami force because the force is strongly affected by the 230 direction of building surfaces. To avoid this problem, we consider a 2D mesh with a grid size of 10m for evaluating the tsunami force. An image of the mesh is shown in Fig. 12. The force acting on the building surface is integrated in each mesh and quantified as a scalar value. The spatial distribution of the tsunami force was obtained in this manner.    evaluated points is n = 214×260 = 55640. The impact force is calculated by using the pressure and the surface affected by the pressure in a 10-meter square. Additionally, 20 seconds is used for the time interval for the use data and 150 steps of data are used in accordance with the time for the tsunami runup. In this application example, in addition to the two input parameters, 240 time is introduced as another parameter because when predicting damage in real-time, it is necessary to evaluate the state of time-related developments in addition to spatial distribution.
Proper orthogonal decomposition is applied to construct a surrogate model for the results of 40 of the 50 cases shown in Table   1, excluding the 10 cases where the rake is ±10 • . The reason that these 10 cases are not used for surrogate model construction is that their associated data are later used to verify the reasonableness of the created surrogate model. Additionally, as 40 cases 245 secured a sufficient number of analysis cases for the two variables in this study, verification is not performed, but it is desirable to assess how many cases are required when constructing the surrogate model.
Next, the data matrix is defined. First, the time-series data for a particular scenario can be defined as follows as matrix X i .
Here, x(U, λ, t) is a vector showing the spatial distribution of the risk indicator in relation to time. Additionally, as a concrete expresses the maximum number of output steps in the numerical analysis, and in the example in this study, this is m = 150.
Data matrix X is defined as follows, as laid out in the column direction.
The previously described proper orthogonal decomposition was carried out for this data matrix. By applying proper orthogonal 255 decomposition to this matrix, it is possible to extract a common temporal mode for data including time, and it is possible to construct a surrogate model including the time direction.

Extracted spatial modes and construction of a surrogate model
Here, we show the results of applying proper orthogonal decomposition in relation to impact force and inundation data.
First, in relation to the extracted spatial mode, the impact force and water depth for the first mode to the third mode are as 260 shown in Figs. 13 and 14, respectively.
1st mode 2nd mode 3rd mode From these figures, it can be confirmed that the impact force distribution and inundation depth distribution both have roughly the same spatial distribution characteristics. If we examine the characteristics in more detail, in the case of the first mode, we can see the tendency for the coastal side to be most strongly impacted. Naturally, since the points close to the coast are most affected by the tsunami, this is a mode that best expresses this kind of impact. Additionally, the second mode expresses opposite 265 tendencies for the section close to the coast and the section behind it, and the third mode expresses opposite tendencies for the eastern and western coastal areas.
The contribution rates of the impact force and water depth are shown in Fig. 15. The contribution in the first mode is extremely high for both impact force and inundation depth. In this study, as an indicator of the number of modes, the error and the contribution rate were investigated. In this example, for the 10 cases of data kept aside as verification data, the root mean 270 squared error for the results obtained from the numerical analysis and the results obtained from the surrogate model for each mode are calculated. The error of case e i is calculated using the following equation.
Here, n is the number of evaluation points and m is the number of time step. Additionally, X j is time series data of numerical simulation results for i-th case andX j is results obtained from a surrogate model with r modes. the relations between the 275 number of modes and the error for each risk index is shown in Fig. 16. From Fig. 16, it can be seen that the error decreases as the number of modes used in the surrogate model increases, and as the number of modes increases, the error rates decrease, thus gradually converging. In this study, 20 modes are used to construct the surrogate model, because the error has been sufficiently reduced.
Next, by interpolating the proper orthogonal decomposition coefficient as a parameter function, the surrogate model is 280 created. As shown in Eq. 20, as the data matrix contains the slip U , rake λ, and time t, which are parameters for numerical analysis in the column direction, the proper orthogonal decomposition coefficient is also expressed as a function of slip, rake and time. In concrete terms, this coefficient can be expressed as f k (U, λ, t) as in the following equation.

285
Here, we compare the results obtained from the numerical analysis and the results from reconstructing the original data using the constructed surrogate model. First, when comparing the time-series data, several evaluation points are set. In this study, the   Based on these results, it can be summarised that whereas errors partially occur based on the impact of mode reduction, in general, the original data are being reproduced, and it is possible to express the data as a linear combination of modes. It should also be noted that there are different tendencies between spatial distributions of tsunami force and those of inundation depth.
For instance, at time step 65, high values are locally seen in the inundation depth, while that tendency is not seen in the result of 300 tsunami force. This indicates that it's difficult to predict damage of buildings based on only inundation depth, and information of tsunami force is also respired for an accurate damage prediction.

Validation of the surrogate models
By comparing data that were not used for the construction of the surrogate model, it is possible to investigate whether the surrogate model could reproduce the numerical analysis results. A comparison of the results obtained from the numerical 305 analysis and the surrogate model for the two scenarios of S2R3 and S5R7 is shown in Figs. 22-27. As in the previous section, for each physical quantity, a comparison by snapshots and time-series data at evaluation points 2, 3, and 8 is shown.   Here, it is confirmed that it is possible to represent the simulation results using arbitrary parameters with the surrogate model.
In concrete terms, the error rate is calculated. For each of the physical quantities, the mean squared error for the time-series data of all points are calculated. The results are shown as follows in Table 2 With regard to Table 2, the ratio of the mean values was 434% for the impact force data and 414% for the water depth data in S2R3 and 318% for the impact force data and 252% for the water depth data in S5R7. It is acknowledged that these are very high values. This large error comes from inadequate representation of local peaks and the phase shift. In addition, because the time-series data includes a large amount of data that would not be affected by a tsunami, such as data from mountainous areas, the data include many values equal or close to zero: this would result in extremely small mean values. That is, it is likely 315 that the large error rate value in calculations is because of the small mean values. Note that the ratio of the maximum values is 0.78% for the impact force data and 2.15% for the water depth data for S2R3 and 0.78% for the impact force data and 2.04% for the water depth data in the case of S5R7. In these cases, the error rate is low compared to the maximum value. In other words, because the error in Table 2 is sufficiently small compared to the maximum value, the overall distribution and the general shape of the time-series data can be roughly captured by the surrogate model.

320
Next, as in the previous section, the impulse is calculated using the time-series data of the impact forces, and a comparison of the impulse calculated from the numerical analysis results and the results obtained from the surrogate model for the 10 evaluation points is shown in Figs few seconds, and because it was possible to calculate the spatial temporal distribution of the physical quantity using arbitrary parameters, it would be possible to apply the surrogate model using the spatial modes for real-time damage prediction.
This study presents the framework for predicting time series data of tsunami force based on the results of a high precision numerical analysis at a low calculation cost. By performing proper orthogonal decomposition in relation to the numerical 335 analysis results while considering uncertainty, we could extract spatial modes and express the spatial distribution of the risk indicators as a linear distribution of the modes. Additionally, by expressing the coefficients as functions of analysis parameters and time in the surrogate model, it was possible to calculate the distribution at extremely low calculation costs for certain cases for which no numerical analysis has been performed. In this study, surrogate models of the impact force and inundation depth of a tsunami running up to a target area were constructed, and it was shown that the results of the numerical analysis could be 340 roughly represented. It is also shown that the impulse calculated from the time-series data of the tsunami force obtained from the surrogate model can sufficiently represent the numerical simulation results. These results indicate that the surrogate model can be efficiently utilized for tsunami risk assessments.
In this study, for tsunami events, only the two fault parameters of slip and rake was considered as uncertainties. However, as many uncertainties are at play in an actual tsunami event, it would be possible to integrate these parameters and perform 345 a numerical analysis and, in the same manner, create a surrogate model that incorporates a larger range of uncertainties in more detail. However, when increasing the types of input parameters, as the number of scenarios that need to be considered is immense, it is important to use an efficient method for setting the scenarios (McKay (1979)). Furthermore, it is necessary to evaluate in advance the extent to which these uncertainty parameters will fluctuate. In the surrogate model created in this study, a sufficient evaluation was performed with interpolation (within the scope of the parameters for which numerical analysis was 350 conducted). However, with extrapolation (outside the scope of the parameters for which numerical analysis was conducted), the possibility of reduced accuracy is high, so it is necessary to evaluate in advance the extent to which the uncertainty parameters fluctuate and set scenarios that allow for this range of fluctuation.
While the mechanism shown in this paper was designed for use with tsunamis, it has potential for use in the real-time simulation of various disasters. Since numerical simulations of disasters typically involve high computational costs, it is important 355 to consider uncertainty in advance and perform a numerical analysis before creating a surrogate model. That is, the real-time simulations developed by the surrogate model proposed in this work can be used as a tool to gauge the extent of disasters.

Appendix A: Stabilized finite element method
With regard to Eqs. (4) and (5), we apply the stabilized finite element method equipped with the streamline-upwind/Petrov-Galerkin (SUPG) and pressure-stabilizing/Petrov-Galerkin(PSPG) (SUPG/PSPG method) (Tezduyar, 1991)  e=1 Ω e ns τ ns cont ∇ · w h ρ∇ · u h dΩ = 0 (A1) Here, Ω ns ∈ R 3 is the analysis domain, n el is the number of elements, u h and p h are the finite element approximations for 365 velocity and pressure, respectively, and w h and q h are the approximations of the weight functions for the equations of motion and continuity, respectively. Also, the fourth involve the SUPG term for stabilizing the advection process and the PSPG term for avoiding pressure oscillation. In addition, the fifth term has the shock-capturing term (Aliabadi and Tezduyar, 2000) for avoiding numerical instability on free surfaces, and τ ns supg τ ns pspg and τ ns cont are all stabilization parameters. In this study, the phase-field approach is employed to capture the complex geometries of free surfaces, such as breaking 370 waves and flow around buildings. In this approach, the locations of free surfaces can be determined by solving the Allen-Cahn advection equation (Chiu and Lin, 2011;Takada et al., 2013), which was modified to the storage format as ∂φ ∂t u · ∇φ = δ ∇ · (δ(∇φ) − F a ) , F a = φ(1 − φ) ∇φ |∇φ| (A2) where the phase-field function φ takes 1.0 or 0.0 to identify liquid (water) and gas (air) phases, and an intermediate value represents the free surface. Here, ε is the mobility and δ is the representative width of the free surface. According to the phase 375 field function, the fluid density ρ and viscosity coefficient µ for each element were evaluated as where subscripts l and g indicate the quantities of the liquid and gas, respectively.

380
In this study, we also apply the stabilized finite element method with the SUPG method to the Allen-Cahn equation above, so that the resulting finite element equations are given as