Assimilation of decomposed in situ directional wave spectra into a numerical wave model of typhoon waves

The purpose of this study was to enhance the accuracy of numerical wave forecasts through data assimilation during typhoon periods. A sequential data assimilation scheme was modified to enable its use with partitions of directional wave spectra. The performance of the system was investigated with respect to operational applications, specifically for typhoon waves. Two typhoons that occurred in 2006 around Taiwan (Kaemi and Shanshan) were used for this case study. The proposed data assimilation method increased the forecast accuracy in terms of wave parameters, such as wave height and period. After assimilation, the shapes of directional spectra were much closer to those reported from independent observations.


Introduction
The application of data assimilation to operational wave modelling has rapidly increased over the past 20 yr, in part due to the increase in the near real-time availability of wave and wind observations.This increase in data availability has drastically increased since the launch of earth-observing satellites, such as ERS-1 and ERS-2.It has also inspired many researchers to investigate the possibilities of including data assimilation methods in operational wave forecasting systems to improve the accuracy of the estimation of sea states.
Assimilation techniques for wave forecasting are commonly divided into sequential techniques (e.g.Lionello et al., 1992;Komen et al., 1994) and variation methods.Sequen-tial techniques are computationally inexpensive and have resulted in some success in improving wave forecasts (e.g.Günther et al., 1993).This success has led to the implementation of this type of system into the operational wave analysis/forecast cycle at the European Centre for Medium-Range Weather Forecasts (ECMWF).
Wave and wind data calculated using sequential techniques are used to correct the winds and waves at each time point of the model regardless of the previous model states.Because the space-time structure of the modelled wave field is not taken into account, the results are not fully consistent with the dynamics of the wave model.In the first attempt of wave data assimilation, Komen (1985) improved swell forecasts in the southern North Sea through the use of observed wave heights in the central North Sea.The waves predicted by the model were replaced in the wave model with independent observations whenever and wherever available.However, the utility of these new observations was relatively short-lived because the corrections were quickly lost due to the uncorrected winds and waves found elsewhere in the wave model domain.Hasselmann et al. (1988) and Janssen et al. (1989) improved the results of the model by distributing the corrections over a larger area and by including wind corrections.
In other cases, the impact of this type of system has proven to be too weak to improve the accuracy of the model, as the researchers expected (Burgers et al., 1992;Mastenbroek et al., 1994;Bidlot et al., 1995).As suggested by Mastenbroek et al. (1994) and Bidlot et al. (1995), this result may be caused in part by the fact that significant wave Published by Copernicus Publications on behalf of the European Geosciences Union.
height observations alone do not contain sufficient information for a proper renewal of the wave spectrum, which is the prognostic variable in a spectral wave model.Recently, sequential assimilation systems have been developed and are capable of assimilating observations of the full wave spectrum (Hasselmann et al., 1994(Hasselmann et al., , 1996;;Voorrips et al., 1997;Breivik et al., 1996).Voorrips et al. (1997) demonstrated the benefit of using spectral information by comparing a method that utilised this information with a method based only on significant wave height assimilation.
During the past decade, the most frequently used operational assimilation schemes have been single-time-level schemes, such as optimal interpolation (OI) (e.g.Janssen et al., 1989;Lionello et al., 1995;Hasselmann et al., 1997;Voorrips et al., 1997).OI is computationally fast and easily applicable to the online wave analysis/forecasting conditions, but it suffers from some drawbacks.Forecast errors are often inhomogeneously distributed over the wave spectrum and limit the improvements obtained by the assimilation of wave heights alone (Mastenbroek et al., 1994).Thus, some groups have challenged the use of SAR (Synthetic Aperture Radar) data (Breivik et al., 1996;Hasselmann et al., 1997).Although the use of SAR data may be found useful for wave models in regional seas, the density of SAR observations is simply too low to have a serious impact on the wave analysis.Additionally, the spectral resolution of SAR, which truncates waves shorter than 100 m, is a larger problem for partly sheltered seas where the average wavelengths are substantially shorter than those in the open ocean.However, there is a good alternative to the SAR data for regional seas.Regional seas are densely covered with pitch-and-roll buoys, which measure spectral information.Moreover, pitch-and-roll buoys supply more data than satellites in the region because they continuously record data at fixed positions.
The aim of this study was to investigate the potential use of the spectral observations from pitch-and-roll buoys, which were reported in near-real-time, for assimilation in an operational forecast system.The set-up of an optimal interpolation scheme if only one buoy is available in the forecast domain, which is located in the deep ocean approximately 220 km away from the Taiwan coast, is discussed.In addition, the impact of assimilation on the wave analysis and forecast is quantified by comparing runs with and without assimilation for several typhoons that occurred in 2006.

Descriptions of the simulation region
This study focused on coastal waters in eastern Taiwan.A three-level nesting scheme was applied to obtain detailed wave information in this region and to effectively simulate the wave field (Fig. 1).The simulated regions, grid resolutions, and time steps of the model nestings are listed in Table 1.The grid resolution and time step conformed to the CFL condition.The purpose of including the larger region was to provide boundary values for the next finer layer.For this study, we only concentrated on the fine-resolution grid (i.e.layer 3).The SWAN wave model (Booij et al., 1999) was used for all layers.The parametric sensitivity analysis (Lee et al., 2009) was used to search for optimal parameter values in SWAN wave model.These two parameters revealed by sensitivity analysis are: nonlinear saturation-based whitecapping combined with wind input (default value is 0.0015; tuned value is 0.00172) and the coefficient of the JONSWAP results for bottom friction dissipation (default value is 0.038; tuned value is 0.0284).
All SWAN model runs were forced by operational 1-hour wind fields, with a 0.5 • resolution in longitude and latitude, provided by the Central Weather Bureau (CWB).In order to match up with the model simulation, the wind fields were linearly interpolated in space and time, with the simulated regions, grid resolution and time step corresponding to the model nesting.
Observed spectral data from the Gagua Ridge buoy (122.78 • E, 22.01 • N) were used for model assimilation.The Gagua Ridge buoy is located approximately 220 km east of Taiwan, where the water depth is approximately 6000 m.Measurements from the Hualien buoy (Fig. 1) were used for verification purposes.The Hualien buoy is moored near the shore (approximately 1 km off-shore, where the water depth is approximately 21 m).Pitch-and-roll buoys are developed, manufactured, and operated by the Coastal Ocean Monitoring Center (COMC) of National Cheng Kung University, which was commissioned and is supported by the CWB, and the buoys report directional wave spectra every hour.A fast Fourier transform (FFT) was used to obtain the full two-dimensional wave spectrum (Brigham, 1988).
3 An introduction to the data assimilation scheme OI (Hollingsworth, 1986) is a statistical method used to construct the analysed significant wave height field.It determines the minimum error variance solution for the model state by combining a model first-guess field and observations with pre-specified forecast and observation error covariances.Previous experience has been obtained with the optimal interpolation method applied to wave height and wave period measurements (Janssen et al, 1989;Lionello and Janssen, 1990;Burgers et al, 1992;Mastenbroek et al, 1994).The Optimal Interpolation of Partitions (OI-P)  method, which used in the present study is also an optimal interpolation method, but an extended version which assimilates observations of full wave spectra from pitch-and-roll buoys (Hasselmann et al., 1996;Voorrips et al., 1997).Spectral partitioning (Gerling, 1992) is a technique used to decompose a wave spectrum into the main wave systems, which are the wind sea system and swell systems.Instead of specifying the main wave systems, a fairly accurate characterisation of the spectrum may result from specifying the wave energy of different frequencies and directional bands.In this study, the model-simulated directional spectra were replaced by the observed data from data buoys and the OI-P scheme was derived based on the procedure from OI formulas (Lionello et al., 1992) as follows.The analysed directional wave spectra at each point x i , denoted as S i A (f, θ), were expressed as a linear combination of S i P (f, θ), indicating the first-guess results produced by the model and by S k O (f, θ)(k = 1, . .., M obs ), and the observation: where σ k P is the root mean square error in the model prediction.In addition: where S k T (f, θ ) represents the idealised true value of the directional wave spectra.The weights, W ik , were chosen to minimise the root mean square error in the analysis of σ k A : The angle brackets indicate an average over a large number of iterations.Assuming that the errors in the model are unrelated to the errors in the measurements, the solution is: where the elements of matrix M are of the form: where P and O represent the error correlation matrices of the model predictions and observations, respectively (both are actually scaled with σ i P ): where S m P (f, θ ), S m O (f, θ ) and S m T (f, θ ) were expressed as the directional wave spectra at each point m of the model prediction, observation and the idealized true value, respectively.S k P (f, θ ), S k O (f, θ ) and S k T (f, θ ) represent the directional wave spectra at each point k of the model prediction, observation and the idealized true value, respectively.Therefore, the prediction error correlation matrix P and the observation error correlation matrix O must be clearly specified.This specification would, in practice, require the determination of statistics for both predictions and observations, which are presently unavailable.If the idealised true value is known, the RMSE between the observations and first-guess results can be obtained.However, errors are inherent in any observation technique during data collection; therefore, we are unable to obtain the idealised true observation and the assumptions from Lionello et al. (1992) were adopted in this study.
where | xm − xk | is the distance between grid points m and k.L max = 5 • , which is the correlation length.The effect of variations of L max and of the ratio between σ m O and σ m P on the results of the assimilation is discussed and verified (Fan, 2008).The observation errors are random and unrelated.δ mk is the Kronecker delta.In addition, the simulation analysis yielded a ratio between the observation and first guess with a standard deviation R m of 1.

Optimal frequency and directional bands for partitioning
The assimilation procedure was used to integrate the model's first guess and the observed partition parameters (e.g.frequency and direction) into an analysed field of parameters.An important input value for the OI-P procedure is the covariance of the errors of the observed and model parameters.
The covariance is obtained by calculating long-term statistics of the differences between the observations and the hindcasts of the SWAN model.The observational errors are assumed to be spatially independent.
Although there is only one data buoy in the deep ocean, the first-guess spectra of neighbouring grid points of the Gagua Ridge buoy must be used as fictitious buoy data.The weight between the virtual stations and the field station was acquired by comparing the wave spectra of virtual stations with the wave spectra of the field station.Wave spectral data collected for 3 months from the Gagua Ridge buoy were used to carry out statistical analysis, and the OI-P of these wave spectra were then calculated.The computer processing time was influenced by the number of wave directions and wave frequencies inputted into the model.Therefore, with 2-day warm up assimilations set as initial values, the assimilation data taken from the 3rd day onward were used to acquire the optimal choices in terms of RMSE (root mean square error) for the comparison of the significant wave heights between observational and simulated data (Table 2).The most accurate results of the model assimilations were obtained when 32 wave directions and 41 wave frequencies.Due to limitations of transmission technology for real-time data, higher resolutions were not investigated and this resolution was applied for the assimilation of typhoon events later on.

Optimising the number of virtual stations
Generally, at least two observed stations are required to carry out OI; however, only the Gagua Ridge buoy station was used in this study.Therefore, we needed to select additional stations for optimal interpolation purposes.The wave heights, which were corrected with altimeter wave height data, were distributed over a finite region of influence with radius of the order of 1000 km (Bauer et al., 1992).However, in order to enable the fictitious buoy data representative virtual stations, and consider the average of storm radius is around 200 and 300 km, so a radius has been extensively defined for 250 km from the Gagua Ridge buoy station.Figure 2 shows the virtual stations on the grid points.In order to obtain the historical wave data of virtual stations, the covariance between the observations and the hind-casts of the SWAN model described in Sect.4.1 were used to figure out wave data of virtual stations.Three, five, and seven stations were selected to complete the numerical tests, respectively; these three sets of combinations were then used to identify the appropriate test method.The average errors of the SWHs and mean wave periods (MWPs) at the Gagua Ridge buoy for different selected stations in the 2-day model simulations are shown in Table 3.An increased number of virtual stations in the numerical tests resulted in more accurate model results.However the average error of significant wave height within 0.1 m is acceptable during typhoon periods.Therefore, seven virtual stations were established for evaluation.This set-up of virtual stations will cover most of the typhoons approaching the east coast of Taiwan.In principle the concept can be applied to other areas, too, but the special  geometry (e.g. the presence of islands) and the climatology of the storm systems must be taken into account.

Verifications of the results from the assimilation runs against buoy observations
The influence of the assimilation on the wave analyses and wave forecasts was assessed by running the SWAN wave model for two typhoon events in the summer of 2006: Typhoon Kaemi and Typhoon Shanshan.In cases where the CWB wind fields were missing and no simulations were performed, these warm-up periods were removed from the evaluation.The effects of OI-P assimilation in the SWAN model are shown in Figs.3-6.In general, the results of the assimilation runs were much closer to the buoy measurements compared to the reference runs, one-dimensional spectra, significant wave heights, and mean periods.Figure 3 shows, for example, the directional wave spectra obtained at the Hualien Buoy station at 05:00 UTC on 26 July 2006 from buoy observations, an assimilation run, and a reference run (Fig. 3a-c, respectively).Assimilation runs means: modelruns with data assimilation; reference runs means: modelruns without data assimilation.The assimilation results were similar to the results obtained using the buoy observations  for the directional distribution, intensity, and main direction of the spectra in the rose diagram.For the directional spectral distribution, the results of the reference run showed a direction shift of 20-30 • toward the west compared with the observation and assimilation results.Additional high-frequency components appeared in the reference runs, which were removed by the assimilation.
Figure 4 shows the one-dimensional frequency of wave spectra at the Hualien buoy on 24 July at 05:00 UTC for Typhoon Kaemi (Fig. 4a) and on 15 September at 15:00 UTC for Typhoon Shanshan (Fig. 4b).The results also reveal that the same tendency of the wave spectra for both typhoon events exists.The intensity of the wave spectra in the reference runs was lower than that in the assimilation runs and observations, which were similar to one another.Other features, such as the second peak at approximately 0.18 Hz in Fig. 4a and 0.15 Hz in Fig. 4b, were not simulated in the reference run.
The SWH time series (Fig. 5) and MWP time series (Fig. 6) show the improvements of the model results by assimilating data into the models for both typhoon events.The hindcast results of the SWH in Fig. 5 revealed that neither the peak values nor the timing of the peak values were calculated correctly without data assimilation.The oscillations around the peak times were modelled well by the assimilation runs.
The comparison of the MWPs for both typhoon events (Fig. 6) showed that the tendency of the time series for both assimilation runs was similar to the observed tendency.In contrast, the results of the reference runs show a significant difference from the observed tendency.The assimilation run was able to simulate the arrival of the long waves correctly, while the reference run lagged by approximately 24 h for Typhoon Kaemi and by approximately 3 h for Typhoon Shanshan.Thus, the data assimilation performed well in the SWAN wave model simulation of the MWP.
The statistical comparisons for the two typhoon events of the modelled waves with the Hualien buoy observations in terms of bias, RMSE, and scatter index (SI) are summarised in Table 4.Although the tendency of the time series for both assimilation runs was similar to the observed tendency, there were significant differences found between Typhoon Kaemi and Typhoon Shanshan when comparing the       statistical results between the model results and independent observations at the Hualien buoy.
The results from the assimilation run were closer to the observations than the results from the reference run.In other words, the OP-P concept proposed in this paper can enhance the forecast capability even if only one reference buoy within the forecast domain is used for assimilation.Therefore, data assimilation performed well in the SWAN wave model simulation for the SWH and MWP.

Conclusions and outlooks
A spectral wave data assimilation scheme is presented in this paper and is based on the wave spectrum being separated into wave systems and the subsequent OI of wave partitions.The assimilation experiments in the eastern Taiwan region resulted in a large improvement in the sea state analysis specifically for typhoon waves.
To obtain the optimal number of parameters, the numerical results showed that the use of 32 directions and 41 frequencies was optimal for data assimilation.
In order to carry out OI, the wave data of virtual stations were established successfully via a statistical technique.The numerical results indicate that the number of virtual stations should be greater than five for the errors to be stable.
The impact of data assimilation on wave forecasts depends on the layout of the observations system, e.g. 7 buoy stations were sufficient for Typhoon Kaemi.
The assimilation results for both typhoon events were close to the buoy observations for the directional distribution, intensity, and mean spectral direction.The results reveal the same tendency for the wave frequency spectra.The underprediction of the reference run was clearly corrected by the assimilation.Comparisons of the SWH and MWP time series indicated that the performance of the model output was improved by incorporating data assimilation for both typhoon events: Typhoon Kaemi and Typhoon Shanshan.

Fig. 1 .
Fig. 1.Computational regions of the model and locations of data buoy stations.

Fig. 1 .
Fig. 1.Computational regions of the model and locations of data buoy stations.

Fig. 2 .
Fig.2.Locations of virtual stations on the grid points.

Fig. 2 .
Fig. 2. Locations of virtual stations on the grid points.

Fig. 4 :
Fig. 4: Spectra at the Hualien buoy on 24 July 2006 at 0500 UTC and on 15 September 2006

Table 2 .
SWH RMSE statistics of the various numerical experiments performed compared with data collected from the Gagua Ridge buoy.

Table 3 .
Average errors of significant wave heights and mean wave periods for different virtual stations.

Table 4 .
Statistical results of the comparison between the model results and independent observations at the Hualien buoy station: bias, root mean square error (RMSE), and SI.