Reply on RC3

In the introductory preamble, present altimeter capabilities are a bit underestimated. On line 33: “Although along-track observations can have a sampling frequency of ~7 km, various sources of noise limit the feature resolution to 100 km (Xu and Fu, 2013).” This is pessimistic for modern altimeters. The small footprint of AltiKa, and the enhanced along-track resolution of the SAR-Mode Delayed Doppler altimeter on CryoSat, Sentinel-3A/B and Sentinel-6, have brought the resolved wavenumber spectrum down to ~50 km, or possibly less with advanced re-tracking (e.g., ALES – Passaro and Birol papers). Admittedly, this is along-track, and is not realized in 2-D gridded products.

observations compared to the surge in particular is problematic. We have updated the text in Section 1 to highlight the deficiency of existing DA schemes in this respect.
Are the open boundary conditions for Nature Run and OSSEs the same? It's not explicitly stated. One disturbing result that is never really explained is why there should be a slow drift in SSH bias. Is the free run model steadily changing net volume, that assimilation serves to restore by reimposing the MDT along with the observations?
Yes, the lateral boundaries for the Nature Run and the OSSEs come from the same sources: our 1/12 degree North Atlantic system and the CMEMS Baltic Sea forecasting system. We have updated Section 2 to make this clear. Since the only differences between the Free and Nature Runs are the initial conditions and the surface forcing, we believe that a bias between the surface forcing datasets is driving the slow drift in the SSH bias, most likely a difference in evaporation minus precipitation (E-P). It is known that the ECMWF IFS has a global average wet bias (https://doi.org/10.1175/JHM-D-20-0308.1) and the Met Office UM has been shown to have a wet bias in some regions (https://link.springer.com/article/10.1007/s12040-018-1023-3).
Although these biases cause a drift in the SSH, we believe that this adds a measure of realism to these idealised experiments. Our operational shelf-seas systems (a 7 km and a 1.5 km system) use both of these sources of surface forcing, and so such biases in the E-P will also be present in these systems.
I would be interested to see a map of the regions that are predominantly in the category of top-to-bottom temperature difference less than 2 °C where the balance adjustments to temperature and salinity are not applied. This would add context to Figs. 7 and 11. Unfortunately, we are not offered a map view of the skill for temperature and salinity to complement Figs. 7 and 11, which is an oversight the authors might care to address in revision. I leave it to them to decide how to usefully present this 3-D skill assessment in a 2-D map.
A figure has been added to show the extent of this restriction on applying the SSH balancing changes to temperature and salinity for a summer and winter example day (new Figure 4). This figure is also referred to in later sections addressing your points (#7 & #8) below.
We have also added figures showing maps of the vertically averaged bias and RMSE for temperature and salinity which aid the interpretation of the profile statistics shown in later figures. This is addressed further in response to comments #7 & #8.
I have some reservations about how appropriate the balance operator approach is for shelf seas, but that's a can of worms we can't open here. However, I would not oppose some rampant speculation about how altimeter sea level data might be better exploited in shelf sea DA systems.
We fully agree that the existing balance relations within our data assimilation scheme are not optimised for the shelf-seas. We are currently developing global and shelf-seas ensemble systems with which we will be able to better represent both errors-of-the-day and the region-specific balances and length-scales. In the future, we plan to investigate the use of this information to adjust the balances and/or directly within a hybrid DA system combining ensemble information with climatological error covariances. We have expanded our discussion on the deficiencies of our existing balance relationship in Sections 5 & 6 to include the above information.
The term RMSE is not defined when it is first used, and it is not spelled out whether this is full Root Mean Squared Error of observation minus model, or what is frequently called Centered RMS Error in geophysics, being the RMS of the difference between observation anomaly and model anomaly from their respective means. CRMS and bias are independent errors. I suspect here we have CRMS, otherwise we would need to tease out the effect of bias in the RMSE statistics. But, conventionally, RMSE includes bias, so please clarify.
Here we have used Root Mean Squared Error. Throughout, the "error" part of the RMSE is the gridpoint-by-gridpoint difference between two model runs. Since we are running OSSEs, we know the "true" state everywhere (the Nature Run, NR) from which our simulated observations are drawn and so the error between each OSSE and the NR is not skewed by the observation sampling as would happen when comparing real-world observations (an incomplete and uncertain sample of the true state) to an operational system.
We have updated the text (on first use of RMSE in the abstract and at the start of Section 5) to clarify this.
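For readers following the distinction raised in this comment, the relationship between the three statistics can be sketched as below. This is a minimal illustration, not the code used in the study: the function names and the sample values are hypothetical, and the sign convention (experiment minus Nature Run) is assumed to match the one stated for the bias figures. It shows that the full RMSE contains the bias, via RMSE^2 = bias^2 + CRMSE^2.

```python
import math

def bias(model, truth):
    """Mean of the gridpoint-by-gridpoint difference (model minus truth)."""
    diffs = [m - t for m, t in zip(model, truth)]
    return sum(diffs) / len(diffs)

def rmse(model, truth):
    """Full root mean squared error; includes any bias."""
    diffs = [m - t for m, t in zip(model, truth)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def centred_rmse(model, truth):
    """RMS of the differences after removing their mean (the bias)."""
    b = bias(model, truth)
    diffs = [(m - t) - b for m, t in zip(model, truth)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical SSH values (m) at four gridpoints, for illustration only.
osse = [0.12, 0.05, -0.03, 0.20]
nature = [0.10, 0.00, -0.05, 0.10]

total = rmse(osse, nature)
b = bias(osse, nature)
centred = centred_rmse(osse, nature)

# The full RMSE squared equals bias squared plus centred RMSE squared.
print(total ** 2, b ** 2 + centred ** 2)
```

Reporting the bias alongside the full RMSE, as done here, therefore carries the same information as reporting the bias and the centred RMSE.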
I would welcome some speculation as to why temperature and salinity on the shelf is improved, but velocity and sea level are not. Here, some spatially explicit view of where the balance operator is being applied, and where it is not, might be instructive.
In stratified regions SLA observations contain information on vertical T/S structure which is effectively assimilated in the deeper ocean. In the stratified regions of the shelf, we can make similar positive adjustments. However, our assimilation scheme applies a ramping of the velocity balance near coasts which may be limiting the retention of these observations on-shelf and so their impact on the SSH and velocity statistics.
Additionally, our approach of using 25-hour mean fields in the simulation of the SLA observations and for the background in our innovations necessarily removes high-frequency signals which dominate on-shelf. When assimilation of the standard observations is introduced, the Control Run shows improved bias and RMSE relative to the Free Run with a very low RMSE of ~1 cm on-shelf. There is little further improvement when assimilating additional observations. We intend to further explore adaptations to our assimilation scheme, including whether it would be beneficial to retain the higher-frequency signals in our innovations.
We have updated Section 6 to discuss these areas of ongoing investigation.
Indeed, I wonder if the results in Fig. 10 would differentiate further if they were conditionally averaged by whether the balance operator was applied, or not. I encourage this addition to the paper. Perhaps the authors already made this calculation and found it of no consequence. If so, a remark to that effect would be useful to readers.
Thank you for the useful suggestion. We have investigated this further and found that in regions/periods where no balanced changes to temperature and salinity are applied when assimilating SSH, there is no appreciable difference between the various assimilating experiments. That is, while the Control Run is superior to the Free Run in terms of the temperature and salinity bias and RMSE, no further improvement is gained in these regions/times from assimilating additional SLA observations.
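The kind of conditional averaging discussed here can be sketched as follows. This is an illustrative example only, not the diagnostic code used for the paper: the function names, the error values, and the mask are hypothetical, and the mask is simply assumed to flag the points and times at which balanced temperature and salinity changes were applied.

```python
import math

def rmse_of(errors):
    """RMSE of a list of errors; NaN if the list is empty."""
    if not errors:
        return float("nan")
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def conditional_rmse(errors, balance_applied):
    """Split errors by a boolean mask and return (RMSE where the balance
    was applied, RMSE where it was not)."""
    applied = [e for e, flag in zip(errors, balance_applied) if flag]
    not_applied = [e for e, flag in zip(errors, balance_applied) if not flag]
    return rmse_of(applied), rmse_of(not_applied)

# Hypothetical temperature errors (experiment minus Nature Run, in degrees C).
errors = [0.3, -0.4, 0.1, -0.1]
# Hypothetical mask: True where balanced T/S changes were applied.
balance_applied = [True, True, False, False]

rmse_applied, rmse_not_applied = conditional_rmse(errors, balance_applied)
print(rmse_applied, rmse_not_applied)
```

Computing the two statistics per experiment makes it possible to compare experiments separately within each regime, which is the comparison summarised in the response above.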
As mentioned earlier, we have added maps of the vertically averaged bias and RMSE for temperature and salinity (new Figures 11 & 13) to demonstrate the spatial differences between the experiments and aid the interpretation of the spatially averaged profile statistics presented in Figures 12 & 14. In particular, these maps show that for the well-mixed regions in the Southern North Sea, the temperature profile is well-constrained by SST assimilation. We have also updated Section 5.2 and the on-shelf T/S profile figure to focus on the Northern North Sea during June-October during which time the full balanced changes are applied.
Speed (Fig. 11) is only one measure of current errors. What about direction? The speed error could be zero but with respective currents pointing in opposite directions.
While we agree that a more complete comparison would also include the current directions, we chose to compare the current speeds to allow a straightforward interpretation of the effect of the assimilation changes. Maps showing the error in the current speed readily highlight where a particular experiment may be degraded or improved, while a similar map of the error in the current direction can be dominated by regions with very small currents.
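Both sides of this exchange can be illustrated with a small sketch. It is not taken from the study: the function names and velocity components are hypothetical. The first case reproduces the reviewer's point (zero speed error with opposing currents), and the second shows why a direction-error map can be dominated by weak currents, where a large angular difference corresponds to a negligible velocity difference.

```python
import math

def speed(u, v):
    """Current speed from eastward (u) and northward (v) components."""
    return math.hypot(u, v)

def direction_difference_deg(u1, v1, u2, v2):
    """Smallest angle between two current vectors, in degrees (0 to 180)."""
    angle1 = math.atan2(v1, u1)
    angle2 = math.atan2(v2, u2)
    diff = abs(math.degrees(angle1 - angle2)) % 360.0
    return min(diff, 360.0 - diff)

# Case 1 (hypothetical): equal speeds, opposite directions (m/s).
strong_speed_error = speed(0.5, 0.0) - speed(-0.5, 0.0)
strong_direction_error = direction_difference_deg(0.5, 0.0, -0.5, 0.0)

# Case 2 (hypothetical): very weak currents with a 90 degree difference.
weak_speed_error = speed(0.001, 0.001) - speed(0.001, -0.001)
weak_direction_error = direction_difference_deg(0.001, 0.001, 0.001, -0.001)

print(strong_speed_error, strong_direction_error)
print(weak_speed_error, weak_direction_error)
```

In the second case the direction error is large even though the two velocities differ by only about 2 mm/s, which is why a direction map without some weighting or masking by speed can be hard to interpret.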

Minor comments:
Figs. 2 and 3 captions. Please say whether the bias is Free minus Nature, or Nature minus Free.
These are both Free minus Nature. The figure captions have been updated to state this.
The resolution of many figures is poor. It looks to me like these are produced with matplotlib, in which case the fix is simply to specify dpi resolution in savefig.

Thank you. Figure resolution has been increased.