Articles | Volume 24, issue 7
https://doi.org/10.5194/nhess-24-2481-2024
https://doi.org/10.5194/nhess-24-2481-2024
Research article
 | 
19 Jul 2024
Research article |  | 19 Jul 2024

Revisiting regression methods for estimating long-term trends in sea surface temperature

Ming-Huei Chang, Yen-Chen Huang, Yu-Hsin Cheng, Chuen-Teyr Terng, Jinyi Chen, and Jyh Cherng Jan
Abstract

Global warming has enduring consequences in the ocean, leading to increased sea surface temperatures (SSTs) and subsequent environmental impacts, including coral bleaching and intensified tropical storms. It is imperative to monitor these trends to enable informed decision-making and adaptation. In this study, we comprehensively examine the methods for extracting long-term temperature trends, including STL, seasonal-trend decomposition procedure based on LOESS (locally estimated scatterplot smoothing), and the linear regression family, which comprises the ordinary least-squares regression (OLSR), orthogonal regression (OR), and geometric-mean regression (GMR). The applicability and limitations of these methods are assessed based on experimental and simulated data. STL may stand out as the most accurate method for extracting long-term trends. However, it is associated with notably sizable computational time. In contrast, linear regression methods are far more efficient. Among these methods, GMR is not suitable due to its inherent assumption of a random temporal component. OLSR and OR are preferable for general tasks but require correction to accurately account for seasonal signal-induced bias resulting from the phase–distance imbalance. We observe that this bias can be effectively addressed by trimming the SST data to ensure that the time series becomes an even function before applying linear regression, which is named “evenization”. We compare our methods with two commonly used methods in the climate community. Our proposed method is unbiased and better than the conventional SST anomaly method. While our method may have a larger degree of uncertainty than combined linear and sinusoidal fitting, this uncertainty remains within an acceptable range. Furthermore, linear and sinusoidal fitting can be unstable when applied to natural data containing significant noise.

1 Introduction

Global warming refers to the long-term increase in the Earth's average surface temperature due to the accumulation of greenhouse gases in the atmosphere, including carbon dioxide, methane, and nitrous oxide. Global warming has long-term consequences in the ocean, specifically rising sea levels and increased sea surface temperatures (SSTs). The former is mainly driven by the expansion of seawater due to higher temperatures and the melting of land ice. Elevated sea levels pose risks to coastal communities, infrastructure, and ecosystems (Hinkel et al., 2010; Le Cozannet et al., 2014; Ranasinghe, 2016) due to the increased threat of coastal flooding, erosion, and saltwater intrusion. Moreover, global warming has led to increased sea surface temperatures. Rising sea temperatures have the potential to cause changes in ocean circulation patterns. Research has shown that the Kuroshio and the Gulf Stream, two important subtropical western boundary currents in the North Pacific and North Atlantic, could become stronger (Sakamoto et al., 2005; Cheon et al., 2012; Chen et al., 2019; Wang and Wu, 2019) and weaker (Levermann et al., 2005; Chen et al., 2019), respectively. This can ultimately impact the Atlantic Meridional Overturning Circulation (AMOC), as the Gulf Stream is a key system component. The impact of SST warming on tropical cyclones has been a top concern in recent decades (Emanuel, 2005). As global warming continues, we see fewer cyclones overall, but those that do occur are more powerful, longer-lasting, larger, and more destructive (Emanuel, 2005; Maue, 2011; Lin et al., 2014; Sun et al., 2017). This increase in destructive potential is due to the combination of longer storm lifetimes and greater storm intensities resulting from warmer sea surface temperatures. However, the situation may be more nuanced, as other atmospheric conditions such as increased wind shear could counteract or even reverse this trend of heightened destruction (Lin and Chan, 2015). Coral reefs are facing an increasing threat due to rising ocean temperatures (Pandolfi et al., 2011). This has resulted in the unprecedented mass bleaching of corals, which has been triggered by rising sea surface temperatures (Frieler et al., 2013; Hughes et al., 2017; Hoegh-Guldberg et al., 2017; Sully et al., 2019). Although some mitigation has been observed through small-scale local upwellings or mixing of cold water (Tkachenko and Soong, 2017; Safaie et al., 2018; Davis et al., 2021), the overall situation remains concerning.

Therefore, studying and monitoring these long-term change trends is essential to understand and address the challenges associated with global warming. It is crucial for decision-making and adaptation strategies (Mimura, 2013). Trends in sea surface temperature changes are obtained by analyzing long-term data collected from various sources, including satellite remote sensing, buoys, ships, and coastal monitoring stations. This approach involves methods such as linear regression to determine the temperature change slope over a specific period. Such data can generally be decomposed (Cleveland et al., 1990) as

(1) ST = ST LT + ST SV + ST R ,

where ST is the measured SST and STLT, STSV, and STR are the long-term trend, seasonal variations, and remainder component, respectively. STSV is often the most predominant component in the collected ST data. STR primarily encompasses the signals of tides and minor signals of sub-seasonal variations (e.g., mesoscale and sub-mesoscale processes and inertial oscillations), day–night variation (e.g., Chang et al., 2023), and unresolved noise (measuring errors and turbulence). Commonly used methods to extract STLT include linear regression methods (Emery and Thomson, 2001; Boretti, 2020; Sreeraj et al., 2022) and the seasonal-trend decomposition procedure based on LOESS (STL; Cleveland et al., 1990; Tseng et al., 2010; Nidheesh et al., 2013), where LOESS refers to locally estimated scatterplot smoothing (Cleveland and Devlin, 1988). Linear regression is a widely used and efficient statistical technique to model the relationship between a dependent variable (in this case, sea surface temperature) and an independent variable (typically time). STL is a robust and versatile method for decomposing time series data into long-term trends, seasonal variations, and residuals. STL algorithms use LOESS to smooth the data and extract the long-term trend component, which has been recognized as a better method to extract the trend (Tseng et al., 2010; Nidheesh et al., 2013). However, long computational time is a primary concern when employing STL. Moreover, while the STL method typically captures a nonlinear trend, many practical applications necessitate a linear trend to depict the overall scenario better. Linear regression methods are a common choice for extracting the long-term trend in SST increase. These methods are often applied without thorough assessment due to their universal nature, which is well documented in statistics textbooks. However, SST data recorded over long time periods possess unique characteristics that could introduce bias, requiring careful attention to the details of the analysis. Commencing with the fundamentals of linear regression and utilizing realistic and simulated data, this study offers a systematic evaluation and comparison of linear regression methods and STL.

2 Methodology

Linear regression analysis begins by considering the dependent variable y and the independent variable x. The variables y and x can be decomposed as y=Y+ε and x=X+δ, respectively. Here, (X, Y) and (δ, ε) represent the deterministic and random components, respectively, and the mean values of ε and δ are 0 (με=μδ=0). Linear regression seeks to determine a linear relationship between their deterministic parts via the model Y=b0+b1X by determining the value of b0 and b1. The presence of the random component is undesirable and significantly affects the extraction of the linear relationship within the deterministic parts. Here, the deterministic component of y is STLT, and the random component is intended to mimic STSV+STR. Note that STSV is indeed non-random, which could bias the estimate of b1 and b0. Based on the likelihood estimate (Wong, 1989; Emery and Thomson, 2001; Leng et al., 2007), the slope of the model b1 can be estimated as

(2) b 1 ^ = S y y - λ S x x + ( S y y - λ S x x ) 2 + 4 λ S x y 2 2 S x y ,

where Sxx, Syy, and Sxy are the sample variance of x, the sample variance of y, and the sample covariance between x and y. The estimator described in Eq. (2) is also called the Deming regression (Deming, 1943). However, Eq. (2) is complicated by the value λ=σεσδ, where σε and σδ are the variances of the random variables ε and δ, and these variances are generally unknown.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f01

Figure 1Topography and coastline surrounding Taiwan. The blue dots denote three coastal buoys at the Chenggong, Linshan Cape, and Magong stations, maintained by Taiwan's Central Weather Administration (CWA).

In practical applications, certain assumptions or approaches regarding λ are employed to simplify Eq. (2). The most widely adopted approach is to set λ=∞ (σδ=0), in which x has only the deterministic part. This results in the ordinary least-squares regression (OLSR), where b1^=SxySxx. If λ=0 (σε=0) is taken, we obtain another OLSR that treats x as the dependent variable and y as the independent variable, where b1^=SyySxy. For convenience, the former and the latter will be referred to as OLSR1 and OLSR2, respectively. The regression lines obtained from OLSR1 and OLSR2 typically differ, motivating the creation of a neutral slope. This is achieved by calculating the geometric mean of b1^ derived from OLSR1 and OLSR2, yielding b1^=sign(Sxy)SyySxx, which is referred to as the geometric-mean regression (GMR). Alternatively, the GMR can be determined by assigning λ to λ^=SyySxx in Eq. (2). Finally, by assuming λ=1, the orthogonal regression (OR) can be obtained as b1^=Syy-Sxx+(Syy-Sxx)2+4Sxy22Sxy. OLSR, OR, and GMR are commonly used to find a linear relationship between two datasets. In this application, the SST variations are designated y, and the time is assigned to x. Based on the overview provided, OLSR1 is likely to be the most suitable method within the linear regression family for capturing the long-term trend, especially considering that time lacks a random component. However, there has been a lack of careful examination investigating the applicability of the regression family.

STL is a robust iterative nonparametric regression employing the LOESS smoother, facilitating the decomposition of a given time series into its long-term, seasonal, and remainder components, as in Eq. (1). Like other nonparametric regression approaches, STL requires the subjective selection of a smoothing parameter to delineate the lowest-frequency component. STL comprises a sequence of intricate operations, which are divided into an inner loop and an outer loop, resulting in considerably longer computation times compared to the linear regression methods. Furthermore, STL can also process the nonlinear relationship between x and y. While the specifics regarding the implementation of STL are not addressed in this context, detailed information is reported in Cleveland et al. (1990). We thus assess the applicability of OLSR, OR, GMR, and STL for accurate temperature modeling.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f02

Figure 2(a) Time series (2010–2023) of sea surface temperature from Chenggong coastal buoy stations and the long-term trend estimated using the OLSR1, OLSR2, GMR, OR, and STL methods. (b) The seasonal and remainder components of the STL result. The vertical magenta lines and triangles denote the mean value of the time axis.

Download

3 Regression analysis of realistic data

Three sets of SST data (Chang, 2023) collected from three coastal buoys located at the Chenggong, Linshan Cape, and Magong stations (Fig. 1), all maintained by Taiwan's Central Weather Administration (CWA), were employed to assess the effectiveness of linear regressions and STL. The Chenggong, Linshan Cape, and Magong stations are located on the eastern coast of Taiwan; the northern coast of Taiwan; and the coast of Magong, Penghu county, respectively. SST hourly time series data from the three stations (Figs. 2a and 3a, b) exhibit temperature fluctuations of 8 to 15 °C related to seasonal variations over the 13-year measurement period. The variations encompass a consistent trend, overlaid with a seasonal cycle of approximately 1 year and short-term fluctuations driven by tides and other oceanic processes. We summarize three distinct features of the SST time series: (a) the time (x) lacks a random component, (b) the time (x) covers a significantly broader range than the SST (y), and (c) the SST exhibits vigorous seasonal variations with amplitudes exceeding the magnitude of the long-term SST increase. All the above features significantly affect the applicability of the regression family.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f03

Figure 3Time series (2010–2023) of sea surface temperature from (a) Linshan Cape Station and (b) Magong Station and the long-term trends estimated using the OLSR1, OLSR2, GMR, OR, and STL methods. The vertical magenta lines and triangles denote the mean value of the time axis.

Download

Table 1Summary of the b1^ (unit – °C yr−1) estimated using a general linear regression, STL, evenized SST, the SST anomaly, and a combination of linear and sinusoidal fitting. The slope derived from linear fitting of the STL nonlinear curve (blue lines in Figs. 2a and 3a, b) represents the b1^ value of STL. As for the evenized SST, SST anomaly, and combined linear and sinusoidal fitting methods, the representative b1^ is determined as the mean value during its stable period, marked by the dashed black lines in Fig. 7 (6 months of trimmed time).

Download Print Version | Download XLSX

As depicted in Figs. 2a and 3a and b, the long-term trends derived from the five methods are diverse, underscoring the necessity of a thorough investigation into their applicability for accurate temperature modeling. Using the SST time series data from the Chenggong station (Fig. 2a) as an illustration, the long-term trends estimated by the OLSR1 (thick dashed black line in Fig. 2a) and OR (red line) methods are highly similar, revealing an SST increase of 0.198 °C yr−1. In contrast, the GMR method (green line) yields a different long-term trend than the previous two methods, indicating a rapid SST increase of 0.586 °C yr−1. Moreover, the OLSR2 method (thin dashed black line in Fig. 2a) yields an SST increase of 1.73 °C yr−1. The outcome obtained with the GMR and OLSR2 methods is clearly unsatisfactory due to the excessively high estimated rate. The failure of OLSR2 is inevitable due to its reliance on the assumption that time (x) has a random component and SST (y) lacks such randomness. This assumption directly contradicts how to properly model the dataset. As a result, the GMR method is also inappropriate because its slope involves taking the geometric mean of b1^ derived from OLSR1 and OLSR2, causing its regression line (green line in Figs. 2a and 3) to fall between those of the two methods. Similar conditions hold true for the SST time series at the other two stations (Fig. 3). Thus, the OLSR2 and GMR methods will be excluded from our subsequent analyses. These results are summarized in Table 1.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f04

Figure 4(a) Schematic diagram showing the distances between the data points and the regression lines used for OLSR1, OLSR2, and OR. Regression lines 1 and 2 represent steep and gentle slopes, respectively. (b) Regression lines derived from OLSR1 and OR with monthly and yearly time units using SST data collected at Chenggong station.

Download

We next consider why OLSR1 and OR yield the same result despite having different estimators. Toward this explanation, we consider the geometric distance between each data point and the regression line. The slope of OLSR1 is determined by minimizing the sum of the squares of the vertical distances (DOLSR1 in Fig. 4a) between the observed points and the assumed regression line. However, OLSR2 employs a similar approach but minimizes the sum of the squares of the horizontal distances (DOLSR2 in Fig. 4a). The method of OR minimizes the sum of squares of the orthogonal distances (DOR in Fig. 4a) from the data points to the assumed regression line. Figure 4a also contrasts two regression lines with larger and smaller slopes. When the regression line is nearly flat (regression line 2), DOR is very close to DOLSR1. The orthogonal and vertical distances are very similar when the regression line is nearly horizontal (close to flat); therefore, the OR will closely approximate the OLSR1 regression. This set of conditions is generally true for long-term measured data. In estimating the regression lines depicted in Fig. 2a, we use day as the time unit. Accordingly, the time span is 4745 d, while the SST spans 10 °C, indicating that the slope would be nearly flat and that the lines of OR and OLSR1 are almost overlaid. The subsequent question is whether OR can consistently match OLSR1. It is well established that OR is not scale invariant. Changing the units of the variables will result in different regression lines. We test this effect by changing the unit of time to months and years. These changes do not affect the estimate when using OLSR1. Thus, in Fig. 4b, only the result of OLSR1 using days as the unit is plotted as a representative line (thick dashed black line). In contrast, the regression line from OR, utilizing months as the time unit, results in an estimated slope that is 0.1 % larger (blue line in Fig. 4b), which may still be deemed acceptable. If years are used as the unit, implying that 13 (years) is comparable to the SST variations of 10 (°C), the estimated SST increase rate is 0.278 °C yr−1 (magenta line in Fig. 4b). This is approximately 40 % larger than that estimated using OLSR1, a difference that should be deemed unacceptable.

Next, we proceed to the investigation of whether OLSR1 and OR genuinely capture the proper regression line. The long-term trend in Chenggong SST extracted by STL, which may be the most suitable method (Tseng et al., 2010; Nidheesh et al., 2013), is illustrated by the blue curve in Fig. 2a. This curve exhibits a nonlinear trend characterized by a gradual increase from 2010 to mid-2019 followed by a decline until 2023. The 2–4 °C seasonal variation amplitudes are also captured well by STL, as the black curve in Fig. 2b indicates. The remainder of the curve encompasses the high-frequency tidal fluctuations and interannual variability (gray curve in Fig. 2b). The STL long-term curve (blue curve in Fig. 3a) obtained using the SST data at Linshan Cape also exhibits a nonlinear trend. In the above two cases, although OLSR1 and OR cannot capture the nonlinearity of the long-term variations, they align neutrally with the STL curves. At Magong station, the nonlinearity of the curve (blue curve in Fig. 3b) becomes weaker, allowing for better alignment between the OLSR1/OR lines and the STL curve. Finally, it remains imperative to demonstrate that OLSR1 and STL effectively capture the “true” regression line, necessitating using simulated data for validation.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f05

Figure 5(a) Time series of simulated SST using Eqs. (1) and (3) and the regressions using OLSR1 and STL. (b) Seasonal and remainder components derived from STL. (c) Seasonal variations using the cosine function shown in Eq. (4). The numbers in (b) and (c) denote the corresponding seasonal peaks on both sides of the midpoint (magenta lines).

Download

4 Examination using simulated data

Following Eq. (1), we first generate the test SST data ST=STLT+STSV+STR spanning 14 years (black curve in Fig. 5). STLT is the long-term trend given by STLT+ 25 +qt, where t is the time in hours, and q is the rate of SST increase. Here, q= 1.8754 × 10−5°C h−1 is given, equivalent to 0.1643 °C yr−1. This equation signifies a linear temperature increase of 2.3 °C over the 14 years. The seasonal variation is represented as a sine function,

(3) ST SV = A SV sin ( 2 π t / T SV ) ,

where ASV and TSV are 2.5 °C and 365 d, respectively. The remainder component (STR) incorporates diurnal (24 h period) and semidiurnal (12.42 h period) tidal variations, each of which has an amplitude of 0.5 and 0.2 °C. The regression line of the test dataset (blue line in Fig. 5a) derived from STL precisely overlays the long-term trend in STLT that is not plotted because it would be covered. In addition, the seasonal variations and the remaining components are effectively decomposed, as shown in Fig. 5b. OLSR1 underestimates the slope at 0.14 °C yr−1, which is 15 % lower than the actual value (dashed black line in Fig. 5a). The obtained result is unexpected, considering the good correspondence observed between STL and OLSR1 in the realistic data (Figs. 2a and 3a, b). We further clarify the mechanism behind this underestimation by replacing the STSV with a cosine function,

(4) ST SV = A SV cos ( 2 π t / T SV ) ,

in the test data. As a result, OLSR1 accurately predicts the slope, indicating that the phase of the seasonal cycle holds significance in this context.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f06

Figure 6Relationship (a) between the trimmed time and b1^ and (b) between the trimmed time and b0^ for simulated SST (Eq. 1) with STSV represented as a sine function (Eq. 3; gray curve) and a cosine function (Eq. 4; cyan curve). Panels (c) and (d) are the same as (a) and (b), but the tidal signals, STR, have been removed. The black and blue curves depict the outcomes after correcting the seasonal bias through steps 1–7.

Download

The linear regression line must pass through the point of sample mean of t and ST (t, ST). Therefore, we can model the regression line as a lever with a fulcrum at (t, ST), adjusting its angle (or slope) to ensure that the positive residuals entirely cancel out the negative residuals. Similar to the concept of torque, the greater the distance of a data point deviation from t, the more substantial its impact on adjusting the line's angle. In our original test data (Eq. 3), the seasonal variation is modeled as a sine function, an odd function with respect to t=7 (Fig. 5b). The high-leverage points arising from seasonally induced deviations (black line in Fig. 5a) from the regression line play a role in determining the slope of the line. This effect can be further illustrated by examining the seasonal variation depicted in Fig. 5b, specifically focusing on the seasonal peaks. We consider the 14 peaks at STSV< 0, which are labeled as 1–7 in blue and red on the right-hand and left-hand sides of the fulcrum (t=7; denoted by magenta lines), respectively. The seven peaks on the right-hand side favor clockwise rotation of the line. Consequently, they tend to decrease the slope. In contrast, the seven peaks on the left-hand side favor counterclockwise rotation of the line, contributing to an increase in the slope. However, the two opposing tendencies on the right- and left-hand sides are unbalanced. Peak no.1 on the right-hand side is 34TSV away from the fulcrum, and the corresponding peak no.1 on the left-hand side is 14TSV away from the fulcrum. Overall, the seasonal peaks on the right-hand side consistently have 12TSV longer distance than their counterparts on the left-hand side. Based on the concept of torque, this indicates that the overall effect of the seasonal peaks at STSV< 0 is to lower the slope. The peaks at STSV > 0 also act to lower the slope if the same treatment is applied. Therefore, we can conclude that the net effect of the 14 seasonal peaks is to lower the slope, so that the long-term trend is underestimated. We call this “phase–distance imbalance”.

The bias of phase–distance imbalance does not manifest in the case of an even function, such as a cosine function (Eq. 4 and Fig. 5c). The resulting data show that the new STSV appears to be a mirror pattern with respect to t=7, which is a typical feature of an even function. Clearly, the peak pairs distributed on both sides of the fulcrum at t=7 have identical distances to the fulcrum. Our previous analysis pointed out that the seasonal variations do not introduce bias in estimating the long-term trend. It is suggestive that addressing the bias induced by phase–distance imbalance may involve trimming the data to ensure that STSV (t-t) within the dataset becomes an even function.

This task involves trimming the test data (black curve in Fig. 5a) for the last N days (N=0,1,2365) and subsequently conducting OLSR1 to obtain b1^ (gray curve in Fig. 6a) and b0^ (black curve in Fig. 6b). When the trimmed time is 0 d, b1^= 0.14 °C yr−1 and b0^= 25 °C, which represent the initial regression line shown as the dashed black line in Fig. 5a. As the number of trimmed days increases, b1^ (black curve in Fig. 5a) gradually rises toward the correct value of 0.1643 °C yr−1 (as marked by the dashed gray line) at trimmed time = 182.5 d (12TSV) before subsequently decreasing again. It is worth noting that STSV (t-t) becomes an even function when the data in the last 12TSV are trimmed off (Fig. 5b), supporting our proposed concept.

The value of b0^ still deviates from the correct value of 25 °C at trimmed time = 182.5 d (12TSV; gray curve in Fig. 6b). This deviation is attributed to the change in ST that is induced by trimming the data, which can be corrected by vertically displacing the regression line to intercept the original ST. Another example illustrates the case of an even (cosine) function in the initial stage (Eq. 4). The trimming of the data causes the initially correct b1^ and b0^ to deviate, except for trimmed times of 12TSV and TSV. This example also demonstrates that both b1^ and b0^ can be underestimated or overestimated. We revisited the real data collected from the three CWA stations, applying the insights obtained from examining the simulated data. The seasonal cycle regularly exhibits higher SST in August and a lower value in February (Figs. 2 and 3). To make SST an even function, it is desirable that the midpoint (t) is located at the maximum or minimum SST. The regression lines derived from our three SST datasets using OLSR1 align reasonably well with the STL curves because the datasets tend to approach an even function, although not precisely. Highlighting the midpoint line (magenta) in Figs. 2 and 3 can illustrate this. The midpoint falls in June 2016, just 2 months away from the peak SST in August 2016. Improving the slope estimation accuracy might involve data trimming to position the midpoint of the dataset precisely in August 2016. Indeed, Fig. 6a and b emphasize a significant uncertainty in estimating b1^ and b0^, depending upon the seasonal phases and magnitudes. Consequently, having a longer dataset does not necessarily guarantee a more accurate outcome.

5 Implementation in correcting the seasonal bias

5.1 Methodology

To correct the seasonal bias, we consider a pre-processing procedure that trims the data to ensure an even function before running the regression analysis. For convenience, we name this procedure evenization, following an analogous concept introduced in signal processing (Kahn, 1957; Kondo and Kou, 2001). This could involve removing the initial portion of the data until the remaining set reflects an even pattern around its midpoint. This adjustment helps reduce or eliminate the bias introduced by the seasonal patterns. Numerous methods are expected to achieve this correction. Here, we propose a procedure (Chang, 2023) as follows.

Step 1: detrending. The long-term trend is derived by applying the OLSR1 to the raw SST data. A detrended time series (DTR_DATA) is obtained by subtracting the long-term trend from the raw SST data.

Step 2: data folding. DTR_DATA is split evenly at its midpoint, forming two segments of equal length: the first half segment, D1, and the “flipped” second half segment, D2.

Step 3: Hilbert transform. We convert D1 into the analytic signal AD1, a complex-valued function comprising the real part D1 and the imaginary part D1^. D1^ is a π/2 shifted function of D1 that can be obtained via the Hilbert transform (Marple, 1999) of D1. AD2, the analytic signal for D2, can be derived similarly.

Step 4: complex correlation. Computing the correlation coefficient between AD1 and AD2 yields a complex correlation, with magnitude C representing the maximum correlation and phase angle θ where the maximum correlation occurs. θ typically falls between π and π. θ represents the phase by which D1 leads D2.

Step 5: trimming the data. If θ0, the raw SST data in the period of the initial TSV-θ2πTSV are trimmed off. If θ<0, the raw SST data in the period of the initial |θ|2πTSV are trimmed off. These measures ensure that the remaining data approximate an even function by trimming to the minimum dataset length that is shorter than π/2 (3 months).

Step 6: obtaining the slope b1^. The long-term slope is estimated by applying OLSR1 to the trimmed SST dataset.

Step 7: obtaining the intercept b0^. Data trimming induces change in (t, ST). Therefore, the intercept is obtained by ensuring that the new regression line passes through the original (t, ST). Returning to step 1 for further detrending using the derived long-term trend is an optional step that could be carried out iteratively. However, this additional iteration did not significantly impact the results in our case. Nevertheless, this option remains open for datasets that may benefit from further iterations.

The application of steps 1–7 of our procedure to the simulated datasets, comprising a sine seasonal cycle and a cosine seasonal cycle, is shown as the black and blue lines in Fig. 6a and b, respectively. Our corrected b1^ aligns closely with the known b1 value shown as the dashed red line (Fig. 6a). The seasonal bias-related uncertainty, initially at O(10−2) °C yr−1, has been notably reduced to O(10−3) °C yr−1. The wiggles and discontinuity observed around the trimmed time of 184 d, contributing to uncertainty at O(10−3) °C yr−1, stem from tidal influences. This is evidenced by re-examining the dataset using Eq. (1) after removing the tidal component, STR, showing the absence of the wiggles and discontinuity (Fig. 6c). These tidal signals might slightly impact the determination of θ in step 4 of the process. The corrected estimate of b0^ converges closely to the known b0 value of 25 °C (depicted as the dashed red line), exhibiting a deviation smaller than 0.07 °C (Fig. 6b). Similarly, the tidal effect could produce small wiggles but can be corrected by excluding the tidal component in the analysis (Fig. 6d).

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f07

Figure 7b1^ as a function of trimmed time using the SST data collected at the (a) Magong, (b) Linshan Cape, and (c) Chenggong stations. The OSLR1 method and our proposed method (steps 1–7) are represented by the gray curves and red dots, respectively. The black lines depict the averaged b1^ values based on corrected data within a trimmed time shorter than Tsv/2, indicated by the vertical dashed line. The cyan and blue curves are the slopes estimated using SST anomalies and combined linear and sinusoidal fitting methods, respectively. The dashed magenta lines are the slopes derived from linear fitting to the STL nonlinear curve.

Download

Our approach is subsequently employed on the SST datasets collected by the CWA (as shown in Fig. 7). While the true b1^ remains unknown within real data, we can assess the coherence through data trimming from the end. It is generally understood that the long-term trend of a 10-year or longer time series does not significantly change when its length decreases by 6 months. At the Magong station, the estimated range of b1^ spans from 0.07 to 0.13 °C yr−1 when employing the general OLSR (gray curve in Fig. 7a). In contrast, our approach significantly reduces uncertainty to a range of 0.08 to 0.1 °C yr−1 (red dots in Fig. 7a) within the initial 6-month trimmed time period. The consistency of b1^ can reach 10 months of trimmed time at the Linshan Cape (Fig. 7b) and Chenggong (Fig. 7c) stations. These findings provide support for the effectiveness of our proposed method. To maintain the nature of the long-term trend behind the data, the trimmed data length ought not exceed a seasonal cycle. The representative b1^ could be the mean value during its stable period. The recommended length of the stable period would be half of a seasonal cycle, i.e., 6 months. As a result, the representative b1^ values at the Magong, Linshan Cape, and Chenggong stations are 0.09 °C yr−1, 0.124 °C yr−1, and 0.193 °C yr−1 (Table 1), respectively, as shown by the black lines in Fig. 7.

5.2 Comparison with conventional methods

We have demonstrated our proposed evenization method as a feasible approach to estimate the long-term trend in SST. It is desirable to compare our method with the commonly used methods in the climate community. The first method involves computing the daily climatological value of SST using the available SST data. The long-term trend can be estimated by applying OLSR1 to the SST anomalies derived by subtracting the climatological SST from the original SST data. This is expected to lower the bias resulting from seasonality. The second method models the SST data as a combination of linear and sinusoidal functions (e.g., Park et al., 2022):

(5) SST ( t ) = b 0 + b 1 t + A cos 2 π t T + B sin 2 π t T .

The first two terms on the right-hand side are the linear function, representing the long-term trend. The third and fourth terms on the right-hand side represent the seasonal component, where the period is T= 365 d. The amplitude of seasonal variations can be obtained as A2+B2. Here, bo^, b1^, A^, and B^ are obtained using nonlinear least squares fitting the SST dataset. Adding other periodic components such as interannual variations may only sometimes be helpful due to the increased number of fitting parameters, which could lower the numerical accuracy and stability.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f08

Figure 8(a) b1^ as a function of data length using the simulated SST data and the probability density function of b1^ using data lengths of (b) 3–14 years and (c) 7–14 years by applying the methods of SST anomalies (cyan lines), combined linear and sinusoidal fitting (blue lines), and evenized SST (red lines). The dashed black lines denote the known b1 value.

Download

The performance of the three methods is evaluated using the 14-year time series as depicted in Fig. 5a, which is generated using Eq. (3). The semidiurnal tidal amplitude is increased from 0.2 to 0.3 °C to better investigate the impacts of small fluctuations. Figure 8 shows how the estimated slope changes with the different data lengths (3–14 years) used for estimation, allowing for the evaluation of the uncertainty in each method. Overall, as the data length increases, there is a reduction in b1^ uncertainty for both the linear trends in SST anomalies (cyan curve in Fig. 8a) and evenized SST (red curve). In both methods, the uncertainty in b1^ is significant when the data length is less than 7 years, and the deviation could reach 0.01 °C yr−1. The deviation is less than 0.003 °C yr−1 for data length > 7 years. The b1^ obtained from the evenized SST method closely aligns with the correct value (represented by the dashed black line), whereas the b1^ obtained from the SST anomaly method tends to be consistently lower than the correct value. This can be clearly found in the probability density function (PDF) shown in Fig. 8b. The PDF of b1^ estimated using the evenized SST is unbiased because it concentrates around the correct value (0.1643 °C yr−1; the vertical dashed line). In contrast, the SST anomaly estimate (cyan line) is biased because its PDF deviates from the correct value. This suggests that our method could be a better estimator.

https://nhess.copernicus.org/articles/24/2481/2024/nhess-24-2481-2024-f09

Figure 9Linear and sinusoidal fitting curve of Eq. (5) using simulated data lengths of (a) 8 years and (b) 7 years. The solid red and blue lines represent the simulated data and their fitting curves, respectively. The dashed red and blue lines represent the long-term trends from the evenized SST and fitting methods, respectively.

Download

When using the combined linear and sinusoidal fitting method, there is no clear relationship between the uncertainty and data length, as depicted in Fig. 8a by the blue line. The PDF shows a more concentrated distribution at b1^= 0.1643 °C yr−1 (blue line in Fig. 8b), suggesting better performance than our method. Figure 9a shows a successful fitting curve of Eq. (5) (solid blue line), which overlaps with the simulated data (solid red line) when the data length used is 8 years. The resulting long-term trend (dashed blue line) also aligns with that of the evenized SST (dashed red line), but it is exactly covered by the dashed blue line. However, unexpected fitting failures can cause large deviations (blue line in Fig. 8a), such as in the example when the data length of 7 years is used (Fig. 9b). The fitting curve (solid blue line in Fig. 9b) has a smaller seasonal amplitude and a clear phase shift compared to the simulated data (solid red line in Fig. 9b). The estimated slope of the long-term trend (dashed blue line) is gentler than the known trend. In contrast, the known trend agrees with that estimated one using evenized SST (dashed red line). We re-examine the PDF using b1^ with a data length > 7 years, which is generally applicable for long-term trend estimates (Fig. 8c). The method of SST anomalies remains biased. The methods of linear and sinusoidal fitting are unbiased, and the peak value of PDF slightly increases from 0.79 to 0.81. Similarly, the methods of evenized SST are unbiased, but the peak value of PDF significantly increases from 0.32 to 0.5. To summarize, our proposed method is unbiased and better than the conventional SST anomaly method. While our method may have a more significant degree of uncertainty than linear and sinusoidal fitting, this uncertainty remains within an acceptable range. Furthermore, linear and sinusoidal fittings can be unstable when applied to natural data containing significant noise.

The same examination using the CWA data for SST anomalies (cyan lines in Fig. 7) and combined linear and sinusoidal fitting (blue lines in Fig. 7) methods is carried out. Again, we focus on the comparison in the stable period, 6 months of trimmed time. The b1^ obtained using SST anomalies is 0.004–0.015 °C yr−1 lower than that obtained using evenized SST (cyan lines in Fig. 7 and Table 1). The result obtained agrees with the expected outcome based on the simulated data, as shown in Fig. 8. Using the combined linear and sinusoidal fitting (blue lines), the obtained b1^ roughly aligns with the SST anomalies. The result differs from the simulated data. It is anticipated that complex and diverse natural signals could have interfered with the fitting results, often considered noise. Indeed, unexpected peaks related to the failed fitting occur for the data from Chenggong station (Fig. 7c), where the tidal signal (Fig. 2a) is strongest among the three stations. Finally, the slope obtained from linear fitting to the STL nonlinear curve (dashed magenta lines in Fig. 7 and Table 1) is close to the result obtained from evenized SST.

6 Discussion and conclusions

Here, we systematically examine the methods of linear regression and STL for their ability to extract the long-term trend from an SST time series. STL may be the best method to extract the long-term trend accurately. However, it comes with a significantly long computational time. The runs conducted here using idealized data require approximately 3–5 min each. The linear regression methods are usually shorter than 0.1 s. STL is not suited well for tasks that involve lengthy loop operations, e.g., computing the global increase rate of SST using satellite data. Instead, linear regression methods are preferable for such tasks but are subject to the nature of the long-term SST data. We summarize three distinct features of the SST time series: (a) the time axis lacks a random component, (b) the time component covers a significantly broader range than the SST one, and (c) SST exhibits vigorous seasonal variations with amplitudes exceeding the magnitude of the long-term SST increase. Feature (a) indicates that the most suitable linear regression method is OLSR, which considers SST as being composed of a deterministic part and a random part, while time has only a deterministic part. The alternative OLSR that swaps the consideration of SST and time is evidently unsuitable. GMR is excluded because it also accounts for the randomness of time. Feature (b) indicates that the slope is nearly flat because time spans a large interval, e.g., 3000 d, while the SST range is typically smaller than 10 °C. When the regression line is nearly horizontal, the OR will closely approximate the OLSR. This is generally true for long-term measured data.

Accordingly, we propose that OLSR (and OR) can be employed to extract the long-term trend in SST data by addressing the bias arising from feature (c). The proper regression to obtain the long-term trend depends on the effective cancellation of the strong seasonal signals. However, effective cancellation only occurs when the seasonal signal is an even function, indicating that it has a mirroring trend with respect to the midpoint. The bias will be strongest when the seasonal signal is an odd function. We refer to this temporal phenomenon as phase–distance imbalance. This bias induced by the seasonal signal can be appropriately corrected by trimming the data, ensuring that the dataset becomes an even function before conducting OLSR (or OR). Finally, we compare our methods with two commonly used methods in the climate community. Our proposed method is unbiased and better than the conventional SST anomaly method. While our method may have a more significant degree of uncertainty than linear and sinusoidal fitting, this uncertainty remains within an acceptable range. The fitting method generally performs better, as seen in Fig. 8. However, linear and sinusoidal fittings can be unstable in occasional cases. The poor fitting may be addressed by providing better initial guess values, constraining parameter intervals, changing the numerical method, filtering the data, and using other approaches. All of these require additional trials. Our proposed method provides another robust and efficient method that can avoid this disadvantage. Users can choose the method that best suits their analysis needs.

Code availability

The MATLAB code for the linear regressions and seasonal bias correction is available at https://doi.org/10.5281/zenodo.10360553 (Chang, 2023).

Data availability

The experimental SST data are available at https://doi.org/10.5281/zenodo.10360553 (Chang, 2023).

Author contributions

MHC and YHC initiated and formulated the study's conceptual framework. MHC and YCH developed the methods to correct the seasonal bias, conducted statistical analyses, and produced the figures. MHC undertook the literature review and drafted the initial manuscript. CTT provided insight into the application of our method to the real data. YHC, CTT, JC, and JCJ contributed to the revision and editorial enhancement of the final paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

This paper's inception stems from our endeavor to estimate the long-term sea surface temperature trend concerning global warming around Taiwan. Discussion with Yu-Yu Yeh has been very helpful. We acknowledge the outstanding efforts of the personnel at CWA responsible for maintaining the coastal buoys over a long period, facilitating our study. We thank the editor and both reviewers for their constructive suggestions, which have greatly improved this paper.

Financial support

We thank Taiwan's Central Weather Administration (CWA) for funding this research (grant no. 1122057C).

Review statement

This paper was edited by Dan Li and reviewed by Duo Chan and one anonymous referee.

References

Boretti, A.: Analysis of segmented sea level time series, Appl. Sci., 10, 625, https://doi.org/10.3390/app10020625, 2020. 

Chang, M.-H.: Revisiting regression methods for estimating long-term trends in sea surface temperature, Zenodo [code and data set], https://doi.org/10.5281/zenodo.10360553, 2023. 

Chang, M.-H., Cheng, Y.-H., Yeh, Y.-Y., Hsu, J.-Y., Jan, S., Tseng, Y.-H., Sui, C.-H., Yang, Y. J., and Lin, P.-H.: Diurnal Sea Surface Temperature Warming along the Kuroshio off Taiwan under Easterly Wind Conditions, Geophys. Res. Lett., 50, e2022GL101412, https://doi.org/10.1029/2022GL101412, 2023. 

Chen, C., Wang, G., Xie, S., and Liu, W.: Why does global warming weaken the Gulf Stream but intensify the Kuroshio?, J. Climate, 32, 7437–7451, https://doi.org/10.1175/JCLI-D-18-0895.1, 2019. 

Cheon, W. G., Park, Y. G., Yeh, S.-W., and Kim, B.-M.: Atmospheric impact on the northwestern Pacific under a global warming scenario, Geophys. Res. Lett., 39, L16709, https://doi.org/10.1029/2012GL052364, 2012. 

Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I.: STL: a seasonal-trend decomposition procedure based on loess, J. Off. Stat., 6, 3–73, 1990. 

Cleveland, W. S. and Devlin, S. J.: Locally weighted regression: An approach to regression analysis by local fitting, J. Am. Stat. Assoc., 83, 596–610, 1988. 

Davis, K. A., Pawlak, G., and Monismith, S. G.: Turbulence and coral reefs, Annu. Rev. Mar. Sci., 13, 343–373, https://doi.org/10.1146/annurev-marine-042120-071823, 2021. 

Deming, W. E.: Statistical Adjustment of Data, New York, NY: John Wiley & Sons, https://archive.org/details/in.ernet.dli.2015.18293 (last access: 18 July 2024), 1943. 

Emanuel, K.: Increasing destructiveness of tropical cyclones over the past 30 years, Nature, 436, 686–688, https://doi.org/10.1038/nature03906, 2005. 

Emery, W. J. and Thomson, R. E. (Eds.): Data Analysis and Methods in Physical Oceanography, Elsevier, Amsterdam, Netherlands, ISBN 978-0-444-50756-3, 2001. 

Frieler, K., Meinshausen, M., Golly, A., Mengel, M., Lebek, K., Donner, S. D., and Hoegh-Guldberg, O.: Limiting global warming to 2 °C is unlikely to save most coral reefs, Nat. Clim. Change, 3, 165–170, https://doi.org/10.1038/nclimate1674, 2013. 

Hinkel, J., Nicholls, R. J., Vafeidis, A. T., Tol, R. S. J., and Avagianou, T.: Assessing risk of and adaptation to sea-level rise in the European union: an application of DIVA, Mitig. Adapt. Strat. Gl., 15, 703–719, https://doi.org/10.1007/s11027-010-9237-y, 2010. 

Hoegh-Guldberg, O., Poloczanska, E. S., Skirving, W., and Dove, S.: Coral reef ecosystems under climate change and ocean acidification, Front. Mar. Sci., 4, 158, https://doi.org/10.3389/fmars.2017.00158, 2017. 

Hughes, T., Kerry, J., Álvarez-Noriega, M., Álvarez-Romero, J., Anderson, K., Baird, A., Babcock, R., Beger, M., Bellwood, D., Berkelmans, R., Bridge, T., Butler, I., Byrne, M., Cantin, N., Comeau, S., Connolly, S., Cumming, G., Dalton, S., Diaz-Pulido, G., Eakin, C., Figueira, W., Gilmour, J., Harrison, H., Heron, S., Hoey, A., Hobbs, J., Hoogenboom, M., Kennedy, E., Kuo, C., Lough, J., Lowe, R., Liu, G., McCulloch, M., Malcolm, H., McWilliam, M., Pandolfi, J., Pears, R., Pratchett, M., Schoepf, V., Simpson, T., Skirving, W., Sommer, B., Torda, G., Wachenfeld, D., Willis, B., and Wilson, S.: Global warming and recurrent mass bleaching of corals, Nature, 543, 373–377, https://doi.org/10.1038/nature21707, 2017. 

Kahn, A. B.: A Generalization of average-correlation methods of spectrum analysis, J. Atmos. Sci., 14, 9–17, https://doi.org/10.1175/0095-9634-14.1.9., 1957. 

Kondo H., and Kou H.: Wavelet image compression using sub-block DCT, Proceedings. Ninth IEEE International Conference on Networks, ICON 2001, Bangkok, Thailand, 10–11 October, 327–330, https://doi.org/10.1109/ICON.2001.962362, 2001. 

Le Cozannet, G., Garcin, M., Yates, M., Idier, D., and Meyssignac, B.: Approaches to evaluate the recent impacts of sea-level rise on shoreline changes, Earth Sci. Rev., 138, 47–60, https://doi.org/10.1016/j.earscirev.2014.08.005, 2014. 

Leng, L., Zhang, T., Kleinman, L., and Zhu, W.: Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science, J. Phys. Conf. Ser., 78, 012084, https://doi.org/10.1088/1742-6596/78/1/012084, 2007. 

Levermann, A., Griesel, A., Hofmann, M., Montoya, M., and Rahmstorf, S.: Dynamic sea level changes following changes in the thermohaline circulation, Clim. Dynam., 24, 347–354, https://doi.org/10.1007/s00382-004-0505-y, 2005. 

Lin, I. I. and Chan, J. C. L.: Recent decrease in typhoon destructive potential and global warming implications, Nat. Commun., 6, 7182, https://doi.org/10.1038/ncomms8182, 2015. 

Lin, I. I., Pun, I. F., and Lien, C. C.: `Category-6' Supertyphoon Haiyan in global warming hiatus: contribution from subsurface ocean warming, Geophys. Res. Lett. 41, 8547–8553, https://doi.org/10.1002/2014GL061281, 2014. 

Marple, S. L.: Computing the Discrete-Time Analytic Signal via FFT, IEEE T. Signal Process., 47, 2600–2603, 1999. 

Maue, R. N.: Recent historically low global tropical cyclone activity, Geophys. Res. Lett. 38, L14803, https://doi.org/10.1029/2011GL047711, 2011. 

Mimura, N.: Sea-level rise caused by climate change and its implications for society, P. Jpn. Acad. B-Phys., 89, 281–301, https://doi.org/10.2183/pjab.89.281, 2013. 

Nidheesh, A. G., Lengaigne, M., Vialard, J., Unnikrishnan, A. S., and Dayan, H.: Decadal and long-term sea level variability in the tropical Indo-Pacific ocean, Clim. Dynam., 41, 381–402, https://doi.org/10.1007/s00382-012-1463-4, 2013. 

Pandolfi, J. M., Connolly, S. R., Marshall, D. J., and Cohen, A. L.: Projecting coral reef futures under global warming and ocean acidification, Science 333, 418–422, https://doi.org/10.1126/science.1204794, 2011. 

Park, K. -A., Park, J. -E., and Kang, C.-K.: Satellite-Observed Chlorophyll-a Concentration Variability in the East Sea (Japan Sea): Seasonal Cycle, Long-Term Trend, and Response to Climate Index, Front. Mar. Sci., 9, 807570, https://doi.org/10.3389/fmars.2022.807570, 2022. 

Ranasinghe, R.: Assessing climate change impacts on open sandy coasts: a review, Earth Sci. Rev., 160, 320–332, https://doi.org/10.1016/j.earscirev.2016.07.011, 2016. 

Safaie, A., Silbiger, N. J., McClanahan, T. R., Pawlak, G., Barshis, D. J., Hench, J. L., Rogers, J. S., Williams, G. J., and Davis, K. A.: High frequency temperature variability reduces the risk of coral bleaching, Nat. Commun., 9, 1671, https://doi.org/10.1038/s41467-018-04074-2, 2018. 

Sakamoto, T. T., Hasumi, H., M. Ishii, Emori, S., Suzuki, T., Nishimura, T., and Sumi, A.: Responses of the Kuroshio and the Kuroshio Extension to global warming in a high-resolution climate model, Geophys. Res. Lett., 32, L14617, https://doi.org/10.1029/2005GL023384, 2005. 

Sreeraj, P., Swapna, P., Krishnan, R., Nidheesh, A. G., and Sandeep, N.: Extreme sea level rise along the Indian Ocean coastline: Observations and 21st century projections, Environ. Res. Lett., 17, 114016, https://doi.org/10.1088/1748-9326/ac97f5, 2022. 

Sully, S., Burkepile, D. E., Donovan, M. K., Hodgson, G., and van Woesik, R.: A global analysis of coral bleaching over the past two decades, Nat. Commun., 10, 1264, https://doi.org/10.1038/s41467-019-09238-2, 2019. 

Sun, Y., Zhong, Z., Li, T., Yi, L., Hu, Y., Wan, H., Lian, Q., Ma, C., and Li, Q.: Impact of ocean warming on tropical cyclone size and its destructiveness, Sci. Rep.-UK, 7, 8154, https://doi.org/10.1038/s41598-017-08533-6, 2017. 

Tkachenko, K. S. and Soong, K.: Dongsha atoll: a potential thermal refuge for reef-building corals in the South China Sea, Mar. Environ. Res., 127, 112–125, https://doi.org/10.1016/j.marenvres.2017.04.003, 2017. 

Tseng, Y. H., Breaker, L. C., and Chang, E. T. Y.: Sea level variations in the regional seas around Taiwan, J. Oceanogr., 66, 27–39, https://doi.org/10.1007/s10872-010-0003-2, 2010. 

Wang, Y.-L. and Wu, C.-R.: Enhanced warming and intensification of the Kuroshio Extension, 1999–2013, Remote Sens.-Basel, 11, 101, https://doi.org/10.3390/rs11010101, 2019.  

Wong, M. Y.: Likelihood estimation of a simple linear regression model when both variables have error, Biometrika, 76, 141–148, https://doi.org/10.2307/2336378, 1989. 

Download
Short summary
Monitoring the long-term trends in sea surface warming is crucial for informed decision-making and adaptation. This study offers a comprehensive examination of prevalent trend extraction methods. We identify the least-squares regression as suitable for general tasks yet highlight the need to address seasonal signal-induced bias, i.e., the phase–distance imbalance. Our developed method, evaluated using simulated and real data, is unbiased and better than the conventional SST anomaly method. 
Altmetrics
Final-revised paper
Preprint