Human influence on growing-period frosts like the early April 2021 in Central France

. In early April 2021 several days of harsh frost affected central Europe. This led to very severe damages in grapevine and fruit trees in France, in regions where young leaves had already unfolded due to unusually warm temperatures in the preceding month (march 2021. We analysed with observations and 172 climate model simulations how human-induced climate 20 change affected this event over central France, where many vineyards are located. We found that, without human-caused climate change, such temperatures in April or later in spring would have been even lower by 1.2°C [0.75°C;1.7°C]. However, climate change also caused an earlier occurrence of bud burst, that we characterized in this study by a growing-degree-day index value. This shift leaves young leaves exposed to more winter-like conditions with lower minimum temperatures and longer nights, an effect that over-compensates the


Introduction
Frost days and cold spells are decreasing in frequency and intensity worldwide (IPCC, 2021;van Oldenborgh et al., 2019). 35 Yet, severe cold spells continue to pound many mid-latitude areas, due to the occasional invasion of polar air being transported well into lower latitudes as a consequence of the chaotic motion of Rossby waves. When occurring in spring, such cold events can create a range of impacts on agriculture such as in April 2021, when young leaves and flowers have started to develop in fruit trees or grapevines. The frost event which took place from 6 to 8 April 2021 was exceptional with daily minimum temperatures below -5°C recorded in several places. In several places, such low temperatures left no chance to save grapevines 40 and fruit trees by frost management strategies (e.g. local heating from braseros or spreading water to keep frost moderate at the surface of plants). The cold temperatures led to broken records at many weather stations (see Figure 1, right-hand-side).
Unfortunately, this cold event happened a week after an episode of record-breaking high March temperatures also in many places in France and Western Europe (Figure 1, left-hand-side). This sequence led the growing season to start early, with bud burst occurring in March and the new leaves and flowers left exposed to the deep frost episode that followed. 45 Fruit production was also severely hit for some fruits. Estimates of production losses are of about 50% for pears, cherries, ~25% for peaches and ~20% for apples (with large departures from average depending on the region). Some other productions were also impacted (as sugar beet emergence) but final yields were not finally affected because of favorable production conditions (Agreste, 2021). 60 The occurrence of such an event called for investigating the role of climate change. From the weather point of view, the event is rather classical for cold outbreaks, when air masses of polar origin invade Western Europe. The large-scale flow pattern was characterized by a strong high-latitude anticyclone extending from Greenland to the North-Western European coasts, which was found among the 4 most recurrent or stationary North-Atlantic flows (the "Greenland Blocking" pattern of Michelangeli et al., 1995, andVautard et al., 1990), inducing a negative value of the North-Atlantic Oscillation (NAO) Index. The 65 combination of polar air advection, cloud-free sky and still long nights led to hours of intense frost. Such dynamical events are not observed to have become more frequent (Screen et al., 2013, Blackport andScreen, 2020) despite the ongoing debate on the role of narrower sea ice extent favoring the occurrence of blocking anticyclones (Barnes and Screen, 2015). However, human-induced changes in dynamical conditions, especially leading to cold outbreaks, remain largely uncertain and can be viewed from various indices (Shepherd, 2014), and their understanding would require an in-depth, dedicated analysis. 70 Here we perform a statistical attribution analysis of the 2021 late frosts to climate change from an impact perspective. The effects of climate change on late frosts and their consequences are complex because several processes are in competition, in particular the earlier start of the growing season and the general regression of cold extremes and frost days (IPCC, 2021). The advance in the start of the growing season has increased the number of frost days occurring after the start of the growing season in several places worldwide, including in Europe (Liu et al., 2018). Using several indices for grapevine exposure, it has been 75 found that the date of the last frost day has not regressed as fast as the date of growing season start (Sgubin et al., 2018). However, so far no formal attribution study of a "growing period frost" has been carried out to quantify the role of anthropogenic climate change in these observed trends. In order to carry out the attribution study, we use several indices and event definitions characterizing cold temperatures in the growing season, and the well-established attribution methodology described in Philip et al. (2020) andvan Oldenborgh et al. (2021). 80 A rapid attribution analysis was carried out in June 2021 and reported in (Vautard et al., 2021, https://www.worldweatherattribution.org), with several indices developed and analyzed, showing that while spring frosts are generally becoming less severe and frequent, frosts occurring after the growing season start are becoming more intense due to climate change. Since then, observations were consolidated, more model data has been collected and simulation data processing was homogenized. This article reports the final results, which confirm the conclusions of the preliminary analysis. 85 We present several definitions of the frost event in section 2, and the corresponding indices chosen. In Section 3, we present methods, observations and models used, and trends in observations are analyzed, and in section 4, results from observations and model ensembles are analyzed. This is followed by a synthesis of results, a discussion and a conclusion in Section 5.
The cold spell of 6-8 April 2021 hit much of Central and Northern Europe (see Figure 2a). However, we focus here on 90 central/northern France in order to investigate a relatively homogeneous, mostly plain or low-elevation area (see Figure 2b). This area [-1°-5°E; 46°-49°N] covers most of the grapevine agriculture areas of Champagne, Loire Valley and Burgundy which were identified as specifically vulnerable regions under climate change (Sgubin et al., 2018). The area also covers regions with high crop and fruit production.

100
We use several event definitions, accounting for different phenological aspects. Differences in results for these definitions also test the robustness of the attribution. In each case, the "event" is defined as the yearly minimum temperature (TNn) obtained under specific conditions, and then averaged over the area, or taken at specific station locations. A basic reference conditioning is the fixed-season minimum temperature and does not consider phenology: the TNn is calculated over the April-July months (index TNnApr-Jul). The second index accounts for phenology. The TNn is calculated conditioned upon the Growing Degree 105 Day above 5°C (hereafter denoted "GDD") being larger than thresholds characterizing bud burst conditions, which depend on species. In this study, our aim is not to tie thresholds to specific plants' phenology but to provide a general overview for different thresholds. GDD is calculated, at each grid point, with a starting date of the previous winter solstice, in a similar approach used by Garcia de Cortazar-Atauri et al. (2019), assuming that the dormancy break period for grapes is finished in the calculation period. The formula for the GDD at day t during year y is therefore: with TM the daily mean temperature and tstart is the 21 december of previous year y-1 (starting time of the cumulation). In 2021, the values of GDD obtained on the day before the frost events in the concerned area vary in the range 150°C.day to 115 350°C.day, with an average value on 5 April of 259°C.day over the study domain. This value is high for this calendar day (rank=14th since 1921 in the E-OBS extended dataset) but the record value was obtained in 2020, with a mean GDD of 320°C.day. Given the range of values taken in the domain, we considered 3 thresholds for GDD: 250°C.day as a central value, and 150°C.day and 350°C.day as sensitivity experiments. This range of values also helps to capture a range of bud burst values of grapevine cultivars as found in Garcia de Cortazar-Atauri et al. (2009). For each GDD threshold, the yearly minimum TN 120 values (TNn), respectively called hereafter "TNnGDD250", "TNnGDD150" and "TNnGDD350" for the three GDD thresholds, is calculated over subsequent days and until the end of July at each grid point and then averaged over the study area [-1°-5°E; 46°-49°N]. Despite the fact that the average characterizes the mean lowest temperature that can occur after crossing the GDD threshold, the average can mix several dates as the GDD threshold crossing and the yearly minimum does not necessarily occur on the same date over the whole domain. In 2021, for instance, the TNnGDD250 was already reached 125 during the 6-8 Apr episode for most of the area, but not in the easternmost part and in some other parts, because GDD did not exceed 250°C.day during the April frosts.
In order to focus on specific phenological periods when young leaves and flowers are sensitive to frost after bud burst and flowering, we also defined indices over limited ranges of GDD values. The number of possibilities are large, in most cases providing qualitatively similar results. The analysis is reported here only for the range 250-350°C.day, by using the yearly 130 minimal temperature over this GDD range (index TNnGDD250-350). This index is again calculated by grid point before being averaged spatially over the study region, or is taken at stations.

Methods
Event attribution methods used in this study are well documented in previous studies. The rapid attribution methodology is a 135 classical probabilistic approach, described in Philip et al., 2020 andvan Oldenborgh et al., 2021, and has been used in many case studies for heat waves (e.g. Kew et al., 2019, extreme precipitation (e.g. Philip et al., 2018), or more complex events such as wildfire weather (van Oldenborgh et al., 2020). It uses a stepwise approach analyzing observations with a Generalized Extreme Value (GEV) with a global warming index as a covariate, then using ensembles of models validated on the event indices and their extreme value statistics by comparison with observations, and finally using the GEV 140 with the covariate fit to build a statistical model of the data under some assumptions.
In all cases (observations and models), we used data in the 1951-2021 period for the GEV fit for attribution and for future trend estimates for a global warming of 2°C, we used the period from 2000 to 2050. For observations, the covariate is the smoothed observed GISTEMP Global Mean Surface Temperature (GMST), while for models the smoothed Global mean Surface Air Temperature (GSAT) (5-year running average) is used. The only exception is the High Resolution Model 145 Intercomparison Project (HighResMIP) SST-forced ensemble (see below), for which the observed GMST was used, because of the ensemble forcing.. Change statistics are calculated with respect to the 2021 year and estimated return period from observation as a reference.

Observations and model ensembles
The observations used here for the attribution are the E-OBS v23e dataset of daily minimum temperatures (Cornes et al., 2018). 150 The above indices are calculated for each grid point, spaced every 0.25° in this dataset, and then averaged over the study area.
For the attribution of the frost event, we use five model ensembles. Each simulation of each ensemble was bias-adjusted using the CDFt method (Vrac et al., 2016) using the daily minimum and the daily average temperatures from E-OBS over the 1981-2020 period. Bias correction is an important step here since GDD calculations use a threshold. This method was assessed for use in climate services in Bartok et al. (2019), and showed good performance. We used statistics of pooled ensembles, using 155 data until 2021 for the GEV fit of the distributions. Indices are calculated exactly as for the observations: model GDD values are calculated at each grid points using Equation (1), and the indices are averaged over the area of study (rectangle in Figure   2b).
The first model ensemble is the Euro-CORDEX (0.11° resolution, EUR-11) multi-model ensemble. It is composed of 75 combinations (as of May 2021) of Global Climate Models (GCMs) and Regional Climate Models (RCMs) for downscaling 160 (see Vautard et al., 2021 andCoppola et al., 2021 for the description of the ensemble which has increased since these publications). Each simulation consists of a historical period simulation and a RCP8.5 scenario simulation with fixed aerosol concentrations. For the attribution of past evolutions historical and scenario are concatenated until 2020. Some simulations start in 1971, whereas most simulations start from 1951. Given that we need to use data from the previous year for starting GDD accumulation, all yearly indices are calculated from their second simulation year (i.e. 1972 and 1952 respectively). 165 The second model ensemble used to study the influence of internal variability was the IPSL-CM6A-LR model (see Boucher The fourth ensemble used is a set of 10 SST-forced HighResMIP simulations (Haarsma et al. 2016). For the historical time period , the SST and sea ice forcings used are based on observed dataset, and for the future time period  2050) the SST and sea ice are derived from CMIP5 RCP8.5 simulations and a scenario as close to RCP8.5 as possible within CMIP6. The analysis of this ensemble was carried out using the observed GMST as for the observations. The fifth ensemble is the same set of models run in coupled mode, and the model GSATs were used. Again, more details can be found in the Appendix A.
Note that we bring together available simulations which do not follow the same greenhouse gas emission scenarios, which 180 could lead to large difference in climate response for given times. Such would also be the case for individual models' responses.
However, this should not be a problem as long as results are compared with fixed degree of warming. Such an approach is also followed by the recent IPCC report where changes in extremes are compared (see IPCC, 2021).
The differences with the rapid attribution in models is (i) the homogeneous bias correction, while it was model-dependent in the rapid attribution, (ii) the addition of the HighResMIP coupled runs, and the change in the CMIP6 selection which was 185 based on least-biased models instead of bias-corrected models. The present analysis is therefore more consistent across ensembles.

Model evaluation
As stated in the Philip et al. methodology, we only keep model ensembles which have extreme statistics compatible with observations. We compared the model GEV fit parameters over the overlapping model periods (1951-2020 or 1971-2020) in 190 order to check the ability of models to simulate such extremes. For reference, such ability was not confirmed for heat waves (eg. . In the current case, we found that model ensembles are compatible with the observations accounting for uncertainties (see Figure 3) for most indices but not all. Models are said to be compatible with observations when GEV scale and shape parameters have overlapping 95% confidence intervals. The comparison is made for two indices for simplicity.
For TNnGDD250 the fitted model scale parameter is compatible with the observed one. The shape parameter is very uncertain 195 in observations, leaving all model fits compatible with them. The same occurs for the TNnApr-Jul, but in this case all models have an overestimated scale parameter (in terms of amplitude). Only Euro-Cordex and HighResMIP-SST appear to have a parameter compatible with observations. Given this evaluation for this index, for the final model "weighted average" (see Philip et al., 2020), only Euro-Cordex and HighResMIP-SST should in principle be considered for the statistical evaluation of probability ratio and intensity change, while for the TNnGDD250 index, all ensembles can be considered. However, we have 200 here considered all model ensembles even for the TNnApr-Jul index for consistency across indices, and because results are qualitatively similar, keeping all models or retaining only the compatible models (see also discussion in Section 5).

Observations 210
In Figure 4, we show the annual time series of the indices as obtained from E-OBS, together with simple trend statistics for the 1951-2020 period. The Apr-Jul TNn has a slightly upward linear trend of +0.13°C/Decade, which is however not significant at the 90% (two-sided) level because of the large interannual and interdecadal variabilities. By contrast, both TNnGDD250 and TNnGDD250-350 have a significant cooling trend of -0.21 and -0.25°C/Decade respectively. The warming trend in TNnApr-Jul is partly due to larger values since 2000, but these higher values are not reflected in the other indices because 215 We conclude that, on average, since 1950, extreme yearly minimum temperatures for GDD>250 have cooled by about 1.5-2.0°C. Very low growing-period frosts were also found in 1957 and 1991, with lower values than in 2021.
For different thresholds we also find cooling trends, however with lower significance (Figure 4b). The significance of the signal remains. Interestingly, over the last 50 years (1971-2020) the trends have increased and become more significant (for 220 instance +0.29°C/Decade, p<0.1 for TNnApr-Jul, and -0.37°C/Decade for TNnGDD250, p<0.1).

225
When considering trends in low extremes of these indices, the results are qualitatively similar but significance is increased when considering GEV fitting using the smoothed observed GMST as covariate instead of assuming a linear trend (see Table   1). We estimate that the event, defined as minimum temperatures over Apr-Jul, has a return period of 78 years [at least 19 years], which means a very rare event in the current climate. However, in a climate corresponding to a global temperature 1.2°C cooler, this would have been about a 1-in-7-year event (best estimate). By contrast, the minimum temperature, taken 230 over the growing period as characterized by the GDD index, instead of fixed month, has significantly cooled by almost 2°C with large varying uncertainty ranges and significance depending on the chosen index. The observational analysis is however not sufficient to conclude a role of climate change, which would require models with factual and counterfactual assumptions. To assess the changes at local scale, we also calculated trends for 3 specific stations in the domain (stars in Figure 2). We  Table 2, for these stations, and for the three main indices: TNnApr-Jul, TNnGDD250 and TNnGDD250-350. In most cases, the trends are positive for the fixed season index and negative for the growing season period. However, almost no result is statistically significant. We conclude that at local scale, variability is dominating trend signals (   Results here differ from the rapid attribution analysis (Vautard et al., 2021) in the completion and adjustment of the E-OBS 255 dataset by the producers. This led to slightly different values for the observed indices in 2021. For instance, the estimation of the TNnGDD250-index based return period was estimated here to 8 years instead of 12 years in the rapid attribution. However, the results are qualitatively similar to those found in the preliminary analysis.

Simulated mean trends
The trends in the two main indices for the 5 model ensembles is analyzed in the form of histograms ( Figure 5), in order to 260 examine the variability across ensemble members. There is a large range of minimum temperature trends from April to July, which are almost all positive. The observed trend in the minimum temperature from April to July is close to the median middle of the distribution for the Euro-Cordex and the CMIP6 ensemble, while it is closer to the lower tail of the distribution of the remaining three ensembles. A large range of possibilities is also found for the trends of the TNnGDD250 index, with a large part of the simulations showing negative and lower trends than those of the minimum temperature from April to May, 265 consistent with the observations. We conclude from these figures that, despite the general trend towards cooling of the growing period frosts, the expected trend, for a given singular member, can also be a warming one, albeit with a smaller chance than for a cooling one. This large uncertainty also has to be taken into account in any adaptation strategy.  Table 3 shows the extreme value statistics for all indices for this ensemble as well as other ensembles used. Models show large agreement with observations on changes in return periods and intensities between the preindustrial and current climates for the fixed-calendar TNn index (TNnApr-Jul). The trends in all models seem however underestimated compared to observations for the indices with a GDD conditioning (TNnGDD250). 280 The behaviour present in all model analysis is illustrated in Figure 6: a clear, significant increase in TNnApr-Jul and an opposite trend sign for the TNnGDD250. Despite being weaker, this increasing trend in low extremes is significant for all ensembles but for HighResMIP-SST (Table 3), with a clear signal of increase in coldest temperatures when considered over the growing period, and with a threshold of 250°C.days. Such a trend is also clear and significant in most ensembles when considering the sensitive range 250<GDD<350 where young leaves and flowers are vulnerable to frost. For the other indices, trends are also 285 significant in most cases but not all.   Table 3: Change in extreme value statistics for all model ensembles and observations, with the GEV model fitted from data over the 1951-2020 period for the past trends estimates, and over the 2000-2050 period for future trends (when estimating the changes for a 2°C warming above pre-industrial levels); We assume here that pre-industrial (p.i.) global warming level is 1.2°C cooler than the 2021 one, and therefore the 2°C warming level is reached when the warming is 0.8°C above the current level. In each row, the values using 1000 samples. Numbers in blue indicate a decrease of TN, and in red an increase of TN. The last row indicates changes for a 2°C warming level. Boldface numbers indicate statistical significance against a "no change" assumption.
Despite a sign agreement between models and observations on trends, models generally simulate much weaker trends for the 300 GDD-conditioned indices than observed, a fact that remains unexplained, just as the underestimation in extreme temperatures in summer heat waves (see eg. van Oldenborgh et al., in revision). For TNnGDD250, all ensembles simulate an increase in the frequency of growing-period low extreme temperatures ranging from 10% to 110% with a weighted best estimate of 50% (see also Section 5). For the other indices the range of factors is rather similar, despite lower values for TNnGDD350. Changes in intensities are also all negative but remain below 1°C. 305

Future trends
Indices have similar future projected trends as in the past decades in the ensembles and scenarios considered here. Figure 7 shows evolutions of the ensemble-median and 10th and 90th percentiles for Euro-Cordex [RCP8.5] and CMIP6 [SSP3-7.0], for example, but similar results hold for the other ensembles, which have less members or shorter time coverages. In both cases, the median of April-July minimum temperatures over the region continues to increase with mean values around 2°C 310 while they are below frost level in 2021. By the end of the century, frost such as in 2021 will become a very rare occurrence in April or after in these scenarios. However, frost can still be expected earlier in the year, while at the same time the growing season starts earlier. This can be seen in the development of the TNnGDD250 index throughout the 21st century which shows a weak decreasing trend. It is noteworthy that in the second half of the century, the 10th percentile often nears or exceeds the 2021 value. More frequent events like the 2021 are therefore expected. By the end of the century for this scenario, we also 315 expect deep frosts in the growing period with intensities which have never been met in 2021 or in earlier years.

320
Figure 7 also includes the observed time series for the two indices. For TNnGDD250, even though points generally fit well the 10%-90% model range (expected because the models are bias-corrected), we observe a bias in low extremes with variability in the observations inducing frequent excursion in temperatures far below the 10% quantile. Such bias is not found in the TNnApr-Jul index. 325 We restrict the analysis of future trends in extremes to the 2°C warming level above the P.I. conditions, which is assumed to be 0.8°C above current level in 2021. This restriction is made to be on the safe side with potential nonlinearity of response of the extreme indices to global warming while we assume linearity here with the covariate GEV method. In this future case the GEV fit is carried out over the 2000-2050 period, and probability ratios and intensity changes are given for events with a similar return period as for the 2021 event. 330 Results are shown along with attribution results in Table 6. Extreme cold temperatures for the April-July period will continue to become less extreme. Euro-Cordex simulations, which are the only ones consistent with observed trends, project that events similar to the 2021 event would become about half as frequent in a 2°C warming climate. The other models predict factors ranging from between 3 and 10 times less frequent. In contrast, the growing-period extreme frost intensity is increasing, and the 2021 event with a GDD>250 is projected to have an increasing frequency by about 30% [10% -60%] for a 2°C warmer 335 climate than preindustrial in Euro-Cordex, 10% [0%-30%] for CMIP6 selections and 30% [10% -60%] for HighResMIP (coupled).

Synthesis, summary and discussion
The individual assessments described above for probability ratio and intensity changes in the past period are summarized in Figure 8. Given the large differences between models and observations for the growing-period indices TNnGDD250 and 340 TNnGDD250-350, we do not combine the observational and model results to form a single "synthesis" but instead we present the model weighted average for comparison with the observations. In the case of the TNnApr-Jul index, only two ensembles, Euro-Cordex and HighResMip-SST, pass the validation criteria. However, the three additional models (IPSL-CM6A-LR, CMIP6 and HighResMip) that validate well for TNnGDD250 and TNnGDD250-350, give similar results to the other ones.
Incorporating them in the weighted average has no impact on the high significance of the change found, and makes the 345 comparison across indices consistent.
While uncertainties are comparably large for the quantitative assessment of probability ratios there is a significant decrease in the likelihood of cold waves as defined above for TNnApr-Jul. The event that has occurred in 2021, taken as a fixed-season extreme, has become rare, with a return period of at least 19 years, and with a best estimate of 78 years. The intensity of a cold wave as observed in April is also decreasing, by a well-constrained best estimate of 1.2°C. When considering the lowest 350 temperatures after the growing season start simulated by the GDD thresholds, models and observations quantitatively disagree with respect to probability ratio and intensity, but the qualitative agreement is clear and shows an increase in the likelihood of damaging frost as well as an increase in the intensity across all indices. This is corroborated by the fact that these trends change with the model results serving as lower bounds. 355 In Figure 9 we summarize the projected changes in probability and intensity between the present and +2°C climate, showing an unweighted average for the three model ensembles Euro-Cordex, CMIP6 and HighResMIP. We again use all available models for TNn-Apr despite CMIP6 and HighResMIP ensembles not passing the validation over the historical period. We do so because (i) all models are included for the other two indices and we do not know how well they validate for the future, (ii) no synthesis is formed so the unweighted average shown is only of qualitative use. Probability ratios are less than unity for 360 TNnApr-Jul, indicating that the current trend for decreasing frequency of cold snaps is likely to continue in the future.  While the growing season is starting earlier, necessary plant dormancy characteristics also change and the lack of chilling winter days may delay the bud burst in many species (Chuine et al., 2016). This effect is not taken into account here and could 390 alter our results concerning changes in bud burst dates. Such dates are also dependent on species. We have tested the dependence on thresholds of a simple GDD index, which provide similar results than the central thresholds discussed in the synthesis. Dormancy effects, as well as other specific plant effects can only be studied through impact models, which was not the goal in this study.
The applicability of our results at local scale is limited in quantitative terms. The local station analysis, and the trends 395 histograms show that given locations are more likely to exhibit cooling of extreme growing-period temperatures than warming, but a warming cannot be excluded at these scales and at present day warming levels.
The discrepancy between trends in models and in observations in the historical periods currently remains unexplained. It shows that either large variability inhibits an accurate estimation of trends of cold extremes or that other factors come into play which may not be well simulated such as trends in radiation or cloudiness as a response to either warming or aerosols. These factors 400 should be investigated in future studies.
Above all, the finding that trends identified up until now continue under future warming indicates that anthropogenic climate change is an important driver of the observed trends and suggests that the models indeed underestimate the effect of change due to forcing factors and that the discrepancy between observed and simulated trends is not entirely explainable by unmodelled factors other than human-induced climate change. 405 In conclusion, we identify two key attributable effects, the decrease in likelihood and intensity of minimum temperatures and the increase of likelihood and intensity of minimum temperatures when conditioned on growing degree indices. These findings are consistent across the different lines of evidence pursued despite the quantitative differences. The GDD-indices are however a crude representation of the vulnerability of different species to frost. Thus, our findings highlight that growing season frost damage is a potentially extremely costly impact of climate change already damaging the agricultural industry but to inform 410 adaptation strategies for specific species impact-based modeling will need to complement our assessment. Other studies, in particular, indicated that impacts may be highly variable across locations and species (Leolini et al., 2018) , Vuuren et al. 2014, and O'Neill et al. 2016 together spanning the period between 1850 and 2099 for tas and tasmin variables. The analysis, as for the other ensembles, is however restricted to the years after 1950. Simulations were also bias-450 corrected but we kept only 3 members maximum per ensemble in order not to overload the results with models having many members. In total, given the available simulations initially, we obtained 45 simulations with models described in Table A

IPSL-CM6 single model ensemble
The IPSL-CM6A-LR model ensemble is a 32-member ensemble of the coupled climate model with the same name. The model is described in Boucher et al. (2020) and the ensemble is presented and evaluated in Bonnet et al., (2021). Simulations start in 460 the pre-industrial period with slightly different initial conditions and are saved in this study for the whole historical period and beyond, until 2029. The ensemble has been used for attribution studies, for instance in the 2019 heatwave attribution described in .

HighResMIP SST-forced and coupled ensembles
We also consider two sets of ensembles from the High Resolution Model Intercomparison Project (HighResMIP, Haarsma et 465 al. 2016), which is a coordinated set of experiments as a part of CMIP6, designed to assess the impact of model horizontal resolution. HighResMIP consists of atmosphere-only (SST-forced) and coupled runs, both spanning 1950-2050. In this study, we make use of both the SST-forced and coupled ensembles. As briefly described in the main text, in the SST-forced ensemble, for the 'present' time period , the SST and sea ice forcings used are based on the daily, 0.25° x 0.25° Hadley Centre Global Sea Ice and Sea Surface Temperature dataset, with area-weighted regridding used to map this to each model 470 grid; for the 'future' time period , SST/sea-ice data are derived from RCP8.5 (CMIP5) data, and combined with  Table A3. Spatial grids of the HighResMIP models in high-, medium,-, and low-resolution groups used in this study, along with relevant references for the simulations, their origins, and the number of simulations used in the analysis