Reply on RC2

performance of the models to land use, management, and disturbance

Science) -standard deviation metric. This metric was calculated as the time elapsed, counting back from 2019, since each 30 m pixel last experienced tree cover loss; we then computed the standard deviation of this time since last tree cover loss within 1 km.
We acknowledge that these proxies can also be related to other types of disturbances than human-induced disturbances or management (e.g., tree cover loss due to drought). Still, they very likely contain relevant signals related to management and harvesting practices. For instance, we compared the two metrics derived from the Hansen data with a management regime product for Europe (Nabuurs et al., 2018), and there seems to be an association between the Hansen derived metrics and the management dataset.
Interestingly, these metrics were not selected during the feature selection procedure, and the final model still relies only on climatic data and biomass estimates.
The statistics derived from the Hansen data are now provided as part of our datasets, in case users want to use them as proxies for disturbance regimes or to apply some filtering to our forest age products (e.g., filtering out pixels that did not experience disturbance in the last 20 years). In addition, we added discussion points in our paper related to the role of management in the age-AGB relationships as well as the role of soil-related variables.
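The time-since-loss statistic and the optional filter mentioned above could be computed roughly as follows. This is a minimal sketch, not our processing code: it assumes the loss-year layer has already been read into a small 2-D grid, and the function names, the toy values, and the block size are all illustrative assumptions.

```python
from statistics import pstdev

REFERENCE_YEAR = 2019  # year from which time since loss is counted back


def time_since_loss(loss_year_grid, reference_year=REFERENCE_YEAR):
    """Years since last tree cover loss per pixel; None where no loss was recorded."""
    return [
        [None if year is None else reference_year - year for year in row]
        for row in loss_year_grid
    ]


def block_std(grid, block):
    """Population standard deviation of the non-missing values in each block x block window."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, block):
        out_row = []
        for c0 in range(0, n_cols, block):
            vals = [
                grid[r][c]
                for r in range(r0, min(r0 + block, n_rows))
                for c in range(c0, min(c0 + block, n_cols))
                if grid[r][c] is not None
            ]
            out_row.append(pstdev(vals) if vals else None)
        out.append(out_row)
    return out


def disturbed_within(grid, max_years=20):
    """Boolean mask: True where tree cover loss occurred within the last max_years years."""
    return [[v is not None and v <= max_years for v in row] for row in grid]


# Toy 2 x 4 grid of loss years (None = no loss recorded); values are illustrative only.
loss_year = [
    [2009, 2017, None, 2001],
    [2009, 2013, None, 2001],
]
tsl = time_since_loss(loss_year)            # years since last loss
tsl_std = block_std(tsl, block=2)           # one value per 2 x 2 window
mask = disturbed_within(tsl, max_years=20)  # pixels a user might keep
```

With real data, the window would cover the 30 m pixels that fall inside one 1 km cell rather than the 2 x 2 block used here.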
As things stand, I think that the interpretation of the influence of climate variables on the age estimates (Fig. 5) has to be made in the context of these variables effectively driving an implicit stand growth model which converts the biomass data into age. The biomass product is the only predictor which contains information about the current state of the forest including the cumulative effect of its full land-use, management and disturbance history. i.e. the climatic variables are not explaining age, they are (or at least likely are) explaining how biomass relates to age. A direct interpretation of climate effects on age would in any case be flawed because of management not being considered. I think that assessing the drivers of age distributions themselves (as opposed to trying to make the best map) would be clearer using a model which only included drivers and not also the state (i.e. biomass).

Response:
Thanks for this comment. We agree that climate variables are explaining how age relates to biomass in our modelling framework. We have, therefore, adapted our discussion points on the link between climate and age in our manuscript. In addition, we agree that it would be very interesting to better understand the drivers of the age distribution globally by building a model that only uses its drivers while discarding state variables such as biomass. However, in this paper, we primarily intended to provide the best forest age map possible, a relevant dataset as pointed out by the referee. By including biomass as a predictor, we substantially increase the performance of the models and make sure that information related to land use, management, and disturbance history is implicitly considered through biomass estimates. Additionally, as the referee pointed out, it is challenging to retrieve global products that explicitly describe land-use, management and disturbance history.
The masking out of low tree cover is a nice technique to reduce the negative bias on pixel biomass, and thus stand age, caused by a mix of forest and non-forest land covers within a 1 km² pixel. However, although the size of a stand is a loosely defined concept, it is generally much smaller than 1 km². This leaves me wondering why the authors aggregate the biomass dataset from 100 m to 1 km? Is this to reduce noise? 100 x 100 m is already much bigger than most of the inventory plots which will underlie this product, but it would make the scale mismatch between plot-level training data and the biomass product used for extrapolation much less acute. Is a reduction in noise worth the loss of this important small-scale variation? An assumption of homogeneity within a 1 km² cell (even after masking for tree cover) would tend to reduce the extremes of low and high biomass which would have been seen in the plot-level training data. This might go some way to explaining the relative dearth of young stands in the global age product (Figs. 9 and 10).

Response:
Thanks for your comment. The main reason for aggregating the biomass dataset from 100 m to 1 km is to match the spatial resolution of the climate data (i.e., Worldclim data) for the upscaling procedure. We agree that by aggregating the biomass dataset from 100 m to 1 km, we lose spatial variability as mentioned by the referee. Although we do not intend to provide a forest age product at 100 m resolution, we added some discussion points about the limitation of having a global age product at 1 km instead of, for instance, 100 m pixel size.
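The loss of spatial variability under aggregation can be illustrated with a simple block mean. The sketch below is only an illustration under stated assumptions: `aggregate_mean` is a hypothetical helper, a plain block mean is assumed (the manuscript's aggregation may differ in detail), and the biomass values are invented for the toy example.

```python
def aggregate_mean(grid, factor):
    """Average non-overlapping factor x factor blocks (e.g. factor=10 for 100 m -> 1 km)."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, n_rows, factor):
        out_row = []
        for c0 in range(0, n_cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Toy fine-resolution biomass grid (Mg/ha); values are illustrative only.
fine = [
    [10.0, 250.0, 120.0, 130.0],
    [30.0, 310.0, 110.0, 140.0],
]
coarse = aggregate_mean(fine, factor=2)  # [[150.0, 125.0]]
```

In this toy case the fine grid spans 10 to 310 Mg/ha, while the aggregated cells span only 125 to 150 Mg/ha, which is the damping of low and high extremes that the referee describes.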
Do all training datasets resolve up to 300 years of age (Line 100)? How did you deal with this if not? Are all the methods used to determine age likely to be accurate going back this far? Given the biases in prediction at greater than ca. 130 years (Fig. 3), and that the age of old-growth forest is both perspective and biome dependent, perhaps the classification between old-growth and non-old-growth would have been more accurate if a younger age threshold was taken? Whether or not you want to test this is of course up to you (perhaps you did already?), but at least a bit more clarity and discussion would be good.

Response:
Thanks for this comment. Not all in-situ data resolve up to 300 years old as some of the old-growth forests are older than 500 years old, while sometimes the age of the plot is not even known (e.g., old-growth tropical forests). In our analysis, we used an arbitrary upper limit of 300 years old, which we agree can be discussed. However, we did additional experiments with an upper age limit of 150, and we observed a decrease in the model performance and a strong bias for the intermediate age class (>70 years old) (i.e., underestimating this age class). As mentioned by the referee, there are biases in prediction at greater than ca. 150 years; that is why we advised the user to use an upper age limit of 150 years old when using the MPI-BGC age product.
I'm struggling with the interpretation around Fig. 7, where my reading of the graphs doesn't find the same features. L312. My reading of this graph is that in warm regions the whole range of forest ages is found, whereas in cooler regions, only a relatively narrow range of ages exists. This itself is pretty weird, as the plot seems to suggest that cold (presumably boreal) forests only have ages of around 100 years, with no younger stands. This doesn't seem very plausible. L320. It seems very strange that the youngest stands should only be found in the tropics (i.e. regions of high temperature, Fig. 7c). How can you explain this result? Fig. 7d. The points basically seem to form a square, apart from a small indent in the upper left, which seems challenging to interpret causally.

Response:
Thank you for your comment. It is important to note that this plot was produced using a forest age product aggregated to a 0.25-degree pixel size; therefore, one loses resolution in the age spectrum (i.e., young and old forest age estimates were possibly averaged within a 0.25-degree pixel). This could explain the low fraction of young forests in some regions mentioned by the referee. We agree with the interpretation of the referee, in any case. Note that air temperature variables represent annual means. As mentioned by the referee, it appears that hot and dry regions have a substantial fraction of young forests while very cold regions (< 0 degC) have mainly old forests. Nevertheless, we still observe young stands (i.e., 10-20 years old) in relatively cold regions that correspond to boreal regions (e.g., annual mean around 0-5 degC).
In order to have a more accurate description of the age distribution of our dataset in the climate space, we decided to produce Figure 7 using the original age product (i.e., 1 km pixel size) and not an average age estimate at 0.25-degree pixel size.

The comparison against other age products is very nice; however, not all comparator datasets are created equal. For instance, Pan et al. (2011) is based on a spatially systematic inventory system (at least in the US). Similarly, much of the temperate and boreal data in GFAD is based on summaries of statistics from national forest inventories.
Whilst these come with substantial uncertainties, I would argue that they provide a sterner evaluation of the MPI-BGC product than the comparison with products based on biomass age curves. I suggest that this is reflected in the discussion of these results and also that the comparison with GFAD provides separate histograms for regions where GFAD is based on inventories and those where it is based on the biomass-age approach.

Response:
Thanks for the advice. We agree that a direct comparison between the MPI-BGC product and other independent datasets is not systematically fair. We have followed the feedback from the referee by adding some discussion points in the manuscript and by providing, in Figure 10, separate histograms for regions where GFAD is based on inventories and those where it is based on the biomass-age approach.

Response:
Thanks for the comments. We have rephrased this sentence.

Response:
Thanks for pointing this out. It has been corrected.

L230. I don't think this result necessarily implies that vapr is a strong determinant of forest age.
It can also imply that vapr is a strong control of how AGB relates to age. Given that high vapr is likely to limit both growth rates and maximum biomass, this seems to me the most likely interpretation. An influence of vapr on fire frequency, as suggested in the text, would surely primarily act through the effect of fire on AGB.

Response:
We agree with the referee's interpretation of the role of vapr in the age-AGB relationship. We have adapted our interpretation in the manuscript based on the referee's comment.

Response:
Thanks for this advice. We provided a map showing the residual estimates at the plot level in the supplements.

Response:
Thanks for the comment. We agree that there is a temporal mismatch between our product and the Ceccherini et al. study. We removed this citation from the manuscript and used the Vilén et al. study to discuss the age pattern in Europe.
L292. I agree that different fire regimes seem plausible (Shorohova et al., 2011), but there is also substantial harvest identified in southern Siberia (Curtis et al., 2018), which should also play into this discussion.

Response:
Thanks for the comment. We also added points related to harvesting when discussing the patterns in Siberia.

Response:
Thanks for pointing this out. It has been clarified in the caption.
L367-368. This sentence is really hard to follow, please can you rephrase it? I think what is being presented is the fraction of the random forest ensemble which predicts an old-growth forest for pixels which the mean of the ensemble attributes to old-growth, but after reading it several times I'm not 100% sure.

Response:
Thanks. This sentence has been rephrased to improve clarity.
L392. Are you able to speculate a bit more on the methodological differences that drive the differences of the new forest age product with Pan et al.?

Response:
We have added more discussion points in the manuscript explaining the methodological differences between the MPI-BGC and Pan et al. products.
Figure 9. I don't follow why the age map without using a tree cover threshold needs to be applied for consistency. Surely the tree cover threshold deemed to be most appropriate should be used for comparison. Or, perhaps better, all tree cover thresholds should be compared and the one which provides the best agreement with the independent, inventory based, datasets (at least for regions where the inventory is systematic) might be recommended? It's really helpful that you explore this uncertainty due to tree cover and make all the maps available (thanks!), but many users will need to make a choice on a single map to use and it would be helpful to have a best recommendation to support this.

Response:
Thanks for this feedback. The Chazdon age map product uses a biomass product to which no tree cover correction was applied. That is why we thought it was fairer to compare the two products using the MPI-BGC age product without tree cover correction. However, we agree that it is relevant to compare all tree cover thresholds with the independent datasets to identify the one that provides the best agreement. We have added such an analysis to the manuscript.
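A comparison across tree cover thresholds of this kind could be sketched as below. This is an illustration only: RMSE is assumed here as one possible agreement metric (the manuscript may use a different one), the helper names are hypothetical, and the thresholds and age values are invented toy numbers rather than results.

```python
from math import sqrt


def rmse(predicted, reference):
    """Root-mean-square error over cells where both values are available."""
    pairs = [
        (p, r) for p, r in zip(predicted, reference)
        if p is not None and r is not None
    ]
    if not pairs:
        raise ValueError("no overlapping valid cells")
    return sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))


def best_threshold(maps_by_threshold, reference):
    """Return the threshold with the lowest RMSE, plus all scores."""
    scores = {t: rmse(ages, reference) for t, ages in maps_by_threshold.items()}
    return min(scores, key=scores.get), scores


# Toy example: ages (years) for four cells; all numbers are illustrative only.
reference_age = [40, 80, 120, 60]   # hypothetical inventory-based ages
age_by_threshold = {                # hypothetical tree cover thresholds (%)
    0: [30, 70, 100, 50],
    10: [38, 78, 115, 58],
    30: [45, 95, 140, 75],
}
best, scores = best_threshold(age_by_threshold, reference_age)
```

In this toy case `best` is the 10 % threshold, simply because its invented ages lie closest to the invented reference values.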
Figures 6 and 9. Is this mean or median age per pixel? If so, how did you deal with the old growth age class having an infinite upper age bound?

Response:
In Figures 6 and 9, the age per pixel does not represent mean or median estimates but the age estimates that were predicted by the two random forests (i.e. the classifier and the regressor) at 1 km spatial resolution. For the pixels classified as old-growth forests (i.e., having an infinite upper age bound), we assigned an age estimate of 300 years as described in the methods section.
Figure 10. The integrated forest area does not appear to be the same for GFAD and MPI-BGC products. Differences also appear to be shown in regions that have no forest (high northern latitudes, interior Australia). Please could you check? Also, which tree-cover correction was used in this comparison?

Response:
Thanks for noticing this. We have now double-checked for a potential mismatch in the forest area between the two products. Here, we did not apply any tree cover correction to the MPI-BGC product as none was applied to the GFAD product. This was done to have a fair comparison between the two products.