Reply on RC1

The authors present experiments on dust emission, in which they conclude that an approach using an albedo-based representation of surface roughness (called AEM) outperforms “traditional” approaches (called TEM) for three reasons: 1. The formulation of streamwise sediment/saltation flux was incorrect in TEMs; 2. spatio-temporal dynamics of roughness elemental coverage were not considered in TEMs; 3. TEMs required calibration while AEM did not.

We do not agree that the reviewer's summary above adequately represents the conclusions of our manuscript. In the manuscript abstract we state "…It is difficult to avoid our conclusion, also raised by others, that tuning dust emission models to dust in the atmosphere has hidden for more than two decades, these TEM modelling weaknesses and its poor performance." The purpose of our manuscript is not to evaluate performance between dust emission models per se. Such an evaluation requires many perspectives and metrics of performance, and that is not what is done in this manuscript. We can find no statement in our manuscript that the AEM does not require calibration. In contrast to the reviewer's comment, we recognise that all dust emission models require calibration of some sort because of the simplifying assumptions necessary for implementation. The albedo-based dust emission model (AEM) is no different from other dust emission models in this respect, as illustrated by the calibrated AEM in our recent publication (Hennen et al., 2021).
Additionally, MODIS satellite-based estimates of dust optical depth (DOD) and dust emission point sources (DPS) and their probabilities of occurrence are compared with one another and the AEM and TEM results. It is concluded that DOD was not suitable for evaluation/calibration of dust models.
We can find no statement in our manuscript that DOD is not suitable for evaluation/calibration of dust models. Our manuscript demonstrates that dust emission modelling should be evaluated against satellite-observed dust emission point source (DPS) data and not against dust in the atmosphere.
All these aspects are interesting and worthwhile to explore, however, the listed points do not support the authors' conclusions,
The points listed by the reviewer do not represent our manuscript or its conclusions. We justify our perspective on the manuscript in our responses to each of the reviewer's points below.
because a) the correctness/completeness of a formulation of streamwise saltation flux is independent of the use of albedo-based or another roughness representation,
We disagree: the streamwise saltation flux depends on the albedo-based roughness representation. The Introduction of our manuscript explains, using equations, that the incorrect formulation is the use of the cube of the total wind friction velocity (u_star_cubed; Eq. 2) in the calculation of the streamwise saltation flux magnitude. We describe in the manuscript that this term should be replaced with the cube of the soil surface wind friction velocity (u_s_star_cubed; Eq. 3), which in the manuscript is retrieved directly using the albedo-based approach.
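A minimal sketch of why the choice of friction velocity in the cubic term matters, assuming a generic threshold form of the saltation flux in the style of Marticorena and Bergametti (1995), Q = C (rho_a/g) u^3 (1 + u_t/u_s_star)(1 - (u_t/u_s_star)^2), where u is either u_star or u_s_star. The constant C, the air density, the threshold and the inputs are illustrative assumptions, not the manuscript's Eqs. 2 and 3. When the threshold test uses u_s_star = R u_star in both cases, putting the total u_star in the cubic term inflates the flux by exactly 1/R^3; for R = 0.91 that is about 33%.

```python
C = 2.61        # assumed dimensionless constant (it cancels in the ratio below)
RHO_AIR = 1.23  # assumed air density, kg m-3
G = 9.81        # gravitational acceleration, m s-2


def threshold_factor(u_s_star, u_t):
    """(1 + u_t/u_s_star) * (1 - (u_t/u_s_star)**2), or 0 at or below threshold."""
    if u_s_star <= u_t:
        return 0.0
    r = u_t / u_s_star
    return (1.0 + r) * (1.0 - r * r)


def flux_total_ustar(u_star, R, u_t):
    """Eq. 2-like: cube of the TOTAL friction velocity, threshold tested on R * u_star."""
    return C * RHO_AIR / G * u_star ** 3 * threshold_factor(R * u_star, u_t)


def flux_soil_ustar(u_star, R, u_t):
    """Eq. 3-like: cube of the SOIL SURFACE friction velocity u_s_star = R * u_star."""
    u_s_star = R * u_star
    return C * RHO_AIR / G * u_s_star ** 3 * threshold_factor(u_s_star, u_t)


# hypothetical inputs: u_star in m s-1, fixed drag partition R, threshold u_t in m s-1
u_star, R, u_t = 0.6, 0.91, 0.25
ratio = flux_total_ustar(u_star, R, u_t) / flux_soil_ustar(u_star, R, u_t)
print(round(ratio, 3), round(1 / R ** 3, 3))  # prints 1.327 1.327
```

Because the threshold terms are identical in both functions, the ratio depends only on R, whatever values are chosen for C and the air density.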
b) a dynamical representation of surface roughness can also be used with TEMs using methods other than the proposed albedo-approach,
Traditional dust emission models (TEMs) include aerodynamic roughness lengths (z0), but those z0 values may not represent the heterogeneous vegetation that occurs within large pixels (up to 50 km). We accept that some TEMs allow z0 to vary in space and time due to snow, vegetation, etc. In other TEMs, the z0 of each surface type is held constant and uniform globally. The z0 value of bare ground never changes, but the ground may not be bare all year. Similarly, desert shrubland may not be bare everywhere within that land cover type. So, in many dust-prone parts of the world, especially those that are bare soil, the roughness length never changes in TEMs. We will revise our manuscript to make these perspectives clear.
c) the albedo-approach is also calibrated.
Agreed. We can find nothing in our manuscript that states otherwise.
The authors aim to proof the advantages of AEM using uneven comparisons with what they described as a TEM: AEM uses dynamical vegetation, while TEM does not; AEM uses an updated streamwise flux formulation, while TEM does not; AEM uses variable soil texture/clay content, according to the proposed conversion of streamwise saltation to vertical dust emission flux, while the TEM does not (about the latter, there is contradictory information in the manuscript).
As mentioned above, we are not evaluating the performance of the models per se. We are demonstrating that weaknesses in dust emission modelling have been hidden because dust emission models are evaluated against dust in the atmosphere. Given the role of mineral dust in the climate system, we see this work as an important contribution towards reducing uncertainties in Earth System Models (ESMs) and increasing the accuracy with which the models reflect past, current and future climate scenarios. Previous attempts to publish comparisons of the AEM with the TEM have been unsuccessful because reviewers complained that the TEM was not correctly represented when we used an 'even' comparison. Hence, here we implement the TEM following the scheme of Marticorena and Bergametti (1995), as described in the appendix of the manuscript.
To evaluate the calculated dust emissions, the authors use two observational data sets: MODIS DOD at a spatial resolution of 1 degree and MODIS-based DPS at a spatial resolution of 250 m for a region in the southwestern USA. Again, the conclusion of DOD not being a suitable reference is not supported by this comparison due to the substantial difference in spatial resolution (approximately 400-fold higher resolution for DPS compared to DOD) in the presented comparison,…
We can find no place in our manuscript where we conclude that DOD is not a suitable reference. Our manuscript demonstrates that dust emission modelling should be evaluated against satellite-observed dust emission point source (DPS) data and not against dust in the atmosphere. After we aggregate the DPS data to 1° pixels to address the incompatible spatial scales (Gotway and Young, 2002), there is no "substantial difference in spatial resolution" in our results. Our manuscript states clearly (Lines 167-172) that "Modelled (AEM and TEM) and observed frequencies are aggregated by a 1°x1° grid matrix, normalizing the results to the lowest resolution data (MODIS DOD) (Figure 1). For each grid box location, the observed frequency is calculated as the number of DPS observations per year during observation period (2001-2016). The AEM and TEM modelled dust emission frequency describes (F>0) at DPS locations in each grid cell per year during the same period. DOD modelled frequency describes DOD > 0.2 in each grid pixel per year for the same period."
…and due to the definition of DOD used in the study, which seems to be simply a gridpoint selection of AOD for DPS locations and likely includes aerosol other than dust as well.
We disagree with this statement and refer the reviewer to our manuscript (Lines 145-147), which states clearly that dust optical depth (DOD) is calculated following established criteria: "To understand the extent to which AOD estimates the spatial variation in dust emission magnitude and frequency we calculated the probability of dust occurrence modelled by the dust optical depth (DOD>0.2) using the criteria established previously (Ginoux et al., 2012)".
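The DOD-based frequency described in the quoted text (DOD > 0.2 in each grid pixel per year) can be sketched as follows. This assumes daily 1° DOD fields stacked into a NumPy array and treats missing retrievals (NaN) as non-events; both the array layout and the missing-data rule are assumptions for illustration, not a description of the manuscript's processing.

```python
import numpy as np


def dod_frequency_per_year(dod, n_years, threshold=0.2):
    """Mean number of days per year on which DOD exceeds `threshold` in each cell.

    dod: array of shape (n_days, n_lat, n_lon) with daily 1-degree DOD;
         NaN marks a missing retrieval and is counted as a non-event (assumption).
    """
    exceed = np.nan_to_num(dod, nan=0.0) > threshold  # strict ">" as in DOD > 0.2
    return exceed.sum(axis=0) / n_years


# hypothetical example: two days, one latitude row, two longitude cells, one year
dod = np.array([[[0.1, 0.3]],
                [[0.25, np.nan]]])
print(dod_frequency_per_year(dod, n_years=1))  # [[1. 1.]]
```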
Besides these previous aspects, the description of a TEM seems inaccurate and outdated at many places, and often not supported by recent references.
This comment does not explain where in the manuscript the description of a TEM seems inaccurate and outdated. Please provide additional clarification and we will happily respond in this discussion.
In our manuscript we refer to long-established dust emission model schemes which were developed more than two decades ago (Marticorena and Bergametti, 1995; Zender et al., 2003; Tegen et al., 2002). We also refer to very recent dust emission model results, including our own (Hennen et al., 2021).
In light of these aspects, I cannot recommend publication of this manuscript. Additional comments are added below.
We have responded to every comment provided by the reviewer and, using examples from the manuscript, have shown how each comment misrepresents the manuscript. Consequently, we find no basis for the reviewer's conclusion and do not agree with the reviewer's recommendation. We endeavoured to write our manuscript clearly but recognise that there may be room for improvement, and we are happy to revise it accordingly.
* L56-57 "total wind friction velocity ustar created by all scales of roughness" sounds like surface roughness were solely sufficient to describe ustar. Shouldn't the atmospheric flow be an even more important aspect?
Near the land surface, the total wind friction velocity (u_star) is influenced by the (freestream) wind speed of the inner boundary layer and by the aerodynamic roughness of the land surface. We use a scale-invariant calibration between shadow (1-albedo) and wind tunnel wind friction velocity (normalized by freestream wind speed) to calculate the soil surface wind friction velocity (u_s_star). This is the albedo-based approach; it avoids the need for aerodynamic roughness length and roughness height, properties which are unknown and are approximated by roughness factors attributed to land cover classes.
* L64 There is no "reanalysis model". Reanalyses are based on model runs and observations.
By 'reanalysis models' we are referring to reanalysis wind fields. We will clarify this in the revised manuscript.
For clarity, the lines referred to by the reviewer (Lines 67-76) are not a repeat of an already published paper. In describing this issue here, we demonstrate the significance of this incorrect formulation for dust emission modelling.
A reference to Kok et al. (2014), who have previously brought up this issue, should be added.
We could not find any passage in Kok et al. (2014) showing that this issue was raised earlier. If the reviewer could direct us to the relevant page or paragraph, we will happily respond in this discussion.
It is not clear why -if this problem has been identified -the authors are not simply using the updates expression for both estimates with AEM and TEM, as this issue has nothing to do with the proposed albedo-approach.

In a previous manuscript, we tried using the correct formulation in both the AEM and the TEM, and reviewers would not accept that this formulation correctly represented the TEM.
In contrast to the reviewer's perspective, we see the incorrect formulation of the sediment flux as central to the albedo-based approach. The soil surface wind friction velocity (u_s_star) is retrieved directly from a calibration with shadow (1-albedo) which avoids the need for aerodynamic roughness length and roughness height, properties of the heterogeneous land surface which are unknown and approximated by homogeneous roughness factors attributed to land cover classes.
If the authors wish to test the sensitivity to the streamwise saltation flux formulation, this test needs to be performed holding all other settings constant.
The manuscript does not intend to test the sensitivity to the streamwise saltation flux formulation. The manuscript demonstrates (Figure 2 and related text) the significance for dust emission models of using the incorrect formulation. As described in L80 of our manuscript, the incorrect formulation has been widely adopted in TEMs.
* L80 Why is it that the correct values of R are not known for every pixel and every time step? Parameterizations of drag partition also rely on satellite data as input, as does the presented albedo-approach. While each data set comes with uncertainties and missing data, I do not see a fundamental different in the knowledge obtainable about surface roughness from satellite data for use with one approach or another.
The reason why the correct values of the drag partition R are not known for every pixel and every time step is given in line 81: "…common approach to modelling dust emission in ESMs uses globally constant values of aerodynamic roughness length (z0), which are static over time and fixes R(z0) ≈ 0.91." In other words, if a land cover class does not change over time and an aerodynamic roughness length is attributed to it, then the drag partition is static over time and fixed across all pixels of that land cover class. In contrast, the albedo-based approach uses the shadow (1-albedo) of the land surface (e.g., from MODIS at 500 m), which may differ for every pixel, changes significantly every few weeks with vegetation phenology, changes abruptly with any rapid land cover change, and changes over long time periods with extrinsic factors such as global 'greening'. Consequently, dust emission calculations respond to these spatio-temporal changes in the land surface.
Thanks, we will attempt to do that in the revised manuscript. Of course, it is quite hard to do that for every point we might make, as that would require a list of differences and similarities between the different models. That difficulty is why in this manuscript we are attempting a more universal assessment using one of the common dust emission schemes (Marticorena and Bergametti, 1995), which underpins many of the dust emission models.
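To make the point about a static drag partition concrete, the sketch below uses a commonly cited form of the Marticorena and Bergametti (1995) drag partition, R = 1 - ln(z0/z0s) / ln(0.35 (10/z0s)^0.8) with lengths in cm, together with a hypothetical lookup table holding one z0 per land cover class. The z0 and z0s values and the function names are illustrative assumptions. Because z0 depends only on the class, R is identical for every pixel of that class on every day; the day argument has no effect.

```python
import math

Z0S_CM = 0.001  # hypothetical smooth (bare soil) roughness length, cm
# hypothetical lookup table: one aerodynamic roughness length per land cover class, cm
Z0_BY_CLASS_CM = {"barren": 0.002, "shrubland": 0.05}


def drag_partition(z0_cm, z0s_cm=Z0S_CM):
    """Commonly cited MB95-type ratio of soil surface to total friction velocity."""
    return 1.0 - math.log(z0_cm / z0s_cm) / math.log(0.35 * (10.0 / z0s_cm) ** 0.8)


def r_for_pixel(land_cover, day_of_year):
    """R attributed via land cover; day_of_year is accepted but cannot change the result."""
    return drag_partition(Z0_BY_CLASS_CM[land_cover])


for day in (15, 196, 349):
    print(day, round(r_for_pixel("barren", day), 3), round(r_for_pixel("shrubland", day), 3))
```

By contrast, a shadow-based estimate is recomputed from each new albedo retrieval, so it can differ between pixels and between dates even within one land cover class.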
* L82 What is meant by values of z0 are pre-tuned and tend to maximize dust emission? No reference is provided for this statement. Values in the aforementioned references were obtained based on satellite data and ground-based measurements, as the authors also mention in the following sentences.
Models which fix the aerodynamic roughness length (z0) across all barren regions need to set a value. That value is typically chosen to ensure that dust emission from North Africa is large (consistent with dust in the atmosphere above North Africa), because models tended to underestimate dust from the region because of the reduced wind speeds and the soil having about half the clay content. In the revised manuscript, we will include citations (Zender et al., 2003; Woodward, 2001; Tegen et al., 2002).
* L85 The use of preferential source areas, e.g. as in Ginoux et al. (2001), in some models has the purpose to generally specify soil erodibility and circumvent the need to prescribe detailed soil-surface properties. It is not specific to a description of surface roughness or z0.
Our sentence refers to the generic nature of preferential source areas in reducing the magnitude of dust emission. In the revised manuscript we will make this point clearer.
* L95 Again, the "correct" Equation (3) is not related to the use of albedo to describe roughness. Hence a comparison of the albedo-approach and Eq. (3) with another approach and Eq. (2) is inconsistent.
We have explained above that the correct formulation is directly related to the albedo-based approach, which controls the magnitude of sediment flux and hence dust emission.
* L101-102 Why would the albedo-approach be inconsistent with a grain-scale entrainment threshold? Is that because the albedo-derived u* is also resolutiondependent?
The albedo-based approach can use any source of shadow (1-albedo). In the manuscript we use shadow from MODIS at 500 m. The entrainment threshold is established from wind tunnel measurements and is effectively point-scale, which is incompatible with area estimates (Gotway and Young, 2002). The entrainment threshold needs to be developed like the albedo-based approach so that it can be upscaled from a point to an area (as described in the following paragraph).

Shadow (1-albedo) scales linearly over area, in contrast to the total wind friction velocity (u_star), which is non-linear (Raupach and Lu, 2004). Therefore, shadow from any source can be upscaled with sufficient measurements or estimates to any coarser pixel resolution. In this case, the shadow is then calibrated with the total wind friction velocity (u_star) to determine the area-weighted, coarse-resolution total wind friction velocity (u_star). This ability to cut across scales is one of the key benefits of the albedo-based approach (Chappell and Webb, 2016).
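The contrast between a quantity that averages linearly over area and one that does not can be illustrated with synthetic data. The sketch assumes equal-area fine pixels aggregated into non-overlapping blocks and uses the cube of a friction velocity (the nonlinearity in the flux equation) as the example of a non-linear step; all values are randomly generated and hypothetical. The block mean of shadow conserves the area-weighted mean, whereas cubing a block-averaged friction velocity underestimates the block mean of the cubes (Jensen's inequality).

```python
import numpy as np


def block_mean(fine, factor):
    """Area-weighted mean over non-overlapping factor x factor blocks of equal-area pixels."""
    ny, nx = fine.shape
    assert ny % factor == 0 and nx % factor == 0, "grid must divide evenly into blocks"
    return fine.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


rng = np.random.default_rng(1)
shadow = rng.uniform(0.70, 0.95, size=(8, 8))     # hypothetical fine-resolution shadow
u_s_star = rng.uniform(0.10, 0.50, size=(8, 8))   # hypothetical fine-resolution u_s_star, m s-1

coarse_shadow = block_mean(shadow, 4)             # linear: block means preserve the area mean
cube_of_mean = block_mean(u_s_star, 4) ** 3       # aggregate first, then apply the nonlinearity
mean_of_cube = block_mean(u_s_star ** 3, 4)       # apply the nonlinearity, then aggregate

print(np.isclose(coarse_shadow.mean(), shadow.mean()))   # True
print(np.all(mean_of_cube >= cube_of_mean))               # True
```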
* L116-117 If the calculation of dust emission flux from streamwise sediment flux depends on %clay, then why a fixed clay content is used in the TEM (L537), but a spatially variable clay content is used in the AEM (L586-587). This, again, is an uneven comparison, which is unjustified.

In a previous manuscript, we used variable clay content in both the AEM and the TEM. Reviewers would not accept that variable clay content was an accurate representation of the original TEM.
Then in L200, you claim that a soil clay content map was used with both models. In the appendix again, it is claimed that soil clay content was fixed in the TEM. Please explain.
A soil clay content map was used in both models. In the AEM, soil clay content varied over space (SoilGrids), whereas in the TEM the map used fixed values.
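How soil clay content enters the conversion from streamwise saltation flux to vertical dust flux can be sketched with the commonly cited Marticorena and Bergametti (1995) relation log10(F/G) = 0.134 (%clay) - 6, with F/G in cm-1 and reported for clay contents below about 20%. Whether the manuscript uses exactly this relation is not stated here, so treat it as an assumption; the flux and clay values are hypothetical. A map of fixed clay values gives the same conversion factor in every pixel, whereas a spatially varying map (as with SoilGrids in the AEM) does not.

```python
def sandblasting_efficiency(clay_percent):
    """Assumed MB95 ratio of vertical dust flux to streamwise flux, cm-1."""
    return 10.0 ** (0.134 * clay_percent - 6.0)


saltation_flux = 1.0                 # hypothetical streamwise flux G (arbitrary units)
fixed_clay = [10.0, 10.0, 10.0]      # hypothetical fixed-value map for three pixels
spatial_clay = [3.0, 8.0, 15.0]      # hypothetical spatially varying map (SoilGrids-like)

for label, clay_map in (("fixed", fixed_clay), ("spatial", spatial_clay)):
    dust = [saltation_flux * sandblasting_efficiency(c) for c in clay_map]
    print(label, ["%.2e" % f for f in dust])
```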
* L120-121 Marticorena and Bergametti (1995) use the adjustment of dust emission according to the bare soil fraction in addition to a drag partition scheme. In the implementation of a TEM in the present study, this adjustment is used alone. Why is no drag partition applied, in combination with dynamic surface roughness as for the AEM? This would provide much better insight into the performance of the albedo-roughness approach.
In previous manuscripts we tried to do exactly as the reviewer suggests here. However, those results were not accepted because previous reviewers did not consider that an accurate representation of the original TEM. Consequently, we implement the traditional dust emission scheme described in our manuscript Appendix.

* L130 Whether E includes brown vegetation depends on what data is used to define it.
Brown vegetation cannot be described using vegetation indices which are designed to measure 'greenness'. In our manuscript we refer to a common way of representing vegetation in dust emission models, using the Normalized Difference Vegetation Index (NDVI). Consequently, NDVI-based representations cannot capture brown, rough vegetation.
* L131 This sentence is lacking foundation and in my humble opinion also dispassion.
We think the reviewer is referring to the end of line 131, where the sentence begins "This crude model representation of process…". We think the statement is accurate, but we are happy to rephrase it to avoid causing offence. We are passionate about our science.
* L135 Not clear which pre-tuning is meant.
As described in our response to the comment on L82 above, models which fix the aerodynamic roughness length (z0) across all barren regions need to set a value. That value is typically chosen to ensure that dust emission from North Africa is large, consistent with dust in the atmosphere above North Africa. Models tended to underestimate dust from North Africa because of the reduced wind speeds and the soil having about half the clay content. In this respect some TEMs are pre-tuned before they are tuned to dust in the atmosphere.
* L140-142 The authors claim that using AOD for evaluation or calibration of a dust model includes the assumptions that 1. dust in the atmosphere represents the dust emission process, and that 2. the spatial variation of magnitude and frequency of modeled dust emission is correct. This is incorrect. First, for model evaluation/comparison, observed AOD (better DOD, dust optical depth) is compared with modeled AOD/DOD, and not with modeled emissions.
At these lines in our manuscript we describe the intrinsic comparison being made between observed AOD and modelled AOD/DOD: modelled dust emission is compared to dust in the atmosphere, which assumes that dust in the atmosphere represents the dust emission process.
Second, the goal of comparing modeled with observed atmospheric dust fields is to determine how well the observed fields can be reproduced with the model.
The problem with comparing dust emission to dust in the atmosphere is that any weaknesses in the dust emission model are overlooked in favour of reproducing dust in the atmosphere. That preference for dust in the atmosphere may appear adequate, but it assumes that the model produces the correct magnitude and frequency of dust emission, as we described (line 141). We explained in previous comments above how an aerodynamic roughness length fixed over space and static over time has been used to maximise dust emission in North Africa to match dust in the atmosphere. "However, we know a priori that dust in the atmosphere is only partially related to dust emission because dust concentration is controlled by dust emission magnitude and frequency which varies over space and time, by residence time of dust near the surface which itself is dependent on wind speed, and on dust deposition in the dust source region, a size dependent process." (Lines 142-145).
Indirectly, but not unambiguously, this also sheds light on modeled dust emissions. No direct observations of dust emission or surface dust concentration are available on a global scale; hence this indirect evaluation is made.

Direct satellite observations of dust emission point sources (DPS) are available; we have shown this in our recent publication (Hennen et al., 2021) and used these data in this manuscript.
If emissions were assumed to be correct, model evaluation would only test dust transport and deposition processes and their parameterizations, which is not the case.

We are not sure what point is being made here.
* L150 Do I understand right that DOD was obtained from AOD by selecting pixels which coincided with a DPS? Over North America, I expect that even at DPS locations, this DOD contains a significant contribution of other aerosol, leading to a larger value and therefore higher frequency of DOD > 0.2 than from dust alone.

DOD uses a specific Deep Blue algorithm, which uses single-scattering albedo, AOD and the reflectance properties of specific spectral bands (412, 470 and 670 nm) to differentiate the mixing ratio of dust and various other atmospheric aerosols (e.g., smoke) (Hsu et al., 2013).
However, as DOD is a measurement of atmospheric dust, it is entirely possible that DOD frequencies are affected by dust transported from upwind dust sources. This is another reason for evaluating dust emission models against satellite-observed dust emission point source (DPS) data.
* L155 I would see it the other way round: The correct probability of occurrence of (any) sediment flux depends on the correct (magnitude and) frequency of dust emission.
We think that this difference in perspective amounts to the same outcome.
* L160 How are these assumptions circumvented? Do you mean that you evaluate the frequency of emission instead of the magnitude?

Since the 1960s (Wolman and Miller, 1960) it has been established that the magnitude of sediment flux (in fluvial and aeolian systems) is controlled by the fluid friction velocity. That magnitude is adjusted by the probability of occurrence which depends on whether the entrainment threshold is exceeded.
A key and long-standing limitation of dust emission models is that they assume an infinite supply of loose, dry, erodible and available sediment. Consequently, whenever and wherever the entrainment threshold is exceeded, sediment flux and dust emission occur. Hence, the probability of occurrence is central to calculating the correct dust emission. In the absence of any way to constrain this assumption, we circumvent it by using satellite-observed dust emission point source (DPS) data, which describe when and where dust emission occurs without the need for the assumption. We can explain this more fully in the revised manuscript.
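The effect of the infinite-supply assumption on the probability of occurrence can be illustrated with a hypothetical ten-day series at a single DPS location. Under that assumption, every day on which the soil surface friction velocity exceeds the entrainment threshold counts as an emission day, whereas the DPS record flags only days on which emission was actually observed. The series, the threshold and the observations below are invented for illustration.

```python
import numpy as np

# hypothetical daily soil surface friction velocity at one DPS location, m s-1
u_s_star = np.array([0.18, 0.31, 0.27, 0.12, 0.35, 0.29, 0.22, 0.40, 0.15, 0.33])
u_t = 0.25  # hypothetical entrainment threshold, m s-1
# hypothetical DPS record: True on days when emission was observed at this location
observed = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0], dtype=bool)

modelled = u_s_star > u_t                  # infinite supply: every exceedance is an event
unsupported = modelled & ~observed         # exceedance days with no observed emission

print(modelled.mean(), observed.mean(), int(unsupported.sum()))  # 0.6 0.3 3
```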
* L168 How is the aggregation performed? Are the frequencies aggregated or the original data? What is meant by "normalizing the results to the lowest resolution data"?
The DPS data are aggregated per grid box, per day. In each grid box, if any of the identified dust sources records an event, either modelled or observed, then that grid box is described as active for that day (frequency of occurrence, FoO = 1). Each grid box has a maximum FoO of 1 per day; i.e., if more than one DPS event occurs within a single grid box on a single day, the grid box FoO is normalized to 1. This is the case for both observed and modelled frequencies.
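The per-day normalization described above can be sketched as follows. The sketch assumes each event (an observed DPS, or a modelled F > 0 at a DPS location) is given as a latitude, longitude and day index, and that the 1° grid starts at a chosen south-west corner; these inputs and the function name are hypothetical. Several events in one grid box on one day still count once.

```python
import numpy as np


def yearly_frequency(lat, lon, day, n_days, n_years, lat0, lon0, n_lat, n_lon):
    """Events per 1-degree grid box per year, counting at most one event per box per day.

    lat, lon: event coordinates in degrees; day: integer day index (0 .. n_days-1);
    lat0, lon0: south-west corner of the grid (hypothetical choice).
    """
    row = np.floor(np.asarray(lat) - lat0).astype(int)
    col = np.floor(np.asarray(lon) - lon0).astype(int)
    active = np.zeros((n_days, n_lat, n_lon), dtype=bool)
    active[np.asarray(day), row, col] = True  # repeated events on one box-day stay True (FoO = 1)
    return active.sum(axis=0) / n_years


# hypothetical events: the first two share a grid box and a day, so they count once
lat = [31.2, 31.8, 31.5, 32.1]
lon = [-110.5, -110.1, -110.7, -110.2]
day = [0, 0, 1, 0]
freq = yearly_frequency(lat, lon, day, n_days=2, n_years=1,
                        lat0=31.0, lon0=-111.0, n_lat=2, n_lon=1)
print(freq.ravel())  # [2. 1.]
```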
* L169 Please provide more detail about how the DPS observations have been obtained.

The DPS data are available from previously published studies, which provide detail on how they were obtained; our recently published paper cited in this manuscript describes how the DPS observations were obtained. In any case, we can provide additional limited information in the main text of the revised manuscript or more fully in an extension to the manuscript Appendix.
* L197 Do I understand correctly that the us*/u10 from albedo is completely decoupled from the atmospheric model and that, to calculate dust emission, you calculate us* by multiplying us*/u10 with the ERA5 winds, which correspond to a totally different modeled u* calculated with a different roughness? If so, I very much wonder about consistency of the obtained us* with both, the original albedo-approach and the atmospheric model. If the us*/u10 does not need ancillary data to calculate dust emissions, then -if used in an ESM -I wonder about consistency with the model winds responsible for dust transport after emission.
Modelled wind fields are themselves decoupled from the actual land surface, which varies in space and time. For example, we use here ERA5-Land data, which we have recently shown to be one of the wind field products that most closely represents wind speed measurements (Fan et al., 2021). ERA5-Land takes planetary boundary layer winds down to a blending height of 40 m before using roughness factors (not aerodynamic roughness length) for each land cover type to extrapolate wind to 10 m height. We use shadow to produce a calibrated soil surface wind friction velocity normalized by wind speed (u_s_star/Uh), which we multiply by ERA5-Land wind speed (at 10 m height) to retrieve the soil surface wind friction velocity. There are many weaknesses and inconsistencies in dust emission modelling, including combining incompatible spatial scales of wind, roughness and entrainment threshold, and assuming an infinite supply of sediment. First, we need to ensure that these weaknesses are evident by evaluating dust emission model simulations with satellite-observed dust emission point source (DPS) data.
It is also mentioned that the albedo-based us*/u10 from polar-orbiting MODIS data has incomplete coverage. I believe you argued earlier that for this same reason, the R ratio for use in TEMs cannot be estimated accurately. So this applies also to the albedo-based us*/u10?
We think the reviewer is referring to Line 80: "The substantive issues for dust emission modelling are that the incomplete form of QTEM (Eq. 2) has been widely adopted in TEMs in which large area estimates of wind speed are typically used, the correct values of R are not known (for every pixel and every time step) …" As we described previously, the correct values of R, varying over space and time, are not known because aerodynamic roughness length is attributed to land cover, which is treated as homogeneous over space and fixed over time.
* L214-215 If I understand well, wind speeds are obtained from ERA5 and in the AEM combined with the albedo-based us*/u10 ratio. Do I also understand well, that the AEM obtains one us*/u10 value each day from which dust emission is estimated, combined with the hourly ERA5 input? How are the different temporal resolutions treated?

In the manuscript (Lines 322-323) we describe how the albedo-based dust emission model (AEM) uses the daily maximum of all hourly wind speed data from the ERA5-Land data.
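The combination of a daily albedo-derived ratio with hourly reanalysis wind can be sketched as below. It assumes the ratio u_s_star/U has already been obtained from the shadow calibration (not reproduced here) and that hourly 10 m wind speeds are arranged as one row per day; the arrays and the function name are hypothetical. The daily maximum of the hourly wind speed is multiplied by that day's ratio.

```python
import numpy as np


def daily_soil_surface_ustar(ratio_daily, wind_hourly):
    """Daily u_s_star = (albedo-derived u_s_star / U) * daily maximum hourly wind speed.

    ratio_daily: shape (n_days,), dimensionless ratio for each day.
    wind_hourly: shape (n_days, 24), hourly 10 m wind speed in m s-1.
    """
    ratio_daily = np.asarray(ratio_daily, dtype=float)
    daily_max = np.asarray(wind_hourly, dtype=float).max(axis=1)  # one wind value per day
    return ratio_daily * daily_max


# hypothetical example: two days with the same hourly wind pattern but different roughness
wind = np.tile(np.linspace(2.0, 9.0, 24), (2, 1))  # peaks at 9 m s-1 each day
print(daily_soil_surface_ustar([0.02, 0.03], wind))
```

Gaps in the daily ratio (for example, missing MODIS retrievals) are not handled in this sketch.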
What is the motivation to select a narrow wind speed range between 8.5 and 9.5 m/s?
To emphasise the difference between the modelling approaches, we selected that narrow range of wind speeds (8.5-9.5 m/s). We identified this range in the experiment (symbols in Fig. 2a) in which the soil surface wind friction velocity was varied to show the response of dust emission. The albedo-based dust emission model (AEM) varied along the same curve and produced a large difference for the changed roughness (for the same wind speed). A representation of the traditional dust emission model (TEM) varied along different curves and produced approximately the same dust emission regardless of the changed roughness (for the same wind speed). In other words, fixing the aerodynamic roughness length over space and holding it static over time (and hence making R constant) causes dust emission to occur regardless of roughness whenever the wind is sufficient to overcome the entrainment threshold. When roughness varies realistically over space and time, it attenuates the wind speed and dust emission responds accordingly. This model behaviour is illustrated spatially in the map in Figure 5.
In short, much less adjustment of the magnitude, frequency and spatial extent of dust emission is required if the soil surface wind friction velocity is allowed to vary over space and time.
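The contrast described above can be reproduced qualitatively with a relative threshold flux at a single wind speed inside the 8.5-9.5 m/s range. The flux form, the threshold, the ratios between 0.02 and 0.04, and the fixed ratio standing in for a constant R are all hypothetical. In the AEM-like case roughness changes u_s_star/U and the flux responds; in the TEM-like case the ratio is held constant, so the flux is the same whatever the roughness.

```python
def relative_flux(u_s_star, u_t):
    """Assumed threshold form u_s_star**3 (1 + r)(1 - r**2), r = u_t/u_s_star; 0 below threshold."""
    if u_s_star <= u_t:
        return 0.0
    r = u_t / u_s_star
    return u_s_star ** 3 * (1.0 + r) * (1.0 - r * r)


WIND = 9.0           # m s-1, inside the 8.5-9.5 m/s range
U_T = 0.2            # hypothetical entrainment threshold, m s-1
FIXED_RATIO = 0.035  # hypothetical constant u_s_star/U standing in for a fixed R

ratios = [0.02, 0.025, 0.03, 0.035, 0.04]   # roughness-dependent u_s_star/U (AEM-like)
aem = [relative_flux(r * WIND, U_T) for r in ratios]
tem = [relative_flux(FIXED_RATIO * WIND, U_T) for _ in ratios]  # roughness change ignored

for r, a, t in zip(ratios, aem, tem):
    print(f"{r:.3f}  AEM-like {a:.5f}  TEM-like {t:.5f}")
```

The first ratio gives a soil surface friction velocity below the threshold, so the AEM-like flux is zero there while the TEM-like flux is unchanged.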
* L224-232 This paragraph seems redundant as there is no noteworthy drag partition used in the TEM as described here, hence (vegetation) roughness is not considered, but only surface coverage. The same applies to the discussion in L254-256.
We stated in that section of text that the reason for the comparison with a restricted drag partition is that this is how some traditional dust emission models have been implemented (as also described in the Appendix). We recognise that some dust emission models use a geometric relation between leaf area index (or a vegetation index) and the drag partition. Whilst this approach may describe changes in the drag partition due to changed planform geometry, it does not change with wind speed. Consequently, that geometric drag partition is not aerodynamic (Raupach, 1992; Raupach et al., 1993). As we have shown here (Fig. 2 and Fig. 4), the consequence for dust emission models of using this approach is that dust emission is larger and more extensive than when aerodynamics are included.
* L238-239 I agree that the interplay of friction velocity and roughness is critical for dust emission, but this can be easily implemented also for the TEM. One option is described in the appendix, but not used.
We agree that changes can be made to the TEMs. We think one of the main challenges for dust emission modelling is to recognize the need for change in the TEMs. We think that the main mechanism for identifying the need for change is the routine evaluation of dust emission modelling with satellite observed dust emission point source (DPS) data.
* Fig. 2 What does "changed" mean in the axis titles? Does this refer to a difference or normalization? I also assume that uf should be u10 in the x-axis.
The normalized soil surface wind friction velocity is the soil surface wind friction velocity divided by the wind speed, as shown by the term in brackets (u_s_star / Uf). In this case, Uf is correct since this is the calibrated value and applicable to any freestream (f) height which is unaffected by the land surface roughness. The word 'changed' indicates how much dust emission changes when the normalized soil surface wind friction velocity changes within the known range (0-0.04, dimensionless).
* L295 I am impressed that despite the severe limitations in the presented TEM implementation, it gives a similar (actually higher) R^2 than the AEM when compared with DPS.
The larger correlation coefficient between the TEM and the satellite-observed dust emission point source (DPS) data is created by the 'pre-tuning' of the aerodynamic roughness length used to fix the drag partition. If that value were perturbed, the correlation would change. Furthermore, the correlation is computed against the available DPS data, which occur predominantly in relatively sparsely vegetated conditions. The TEM is well suited to those conditions given its 'pre-tuned' value for barren land.
* L316 The pattern similarity between TEM and u10 is most likely a direct result of how u* was calculated and of not including roughness in the TEM implementation.
Yes, that is exactly our point. Recall from the descriptions above that even when the drag partition of a pixel is approximated by the geometric relation with the leaf area index (or a vegetation index), it does not depend on wind speed and is therefore not aerodynamic. Consequently, it will respond in the same way as a fixed roughness unless the geometric relation causes a change.
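The following minimal sketch shows why a TEM with a fixed drag partition tracks the wind field. It assumes the White (1979) streamwise flux form used by Marticorena and Bergametti (1995), Q = c (rho/g) u_s*^3 (1 + u*ts/u_s*)(1 - u*ts^2/u_s*^2). The constant c = 2.61, the air density and the threshold value are illustrative assumptions. R = 0.91 is the fixed value quoted in the manuscript. Because R never changes, Q is a fixed cubic-type function of u*, and therefore of U10.

```python
RHO_AIR = 1.227  # air density, kg m-3 (assumed)
G = 9.81         # gravitational acceleration, m s-2
C_WHITE = 2.61   # dimensionless flux constant (assumed)
R_FIXED = 0.91   # drag partition fixed over space and time


def horizontal_flux(u_star: float, u_star_ts: float, r: float = R_FIXED) -> float:
    """Streamwise saltation flux (kg m-1 s-1) for total friction velocity u* (m s-1)."""
    us = r * u_star  # soil surface friction velocity after the fixed drag partition
    if us <= u_star_ts:
        return 0.0  # below the entrainment threshold: no saltation
    ratio = u_star_ts / us
    return C_WHITE * RHO_AIR / G * us ** 3 * (1.0 + ratio) * (1.0 - ratio ** 2)


# With R fixed, the flux varies only with u* (and hence U10); roughness plays no role.
for u_star in (0.2, 0.4, 0.8):
    print(u_star, horizontal_flux(u_star, u_star_ts=0.25))
```

Well above the threshold, doubling u* multiplies the flux by roughly eight. This is why the u*^3 dependence amplifies any bias in the modelled wind speed.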
* L322 Do the daily maxima used in both models refer to wind speed data?
Yes, thanks for the query. We will include that clarification in the revised manuscript.
* L355 Please provide a reference for this statement.
This is our perspective, which is why there is no citation. We will clarify that in the revised manuscript.

Thanks for the suggestion. We will include the RMSE values for each of the relations in Fig. 4.
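RMSE complements R^2 because a model can be perfectly correlated with DPS data and still overestimate emission frequency. The minimal sketch below computes both metrics from plain Python lists. The frequencies are hypothetical values, not manuscript data.

```python
import math
from typing import Sequence


def rmse(model: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between modelled and observed values."""
    if len(model) != len(obs) or len(obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def r_squared(model: Sequence[float], obs: Sequence[float]) -> float:
    """Square of the Pearson correlation coefficient between model and observations."""
    n = len(obs)
    if len(model) != n or n < 2:
        raise ValueError("need at least two paired values")
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_m == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov * cov / (var_m * var_o)


# Hypothetical per-pixel emission frequencies (fraction of days); not real data.
dps_freq = [0.01, 0.03, 0.05, 0.10]
model_freq = [2.0 * f + 0.05 for f in dps_freq]  # perfectly correlated but biased high

print(r_squared(model_freq, dps_freq), rmse(model_freq, dps_freq))
```

In this example R^2 is 1 while the RMSE is larger than every DPS frequency, so reporting both separates pattern agreement from magnitude agreement.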
* L362-364 I do not see how the use of a fixed z0 can be called a tuning of the TEM. Most importantly, the calculation of us*/u10 is also calibrated/tuned, so there is no difference between AEM and TEM in that regard.
We have explained in our responses (above) that the value of z0 may be set to produce a sufficiently large amount of dust emission to match that of measured dust in the atmosphere in North Africa. This is done before the dust model is calibrated.
The shadow (1-albedo) retrieved from MODIS is calibrated to the soil surface wind friction velocity. It is not calibrated to produce a particular magnitude or frequency of dust emission.
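The minimal sketch below illustrates that the calibration maps shadow to u_s*/U_f and contains no term for emission magnitude or frequency. It assumes the exponential form u_s*/U_f = 0.0311 exp(-omega_ns^1.131 / 0.016) + 0.007. These coefficients are as we recall them from Chappell and Webb (2016) and should be verified before use. The function names are hypothetical. The output stays within the 0-0.04 range of the normalized soil surface wind friction velocity.

```python
import math

# Coefficients of the shadow calibration, as recalled from Chappell and Webb (2016);
# treat them as assumptions to verify against the original paper.
A, B, C, P = 0.0311, 0.016, 0.007, 1.131


def normalized_us_star(omega_ns: float) -> float:
    """u_s*/U_f as a function of normalized shadow omega_ns (1-albedo, normalized)."""
    if omega_ns < 0:
        raise ValueError("normalized shadow must be non-negative")
    return A * math.exp(-(omega_ns ** P) / B) + C


def us_star(u_f: float, omega_ns: float) -> float:
    """Soil surface wind friction velocity (m s-1) from freestream wind speed U_f (m s-1)."""
    return u_f * normalized_us_star(omega_ns)


# More shadow (rougher surface) gives a smaller fraction of U_f at the soil surface.
for omega in (0.0, 0.02, 0.05, 0.1):
    print(omega, round(normalized_us_star(omega), 4))
```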
* Table 1 The left and center columns contain a large amount of overlap. Conceptually, it is not clear to me why u*ts at the grain scale should be inconsistent with the albedo approach unless us* is not correctly retrieved. This may be related to resolution as indicated by the authors, a problem similar to the model calculation of u*.
We have explained above, in response to a similar query, that the entrainment threshold is defined at the grain scale because it is derived from wind tunnel measurements. This grain scale does not represent an area and is small enough to be treated as a point scale. Reflectance, by contrast, is measured over an area. Consequently, calibrating shadow (1-albedo) to wind tunnel measurements of soil surface wind friction velocity provides an area-weighted estimate. Because albedo scales linearly, this estimate overcomes the non-linearity of scaling the wind friction velocity. As the albedo-based dust emission model (AEM) currently stands, the point-scale entrainment threshold is inconsistent with the new area-weighted albedo-based wind friction velocity, as described in Table 1.
At the same time, the albedo approach is claimed to be scale-invariant, which appears contradictory.
We have described in the original derivation of the albedo-based approach the potential for combining incompatible spatial scales (Gotway and Young, 2002;Chappell and Webb, 2016). We have also recently demonstrated how the albedo-based approach is scale invariant using ground-based albedometers and MODIS albedo .
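The scaling problem for wind friction velocity can be shown with a minimal sketch. It assumes the neutral log-law u* = kappa U(z) / ln(z/z0) with kappa = 0.4 and two hypothetical equal-area sub-pixel patches. The sketch does not reproduce the AEM. It only shows that averaging the roughness length first and then computing u* differs from averaging the patch values of u*.

```python
import math

KAPPA = 0.4  # von Karman constant (commonly taken as 0.4)


def u_star_neutral(u_z: float, z: float, z0: float) -> float:
    """Friction velocity from the neutral log-law, u* = kappa * U(z) / ln(z / z0)."""
    return KAPPA * u_z / math.log(z / z0)


# Two hypothetical sub-pixel patches of equal area with contrasting roughness (m).
z0_patches = [1e-4, 1e-1]
u10, z_ref = 10.0, 10.0

# Area-weighted mean of the patch friction velocities ...
mean_of_u_star = sum(u_star_neutral(u10, z_ref, z0) for z0 in z0_patches) / len(z0_patches)
# ... is not the same as u* computed from the area-averaged roughness length.
u_star_of_mean_z0 = u_star_neutral(u10, z_ref, sum(z0_patches) / len(z0_patches))
print(round(mean_of_u_star, 3), round(u_star_of_mean_z0, 3))
```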
The authors also suggest that modeled u10 may be too large. While this can be the case, models typically underestimate strong winds, in particular with decreasing resolution.
Figure 4 demonstrates that the wind friction velocity exceeds the entrainment threshold too often. If the wind friction velocity is well constrained by the albedo-based (calibrated) approach, these excess occurrences are caused by wind speeds that are too large, an entrainment threshold that is too small, or both. We agree with the reviewer that model wind fields (e.g., ERA5-Land at 11 km) may underestimate fine-scale wind gusts (e.g., low-level jets). Since the wind friction velocity can be calculated at any area-weighted resolution, we can test this and other scale inconsistencies. However, the community first needs to recognize that these weaknesses and inconsistencies in dust emission modelling have endured because dust emission models are not being compared against DPS data.
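The sensitivity of threshold exceedance to wind speed can be sketched as follows. The sketch assumes hypothetical daily maximum wind speeds, a fixed normalized ratio u_s*/U = 0.03 and a threshold of 0.25 m s-1, all chosen for illustration. Scaling the winds up by 20% raises the exceedance frequency from 3 of 5 days to 4 of 5 days. Lowering the threshold has a similar effect.

```python
from typing import Sequence


def exceedance_frequency(
    winds: Sequence[float], ratio: float, threshold: float, wind_scale: float = 1.0
) -> float:
    """Fraction of time steps when u_s* = wind_scale * U * ratio exceeds the threshold."""
    if len(winds) == 0:
        raise ValueError("need at least one wind speed")
    hits = sum(1 for u in winds if wind_scale * u * ratio > threshold)
    return hits / len(winds)


daily_max_u10 = [5.0, 8.0, 10.0, 12.0, 15.0]  # hypothetical daily maxima (m s-1)
for scale in (1.0, 1.2):
    print(scale, exceedance_frequency(daily_max_u10, ratio=0.03, threshold=0.25, wind_scale=scale))
```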

The comparison in our manuscript between dust emission model and satellite observed dust emission point source (DPS) data
In line 4 of the table, it is noted that DPS may not include all dust emissions. Shouldn't this also be a reason why the modeled dust emission frequency is higher?
In the fifth row of Table 1 we describe how some dust emission may not be captured by the satellite observations. This could explain why the modelled dust emission frequency is larger than that of the DPS data. However, we think the first-order explanation lies in the scale difference and the assumed infinite supply of sediment.
Finally, in the last line of the table, the calibration (tuning) of the albedo approach is questioned. Unfortunately, the assessment of this issue in the center column is not clear. It is also not clear why research on the applicability of the approach for a range of conditions is of low priority. This should be the first priority when proposing a new parameterization.
We think that using shadow to establish the calibration with wind friction velocity provides the first-order explanation of variability in dust emission modelling. There is some uncertainty in the function fitted to that calibration (cf. RMSE in Chappell and Webb, 2016). However, consistent with our previous point, we think the first-order improvement to dust emission models lies in addressing the scale difference and the assumed infinite supply of sediment. We have therefore given these points low priority in the table. In the revised manuscript we will make this reasoning clearer.
* L393 It seems that the authors consider the dependence of dust emissions in the TEM on u10 (or better u*) negative, and the uncertainty arising from it more problematic than the fact that the albedo-based us*/u10 ratio was calibrated only against a data set covering a very limited range of conditions.
The sentence before line 393 provides the context for our description: "These contrasting estimates emphasise TEM dependency on variability in U10, due to the use of u*^3 and the inability of R(z0)=0.91 fixed over space and time to correctly attenuate wind speeds by aerodynamic roughness". A dust emission model depends on the wind friction velocity. The wind friction velocity in turn must depend on both wind speed and aerodynamic roughness, to a greater or lesser extent controlled by extrinsic factors. We calibrated shadow (1-albedo) to wind friction velocity using the best available data (Marshall, 1971), which have been examined and tested extensively in the aeolian research community (Raupach et al., 1993). These data cover the entire range of conditions under which wind friction velocity causes dust emission. We are happy to discuss this further if the reviewer would like to clarify what they mean by "very limited".
* L431 While the works from Marticorena and Bergametti (1995) and Shao et al. (1996) have certainly been major advances, there have been many additional advances since. Generally, and in contrast to what is described in this paragraph, the importance of vegetation dynamics for dust emission has been well recognized for a long time and has been implemented in several global models/ESMs, also for climate simulations (see also previous comments).
The manuscript text at lines 431-433 is not about vegetation dynamics: "There is also a great risk that the major scientific advances made in developing dust emission schemes (Marticorena and Bergametti, 1995; Shao et al., 1996) and newly developed data / parameterizations (Prigent et al., 2012) are being overlooked by an over-reliance on simplistic assumptions about dust source location and erodibility to implement dust emission models".
There is no mention of vegetation dynamics in that section of the text because our broad point is that major developments in dust emission modelling are being overlooked when dust emission models are compared with dust in the atmosphere.
* L480 I presume you are considering a neutral, and not stable, wind profile.
Consistent with other dust emission models, we assume neutral buoyancy and a well-mixed boundary layer. In the revised manuscript we will improve this description.
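Under the neutral assumption, the wind profile is logarithmic and contains no stability correction. A minimal sketch of the standard relation U(z) = (u*/kappa) ln(z/z0) and its inversion follows. It takes kappa = 0.4, and the roughness length used in the example is illustrative only.

```python
import math

KAPPA = 0.4  # von Karman constant (commonly taken as 0.4)


def wind_at_height(u_star: float, z: float, z0: float) -> float:
    """Neutral logarithmic wind profile U(z) = (u*/kappa) ln(z/z0); no stability term."""
    if z <= z0:
        raise ValueError("height must exceed the roughness length")
    return u_star / KAPPA * math.log(z / z0)


def friction_velocity(u_z: float, z: float, z0: float) -> float:
    """Invert the neutral profile for u* given the wind speed u_z at height z."""
    if z <= z0:
        raise ValueError("height must exceed the roughness length")
    return KAPPA * u_z / math.log(z / z0)


# Illustrative values: 10 m s-1 at 10 m over a surface with z0 = 1 mm.
u_star = friction_velocity(10.0, 10.0, 0.001)
print(round(u_star, 4), wind_at_height(u_star, 10.0, 0.001))
```

A stable profile would need an additional stability function in the logarithm term, which this sketch deliberately omits.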
* L490 Please include reference for u*ts.
Thanks. It is encompassed by the earlier citation (Marticorena and Bergametti, 1995). In the revised manuscript we will repeat that citation at the description of the entrainment threshold.
* L516 Please include reference for R.
Thanks, we will include the omitted citation in the revised manuscript.
* L517 Input data to calculate R are not absent. You have already referenced, for example, Prigent et al. (2012).
The sentence states "…the absence of regional and global spatio-temporal dynamics of R…." The data cited by the reviewer do not vary over time (Prigent et al., 2012; Prigent et al., 2005).

* L527 Which previous work?
Thanks for the query. In the revised manuscript we will list again all the citations included in the earlier sentences.
* L530 Which challenge to estimate R are you referring to?
Thanks for the query. In the revised manuscript we will replace the phrase "a challenge" with "challenging" to improve the clarity of the sentence.