Reply on RC4

The authors thank RC4 for his positive evaluation of the manuscript and for his insightful remarks.

The general purpose of spin-up phases is clear, although I thought about differences regarding their importance between meteorology models and externally driven LPDMs.
Could you please comment on how you chose the spin-up phase of 3 days in this specific situation? Have you looked internally for any features/differences of the models during the spin-up evolution?
The main objective of the spin-up is to ensure that ash from the most recent phase of the eruption is present in the domain. Initial simulations starting on 13 May showed that starting from zero concentrations degraded the scores. From 9 to 13 May, the emission was rather low and constant. Considering the size of the domain and the intensity of ash emission, 3 days is a reasonable time frame to allow realistic background ash concentrations to build up in the model domain. WRF-Chem used a longer spin-up (4 to 13 May, i.e., a 9-day spin-up). Differences between WRF-Chem and the other models were not attributed to the different spin-up lengths. A justification of the spin-up length has been added to the manuscript.
The a posteriori source term was generated by inversion of satellite observations using FLEXPART, at least partially driven also by ECMWF analyses. The real plume somehow connected the satellite observations and the other measurements used in the present study. The ATM is connected by at least one similar setup, whereas the a posteriori source term seems to me rather free and independent. Could you briefly explain why assuming the "validity" of the a posteriori source term is justified in light of this potential model self-consistency issue?
This comment raises many points. We presume that the two main aspects to consider are: 1/ potential self-consistency between FLEXPART as used for the source-term inversion and as a model in the study, and 2/ potential self-consistency between the ash load used for the source-term inversion and for verification.
1/ We do not see systematically better performance of the FLEXPART a posteriori run compared with the other a posteriori runs, so there is no clear advantage from having used FLEXPART for the inversion. As noted in the manuscript, sharing the source term computed from one model with the others provides good results.
2/ The ash load measurements used for the inversion are based on IASI and SEVIRI. The retrievals (Stohl et al., 2011) use a look-up table approach in combination with a correction for atmospheric water vapor (based on Yu et al., 2011) and a prior detection scheme using threshold tests for the brightness temperature difference between 10.9 μm and 12 μm and an opacity test. The reference ash load data used in the manuscript, VACOS, is the only algorithm based on neural networks combined with simulated ash observations. VACOS uses all TIR channels (6.2 μm, 7.3 μm, 8.7 μm, 9.7 μm, 10.8 μm, 12 μm, 13.4 μm), compared with only 10.9 μm and 12 μm in Stohl et al. (2011). On top of these, there are many other scientific differences between the retrieval algorithms (training data, assumptions about ash properties, etc.) that make the VACOS retrievals independent from the data used for the source-term inversion.
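For readers unfamiliar with the threshold test on the brightness temperature difference mentioned above, a minimal sketch is given here. It only illustrates the general split-window idea (ash tends to produce a negative difference between the 10.9 μm and 12 μm brightness temperatures). The function name, the threshold of -0.5 K and the sample values are hypothetical; they are not taken from Stohl et al. (2011), and the water vapor correction and opacity test are not included.

```python
def btd_ash_flag(bt_10_9, bt_12_0, threshold=-0.5):
    """Flag pixels as possible ash using a brightness temperature difference test.

    bt_10_9, bt_12_0: brightness temperatures (K) at 10.9 um and 12 um.
    threshold: hypothetical cut-off in K; a pixel is flagged when
               BT(10.9 um) - BT(12 um) falls below it.
    """
    return [(a - b) < threshold for a, b in zip(bt_10_9, bt_12_0)]


# Illustrative (made-up) brightness temperatures for three pixels
bt_10_9 = [265.0, 270.0, 280.0]
bt_12_0 = [267.0, 270.2, 278.5]

flags = btd_ash_flag(bt_10_9, bt_12_0)
print(flags)  # differences are about -2.0 K, -0.2 K and +1.5 K
```

Only the first pixel is flagged in this example, because its difference lies below the assumed threshold. An operational detection scheme would tune the threshold and combine it with further tests.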

L38 although it is of course true that longer routes have enhanced environmental and climate impact, I cannot imagine that this consideration plays any significant role in the airline's decision on ad hoc rerouting. This is probably purely about safety / economic rationale (including maintenance costs and passenger rights compensations).
The climate-related argument has been removed.
L52 reconsider the expression "perfect models": that they cannot be reached in the "near future" is not only highly probable, it is systematically certain. Perhaps use "with sufficient accuracy" or a similar expression for reliable correctness / precise plume representation.
The part of the sentence now states: "[…], it is highly probable that predictions with sufficient accuracy cannot be reached in a near future."

L103 ECMWF's vertical resolution of lower sigma-hybrid levels is surface pressure dependent, insert e.g. approximately/about/roughly
Due to the simplification of the description of models (see answer to RC2), this has been removed from the manuscript.