Reply on RC1

C1: Since it is a forward uncertainty analysis the results depend completely on the prior assumptions made by the authors, given very limited information about prior eruptions, in terms of type, magnitude and impacts. While the authors give some background to their choice of prior distributions the subjectivity inherent in such choices is rather glossed over. Indeed, it is in some cases presented in a rather pseudo-scientific way to appear more persuasive – for example the mention of the Akaike information criterion to choose between distributional forms in representing the distribution of total erupted volumes when the primary influence has to be subjective choice of probabilities for the different types of eruption and their range of magnitudes.

It is therefore somewhat ironic that the authors suggest that their analysis is unbiased (L353). This is clearly not the case, it is biased by all their prior assumptions. The results are the logical consequence of those assumptions (given enough samples) so it is not clear what they mean by unbiased (probably best to avoid it).
We thank the reviewer for this comment that gives us the opportunity to make some basic points of our work clearer.
Firstly, we did not mean at all to be "persuasive" (not to mention "pseudoscientific", which these days might indeed be taken as offensive when used by a scientist to describe other scientists' work). Rather, we used the Akaike Information Criterion for exactly its intended purpose, that is, to select objectively, among six different possible statistical models (the 6 we used are among the most common ones to describe a continuous random variable), the one reaching the best balance between model simplicity and goodness-of-fit to the available data.
That said, it is true that we may have overstated our case by saying our analysis is unbiased. Following the reviewer's comment, we are going to soften that part of the text, acknowledging that there is a degree of subjectivity in the selection of the types of PDFs used to model the eruptive size (we used 6 different ones as mentioned above, and chose the best among those by AIC) and those used to describe the variability in Eruption Source Parameters (ESPs).
In any case, in the manuscript we are going to keep the message that our analysis is "less biased" than what is commonly done in other hazard assessments in volcanology (and specifically tephra dispersal hazard assessment), where usually a specific sharp scenario is assumed (no uncertainty on the ESPs, so-called One Eruption Scenario in Bonadonna, 2006) or, less frequently, only the variability within a specific scenario is explored (Eruption Range Scenario in Bonadonna, 2006). The latter case is exactly the one used in the paper by Prata et al (2019), pointed out by the reviewer: in that study, the authors assumed "The eruption plume height was set to 15,000 masl with a duration of 16 hr to ensure significant ash dispersal at cruise altitude (approximately 10 km)" and then explored 600 simulations by sampling the variability with a uniform distribution ("All model parameters considered in the LHS analysis were sampled from a uniform distribution. By sampling from a uniform distribution, it is assumed that all values between their specified ranges are equally likely.", sic). Further, the ranges over which the uniform distributions were sampled were based on the available literature ("The ranges selected for each ESP were made based on typical ranges reported in the literature", sic).
In this view, we believe that our approach is "similarly biased" to the type of work by Prata et al (2019) by assuming ranges on the ESPs from the literature, and "less biased" than that in the sense that we fully explore the eruptive size variability (by simulating all the scenarios producing tephra, not only a specific one), and take into account their relative frequency as it appears from the geological and historical record at Jan Mayen.
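To make the AIC-based selection mentioned above concrete, a minimal sketch follows. It is not the procedure or the data used in the manuscript: it compares only two candidate models with closed-form maximum-likelihood estimates (exponential and lognormal, chosen here for brevity rather than the six candidates we considered), and the "volumes" are synthetic values drawn from a seeded random generator purely for illustration. All function and variable names are hypothetical.

```python
import math
import random


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L_max)."""
    return 2 * n_params - 2 * log_likelihood


def exponential_loglik(xs):
    """Maximised log-likelihood of an exponential model (1 parameter)."""
    n = len(xs)
    rate = n / sum(xs)  # MLE of the rate is 1 / sample mean
    return n * math.log(rate) - rate * sum(xs)


def lognormal_loglik(xs):
    """Maximised log-likelihood of a lognormal model (2 parameters)."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n                          # MLE of the log-mean
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE of the log-variance
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n


# Synthetic, illustrative sample only (NOT Jan Mayen data).
random.seed(0)
volumes = [random.lognormvariate(-2.0, 1.0) for _ in range(50)]

scores = {
    "exponential": aic(exponential_loglik(volumes), 1),
    "lognormal": aic(lognormal_loglik(volumes), 2),
}
best = min(scores, key=scores.get)  # lowest AIC is preferred
print(scores, "->", best)
```

The criterion ranks only the candidates it is given, so the choice of the candidate set itself remains a modelling decision, which is the degree of subjectivity acknowledged above.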
C2: The other major conceptual issue with the paper is the value of such a forward uncertainty analysis to inferences about impacts on air traffic. The results presented in the paper are integrated over the whole distribution of events simulated. They therefore clearly do not apply to any single event, and will not apply to the next event that might actually cause disruption. So certainly the areas indicated as at risk in the maps presented could be affected, and the maps could be used to guide an initial response of areas to avoid lacking further information. But application of such an approach to the next event would require Bayesian updating given information about that event (see e.g. Harvey and Dacre, 2016). Initially this might perhaps be to identify a sub-set of "similar" simulations but later perhaps using more explicit data assimilation, and about the specific meteorological conditions that might be quite different to those associated with a "similar" event run with reanalysis data, particularly in respect of wind fields relative to the integral of reanalysis sampling. I think this should be discussed further in the paper.
We thank the reviewer for this comment, which is very important to us because it shows that the main purpose of our analysis is not conveyed clearly enough to the reader. We take the opportunity to clarify the general approach adopted in volcanology regarding hazard assessment, and the text in the manuscript will be changed accordingly.
In volcanology, one of the main challenges is precisely to know what the next event will be and, in the case of poorly characterized volcanoes such as Jan Mayen, it is very difficult to estimate the absolute probability associated with each potential scenario. It is therefore difficult for us to understand what the reviewer means by "will not apply to the next event". As explained in the text, our goal is to produce a long-term hazard assessment based on the information available from the past eruptive history of the volcano. Such a product is surely useful for long-term planning.
When a volcano begins to show signs of unrest, we might ask ourselves "what will be the impact of the next eruption?", and additional data (mainly from monitoring) would indicate the eruption to be expected. For example, the location of the potential vent, the volume of intruded magma, and the presence of water bodies in the area of intrusion might suggest which eruptive scenario is more likely than others. When an eruption eventually starts, the available data and the direct observations of the ongoing phenomena will be the primary input to dispersal models to answer a question like "what will be the impact of the current eruption, given the current meteorological conditions and the observed plume height?". In this sense, the assessment based on the past eruptive record is what we call long-term hazard assessment, the assessment during unrest is a short-term hazard assessment, and the assessment during an ongoing eruption eventually becomes a more deterministic forecasting approach.
As we mentioned above, this paper aims to perform a long-term hazard assessment for Jan Mayen volcano (as stated at lines 6-8 of the Abstract), which is currently in a repose phase but is considered active, so that a future eruption cannot be excluded. For this reason, in our study the dispersal model has been applied using primarily information on what this volcano featured in the past (in terms of size of eruption, volumes, plume heights, grain size distribution), as no further constraints can be placed on the next eruptive scenario at this stage. The results presented here are indicative of the potential impact of an eruption at Jan Mayen in light of what is known about its previous eruptive activity. The maps produced could be used for land-use planning, definition of mitigation actions, and identification of vulnerable infrastructure, but not for emergency response, which is out of the scope of our work. Whenever an eruption starts at Jan Mayen (as at any other volcano), direct observations and current meteorological data will be used, as well as data assimilation procedures and probabilistic forecasting products.
We hope we have clarified this point, and we will revise the text accordingly. In addition, to avoid misunderstandings, we decided to change the title of the manuscript to: "Long-term hazard assessment of explosive eruptions at Jan Mayen Island (Norway) and implications on air-traffic in the North Atlantic"
C3: There is one technical point that I did not follow. On L377 it is stated that "The probability of each combination is weighted in accord with the associated magnitude". It is not clear why this weighting should be applied. You have already sampled from the distribution of magnitudes which gives the (assumed) probability of such an event occurring, so why weight by magnitude?
We would like to thank the reviewer for this comment. When sampling from a fixed probability distribution of magnitudes, the combinations are already inherently weighted by the probability assumed for the associated magnitude. This sentence has been removed in the new version of the manuscript, since in section 3.2, line 140, we describe the sampling process and the combination of parameters clearly. In addition, we have modified the manuscript (line 361) and emphasized that: "The total erupted volume, expressed as DRE is computed uniformly within a range of values. In a second step, we weight each total eruption volume based on the Weibull distribution function previously defined (Figure 2). Doing that the unlikely events are properly represented."
Some minor points
C4: Section 3.4 Location differences - is this a matter of resolution and limitations of the reanalysis rather than lack of real differences (what does such a wind profile really mean at reanalysis scales??).
We thank the reviewer for this comment. For the sake of clarity, we have modified the last sentence of this section. In the new version, we highlight that, given the current limitations in the resolution of both the grid and the meteorological data, the wind-profile analysis carried out over the different points of Jan Mayen island (NE-SW) does not show significant differences. Therefore, we conclude that the location of the vent is a marginal parameter in our dispersion problem, since it will not affect the final results. However, the location of the vent may be a high-impact parameter in other types of analysis. The sentence has been corrected as: "Considering the current limitation of both grid resolution and meteorological data resolution, the location of potential JM vents does not influence the ash dispersal pattern. As a result, we will not consider the uncertainty on the vent location and assume a fixed vent at the middle of the island."

C5: L85. Delete "occurred"
We thank the reviewer for this suggestion. The correction has been made.
C6: L228. "Predictions made without uncertainty quantification (UQ) are usually not trustworthy and inaccurate". I think this could be better worded. Normally for a forward uncertainty analysis the "best guess" prediction without uncertainty estimation would be within the modal range of the uncertainty analysis. Thus if it is not trustworthy and inaccurate so too are the equivalent ensemble members close to it … which is not really what you meant.
We agree with the reviewer: "inaccurate" is not the correct word. This sentence has been corrected as: "Predictions made without uncertainty quantification (UQ) do not quantify how constrained the prediction is, whereas ensembles members gives such an idea."
C7: L276. "This is due to the fact that the height of the eruptive column for medium eruptive class eruptions does not exceed 11 km (see section 3.1)." This is a clear example of how inference depends on prior assumptions in a forward uncertainty analysis
We thank the reviewer for this comment that gives us the opportunity to make some basic points of our work clearer. Our work addresses a comprehensive long-term Probabilistic Volcanic Hazard Assessment (PVHA) focused on the potential impact of airborne tephra concentration at different flight levels. To do that, we propose two distinct types of eruptions depending on their magnitude, Medium and Large. Medium-sized eruptions have been defined as those producing volcanic plumes reaching up to 11 km in height. Therefore, the analysis and the conclusions obtained are closely related to the characteristics of the modeled eruptive scenarios. However, the dispersion and concentration of tephra associated with plumes above 11 km are also addressed, within the large eruption scenarios, and, unlike the reviewer, we do not see any bias in that (columns higher than 11 km belong to the Large Magnitude scenario).
C8: L389. "This result, that we quantify at each point of the target domain, allows integrating hazard in quantitative risk analysis, through fragility curves. In this view, it represents the most complete way to quantify hazard." This could be discussed later but is not really relevant here since you do not apply any such fragility curves (or mention evaluating their uncertainties at additional computational expense….)
The reviewer is correct here, as we mention the possibility of extending the use of our results towards a risk evaluation without really doing it, since it is beyond the scope of this manuscript. The sentence has been changed and now reads: "This result, that we calculate at each point of the target domain, could be eventually used as input for risk analysis like for producing fragility curves, tolerance analysis and in general investigation of impact on infrastructure. In this view, it represents the most complete way to quantify hazard."
C9: L330. "Finally we want to highlight the robustness of our PVHA in terms of uncertainty quantification, that should be routinely considered in all this kind of studies". What do you mean by robust exactly? Why should your subjective prior assumptions be considered robust?
We thank the reviewer, as this comment gives us the opportunity to clarify what we mean by robustness. As argued in our reply to C1 on the unbiased/less biased issue, our analysis is less biased than what is commonly done in other hazard assessments in volcanology, where usually a specific sharp scenario is assumed or only the variability within a specific scenario is addressed. In contrast, we fully explore the eruptive size variability (by simulating all the scenarios producing tephra, not only a specific one) and take into account their relative frequency as it appears from the geological and historical record. As a result, we consider a large number of potential scenarios while quantifying the uncertainty associated with the ESPs. In this sense, we think that our method is robust in terms of uncertainty quantification (UQ), and that this kind of UQ should be routinely considered in all studies of this type.