Articles | Volume 26, issue 3
https://doi.org/10.5194/nhess-26-1603-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Predicting thunderstorm risk probability at very short time range using deep learning
Download
- Final revised paper (published on 31 Mar 2026)
- Preprint (discussion started on 23 Jul 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-2893', Anonymous Referee #1, 02 Oct 2025
- AC1: 'Reply on RC1', Mélanie Bosc, 06 Nov 2025
- RC2: 'Comment on egusphere-2025-2893', Anonymous Referee #2, 19 Nov 2025
- AC2: 'Reply on RC2', Mélanie Bosc, 02 Dec 2025
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Reconsider after major revisions (further review by editor and referees) (05 Dec 2025) by Ricardo Trigo
AR by Mélanie Bosc on behalf of the Authors (17 Dec 2025)
- Author's response
- Author's tracked changes
- Manuscript
ED: Referee Nomination & Report Request started (02 Jan 2026) by Ricardo Trigo
RR by Anonymous Referee #1 (02 Jan 2026)
RR by Anonymous Referee #2 (03 Feb 2026)
ED: Publish as is (17 Feb 2026) by Ricardo Trigo
AR by Mélanie Bosc on behalf of the Authors (20 Feb 2026)
Review of “Predicting thunderstorm risk probability at very short time range using deep learning”
The preprint proposes a deep learning methodology for very short-term (5-60 minutes) probabilistic forecasting of lightning risk, motivated by aviation safety within the ALBATROS project. It adapts the ED-DRAP neural network, incorporating spatio-temporal sequences from satellite (GOES-16 ABI brightness temperature and GLM lightning groups) and NWP (GFS lifted index and relative humidity) data over a region centered on the Gulf of Mexico and Florida. A key focus is on achieving well-calibrated outputs through a combined cross-entropy and Dice loss function, enabling interpretable risk probability maps. Results report F1 scores of 0.65 at 5 minutes and 0.5 at 30 minutes, with ECE below 10%. In general, the manuscript is well-structured, the approach is innovative in emphasizing calibration for probabilistic lightning nowcasting without radar data, and the topic is highly relevant for natural hazards research, particularly in aviation and thunderstorm impacts. However, I have some concerns regarding the scope, comparisons, and generalizability.
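To make the calibration discussion concrete, the two ingredients highlighted above (a combined cross-entropy and Dice loss, and the expected calibration error) can be sketched in a few lines of numpy. This is an illustrative sketch only: the function names, the equal weighting `alpha=0.5`, and the binning choices are my assumptions, not the authors' exact formulation.

```python
import numpy as np

def bce_dice_loss(y_true, y_prob, alpha=0.5, eps=1e-7):
    """Combined binary cross-entropy + Dice loss (illustrative sketch).

    y_true: binary ground-truth mask, y_prob: predicted probabilities.
    `alpha` weights the two terms; the paper's exact weighting may differ.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    bce = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    intersection = np.sum(y_true * y_prob)
    dice = 1.0 - (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_prob) + eps)
    return alpha * bce + (1 - alpha) * dice

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Standard ECE: bin-count-weighted mean |accuracy - confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece
```

The Dice term counteracts the class imbalance inherent to lightning masks (few positive pixels), while the cross-entropy term keeps the outputs probabilistically meaningful, which is what the ECE then measures.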
My main concern is the limited scope and potentially limited generalizability of the dataset and results. The data are restricted to winter mornings (00:00-05:00 UTC, December-February) from 2020-2023, covering only 154 days with a balanced split of stormy and non-stormy periods. While this controls for variability, it may not capture seasonal, diurnal, or regional differences in thunderstorm dynamics (e.g., summer afternoons or other global hotspots). The study area is also narrowed to a subset of CONUS, and no sensitivity analysis is provided for other regions. A discussion of how these choices affect broader applicability, perhaps with preliminary tests on extended data, would strengthen the contribution.
My second concern is the benchmarking and novelty assessment. The model is compared to ConvLSTM, PredRNN, persistence, and U-Net, showing superior F1 and calibration scores. However, direct comparisons to recent lightning-specific DL models from the literature are missing, such as those in Brodehl et al. (2022), Geng et al. (2021), or Leinonen et al. (2023), which also use satellite/radar data for nowcasting. While the intentional exclusion of radar data is well-justified for enhancing applicability to aircraft flight paths where radar coverage may be limited or absent, discussing how the proposed method might compare to radar-inclusive baselines would better contextualize its advantages and limitations.
Other comments
L90-95: Clarify why the smaller area (red rectangle in Fig. 1) was chosen beyond computational cost; does it represent typical thunderstorm regimes?
Fig. 2: Add coordinate axes (latitude/longitude) to subfigure (b) to match (a) for consistency and better spatial context.
L164-165: The effective training/testing area is further cropped to 256×256 pixels (17.3°N–37.7°N, 93°W–72°W) from the subselected red rectangle; consider adding this cropped boundary as an inner rectangle in Fig. 1 for clarity.
L175-180: The choice of a 6-timestep input sequence is justified by a comparative study, but I suggest including a table or figure summarizing F1 scores for the 2/4/6/8-timestep configurations to support this.
L305-310: The example in Fig. 9 misses only 5 % of the lightning activity, but it is not clear which probability threshold was used in this case.