the Creative Commons Attribution 4.0 License.
Application of machine learning to forecast agricultural drought impacts for large scale sub-seasonal drought monitoring in Brazil
Abstract. Drought events have increased in frequency and severity in recent years, and result in significant economic losses. Although the Brazilian semi-arid northeast has been historically associated with the impacts of drought, drought is of national concern: from 2011 to 2019, drought events were recorded in all Brazilian territories. Droughts can have major consequences for agricultural production, which is of particular concern given the importance of soybeans for socio-economic development. Given Brazil's regional heterogeneity, it is important to develop accurate drought forecast and assessment tools for the country. We explore machine learning as a method to forecast the vegetation health index (VHI) for large scale monthly drought monitoring across agricultural land in Brazil. We also determine spatio-temporal drivers of VHI across the wide variation in climates, and evaluate machine learning performance under ENSO variation, in forecasting the onset of drought impact, and with respect to how the trade-off between spatial variation and sample size affects model performance. We show that machine learning methods such as gradient boosting are able to forecast vegetation health more easily in the north and northeast of Brazil than in the south, and perform better during La Niña events than El Niño events. Drought impacts which reduce VHI below the commonly used 40 % threshold can be forecast across Brazil with similar model performance. SPEI is shown to be a useful indicator of drought impact, with 3-month accumulation periods preferred over 1 and 2 months. Results aim to inform future developments in operational drought monitoring at the National Center for Monitoring and Early Warning of Natural Disasters in Brazil (CEMADEN). Future work should build upon methods discussed here to improve drought forecasts for agricultural drought response and adaptation.
Status: final response (author comments only)
RC1: 'Comment on nhess-2024-60', Anonymous Referee #1, 30 May 2024
General comments
The article tests multiple machine learning methods for the prediction of VHI (agricultural drought) in Brazil on land cultivated with maize and soybean. As predictors it uses climate variables, soil moisture, and drought indices such as SPI and SPEI at multiple time scales. The article lacks substantial and appropriate references on the subject. It does not state a clear definition of the objectives, which would give the manuscript a better structure. As a result, the storyline was hard to follow and understand. Finally, the manuscript looks careless, with multiple typographical errors.
TITLE.
The word "impact" in the title leads me to think of the consequences of drought (socio-economic, human, etc.). The title should be improved.
INTRODUCTION
The introduction is not easy to follow; it doesn't have a clear scientific meaning, has vague sentences, and does not have a proper order of ideas. It needs some improvements to make it scientifically sound. For example, the paragraph talking about drought impacts on Brazil should be moved up before the paragraph talking about drought monitoring (L35). Also, it should be presented with some numeric figures of the real impact rather than solely indicate where it has impacted. Further, the paragraph describing previous works using machine learning lacks robustness; it should not only describe what types of methods have been used but also describe what results these works have had. The definition of the objectives is vague; the authors should go directly to the scientific aims of the work rather than deviate toward the potential benefits of the results or come back to defining the importance of the indices. Defining two or three clear objectives that will lead the work is preferred.
METHODS
A better description of, and justification for, the use of the crop grid dataset (Tang et al., 2023) is needed, rather than just the reference.
In the sentence (L130), it says, "... NDVI which is in turn converted to TCI, VCI, and the vegetation health index (VHI)," which implies that the TCI is derived from NDVI when in fact it is derived from the thermal bands.
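For reference, the distinction raised here can be sketched as follows. This is a minimal illustration of the commonly used formulation, in which VCI is scaled from NDVI, TCI is scaled from thermal-band brightness temperature, and VHI is a weighted combination of the two. It is not the manuscript's processing chain: the function names, the equal weighting `alpha=0.5`, and the per-pixel minimum and maximum arguments are assumptions made for illustration.

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation condition index (0-100), scaled from NDVI."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt, bt_min, bt_max):
    """Temperature condition index (0-100), scaled from brightness
    temperature (thermal bands), not from NDVI. Hotter means lower TCI."""
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation health index as a weighted mean of VCI and TCI.
    alpha=0.5 is an assumed, commonly used default weighting."""
    return alpha * vci_value + (1.0 - alpha) * tci_value


def is_drought(vhi_value, threshold=40.0):
    """Flag drought impact using the 40 % VHI threshold cited in the abstract."""
    return vhi_value < threshold
```

The long-term minimum and maximum values would normally be computed per pixel and per calendar period from the historical record; here they are simply passed in.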
Please add a table for a better comprehension of all the satellite data used in this work and its characteristics.
There is too much description of data that were used but not calculated in this work. It would be better to shorten this and focus on what was done in this work, i.e., forecasting by machine learning. The acronym for the method used is not defined here.
In the evaluation section, the authors describe classification methods for the onset of drought; this method should have been described earlier in the manuscript. The section on evaluation should have only the methodology for evaluation (e.g., metrics of performance).
I think it is important to somehow analyze the spatial cross-validation. But, as it is currently written, it is not clear to me what the purpose of the spatial clustering analysis is. It needs further clarification regarding its link with the rest of the methodology. Also, authors should not cite figures from results in the methods section. Does the clustering determine how the data are split (training and testing), and does it affect both the regression (forecasting) and the classification (onset of drought)?
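To make the question concrete, one generic way a spatial grouping can drive the train/test split is sketched below. This is an assumption-laden illustration, not the authors' clustering method: it groups samples into coarse latitude/longitude blocks (the 5-degree `block_deg` and both function names are hypothetical) and holds out one whole block at a time, so that test samples are never in the same block as training samples.

```python
from collections import defaultdict


def block_id(lat, lon, block_deg=5.0):
    """Map a coordinate to a coarse grid block (hypothetical 5-degree blocks)."""
    return (int(lat // block_deg), int(lon // block_deg))


def leave_one_block_out(coords, block_deg=5.0):
    """Yield (held_out_block, train_idx, test_idx), holding out whole blocks."""
    blocks = defaultdict(list)
    for i, (lat, lon) in enumerate(coords):
        blocks[block_id(lat, lon, block_deg)].append(i)
    for held_out in sorted(blocks):
        test_idx = blocks[held_out]
        train_idx = sorted(
            i for b, idx in blocks.items() if b != held_out for i in idx
        )
        yield held_out, train_idx, test_idx
```

Under such a scheme the same folds could be reused for both the regression and the classification task, which is one way the link the referee asks about might be made explicit.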
RESULTS
There is an excessive number of figures that can be reduced. For example, Figs. 14, 15, and 16 could be reduced to one and perhaps a table summarizing the results.
The study area includes crops of soybean and maize; it would be good to know how the ML model performs per crop type.
The quality of Fig. 17 is poor.
CONCLUSION
The conclusion is too general and does not specifically state what the main results are. What variables were the best predictors for VHI? What machine-learning methods achieve the best performance? What is the main contribution to the drought research in the article?
Citation: https://doi.org/10.5194/nhess-2024-60-RC1
RC2: 'Comment on nhess-2024-60', Anonymous Referee #2, 26 Jun 2024
The authors explored machine learning as a method to forecast the vegetation health index (VHI), for large scale monthly drought monitoring across agricultural land in Brazil. Though the work is largely well-written, the article still needs significant improvement in the introduction, results and discussion, and conclusion sections. Considering my observations as follows, I suggest major revisions before considering it for publication.
- There are too many paragraphs in the introduction. I suggest to rewrite the introduction with 3 paragraphs, highlighting the basic content of the research field in the first paragraph. Then review the research progress of the literature in the second paragraph, and in the third paragraph, analyze the limitations of past research and clarify the innovation of your own research.
- The importance of ML compared to other methods such as statistical, probabilistic, and time series modeling for drought monitoring and drought forecasting is missing in the introduction section. I would suggest to add this in the introduction section.
- The description under section 2 (page 3; lines 85, 90, and 95) is not very important. Please delete these lines.
- Rewrite the study area section, highlighting the key geographic features, climate, and physiography of the study area. Please omit the first three lines of that section.
- The authors used 1-, 2- and 3-month SPI. Why did they not use SPI-6? SPI-6 indicates the seasonality of agricultural drought.
- Why was only precipitation used as a predictor variable? Was it average or total precipitation? Using precipitation alone makes little sense, since the authors also use the SPI and SPEI, which are precipitation-based drought indices. I would therefore suggest adding the precipitation anomaly index (PAI) as a predictor variable instead of precipitation alone.
- For the machine learning models, how much data was used for training, validation, and testing? That is, how were the models built and calibrated? The most important parameters and the choice of their values are not explained sufficiently; much more explanation is needed.
- What does “SEA AV” stand for? What kind of model is it? What is the utility of the “SEA AV” model?
- Page-16 (line 300): Please close the first bracket for “(Figure 7”
- The author discussed only the forecasting performance of various machine learning models. But I did not see any forecasted results of VHI by the machine learning model, which performed better compared to other models. It is very important to add results of forecasted VHI by the best machine learning model.
- Conclusion can be improved by highlighting the innovation content of the paper, future research direction, and recommendation for policy formulation.
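As background to the question above about 1-, 2- and 3-month versus 6-month SPI, the accumulation period refers to the trailing window over which monthly precipitation is summed before standardization. The sketch below shows only that accumulation step; a full SPI additionally fits a distribution (commonly a gamma distribution) to the accumulated totals and transforms them to standard normal scores, which is omitted here. The function name and the use of `None` for months without a complete window are illustrative assumptions.

```python
def accumulate(monthly_precip, window):
    """Trailing sums over `window` months.

    Returns None for months that do not yet have a full window of data.
    """
    out = []
    for i in range(len(monthly_precip)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(monthly_precip[i + 1 - window : i + 1]))
    return out
```

Calling `accumulate(series, 3)` and `accumulate(series, 6)` on the same monthly series gives the inputs that 3-month and 6-month indices would be built from.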
Citation: https://doi.org/10.5194/nhess-2024-60-RC2