the Creative Commons Attribution 4.0 License.
Application of machine learning to forecast agricultural drought impacts for large scale sub-seasonal drought monitoring in Brazil
Abstract. Drought events have increased in frequency and severity in recent years and result in significant economic losses. Although the Brazilian semi-arid northeast has historically been associated with the impacts of drought, drought is of national concern: from 2011 to 2019, drought events were recorded in all Brazilian territories. Droughts can have major consequences for agricultural production, which is of particular concern given the importance of soybeans for socio-economic development. Given Brazil's regional heterogeneity, it is important to develop accurate drought forecast and assessment tools for the country. We explore machine learning as a method to forecast the vegetation health index (VHI) for large scale monthly drought monitoring across agricultural land in Brazil. We also determine spatio-temporal drivers of VHI across the wide variation in climates, and evaluate machine learning performance under ENSO variation, forecasting of the onset of drought impact, and how the trade-off between spatial variation and sample size affects model performance. We show that machine learning methods such as gradient boosting are able to forecast vegetation health more easily in the north and northeast of Brazil than in the south, and perform better during La Niña events than El Niño events. Drought impacts that reduce VHI below the commonly used 40 % threshold can be forecast across Brazil with similar model performance. SPEI is shown to be a useful indicator of drought impact, with 3-month accumulation periods preferred over 1 and 2 months. The results aim to inform future developments in operational drought monitoring at the National Center for Monitoring and Early Warning of Natural Disasters in Brazil (CEMADEN). Future work should build upon the methods discussed here to improve drought forecasts for agricultural drought response and adaptation.
Status: final response (author comments only)
RC1: 'Comment on nhess-2024-60', Anonymous Referee #1, 30 May 2024
General comments.-
The article tests multiple machine learning methods for the prediction of VHI (agricultural drought) in Brazil on land cultivated by maize and soybean. It uses as predictors climate variables, soil moisture, and drought indices such as SPI and SPEI at multiple time scales. The article lacks the use of substantial and proper references about the matter. The article does not state a clear definition of the objectives, which could lead to a better structure for the manuscript. Due to this, it was hard to follow and understand the storyline. Finally, the manuscript looks careless, with multiple typographical errors.
TITLE.
The word "impact" in the title induces me to think about the consequences of drought (socio-economic, human, etc.). The title should be improved.
INTRODUCTION
The introduction is not easy to follow; it doesn't have a clear scientific meaning, has vague sentences, and does not have a proper order of ideas. It needs some improvements to make it scientifically sound. For example, the paragraph talking about drought impacts on Brazil should be moved up before the paragraph talking about drought monitoring (L35). Also, it should be presented with some numeric figures of the real impact rather than solely indicate where it has impacted. Further, the paragraph describing previous works using machine learning lacks robustness; it should not only describe what types of methods have been used but also describe what results these works have had. The definition of the objectives is vague; the authors should go directly to the scientific aims of the work rather than deviate toward the potential benefits of the results or come back to defining the importance of the indices. Defining two or three clear objectives that will lead the work is preferred.
METHODS
It is needed to provide a better description and justification for the use of the crop grid dataset (Tang et. al, 2023) instead of just presenting the reference.
In the sentence (L130), it says, "... NDVI which is in turn converted to TCI, VCI, and the vegetation health index (VHI)," which implies that the TCI is derived from NDVI when in fact it is derived from the thermal bands.
Please add a table for a better comprehension of all the satellite data used in this work and its characteristics.
There is too much description of the data used, but it was not calculated in this work. I believe it should be good to reduce this and focus on what was made in this work, i.e., forecasting by machine learning. The acronym for the method used is not described here.
In the evaluation section, the authors describe classification methods for the onset of drought; this method should have been described earlier in the manuscript. The section on evaluation should have only the methodology for evaluation (e.g., metrics of performance).
I think it is important to somehow analyze the spatial cross-validation. But, as it is currently written, it is not clear to me what the purpose of the spatial clustering analysis is. It needs further clarification regarding its link with the rest of the methodology. Also, authors should not cite figures from results in the methods section. The cluster allows for the splitting of the data (training and testing), and it affects the regression (forecasting) and classification (onset of drought)?
RESULTS
There is an excessive number of figures that can be reduced. For example, Figs. 14, 15, and 16 could be reduced to one and perhaps a table summarizing the results.
The study area includes crops of soybean and maize; it would be good to know how the ML model performs per crop type.
The quality of Fig. 17 is poor.
CONCLUSION
The conclusion is too general and does not specifically state what the main results are. What variables were the best predictors for VHI? What machine-learning methods achieve the best performance? What is the main contribution to the drought research in the article?
Citation: https://doi.org/10.5194/nhess-2024-60-RC1
AC2: 'Reply on RC1', Joseph Gallear, 26 Jul 2024
Thanks for the comments; we will use your insights to improve the manuscript. Please see below for a list of ways in which we will address your comments.
General comments
“It uses as predictors climate variables, soil moisture, and drought indices such as SPI and SPEI at multiple time scales. The article lacks the use of substantial and proper references about the matter.”
We are incorporating a broader literature review into the introduction to improve the use of references in the paper.
“The article does not state a clear definition of the objectives, which could lead to a better structure for the manuscript. Due to this, it was hard to follow and understand the storyline.”
We are including a new objectives section with 4 bullet point objectives to make the aims of the work clearer. The aims are then referenced throughout the manuscript and in the conclusions.
Title
“The word "impact" in the title induces me to think about the consequences of drought (socio-economic, human, etc.). The title should be improved.”
The title is being changed to emphasise the forecasting of agricultural drought rather than using the word impact. The new proposed title is “Evaluation of machine learning approaches for large scale agricultural drought forecasts to improve monitoring and preparedness in Brazil”
Introduction
“The introduction is not easy to follow; it doesn't have a clear scientific meaning, has vague sentences, and does not have a proper order of ideas. It needs some improvements to make it scientifically sound. For example, the paragraph talking about drought impacts on Brazil should be moved up before the paragraph talking about drought monitoring (L35).”
We are restructuring the introduction to improve the flow and readability, as well as improving the breadth of references.
“Also, it should be presented with some numeric figures of the real impact rather than solely indicate where it has impacted.”
We are including more numeric figures in the introduction to show worldwide drought impacts and the importance of drought in the context of different natural hazards.
“The paragraph describing previous works using machine learning lacks robustness; it should not only describe what types of methods have been used but also describe what results these works have had.”
This paragraph is being expanded to discuss results of several studies as well as mentioning the methods used.
“The definition of the objectives is vague; the authors should go directly to the scientific aims of the work rather than deviate toward the potential benefits of the results or come back to defining the importance of the indices. Defining two or three clear objectives that will lead the work is preferred.”
We are including a new objectives section with 4 bullet point objectives to make the aims of the work clearer.
Methods
“It is needed to provide a better description and justification for the use of the crop grid dataset (Tang et. al, 2023) instead of just presenting the reference.”
The crop grids dataset was the most up-to-date dataset found at the time of writing that included substantial coverage of maize and soybean growing areas across Brazil. We are adding this justification to the methods section.
“In the sentence (L130), it says, "... NDVI which is in turn converted to TCI, VCI, and the vegetation health index (VHI)," which implies that the TCI is derived from NDVI when in fact it is derived from the thermal bands.”
We are rewording the sentence on L130 to clear up any misunderstanding about how TCI, VCI and VHI are calculated.
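For readers unfamiliar with the indices, the relationship the referee points to can be sketched as follows. This is a minimal illustration using the commonly cited formulations (VCI from NDVI, TCI from thermal-band brightness temperature, VHI as a weighted mean of the two); the equal weighting `alpha=0.5`, the function names and the pixel values are assumptions for illustration and are not taken from the manuscript.

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation condition index (0-100) from NDVI and its long-term extremes."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt, bt_min, bt_max):
    """Temperature condition index (0-100) from brightness temperature.

    Hotter-than-usual conditions give a lower score, hence (bt_max - bt).
    """
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation health index as a weighted mean of VCI and TCI.

    alpha=0.5 is an assumed default weighting, not the manuscript's stated value.
    """
    return alpha * vci_value + (1.0 - alpha) * tci_value


# Hypothetical values for one pixel and one month, for illustration only.
v = vci(ndvi=0.375, ndvi_min=0.25, ndvi_max=0.75)
t = tci(bt=300.0, bt_min=290.0, bt_max=310.0)
h = vhi(v, t)
print(v, t, h)  # 25.0 50.0 37.5
```

In this example the VHI of 37.5 falls below the 40 % threshold discussed elsewhere in the review, even though only VCI is strongly depressed.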
“Please add a table for a better comprehension of all the satellite data used in this work and its characteristics.”
Table 3.1 contains a description of all the satellite data used. This will be expanded with additional information.
“There is too much description of the data used, but it was not calculated in this work. I believe it should be good to reduce this and focus on what was made in this work, i.e., forecasting by machine learning. The acronym for the method used is not described here.”
We are reducing the description of how VHI and RZSM are calculated and processed from satellite data.
“In the evaluation section, the authors describe classification methods for the onset of drought; this method should have been described earlier in the manuscript. The section on evaluation should have only the methodology for evaluation (e.g., metrics of performance).”
Currently, the method of drought onset forecasting is described in section 3.6. A new subsection is being added specifically on forecasting drought onset, defined as VHI falling below the 40% threshold. This is also being introduced more clearly in the objectives section.
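One possible way to turn a monthly VHI series into onset labels for a classifier is sketched below. The definition used here (a month counts as an onset when VHI is below 40 and the previous month was at or above 40) is an assumption for illustration; the reply does not spell out the manuscript's exact labelling rule, and the function name is hypothetical.

```python
DROUGHT_THRESHOLD = 40.0  # VHI threshold (%) named in the discussion


def onset_labels(vhi_series, threshold=DROUGHT_THRESHOLD):
    """Label each month 1 if VHI crosses below the threshold, else 0.

    The first month is labelled 0 because there is no previous month
    to compare against. This crossing rule is an assumed definition.
    """
    if not vhi_series:
        return []
    labels = [0]
    for prev, curr in zip(vhi_series, vhi_series[1:]):
        labels.append(1 if prev >= threshold and curr < threshold else 0)
    return labels


# Hypothetical monthly VHI values for one location.
series = [55.0, 42.0, 38.0, 35.0, 41.0, 39.0]
print(onset_labels(series))  # [0, 0, 1, 0, 0, 1]
```

Months that remain below the threshold (the fourth month here) are not counted as new onsets under this rule.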
“I think it is important to somehow analyze the spatial cross-validation. But, as it is currently written, it is not clear to me what the purpose of the spatial clustering analysis is. It needs further clarification regarding its link with the rest of the methodology. Also, authors should not cite figures from results in the methods section. The cluster allows for the splitting of the data (training and testing), and it affects the regression (forecasting) and classification (onset of drought)?”
The purpose of the spatial clustering is to determine how many spatial subdivisions of the data are required to produce the best model. This will determine how much data is required to produce the best model and the magnitude of spatial variation in VHI/climate which is optimum for the model to produce the best results. Currently, the description of the rationale for this method is in section 3.7. However, this section will be expanded upon and will be mentioned in the objectives section to preface the discussion more.
Results
“There is an excessive number of figures that can be reduced. For example, Figs. 14, 15, and 16 could be reduced to one and perhaps a table summarizing the results.”
To reduce the number of figures we will combine some and move some to appendices. Figures 4.7 and 4.8 can be combined, and Figure 4.13 can be moved to an appendix.
“The study area includes crops of soybean and maize; it would be good to know how the ML model performs per crop type.”
Models trained on maize and soybean growing areas individually typically performed very similarly to each other. This may be in part because many of the areas of maize and soybean cultivation overlap, as the two crops can be grown in rotation. Because the results were so similar, we deemed it not worth including additional figures for maize and soybean growing areas individually.
“The quality of Fig. 17 is poor.”
Figure 17 will be removed.
Conclusion
“The conclusion is too general and does not specifically state what the main results are. What variables were the best predictors for VHI? What machine-learning methods achieve the best performance? What is the main contribution to the drought research in the article?”
Conclusions are to be made more direct and specific. In summary, the main conclusions are:
- Machine learning methods have great potential for forecasting agricultural drought 1 month in advance, and gradient boosting methods can achieve a coefficient of determination of up to ~0.8 in some areas such as the northeast, making them an especially promising method to use.
- Across Brazil, SPEI may be a more useful indicator than SPI alone.
- For the agricultural drought onset forecasts, models also performed well, but further work is needed to test different methods of classification.
- ENSO variation had small effects on model performance, with El Niño effects being more difficult to predict than La Niña effects.
Thanks again for your comments. We would like the opportunity to improve the manuscript and then resubmit with the above changes to improve the scientific rigour, readability and presentation of the manuscript.
Citation: https://doi.org/10.5194/nhess-2024-60-AC2
RC2: 'Comment on nhess-2024-60', Anonymous Referee #2, 26 Jun 2024
The authors explored machine learning as a method to forecast the vegetation health index (VHI), for large scale monthly drought monitoring across agricultural land in Brazil. Though the work is largely well-written, the article still needs significant improvement in the introduction, results and discussion, and conclusion sections. Considering my observations as follows, I suggest major revisions before considering it for publication.
- There are too many paragraphs in the introduction. I suggest to rewrite the introduction with 3 paragraphs, highlighting the basic content of the research field in the first paragraph. Then review the research progress of the literature in the second paragraph, and in the third paragraph, analyze the limitations of past research and clarify the innovation of your own research.
- The importance of ML compared to other methods such as statistical, probabilistic, and time series modeling for drought monitoring and drought forecasting is missing in the introduction section. I would suggest to add this in the introduction section.
- The description under section 2 (page 3; line 85, 90, and 95) is not so much important. Please delete these lines.
- Rewrite the study area highlighting the key geographic features, climate, and physiography of the study area. Please omit the first three line of the study area.
- The author used 1,2- and 3-month SPI. Why did the author not use the SPI 6? SPI 6 indicates the seasonality of agricultural drought.
- Why was only precipitation used as a predictor variable? Was it average precipitation or total precipitation? I think using only precipitation does not make any sense as the author used SPI and SPEI index, which is the form of precipitation-based drought index. In this regard, I would suggest adding precipitation anomaly index (PAI) as a predictor variable instead of only using precipitation.
- In case of the machine learning model what amount of data was used for training, validation and testing of the model? I mean, how was the model built? How was it calibrated? The most important parameters and the choice of values for the model were not explained sufficiently much more explanation needed.
- What does “SEA AV” mean for? What kind of model was it? What is the utility of using “SEA AV” model?
- Page-16 (line 300): Please close the first bracket for “(Figure 7”
- The author discussed only the forecasting performance of various machine learning models. But I did not see any forecasted results of VHI by the machine learning model, which performed better compared to other models. It is very important to add results of forecasted VHI by the best machine learning model.
- Conclusion can be improved by highlighting the innovation content of the paper, future research direction, and recommendation for policy formulation.
Citation: https://doi.org/10.5194/nhess-2024-60-RC2
AC1: 'Reply on RC2', Joseph Gallear, 26 Jul 2024
Thanks for the comments; we will use your insights to improve the manuscript. Please see below for the list of ways in which we will address your comments.
“There are too many paragraphs in the introduction. I suggest to rewrite the introduction with 3 paragraphs, highlighting the basic content of the research field in the first paragraph. Then review the research progress of the literature in the second paragraph, and in the third paragraph, analyze the limitations of past research and clarify the innovation of your own research.”
We are restructuring the introduction to more closely resemble the structure you describe. We also plan to expand the literature review for the second paragraph.
“The importance of ML compared to other methods such as statistical, probabilistic, and time series modeling for drought monitoring and drought forecasting is missing in the introduction section. I would suggest to add this in the introduction section.”
A justification for the use of ML over traditional statistical methods or time series modelling will be included in the introduction. We will emphasise that ML can make use of statistical correlations without making assumptions about the structure of the data, and that its capability for generalising across a wide range of environments makes it suitable for large scale regional prediction.
“The description under section 2 (page 3; line 85, 90, and 95) is not so much important. Please delete these lines.”
We will assess the revised text, and delete this section if redundant.
“Rewrite the study area highlighting the key geographic features, climate, and physiography of the study area. Please omit the first three line of the study area.”
Thanks for the suggestion. We will rewrite the study area description, adding information about geography, climate and physiography.
“The author used 1,2- and 3-month SPI. Why did the author not use the SPI 6? SPI 6 indicates the seasonality of agricultural drought.”
SPI 6 was not used because the planting-to-maturity growing time of maize and soybean is roughly 3 months. Therefore, SPI 6, although indicative of hydrological conditions, would be less relevant to the growth of the key crops studied.
“Why was only precipitation used as a predictor variable? Was it average precipitation or total precipitation? I think using only precipitation does not make any sense as the author used SPI and SPEI index, which is the form of precipitation-based drought index. In this regard, I would suggest adding precipitation anomaly index (PAI) as a predictor variable instead of only using precipitation.”
The precipitation parameter used was total precipitation over each month, and it was included to provide a benchmark contrast to the SPI and SPEI indices. We think that using the precipitation anomaly would be redundant when using SPI.
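The 1-, 2- and 3-month accumulation periods mentioned in the replies above can be illustrated with a simple rolling sum over monthly precipitation totals. This sketch covers only the accumulation step; SPI and SPEI additionally standardise the accumulated values against a fitted distribution, which is not shown. The function name and the precipitation values are hypothetical.

```python
def accumulate(monthly_totals, window):
    """Rolling sum over `window` months.

    Returns None for the leading months where fewer than `window`
    months of data are available.
    """
    out = []
    for i in range(len(monthly_totals)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(monthly_totals[i + 1 - window : i + 1]))
    return out


# Hypothetical monthly total precipitation in mm.
precip_mm = [120, 80, 10, 5, 60]
acc1 = accumulate(precip_mm, 1)
acc2 = accumulate(precip_mm, 2)
acc3 = accumulate(precip_mm, 3)
print(acc3)  # [None, None, 210, 95, 75]
```

A 6-month window would be computed in the same way; the authors' argument is that it would span more than the roughly 3-month growing period of the crops studied.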
“In case of the machine learning model what amount of data was used for training, validation and testing of the model? I mean, how was the model built? How was it calibrated? The most important parameters and the choice of values for the model were not explained sufficiently much more explanation needed.”
The machine learning methods were trained and evaluated using a leave-one-out cross-validation strategy, which is described in section 2.5 (entitled Cross validation). The best predictors are described in section 3.6 (entitled Feature importance). We will revise this section to improve clarity.
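As a rough sketch of what a leave-one-out strategy looks like in practice, the generator below holds out one group at a time and trains on the rest. The reply does not say what unit is left out, so grouping by year here is purely an assumption for illustration, as are the function and variable names.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label.

    `groups` gives one label per sample (here a year, which is an
    assumed grouping; the manuscript may hold out a different unit).
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical year label for each of five samples.
years = [2011, 2011, 2012, 2012, 2013]
splits = list(leave_one_group_out(years))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Each sample appears in exactly one test fold, so every prediction used for scoring comes from a model that never saw that group during training.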
“What does “SEA AV” mean for? What kind of model was it? What is the utility of using “SEA AV” model?”
SEA AV stands for the seasonal average model, which is described on page 15 from line 296 onwards. Its purpose is to show in Figure 6 that all models are able to outperform a prediction that simply assumes VHI follows the average monthly seasonality of VHI for a given location. This will be made clearer through greater emphasis in the methods section.
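A seasonal-average baseline of the kind described can be sketched as below: for a given location, the forecast for any calendar month is the mean of past VHI values observed in that calendar month, and skill is summarised with the coefficient of determination. The data layout, function names and values are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def seasonal_average_forecast(train, months_to_predict):
    """Predict VHI as the training mean for each calendar month.

    `train` is a list of (calendar_month, vhi) pairs for one location.
    Raises KeyError if a requested month never appears in `train`.
    """
    values = defaultdict(list)
    for month, value in train:
        values[month].append(value)
    means = {m: sum(v) / len(v) for m, v in values.items()}
    return [means[m] for m in months_to_predict]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical history for January (1) and February (2) at one location.
history = [(1, 60.0), (1, 50.0), (2, 40.0), (2, 30.0)]
baseline = seasonal_average_forecast(history, [1, 2])
score = r_squared([50.0, 30.0], baseline)
print(baseline, score)  # [55.0, 35.0] 0.75
```

A machine learning model is only adding value where its score exceeds that of this baseline for the same location and period.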
“Page-16 (line 300): Please close the first bracket for “(Figure 7””
Thanks for pointing this out, we will fully revise the manuscript in order to eliminate typos such as this.
“The author discussed only the forecasting performance of various machine learning models. But I did not see any forecasted results of VHI by the machine learning model, which performed better compared to other models. It is very important to add results of forecasted VHI by the best machine learning model.”
Figure 6 is the comparison of different machine learning methods. Subsequent figures in the results section all relate to the best model chosen from that initial comparison (the gradient boosting machine model).
“Conclusion can be improved by highlighting the innovation content of the paper, future research direction, and recommendation for policy formulation.”
The significance of the work is to be described in the following paragraph which is to be added to the conclusions:
“These findings are of significance for future drought monitoring and forecasting work in Brazil as well as for other regions in which drought monitoring and forecasting systems using machine learning are being considered or developed. Specifically, in showing how machine learning methods perform across Brazil, this research provides a first benchmark set of results for agricultural drought forecasts in the country. This also provides useful information about the spatio-temporal pattern of model performance. For future research outside of Brazil, this work provides a case study as to how machine learning methods perform across a wide area with large diversity in climate.”
Thanks again for the comments. We will endeavour to improve the manuscript through your insightful feedback.
Citation: https://doi.org/10.5194/nhess-2024-60-AC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
397 | 116 | 25 | 538 | 20 | 22 |