I appreciate the authors’ hard work on the manuscript, and I think the changes they have made have improved it. I am also glad that the authors have decided to make their code available, which will be very useful to other researchers. Overall, I think the manuscript makes a useful contribution and could be accepted for publication after minor revisions.
However, the methodology is still unclear to me from the text as written, and I recommend revising it. I have listed my main areas of confusion below, and I hope this helps the authors clarify the text.
First, are models trained individually for each spatial point? This is not explicitly stated anywhere, as far as I can tell.
It is also not clear from the text how the CV and training/test evaluation are executed. For example, lines 151-152 (in the file with tracked changes): ‘The creation of training, validation, and testing subsets is crucial to avoid overfitting and achieve reasonable estimates of model performance. The data set was divided into the first 80% for training and 20% for validation data.’ This does not explicitly state that the split is in time, and although training, validation, and test sets are mentioned, in fact only a training and test split is performed.
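For clarity, this is the kind of explicitly chronological split I would expect the text to describe (a minimal sketch with placeholder data and variable names of my own, not the authors’ code):

```python
import pandas as pd

# Placeholder table: one row per year with illustrative columns (not the authors' data)
df = pd.DataFrame({
    "year": list(range(1991, 2021)),
    "precip_mean": list(range(30)),
    "yield_anomaly": list(range(30)),
}).sort_values("year")

split = int(len(df) * 0.8)                      # chronological split: first 80% of years
train, test = df.iloc[:split], df.iloc[split:]
print(f"train: {train.year.min()}-{train.year.max()}, test: {test.year.min()}-{test.year.max()}")
```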
Additionally, it is unclear how the LOYO-CV is performed: the text states that a ‘fixed window’ of 10 years was used, with one year as a test set; is this repeated so that every year serves once as the validation set and the resulting scores are averaged, as is usually done for CV? ‘Fixed window’ reads as if only a single split is made, but I assume the authors do not mean that.
Line 160: ‘the best fit model was determined by employing a leave-one-year-out cross-validation approach (LOYOCV)’: I do not understand this step. Did the authors train multiple models, with identical hyperparameters and features, on different subsets of years and select the best one according to its performance on the corresponding validation year? This would not be advisable, as the validation performance would then depend on the year rather than on the model itself. Or were the hyperparameters or features selected based on the average CV performance across all folds, as would be more typical?
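For reference, this is the standard protocol I have in mind, where each year is held out once and the scores are averaged across folds (a sketch with synthetic placeholder data, assuming a scikit-learn setup, not the authors’ actual configuration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for one location: 10 years x 12 samples per year (illustrative only)
years = np.repeat(np.arange(2010, 2020), 12)
X = rng.normal(size=(len(years), 5))                       # e.g. climate features
y = X[:, 0] + rng.normal(scale=0.5, size=len(years))       # e.g. yield anomaly

# Each fold holds out one full year; the reported score is the average over folds,
# so no single year alone determines which configuration looks best.
scores = cross_val_score(
    RandomForestRegressor(n_estimators=200, random_state=0),
    X, y, groups=years, cv=LeaveOneGroupOut(), scoring="r2",
)
print(f"per-year R2: {np.round(scores, 2)}, mean: {scores.mean():.2f}")
```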
Lines 175-177: ‘Different models were trained considering different combinations of input data, including precipitation means, temperature means, and combinations of means and extreme climate indices. The goal of this experiment was to identify the most important climate indices.’ I would recommend describing this more explicitly: which subsets of features were tested, and how were they evaluated?
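To illustrate the level of detail I am asking for, an explicit description could correspond to something like the following, where each candidate feature subset is scored with the same CV protocol (feature names and subsets are placeholders of mine, not the authors’ indices):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
years = np.repeat(np.arange(2010, 2020), 12)
X = pd.DataFrame(
    rng.normal(size=(len(years), 4)),
    columns=["precip_mean", "temp_mean", "rx5day", "tx90p"],   # placeholder names
)
y = X["precip_mean"] + 0.5 * X["rx5day"] + rng.normal(scale=0.5, size=len(years))

# Candidate feature subsets, each evaluated with the same LOYO-CV protocol
subsets = {
    "means only": ["precip_mean", "temp_mean"],
    "means + extreme indices": ["precip_mean", "temp_mean", "rx5day", "tx90p"],
}
for name, cols in subsets.items():
    scores = cross_val_score(
        RandomForestRegressor(random_state=0), X[cols], y,
        groups=years, cv=LeaveOneGroupOut(), scoring="r2",
    )
    print(f"{name}: mean CV R2 = {scores.mean():.2f}")
```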
I would suggest that the authors explicitly state which years are covered by the training and test sets, on which years the LOYO-CV is done, and, further on in the text, on which years/data the results for each figure are calculated. This should also be included in Figure 1.
Additionally, there are multiple ways of calculating feature importance from random forest models. I am assuming that the authors refer to the internal impurity-based (e.g. entropy) feature importance. This should be stated explicitly. I would also advise that the authors consider additionally calculating the permutation-based feature importance on the validation or test years. Its agreement or disagreement with the impurity-based feature importance would help readers assess the robustness of the findings.
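For concreteness, a minimal sketch of how the two importance measures could be compared (synthetic placeholder data; I am assuming a scikit-learn implementation, which may differ from the authors’ code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))                                   # placeholder features
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=120)
X_train, X_test = X[:96], X[96:]                                # chronological 80/20 split
y_train, y_test = y[:96], y[96:]

rf = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Impurity-based importances are a by-product of fitting on the training data ...
print("impurity-based:   ", np.round(rf.feature_importances_, 2))

# ... whereas permutation importance can be computed on unseen (validation/test) years
perm = permutation_importance(rf, X_test, y_test, n_repeats=30, random_state=0)
print("permutation-based:", np.round(perm.importances_mean, 2))
```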
Other points of confusion:
- Lines 128-129 introduce ‘explainable’ and ‘operational’ features but do not define them. Also, lines 137-138: ‘Other relevant aspects, such as relevancy, explainability, and operationability, will be explained in the following steps.’ As far as I can see, these terms aren’t explained later in the text.
- Line 144: ‘The SHAP method uses a second model, most commonly the RF model…’ I do not think this is true: SHAP is used to explain the original model, not a second one (see the sketch after this list).
- Line 162: ‘The models were trained and optimized on the training and validation datasets’ This is unusual; did the authors intend to write only training datasets, not validation datasets?
- Outlier removal: lines 276-277 state that the authors remove outliers using the interquartile range, but from the supplementary material it appears that only a few data points are removed, which does not make sense to me. Additionally, the supplementary material discussion of the outlier removal is confusing: in lines 30-31 the authors state ‘Removal of outliers is a complex problem since we are working with extreme events’, but this is followed by an explanation of the trend and heteroskedasticity removal process rather than the outlier removal. Then, in line 52: ‘After obtaining a consistent time series corrected for outliers, trends, and heteroskedasticity’ - but the outlier removal occurs afterwards, as far as I can tell (line 59: ‘To eliminate potential outliers, we excluded values considering each year and state’).
- Lines 277-278: ‘Changes in technology in seed production, fertilizers, and land management, also known as technological trends (Liu and Ker, 2020) were removed by Local Polynomial Regression Fitting (LOESS)’ - LOESS would remove all trends, including those due to, e.g., climate change, not only technological ones. I would recommend mentioning this.
- Supplementary material line 137: the sentence ending in ‘indicating the significant role of both rainfall’ is incomplete.
- Figure 3: Where do the error bars come from here? Which variables were used as predictive features? Are these metrics calculated on the test set?
- Lines 340-342: Where do these ranges come from?
- Lines 349-354: Where are the results discussed here shown?
- Figure 4: The hazard types don’t correspond, as far as I can tell, to the CID types and categories from Table 2. What do these labels mean?
- Line 420: ‘This technique allowed us to extract insights by coupling the results of a Random Forest model’ - as noted above, I don’t think SHAP works by coupling one model to another; it is intended to explain the original model.
- In lines 466-468 and in the Supplementary material (lines 95-101), the authors discuss whether or not to keep the most extreme years in the training or test set, or to remove them. I understand from the text that they were kept in the dataset, but it is not stated whether they were placed in the training or the test set.
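Regarding the two SHAP-related points above (manuscript lines 144 and 420): this is a minimal sketch of the usual SHAP workflow, in which the explainer is built directly on the already-fitted model rather than on a second model (all data and names are illustrative; I am assuming a scikit-learn + shap setup, which may differ from the authors’ exact code):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))                     # placeholder features
y = X[:, 0] + rng.normal(scale=0.5, size=100)

rf = RandomForestRegressor(random_state=0).fit(X, y)

# The explainer wraps the already-fitted RF itself; no second model is trained.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)
print(shap_values.shape)                          # one SHAP value per sample and feature
```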
Technical corrections:
- Line 136: I would remove the sentence ‘Feature selection is a pre-processing step in machine learning models’. It is confusing as the feature selection is described later in the text.
- ‘ML’ is introduced as an abbreviation early in the text, but the authors continue to use ‘machine learning’ afterwards. I would also advise using ‘RF’ as an abbreviation for random forest to improve readability.
- Figure 1 is helpful, but there are multiple typos and minor formatting issues. E.g. ‘Filter Highly correlated variables’ should be ‘Filter highly correlated variables’; ‘Boostrap RF model’ should be ‘Bootstrap RF model’.
- Line 170: ‘To achieve this, we used the Random Forest model’ This is repeated multiple times in the text, and could be removed.
- Line 175: ‘Different models were trained considering different combinations of input data’ - I believe the authors refer to different combinations of features or variables, rather than data.
- Line 207: Typo - ‘The ~~the~~ SHAP explanations was performed’
- Lines 239, 240, 244: Maize and soybean should not be capitalised here.
- Figures S1 and S2: Typo - ‘eath’ should be ‘each’
- Supplementary line 86: The reference is erroneously capitalised: RODRIGUES et al. (2013)
- Supplementary line 130: Typo - ‘Table SS2’ should be ‘Table S2’
- In the Supplementary material, Section 4 still refers to an XGBoost model.
- Lines 364-365: Typo - ‘The analysis can be of variable importance for soybean datasets is shown in Table S1’
- Line 413: Typo - ‘climate impact-divers’ should be ‘climate impact-drivers’
- Lines 461-462: Typo - ‘however, the for IBGE is 150mm’
- Figure 7: Please include the units for e.g. precipitation.
- Line 471: ‘exemplify’ is the wrong word, I think - perhaps ‘present’? |