Natural Hazards and Earth System Sciences

A cautionary note regarding comparisons of fire danger indices

Over the past decade, several methods have been used to compare the performance of fire danger indices in an effort to find the most appropriate indices for particular regions or circumstances. Various authors have proposed comparators and demonstrated different responses of indices to their tests, but rarely has much effort been put into demonstrating the validity of the comparators themselves. We demonstrate that many of the published comparators are sensitive to differences in the frequency distributions of index values, which may be inherent in the different indices, and outline a non-parametric method that may be useful for future work. We compare four hypothetical fire danger indices, three of which are simple mathematical transformations of each other. The hypothesis tested is that the comparators often used in such studies may indicate spurious performance differences between these indices; this is found to be the case. Non-parametric methods are robust to differences in the frequency distribution of index values and may allow more valid comparisons of fire danger indices. The new comparison method is shown to have advantages over other non-parametric comparators.


Introduction
Recently, much effort has been put into finding the "best" fire danger indices for particular regions. Indices with greater skill may allow more efficient allocation of firefighting resources, more appropriate public warning systems and more precise research studies. Regional differences in index performance may be apparent at relatively small geographical scales (e.g., Padilla and Vega-García, 2011), and it is unlikely that there will ever be a "one size fits all" approach. The United States Forest Service maintains the "FireFamily Plus" computer programme, one component of which can be used to analyse the performance of a range of fire danger indices (Andrews et al., 2003), and similar efforts to systematically compare fire danger index performance are underway in alpine Europe (e.g., Arpaci et al., 2010a, b). The comparison of fire danger indices is not trivial, and the performance of indices must be tested against high standards. Different indices may be discrete or continuous, may produce values across different ranges, and may follow different frequency distributions over the course of a year, all of which complicates direct comparisons.

Fire danger indices
The fire danger indices we are concerned with here are those constructed to assign a value to each day, with (usually) higher values indicating a greater chance of a fire occurring. These indices form the basis of public warning systems and are of increasing interest for fire planning and resource allocation.
A wide variety of indices has been developed, with different mathematical formulations and different input variables. Some use only the weather conditions on the day in question, such as the Angström index ($I_A$), which depends only on the relative humidity ($R$, %) and the temperature ($T$, °C):

$$I_A = \frac{R}{20} + \frac{27 - T}{10} \tag{1}$$

(note that in this case, lower values indicate higher fire danger).
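As a minimal sketch, the Angström index can be computed directly from the two daily inputs. The formula used is the standard published form of the index, $I_A = R/20 + (27 - T)/10$, assumed here; the function name `angstrom_index` is hypothetical.

```python
def angstrom_index(rel_humidity_pct: float, temp_c: float) -> float:
    """Angström index I_A = R/20 + (27 - T)/10 (standard form, assumed here).

    Lower values indicate higher fire danger (dry, hot conditions).
    """
    return rel_humidity_pct / 20.0 + (27.0 - temp_c) / 10.0


# A dry, hot day scores lower (more dangerous) than a humid, cool one.
print(angstrom_index(30.0, 30.0))  # dry and hot
print(angstrom_index(80.0, 15.0))  # humid and cool
```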
Other indices also include information from previous days. The Nesterov index ($I_N$) is constructed from the temperature and the dew point ($D$, °C), summed over the number of days since a rainfall of 3 mm was recorded ($W$):

$$I_N = \sum_{i=1}^{W} \left(T_i - D_i\right) T_i \tag{2}$$

The set of all daily values of an index over some time period defines the "frequency distribution" of that index. Depending on the mathematical formulation of the index and the characteristics of its input variables, these frequency distributions can have markedly different shapes.
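The daily accumulation behind the Nesterov index can be sketched as follows. The sketch assumes the common formulation, a running sum of $(T_i - D_i)\,T_i$ that is reset to zero on a day with more than 3 mm of rain. The exact reset rule (for example "more than" versus "at least" 3 mm) varies between implementations and is an assumption here, as is the function name.

```python
from typing import List, Sequence


def nesterov_series(
    temps_c: Sequence[float],
    dew_points_c: Sequence[float],
    rain_mm: Sequence[float],
    rain_threshold_mm: float = 3.0,
) -> List[float]:
    """Daily Nesterov index: running sum of (T - D) * T since the last wet day."""
    values = []
    total = 0.0
    for t, d, r in zip(temps_c, dew_points_c, rain_mm):
        if r > rain_threshold_mm:
            total = 0.0  # assumed rule: a wetting rain resets the accumulation
        else:
            total += (t - d) * t  # dew-point depression weighted by temperature
        values.append(total)
    return values


# Three hypothetical days: dry, wet (reset), dry again.
print(nesterov_series([20, 25, 22], [10, 15, 12], [0, 5, 0]))
```

Unlike the Angström index, the value on a given day depends on the whole dry spell preceding it, which is one reason the two indices can have very differently shaped frequency distributions.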
Published by Copernicus Publications on behalf of the European Geosciences Union.

Parametric and non-parametric tests
Parametric statistical tests by definition assume that the data follow some particular statistical distribution, and they are often invalid if those assumptions are not met. Although for some tests these conditions are well known (such as the assumption of a normal distribution in t-tests), in complex procedures or statistical software packages the need for the data to meet certain conditions may not be immediately apparent. When the conditions are not met, the results are at best meaningless and at worst dangerously misleading. Exploratory data analyses should therefore always be performed before complex statistical procedures are applied. Using a simple set of hypothetical indices, we show here that differences in frequency distributions can introduce spurious results into common index comparison methods. A non-parametric test (not affected by differences in frequency distribution) is introduced that can support comparative studies in future research. Although this paper introduces the new method to the scientific community for further study, our main purpose is to elucidate the shortcomings of methods currently in use and, thus, to demonstrate the need for more descriptive non-parametric index comparison methods.

Previous approaches
Over the past decade or so, several methods have been proposed to compare fire danger indices. For reasons of space, these methods are only briefly described here; readers are referred to the cited papers for details.

Mahalanobis distance
The Mahalanobis distance is a measure of the separation between two datasets. Viegas et al. (1999) applied this method in southern Europe, beginning by normalising their indices so that all range from 0 to 100. This is done linearly, with the normalised index

$$I'_x = 100\,\frac{I_x - I_{\min}}{I_{\max} - I_{\min}} \tag{3}$$

where $I'_x$ is each individual normalised index value, $I_x$ is the individual index value at its original scale, $I_{\min}$ is the minimum value of $I_x$ in the full dataset, and $I_{\max}$ is the maximum value of $I_x$ in the full dataset. The normalised indices are then grouped into ten classes ($I'_x$ = 0 to 10, 10 to 20, etc.). Viegas et al. (1999, p. 240) recognised that using equally spaced class limits could introduce error, but proceeded because of the simplicity of this approach. After plotting the percentage of days in each class and the percentage of fire-days in each class, they calculated the Mahalanobis distance ($M_d$) as a measure of how well the index discriminates days with higher or lower fire danger. The Mahalanobis distance is calculated as

$$M_d = \frac{\bar{X}_1 - \bar{X}_2}{\sigma} \tag{4}$$

where $\bar{X}_1$ is the mean index value on fire-days, $\bar{X}_2$ is the mean index value on non-fire days and $\sigma$ is the standard deviation of the index value on all days. A larger Mahalanobis distance is presumed to represent greater differentiation of fire-days from non-fire days. Note also that $M_d$ gives the same result whether raw or normalised index values are used, because the linear normalisation rescales the numerator and $\sigma$ by the same factor.
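The normalisation, the ten equal-width classes and $M_d$ can be sketched in a few lines. The sketch assumes the population standard deviation for $\sigma$ (the text does not specify sample or population), and the daily values and fire flags are hypothetical. It also checks the invariance of $M_d$ under linear normalisation noted above.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalise(values: Sequence[float]) -> List[float]:
    """Linear rescaling of index values onto 0-100."""
    lo, hi = min(values), max(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def index_class(normalised_value: float) -> int:
    """Ten equal-width classes 0-10, 10-20, ..., 90-100 (100 falls in the last class)."""
    return min(int(normalised_value // 10), 9)


def mahalanobis_distance(values: Sequence[float], fire_flags: Sequence[bool]) -> float:
    """M_d = (mean on fire-days - mean on non-fire days) / SD over all days.

    Uses the population SD (an assumption).
    """
    fire = [v for v, f in zip(values, fire_flags) if f]
    non_fire = [v for v, f in zip(values, fire_flags) if not f]
    return (mean(fire) - mean(non_fire)) / pstdev(values)


raw = [2.0, 4.0, 4.0, 5.0, 9.0, 12.0]  # hypothetical daily index values
fires = [False, False, False, True, True, True]
print([index_class(v) for v in normalise(raw)])
print(mahalanobis_distance(raw, fires), mahalanobis_distance(normalise(raw), fires))
```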

Percentile analysis
Andrews et al. (2003) described a "percentile analysis" in which, for a particular index, the index values at the 90th, 50th and 25th percentiles of all days are calculated and then located within the distribution of that index on fire-days. For example, the 90th percentile of some index across all days may have a value of 80, but when only fire-days are considered, an index value of 80 represents the 75th percentile, a difference of 15. The differences for the three percentiles are summed and represent the shift in the distribution of index values between all days and fire-days. A greater distribution shift is taken to signal a better index. The choice of the 90th, 50th and 25th percentiles appears to be subjective; selecting a different set of percentiles would give different results.
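A minimal sketch of the percentile analysis follows. Two conventions are assumptions not fixed by the description above: quantiles of all days are linearly interpolated (Hyndman and Fan type 7, R's default), and the fire-day percentile of a value is the percentage of fire-day values at or below it. Both the index values and the fire-day values are hypothetical.

```python
from typing import Sequence


def quantile_linear(values: Sequence[float], p: float) -> float:
    """Linearly interpolated quantile (Hyndman-Fan type 7); p is a percentage 0-100."""
    xs = sorted(values)
    h = (len(xs) - 1) * p / 100.0
    lo = int(h)  # h >= 0, so int() acts as floor
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def percentile_rank(values: Sequence[float], v: float) -> float:
    """Percentage of values at or below v (assumed convention)."""
    return 100.0 * sum(x <= v for x in values) / len(values)


def percentile_shift(all_values, fire_values, percentiles=(90, 50, 25)) -> float:
    """Sum over the chosen percentiles of (p - fire-day percentile of the same value)."""
    total = 0.0
    for p in percentiles:
        threshold = quantile_linear(all_values, p)  # index value at percentile p, all days
        total += p - percentile_rank(fire_values, threshold)
    return total


all_days = list(range(1, 101))  # hypothetical index values 1..100
fire_days = [60, 70, 80, 85, 90, 95, 96, 97, 98, 99]
print(percentile_shift(all_days, fire_days))  # fire-days shifted towards high values
print(percentile_shift(all_days, all_days))   # no shift when fire-days match all days
```

Replacing `percentiles=(90, 50, 25)` with another set changes the score, which is the subjectivity noted above.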

Logistic regression
As the occurrence or non-occurrence of a fire is a binary event, it may be modelled with a logistic regression. Andrews et al. (2003) also used a logistic regression technique to model the probability of a day at a particular index value being a fire-day or a non-fire day, with the index values as the independent variable. The indices are ranked according to the range of the fitted values (wider ranges, beginning closer to zero, indicating more sensitive models) and according to the fit of the models to the observations, using a pseudo-$R^2$ value that they denote $R^2_L$. A higher $R^2_L$ indicates a closer fit of the logistic regression to the observed data.
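The logistic-regression comparators can be sketched without a statistics package. This sketch substitutes a plain Newton-Raphson maximum-likelihood fit for R's `glm`, with one index as the only predictor. $R^2_L$ is assumed to be the likelihood-ratio chi-square divided by the null deviance (the Hosmer-Lemeshow form); the text does not reproduce the exact definition used by Andrews et al. (2003). The $\mathrm{AIC}_{\chi^2}$ of Verbesselt et al. (2006), the likelihood-ratio chi-square minus twice the degrees of freedom, is included for completeness. The data are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def _prob(b0: float, b1: float, xi: float) -> float:
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def fit_logistic(x: Sequence[float], y: Sequence[int], iterations: int = 50) -> Tuple[float, float]:
    """Maximum-likelihood fit of P(fire-day) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # information matrix [[h00, h01], [h01, h11]]
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: inverse(information) @ gradient
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def summarise(x: Sequence[float], y: Sequence[int]) -> Dict[str, object]:
    b0, b1 = fit_logistic(x, y)
    fitted = [_prob(b0, b1, xi) for xi in x]
    ll_model = sum(math.log(p) if yi else math.log(1.0 - p) for p, yi in zip(fitted, y))
    p_bar = sum(y) / len(y)
    ll_null = sum(math.log(p_bar) if yi else math.log(1.0 - p_bar) for yi in y)
    lr_chi2 = 2.0 * (ll_model - ll_null)  # likelihood-ratio chi-square, 1 df (one predictor)
    return {
        "b1": b1,
        "fitted_range": (min(fitted), max(fitted)),
        "lr_chi2": lr_chi2,
        "r2_l": lr_chi2 / (-2.0 * ll_null),  # assumed: R^2_L = G_M / D_0
        "aic_chi2": lr_chi2 - 2.0 * 1,       # AIC_chi2 = LR chi-square - 2 * df
    }


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical daily index values
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]   # 1 = fire-day
print(summarise(x, y))
```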

c-index
Verbesselt et al. (2006) also used a logistic regression model, but judged their models' performance using an adjusted chi-square form of Akaike's Information Criterion (AIC), where $\mathrm{AIC}_{\chi^2}$ is the model likelihood ratio chi-square statistic minus two times the degrees of freedom. With this form of the AIC, a higher value indicates better model fit. To represent the "discrimination power" of each model, they calculated the "c-index", which is equal to the area under a receiver operating characteristic (ROC) curve. An ROC curve is a graphical representation of how a model performs with regard to true and false positive predictions and true and false negative predictions. For each possible threshold index value, the curve plots the "sensitivity" or true positive rate (the proportion of fire-days on which the index is at or above that value) against one minus the "specificity", i.e. the false positive rate (the proportion of non-fire days on which the index is at or above that value, reflecting the index's propensity for false alarms). Fawcett (2006) provides an excellent overview of the concept and notes that the area under the curve is equivalent to the non-parametric Wilcoxon rank statistic (Hanley and McNeil, 1982). A c-index of 0.5 indicates predictions no better than random, whereas a "perfect" model would have a c-index of 1.0. The c-index gives useful non-parametric information, but it is not adequate to fully describe differences in the performance of competing indices: two different ROC curves can enclose the same area.
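Because the area under the ROC curve equals the Wilcoxon-Mann-Whitney statistic, the c-index can be sketched as a pair count: over every (fire-day, non-fire day) pair, count how often the fire-day has the higher index value, with ties counting one half. This is an illustrative re-implementation, not the `pROC` routine used in this study. The data are hypothetical. The check shows that a monotonic transformation such as `exp` leaves the c-index unchanged, while reversing the index direction gives a value below 0.5.

```python
import math
from typing import Sequence


def c_index(values: Sequence[float], fire_flags: Sequence[bool]) -> float:
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney pair count."""
    fire = [v for v, f in zip(values, fire_flags) if f]
    non_fire = [v for v, f in zip(values, fire_flags) if not f]
    score = 0.0
    for a in fire:
        for b in non_fire:
            if a > b:
                score += 1.0   # fire-day ranked above non-fire day
            elif a == b:
                score += 0.5   # tie counts one half
    return score / (len(fire) * len(non_fire))


values = [3.0, 7.0, 1.0, 9.0, 4.0, 8.0, 2.0, 6.0, 10.0, 5.0]  # hypothetical
fires = [False, True, False, True, False, False, False, False, True, False]
print(c_index(values, fires), c_index([math.exp(v) for v in values], fires))
```

The double loop is quadratic in the number of days; for multi-decade daily series a rank-sum formulation would be preferable, but the result is the same.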

Proposed new comparator
We outline here a two-part descriptor of fire indices that may help to differentiate performance, based on the slope of the ranked fire-day percentiles and the "y"-intercept of the fitted line. The daily values for each index are converted to percentiles across all days in the dataset. The index percentiles on fire-days are then ranked from lowest to highest and plotted on the "y"-axis, with the "x"-axis indicating the rank. Figure 1 provides a small example, with three fires occurring within a span of ten days.
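The first step, converting daily values to percentiles and ranking the fire-day percentiles, can be sketched as follows. The sketch assumes that a day's percentile is the percentage of all days with an index value at or below it; other percentile conventions would shift the points slightly. The ten days and three fires mirror the structure of Fig. 1, but the values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def daily_percentiles(values: Sequence[float]) -> List[float]:
    """Percentile of each day's value among all days (share of days at or below it)."""
    ordered = sorted(values)
    n = len(values)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def ranked_fire_percentiles(values: Sequence[float], fire_flags: Sequence[bool]) -> List[Tuple[int, float]]:
    """(rank, percentile) pairs for fire-days, ranked from lowest to highest percentile."""
    fire_pct = sorted(p for p, f in zip(daily_percentiles(values), fire_flags) if f)
    return list(enumerate(fire_pct, start=1))


# Ten hypothetical days with fires on the 2nd, 4th and 9th days.
values = [3.0, 7.0, 1.0, 9.0, 4.0, 8.0, 2.0, 6.0, 10.0, 5.0]
fires = [False, True, False, True, False, False, False, False, True, False]
print(ranked_fire_percentiles(values, fires))
```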
Because on this plot an index composed of random numbers would have an expected slope of 1.0 and an intercept of zero, while a "perfect" index would have a slope approaching zero and an intercept approaching 100, these two parameters together may usefully describe the performance of fire indices. To reduce the influence of outliers in the data, the slope is calculated with the Theil-Sen technique (Theil, 1950; Sen, 1968), which takes the median of the slopes of the lines joining every pair of plotted points. The intercept is the median of the individual intercepts obtained when a line with that slope is passed through each point. Although the Theil-Sen method is well established in the hydrological sciences as a means of producing a robust regression (e.g., Granato, 2006), to the best of our knowledge we are the first to suggest that it may be applied to characterise a curve of ranked percentiles, and that such a curve can usefully describe the performance of a fire danger index.
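The Theil-Sen slope and intercept described above can be sketched directly from their definitions: the median of all pairwise slopes, then the median of $y_i - \text{slope}\cdot x_i$. This is an illustrative re-implementation, not the `mblm` code used in the study. The points are hypothetical: they lie on $y = 2x + 1$ apart from one outlier, which the estimator ignores.

```python
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple


def theil_sen(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Robust line fit: median pairwise slope, then median of the per-point intercepts."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip vertical pairs (cannot occur for distinct ranks)
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Points on y = 2x + 1, except the last one, which is an outlier.
print(theil_sen([1, 2, 3, 4, 5], [3, 5, 7, 9, 100]))
```

An ordinary least-squares fit would be pulled strongly towards the outlier; here the fitted line is unaffected.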

Test and application example
To assess the robustness and usefulness of these index comparison methods, we first constructed a set of four hypothetical indices and applied them to an arbitrary year containing 10 fire-days. The four hypothetical indices were assessed with the four previously published comparison methods and with our proposed new comparator. This is intended to demonstrate the need for non-parametric techniques.
The greater utility of our ranked-percentile method is then shown through a brief application example. Meteorological data were obtained from the weather station in Graz (southern Austria) and used to derive values of the Angström and Nesterov indices for the surrounding region over the period 1978-2008. Twenty-one fires occurred in this period. These data are a subset of an Austria-wide project examining the performance of 19 different indices, including, for example, the Canadian FWI (Van Wagner, 1987), the M68 index (Käse, 1969) and the FMI of Sharples (2009).
Calculations of the hypothetical index values, the Mahalanobis distance and the percentile scores for the theoretical examples in this paper were made in Excel 2003, and both these and the remaining tests were performed with the R statistical software v.13.1 (R Development Core Team, 2008). The "glm" function ("binomial" family) was used to develop the logistic regressions, the "anova" function to derive the model likelihood ratio chi-square statistic used to calculate the AIC, functions in the "pROC" package (Robin et al., 2011) to calculate the c-index, and the "mblm" package for the Theil-Sen statistics. Using both Excel and R is intended to explore what differences may result from applying the tests under different software frameworks. R and all packages used are available via http://cran.r-project.org/. Calculations for the Graz application case were made solely in R.

Index value generation
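Consider a hypothetical fire index (index "A") that is based simply on the calendar day of the year. The index is formulated as a sinusoidal function of the day of year (Eq. 5; see Fig. 2). Indices B and C are, respectively, logarithmic and exponential transformations of index A, and index D is a discontinuous function (Equation set 6; see Fig. 2). Ten days are arbitrarily selected as "fire occurrence" days: nine loosely centred around the middle of the year and one outlier. For this example, these are days 4, 143, 156, 170, 189, 201, 208, 222, 247 and 262.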
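Equation (5) itself is not reproduced in this text, so the sketch below uses hypothetical stand-ins. It assumes a cosine that peaks near mid-year and ranges from 0 to 100 for index A, with $B = \ln(1 + A)$ and $C = \exp(A/10)$ as the logarithmic and exponential transformations. Index D (Equation set 6) is omitted. A 0.3-day phase offset is an assumption added so that no two days share exactly the same value. The check confirms the property the comparison relies on: all three indices put the days in the same rank order.

```python
import math

DAYS = range(1, 366)
FIRE_DAYS = {4, 143, 156, 170, 189, 201, 208, 222, 247, 262}


def index_a(day: int) -> float:
    """Hypothetical stand-in for index A: a sinusoid peaking near mid-year, range 0-100."""
    return 50.0 * (1.0 - math.cos(2.0 * math.pi * (day - 0.3) / 365.0))


index_a_values = [index_a(d) for d in DAYS]
index_b_values = [math.log1p(a) for a in index_a_values]       # assumed logarithmic transform
index_c_values = [math.exp(a / 10.0) for a in index_a_values]  # assumed exponential transform
fire_flags = [d in FIRE_DAYS for d in DAYS]


def rank_order(values):
    """Day positions sorted from lowest to highest index value."""
    return sorted(range(len(values)), key=values.__getitem__)


print(rank_order(index_a_values) == rank_order(index_b_values) == rank_order(index_c_values))
```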

Index characteristics
Figure 2 displays the daily values of the four indices and the fire occurrence days, while Fig. 3 shows their frequency distributions over all days. Index A values occur mostly at each extreme, while the mathematical transformations applied to create indices B and C cause the distributions to cluster towards one extreme. As the first three indices are simply monotonic transformations of each other (i.e., the ranks of the daily values are not changed by the transformation), it is reasonable to suppose that any valid method of comparing them should rank each index as equally useful; otherwise, the comparison method may be ranking indices differently simply because the index values have different frequency distributions. This proposition is tested for a number of comparators that have been proposed in the literature, following the procedures described by Viegas et al. (1999), Andrews et al. (2003) and Verbesselt et al. (2006).

Hypothetical indices
Plotting the percentage of days with index values in each "normalised" class (Fig. 4a) is sufficient to show that the indices have markedly different distributions. This should be a warning sign that parametric methods of index comparison may not be appropriate. Figure 4b shows the percentage of days in each class on which fires occurred, and reveals little consistency between indices with regard to which "normalised" class contains the highest proportion of fires.
Results for the Mahalanobis distance, the percentile analysis, the logistic regression statistics and the c-index are given in Table 1, along with the ranks of each index for each comparator. The only comparator that detects that indices A, B and C are effectively identical is the c-index. Also, c-index results are identical whether calculated using raw index values, "normalised" values or fitted values from the logistic regression, because the indicator uses ranks rather than values.
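The contrast reported in Table 1 can be reproduced in miniature. The sketch uses hypothetical stand-ins for indices A, B and C, since Eq. (5) is not reproduced here: a sinusoid $A$ peaking near mid-year, $B = \ln(1 + A)$ and $C = \exp(A/10)$. It computes $M_d$ (population SD assumed) and the c-index (pair-count form) on the ten fire-days. Because the stand-ins differ from the paper's indices, the numbers will not match Table 1, but the pattern should hold: $M_d$ varies between the three rank-equivalent indices while the c-index does not.

```python
import math
from statistics import mean, pstdev

FIRE_DAYS = {4, 143, 156, 170, 189, 201, 208, 222, 247, 262}
days = list(range(1, 366))
# Hypothetical stand-in for index A; B and C are assumed monotonic transforms of it.
a = [50.0 * (1.0 - math.cos(2.0 * math.pi * (d - 0.3) / 365.0)) for d in days]
indices = {"A": a, "B": [math.log1p(v) for v in a], "C": [math.exp(v / 10.0) for v in a]}
fire = [d in FIRE_DAYS for d in days]


def mahalanobis_distance(values, flags):
    """(mean on fire-days - mean on non-fire days) / population SD over all days."""
    on = [v for v, f in zip(values, flags) if f]
    off = [v for v, f in zip(values, flags) if not f]
    return (mean(on) - mean(off)) / pstdev(values)


def c_index(values, flags):
    """Share of (fire-day, non-fire day) pairs in which the fire-day is higher; ties = 0.5."""
    on = [v for v, f in zip(values, flags) if f]
    off = [v for v, f in zip(values, flags) if not f]
    wins = sum((x > y) + 0.5 * (x == y) for x in on for y in off)
    return wins / (len(on) * len(off))


results = {name: (mahalanobis_distance(v, fire), c_index(v, fire)) for name, v in indices.items()}
for name, (md, c) in results.items():
    print(f"index {name}: M_d = {md:.3f}, c-index = {c:.4f}")
```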
Ranking the fire-day percentiles of each index gives identical results for indices A, B and C, but index D differs from them (Fig. 5). Index D performs better at the fifth and tenth ranks, but more poorly at all others. The Theil-Sen robust regression line summarises the performance difference: indices A, B and C have a slope of 3.836 and an intercept of 58.90, whereas index D has a slope of 4.658 and an intercept of 52.60. Index D is, thus, concluded to have less skill than the others, which are of equal worth.
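The identical ranked-percentile results for A, B and C follow directly from the method working on percentiles, which a monotonic transformation cannot change. The sketch below checks this on hypothetical stand-ins ($A$ a mid-year sinusoid, $B = \ln(1 + A)$, $C = \exp(A/10)$). It uses the same assumed percentile convention (share of days at or below) and a definition-level Theil-Sen fit. The slope and intercept it prints belong to the stand-in, not to the paper's index A, so they are not expected to equal 3.836 and 58.90.

```python
import math
from bisect import bisect_right
from itertools import combinations
from statistics import median

FIRE_DAYS = {4, 143, 156, 170, 189, 201, 208, 222, 247, 262}
days = list(range(1, 366))
# Hypothetical stand-in for index A; B and C are assumed monotonic transforms of it.
a = [50.0 * (1.0 - math.cos(2.0 * math.pi * (d - 0.3) / 365.0)) for d in days]
indices = {"A": a, "B": [math.log1p(v) for v in a], "C": [math.exp(v / 10.0) for v in a]}


def ranked_fire_percentiles(values):
    """Fire-day percentiles (share of all days at or below), sorted ascending."""
    ordered = sorted(values)
    n = len(values)
    return sorted(100.0 * bisect_right(ordered, v) / n for d, v in zip(days, values) if d in FIRE_DAYS)


def theil_sen(ys):
    """Theil-Sen slope and intercept of ys against their ranks 1..len(ys)."""
    pts = list(zip(range(1, len(ys) + 1), ys))
    slope = median((y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in combinations(pts, 2))
    intercept = median(y - slope * x for x, y in pts)
    return slope, intercept


summary = {name: theil_sen(ranked_fire_percentiles(v)) for name, v in indices.items()}
for name, (slope, intercept) in summary.items():
    print(f"index {name}: slope = {slope:.3f}, intercept = {intercept:.2f}")
```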

Application example
The results for the Graz application example are shown in Table 2. The c-index suggests that the two indices perform identically, while the other methods differ in which index they recommend. The ranked percentile curves are shown in Fig. 6 and demonstrate that the performances of the two indices do in fact differ.

Discussion and conclusions
It has been argued that the transformations applied in our example are a valid means of developing improved indices, on the grounds that it is often necessary to transform input variables in order to improve the fit of a function. This, however, is not the same as comparing transformed outputs. Figure 7a shows indices A and B on linear axes. A visual appraisal of this figure would suggest that index A is superior to index B, as index A appears to provide better discrimination between fire-days and non-fire days, because index B is commonly quite high on non-fire days. Consider, though, what happens if the y-axes of the plots are displayed on a different scale. Figure 7b shows the identical indices on an exponential scale. The data and the relationship between each index and the day of year are identical to those in Fig. 7a; only the scale of the "y"-axis has changed. Yet now it "appears" that index B has greater discriminatory power than index A, because index A is commonly low on fire-days.
The parametric methods examined in this paper are essentially based on Euclidean distances in linear space. As pointed out by Wu (2005), there is no a priori reason to suppose that linear scaling is necessarily superior. The differences that some comparators find between the transformed indices are a result of the frequency distributions, just as the appearance of Fig. 7 is a result of the axis scale.
Apart from the c-index, all of the established fire index comparators that we examined here indicate different predictive power between the effectively identical indices A, B and C, suggesting that in many cases the differences they detect are spurious, resulting from the frequency distributions of the index values rather than from any real difference in predictive power.
The percentile analysis method of Andrews et al. (2003) is non-parametric and should in theory give identical values for indices A, B and C. The large differences we report for this method are an artefact of the way that Excel interpolates quantiles, which strongly suggests that Excel should not be used in this application. The R "quantile" function offers nine different ways of computing quantiles (see Hyndman and Fan, 1996), but it is unclear which would be appropriate for use within a fire index comparator, or whether different methods should be used for indices with different distributions. Conducting the percentile analysis with all nine computation methods in R shows that none of them produces identical values for indices A, B and C simultaneously. The problem with the Excel results may also be exacerbated by the low number of fire events in our example, as quantile calculation methods involve some degree of interpolation between data points. In practical applications (with large numbers of fire-days), this particular shortcoming of the method is unlikely to cause problems if R is used rather than Excel, but it may be important where smaller numbers of events (such as "multiple fire-days" or "large fire-days") are of interest.

The choice of which quantiles to compare is somewhat subjective and will sometimes influence the results of the comparisons. In our Graz application example, the percentile analysis method suggests that the Angström index is substantially better than the Nesterov (Table 2). The reason for this is apparent in Fig. 6, given that the 90th, 50th and 25th percentiles are used for comparison: at the 50th and 25th percentiles the curves for the two indices are at similar points, but at the 90th percentile the Angström curve is higher. Had the 75th percentile been used instead of the 90th, the percentile analysis method would have identified the Nesterov as the better index.
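Why interpolation interferes with a rank-based comparison can be illustrated with a single quantile. The sketch assumes Hyndman and Fan type 7 (linear interpolation, R's default) as one of the nine definitions. It applies it to ten hypothetical fire-day values, and to the same values after a logarithmic transformation. Interpolating between the 9th and 10th values on the raw scale and then transforming gives a different answer from interpolating on the transformed scale. So two rank-equivalent indices need not yield matching percentile scores, and the discrepancy is largest when there are few fire-days to interpolate between.

```python
import math
from typing import Sequence


def quantile_type7(values: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile (Hyndman and Fan type 7); 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


fire_values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 40.0, 50.0, 60.0, 70.0]  # hypothetical
logged = [math.log(v) for v in fire_values]

q_then_log = math.log(quantile_type7(fire_values, 0.9))  # interpolate, then transform
log_then_q = quantile_type7(logged, 0.9)                  # transform, then interpolate
print(q_then_log, log_then_q)
```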
The non-parametric method that we outline appears to avoid some of the shortcomings of parametric methods, correctly determining that the hypothetical indices A, B and C have equal predictive power. The method agrees with the c-index that index D is inferior to the other indices. The proposed method also offers an improvement over the c-index, in that it can distinguish between fire indices that have identical c-index scores, where such differences are real rather than artefacts of frequency distribution. Our Graz application case was selected to demonstrate this. The c-index for both the Angström and Nesterov indices is 0.816, yet the ranked percentile curves for each index have different characteristics. The higher intercept and flatter slope would lead us to accept the Nesterov as the better index in this application. Although Fig. 6 shows that the Angström index works well at very high values (above the 90th percentile), its performance below this level is comparatively poor, with several fires occurring when the index was between its 66th and 76th percentiles. For fires at the same ranks, the Nesterov index was between its 72nd and 79th percentiles. This is immediately clear from the figure, but the pattern can also be inferred from the fact that the Angström index has a lower intercept and a higher slope than the Nesterov.
Although substantial work remains to be done on determining acceptable methods for comparing fire indices, we have established that commonly used parametric methods may produce spurious results. Our proposed two-part non-parametric comparator is robust to differences in index distributions and can provide more useful information than current alternatives. Future investigations will be needed to determine its full worth, including in-depth mathematical analyses and application studies over a range of real-world datasets.

Fig. 1. Example figure for the proposed ranked-percentile method of index comparison. Fire danger index values for every day are expressed as percentiles, and the percentiles on days when fire occurs are plotted according to rank. The slope and intercept of a robust regression line through these points characterise the index.

Fig. 2. Hypothetical fire danger index values over the course of one year. Index A is sinusoidal, as defined in Eq. (5) (in text). Indices B and C are, respectively, logarithmic and exponential transformations of index A, and index D is a discontinuous function described in Equation set 6 (in text). Triangles indicate fire occurrence days.

Fig. 4. Characteristics of normalised indices. The top panel (a) shows the percentage of days falling in each index class, while the bottom panel (b) shows the percentage of days in each class that recorded a fire occurrence.

Fig. 7. Hypothetical fire danger index values over the course of one year, on linear (a) and exponential (b) "y"-axes.

Table 1. Andrews et al. (2003) ranks for each index applied to the 4 hypothetical indices. $M_d$ = Mahalanobis distance; $R^2_L$ = pseudo-$R^2$, as per Andrews et al. (2003); AIC = Akaike's Information Criterion; r = rank. Bold numbers in the rank column indicate the "best" index according to each method.

Table 2. Comparator values and ranks for the Angström and Nesterov indices in the Graz area. Bold numbers in the rank column indicate the "best" index according to each method.