Novel method for hurricane trajectory prediction based on data mining

This paper describes a novel method for hurricane trajectory prediction based on data mining (HTPDM) that exploits the hurricane's motion characteristics. First, all frequent trajectories in the historical hurricane trajectory database are mined using association analysis, and the corresponding association rules are generated as motion patterns. Then, current hurricane trajectories are matched against these motion patterns to make predictions. If no matching association rule is found, a prediction based on the hurricane's current movement trend is returned. All experiments are conducted with the Atlantic weather Hurricane/Tropical Data from 1900 to 2008. The experimental results show that the prediction accuracy is 57.5 % when trajectories that fail to match are included, and 65 % when only successfully matched trajectories are counted.


Introduction
With the rapid development of the World Wide Web (WWW) and wireless communication technologies, mobile communication and mobile computing technologies have been widely used in various fields. Mobile communication equipment, animal migrations, traffic and transportation, and cloud cluster tracking are all instances of moving objects in specific application areas (Morzy, 2007). Related technologies, such as sensor networks, global positioning systems and satellite data services, collect and provide large amounts of behavioural data on moving objects, which poses great challenges for analysing their inherent regularities. Mobile path prediction has become a hot topic in many research areas.
Hurricanes are tropical cyclones with sustained winds of at least 64 kt (119 km h⁻¹, 74 mph). On average, more than five tropical cyclones become hurricanes in the United States each year, causing great human and economic losses (Su et al., 2010). Trajectory prediction, as the most important measure for reducing these losses, has therefore become a hot issue in the field of mobile path prediction.
In this paper, we focus on a hurricane trajectory prediction method based on data mining. The proposed method dispenses with the complex modelling process of traditional objective forecast methods. Instead, it identifies effective motion patterns in the historical trajectory database using association analysis and then predicts future trajectories by pattern matching. The overall framework of the method is shown in Fig. 1. First, after data pre-processing, all frequent trajectories from 1900 to 2000 in the historical hurricane trajectory database are mined according to a given minimum support, and all corresponding association rules are generated as motion patterns. Second, the current hurricane trajectories from 2001 to 2008 are matched with the motion patterns to make predictions. If no matching association rule is found, a prediction based on the hurricane's current movement trend is returned. Finally, the correctness of the method is verified.

Instructions
Data preprocessing 1: the data from this stage are used for frequent trajectory mining.
Data preprocessing 2: the data from this stage are used for pattern matching and correctness verification. Details are given in Sect. 7.1.

Related work
Driven by the different behavioural characteristics of moving objects, many new data mining approaches to mobile path prediction continue to emerge.
Approaches based on association analysis have attracted increasing attention. Zhao et al. (2008) proposed MPP, a mobile device location prediction algorithm based on the AprioriAll algorithm, which avoids the state-space expansion problem of order-K Markov predictors. Long et al. (2009) proposed an effective and simple trajectory prediction algorithm (E³TP), which mines frequent path models with the FP-Growth algorithm and incorporates speed, one of the most important movement characteristics. However, FP-Growth requires a large amount of storage space, so its performance is not ideal for large data sets. Moreover, E³TP is suited to unordered sequences rather than ordered time series. Kim et al. (2007) proposed a method for efficiently predicting the future path of an object by analysing past trajectories whose changing pattern is similar to that of the query object's current trajectory. However, this method applies only to road networks and imposes many restrictive conditions. Morzy (2007) mines a database of moving object locations to discover frequent trajectories and movement rules, and then matches the moving object's trajectory against the movement rule database to build a probabilistic model of the object's location.
A hurricane is a strong, deep tropical cyclone generated over the Eastern Atlantic and the North Pacific; such storms are also known as typhoons or cyclones. Hurricane movement is regarded as disengaged motion and is generally accompanied by strong wind and heavy rain. The formation of hurricanes has long been studied by researchers in many disciplines (Rozanova, 2004). Much progress has been made over the last decade in understanding the underlying physical processes and in the quality of operational hurricane prediction (Chan, 2005; Weber, 2005). Current prediction methods mainly comprise numerical prediction, objective forecasting and comprehensive forecasting; among these, the objective forecast method, which is based on statistical dynamics, has become increasingly popular in recent years because of its higher precision. The objective forecast methods commonly used for hurricane trajectory prediction are:
1. Persistence and Climatology Method (PC method). It is one of the most widely used forecast methods and has the advantages of simple calculation and high prediction precision.
2. Integrated Forecast Method. It is a comprehensive forecast method that combines the calculations of many techniques, such as stepwise regression, multiple regression, stepwise multilevel discriminant analysis, analysis of variance, fitting error analysis and autoregression. It runs quickly and can forecast many items.
3. Probability Forecast Method. It includes techniques such as REEP, discrete likelihood estimation, uncertain consumption, Bayesian methods and Markov chain experience statistics.
However, the objective forecast method is complex because it takes many factors into account, such as phase transitions, vertical advection and boundary layer effects. Moreover, the assumptions made about the underlying physical processes in order to simplify the original system often ignore some important properties (Rozanova et al., 2010). For example, because it relies on multiple regression analysis, the PC method must select predictors from many candidate factors according to their F values to construct the prediction model; however, the choice of predictors inevitably affects the prediction model itself.
Therefore, some researchers have begun to use data mining technology to predict hurricane trajectories. Zou et al. (2008) developed a model that forecasts typhoon tracks from historical data based on the similarity of key points on typhoon tracks. The centres of active typhoons are compared with historical records of similar typhoons, the historical tracks are weighted by their similarity, and the centre positions are then quickly forecast from these similarity weights. Kim et al. (2011) demonstrated the feasibility of a straightforward metric that incorporates the entire shapes of all tracks into the fuzzy c-means clustering method; this method is suitable for data whose cluster boundaries are ambiguous. Kim et al. (2012) proposed a seasonal tropical cyclone forecast model based on track patterns, which combines fuzzy c-means clustering with statistical dynamics. In addition, other techniques, such as grey theory and neural networks, have also been introduced into hurricane forecasting.
Others have examined the problem of forecasting hurricane intensification with data mining techniques and have achieved some success. Su et al. (2010) proposed a new hurricane intensity prediction model, WFL-EMM, which is based on the data mining techniques of feature weight learning (WFL) and the Extensible Markov Model (EMM). Chatzidimitriou et al. (2005) formulated tropical storm intensification prediction as a supervised data mining problem, the objective being to produce accurate early warnings of changes in the wind speed of a particular storm. They examined two alternative approaches to discovering classification rules from current hurricane data: particle swarm optimisation and class association rules.

Region discretization
The raw hurricane trajectory data are the Atlantic weather Hurricane/Tropical Data (1900 to 2008), obtained from the website http://weather.unisys.com/hurricane/. The data from 1900 to 2000 are used to mine the motion patterns, and the remaining data are used for prediction and verification.
Figure 2 shows the raw information of a particular hurricane. Unfortunately, the domain of position coordinates is continuous and the raw data are recorded at a very fine level of granularity, so patterns discovered directly from the raw data cannot be generalised. To overcome this problem, we transform the original paths of moving objects into trajectories expressed at a coarser level (Morzy, 2007).
The moving region is divided into square areas of the same size, and each trajectory is converted into a sequence of areas.
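A minimal Python sketch of this discretisation, assuming each raw point is a (lat, lon) pair in degrees and that each square cell is `cell_size` degrees on a side. The annex note sets the region size to 5 × 5 without stating a unit, so the default of 5.0 degrees is an assumption, as is merging consecutive points that fall in the same cell. The names `to_cell` and `discretise` are hypothetical.

```python
import math

Cell = tuple[int, int]  # (row, column) index of a square area


def to_cell(lat: float, lon: float, cell_size: float = 5.0) -> Cell:
    """Map a continuous position to the integer index of its square area."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def discretise(points: list[tuple[float, float]], cell_size: float = 5.0) -> tuple[Cell, ...]:
    """Convert a raw (lat, lon) path into a sequence of areas.

    Consecutive points in the same area are merged into one element
    (an assumption, so that a slow-moving storm yields no repeated cells).
    """
    cells: list[Cell] = []
    for lat, lon in points:
        cell = to_cell(lat, lon, cell_size)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return tuple(cells)


path = [(12.1, -40.3), (13.0, -42.8), (15.6, -44.9), (18.2, -46.0)]
print(discretise(path))  # ((2, -9), (3, -9), (3, -10))
```

Coarser cells make more historical tracks share the same area sequence, which is what lets patterns generalise; the trade-off is a less precise prediction.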

Frequent trajectory mining
The pseudo code for the HTPDM algorithm is given in the annex for a transaction database TD, a support threshold minsup and a confidence threshold minconf. Frequent trajectory mining, the first step, mines all frequent trajectories with the Apriori algorithm according to the user-defined minimum support threshold.
The Apriori algorithm (Agrawal and Srikant, 1994) is a classical algorithm for association rule mining. Its name reflects its use of prior knowledge about frequent itemsets, namely that every non-empty subset of a frequent itemset is also frequent. Apriori uses a level-wise, iterative approach: it first generates candidates and then tests them to remove infrequent itemsets. Most previous studies adopted this Apriori-like candidate generation-and-test approach. A hurricane trajectory, however, differs from a traditional itemset: it is a time series that changes over time (Qin and Shi, 2006). Mining frequent trajectories from such series can therefore be viewed as a sequential pattern mining problem. Before introducing the algorithm, we first define some related concepts.
Trajectory length. The length of a trajectory $t_1$ is the number of elements <lat, lon> in the trajectory sequence, denoted $|t_1|$ or length($t_1$). A trajectory of length x is called an x-sequence. When x equals one, the trajectory is called a unit trajectory, i.e. a 1-sequence.
Adjacent unit trajectories. Let $t_1 = \{P_1\}$ and $t_2 = \{P_2\}$ be unit trajectories. If the square regions they represent share an edge, or at least one of the sequences $\{P_1, P_2\}$ and $\{P_2, P_1\}$ is a sub-trajectory of some trajectory, $t_1$ and $t_2$ are said to be adjacent. Adjacent $t_1$ and $t_2$ can be merged into a 2-sequence trajectory, denoted $t_1 \wedge t_2 = \{P_1, P_2\}$ or $t_2 \wedge t_1 = \{P_2, P_1\}$.
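The adjacency test can be sketched as follows, assuming cells are (row, column) grid indices and that "share an edge" means the 4-neighbourhood (diagonal cells are not edge-sharing). The names `share_edge`, `occur_consecutively` and `adjacent` are hypothetical; `db` stands for the discretised trajectory database.

```python
Cell = tuple[int, int]


def share_edge(p1: Cell, p2: Cell) -> bool:
    """True if two square areas share an edge (4-neighbourhood)."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1]) == 1


def occur_consecutively(p1: Cell, p2: Cell, db: list[tuple[Cell, ...]]) -> bool:
    """True if {p1, p2} or {p2, p1} is a sub-trajectory of some trajectory in db."""
    for traj in db:
        for a, b in zip(traj, traj[1:]):
            if (a, b) == (p1, p2) or (a, b) == (p2, p1):
                return True
    return False


def adjacent(p1: Cell, p2: Cell, db: list[tuple[Cell, ...]]) -> bool:
    """Adjacency of the unit trajectories {p1} and {p2}, following the definition above."""
    return share_edge(p1, p2) or occur_consecutively(p1, p2, db)
```

The second clause matters when a storm jumps diagonally or skips a cell between observations: the two areas do not share an edge, yet the historical data show them as consecutive.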

Frequent trajectory set.
A trajectory t is frequent if its support exceeds a user-defined minimum support threshold minsup (a real number between 0 and 1). The set of all frequent k-sequences is denoted $F_k$, and the collection of all frequent trajectories is called the frequent trajectory set, denoted FreTraSet.
Apriori principle. If a trajectory is frequent, then all its sub-trajectories must also be frequent. Conversely, if a trajectory is infrequent, then all trajectories that contain it as a sub-trajectory must also be infrequent.
Frequent trajectory mining is the first step of the proposed HTPDM algorithm. First, it scans the database TD to calculate the supports of all unit trajectories and selects the frequent 1-sequences $F_1$ by comparison with minsup. Then, for k = 2, 3, ..., it generates the candidate set $C_k$ of length-k trajectories from the frequent set $F_{k-1}$ of length-(k − 1) trajectories and prunes the candidates that have an infrequent sub-pattern. After that, it scans TD again to determine the frequent set $F_k$ among the candidates. These steps are repeated until no new frequent trajectories are found.
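A minimal level-wise sketch of this step. The support formula is not reproduced in this section, so the code assumes that the support of a trajectory is the fraction of trajectories in TD that contain it as a contiguous run of cells, and it keeps a candidate only when its support strictly exceeds minsup, as the definition above states. The names `contains`, `support` and `mine_frequent` are hypothetical, and this is not the annex pseudo code.

```python
Trajectory = tuple  # a sequence of hashable cells


def contains(traj: Trajectory, sub: Trajectory) -> bool:
    """True if `sub` occurs in `traj` as a contiguous run of cells."""
    k = len(sub)
    return any(traj[i:i + k] == sub for i in range(len(traj) - k + 1))


def support(db: list[Trajectory], sub: Trajectory) -> float:
    """Assumed support: fraction of database trajectories that contain `sub`."""
    return sum(contains(traj, sub) for traj in db) / len(db)


def mine_frequent(db: list[Trajectory], minsup: float) -> dict[Trajectory, float]:
    """Apriori-style mining of frequent contiguous sub-trajectories (FreTraSet)."""
    fre_tra_set: dict[Trajectory, float] = {}
    # First scan: every unit trajectory is a candidate for F_1.
    candidates = {(cell,) for traj in db for cell in traj}
    while candidates:
        # Scan TD and keep the candidates whose support exceeds minsup (F_k).
        level: dict[Trajectory, float] = {}
        for cand in candidates:
            s = support(db, cand)
            if s > minsup:
                level[cand] = s
        fre_tra_set.update(level)
        # Join F_k with itself: a without its first cell must equal b without
        # its last cell. The result has a as its prefix and b as its suffix,
        # its only length-k contiguous sub-trajectories, so no separate prune
        # step is needed for contiguous patterns.
        candidates = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
    return fre_tra_set


db = [("a", "b", "c"), ("a", "b", "d"), ("a", "b", "c", "d")]
for traj, sup in sorted(mine_frequent(db, 0.5).items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(traj, round(sup, 3))
```

The loop ends because no candidate can be longer than the longest trajectory in TD and still have non-zero support.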

Association rule generating
After frequent trajectory mining, the corresponding association rules are generated according to the user-defined minimum confidence threshold and stored in the database as motion patterns; that is, frequent trajectories are transformed into movement rules. A movement rule is an expression of the form $h \Rightarrow t - h$, where t is a frequent trajectory, h is the rule's premise and $t - h$ is the rule's conclusion. The following concepts relate to movement rules.
Support Of Movement Rule.The support of h ⇒ t − h is defined as the support of trajectory t.

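$$\mathrm{support}(h \Rightarrow t - h) = \mathrm{support}(t) \qquad (2)$$

Confidence Of Movement Rule. The confidence of $h \Rightarrow t - h$ is the ratio of the support of t to the support of its premise h:

$$\mathrm{conf}(h \Rightarrow t - h) = \frac{\mathrm{support}(t)}{\mathrm{support}(h)}$$

Association rule generating is the second step of the proposed HTPDM algorithm. Its objective is to generate all corresponding association rules whose confidence exceeds the user-defined minimum confidence threshold minconf.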
A frequent k-sequence can generate k − 1 association rules of the form $h_x \Rightarrow t - h_x$. For example, the trajectory t = {a, b, c, d} can be decomposed into three rules: {a} ⇒ {b, c, d}, {a, b} ⇒ {c, d} and {a, b, c} ⇒ {d}. For each rule $h_x \Rightarrow t - h_x$, if its confidence conf = sup(t)/sup($h_x$) is greater than minconf, the rule is stored in the database.
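The prefix decomposition and confidence test above can be sketched as follows, assuming the supports are held in a dictionary keyed by trajectory, such as the one produced in the mining step. Because each premise is a prefix of a frequent trajectory, the Apriori principle guarantees its support is in the dictionary. The name `generate_rules` and the tuple layout of a rule are hypothetical.

```python
def generate_rules(fre_tra_set: dict[tuple, float], minconf: float) -> list[tuple]:
    """Split every frequent trajectory t into premise h and conclusion t - h.

    A k-sequence yields k - 1 candidate rules (one per prefix length x);
    a rule is kept when conf = sup(t) / sup(h) exceeds minconf.
    """
    rules = []
    for t, sup_t in fre_tra_set.items():
        for x in range(1, len(t)):
            h, conclusion = t[:x], t[x:]
            conf = sup_t / fre_tra_set[h]  # h is a prefix of t, hence also frequent
            if conf > minconf:
                rules.append((h, conclusion, sup_t, conf))
    return rules


supports = {("a",): 1.0, ("b",): 1.0, ("c",): 0.5,
            ("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "b", "c"): 0.25}
for h, conclusion, sup, conf in generate_rules(supports, minconf=0.3):
    print(h, "=>", conclusion, "support", sup, "conf", conf)
```

Here {a} ⇒ {b, c} is discarded (conf = 0.25 / 1.0 = 0.25), while {a, b} ⇒ {c} is kept (conf = 0.25 / 0.5 = 0.5).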

Pattern matching for predicting
This part is the last step of the HTPDM algorithm. It first matches each pre-processed hurricane trajectory to be predicted against every rule generated in the previous step and chooses the optimal rule according to the evaluation function. If no matching association rule can be found, a predicted point based on the movement trend is returned. The hurricane's actual future trajectory is then compared with the trajectory predicted by pattern matching, and whether the prediction is correct is determined by computing the distance between their centre points. Some related concepts are as follows.
Matching Length. Let $t_1 = \{P^1_1, P^1_2, P^1_3, \ldots, P^1_n\}$ and $t_2 = \{P^2_1, P^2_2, P^2_3, \ldots, P^2_k\}$ be two trajectories. We say that the matching length of $t_1$ and $t_2$ is c if there exists a positive integer c such that
Otherwise, the matching length is 0.

Evaluation Function. The value of the evaluation function quantifies how well the current trajectory matches a motion pattern: the higher the value, the better the match. Its impact factors are the rule's confidence and the matching length. The evaluation function is defined as follows.
where conf is the rule's confidence and $l_{match}$ is the matching length of the current trajectory with the rule's premise. Using Eq. (4), we can calculate the evaluation function's value in different cases. As Table 1 shows, the influence of $l_{match}$ becomes more pronounced as the confidence increases. When two rules yield the same evaluation function value, e.g. because their conf and $l_{match}$ are identical, the rule whose conclusion is longer is selected for pattern matching.
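A sketch of the rule-selection step under two loudly stated assumptions. First, the matching condition is not reproduced in this section, so `matching_length` uses a hypothetical reading: the largest c such that the last c cells of the current trajectory equal the last c cells of the rule's premise. Second, Eq. (4) is not reproduced either, so the evaluation function is passed in as a parameter; `placeholder_evaluate` (conf × l_match) is only a stand-in and is not Eq. (4). The tie-break on conclusion length follows the text. All names are hypothetical.

```python
from typing import Callable, Optional

Rule = tuple[tuple, tuple, float]  # (premise h, conclusion t - h, confidence)


def matching_length(current: tuple, premise: tuple) -> int:
    """Hypothetical reading: longest common suffix of the current trajectory and the premise."""
    c = 0
    limit = min(len(current), len(premise))
    while c < limit and current[-1 - c] == premise[-1 - c]:
        c += 1
    return c


def select_rule(current: tuple, rules: list[Rule],
                evaluate: Callable[[float, int], float]) -> Optional[Rule]:
    """Return the rule with the highest evaluation value.

    Ties go to the rule with the longer conclusion. None means no rule
    matched, so the caller must fall back to the movement-trend prediction.
    """
    best, best_key = None, None
    for premise, conclusion, conf in rules:
        l_match = matching_length(current, premise)
        if l_match == 0:
            continue  # matching length 0: the rule does not match
        key = (evaluate(conf, l_match), len(conclusion))
        if best_key is None or key > best_key:
            best, best_key = (premise, conclusion, conf), key
    return best


def placeholder_evaluate(conf: float, l_match: int) -> float:
    """Stand-in scoring function for illustration only (not Eq. (4))."""
    return conf * l_match


rules = [(("a", "b"), ("c",), 0.5), (("b",), ("d", "e"), 0.5), (("q",), ("z",), 0.9)]
print(select_rule(("x", "a", "b"), rules, placeholder_evaluate))
```

Swapping in the paper's Eq. (4) only requires passing a different `evaluate` callable; the matching and tie-breaking logic stays the same.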
If no association rule is found for matching, a predicted point based on the movement trend is returned. For example, let $t = \{P_1, \ldots, P_{k-1}, P_k\}$ $(k > 2)$ be a current trajectory, where

Standard For Correct Prediction. Let m be the smaller of the length of the actual future trajectory and the length of the trajectory predicted by pattern matching. Take the first m items of each trajectory and compute their centre points. If the distance between the two centre points is not greater than 1, the prediction is considered correct.
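The correctness standard can be sketched as below, assuming points are (row, column) grid coordinates, that a trajectory's centre point is the mean of its points, and that distance is Euclidean in grid units (the text does not state the metric). The trend fallback formula is not reproduced in this section, so `trend_prediction` is a hypothetical linear extrapolation of the last displacement, not the paper's formula. All names are hypothetical.

```python
import math

Point = tuple[float, float]


def trend_prediction(current: list[Point]) -> Point:
    """Hypothetical fallback: repeat the last displacement P_{k-1} -> P_k once more."""
    (x1, y1), (x2, y2) = current[-2], current[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def centre(points: list[Point]) -> Point:
    """Mean position of a list of points (assumed definition of the centre point)."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def is_correct(actual: list[Point], predicted: list[Point], max_dist: float = 1.0) -> bool:
    """Compare the centre points of the first m items, m = min of the two lengths."""
    m = min(len(actual), len(predicted))
    if m == 0:
        return False
    return math.dist(centre(actual[:m]), centre(predicted[:m])) <= max_dist


actual = [(0, 0), (1, 0), (2, 0)]
print(is_correct(actual, [(0, 1), (1, 1)]))  # True: centres (0.5, 0) and (0.5, 1) are 1 apart
print(is_correct(actual, [(0, 2), (1, 2)]))  # False: centres are 2 apart
```

Truncating both trajectories to the same length m keeps a long predicted conclusion from being penalised merely because the verification data end sooner.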

Experiment
All experiments were conducted on a PC with a Pentium T2390 CPU, 1 GB of RAM and a SATA hard drive, running Windows XP SP2 Home Edition. The algorithms and the front-end application were implemented in C# and run on the Microsoft .NET 2.0 platform. The experimental data are stored in Microsoft SQL Server 2000.
The results of frequent trajectory mining and association rule generation with the HTPDM algorithm are shown in Fig. 5, and the predicted results are shown in Fig. 6.

Analysis
Throughout the prediction process, minsup (minimum support) and minconf (minimum confidence) are the parameters with the greatest impact on prediction accuracy. minsup directly controls frequent trajectory mining, whereas minconf acts indirectly, controlling the generation of association rules from the frequent trajectories. The first correct rate reflects the accuracy of the data mining technique itself. As minconf increases, and excluding the last abnormal point, this rate shows an overall rising trend and always stays above 60 %. This is because a larger minconf yields association rules of higher credibility, so the error rate is smaller. As minconf increases further, however, fewer association rules are generated, so the accuracy becomes unstable and abnormal points appear, as shown in Fig. 8. The second correct rate reflects the effectiveness of the data mining technique for the prediction system as a whole. As minconf increases, this rate is initially stable and remains above 45 %. When minconf increases beyond 0.4, however, it drops rapidly to about 10 %. This is because, whenever the system cannot find a matching pattern, it returns a result based on the movement trend, and the accuracy of this fallback is fairly low.

Minsup and the prediction accuracy
Figure 8 shows that the prediction accuracy is best when minconf = 0.25, so we set minconf to 0.25.
Figure 9 shows the prediction accuracy for varying values of the minsup threshold with minconf fixed at 0.25. As minsup increases, both correct rates fluctuate slightly in places but decrease overall. For a fixed minconf, a larger minsup reduces the number of frequent trajectories and hence the number of association rules, so fewer current trajectories find a matching pattern. Together, these effects reduce the prediction accuracy.
Repeated experiments show that prediction accuracy is best when minsup is set to 0.003 and minconf to 0.25. With these settings, 160 of the 214 trajectories match the motion patterns successfully; of these, 104 are predicted correctly and 56 incorrectly, an accuracy of 65 %. The other 54 trajectories fail to match the patterns; of these, 19 are predicted correctly and 35 incorrectly, an accuracy of 35.2 %. Over all 214 trajectories, 123 are predicted correctly and 91 incorrectly, so the accuracy of the whole prediction system is 57.5 %.
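The three reported rates follow directly from the counts above. A quick check, assuming each rate is simply the number of correct predictions divided by the number of trajectories in the group (the variable names are illustrative):

```python
matched_correct, matched_wrong = 104, 56    # trajectories that matched a motion pattern
fallback_correct, fallback_wrong = 19, 35   # trajectories predicted from the movement trend


def rate(correct: int, wrong: int) -> float:
    """Fraction of correct predictions within one group of trajectories."""
    return correct / (correct + wrong)


first_rate = rate(matched_correct, matched_wrong)                   # 104 / 160
fallback_rate = rate(fallback_correct, fallback_wrong)              # 19 / 54
overall_rate = rate(matched_correct + fallback_correct,
                    matched_wrong + fallback_wrong)                 # 123 / 214

print(f"{first_rate:.1%} {fallback_rate:.1%} {overall_rate:.1%}")  # 65.0% 35.2% 57.5%
```

The overall rate is a weighted mix of the other two, which is why the many low-accuracy fallback predictions pull it below the matched-only figure.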

Conclusions
This paper proposes a novel method for hurricane trajectory prediction based on data mining, which integrates association analysis technology and is evaluated on real American Atlantic hurricane data. Unlike traditional dynamic modelling forecasts, which are affected by multiple factors, the method works as follows. First, all frequent trajectories in the historical hurricane trajectory database are mined using association analysis, and the corresponding association rules are generated as motion patterns. Then, the current hurricane trajectories are matched against the motion patterns to make predictions. If no matching association rule is found, a prediction based on the hurricane's current movement trend is returned. The experiments show that the method achieves a prediction accuracy of 65 % for successfully matched trajectories and 57.5 % over all trajectories.
3. Developing more effective matching strategies and finding more useful evaluation functions.

Fig. 2. A particular hurricane: (a) the summary information; (b) the schematic diagram of its trajectory; (c) the trajectory information.

Fig. 3. An original path of a moving object.
a. Set the region size to 5 × 5, so that for most processed trajectory points each coordinate changes by no more than one unit between consecutive points;
b. Process the raw trajectories as explained in Sect. 3. A complete trajectory is stored as a string, such as {1, 2; 2, 3; 3, 4 . ..};
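The string storage format is only shown by the single example above, so the following is a sketch under the assumption that a trajectory is written as cell index pairs "row, column" separated by semicolons and wrapped in braces. The helper names `to_storage_string` and `from_storage_string` are hypothetical.

```python
def to_storage_string(cells) -> str:
    """Serialise (row, column) cells as '{r, c; r, c; ...}' (format inferred from the example)."""
    return "{" + "; ".join(f"{r}, {c}" for r, c in cells) + "}"


def from_storage_string(text: str) -> tuple:
    """Parse the assumed '{r, c; r, c; ...}' format back into a tuple of cells."""
    body = text.strip().strip("{}").strip()
    if not body:
        return ()
    return tuple(tuple(int(v) for v in pair.split(",")) for pair in body.split(";"))


s = to_storage_string([(1, 2), (2, 3), (3, 4)])
print(s)                       # {1, 2; 2, 3; 3, 4}
print(from_storage_string(s))  # ((1, 2), (2, 3), (3, 4))
```

Storing each trajectory as one string keeps it in a single database column, at the cost of parsing it back into cells before mining or matching.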

Figure 8 presents the prediction accuracy for varying values of the minconf threshold with minsup fixed at 0.003. The calculation methods for the prediction accuracy are shown as follows:

Fig. 7. The predicted result for the trajectory with ID 126. The black part is the current trajectory used for prediction; the blue part is the actual future trajectory used for verification; the red part is the trajectory predicted by data mining.

Table 3. The data in the experiment_data table.