Signal discrimination of ULF electromagnetic data with using singular spectrum analysis – an attempt to detect train noise

Electromagnetic phenomena associated with crustal activities have been reported in a wide frequency range (DC-HF). In particular, ULF electromagnetic phenomena are the most promising among them because of the deeper skin depth. However, ULF geoelctromagnetic data are a superposition of signals of different origins. They originated from interactions between the geomagnetic field and the solar wind, leak current by a DC-driven train (train noise), precipitation, and so on. In general, the intensity of electromagnetic signals associated with crustal activity is smaller than the above variations. Therefore, in order to detect a smaller signal, signal discrimination such as noise reduction or identification of noises is very important. In this paper, the singular spectrum analysis (SSA) has been performed to detect the DC-driven train noise in geoelectric potential difference data. The aim of this paper is to develop an effective algorithm for the DC-driven train noise detection.


Introduction
There are many reports on seismo-associated electromagnetic phenomena in a wide frequency range (Hayakawa and Molchanov, 2002;Molchanov and Hayakawa, 2008).Measurements of electromagnetic phenomena can be classified into three types; (1) passive ground-based observation, (2) active ground-based observation using transmitter signals, and (3) satellite observation (e.g.Hattori, 2004).Among these observational methods, one of the most promising, is ULF (ultra low frequency, with a frequency of less than 1Hz) electromagnetic observation on ground because Correspondence to: S. Saito (s-saito@graduate.chiba-u.jp) of skin depth (Fraser-Smith et al., 1990;Molchanov et al., 1992;Kopytenko et al., 1993;Hayakawa et al., 1996;Hattori et al., 2002;Hattori, 2004).ULF electromagnetic data (frequency range 0.001 ∼ 1 Hz) are considered a superposition of signals of different origins.Figures 1-3 show an example of ULF electromagnetic variation associated with a geomagnetic storm, precipitation, and train noise.The typical amplitude of them is introduced in Table 1.In the case of electric data, the first one is due to precipitation.The second one originated from the external source field associated with the solar-terrestrial interactions such as geomagnetic pulsations or geomagnetic storms and their induced fields which appear on a global (hundreds of km) scale.The third one is artificial noise associated with the leakage current from DCdriven trains.And the fourth one is propagating under the ground and are considered earthquake-related signals and are to be detected.The third and fourth ones are regional (a few tens of km) signals.In order to detect the weak, earthquakerelated signals, an effective signal discrimination will be required (Hayakawa et al., 1996(Hayakawa et al., , 2000;;Hattori et al., 2002Hattori et al., , 2004aHattori et al., , b, 2006;;Hattori et al., 2004;Harada et al., 2004, 2005, Telesca and Hattori, 2007;Telesca et al., 2008).In this paper, the singular spectrum analysis (SSA) (Golyandina et al., 2001) has been adopted to develop a signal discrimination method for detecting DC-driven train noises, which are considered the more intense component in the electric field data (Ishikawa et al., 2007).In this paper, the principle of SSA will be given and a simulation has been performed to evaluate the detection of modeled train noise in various parameters.The results of the simulation show the ability to detect train noise.Therefore, the developed method has been applied to the observed data.    2 Principle and procedure of SSA SSA is a kind of periodic analyses.The resolution is one data point and it is a model-free tool for detecting impulsive changes in the time series data.In this sense, SSA has an advantage in comparison with the Fourier and wavelet transform, although a large scale matrix should be handled.In this section, the procedure of SSA and its performance are briefly described.
6. Using the eigentriple, the following elements are obtained; The decomposed matrix X i is given by as follows: where x i ij with i = 1,.....,k and j = 1,....,L. 7. The ith principal components G i j are given by the diagonal averaging in the following; 8.Then, the original time series is divided into N components.(d) correspond to reconstructed trend component, the annual variation, the seasonal variation (the 4-month period), and the variation of the 6-month period, respectively.We can see changes of intensity of periodic components in time series clearly and this is a strong advantage of SSA. Figure 4e shows the residuals regarded as a noise component.

Capability of SSA for train noise detection
In this section, a simulation has been performed to check the capability of detection of the DC-driven train noise using SSA. Figure 5 depicts a simple model of the electric power circuit of DC-driven train system.When a train is running between substations A and C, it is receiving electric power supplies I AC and I CA from substation A and C, respectively.At this time, an electric current in a line flows through the railway and returns back to each substation as a feedback current.However, some of it leaks into the ground and returns as a leak current.Therefore, geoelectric potential difference measurements near railway tracks are strongly affected by  the current.It is important to identify the train noise at these stations.Here, we assume a rectangular pulse as a DC-driven train noise based on our previous experiences (Ishikawa et al., 2007) and simplicity.The procedure of the simulation is as follows: 1. Create time series data as modeled train noise and data with white noise as shown in Fig. 6. Figure 6a, b and  c shows a simple rectangular pulse with some duration, white noise adding to the modeled data, and simulated data with some noise a + b.
2. Perform SSA to the time series modeled data (Fig. 6a) and the simulated data (Fig. 6c) and decompose into N principal components as given in Eq. ( 9).The obtained decomposed components are described in Fig. 7a and b, respectively.
3. Compute correlations among the decomposed components of Figs.7a and b.The results are displayed in Fig. 8a .
4. Extract components with high correlation only (the correlation r > 0.7 in this paper) (see Fig. 8b).
5. Reconstruct a time series by using components satisfying the above condition (Fig. 9c).
The parameters of the simulation are signal noise ratio (SNR), duration of a rectangular pulse (D), number of the rectangular pulses (P ), and window length (WL).We define SNR using amplitudes of a rectangular signal and the white noise.In this paper, we pay attention to the SNR and WL dependences of the results for data length = 300.Figure 9 shows an example of the simulation with D = 20, P = 1, SNR = 4, and WL = 50.Figure 9a, b, c and d corresponds to the train noise model, the train noise with white noise, a reconstructed time series data with the above SSA reconstruction procedure, and mean squared error (MSE: ((a)-(c)) 2 , respectively.If MSE is within mean +σ , the reconstructed data mostly satisfy the criteria and are identical to the model.But at the edges of the rectangular pulse, MSE reaches high values.This means that the train noise cannot be removed completely.However, using these characteristics, the capability to detect this type noise has been found.

Application of SSA to ULF geoelectric potential difference data
Figure 12 shows a map of ULF electromagnetic stations and routes of the DC-driven railway at Boso Peninsula, Japan.
We have installed 3 ULF electromagnetic stations which measure three components of geomagnetic fields and two horizontal geoelectric potential differences (Hattori et al., 2004).The names of the stations are Kiyosumi (KYS), Uchiura (UCU), and Fudago (FDG) and the inter-sensor distance is about 5 km.The distance between the stations and the track is 5-10 km.The observed data at these stations are found to be contaminated with noise from the DC-driven railway system.Figure 13 shows the configuration map of   2 describes the timetable of the first train.The rectangular-shaped train noises are generated when the train accelerates and decelerates (Ishikawa et al., 2007).Therefore, the noise pattern resembles every day in the study region.The typical duration of the rectangularshaped noise is about 20 s.The other two stations (UCU and FDG) also register similar variations.When we investigated observed data for several years, the similar changes were recognized almost the same time in the case of the same diagram of the railways.Since transient changes in the Ex component were more apparent, hereafter we focussed on the Ex component.Instead of the rectangular pulse, we took a monthly average of the geoelectric data for a realistic model of the train noise.Figure 16a shows an example of the monthly average model for the first train of the day in September 2002.Figure 16b-d includes the daily variations of the geoelectric field on 1, 2 and 3 September.Although the variation on each day resembles others, the rectangular or sharp shape is lost and a waveform after a low pass filtered can be seen due to some difference of occurrence time and amplitude.This means that characteristics of a rather discrete or intermittent structure for the train disappear.For the simulation experiments in Sect.2.3, the simple rectangular pulse is adopted in the train noise model, but the model for the practical application in this section has only components with lower frequencies.This also is expected to provide larger values of MSE in investigation on detection of transient changes, such as the train noise as shown in Fig. 9d.Since the model is made from the monthly average, effects of random noises such as induction of geomagnetic pulsations are depressed.The exact same procedure described in the previous section is performed on the above-mentioned monthly average model and a practical variation on a certain day.The SNR between the rectangular pulse and background variation in Ex component for the real data as shown in Fig. 14 17a shows the monthly average model and the reconstructed variations using its decomposed components with high correlations.Figure 17b shows the variation on the day and the reconstructed variation using decomposed components with high correlations.Fig. 17c gives the variations of MSE and its mean value + standard deviation (σ ).When the MSE exceeds the criteria of mean +σ values, it seems that the detection of the train noise is successful.The red dots in Fig. 17b indicate the anomalous MSE values and it is safe to say that we can detect noise for the first train in the vicinity of the station.

is rather
The developed method is applied to a longer and daytime time series data.Figure 18a and b shows an example of results based on the one station analysis.It was found that the train noise can be detected but it seemed to provide many errors.In order to remove fault detections, multiple (three) station analysis was examined for improvement.Figure 19 shows an example of the results for the three station analysis.Figure 19a, c and e indicates the observed data of Ex at KYS, UCU, and FDG, respectively.Figure 19b, d and f shows the MSE and threshold values as described in Fig. 17c.The red colored dots are given by simultaneous satisfaction of the criterion of MSE > mean + σ for all the stations.It is safe to say that the faint changes seem to be reduced and a promising detection of the train noise results is achieved.
Finally, the developed method was applied to the daytime data.Table 3 and Fig. 20 describe the timetable of two trains and their running directions, respectively.Figure 21a and b shows the result of one station (KYS) analysis and Fig. 21c shows that of the three stations analysis and it is found that multiple station analysis is essential to detect the train noise.These results suggest that the three (multiple) station operation seems to be more effective for train noise detection.

Conclusion and discussion
We demonstrate the performance of the developed algorithm for the detection of DC-driven train noise based on SSA.By using multiple stations data, the detection of the DC-driven train noise seems to be promising and convincing.The developed method shows the effectiveness for the daytime data when multiple trains are in operation around the study area.
In this paper, we only show the ability to detect the train noise.This technology does, however, provide the increase of possible data.The developed method enables the analysis of daytime data in time series, although we did not use them for investigation on earthquake-related ULF electromagnetic phenomena because of contamination from intensive train noise so far.
A further and intrinsic problem is to remove the train noise from the data.We extract the intense factors from the observed time series data because the signal we want to investigate might be very weak.SSA has the capability to remove the known perturbations using an adequate model.If we can find out the possible model of the DC train current, there is a high chance of removing the DC-driven train noise.The possible candidates are the substation data (consumer current data) as shown in Fig. 5 and the monitoring of leak current in the vicinity of substations or railway tracks.

Fig. 4 .
Fig. 4.An example of SSA.A variation of the monthly average of air temperature at Chiba from January 1967 to December 2007 has been investigated.(a) The monthly average of air temperature and its trend, (b) the reconstructed annual component, (c) the reconstructed seasonal component (4-month period), (d) the reconstructed components of a six-month period, and (e) the noise component reconstructed from the residuals.

Fig. 5 .
Fig. 5. Schematic view of the electric circuit for a DC-driven train system.

Fig. 6 .
Fig. 6.The DC-driven train noise model for the simulation.(a) an example of the DC-driven train noise model, (b) white noise to be added to the simulated data, and (c) an example of simulated data (a + b).

Fig. 7 .
Fig. 7. Decomposed components using SSA.(a) 20 decomposed components for Fig. 6a.(b) 20 decomposed components for Fig. 6c.The number shown in each panel indicates the ith principal component of SSA decomposition.

Fig. 8 .
Fig. 8.The correlation diagram.The horizontal and the vertical axes correspond to the number of the principal component of the simulated data and the model data, respectively.(a) The correlation diagram between Fig. 6a and c.The brightness shows the correlation.(b) The correlation diagram with the correlation r > 0.7 between Fig. 6a and Fig. 6c.

Figure 4
Figure4shows an example of the application of SSA for temperature data.A red line in Fig.4ashows the variation of the monthly average of temperatures at Chiba, Japan from January 1967 to December 2007.The monthly average values are input data for SSA.The decomposed components are given in blue lines in Fig.4.Blue lines in (a), (b), (c), and (d) correspond to reconstructed trend component, the annual variation, the seasonal variation (the 4-month period), and the variation of the 6-month period, respectively.We can see changes of intensity of periodic components in time series clearly and this is a strong advantage of SSA.Figure4eshows the residuals regarded as a noise component.

Fig. 9 .
Fig. 9.An example of the simulation results in the case of SNR = 4, WL = 50, D = 20, N = 1.(a) The assumed DC-driven train noise, (b) the simulated data ((a),+ white noise), (c) reconstructed data of the model with high correlation using SSA procedure, and (d) MSE.

Fig. 10 .
Fig. 10.The variation of MSE with the SNR for various WLs.

Fig. 11 .
Fig. 11.The variation of MSE with the WL for various SNRs.

Fig. 12 .
Fig. 12.The map of ULF stations and DC-driven railway routes in Boso Peninsula.

Fig. 14 .
Fig. 14.ULF electromagnetic data when the first train of the day needs to cover the region near the stations.(a) Ex component (mV m −1 ), (b) Ey component (mV m −1 ), (c) Bx component (nT), (d) By component (nT), and (e) Bz component (nT).They are observed at KYS station on 1 September 2002.

Fig. 15 .
Fig. 15.The location of the stations and the traveling direction of the first train used in Figs.14, 16, 17, 18 and 19.

Fig. 16 .
Fig. 16.An example of the monthly averaged data for a model of a DC-driven train noise and real observed data at KYS station.(a) A monthly averaged model of the Ex component for September 2002, (b) Ex component for 1 September 2002, (c) Ex component for 2 September 2002, and (d) Ex component for 3 September 2002.

Fig. 17 .
Fig. 17.An example of the DC-driven train noise detection of the first train for Ex component at 20:15-20:20 UT on 1 September 2002.(a) The model data of the monthly average and the reconstruct data using SSA decomposed components with high correlation values, (b) the observed data and the reconstruct data using SSA decomposed components with high correlation values, and (c) MSE and the threshold values.

Fig. 18 .
Fig. 18.An example of the DC-driven train noise detection in Ex component.The analyzed period is 20:00-20:40, on 1 September 2002 (UT).(a) The observed data and the reconstructed data using SSA decomposed components with high correlation values.(b) MSE and the threshold values.

Fig. 19 .
Fig. 19.An example of the DC-driven train noise detection in Ex component with multiple station data.The analyzed period is 20:00-20:40, on 1 September 2002 (UT).(a, c and e) The observed data and the reconstructed data using SSA decomposed components with high correlation values at KYS, UCU, and FDG, respectively.(b, d and f) MSE and the threshold values at KYS, UCU, and FDG, respectively.

Fig. 20 .
Fig. 20.The location of the stations and traveling directions of two trains used in Fig. 21.

Figure 17
Figure 17 is an example of the results when the first train covers the region near the stations.It shows the variations of Ex from 20:15 to 20:20 UT on 1 September 2002.Figure17ashows the monthly average model and the reconstructed variations using its decomposed components with high correlations.Figure17bshows the variation on the day and the reconstructed variation using decomposed components with high correlations.Fig.17cgives the variations of MSE and Figure 17 is an example of the results when the first train covers the region near the stations.It shows the variations of Ex from 20:15 to 20:20 UT on 1 September 2002.Figure17ashows the monthly average model and the reconstructed variations using its decomposed components with high correlations.Figure17bshows the variation on the day and the reconstructed variation using decomposed components with high correlations.Fig.17cgives the variations of MSE and

Fig. 21 .
Fig. 21.An example of the DC-driven train noise detection in Ex component with a single station and multiple station data.The analyzed period is 03:00-04:00, on 1 September 2002 (UT).(a) The observed data and the reconstructed data using SSA decomposed components with high correlation values at KYS station.Red dots indicate candidates of train noise using a single station threshold.(b) MSE and the threshold values at KYS. (c) The observed data and the reconstructed data using SSA decomposed components with high correlation values at KYS station.Red dots show candidates of train noise using a multiple station threshold.

Table 1 .
ULF electromagnetic changes associated with some sources.

Table 2 .
Railway stations and the timetable for the first train.