A Homogeneous Earthquake Catalogue for Turkey and Surrounding Region

A new earthquake catalogue for Turkey and the surrounding region (32°-47° N, 20°-52° E) is compiled for the period 1900-2017. The earthquake parameters are obtained from the Bulletin of the International Seismological Centre, which was fully updated in 2020. New conversion equations between moment magnitude and the other scales (md, ML, mb, Ms and M) are determined using the General Orthogonal Regression method to build a homogeneous catalogue, which is the essential dataset for seismic hazard studies. The 95% confidence intervals are estimated using the bootstrap method with 1000 samples. The equivalent moment magnitudes (Mw*) for the entire catalogue are calculated using the magnitude relations to homogenise the catalogue. The magnitude of completeness is 2.9 Mw* for the overall catalogue and 3.0-3.2 Mw* for Turkey and Greece generally. The final dataset is not declustered or truncated using a threshold magnitude because the motivation is to generate a widely usable catalogue. It contains not only Mw* but also the average and median of the observed magnitudes for each event. In contrast to the limited earthquake parameters in previous catalogues, 45 parameters of approximately 700k events that occurred in a wide area from the Balkans to the Caucasus are presented.


Introduction
Earthquake catalogues are the first output of seismological observations. National and international catalogues are generated by several institutions around the world to understand the seismic activity of a region. Principally, a catalogue contains parameters such as origin time, coordinates and focal depth. Although the magnitude of an earthquake, which is a dimensionless measure of energy release, is one of the main seismological parameters, it has different scales (types) based on different seismic wave types and determination approaches (Table 1). A catalogue may not contain all magnitude scales for an event. If an earthquake catalogue is used only to show seismicity on a map, the magnitude type may not be important because the differences among the scales are too small to matter for visualisation. However, the magnitude scale used in energy calculation is crucial for seismic hazard studies.
Magnitude calculations by institutions involve several unknowns arising from the equations used, the structure of the seismic network, human error, etc. The amplitude and distance constants in the magnitude equations are the major items. Although they are specific to a region because of seismic wave attenuation in the crust and mantle, the constants calculated from Californian earthquakes (e.g. for local magnitude by Richter, 1935; Hutton and Boore, 1987) are widely used. In addition, individual magnitudes are calculated at each station for an event and then averaged. The averaged magnitude depends on several factors: the number of stations, the standard deviation of the average, amplification or attenuation due to the geological structure beneath the station, and the radiation pattern of the seismic waves in relation to the azimuthal distribution of stations. Therefore, institutions report different magnitudes for an event. Another issue highlighted in this study concerns moment magnitude (Mw) in catalogues. Mw is determined using waveform modelling for events (Mw ≥ 3.5-4.0) that have a high signal-to-noise ratio. However, a few institutes report Mw for small events (Mw < 3.0, e.g. 25.01.1999 13:06 Mw = 1.8 by the Cyprus Geological Survey Department; 29.05.2014 01:14 Mw = 1.8 by the Earthquake Research Center, Ataturk University). These magnitudes are evidently determined using a conversion relationship, but this type of human error cannot be proved. Consequently, there is more than one magnitude value for an event, with known and unknown calculation errors, whereas only one scale for each event must be used in studies based on parametric data, such as hazard mitigation analyses. This is why a homogenised catalogue with a common magnitude is essential.
https://doi.org/10.5194/nhess-2020-368 Preprint. Discussion started: 21 November 2020 © Author(s) 2020. CC BY 4.0 License.
In the last two decades, studies on unifying earthquake magnitudes and generating improved catalogues have been carried out for different parts of the earth (e.g. Grünthal et al., 2009; Chang et al., 2016; Manchuel et al., 2018; Rovida et al., 2020).
This study focuses on the earthquakes that occurred in Turkey and its surroundings. The region is one of the most geodynamically active areas on the earth and is deformed between the Eurasian, African and Arabian plates (Fig. 1). Both the continental collision between Arabia and Eurasia and the subduction of the African Plate beneath Eurasia started in the Early/Middle Miocene (11-23 Ma). The interactions of the three plates are the major driving forces for the tectonics of the region. The plate motions result in thrust faulting in Eastern Anatolia, the Caucasus and Iran, normal faulting in Western Turkey and Greece, and transform faulting due to escape to the west and east (see Bozkurt, 2001 for a brief synthesis). The complex tectonic character of the region causes a high number of earthquakes with different faulting mechanisms and a wide range of focal depths.
The destructive earthquakes in Turkey and the surrounding countries over the centuries are documented in historical records. Pınar and Lahn (1952), Ergin et al. (1967, 1971), Soysal et al. (1981), Güçlü et al. (1986), Ambraseys and Finkel (1995) and Ambraseys and Jackson (1998) compile the historical earthquakes in the region. Tan et al. (2008) present the historical events in a digital database and publish the first catalogue that contains the focal mechanism parameters of the earthquakes in Turkey. In addition, Leptokaropoulos et al. (2013) and Kadirioğlu et al. (2018) introduce homogenised catalogues for the Turkish earthquakes. The main component of homogenization is to obtain a reliable magnitude conversion from one scale to moment magnitude.
Several empirical relations are also proposed for the region (Papazachos et al., 1997; Ambraseys, 2000; Baba et al., 2000; Burton et al., 2004; Ulusay et al., 2004; Akkar et al., 2010; Deniz and Yücemen, 2010).
The motivation of this study is to build a widely usable earthquake catalogue (e.g. for geophysicists, geologists and earthquake engineers) that contains homogenised moment magnitudes and the other seismological parameters. During the international seismic hazard studies for the Sinop Nuclear Power Plant, which is planned to be constructed in the northernmost part of Turkey, it became clear that a comprehensive homogenised earthquake catalogue for Turkey is needed for future studies. To this end, all earthquakes that occurred in a wide area are analysed with a statistical approach, and the empirical magnitude relations are obtained using a refined dataset. Then, an extensive homogenised earthquake catalogue for Turkey and the surrounding region is constructed. The distinguishing feature of the new homogenised catalogue is that it contains all earthquakes from Greece to Azerbaijan in a manageable format, without removing aftershocks or truncating small events.

Database and processing
The Bulletin of the International Seismological Centre (ISC, 2020) is used as the main database to generate a new and comprehensive homogeneous earthquake catalogue for Turkey and the surrounding region. The ISC Bulletin contains a large number of parametric data for events occurring anywhere on the earth. Because all national and international seismological centres contribute to the bulletin, it contains not only moderate-to-large events (M ≥ 4) but also local earthquakes with small magnitudes (M < 4). The most important feature of the bulletin is that an event with sufficient data is manually checked and relocated by a seismologist. Therefore, the latest event information in the database is two years behind real time (ISC, 2020). The bulletin also presents the event parameters reported by the contributing centres. The ISC finished rebuilding the entire database in 2020. The ak135 seismic velocity model (Kennett et al., 1995) and the location procedure currently used by the ISC are applied to all data. Furthermore, a large number of earthquake data from permanent and temporary networks have been added (ISC, 2020; Storchak et al., 2017). Therefore, the latest and revised international dataset is used in this study.
The earthquake parameters in the bulletin are in the IASPEI Seismic Format (ISF, 2020). Each event has its own data blocks, such as origin and magnitude, which contain several data types and comments. Data and comment lines have no specific flag to identify their types, so it is not possible to read the database using a simple computer program or shell script. A Fortran code is written to analyse the ISF lines using the parsing subroutines provided by the ISC. Each line in the database is checked by the different parsers to identify its data type. After the origin and magnitude sub-blocks of an event are identified properly, the parameters are analysed. The overall data processing is given in the flowchart in Fig. 1. In the first step, the origin data such as time, location and focal depth are searched for the "PRIME" comment, which indicates the preferred (prime) hypocentre parameters. The hypocentres determined by the ISC are always prime. If there is no "PRIME" flag, the origin data are searched in the secondary hypocentres using a priority order for the institutes given in the flowchart.
The parameters reported by the ISC are preferred first. If there is no information from the ISC, the origin parameters of the European-Mediterranean Seismological Centre (CSEM or EMSC) are sought (see Appendix A for the institute abbreviations). The priority of both institutes is high because they use all available data in the study area. Next, ISK (Kandilli Observatory and Earthquake Research Institute, KOERI) and DDA (General Directorate of Disaster Affairs until September 2017; Disaster and Emergency Management Presidency, AFAD, after October 2017), which are the national seismological networks in Turkey, are selected. The other institutes are used for the local events around Turkey. In addition, the earthquake information reported by ISS and GUTE is used for the pre-instrumental period. If the origin parameters of an event are found at any step of this query order, the event is added to the homogenised catalogue with these parameters.
After the origin parameters of an event in the selected area are determined, the magnitude data sub-blocks are analysed by the magnitude parser. The values of the different magnitude scales given in Table 1 are collected. If there are two or more values for a type, the average with standard deviation and the median are calculated. Selecting a magnitude value from a particular institute is avoided in order to overcome problems such as unreported magnitudes, the effect of network distribution and calculation errors. Moreover, there is no evidence that any one institute calculates the true magnitude of an earthquake.
N is the total number of events for each magnitude.
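The origin query order and the per-scale magnitude statistics described above can be sketched as follows. This is a minimal illustration in Python, not the Fortran code used for the catalogue. The dictionary keys (`agency`, `prime`) and the function names are hypothetical, and the agency list contains only the institutes named in the text; the remaining institutes of the flowchart would be appended to it.

```python
from statistics import mean, median, stdev

# Query order named in the text. The list is illustrative and incomplete:
# the other institutes of the flowchart would follow DDA.
AGENCY_PRIORITY = ["ISC", "EMSC", "CSEM", "ISK", "DDA"]


def select_origin(origins):
    """Return the preferred origin from a list of reported origins.

    Each origin is a dict with the hypothetical keys 'agency' and 'prime',
    plus whatever parameters a parser extracted (time, location, depth).
    """
    # An origin carrying the PRIME flag is taken first.
    for origin in origins:
        if origin.get("prime"):
            return origin
    # Otherwise the secondary hypocentres are searched in priority order.
    for agency in AGENCY_PRIORITY:
        for origin in origins:
            if origin["agency"] == agency:
                return origin
    return None  # no origin from a listed agency


def summarise_magnitudes(reports):
    """Group (scale, value) reports by scale and compute simple statistics."""
    by_scale = {}
    for scale, value in reports:
        by_scale.setdefault(scale, []).append(value)
    summary = {}
    for scale, values in by_scale.items():
        summary[scale] = {
            "n": len(values),
            "mean": mean(values),
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "median": median(values),
        }
    return summary
```

For example, `select_origin` applied to two non-prime origins from DDA and ISK returns the ISK origin, because ISK comes earlier in the list.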

Refining the dataset
The dataset is refined in detail for the regression analyses used to obtain the empirical relations between the magnitudes. In the first step, the catalogue is declustered using Reasenberg's (1985) second-order moment approximation because removing aftershocks is necessary to determine a reliable magnitude of completeness. For the aftershock analysis in space, a subsequent event is searched for within an area whose radius is 20 times the circular source dimension of the preceding event, considering ±4 km hypocentre uncertainties (Kanamori and Anderson, 1975; Reasenberg, 1985). The maximum interaction period for the next event in a sequence is 10 days to build the temporal extension of a cluster. After declustering, the earthquakes that occurred after 1980 are selected because the national station networks and data analysis procedures have become much more reliable in Turkey since then. In the third step, the completeness (Mc) for each magnitude is determined; Mc is found to be ~2.8 for md and ML, and ~4.0 for mb and Ms. The earthquakes whose averaged magnitudes are smaller than the Mc thresholds are excluded from the regressions. In the last step, a cut-off value is applied for large differences between magnitude pairs. There are, naturally, differences among the reported magnitudes for an earthquake. Occasionally, the difference between the magnitude pairs may be as large as 2 or more magnitude units. After obtaining the distribution of the differences for each pair, the data points that are outside the 95% confidence interval (±2σ) are removed using the Interquartile Range (IQR) method (Galton, 1869; MacAlister, 1879), which is one of the robust methods for outliers and can be successfully applied to seismological data (e.g. Tan et al., 2010, 2014; Tan, 2013).
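The last refinement step can be sketched as a filter on the differences of each magnitude pair. The snippet below is a minimal illustration using Tukey-style fences. The function name `iqr_filter` and the multiplier `k` are assumptions: the text does not state the multiplier, so the conventional default of 1.5 is used only as a placeholder.

```python
from statistics import quantiles


def iqr_filter(pairs, k=1.5):
    """Keep (x, y) magnitude pairs whose difference y - x lies inside the
    fences [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional default and an assumption here; the
    multiplier used for the catalogue is not given in the text.
    """
    diffs = [y - x for x, y in pairs]
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [pair for pair, d in zip(pairs, diffs) if lower <= d <= upper]
```

A smaller `k` tightens the fences, so the multiplier can be tuned until the retained range matches the intended confidence interval of the differences.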

Regression Analyses
The relationships of the refined magnitude pairs are estimated using the general orthogonal regression (GOR). The method is a better estimator than the least-squares (LS) approximation when both the x and y variables have errors of non-negligible size (Castellaro et al., 2006). The slope (a) and intercept (b) of the GOR line in the form y = a·x + b are computed from the variance of X (independent variable), the variance of Y (dependent variable), the covariance between X and Y, and the average values of the variables (e.g. Castellaro et al., 2006; Das et al., 2014). η is the error variance ratio of the variables (σεX, σεY) and is defined as η = (σεX / σεY)². When the standard errors of the variables are not known, η is arbitrarily set to a value. In practice, η = 1 (squared Euclidean distance) gives good results (Castellaro et al., 2006; Das et al., 2014). In this study, values of η from 0.5 to 2.0 are tested to seek a better fit. The R² values do not increase when η differs from 1.0, and no significant improvement is observed in the regressions. Since the real errors of the magnitudes are also not known, η = 1 is used. The squared Euclidean distance gives better results for all magnitude scales. The 95% confidence intervals of the best-fit lines are determined with the bootstrap method (Efron, 1979).
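For reference, one common closed form of the GOR (Deming) slope and intercept is sketched below. The symbol names are assumptions: S_X² and S_Y² denote the sample variances of X and Y, S_XY their covariance, and X̄ and Ȳ the averages. The expression assumes η = (σεX / σεY)² as defined in the text; references that define the ratio the other way round obtain the same line with η replaced by 1/η, and for η = 1 both conventions coincide.

```latex
a = \frac{\eta S_Y^2 - S_X^2 + \sqrt{\left(\eta S_Y^2 - S_X^2\right)^2 + 4\,\eta\,S_{XY}^2}}{2\,\eta\,S_{XY}},
\qquad
b = \bar{Y} - a\,\bar{X}
```

With η = 1 the slope reduces to a = [S_Y² − S_X² + √((S_Y² − S_X²)² + 4 S_XY²)] / (2 S_XY), which is the ordinary orthogonal regression line.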
In total, 1,000 new regressions are calculated, each using 50% of the total number of data of each relation. The bootstrap samples are randomly selected using the Mersenne Twister random number generator (Matsumoto and Nishimura, 1998), and the random numbers are unique in each test to prevent multiple selections of any datum. After obtaining a large set of the constants a and b of the linear fits, the outliers are removed with the IQR method. Then, the standard deviation (σ) of the normally distributed dataset is calculated.
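The regression and resampling procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the catalogue: the function names are hypothetical, η is taken as (σεX / σεY)² so that the usual Deming ratio var(err_y)/var(err_x) equals 1/η, and the IQR multiplier of 1.5 is a conventional placeholder. Python's `random` module is itself a Mersenne Twister generator, and `sample` draws unique indices.

```python
import random
from math import sqrt
from statistics import mean, quantiles, stdev


def gor_fit(x, y, eta=1.0):
    """General orthogonal regression y = a*x + b.

    eta follows the text's definition (sigma_eX / sigma_eY)**2, so the
    Deming ratio var(err_y) / var(err_x) is 1/eta (assumed convention;
    for eta = 1 both conventions agree).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    delta = 1.0 / eta
    root = sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)
    a = (syy - delta * sxx + root) / (2.0 * sxy)  # undefined if sxy == 0
    return a, my - a * mx


def drop_iqr_outliers(values, k=1.5):
    """Remove values outside [Q1 - k*IQR, Q3 + k*IQR]; k is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if lower <= v <= upper]


def bootstrap_gor(x, y, n_runs=1000, fraction=0.5, seed=0):
    """Refit on random subsets drawn without repetition and return the
    standard deviations of a and b after IQR outlier removal."""
    rng = random.Random(seed)  # Mersenne Twister generator
    size = int(len(x) * fraction)
    slopes, intercepts = [], []
    for _ in range(n_runs):
        idx = rng.sample(range(len(x)), size)  # unique indices per run
        a, b = gor_fit([x[i] for i in idx], [y[i] for i in idx])
        slopes.append(a)
        intercepts.append(b)
    return stdev(drop_iqr_outliers(slopes)), stdev(drop_iqr_outliers(intercepts))
```

Because each run draws half of the data without repetition, the procedure is a subsampling variant of the bootstrap; the returned standard deviations can then be scaled to the desired confidence level.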
The GOR results are given in Table 3. [...] only with M ≥ 5.0 before 1964 in the study area. Therefore, an M-Mw conversion is necessary for seismic hazard analyses using long-term seismicity data. There are few magnitude pairs (N = 228), and they are sparsely distributed between magnitudes 3 and 7 with a high standard deviation (Fig. 4).

Homogenization
The GOR results are applied to all events in the study area. First, Mw is searched for and, if found, assigned as Mw*. For the events without Mw, the first averaged magnitude with a non-zero value is chosen according to the priority of the saturation order in Table 1.
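The selection logic of this step can be sketched as below. Both the coefficients and the priority order in the snippet are hypothetical placeholders for illustration only: they are not the values of Table 3 or the saturation order of Table 1, and the function name `equivalent_mw` is likewise an assumption.

```python
# Hypothetical GOR coefficients (a, b) for Mw* = a*M + b. These numbers are
# placeholders for illustration, NOT the coefficients of Table 3.
CONVERSIONS = {
    "Ms": (0.7, 2.0),
    "mb": (1.1, -0.4),
    "ML": (1.0, 0.1),
    "md": (1.0, 0.2),
}

# Hypothetical priority order standing in for the saturation order of Table 1.
PRIORITY = ["Ms", "mb", "ML", "md"]


def equivalent_mw(avg_mags):
    """Return (Mw*, source scale) from a dict of averaged magnitudes.

    avg_mags maps a scale name to its averaged value; 0.0 or a missing key
    means that the scale was not reported.
    """
    mw = avg_mags.get("Mw", 0.0)
    if mw > 0.0:
        return mw, "Mw"  # an observed Mw is assigned directly as Mw*
    for scale in PRIORITY:
        value = avg_mags.get(scale, 0.0)
        if value > 0.0:
            a, b = CONVERSIONS[scale]
            return a * value + b, scale
    return None, None  # no magnitude available for conversion
```

Returning the source scale together with Mw* keeps the conversion traceable, so Mw* can be recalculated later if the conversion equations change.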

Completeness of the Catalogue
One of the important parameters of an earthquake catalogue is the magnitude of completeness (Mc). Mc is a threshold magnitude and indicates that all earthquakes with magnitudes greater than Mc are recorded in a study area. It is determined using Gutenberg-Richter's (1954) cumulative frequency-magnitude law (GR). The GR relation is simple but powerful and is formulated as log(N) = a - b·m, where N is the cumulative number of events with magnitudes equal to or greater than m. The other useful parameter derived from this equation is the b-value (slope). The b-value is around 1 for tectonically active areas.
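The GR relation can be illustrated with a short sketch that builds the cumulative frequency-magnitude distribution and fits a and b above a given Mc. The least-squares fit is chosen here only for simplicity and is an assumption, since the fitting method is not specified in the text; the function names and the bin width are also hypothetical.

```python
from collections import Counter
from math import log10


def cumulative_fmd(mags, step=0.1):
    """Return the magnitude bins and the cumulative counts N(>= m)."""
    bins = Counter(round(m / step) for m in mags)
    keys = sorted(bins)
    total, cumulative = 0, {}
    for key in reversed(keys):  # accumulate from the largest magnitude down
        total += bins[key]
        cumulative[key] = total
    return [k * step for k in keys], [cumulative[k] for k in keys]


def fit_gr(mags, mc, step=0.1):
    """Least-squares fit of log10(N) = a - b*m using bins with m >= mc.

    At least two occupied bins above mc are required.
    """
    m_bins, counts = cumulative_fmd(mags, step)
    pts = [(m, log10(n)) for m, n in zip(m_bins, counts) if m >= mc - 1e-9]
    mean_m = sum(m for m, _ in pts) / len(pts)
    mean_l = sum(l for _, l in pts) / len(pts)
    slope = (sum((m - mean_m) * (l - mean_l) for m, l in pts)
             / sum((m - mean_m) ** 2 for m, _ in pts))
    b = -slope
    a = mean_l + b * mean_m
    return a, b
```

As a check, a synthetic set with 900, 90 and 10 events at magnitudes 3, 4 and 5 has cumulative counts of 1000, 100 and 10, so a fit with `step=1.0` gives b = 1 and a = 6.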
The instrumental period (since 1964) observations for the region show a linear relation with b = 0.91 between the cumulative number of earthquakes and the equivalent moment magnitude, Mw* (Fig. 6). If the dataset is extended to cover the pre-instrumental period, the linearity is lost for magnitudes between 5 and 7 because of the magnitude calculation uncertainties of the earthquakes in that time span. The Mc, the lowest intercept point of the linear fit with slope b, is 2.9 for the overall

The maximum curvature method (Wyss et al., 1999; Wiemer, 2001) is applied to investigate the spatial and temporal change of Mc for the instrumental period. Equal sampling in degrees of latitude and longitude is not used, to prevent artificial elongation, because the length of 1° of longitude is ~94 km in the south and ~76 km in the north of the study area. A 20 km grid spacing is used, with at least 100 events larger than the completeness magnitude (Mw* > 2.9) within a 100 km radius, for the spatial distribution of Mc. The temporal variation is estimated using a window of 500 events and a step of 25 events. These sampling parameters are sufficient to avoid erroneous statistical results for the b-value and Mc due to under-sampling and non-homogeneous subsets (Amorese et al., 2010; Kagan, 1999, 2002, 2010; Kamer and Heimer, 2013; Shi and Bolt, 1982). The contour map given in Fig. 8 shows that the homogenised catalogue is complete down to Mw* 3.0-

Discussions
Generating an earthquake catalogue is the main task of seismologists. An institution that operates a costly seismological network provides the main parametric information of an event from raw waveform observations. The parametric catalogues were released as paper prints before the internet and are now online. Although accessing catalogues is very easy via the internet, it is difficult to obtain all available data because of limitations of the data providers' web pages. The problems of online datasets, such as absent or limited observations for past years, a limited number of parameters, a lack of parameter uncertainties, listing limitations and unusable formats in web pages, make it difficult for a wide range of users to use the earthquake data. However, most researchers pay attention only to the homogenised magnitudes and the number of events.
Unfortunately, the importance of a large number of parameters and their uncertainties in a catalogue is overlooked, and the given datasets are less useful for studies other than seismic hazard analyses.
The earthquake information for Turkey comes from two national networks operated by the KOERI and AFAD. Both institutes have a large number of stations around Turkey and report recent events online. The date, time, depth and magnitudes of events, without uncertainties, are given by the search engines of both institutions. While the KOERI lists only 50k events in a single search with a downloadable text file, the AFAD search result is given with a maximum of 100 events per window and can be downloaded in comma-separated (CSV) format. Another online catalogue with the same parameters is provided by the EMSC. The searched events can be downloaded in CSV format with a limit of 5k lines. Among the three institutions, only the KOERI provides all available magnitude scales for an event. Additionally, the EMSC does not provide the type of magnitude scale for an event. In contrast, the ISC provides all available parameters for an event, determined not only by itself but also by the other institutions, as mentioned in the previous section. The magnitudes in the ISC event list are given in separate lines, so the list is not easy to use without knowledge of the comprehensive bulletin format and programming. The online bulletin search of the ISC also has an output limit of 60k events.
Besides the online catalogues, some catalogue compilations based on homogenization of magnitudes for Turkey and its vicinity are published. Leptokaropoulos et al. (2013) [...] containing ~6573 events between 1900 and 2012. They use the same dataset and conversion equations as in their previous studies (Kadirioğlu et al., 2014; Kadirioğlu and Kartal, 2016). Their final catalogue is declustered and contains only events with Mw* > 4.0 (not observed Mw as given in the catalogue, which is a notation error). In addition, Kadirioğlu et al. (2014, 2018) mention that a focal depth of 10 km is assigned to the events without a reported depth or shallower than 1 km in the final catalogue. This is an arbitrary and unrecoverable parameter assignment and may generate artificial errors in future studies using this catalogue, especially in seismic hazard analyses.

The previous catalogues mentioned above, and others, share a common structure with limited earthquake parameters such as date, location, depth and Mw*. In particular, the observed magnitudes and error/uncertainty values are not included. The source institute of the parameters is also missing. Therefore, it is impossible to trace the origin of the parameters, and the equivalent moment magnitude (Mw*) cannot be recalculated using newly determined conversion equations. Moreover, a final earthquake list truncated at a magnitude threshold is not useful for researchers who are not familiar with the details of earthquake catalogues and want to analyse or map the seismic activity of the whole instrumental period in a region. The homogenised catalogue overcomes these common deficiencies of the previous earthquake catalogues for Turkey and its surroundings.

Conclusions
Turkey and the surrounding area form one of the most seismically active regions on the earth. Therefore, improved earthquake catalogue studies are necessary. A new, extended and homogenised earthquake catalogue is compiled in this study. The main aim is to present an earthquake database in an easily manageable ASCII format for a broad range of researchers. The study is based on the latest ISC Bulletin, whose rebuilding process was finished in 2020. All parameters of the earthquakes during the period from 1900 to 2017 in an extended region from the Balkans to the Caucasus are analysed. The origin parameters and magnitude data in the IASPEI Seismic Format are systematically parsed with a Fortran algorithm. Approximately 700k events in the study area bounded by 32°-47° N and 20°-52° E are compiled (Fig. 3). The equivalent moment magnitude (Mw*), which is the mandatory parameter for seismic hazard studies, is calculated for all events. For [...]
According to the values of Mw*, the overall catalogue is complete down to Mc = 2.9. The spatial completeness variation indicates Mc = ~3.0-3.2 in Turkey and Greece, and as high as 4.5 in the Caucasus. The catalogue is not declustered or truncated using a threshold magnitude so that it remains useful for geophysicists, geologists and geodesists. The Mw* values can be easily recalculated, and the catalogue can be declustered using different parameters by seismologists and earthquake engineers for seismic hazard studies. The final dataset contains not only Mw*, as in the previous studies, but also the average with standard deviation and the median of the observed magnitudes. The ISC event ID number and the geographic region of each event are also given to trace an event in the bulletin. A total of 45 parameters are presented.

Data availability
The catalogue is available as the electronic material of this article.