Review of fragility analyses for major building types in China with new implications for intensity-PGA relation development

The evaluation of the seismic fragility of buildings is a key task in earthquake safety and loss assessment. Over the past four decades, many research reports and papers have addressed the vulnerability of buildings in China to earthquake ground motion. We first scrutinized 69 papers and theses studying building damage for earthquakes that occurred in densely populated areas. They represent observations where macro-seismic intensities have been determined according to the Chinese Official Seismic Intensity Scale. From these studies we derived the median fragility functions (dependent on intensity) for four damage limit states of the two most widely distributed building types: masonry and reinforced concrete. We also inspected 18 publications that provide analytical fragility functions (dependent on PGA) for the same damage classes and building categories. Thus, a solid fragility database based on both intensity and PGA is established for seismic prone areas in mainland China. A comprehensive view of the problems posed by the evaluation of fragility for different building types is given. Based on the newly collected fragility database, we propose a new approach to deriving the intensity-PGA relation that uses fragility as the bridge, and reasonable intensity-PGA relations are developed. This novel approach may suggest a way to decrease the scatter in traditional intensity-PGA relation development, i.e., by further classifying observed macro-seismic intensities and instrumental ground motions according to differences in building seismic resistance capability.


Introduction
Field surveys after major disastrous earthquakes have shown that the poor performance of buildings in earthquake-affected areas is the leading cause of human fatalities and economic losses (Yuan, 2008). The evaluation of seismic fragility for existing building stocks has become a crucial issue due to the frequent occurrence of earthquakes in the last decades (Rota et al., 2010). Building fragility curves, defined as the expected probability of exceeding a specific building damage state under a given level of earthquake ground shaking, have been developed for different building typologies. They are required for estimating fatalities and monetary losses due to structural damage. Fragility curves are developed mainly by two approaches: empirical methods and analytical methods. Empirical methods are based on post-earthquake surveys of groups of buildings and are considered to be the most reliable source, because they are directly correlated with the actual seismic behaviour of buildings (Maio and Tsionis, 2015). Numerous post-earthquake investigations have been conducted for groups of buildings to derive empirical damage matrices. A damage matrix is a table of predefined damage states and the percentages of specific building types at which each damage state is exceeded under particular macro-seismic intensity levels. However, as pointed out by Billah and Alam (2015), empirical investigations are usually limited to particular sites or seismo-tectonic/geotechnical conditions with abundant seismic hazard, and they lack generality. Moreover, they usually refer to macro-seismic intensity, which is not an instrumental measure but is based on subjective evaluation (Maio and Tsionis, 2015). By contrast, analytical methods are based on static and dynamic nonlinear analyses of modelled buildings, which can produce slightly more detailed and relatively more transparent assessment algorithms with direct physical meaning (Calvi et al., 2006).
Therefore, analytical methods are conceived to be more reliable than empirical results (Hariri-Ardebili Despite the limitations of each fragility analysis method, both empirical and analytical fragility curves are essential in conducting seismic risk assessment. However, applying the existing fragility curves has been considered a challenging task, since different approaches and methodologies are spread across scientific journals, conference proceedings, technical reports and software manuals, hindering the creation of an integrated framework that could allow the visualization, acquisition and comparison of all the existing curves (Maio and Tsionis, 2015). In this regard, the first purpose of this study is to describe and examine available fragility curves specifically developed for Chinese buildings, drawn from 87 papers and theses using empirical and analytical methods. The median fragility functions from these previous research findings for the main building types in seismic prone areas in mainland China are then outlined.
Furthermore, based on the empirical and analytical fragility database collected, the second purpose of this work is to propose a new approach to deriving the intensity-PGA relation by using fragility as the bridge. The main motivation is that the intensity-PGA relation is essential in seismic hazard assessment, while traditional practices in deriving such a relation are generally region-dependent and have large scatter (Caprio et al., 2015). Traditionally, intensity-PGA relations are developed using instrumental PGA records and empirical intensity observations within the same geographical range. In this work, we try to establish the intensity-PGA relation using fragility as the conversion medium. Formally, this is achieved by eliminating the fragility values from the fragility-intensity relation and from the fragility-PGA relation.
Theoretically, reasonable results should emerge if the building types used in analytical fragility analyses and those investigated in the empirical field surveys are close enough.
This study is organized as follows. In Section 1, the necessity of fragility database construction and the pros and cons of the main fragility analysis methods are briefly introduced. In Section 2, a literature review of fragility studies in mainland China and related concepts is provided. Section 3 presents the discrete fragility database extracted from the reviewed papers and theses. In Section 4, median empirical and analytical fragility curves and their scatter are derived for major building types in seismic prone areas in mainland China. In Section 5, we introduce in detail our new approach to developing the intensity-PGA relation by using fragility as the bridge, which yields relations quite comparable with those developed by traditional practice. In the Appendix and the Code/Data availability section, access to the supplementary documents mentioned in the text is provided.
As documented in Calvi et al. (2006), the first application of the empirical method to investigate building fragility at a large geographical scale was carried out in the early 1970s. In mainland China, since the 1975 Haicheng M7.5 earthquake, around 112 post-earthquake surveys have been conducted for M≥4.7 earthquakes (Ding, 2016).

Empirical method
Currently, the main processes of post-earthquake field investigation and macro-seismic intensity determination in mainland China basically follow the workflow proposed by Hu (1988), based on the field work for the Tonghai earthquake in the 1970s (Wang et al., 2007). This workflow introduces the key concept of the "average damage index". In each post-earthquake field survey unit (village/town/street), the number of buildings of each type in each damage state is first investigated. The median damage indices of the five damage states D5, D4, D3, D2, D1 as defined in GB/T 17742-2008, namely 0.93, 0.70, 0.43, 0.20 and 0.05 respectively, are then used in the subsequent calculation. For each building type in each field survey unit, the corresponding average damage index is derived by summing the products of the percentage of buildings in each damage state and the damage index of that state. Generally, there should be one or two predefined reference building types, so that the average damage index of the other surveyed building types can be further scaled to the damage index of the reference building type. Finally, the overall average damage index for each survey unit is calculated by summing the products of each building type's scaled damage index and that building type's weight in the survey unit. Once the average damage index in the survey unit is determined, the corresponding macro-seismic intensity can be directly derived from the predefined empirical relation between macro-seismic intensity and the damage index of the reference building type (GB/T 17742-2008). In mainland China, three reference building types are currently used to determine macro-seismic intensity: (1) Type A: wood-structure, soil/stone/brick-made old building; (2) Type B: single- or multi-storey brick masonry without seismic resistance; (3) Type C: single- or multi-storey brick masonry sustaining shaking of intensity degree VII.
A detailed building structural damage state description for judgement of macro-seismic intensity scale in China is given in Table B2 (a non-
Given the importance of building fragility in seismic risk assessment and loss mitigation, we reviewed in total 87 existing fragility analyses from papers and theses for the main building typologies in seismic prone areas in mainland China. It is worth noting that Ding (2016) provided a very detailed empirical fragility database for 112 M≥4.7 events since the 1975 M7.5 Haicheng earthquake, based on available post-earthquake surveys. However, because that database lacks information on building seismic resistance capability, it is not suitable for our subsequent fragility analysis. Thus, we did not use it and instead collected our own empirical fragility database from individual publications and M.S./Ph.D. theses. In mainland China, the main building types of concern are masonry and reinforced concrete (RC) buildings (Sun and Chen, 2009), given the wide distribution of masonry in rural and township areas and the increasing popularity of RC buildings in urban areas.
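The average damage index calculation described above can be sketched in a few lines. This is only an illustrative sketch: the damage index values are those quoted in the text, while the function names, the example building counts and the weights are hypothetical, and the rule for scaling an index to the reference building type is not reproduced because it is not given here.

```python
# Median damage indices for damage states D1..D5 as quoted in the text
# (attributed there to GB/T 17742-2008).
DAMAGE_INDEX = {"D1": 0.05, "D2": 0.20, "D3": 0.43, "D4": 0.70, "D5": 0.93}


def average_damage_index(counts):
    """Average damage index of one building type in one survey unit.

    counts maps damage state -> number of buildings observed in that state.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no buildings surveyed")
    return sum(DAMAGE_INDEX[state] * n / total for state, n in counts.items())


def unit_damage_index(scaled_indices, weights):
    """Weighted overall damage index of a survey unit.

    scaled_indices: building type -> damage index already scaled to the
    reference building type (the scaling rule itself is not sketched here).
    weights: building type -> share of that type in the unit (sums to 1).
    """
    return sum(scaled_indices[t] * weights[t] for t in scaled_indices)


# Hypothetical survey unit with 100 brick masonry buildings.
masonry_counts = {"D1": 50, "D2": 30, "D3": 10, "D4": 6, "D5": 4}
masonry_index = average_damage_index(masonry_counts)
```

The resulting index would then be mapped to a macro-seismic intensity through the empirical relation of the standard, which is not reproduced in this sketch.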
Historic earthquakes that caused serious building damage mainly occurred in seismic prone provinces including Sichuan (Chen et al., 2017; Gao et al., 2010; He et al., 2002; Li et al., 2015; Li et al., 2013; Sun et al., 2013; Sun et al., 2014; Sun and Zhang, 2012; Ye et al., 2017; Yuan, 2008; Zhang et al., 2016), Yunnan (He et al., 2016; Ming et al., 2017; Piao, 2013; Shi et al., 2007; Wang et al., 2005; Yang et al., 2017; Zhou et al., 2007; Zhou et al., 2011), Xinjiang (Chang et al., 2012; Ge et al., 2014; Li et al., 2013; Meng et al., 2014; Song et al., 2001; Wen et al., 2017), Qinghai (Piao, 2013; Qiu and Gao, 2015), Fujian (Bie et al., 2010; Zhang et al., 2011; Zhou and Wang, 2015) and other seismic active zones (A, 2013; Chen, 2008; Chen et al., 1999; Cui and Zhai, 2010; Gan, 2009; Guo et al., 2011; Han et al., 2017; He and Kang, 1999; He and Fu, 2009; He et al., 2017; Hu et al., 2007; Li, 2014; Liu, 1986; Lv et al., 2017; Ma and Chang, 1999; Meng et al., 2012; Meng et al., 2013; Shi et al., 2013; Sun and Chen, 2009; Sun, 2016; Wang et al., 2011; Wang, 2007; Wei et al., 2008; Wu, 2015; Xia, 2009; Yang, 2014; Yin et al., 1990; Yin, 1996; Zhang and Sun, 2010; Zhang et al., 2017; Zhang et al., 2014; Zhou et al., 2013). The main outputs of these post-earthquake surveys are empirical damage probability matrices (DPMs), which can be used to derive the discrete conditional probability of exceeding predefined damage limit states under different macro-seismic intensity degrees. That is, for the DPMs, macro-seismic intensity degree is usually used as the ground motion indicator. GB/T 17742-

Analytical method
As summarized in the Introduction, the main drawback of the empirical method lies in the subjectivity of allocating each building to a damage state and the lack of accuracy in determining the macro-seismic intensity affecting the region (Maio and Tsionis, 2015). Furthermore, the interdependency between macro-seismic intensity and damage, as well as the limited or heterogeneous empirical data, are commonly identified as the main difficulties to overcome in the calibration of empirical approaches (Del Gaudio et al., 2015). By contrast, analytical methodologies produce more detailed and transparent algorithms with direct physical meaning, which not only allow detailed sensitivity studies to be undertaken but also allow for the straightforward calibration of the various characteristics of the building stock and seismic hazard (Calvi et al., 2006). Unlike empirical fragility, which is collected directly from post-earthquake surveys, analytical fragility curves are often derived from nonlinear finite-element analysis. Popular analytical methods include push-over analysis (Freeman, 1998; Freeman, 2004), the adaptive push-over method (Antoniou and Pinho, 2004), and incremental dynamic analysis (IDA) (Vamvatsikos and Cornell, 2002; Vamvatsikos and Fragiadakis, 2010). Within these approaches, most of the methodologies available in the literature rely on two main and distinct procedures: the correlation between acceleration or displacement capacity curves and spectral response curves, as in the well-known HAZUS or N2 methods (FEMA, 2003; Fajfar, 2000), and the correlation between capacity curves and acceleration time histories, as proposed in Rossetto and Elnashai (2003).
The major steps in using analytical methods to study building fragility include the selection of seismic demand inputs, the construction of building models, the selection of a damage indicator and the determination of damage limit state criteria (Dumova-Jovanoska, 2000). To combine empirical post-earthquake damage statistics from actual building groups with simulated/analytical damage statistics from the modelled building types under consideration, we examined a number of studies deriving analytical fragility curves for masonry and RC buildings in mainland China. The analysis techniques in these studies vary from static push-over analysis or the adaptive push-over method (Cui and Zhai, 2010; Liu, 2017), to dynamic history analysis or incremental dynamic analysis (Liu et al., 2010; Liu, 2014; Liu, 2014; Sun, 2016; Wang, 2013; Yang, 2015; Yu et al., 2017; Zeng, 2012; Zheng et al., 2015; Zhu, 2010), as well as approaches based on necessary statistical assumptions (Fang, 2011; Gan, 2009; Guo et al., 2011; Hu et al., 2010; Zhang and Sun, 2010).

Damage state definition
As defined above, building fragility describes the exceedance probability of a specific damage state given an ensemble of earthquake ground motion levels. To describe the susceptibility of a building structure to a certain ground motion level, four damage limit states are used to discriminate between different strengths of ground shaking: slight damage (LS1), moderate damage (LS2), serious damage (LS3) and collapse (LS4). These four  Medvedev and Sponheuer (1969) and AIJ1995 (Nakamura, 1995) in Japan issued by Architectural Institute of Japan are summarized in Table 1. In mainland China, the latest standard, GB/T 17742-2008, was issued in 2008 by the China Earthquake Administration (CEA); it defines detailed damage to structural and non-structural components for each damage state (Table 2). (1) where N refers to the total number of damage limit states (here N=4); for each building type, [ ] refers to the proportion of buildings in each structural damage state i.
In the analytical method, the fragility curve is derived by Eq. (2), with the assumption that building response to seismic demand inputs follows the lognormal distribution:
$$P[LS \mid IM] = \Phi\left[\frac{1}{\beta}\ln\left(\frac{IM}{\overline{IM}_{LS}}\right)\right] \qquad (2)$$
where $P[LS \mid IM]$ is the probability of being in or exceeding damage limit state LS due to ground motion indicator $IM$ (e.g. the inter-storey displacement, the spectral acceleration, the peak ground acceleration etc.); $\overline{IM}_{LS}$ refers to the median value of the damage state indicator at which the building reaches the threshold of damage state LS; $\beta$ represents the integrated uncertainties from seismic demand input, building capacity and model uncertainty, generally within the range of 0.6-0.8; $\Phi[\cdot]$ is the standard normal cumulative probability distribution.
Haicheng earthquake (Ding, 2016). These damaging earthquakes mainly clustered in seismic prone provinces in southwestern China (e.g. Sichuan, Yunnan) and western China (e.g. Xinjiang Uygur, Tibet, Qinghai), as shown in Fig. 2. The main building types in these areas are masonry, reinforced concrete (RC), brick-wood, soil, stone as well as chuandou-timber (a typical building type in mountainous areas of Tibet, Qinghai and Sichuan). Because fragility data are limited, we mainly focus on the seismic fragility of the two most widely distributed building types: masonry and RC buildings (Sun and Chen, 2009). Masonry buildings are mainly composed of brick and concrete. RC buildings include building structures such as RC core wall, frame structure and frame-shear wall.
The seismic resistance level of masonry and RC buildings is further divided into two classes: level A and level B. The assignment of seismic resistance level in this study is mainly based on supplementary information given in each scrutinized publication, including building age, construction material, seismic resistance code at construction time, load-bearing structure etc. Given the changes in building quality and the corresponding code standards over the past four decades in China, buildings constructed in different periods, even with the same nominal resistance level for their period, are reassigned different seismic resistance levels according to the latest standard. The grouping criteria are given in Table 3 (more building classification details can be found in the online supplement). Generally, "level A" includes buildings with seismic resistance level assigned as pre/low/moderate-code, and "level B" includes buildings assigned as high-code.

Outlier check
After grouping the empirical and analytical fragilities by building type (masonry and RC) and seismic resistance level (A and B) in Sect. 3.1, the empirical fragility database based on macro-seismic intensity (Fig. 3) and the analytical fragility database based on PGA (Fig. 4) are constructed for four damage limit states (LS1, LS2, LS3, LS4); the data can be found in the online supplement. The Y-axis "fragility" in Fig. 3 and Fig. 4 refers to the exceedance probability of each damage limit state at each ground motion level. As can be seen, the scatter of fragilities varies across building types and seismic resistance levels. For empirical fragilities, the scatter may relate to the uneven abundance of damage data for buildings investigated in post-earthquake field surveys, the subjective judgement of damage states as well as the rough division of building structure types. For analytical fragilities, the scatter may come from differences in the selection of seismic demand inputs, the analysis techniques used, the detailing of the modelled building structure, the definition of damage states as well as the damage indicators used by different researchers. Thus, before deriving continuous building fragility curves from the discrete fragility data in Fig. 3 and Fig. 4, the outliers first need to be removed from the originally collected datasets.
To identify the outliers in the originally collected fragility database, the box-plot check method was applied. For each building type (Masonry_A, Masonry_B, RC_A, RC_B) and each damage limit state (LS1, LS2, LS3, LS4), the corresponding series of fragility data was sorted from the lowest to the highest value. Three quantiles ($Q_1$, $Q_2$, $Q_3$), corresponding to the 25%, 50% and 75% quantile values of each series, were used to divide each fragility series into four equal-sized groups. A discrete fragility value $Q_i$ was assigned as an outlier if $Q_i - Q_3 > 1.5 \times (Q_3 - Q_2)$ or $Q_1 - Q_i > 1.5 \times (Q_2 - Q_1)$. The box-plot check results are shown in Fig. 5 for empirical fragility data and in Fig. 6 for analytical fragility data.
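The outlier rule can be illustrated with a short sketch. It assumes the two conditions read $Q_i - Q_3 > 1.5 \times (Q_3 - Q_2)$ and $Q_1 - Q_i > 1.5 \times (Q_2 - Q_1)$; the quantile interpolation (Python's "inclusive" method), the function name and the example series are assumptions made for illustration and are not taken from the study.

```python
import statistics


def quartile_outliers(values):
    """Split a fragility series into (kept, outliers).

    A value q is flagged when q - Q3 > 1.5 * (Q3 - Q2) or
    Q1 - q > 1.5 * (Q2 - Q1), with Q1, Q2, Q3 the 25%, 50%, 75% quantiles.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    kept, outliers = [], []
    for q in values:
        too_high = q - q3 > 1.5 * (q3 - q2)
        too_low = q1 - q > 1.5 * (q2 - q1)
        (outliers if too_high or too_low else kept).append(q)
    return kept, outliers


# Hypothetical LS2 fragilities reported by six studies at one intensity level.
series = [0.10, 0.12, 0.14, 0.16, 0.18, 0.60]
kept, outliers = quartile_outliers(series)
```

For this hypothetical series only the value 0.60 is flagged. A different quantile convention would shift the thresholds slightly, so values close to a threshold may be classified differently.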

Derivation of representative fragility curves
After removing outliers, details of the remaining fragility dataset (e.g., the number of data points and the median and standard deviation of these data) for each damage state of each building type are summarized in Appendix Table B1. The change in standard deviation of each fragility series is shown in Fig. A3 and Fig. A4 for empirical and analytical data, respectively. It is worth reiterating that, as mentioned in the Introduction, this study has two focuses. The first is to construct a comprehensive fragility database for Chinese buildings from 87 papers and theses using empirical and analytical methods, which is one key component of seismic risk assessment. Based on the empirical and analytical fragility database collected, the second focus is to propose a new approach to deriving the intensity-PGA relation by using fragility as the bridge. For this purpose, a representative fragility curve must first be derived for each damage state of each building type, and we choose to use the median fragility values to derive such a curve.
To derive the representative fragility curve for each damage limit state (LS1, LS2, LS3, LS4) of each building type (Masonry_A, Masonry_B, RC_A, RC_B) for further study (to derive the intensity-PGA relation in Sect. 5), the median values (50% quantile) of each fragility series in Fig. 5 and Fig. 6 are used. To derive continuous median fragility curves, a cumulative normal distribution is assumed to fit the discrete median empirical fragilities and a log-normal distribution is assumed to fit the discrete median analytical fragilities. For each damage limit state of each building type, the parameters $\mu$ and $\sigma$ of the continuous fragility curve can be regressed following Eq. (3):
$$P(LS \mid x) = \Phi\left(\frac{x - \mu}{\sigma}\right) \qquad (3)$$
where $P(LS \mid x)$ represents the exceedance probability of each damage limit state LS given ground motion level $x$ ($x$ refers to $I$, namely macro-seismic intensity, in terms of empirical fragility; and $x$ refers to $\ln(PGA)$, namely the logarithm of PGA, in terms of analytical fragility).
The median fragility curves derived from the discrete fragilities for empirical data and for analytical data are plotted in Fig. 7 and Fig. 8, respectively. To better illustrate the scatter of the originally collected discrete fragility data, an error analysis is attached to each regressed median fragility curve. The regressed fragility curves in Fig. 7 and Fig. 8 show two clear trends: (1) for the same building type (masonry or RC), the higher the seismic resistance level (A<B), the lower the building fragility, which applies to all damage limit states; (2) for the same seismic resistance level, RC buildings have lower fragility than masonry buildings, which also applies to all damage limit states. These two trends support the reliability of the newly collected fragility database, the reasonableness of the criteria for grouping building types and seismic resistance levels, as well as the suitability of using median fragility values to develop representative fragility curves for further analysis. However, some abnormality is also noteworthy: e.g., in the median fragility curve developed for LS4 of "RC_B" in Fig. 8, the probability of exceeding the LS4 damage limit state remains 0 even when PGA is higher than 0.8 g, which is obviously not the case in reality. The source of this abnormality and its effect on the intensity-PGA relation to be developed are discussed in detail in Sect. 5.3.
Mathematically, the goodness of fit of the continuous median fragility curve to the discrete median fragilities can be measured by the statistical indicator $R^2$ (Draper and Smith, 2014). A higher $R^2$ value indicates a better fit of the regressed fragility curve, since it is defined as the ratio between SSR and SST: SSR is the sum of squares of the regression, $\sum_i (\hat{y}_i - \bar{y})^2$, and SST is the total sum of squares, $\sum_i (y_i - \bar{y})^2$; $y_i$ refers to the original discrete fragilities for each damage limit state, $\bar{y}$ refers to the mean fragility, and $\hat{y}_i$ refers to the fragility predicted by the fitted fragility curve. As shown in Table 4, the $R^2$ values are generally above 0.95, which indicates that the normal or lognormal distribution assumption in Eq. (3) matches the discrete fragility datasets well. Notably, there are also three low $R^2$ values (≤0.8) in Table 4 for damage limit states LS1, LS2, LS3 of building type "RC_A", which may indicate the low quality (e.g. high scatter) of the originally collected fragility data. As can be cross-validated from Fig. 4, and even better from Fig. 6 and Fig. 8, the analytical fragility data for "RC_A" are more scattered than for the other building types. This directly leads to the low $R^2$ values in fitting the median fragility curve for damage limit states LS1, LS2, LS3 of "RC_A".
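One simple way to estimate $\mu$ and $\sigma$ from discrete median fragilities is sketched below. It assumes that Eq. (3) has the form $P = \Phi((x - \mu)/\sigma)$, with $x$ the intensity for empirical data and the natural logarithm of PGA for analytical data. The sketch fits a straight line to the probit-transformed fragilities; this is a swapped-in technique chosen for brevity and is not necessarily the regression procedure used in the study. All names and example values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fit_normal_cdf(xs, probs):
    """Estimate mu and sigma of P = Phi((x - mu) / sigma) by a probit line.

    Phi^-1(P) = (x - mu) / sigma is linear in x, so an ordinary least-squares
    line z = a + b * x gives sigma = 1 / b and mu = -a / b.
    Points with P <= 0 or P >= 1 are skipped because the probit is undefined.
    """
    pts = [(x, STD_NORMAL.inv_cdf(p)) for x, p in zip(xs, probs) if 0 < p < 1]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < P < 1")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope, 1.0 / slope


def fragility(x, mu, sigma):
    """Exceedance probability Phi((x - mu) / sigma)."""
    return STD_NORMAL.cdf((x - mu) / sigma)


# Hypothetical median fragilities at intensities VI..X, generated from
# mu = 8.0 and sigma = 1.2 so that the fit can be checked against known values.
intensities = [6, 7, 8, 9, 10]
medians = [fragility(i, 8.0, 1.2) for i in intensities]
mu_hat, sigma_hat = fit_normal_cdf(intensities, medians)
```

For analytical fragilities the same function would be called with `math.log(pga)` values in place of the intensities. Note that the probit transform weights points near 0 and 1 more heavily than a direct nonlinear fit of the curve would.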
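The goodness-of-fit indicator can be computed as sketched below, assuming the definition is the ratio SSR/SST, with SSR the sum of squared deviations of the predicted fragilities from the mean observed fragility and SST the sum of squared deviations of the observed fragilities from that mean. The function name and the example values are hypothetical.

```python
def r_squared(observed, predicted):
    """R^2 as SSR / SST for paired observed and predicted fragilities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and paired")
    mean_obs = sum(observed) / len(observed)
    ssr = sum((p - mean_obs) ** 2 for p in predicted)  # regression sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return ssr / sst


# Hypothetical check: a curve that reproduces the data exactly gives R^2 = 1.
observed = [0.1, 0.3, 0.5, 0.7, 0.9]
perfect = r_squared(observed, observed)
```

For a nonlinear fit, SSR/SST is not guaranteed to equal 1 - SSE/SST, so the two common definitions of the indicator can give different values for the same curve.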

New approach in deriving intensity-PGA relation
The intensity-PGA relation has an important application in seismic hazard assessment, since the use of macro-seismic data can compensate for the lack of ground motion records and thus help in reconstructing the shaking distribution for historical events. Traditionally, intensity-PGA relations are developed using instrumental PGA records and macro-seismic intensity observations within the same geographical range (Bilal and Askan, 2014; Caprio et al., 2015; Ding et al., 2014; Ding, 2016; Ding et al., 2017; Ogweno and Cramer, 2017; Worden et al., 2012). These relations are generally region-dependent and have large scatter (Caprio et al., 2015). In this section, we propose a new approach to deriving the intensity-PGA relation based on the newly collected empirical and analytical fragility database. For each building type and each damage limit state, an empirical fragility curve (exceedance probability vs. macro-seismic intensity) and an analytical fragility curve (exceedance probability vs. PGA) are available, as derived from the median fragilities in Sect. 4. By equating the two curves at the same fragility value, we can derive the corresponding pair of macro-seismic intensity and PGA. Thus, for a series of fragility values, we can regress the corresponding intensity-PGA relation from the paired intensities and PGAs. Ideally, we would expect all these regressed intensity-PGA relations to overlap, regardless of the differences in building type, seismic resistance level and damage state.

Difference between this new approach and previous practices
In contrast to this new approach, previous practices directly regressed intensity and PGA datasets within the same geographical range, without further classifying the datasets, e.g. by building type or damage state as is done in this study. The lack of further classification of PGA and intensity datasets may explain why previously derived intensity-PGA relations generally have high scatter.

The underlying reason is that, although macro-seismic intensity is a direct macro indicator of building damage, higher instrumental ground motion (e.g., PGA) does not necessarily mean higher damage to all buildings. Instead, damage is determined more by the seismic resistance capacity of the different building types. Thus, further division of intensity and instrumental ground motion records according to the affected building types should help decrease the scatter of the regressed intensity-PGA relation.
Furthermore, when intensity and PGA datasets from areas with different geological backgrounds are combined, local site effects also contribute to the amplification of instrumental peak ground motions (PGA or SA). This in turn increases the scatter of the regressed intensity-PGA relation. In this regard, it is worth emphasizing that, in our PGA-related analytical fragility database, the PGA parameter is not a real instrumental record as used in regressing traditional intensity-PGA relations, but the input PGA used in analytical fragility analysis (push-over analysis, incremental dynamic analysis, dynamic history analysis etc.). Therefore, the regional dependence (here mainly site condition), which contributes to the scatter of the traditional PGA-intensity relation, is not a source of uncertainty in our relation.

Derivation of initial intensity-PGA relation
As a tentative approach, here we derive the relation between intensity and PGA using median fragility as the bridge for each damage limit state of each building type. We are well aware that uncertainty is inherent in every step of both empirical and analytical fragility analysis. However, using the median fragility as the bridge to develop the intensity-PGA relation is intended primarily to offer an alternative to traditional practice, not to retroactively reduce the uncertainties (due to differences in building structure, seismic demand inputs, computation methods etc.) in deriving empirical and analytical fragility. By using Eq. (3) for PGA-fragility and intensity-fragility respectively and eliminating fragility as a variable, we find:
$$\ln(PGA) = \mu_{PGA} + \frac{\sigma_{PGA}}{\sigma_{I}}\,(I - \mu_{I})$$
in which $\mu_{I}$, $\sigma_{I}$ are the parameters of the empirical (intensity-based) fragility curve and $\mu_{PGA}$, $\sigma_{PGA}$ those of the analytical (PGA-based) fragility curve; all four are taken from Table 4, with values varying across building types and damage limit states.
These intensity-PGA relations are plotted in Fig. 9 (grouped by building types) and Fig. 10 (grouped by damage limit states), with curves shown for fragility values above 1%. Ideally, we would expect all the intensity-PGA relation curves to overlap, whether grouped by building type or by damage state. Indeed, for building types "Masonry_A" and "Masonry_B" in Fig. 9, the four intensity-PGA curves of the four damage limit states coincide very well. Meanwhile, the discrepancy in the intensity-PGA relations of "RC_A" for damage states LS1, LS2, LS3 in Fig. 9 is not surprising, given the relatively high scatter in the original analytical fragility datasets of "RC_A" (as discussed in Sect. 4 and verified by Appendix Fig. A3-A4).
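The elimination step can be sketched as follows, assuming both fragility curves take the form $\Phi((x - \mu)/\sigma)$ with $x = I$ for the empirical curve and $x = \ln(PGA)$ for the analytical curve. Equal fragility then means equal standard-normal scores, which links intensity and PGA directly. The parameter values below are hypothetical placeholders and are not the Table 4 values.

```python
import math


def pga_from_intensity(intensity, mu_i, sigma_i, mu_lnpga, sigma_lnpga):
    """PGA at which the analytical curve gives the same exceedance
    probability as the empirical curve gives at `intensity`."""
    z = (intensity - mu_i) / sigma_i  # shared standard-normal score
    return math.exp(mu_lnpga + sigma_lnpga * z)


def intensity_from_pga(pga, mu_i, sigma_i, mu_lnpga, sigma_lnpga):
    """Inverse mapping: intensity with the same exceedance probability."""
    z = (math.log(pga) - mu_lnpga) / sigma_lnpga
    return mu_i + sigma_i * z


# Hypothetical parameters for one building type and one damage limit state;
# PGA is expressed in g.
params = dict(mu_i=8.0, sigma_i=1.2, mu_lnpga=math.log(0.30), sigma_lnpga=0.6)
pga_viii = pga_from_intensity(8.0, **params)
```

With these placeholder values, intensity VIII sits at the median of both curves, so it maps to the median PGA of 0.30 g. Repeating the calculation with the parameters of each building type and damage limit state would give one curve per combination, which is what the overlap comparison relies on.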

Source of abnormality in intensity-PGA curves
For building type "RC_A" and "RC_B" in Fig. 9, it is observed that for the same intensity levels, the corresponding PGA values of damage state LS4 are much higher than that of damage limit states LS1, LS2, LS3.
For fixed fragility value, this may due to the underestimation of intensity by the median empirical fragility curve in Fig. 7, or the overestimation of PGA by the median analytical fragility curve in Fig. 8, or a combination of 10 both effects. In this regard, damage data scarcity at higher damage limit states may contribute to the abnormal high PGA values of LS4. When reviewing the fragility data collection process, it is clear that the construction of empirical fragility database requires the combination of damage statistics from multiple earthquake events that cover a wide range of ground motion levels. Generally, large magnitude earthquakes occur more infrequently in densely populated areas, thus damage data tend to cluster around the low damage states and ground motion 15 levels. This limits the validation of high damage states or ground motion levels (Calvi, 2006). According to Yuan (2008), those seriously damaged buildings in earthquake affected area are mainly masonry buildings.
Therefore, the cause of the abnormal high PGA values of damage state LS4 for "RC_A" and "RC_B" can be attributed to the relative scarcity of damage data at higher intensity/PGA level, especially for RC buildings.
As to building type "Masonry_A" and "Masonry_B" in Fig. 9, for the same intensity level, the PGA values 20 revealed by four damage states of "Masonry_B" are generally higher than that in "Masonry_A". This can be more clearly seen from Fig. 10, in which the intensity-PGA relations are grouped by damage limit states and the PGA values revealed by "Masonry_B" are generally higher than by all the other three building types. To better understand this abnormality, we need to refer to the building seismic resistance level assignment process in this study. In fact, compared with "Masonry_A", buildings assigned as type "Masonry_B" generally have much 25 higher seismic resistance capacity. As aforementioned in Sect. 3.1, level "A" refers to buildings with pre/low/moderate-code seismic resistance capacity, and level "B" refers to buildings with high-code seismic resistance capacity. According to the grouping criteria in Table 3, buildings assigned as "Masonry_B" mainly refer to those built after 2001 with seismic resistance level VIII and above. This is obviously a very high code standard (more building classification details can be found on the online supplementary material). Thus, for the 30 same ground motion level, the damage posed on "Masonry_B" should be much slighter than on "Masonry_A".
Consequently, the corresponding intensity revealed by "Masonry_B" should be lower than by "Masonry_A".
Currently in mainland China, the macro-seismic intensity level in post-earthquake field surveys is determined by the damage states of three reference building types, namely (1) Type A: wood-structure, soil/stone/brick-made old building; (2) Type B: single- or multi-storey brick masonry without seismic resistance; (3) Type C: single- or multi-storey brick masonry sustaining shaking of intensity degree VII. In this study, by contrast, buildings assigned as "Masonry_B" are mainly those constructed after 2001 with seismic resistance level VIII and above, and their seismic resistance capability is much higher than that of all three reference building types (Type A/B/C).
Therefore, intensity levels derived from damage to the less fragile "Masonry_B" buildings tend to be underestimated. This may help explain why, for the same intensity level, the PGA given by the intensity-PGA relation of "Masonry_B" is higher than that of "Masonry_A".
Based on the above discussion and the initial analysis in Sect. 4, it can be summarized that (a) due to the high scatter in the originally collected fragility database, the intensity-PGA relations derived for LS1, LS2, LS3 of building type "RC_A" are of low robustness (as indicated by the low R² values in Table 4); (b) due to the scarcity of damage data at high damage states or ground motion levels, the intensity-PGA relations for LS4 of "RC_A" and LS4 of "RC_B" are also not fully reliable; (c) due to the high seismic resistance capability of "Masonry_B", the intensity-PGA relations derived for all four damage limit states of "Masonry_B" are likely to underestimate intensity (or overestimate PGA) compared with "Masonry_A". Therefore, the intensity-PGA curves derived for "Masonry_A" have the highest relative robustness and reliability. Indeed, the four intensity-PGA curves of "Masonry_A" coincide very well, as expected (Fig. 9). According to Yuan (2008), the seriously damaged buildings in earthquake-affected areas are also mainly masonry buildings. Therefore, we consider the median empirical and analytical fragility curves derived for "Masonry_A" (with uncertainties provided in Appendix Fig. A3-A4 and Table B1) to be the most representative ones for seismic-prone areas in mainland China, compared with those developed for the other building types in this study.
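
The bridging step that underlies these comparisons (matching, at a given exceedance probability, an intensity on the empirical curve to a PGA on the analytical curve) can be illustrated with a short Python sketch. It is not the implementation used in this study: it assumes that the empirical fragility is a normal CDF in intensity and the analytical fragility a lognormal CDF in PGA, and the function name and all parameter values are hypothetical placeholders rather than entries of Table 4.

```python
from math import exp, log
from statistics import NormalDist

def bridge_intensity_to_pga(intensity, emp, ana, p_min=0.01):
    """Return the PGA (g) whose exceedance probability equals that of `intensity`.

    emp = (mu, sigma): assumed normal-CDF empirical fragility in intensity.
    ana = (mu_ln, sigma_ln): assumed lognormal-CDF analytical fragility in PGA.
    Returns None when the probability lies outside [p_min, 1), mirroring the
    truncation at 1% exceedance probability.
    """
    p = NormalDist(*emp).cdf(intensity)
    if not (p_min <= p < 1.0):
        return None
    # Invert the analytical curve in ln(PGA), then convert back to g.
    return exp(NormalDist(*ana).inv_cdf(p))

# Hypothetical parameters for one building type and one limit state.
emp_params = (8.0, 1.0)        # median intensity VIII, spread of one degree
ana_params = (log(0.3), 0.5)   # median PGA 0.3 g, log-standard deviation 0.5

for degree in (7.0, 8.0, 9.0):
    print(degree, bridge_intensity_to_pga(degree, emp_params, ana_params))
```

Under these two assumed curve shapes, the bridged ln(PGA) is exactly linear in intensity, which is consistent with fitting a linear relation between intensity and ln(PGA).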

Average intensity-PGA relation derived for "Masonry_A"
According to the analysis in Sect. 5.3, intensity-PGA curves derived for four damage limit states of "Masonry_A" are of relatively highest robustness. Therefore, we first focus only on building type "Masonry_A" and average its four curves for discrete intensity values, to derive the corresponding averaged PGA values, as listed in Table 5.

If we fit the data points in Table 5 with a linear relation between intensity and ln(PGA), we find Eq. (5):

$\ln(\mathrm{PGA}) = 0.521 \cdot I - 5.43 \pm \varepsilon$ (PGA in g) (5)

where I is the macro-seismic intensity and ε follows a normal distribution with a median of 0 and a standard deviation of σ.
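
The averaging and fitting steps described above can be sketched in Python. This is only an illustration under stated assumptions: an arithmetic mean of the four PGA values at each intensity and an ordinary least-squares fit. The function names are hypothetical, and the input points are generated from Eq. (5) itself as a stand-in, not taken from Table 5.

```python
from math import exp, log

def average_pga(pga_by_limit_state):
    """Arithmetic mean of the PGA values (g) implied by the limit-state curves."""
    return sum(pga_by_limit_state) / len(pga_by_limit_state)

def fit_ln_pga_line(intensities, pgas):
    """Ordinary least-squares fit of ln(PGA) = slope * I + intercept."""
    ys = [log(p) for p in pgas]
    n = len(intensities)
    mean_x = sum(intensities) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in intensities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(intensities, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic stand-in: four identical limit-state curves that follow Eq. (5).
intensities = [6, 7, 8, 9, 10]
pgas = [average_pga([exp(0.521 * i - 5.43)] * 4) for i in intensities]
slope, intercept = fit_ln_pga_line(intensities, pgas)
print(round(slope, 3), round(intercept, 2))
```

Because the synthetic points lie on the line of Eq. (5), the fit simply recovers its coefficients; with real averaged values the residual scatter would not be zero.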
By integrating the uncertainty in both the original empirical and analytical fragility data of "Masonry_A" (as shown in Appendix Fig. A3-A4 and Table B1) into the intensity-PGA relation, the averaged standard deviation σ in Eq. (5) is estimated to be 0.3 (the detailed uncertainty transmission methodology is given in Appendix C). As the "Masonry_A" type is the most common and the most relevant to buildings damaged in historical earthquakes (Yuan, 2008), we recommend using Eq. (5) for building damage assessment for earthquakes that occurred in mainland China, especially in seismically active provinces, e.g. Sichuan and Yunnan (Fig. 2).
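
As a quick numerical illustration, Eq. (5) can be evaluated for intensity degrees VI to X. The sketch below treats ε as a normal error on ln(PGA) with σ = 0.3, so the ±1σ band is multiplicative in PGA; the function name and the mapping of degrees VI to X onto the integers 6 to 10 are assumptions made for the example.

```python
from math import exp

SLOPE, INTERCEPT, SIGMA = 0.521, -5.43, 0.3  # Eq. (5); sigma from Appendix C

def pga_from_intensity(intensity, n_sigma=0.0):
    """PGA (g) from Eq. (5), shifted by n_sigma standard deviations in ln(PGA)."""
    return exp(SLOPE * intensity + INTERCEPT + n_sigma * SIGMA)

for degree in (6, 7, 8, 9, 10):
    low = pga_from_intensity(degree, -1.0)
    median = pga_from_intensity(degree)
    high = pga_from_intensity(degree, 1.0)
    print(f"I = {degree}: median {median:.2f} g, one-sigma band {low:.2f} to {high:.2f} g")
```

For example, intensity VIII gives a median PGA of about 0.28 g, and each additional intensity degree multiplies the median PGA by exp(0.521), i.e. roughly 1.7.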

Comparison with other intensity-PGA relations
Based on the summary in Sect. 5.3, if we remove only the obviously unreliable intensity-PGA curves, namely LS1, LS2, LS3, LS4 of "RC_A" and LS4 of "RC_B", the range of median PGA values corresponding to each intensity degree can be derived from the remaining intensity-PGA relations, as shown in Table 6. [...] to derive the intensity-PGA relation in China was quite scarce. Therefore, damaging earthquakes that occurred in the United States before 1971 were also largely used, which may not be representative of the situation in China today. Thus, one possible explanation for the relatively low PGAs at low intensity levels (VI, VII) in Table 7 (GB/T 17742-1980/2008) is that buildings in the 1980s were more fragile than present-day buildings. Since macro-seismic intensity is a direct macro indicator of building damage, present-day buildings, which generally have better seismic resistance capacity, require higher ground motion (PGA) than buildings of the 1980s to be equally damaged.
Since the recommended PGA ranges in GB/T 17742-2008 are not so representative of the current building status [...] Table 8. When comparing our results in Table 5 and Table 6 with those in Table 8, the PGA values are quite consistent for both low (VI, VII) and high (VIII, IX) intensity levels, although these data were developed separately by our new approach and by traditional practice. This agreement supports the reasonableness of the new approach proposed here for developing intensity-PGA relations.

Conclusion
We established an empirical fragility database by evaluating 69 papers and theses, mostly from the Chinese [...] the same damage states either by response spectral methods or by time-history response analysis. These analytical methods provide fragilities as functions of PGA. From this wealth of data, we derived the median fragility curves for these building types after removing outliers with the box-plot method.
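
The outlier removal and median extraction can be sketched in Python as follows. The whisker rule is not restated here, so the sketch assumes the conventional box-plot fences at 1.5 times the interquartile range; the function names and the sample values are hypothetical and do not come from the fragility database.

```python
from statistics import median, quantiles

def remove_outliers(values, k=1.5):
    """Keep values inside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = k * (q3 - q1)
    return [v for v in values if q1 - spread <= v <= q3 + spread]

def median_fragility(values):
    """Median exceedance probability after discarding box-plot outliers."""
    return median(remove_outliers(values))

# Hypothetical exceedance probabilities reported by six studies for one
# building type, one damage limit state and one intensity level.
sample = [0.10, 0.12, 0.11, 0.13, 0.12, 0.60]
print(remove_outliers(sample), median_fragility(sample))
```

In this example the value 0.60 falls above the upper fence and is discarded, while the median of the remaining five values is 0.12. Note that `statistics.quantiles` needs at least two data points, so sparsely populated groups would require separate handling.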
We proposed a new approach that uses fragility as the bridge and derived intensity-PGA relations independently for each building type and each damage state. The potential sources of abnormalities in these newly derived intensity-PGA relations were discussed in detail. Ideally, the individual intensity-PGA curves should all coincide and allow us to derive an average relation between intensity and PGA. The coincidence is not perfect, and the deviations were discussed for the cases where they occur. Given the abundance of damage data and the wide distribution of masonry buildings in mainland China, for studies of historic earthquakes and their losses in seismically active regions, e.g. Sichuan and Yunnan, we recommend using the intensity-PGA relation derived from "Masonry_A" buildings in Eq. (5).
However, for engineering applications, the preliminary intensity-PGA relations developed here should be used with caution, because of the scatter in the original fragility datasets and the simplification of using median fragility to derive the intensity-PGA relation in our proposed approach. It is also worth noting that the buildings used for empirical intensity determination and for analytical studies do not coincide: a "Masonry_A" building in a post-event field survey may encompass a wider range than in an analytical study. Therefore, following the idea of using fragility as the bridge to develop intensity-PGA relations, a possible future extension is to perform fragility analyses for more specifically designed building types that are more representative of the building types widely damaged in the field.

Appendix
In Fig. A1, the comparison of the Chinese Seismic Intensity Scale with other internationally adopted scales is presented. Additionally, the correspondence between intensity and the PGA/PGV range suggested by the current seismic intensity scale in China (GB/T 17742-2008) is graphically presented in Fig. A2. To better illustrate the scatter of the original fragility datasets we collected, the standard deviations of each fragility series are also plotted in Fig. A3 (empirical data) and Fig. A4 (analytical data).
In Table B1, more statistical details about our newly constructed fragility datasets are listed, including the number of fragility data before and after removing the outliers, the median fragility values used in deriving the fragility curves and the standard deviation of each fragility dataset for each building type and each damage state in Fig. 7 and Fig. 8. Table B2 provides a non-official English translation of the China seismic intensity scale GB/T 17742-2008, which is modified after CSIS (2019).
Appendix C provides the methodology for the transmission of uncertainty from the empirical/analytical fragility database to the intensity-PGA relation in Eq. (5).

Code/Data availability
More fragility extraction and building classification details are available in the online supplement (Filename: Supplementary_building_classification_details.pdf).
The empirical and analytical fragility data in Fig. 3

Competing interests
The authors declare that they have no conflict of interest.

Figure 2:
The distribution of earthquakes that occurred in mainland China and its neighbouring area, for which field surveys were conducted. A detailed earthquake catalogue can be found in the online supplement, which is newly compiled based on Ding (2016) and Xu et al. (2014).

Figure 3:
The distribution of empirical fragility data from post-earthquake field surveys, depicting the relation between the exceedance probability of each damage limit state (LS1, LS2, LS3, LS4) at given macro-seismic intensity levels. The fragility datasets are grouped by building types (masonry and RC) and seismic resistance levels (A and B).

Figure 4:
The distribution of analytical fragility data derived from non-linear analyses, depicting the relation between the exceedance probability of each damage limit state (LS1, LS2, LS3, LS4) at given PGA levels. The fragility datasets are grouped by building types (masonry and RC) and seismic resistance levels (A and B).

Figure 5:
Outlier-check using box-plot method for empirical fragility data. Five macro-seismic intensity levels are used to classify the original fragility datasets: VI, VII, VIII, IX, X. "A" and "B" represent the pre/low/moderate-code and high-code seismic resistance level, respectively (more classification details are available from online supplement). LS1, LS2, LS3, LS4 are the four damage limit states. Outliers are marked by red crosses and the red line within each box indicates the 50% quantile fragility value.

Figure 6:
Outlier-check using box-plot method for analytical fragility data. Twelve PGA levels are used to group the discrete analytical fragility datasets: 0.1-1.2 g. "A" and "B" represent the pre/low/moderate-code and high-code seismic resistance level, respectively (more classification details are available from online supplement). LS1, LS2, LS3, LS4 are the four damage limit states. Outliers are marked by red crosses and the red line within each box indicates the 50% quantile fragility value.

Figure 7:
Median fragility curve and error-bar analysis derived from empirical fragility datasets, which depicts the relation between macro-seismic intensity and exceedance probability of each damage limit state (LS1, LS2, LS3, LS4) for masonry and RC building types (Note: these median fragility curves are of varying robustness; see Sect. 4 and Sect. 5.3 for more details). The circle within each bar represents the median exceedance probability of each damage limit state; the length of each bar indicates the value of the corresponding standard deviation. Only intensity and PGA values with truncated exceedance probability ≥1% for each damage limit state of each building type are plotted, since higher damage states can appear only for higher intensities or PGA values (see Sect. 5.2 for more details). Detailed values of median fragility and standard deviation are given in Table B1.

Figure 8:
Fragility curve and error-bar analysis derived from analytical fragility datasets, which depicts the relation between PGAs (unit: g) and exceedance probability of each damage limit state (LS1, LS2, LS3, LS4) for masonry and RC building types (Note: these median fragility curves are of varying robustness; see Sect. 4 and Sect. 5.3 for more details). The circle within each bar represents the median exceedance probability of each damage limit state; the length of each bar indicates the value of the corresponding standard deviation. Only intensity and PGA values with truncated exceedance probability ≥1% for each damage limit state of each building type are plotted, since higher damage states can appear only for higher intensities or PGA values (see Sect. 5.2 for more details). Detailed values of median fragility and standard deviation are given in Table B1.

[...] Table B1). Only intensity and PGA values with truncated exceedance probability ≥1% for each damage limit state of each building type are plotted, since higher damage states can appear only for higher intensities or PGA values (see Sect. 5.2 for more details).

Notes about qualifiers: "very few": <10%; "a few": 10%-50%; "most": 50%-70%; "majority": 70%-90%; "commonly": >90%.

"damage_state" LS1, LS2, LS3, LS4 represent the four damage limit states: slight, moderate, serious, collapse, respectively; "μ" and "σ" are the regression parameters between intensity/PGA and the corresponding fragilities of each damage limit state; R² indicates the goodness of fit of the regressed median fragility curve, as plotted in Fig. 7 and Fig. 8.

Note: "origin fragility number" refers to the number of original fragilities collected for each damage limit state of each building type from previous studies; "fragility number after removing outliers" refers to the remaining fragilities after removing outliers using the box-plot check method. Only intensity and PGA values with truncated exceedance probability ≥1% for each damage limit state of each building type are given, since higher damage states can appear only for higher intensities or PGA values (see Sect. 5.2 for more details).

Notes about qualifiers: "very few": <10%; "few": 10%-50%; "most": 50%-70%; "majority": 70%-90%; "commonly": >90%.

Appendix C: Methodology in characterization of uncertainty transmission from empirical/analytical fragility database to intensity-PGA relation
The estimation of the uncertainty of the intensity-PGA relation (Eq. (5)) is not a standard procedure like regression analysis. We have fragility as a function of intensity with an error on the fragility, so that fragility is a random variable. It is also a random variable when derived as a function of y = ln(PGA). We express this as [...]

If we assume that the error term is small, we can write, with $y = a + b \cdot I + \varepsilon$ denoting the linear relation of Eq. (5):

$g(a + b \cdot I + \varepsilon) \approx g(a + b \cdot I) + g'(a + b \cdot I) \cdot \varepsilon$ (C8)

$g'(a + b \cdot I)$ is the slope of the g(y) curve and has the unit 1/ln(PGA). The value changes along the curve, so we replace it by an average value $\bar{g}'$. Then, under the assumption of independence of the two random terms, we get

$\sigma = \frac{1}{\bar{g}'} \sqrt{\sigma_h^2 + \sigma_g^2}$ (C10)

In order to utilize this estimation scheme for our data, we approximate $\bar{g}'$ by its value at the 0.5 value of the fragility function: $g(y_{0.5}) = 0.5$, so that $\bar{g}' = g'(y_{0.5})$. When we do the estimates for each damage class and each building type, we find the standard deviations for ln(PGA) according to the following table. The values do vary. A representative/average value appears to be 0.3.
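
The estimation scheme can be illustrated with a small Python sketch. It assumes that Eq. (C10) has the form σ = sqrt(σ_h² + σ_g²) / ḡ′, where σ_h and σ_g denote the fragility errors of the empirical and analytical curves, and that g(y) is a normal CDF in y = ln(PGA), so that its slope at the 0.5 fragility level is 1/(s·sqrt(2π)) for a log-standard deviation s. The function names and the input numbers are hypothetical and are not the values behind the representative σ of 0.3.

```python
from math import pi, sqrt

def slope_at_median(sigma_y):
    """Slope of a normal-CDF fragility curve g(y) at g = 0.5 (unit: 1/ln(PGA))."""
    return 1.0 / (sigma_y * sqrt(2.0 * pi))

def sigma_ln_pga(sigma_h, sigma_g, mean_slope):
    """Standard deviation of ln(PGA), following the assumed form of Eq. (C10)."""
    return sqrt(sigma_h ** 2 + sigma_g ** 2) / mean_slope

# Hypothetical inputs: fragility errors of 0.10 on both curves and an
# analytical curve with a log-standard deviation of 0.5.
slope = slope_at_median(0.5)
print(round(sigma_ln_pga(0.10, 0.10, slope), 3))
```

The sketch makes the dependence explicit: a flatter analytical fragility curve (smaller slope) transmits the same fragility error into a larger uncertainty on ln(PGA).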