Dynamic maps of people exposure to floods based on mobile phone data

Floods are acknowledged as one of the most serious threats to human lives and properties worldwide. To mitigate 10 the flood risk, it is possible to act separately on its components: hazard, vulnerability, exposure. Emergency management plans can actually provide effective non-structural practices to decrease both people exposure and vulnerability. Crowding maps depending on characteristic time patterns, herein referred to as dynamic exposure maps, provide a valuable tool to enhance the flood risk management plans. In this paper, the suitability of mobile phone data to derive crowding maps is discussed. A test case is provided by a strongly urbanized area subject to frequent floodings located in the western outskirt of Brescia town 15 (northern Italy). Characteristic exposure spatio-temporal patterns and their uncertainties were detected, with regard to land cover and calendar period. This novel methodology appears to be more reliable than crowdsourcing strategies, and has potentials to better address real-time rescues and reliefs supply.


Introduction
Floods are natural phenomena whose hazards afflict nearly 20 million people worldwide (Kellens et al., 2013), posing a serious 20 challenge to the protection of human lives and the liveability of urban settlements. Both low-income and high-income countries are strongly impacted by extreme weather events and a high-confidence increase trend in the resulting economic damages and social costs has globally been documented (Kreibich et al., 2019). As reported by Munich RE (2020) over the period 1980-2019, flooding accounts for some 40% of all loss-related natural catastrophes, with losses worldwide totalling more than US$ 1tn. For instance, floods and landslides in Thailand in 2011 resulted in 43 US$ bn, that is the highest flood losses of all 25 time.
Two major factors can be advocated for justifying such a trend: climate change and increased urbanization and people exposure (Hartmann et al., 2013). These factors involve different components of the risk concept (UN ISDR, 2009), which is given by the combination of hazard, exposure and vulnerability. Climate change was popularly acknowledged as a leading cause for the increases in the frequency and intensity of heavy storms and, consequently, of flood episodes (Solomon et al., 2007). This 30 time information and to better address relief supplies and rescue efforts. In this regard, a detailed and reliable picture of the real-time spatiotemporal variability of the flood risk would be highly beneficial. 65 Presently, large amounts of geospatial data can be obtained from a number of sources, namely remote sensing or aircraft platforms. However, these sources yield situation snapshots, but do not provide information at the spatiotemporal resolution needed for managing urban floodings and are hardly validated in the field. To overcome these problems, the possibility of taking advantage of crowdsourcing techniques has recently attracted much attention (Mazumdar et al., 2017;Rosser et al., 2017;Hirata et al., 2018;Mazzoleni et al., 2018). These techniques have been made available by the widespread proliferation 70 of smartphones and tablets, along with the success of social media. During emergency phases, the advantages in the real-time implementation of the emergency plan are twofold: on the one hand, a large amount of volunteered geographical information suitably georeferenced can be collected (Goodchild, 2007), guiding authorities in developing collaborative flooding maps or in estimating the number and location of exposed people; on the other hand, crucial information can be communicated to exposed people, making them aware of the actual risk magnitude and supporting their capability to face the situation. After the 75 emergency phase, authorities can also exploit the collected data to enhance their preparedness and to better match the emergency management plans to specific real-world needs of the flood prone community.
Potentials and drawbacks of various crowdsourcing techniques have been long debated, even though crowdsourcing has already found successful applications in some weather related disasters (Poser and Dransch, 2010;Hung et al., 2016;Guntha et al., 2018). Such researches have underlined that in many cases, during the emergency phase, crowdsourced data have at 80 least the same quality of the authoritative ones (Goodchild et al., 2017). Nevertheless, several concerns have been pointed out through analyses of crowdsourcing technique applications to real-world disasters: i) the raw data quality is generally poor because of malicious intentions of rough elements of the community or incompetence of stakeholders, so that spurious, erroneous, malformed, redundant or incomplete data must be purged out of the database to make them suitable, ii) the sample significance is basically low, owing to the limited number of exposed people actively participating in crowdsourcing, so that 85 information of general interest must be extrapolated from an exiguous fraction of the whole population, iii) the communication network is not completely reliable, as it frequently fails or malfunctions during disaster occurrences.
Researches have therefore been addressed to filter such rumours from crowdsourced data (Han and Ciravegna, 2019).
However, other approaches can be followed to develop effective dynamic information tools through the exploitation of mobile phone data collected by providers. Such data make it possible to geo-localize mobile phone users over the time, that is to derive 90 time-dependent crowding maps. When such maps are intersected with hazard maps, showing the flooding area extensions corresponding to a selected frequency of occurrence, dynamic exposure maps are obtained. As demonstrated by Carpita and Simonetto (2014) with reference to episodic crowd concentrations due to social events, recurrent spatiotemporal patterns can be derived from mobile phone data by means of geostatistic analyses. Herein, the application of this methodology to periodic spatiotemporal variability of resident population related to home-work mobility is investigated. To do so, additional tools are 95 developed to extrapolate the real-world population from the crowding maps of provider's clients. https://doi.org/10.5194/nhess-2020-201 Preprint. Discussion started: 2 July 2020 c Author(s) 2020. CC BY 4.0 License.
A suitable case study has been identified in the western outskirt of Brescia town (northern Italy), for which a detailed knowledge of the flooding dynamics and a sizeable set of mobile phone data are available. The suggested approach made it possible to derive reliable dynamic exposure maps with respect to the land coverage and the calendar time periods, obtaining estimates of the expected number of people affected by flood hazards along with its uncertainty. 100 Hence, the paper is organized according to the following sections: (i) firstly, the innovative aspects of the geostatistic analysis methodology herein utilized are illustrated, (ii) secondly, the main hydraulic-hydrologic features of the analysed study area are described along with the available mobile phone data, (iii) the methodology application and the results are finally discussed.

Analysis methodology
The proposed geo-statistical approach relies on Erlang mobile phone measures, which consist in the average number of mobile 105 phone users (MPU) bearing a connected SIM. These data are collected by mobile phone providers and recorded at constant time steps with reference to a georeferenced grid of square cells. The availability of such a kind of data is progressively arousing the enthusiasm of the urban planners' community (Becker et al. 2011;Calabrese et al., 2015), as they offer a variety of potential applications. In this study, MPU spatiotemporal variability was summarized by means of daily density profiles (DDP), that provides the variability within a day of MPU referred to a spatial region of interest. Such regions are inundation 110 areas, thus expressing the spatiotemporal variability of people exposed to the flood risk. To define DDP, let e it be the number of MPU in the i-th grid cell in a generic time interval t. Let I r = {i 1 , …, i m } be the set of grid cells related to region r of interest.
Furthermore, let define T d = {t 1 , …, t o } be the set of time intervals in a day d. The daily density profile (DDP rd ) can be defined according to Eq. (1), as a vector of length o of values describing the sum of MPU in region r and day d: (1) 115 Herein, the interest lies in analyzing and classifying the occurrences in a time series of DDP rd , related to a set d = {d 1 , …, d n } of n analyzed days. More precisely, the proposed approach firstly involves the clustering of similar DDP rd , as discussed in detail in the following Section 2.1. The clustering procedure consists of two steps. In the first one, MPU spatial variability inside region r is considered by changing the index i in a R 2 x-y coordinate space; to do so a data reduction strategy is applied.
In the second one, DDP rd temporal variability is evaluated by changing index t in a R 1 space. 120 Clustering mobile phone data according to the above described procedure is not straightforward. Nevertheless, traditional techniques (Arabie and De Soete, 1996) cannot be applied. In fact, data amount to n observations and p = m*o variables (number of MPU values each day). For instance, if one has one year of available data (i.e. n = 365), repeated every 15 minutes (o = 96) in a region covered by 500 grid cells (m = 500), the variable number is far larger than the number of observations (p > n), depicting a high-dimensional context (Donoho, 2000). When analyzing high-dimensional data, several issues, such as 125 the curse of dimensionality (Keogh and Mueen, 2017), need to be faced. With specific regard to data clustering, this issue has been addressed by Bouveyron et al. (2007). However, as suggested by Jovi et al. (2015), high-dimensional data reduction https://doi.org/10.5194/nhess-2020-201 Preprint. Discussion started: 2 July 2020 c Author(s) 2020. CC BY 4.0 License.
provides a suitable solution. To do so, an approach based on the Histogram of Oriented Gradients (HOG) is used in this paper.
Therefore, data reduction acts on index i, in order to convert the support from R 2 (x-y coordinate space) to R 1 .
Once DDP rd are clustered in statistically similar groups, the total number of people in set T d and in region r can be estimated 130 and associated with descriptive bands (DB), as discussed in Section 2.2. In this regard, a crucial concern is given by the lack of MPU data from all companies providing phone services in northern Italy. To deal with this problem, as firstly suggested by Metulini and Carpita (2020), the approach proposed in this paper adopts a strategy to infer the total number of people by matching available mobile phone data to census data.

Data reduction and clustering 135
To cluster similar DDP a technique for high-dimensional data reduction is firstly adopted. Then, reduced data are analyzed by a high-dimensional data clustering. Separately for each element of set T d (i.e. for a given t), let ε it = {e 1,t , e 2,t , …, e im,t }' be the vector of dimension m of MPU in region r in time instant t. The aim is to reduce ε it to a new set of values it , of dimension m' < m, that preserves the relevant information contained in the set ε it . To do so, set ε it , separately for each t, undergoes a histogram of oriented gradients (HOG) feature extraction (Dalal and Triggs, 2005;Tomasi, 2012). HOG is generally applied to red-green-140 blue (RGB), or grey scale, images, in order to find similarities among images and to classify them. Since each value is associated with a location in the R 2 x-y coordinate space, vector ε it can be viewed as a raster featuring colors expressing the magnitude of its elements. The application of HOG method, having defined our setting as above, is straightforward.
When clustering in terms of the spatial distribution of users, the interest lies in the relative distribution of MPU in region r, not in the MPU absolute amount. To be consistent with this aim, HOG was applied to a normalized vector of ε it . Vector z it is thus 145 defined as z it = {e i,t / ∈ (e i,t ), ∀ i ∈ Ir}.. In order to generate the vector of features it , containing the relevant information in z it , the HOG method firstly divides the grid's cells into a number of S smaller grids G 1 , …, G S (G i ∩ G j = Ø,  i = 1,…,S and  j = 1,…,S with i  j), where √ is a parameter to be chosen. The direction and the magnitude matrices by using two different gradient matrices of each grid G i (see for details in Dalal and Triggs, 2005) are then obtained. These matrices are used to derive the histogram of gradients with k equal bins, where k is a parameter that need to be chosen. it is the vector of 150 features given by all the elements of the obtained histogram of gradients when stacked  S. The length of vector it is S*k.
Subsequently, the vector it is stacked over the subscript t, obtaining the vector of features d for region r and day d, of dimension S*k*o. d contains the features (variables) undergoing the first clustering step of the method, in which the objects to be clustered, according to how MPU distribute over region r according to index i, are represented by days. For all days in the data set d = 155 {d 1 , …, d n }, d is computed, and a k-mean cluster analysis (Hartigan and Wong, 1979) is performed, where the n days are the objects to be clustered. d contains the values of the S*k*o (with S*k*o < m*o) variables for day d to be attributed to a cluster.
According to Hartigan and Wong criterion, the number of clusters H is chosen by analyzing the decreasing trend of the ratio between the total within sum of squares ( ℎ ) and the total sum of squares ( ), that needs to be minimized with respect to the number of groups H. For a certain H, total within sum of squares is defined as ℎ = 160 , where is the centroid vector (length S*k*o) for cluster i; total sum of squares is defined as = ∑ ( − ) 2 , where is the centroid vector for the full set of data.. At this point the elements in d = {d 1 , …, d n } (the days) have been assigned to a number of clusters C 1 , …, In the second step, to account for the MPU variability over time, we consider, for a given region r, as objects, the vector DDP rd , 165 considered as the collection of functional observations in t 1 ). To do so, a model-based functional data clustering method (Bouveyron and Come, 2015) was adopted, and days d (cluster's objects) were grouped in terms of the o DDP rd values (cluster's variables), separately for each cluster defined in the previous step. The aim is to consider the similarities in the functional form of the DDP rd , seen as a curve of values (y-axis) with respect the x-axis (time instants). In doing so, each group's curves are modelled by their own set of 170 distributional parameters (see for details Bouveyron and Come, 2015). The method adopted in this work is suitable for highdimensional dataset, since the clustering process applies the criteria of the sub-space clustering (Agrawal et al., 1998), adopted to consider just the minimum number of variables needed for grouping objects, thus reducing the dimensionality. In detail, it is herein proposed to adopt the following path: i) anomalous DDP rd are removed using functional data outlier detection by likelihood ratio test (LRT), as proposed by Febrero-Bande et al. (2008); ii) clustering method developed by Bouveyron et al. 175 (2015) is applied, along with funFEM package in R.
With this strategy we aim at assign the elements in d = {d 1 , …, d n } (the days) to a number of final clusters C F,1 , …, C F,Z (C F,i ∩ C F,j = Ø, ∀ = 1, . . , , ∀ j = 1, …, Z), with Z ≥ H. Thus, the adoption of these two steps would permit a representation of the dynamic of MPU's presences, in the form of a DDP for each group of days in region r, where, with representative we intend that to each group belongs days that are similar each other and dissimilar in between, in terms of index i (spatial 180 distribution of MPU) and index t (temporal dynamic of MPU)

Population assessment
If the clustering strategy makes it possible to represent the dynamic of the presence of mobile phone users in region r over a set of time instants for a group of "representative" days, an additional strategy is needed to estimate the total amount of people.
Indeed, in most times, data are available just for one mobile phone company. To have a reliable estimated of the total number 185 of people that are actually present in the study area, users of other mobile phone providers must be considered, as well.
Collecting all these data is either unfeasible or an unsustainably expensive. To perform a national level analysis, a convenient solution is represented by the use of the market share of the provider company, that can be applied to "correct" the DDP rd .
Hence, an estimate of the total number of people (e.g. let s n to be the national level share than can assume values in the range Thus, to suitably estimate the market share, the smallest level of aggregation, represented by the "Sezioni di Censimento" (SC) (i.e. population census districts), was used in this study (ISTAT, 2017). The following strategy is suggested: data on the number of residents from administrative archives were compared to the number of TIM users on a residential area in late evening hours. Bering in mind the characteristics of the social dynamics of this residential areas, it is reasonable to assume that during 200 these hours residential SC are populated only by residents. Such comparisons were performed separately for each SC using ISTAT data (Anagrafe Comunale).
First, since the MPU grid is made of square cells while SCs are irregular polygons, the number of TIM users belonging to each SC was estimated by intersecting these spatial data. Thus, to count the number of TIM users in each polygon the portion of the cell belonging to the SC polygon were calculated by using the function extract in raster package, R. Let Cell j ; j = 1, 2, … , J SC 205 be the TIM cells (pixel) overlapping a chosen SC, the ratio A j in Eq. 2 which represents the portion of Cell j covered by the chosen SC (for each SC, A j > 0; j = 1, 2, …, J SC ). If Cell j is completely included in SC, then A j = 1, otherwise A j < 1. Let TUC j be the density of TIM Users in Cell j , the estimated number of TIM users in SC ETU SC is computed as shown in Eq. (3). 210 The Estimated TIM Market Share in SC ETMS SC is thus given by the ratio in Eq. (4), where P SC is the resident number assessed by the population census the for the SC.

= (4)
Differently from s n , range is not necessarily in [0,1], since could be larger than P SC . An application example 215 of this procedure can be found in Metulini and Carpita (2019b). In the count of P SC , elderly people (>80 years) and children (< 11 years) were excluded, aiming at taking into consideration only people bearing a smartphone. The distribution of ETMS SC can be used as a proxy for the TIM market share at city level. More specifically, it appears to be convenient to use the median of the distribution of ETMS SC , which is preferable to the mean in those cases when the distribution is asymmetric. Let me(.) be the median statistics, the estimate ̂ for a given region r for a given day d is finally given by Eq. (5). 220 ̂= ( )

Result representation
For the sake of result interpretation, a graphical representation is herein adopted. Let consider the vector DDP rd to be a curve of functional observations x rd (T d ) representing the sum of MPU in region r and day d (in y-axis) with respect to time instants The profile for representative days can be displayed by using functional box plots (FBP), the 225 analogue of the traditional box plot for curves (Sun and Genton, 2011;.
In FBP a group of curves is ordered using the concept of "band depth", a "median" value and an "envelope" is generated, that can be used to define a functional version of traditional descriptive bands. Moreover, it is possible to assign a curve to the outlier group, if it exceeds by 1.5 the envelope margins in at least one time instant.
A FBP strategy is performed separately for each final cluster. Let consider cluster h and let d h = {d 1h , …, d nh } be the days 230 belonging to cluster h, and let ̂, ℎ = [̂, 1ℎ , … ,̂, ℎ ] be the matrix of dimension o*n h with a ̂ in each column. Let consider each vector to be a curve. FBP is applied to matrix ̂, ℎ to generate the profile plot estimating the dynamic of the total number of people (that we will call "city users", or simply "users") in different hours (with DB) in representative days.

Case study description 235
The study area was selected as an emblematic and widespread situation of the unacceptable high flood risk affecting the foothill zone of the Po River Valley (Lombardy Region, northern Italy). It lies in the western outskirt of Brescia town ( Figure 1) and is overall included in the watershed of the Oglio River, a primary left-bank tributary of the Po River. The main drainage is supplied by the Mella River, bounding the eastern side of the study area, that is a left-bank tributary of the Oglio River. As can be seen in Figure 1, the study area features five natural streams originating from the southern boundary of the Alpine chain. 240 From West to East, they are: Laorna, Gandovere, Vaila, La Canale and Solda.
Before the area was anthropized, most of such streams probably had been flooding the alluvial plain swamping into marshes, without a main outlet in the main river network. This was the result of both their almost ephemeral regime and the terrain endorheic morphology. As the agricultural use of the alluvial plain grooved, these streams were connected to the constructed irrigation-drainage network, in order to exploit their low flow for irrigation purposes and to drain the flood flow into the Mella 245 River. These constructed downstream reaches feature two main drainage canals the Gandoverello canal and the Mandolossa canal, depicted in Figure 1. They have become the artificial downstream reaches of the five watersheds, providing a drainage capacity in South direction both for the mountain watersheds and low lands located in the alluvial plain. The total catchment area amounts to 112.3 km 2 , featured by an average imperviousness of about 22%; further details on the watershed hydrologic characteristics are available in the supplementary material. 250 In particular, the Laorna drainage path was strongly manipulated by a constructed straight canal 7 kilometres long, that diverts its streamflow towards the Gandovere stream and intercepts additional surface runoff produced by the northward low land. https://doi.org/10.5194/nhess-2020-201 Preprint. Discussion started: 2 July 2020 c Author(s) 2020. CC BY 4.0 License.
Both streams thus confluence into the Gandoverello canal. This was formerly the downstream reach of the Gandovere stream, whose limited conveyance capacity made it necessary to decrease the hydrologic load. Thus, a flow divider was constructed upstream of the Laorna confluence, so that almost half of the Gandovere streamflow is diverted towards the Mandolossa canal 255 inlet by means of a diversion constructed canal that intercepts the Vaila stream. All these flow discharges, along with those coming from the La Canale stream and the Solda stream, converge into the inlet of the Mandolossa canal, which is characterized by the largest conveyance capacity in the study area.
This drainage network is also exploited by the irrigation system of Franciacorta, a vineyard-agricultural district located West of the study area, for the final disposal of the residual flow discharges. In Figure 1

Hazard mapping
Since the late 50s of the last century, this area has been subject to a deep urbanization sprawl, yielding the present land cover condition depicted in Figure 2. The dramatic increase in both the urban fabric and the industrial-commercial coverages occurred at the expenses of the croplands and the permanent crops, so that sparse and isolated fabrics have evolved in a continuous and heterogeneous urbanization. In addition, various transport units have been upgraded to speedways and two 270 highways were constructed. The flood risk perception in this this area was historically related to the Mella River inundations, which affected the Brescia outskirt since the late 60s of the last century, until its riverbed underwent severe engineering works.
Conversely, with reference to the secondary stream network, the absence of a clear risk perception allowed such an urbanization sprawl to occur regardless of the floodplain extents. As Figure 2 clearly shows, most of the urban fabric areas and the industrial and commercial settlements are adjacent to the stream network. 275 The increase in the land coverage yielded huge impermeability degrees for the plain watersheds. Except for a combined sewer overflow located in the West Brescia watershed and discharging into the Mella River, all the stormwaters produced by these urbanizations discharge into this secondary stream network, as well as those produced by many settlements in the Franciacorta district, which improperly exploits the irrigation system as a final receipt of combined sewer systems. Moreover, the low risk perception has led to a significant impairment of the functionality of these canals. The stream flow is now constrained into a 280 number of narrow culverts and bridges with low decks and large piers. In addition, urbanized canals are no longer maintained, so that the riparian vegetation groves in an uncontrolled manner. The combination of the increase in the peak flow discharges and in the exposure, along with the decrease in the stream network conveyance capacity has led to a dramatic increase in the flood risk. Flood episodes explained by the secondary hydrographic network insufficiency have been observed since the late 90s, evidencing an empirical frequency of occurrence far less than 20 years. The hazard mapping was thus referred to return 285 periods spanning from 5 years to 20 years, which are significantly less than those conventionally required in Italy for a secondary stream network to be considered verified (20-50 years). The urban areas exposed to floodings are estimated in 160 ha, 231 ha and 330 ha, with respect to 5 years, 10 years and 20 years return periods. About 30% of those areas are residential fabrics whereas the remaining 70% is given by industrial and commercial settlements.
The hazard analysis was herein conducted by using a design event method. As demonstrated by Balistrocchi et al. (2013) in 290 this climatic context, a design event method is capable to provide results comparable to those of more sophisticated continuous approaches, if it is based on the Chicago synthetic hyetograph with duration equal to the double of the catchment time of concentration. A classical leaf hydrologic model was developed in accordance with the sub-catchments subdivision illustrated in Figure 1. Extensive surveys of the stream cross sections and the inline structures were carried out to assess the actual conveyance capacities of the stream reaches. Flooding volumes were hence estimated by limiting the flood hydrographs 295 generated through the hydrologic model to the overflow threshold discharges. Surveys of the historical flooding extensions occurred during the last three decades addressed the delimitation of the flood prone areas. Total exposed urban areas amount to about 160 ha (return period 5 yr), 230 ha (return period 10 yr) and 330 ha (return period 20 yr). Most of such areas are devoted to industrial-commercial settlements (70 %), whereas the remaining part is residential fabrics of medium-low density.
The resulting flood hazard map is reported in Figure 2 along with the land cover, and highlights the large amount of residential 300 fabrics, industrial and commercial settlements potentially affected by storm events featuring low-medium return periods. An unacceptably high flood risk is therefore evidenced for the study area. Most of such areas were too dispersed or small to have suitable intersections with MPU grid cells. Therefore, they were grouped in four macro-areas referred to the river network and distinguished between urban fabrics (red) and commercial-industrial settlements (dark green). As shown in Figure 2 they are: Laorna and Gandovere streams confluence (areas 1 and 5), La Canale and Solda streams (areas 2 and 6), southern Gandovere 305 canal (areas 3 and 7), Mandolossa canal in Roncadelle Municipality (areas 4 and 8).

Available mobile phone data
This work focuses on mobile phone data provided by Telecom Italia Mobile (TIM), which is currently the largest operator in Italy in this sector. According to national economic newspaper, TIM national share amounted to 30.2 % in December 2016 (Il Sole 24 ore). In our analysis, Erlang measure data represent the average number of mobile phone SIMs (both calling and not 310 calling) that are assigned to that grid's cell in that quarter. Erlang measure data have already been used in the context of urban planning along with statistical methods by Carpita and Simonetto (2014) and Metulini and Carpita (2019a), who analyzed the presence of people during big events in the city of Brescia, by Zanini et al. (2016), who find, by mean of an Independent Component Analysis (ICA), a number of spatial components that separate main areas of the city of Milan, and by others (Manfredini et al., 2015. 315 In this study, reliable Erlang measures of MPU recorded by TIM company are available. The investigated area is marked in black solid line in Figure 1 (WGS 84 UTM 32 N coordinates: 5,040,049,980N,585,970E, area about 64 km 2 ) and is centered on the Mandolossa-Gandovere network. The area is covered by a georeferenced grid of square cells with 150 m sides, which provides the number of TIM users every 15 minutes. In details, for each grid's cell and for each time interval, the corresponding record refers to the average number of mobile phones simultaneously connected to the network. For instance, 320 Figure 3 shows a detail of the spatiotemporal distribution of TIM MPUs occurred on Wednesday, November 18 th 2015, in a sample area of 20x20=400 cells, near to the Mandolossa inlet. Therein, exposed areas, obtained by intersecting the urban covers with the flooding areas, were also reported. Thus, Figure 3 provides a sequence of snapshots of a dynamic map of people exposure to floods. As can be seen, the spatial distribution of raw data is realistic, as major densities suitably concentrate along the main street network and in the urban areas. The temporal variability is also reasonable; for instance, lower densities 325 are evidenced during nighttime in industrial sectors and main streets; see, for instance, the industrial settlement near to the confluence of La Canale stream in the Mandolossa canal (flooding area marked with 6 in Figure 2). The mobility feature of these data is hidden, meaning that it is not possible to trace the path followed by a single MPU over time. Measures are available in the period 2014-2016, even though after data inspection a more limited subset was found to be suitable for the analysis (from July 1 st 2015 to August 11 th 2016), due to data collection issues. 330 4 Analysis procedure application

Procedure parameterization
The application of the HOG procedure to reduce the dataset dimensionality was performed for each quarter of day in T d , by dividing the original grid in 9 smaller grids G i , i = 1, …, 9. The parameter √ was thus set at 3. For each G i , gradients and direction were then computed and the histogram of gradients with k=4 bins corresponding, respectively, to angles 0-45, 45-335 90, 90-135 and 135-180 was obtained. In general, the recognition of the analyzed object is improved by increasing the number of bins. This value was chosen in order to maximize k but, at the same time, avoiding the presence of zeros in the vector of HOG features. Extracted features count for 3 2 *4 = 36 in each quarter, which correspond to a dimensionality reduction in the order of 400/36 ≈ 11. The final vector d with all quarters of the same day stacked for sample area near to the Mandolossa inlet (sample area evidenced in Figure 3) and for day d amounts to 36*96 = 3456 features. 340 The hierarchical k-means cluster analysis, where the objects of the cluster are the days d and the variables are represented by the features of d , was performed on a total amount of 360 days (d = 360) from July 1st 2015 to August 11th 2016. After data inspection, only the days of the last available year (from July 1st 2015 to August 11th 2016) were included in the analysis, since the first year (April, 2014 to June 30th 2015) features some collection problems. In effect, a configuration with 3 clusters sharply separating the days of the first year (till June 2015) and the days of the second year (by July 2015) was estimated, by 345 performing the cluster analysis by using the full set of data. In the final sample all holidays were removed, in consideration of their specific characteristics with respect to typical days. More precisely, August, 15 th , 1 st and 2 nd November, 8 th December, 24 th to 26 th December, 31 th December, January 1 st and January 6 th . 27 th and 28 th March (Easter), 25 th April, May 1 st , June 2 nd were removed. In addition, those days where a large amount of data (>10%) were missing were removed, as well. Conversely, data in those days where missing data were less than 10% were maintained. A test for possible presence of curse of 350 dimensionality, based on the distribution of the distances among pairs of objects, has been performed. A unimodal distribution that suggests the absence of such a problem was derived. On the whole, the amount of suitable data appears to be sufficient to get reliable estimates of people exposed to the flooding risk in the study area.
The number of first-step clusters was chosen according to the relative decreasing trend of the total within sum of squares with the increase in the number of groups. Figure 4 shows this trend, evidencing that a splitting in 4 clusters would decrease by half 355 the total within sum of squares with respect to a 1 cluster splitting. Since this decrease appears to be satisfactory, the sample of days was split in H = 4 clusters, where, C 1 mainly corresponds to all days of July, August and September (green spine-plots shown in Figure 5), C 2 corresponds to working days from February to June (blue spine-plots shown in Figure 5), C 3 corresponds to working days from October to January (red spine-plots shown in Figure 5) and C 4 corresponds to the weekends except for those of summer (yellow spine-plots shown in Figure 5). 360 Hence, SMU variability over time instants was accounted for by considering the Mandolossa's DDP of each day as a functional curve. Firstly, those days that have to be considered outliers were removed by using the curve outlier detection method (Febrero-Bande et al., 2008) separately for each first-step cluster. Secondly, it was evaluated whether days should be further grouped in terms of dissimilarity in DDP functional curve dynamic. To do so, the assumption of independence of our functional data was tested by using Portmanteau (Gabrys and Kokoszka, 2007) and distance correlation (Székely and Rizzo, 2013) tests. 365 Model based functional data clustering techniques (Bouveyron et al., 2015) suggests to split the "summer" group in 3 subgroups, containing, respectively, the days of July, the days of August and the days of September. This second-step splitting leads to Z=6 final clusters, where C F,1 includes days of July, C F,2 includes days of August, C F,3 includes days of September and C F,4 , C F,5 and C F,6 match, respectively, C 2 , C 3 and C 4 .
Illustrations of DDPs for representative days by using functional box plots are reported in Figure 6 (residential fabrics) and in 370 Figure 7 (commercial and industrial settlements), for each of the 8 flooding areas shown in Figure 2 for the 10 years return period. Although functional data clustering suggests the splitting in six groups, for sake of clarity summer months (July, August and September) were combined in a single final cluster (Cluster 1, C1), as well as all the working-days from October to June in a single final group (Cluster 2, C2) and the weekends from October to June (Cluster 3, C3), thus leading to 3 clusters.
To extract the number of people in each quarter of each day from the grid's cells to the irregular polygon of each area (i.e. to 375 find , for area r i , i = 1, .., 8, and day d) the procedure described in Section 2.2 was applied. Hence, MPUs were firstly divided by a constant c = 0.85, in order to consider children (>12 years) and old peoples (>80 years), who likely do not have smartphones (i.e. about 85% of people are in the age range [12,80] in Brescia). Then, by estimating the median value of the market share ratio at SC level adopting the strategy in Section 2.2, which amounts to about 20%, the estimated DDP s for each area and for each day were derived by applying Eq. (5). The estimated market share is also consistent with that found by 380 Carpita and Fabbris (2019). Estimated DDP ri,d underwent the functional box plot strategy, separately for the days d in the 3 clusters (with outliers excluded) and for r i corresponding to the 8 areas. In Villaggio Badia residential area located North of Mandolossa canal inlet (flooding area 2 in Figure 2), the city user number varies between a minimum of 1200 people and a maximum of 1400 people, during and average day (inhabitant density about 30-35 ha -1 ). During the working-days of months from October to June (cluster C2), the peak reaches 1600 users. Moreover, the descriptive bands appear to be wider in summer (cluster C1) and on weekends (cluster C3) as compared to cluster 2, where bands are narrower (i.e. lower variability between days). 395

Results and discussion
Residential areas along the southern Gandovere canal (flooding area 3 in Figure 2) are little populated. Only 50-70 users are there during an average day of summer (cluster C1) or during the week end (cluster C3) (inhabitant density about 18-23 ha -1 ).
The number amounts to more than 80 people on working hours of working days (cluster C2). Number of city users in Roncadelle's residential area located along the Mandolossa canal (flooding area 7 in Figure 2) is less sensitive to working hours, especially during summer. In summer the number of city users varies from a minimum of about 600 up to a maximum 400 of 700. City users are about 800 during working hours of working days and weekends (days belonging to clusters C2 and C3).
Industrial and commercial settlement of Moie di Sotto (flooding area 5 in Figure 2) feature 1000-1500 people (night, first hours of the day-working hours) in summer. These numbers increase up to about 1200-1800 in working days and to about 1100-1600 in weekends. This high density is mainly due to the presence of a commercial outlet of regional interest in this area. Industrial and commercial settlements along La Canale and Solda stream network (flooding area 6 in Figure 2) are very 405 highly populated by city users and the difference between the number of people in summer and in working days is significant.
Daily minimum and maximum values are included between 2000 and 3000 in summer, and between 2500 and 3500 during working days. Weekends follow a stable dynamic (about 2500 people all along the day).
Flooding areas related to southern Gandovere canal (flooding area 8 in Figure 2) presents a productive area with an average number of users varying from 250 to 380 in summer and from 300 to 420 during working days. In the same way of Villaggio 410 Badia (southern part of flooding area 2 in Figure 2), the number of users along the weekends is stable and it stands to about 320/330. The industrial sector of Roncadelle (flooding area 4 in Figure 2) is another highly populated area, featuring more than 2000 people during the day. In some particular working days and weekends this number reaches about 3000 (red dashed lines of outliers in Figure 7). In this area a remarkable difference in the number of users between working days, summer days and weekends was not detected. 415

Conclusions
In this paper a novel approach to the assessment of the risk related to people exposure to floodings based on a geostatistical analysis of Erlang measures was proposed. Such a procedure takes advantage of data reduction (histogram of oriented gradients discussed in Section 2.1), in order to face the dimensionality curse issue. Its suitability and potentials were demonstrated with regard to an urban outskirt area located near to Brescia (Lombardy, Northern Italy), which is affected by widespread and 420 frequent floodings. In Figure 3 the possibility of expressing the spatiotemporal variability of exposed people by using time variable maps was illustrated. These data feature high spatial resolution (150 m) and short time step (15'), thus providing reliable assessments even for the smallest analyzed areas (about 4-5 ha) and a precise evaluation of the temporal dynamic.
Indeed, daily density profiles can be derived according to this procedure. Then, these profiles can be clustered yielding groups of similar daily time patterns. Clustering results are definitively meaningful, since working days and weekends are 425 acknowledged to show different temporal dynamics, when they belong to working months (from October to June). Conversely, daily dynamics in summer months (July, August and September, usually exploited for the longest holydays in Italy), must be regarded as different from the others. In addition, working days and weekends feature more similar daily density profiles during such months. As can be seen in Figure 6 and in Figure 7, the daily temporal variability of people exposed to floodings can be assessed with respect to the day cluster and the type of urban areas (residential or industrial-commercial), both in terms 430 of expected value (the median) and uncertainty (confidence band), thus providing a comprehensive information to agencies and authorities devoted to the flood risk management.
The need to assess the entire population would theoretically require the gathering of a huge amount of datasets, from all the providers that operate in the area of interest. This issue would lead to a relevant increase in the data collection cost and would be difficult to overcome. Nevertheless, census data make it possible to infer the total population from the users of a single 435 provider, by means of local estimates of its market share, as discussed in Section 2.2.
It worth underling that this statistical support, along with the high spatiotemporal resolution and the reliability of the raw data, makes the proposed procedure particularly appealing in order to decrease the errors of exposed people estimates. Such a support is not provided by crowdsourcing techniques, which are based on voluntary data supplies and commonly rely on very limited datasets with respect to the amount of the exposed people. A second advantage that must not be disregarded lies in the 440 possibility of exploiting dynamic exposure maps, or alternatively the clustered daily density profiles, off-line that is independently of the potential malfunctioning of the mobile phone connection during the flood episode. Conversely, crowdsourcing could be strongly compromised by the difficulties of connecting to the network during the emergency period.
Indeed, dynamic exposure maps derived by mobile phone data have strong potentials to substantially improve emergency plans, so that real-time rescues, relief supplies and traffic management could be better addressed. 445 https://doi.org/10.5194/nhess-2020-201 Preprint. Discussion started: 2 July 2020 c Author(s) 2020. CC BY 4.0 License.