Dynamic maps of human exposure to floods based on mobile phone data

Floods are acknowledged as one of the most serious threats to people’s lives and properties worldwide. To mitigate the flood risk, it is possible to act separately on its components: hazard, vulnerability, exposure. Emergency management plans can actually provide effective nonstructural practices to decrease both human exposure and vulnerability. Crowding maps depending on characteristic time patterns, herein referred to as dynamic exposure maps, represent a valuable tool to enhance the flood risk management plans. In this paper, the suitability of mobile phone data to derive crowding maps is discussed. A test case is provided by a strongly urbanized area subject to frequent flooding located on the western outskirts of Brescia (northern Italy). Characteristic exposure spatiotemporal patterns and their uncertainties were detected with regard to land cover and calendar period. This novel methodology still deserves verification during real-world flood episodes, even though it appears to be more reliable than crowdsourcing strategies, and seems to have potential to better address real-time rescues and relief supplies.

Abstract. Floods are acknowledged as one of the most serious threats to people's lives and properties worldwide. To mitigate the flood risk, it is possible to act separately on its components: hazard, vulnerability, exposure. Emergency management plans can actually provide effective nonstructural practices to decrease both human exposure and vulnerability. Crowding maps depending on characteristic time patterns, herein referred to as dynamic exposure maps, represent a valuable tool to enhance the flood risk management plans. In this paper, the suitability of mobile phone data to derive crowding maps is discussed. A test case is provided by a strongly urbanized area subject to frequent flooding located on the western outskirts of Brescia (northern Italy). Characteristic exposure spatiotemporal patterns and their uncertainties were detected with regard to land cover and calendar period. This novel methodology still deserves verification during real-world flood episodes, even though it appears to be more reliable than crowdsourcing strategies, and seems to have potential to better address real-time rescues and relief supplies.

Introduction
Floods are natural phenomena whose hazards afflict nearly 20 million people worldwide (Kellens et al., 2013), posing a serious challenge to the protection of people's lives and the liveability of urban settlements. A high-confidence increasing trend in the economic damages and social costs due to extreme weather events has globally been documented (Kreibich et al., 2019). Two major factors can be advocated for justifying such a trend: climate change and increased urbanization (Hartmann et al., 2013). These factors involve different components of the risk concept (UN ISDR, 2009), which is given by the combination of hazard, exposure, and vulnerability. Climate change was popularly acknowledged as a leading cause for the increases in the frequency and intensity of heavy storms and, consequently, of the flood hazard (Solomon et al., 2007). However, according to the Intergovernmental Panel on Climate Change (IPCC) (Hartmann et al., 2013) and as also confirmed by up-to-date analyses of flood intensity in Europe (Blöschl et al., 2019), the absence of a likely global trend in the incidence of floods arises. Reasons lie in the high regional variability in heavy storm trends as well as in the strong influence played by watershed hydrologic characteristics and local flood management practices.
In contrast, population urbanization represents a likely global trend, though characterized by a strong regional variability. Migration from the countryside or mountain areas to cities is the main driver of urban sprawl. In 2008, for the first time in human history, more than half of the world population was living in urban settlements, and the percentage continues to augment (UN DESA Population Division, 2012). Touristic demand is an additional driver for urbanization growth that plays a peculiar role in developed countries. For instance, in Italy many areas are affected by emigration, namely Alpine and Apennine valleys, southern regions, and islands, and urbanization growth rates are equal to or greater than the na-Published by Copernicus Publications on behalf of the European Geosciences Union. tional average rate. A clear example is provided by southern Italy, where the annual rate of soil consumption between 2017 and 2018 was 0.23 %, greater than the national average of 0.21 % (ISPRA, 2019), even if southern Italy faced a population decrease of 1.5 % between 2015 and 2019 (the national average loss was estimated at 0.7 %) (CENSIS, 2019). An additional example is provided by the Sondrio province (in the mountainous part of the Adda river basin, Lombardy, northern Italy), where the yearly soil consumption per capita is 1.11 m 2 , whereas the regional one is 0.63 m 2 .
Urbanization determines dramatic increases in human exposure and vulnerability to floods in immigration areas since most of recent urbanization lies in flood-prone areas, and local communities are not able to put effective flood defence practices in place. Moreover, urbanization usually leads to the impairment of the conveyance capacity of the stream network so that flooding areas are basically larger than in the undeveloped condition. The urbanization sprawl consequently results in increased damage to communities, private properties, and public infrastructures and must be regarded as the main cause for the likely increasing trend of flood risk (Barredo et al., 2009). A vast body of research on flood risk changes under economic and population growth scenarios indicates that this contribution is at least equal to but commonly larger than the climate change one (Feyen and Dankers, 2009;Maaskant et al., 2009;Bouwer et al., 2010;Te Linde et al., 2011;Rojas et al., 2013).
This flood risk trend forced an unavoidable shift in the paradigms of flood defence, by recognizing that not all events can be completely controlled and that structural practices have limits (Johnson and Priest, 2008). Thus, focus must be placed on how to mitigate the damages to flood-prone communities (European Union, 2007). Over the last decades, flood risk management has evolved from a structural-based defence approach, aimed at decreasing the hazard component, towards a more holistic perspective (Merz et al., 2010;Arrighi et al., 2019) taking into consideration vulnerability and exposure. Novel concepts were introduced, such as residual risk (UN ISDR, 2009), accounting for the potential structural failure of the defence system (Vorogushin et al., 2009;Schumann, 2017;Balistrocchi et al., 2019), and resilience, that is the ability to recover from a damage or to absorb an impact (Liao, 2012).
Differently from the hazard, the mitigation of exposure and vulnerability can be pursued by means of non-structural practices. Among them, a prime role is played by emergency management plans, which allow authorities responsible for the protection of local communities to dispatch timely and appropriate mitigation measures during the occurrence of flood episodes. The development of effective emergency management plans is closely related to the concept of authorities' preparedness (European Union, 2007). Such plans are actually intended to provide people with early warnings, reliable real-time information, and the improvement of relief supplies and rescue efforts. In this regard, a detailed and reli-able picture of the real-time spatiotemporal variability in the flood risk would be highly beneficial.
Presently, large numbers of geospatial data can be obtained from a number of sources, namely remote sensing or aircraft platforms. These sources yield situation snapshots but do not provide information at the spatiotemporal resolution needed for managing urban flooding and are hardly validated on the field. To overcome these problems, the possibility of taking advantage of crowdsourcing techniques has recently attracted much attention (Mazumdar et al., 2017;Rosser et al., 2017;Hirata et al., 2018;Mazzoleni et al., 2018). These techniques have been made available by the widespread proliferation of smartphones and tablets, along with the success of social media. During emergency phases, the advantages in the real-time implementation of the emergency plan are twofold: firstly, a large amount of volunteered geographical information suitably georeferenced can be collected (Goodchild, 2007), guiding authorities in developing collaborative flooding maps or in estimating the number and location of exposed people; secondly, crucial information can be communicated to the people exposed, making them aware of the actual risk magnitude and enhancing their capability to face the situation. After the emergency phase, authorities can also exploit the collected data to enhance their preparedness and to better match the emergency management plans to specific real-world needs of the flood-prone community.
Capabilities and drawbacks of various crowdsourcing techniques have long been debated, even though crowdsourcing has already found successful applications in some weather related disasters (Poser and Dransch, 2010;Hung et al., 2016;Guntha et al., 2018). Such studies have underlined that, in many cases, during the emergency phase, crowdsourced data have at least the same quality as the authoritative ones (Goodchild et al., 2017). Nevertheless, several concerns have been pointed out through analyses of crowdsourcing technique applications to real-world disasters: (i) the raw data quality is generally poor because of malicious intentions of nasty elements in the community or incompetence of stakeholders so that spurious, erroneous, malformed, redundant, or incomplete data must be purged out of the database to make them suitable; (ii) the sample significance is basically low, owing to the limited number of exposed people actively participating in crowdsourcing so that information of general interest must be extrapolated from an exiguous fraction of the whole population; (iii) the communication network is not completely reliable as it could fail or malfunction during disaster occurrences.
Efforts of the research have therefore been addressed towards the development of methodologies to filter such rumours from crowdsourced data (Han and Ciravegna, 2019). However, other approaches can be followed to develop effective dynamic information tools through the exploitation of mobile phone data collected by providers. Such data make it possible to geo-localize mobile phone users over time in order to derive time-dependent crowding maps. When such maps are intersected with hazard maps, showing the flooding-area extensions corresponding to a selected frequency of occurrence, dynamic exposure maps are obtained. As demonstrated by Carpita and Simonetto (2014) with reference to episodic crowd concentrations due to social events, recurrent spatiotemporal patterns can be derived from mobile phone data by means of geostatistical analyses. Herein, the application of this methodology to the periodic spatiotemporal variability in the resident population related to homework mobility is investigated. To do so, additional tools are developed to extrapolate the real-world population from the crowding maps of providers clients.
Dynamic exposure maps respond to a general demand for a dynamic approach to flood risk assessment and management (Viglione et al., 2014). Actually, flood risk varies over time not only with regard to climate non-stationarities and urban development trends since hydrological, economical, political, technological, and social processes are also involved. For instance, effective campaigns devoted to increasing awareness of flood risk, to promoting people's capability to undertake effective waterproofing practices, or to exploiting warning systems can dramatically diminish the flood risk over time. The same occurs by keeping the memory of flood disasters. These processes are interrelated and evolve over time (Di Baldassare et al., 2013). In this regard, some authors proposed dynamic agent-based models to assess the temporal change in flood risk (Dawson et al., 2011;Haer et al., 2016). Such models are capable of performing a spatially distributed analysis of flood risk, accounting for multiple factors, or agents, and their action-feedback relationships.
It must be pointed out that dynamic risk maps should account for the spatial heterogeneity of urbanization in order to obtain precise assessments. Flood-prone urbanization can feature very different characteristics relevant to both exposure and vulnerability, even inside the same watershed. Land use is the first discrimination level to be considered as population densities and temporal patterns could significantly differ in commercial, industrial, service, transport, and residential areas. Secondly, the fabric type may also have remarkable impacts on the overall risk. Indeed, Fuchs et al. (2015) evidenced significantly different exposures among various types of land use (tourist accommodation, commercial, recreational, residential) in Austria. Urbanization heterogeneity is also relevant for flood risk studies in developing countries as shown by Vu and Ranzi (2017); in their assessment of flood risk in central Vietnam, they estimated the exposure and vulnerability of buildings and people by collecting questionnaires including data on building types.
To demonstrate the capabilities of the geostatistical analysis proposed herein, it is applied to a suitable case study, identified in the western outskirts of Brescia (Lombardy, northern Italy). A detailed knowledge of the flooding dynamics and a sizable set of mobile phone data are available for this watershed. The suggested approach made it possible to derive reliable dynamic exposure maps with respect to the land coverage and the calendar time periods, obtaining estimates of the expected number of people affected by flood hazards along with its uncertainty. Hence, the paper is organized according to the following sections: (i) firstly, the innovative aspects of the geostatistical analysis methodology utilized herein are illustrated; (ii) secondly, the main hydraulichydrologic features of the analysed study area are described along with the available mobile phone data; (iii) the methodology application and the results are finally discussed.

Analysis methodology
The proposed geostatistical approach relies on erlang mobile phone measures. An erlang is the unit of measure of traffic intensity in a telecommunication system or network and it is widely used to quantify load and efficiency. The name is a tribute to Agner Krarup Erlang (1878-1929, a Danish mathematician and statistician who was the first to work on traffic engineering (Erlang, 1909). In this study, erlang measures consist of two-dimensional matrices which provide the spatial distribution of the average number of mobile phone users (MPUs) bearing a SIM connected to the network within a temporal interval and inside a spatial region. These data are collected by mobile phone providers and recorded at constant time steps with reference to a georeferenced grid of square cells. The availability of this kind of data is progressively capturing the attention of urban planners (Becker et al., 2011;Calabrese et al., 2015) as they offer a variety of potential applications. In this study, the MPU spatiotemporal variability was summarized by means of daily density profiles (DDPs) that provide the variability within a day of the MPU included in a spatial region of interest. Such regions are inundation areas, thus expressing the spatiotemporal variability in people exposed to the flood risk. To define DDP, let e it be the number of MPUs in the ith grid cell in a generic time interval t. Let Herein, the interest lies in analysing and classifying the occurrences in a time series of DDP rd , related to a set d = {d 1 , . . ., d n } of n analysed days. More precisely, the proposed approach firstly involves the clustering of similar DDP rd , as discussed in detail in Sect. 2.1. The clustering procedure consists of two steps. In the first one, MPU spatial variability inside region r is considered by changing index i in an R 2 x-y coordinate space; to do so a data reduction strategy is applied. In the second one, the DDP rd temporal variability is evaluated by changing index t in an R 1 space.
The characteristics of our data raise some issues related to the choice of the clustering technique to be chosen. In fact, traditional techniques (Arabie and De Soete, 1996) may not produce robust results when the number of variables is larger than the number of observations. Our data amount to n observations and p = m · o variables (number of MPUs per day). For instance, let us consider a case in which available data refer to 1 year (i.e. n = 365): information in each cell of the grid is available 4 times per hour (thus, o = 96), and the region is covered by 500 grid cells (m = 500). It follows that the number of variables is much larger than the number of observations (p>n), and so we refer to a high-dimensional data set-up (Donoho, 2000). In high dimensionality, some issues need to be considered, such as those of the curse of dimensionality (Keogh and Mueen, 2017). Bouveyron et al. (2007) addressed this issue with regard to clustering. However, as suggested by Jovi et al. (2015), a suitable solution is represented by data reduction. To do so, the histogram of oriented gradients (HOG) approach is used in this paper. Therefore, data reduction works on index i in order to convert the support from R 2 (x-y coordinate space) to R 1 .
Once the DDP rd are clustered in statistically similar groups, the total number of people in set T d and region r can be estimated and associated with descriptive bands (DBs), as discussed in Sect. 2.2. In this regard, there is crucial concern due to the lack of MPU data from all companies providing phone services in northern Italy. To deal with this problem, as first suggested by Metulini and Carpita (2020), the approach proposed in this paper adopts a strategy to infer the total number of people by matching census data to available mobile phone data.

Data reduction and clustering
To cluster similar DDPs, a technique for high-dimensional data reduction is adopted first. Then, reduced data are analysed by using a high-dimensional data clustering. Separately for each element of set T d (i.e. for a given t), let ε it = e 1,t , e 2,t , . . ., e im,t be the MPU vector of region r in time instant t (dimension m). The aim is to reduce ε it to the vector of values κ it (dimension m <m), which contains the relevant information contained in set ε it . To do so, set ε it , separately for each t, undergoes a histogram of oriented gradients (HOG) feature extraction (Dalal and Triggs, 2005;Tomasi, 2012). Traditionally, HOG is applied to redgreen-blue (RGB) images in order to find similarities among images and to classify them. Vector ε it can be considered a raster that expresses the magnitude of its elements through colours since each value has an associated location in the R 2 x-y coordinate space. In consideration of our setting, the application of the HOG method looks straightforward.
According to first-step clustering, the interest lies in the relative distribution of the MPUs in region r, not in the absolute number of users. To be consistent with this aim, the HOG was applied to a normalized vector of ε it . Vec-tor z it is thus defined as z it = e i,t /max i ∈ I r (e i,t ), ∀ i ∈ I r . In order to obtain the vector of features κ it , by displaying the relevant information in z it , the HOG method firstly divides the cells of the grid in S smaller grids G 1 , . . . , G S (G i ∩ G j = Ø, ∀i = 1, . . ., S and ∀j = 1,. . . , S is a parameter that needs to be chosen. Then, by using two different gradient matrices of each grid G i (for details, see Dalal and Triggs, 2005), the direction and the magnitude matrices are obtained. These matrices are used to derive the histogram of gradients with k equal bins, where k is a parameter that needs to be chosen. The larger the parameter k is, the better the results are; moreover, in related literature k usually ranges from 4 to 20 (Salhi et al., 2013). The vector of features κ it is given by all the elements of the derived histogram of oriented gradients, stacked ∀S. Considering that the length of κ it is S · k, the vector κ it is stacked over the subscript t in order to derive (for region r and day d) the vector of features κ d (dimension S · k · o). κ d is used in the first clustering step of the method, where days represent the objects to be clustered in terms of how the MPUs are distributed over region r according to index i. κ d is computed for all the days in dataset d = {d 1 , . . ., d n } and a k-means cluster analysis (Hartigan and Wong, 1979), in which the objects to be clustered are the n days, and κ d contains the values of the S · k · o (with S · k · o<m · o) variables for day d to be attributed to a cluster. According to the Hartigan-and-Wong criterion, the cluster number H is chosen by looking at the ratio between the total within the sum of squares (Tot within SS H ) and the total sum of squares (Tot SS H ) for different values of H that needs to be minimized with respect to H . For a certain H , the total within the sum of squares is defined as Tot within where µ is the centroid vector for the full set of data. At this point the elements in d = {d 1 , . . ., d n } (the days) have been assigned to a number of clusters C 1 , . . . , C H (C i ∩C j = Ø, ∀i = 1, . . ., H , and ∀j = 1, . . ., H with i = j ).
In the second step, when the MPU variability over time is accounted for, we consider the vector DDP rd (for a given region r) as the collection of functional observations x rd m l=1 e il,t1 in t 1 ). To do so, we adopt a model-based functional data-clustering method (Becker et al., 2011) since it is more flexible than the alternatives: to each cluster it provides an estimated functional curve with specific parameters. We group days d (cluster's objects) in terms of the o DDP rd values (cluster's variables) separately for each cluster of the previous step. Similarities in the functional form of the DDP rd are considered if viewed in terms of curves of values (y axis) with respect to time instants (x axis). The curves of each group are modelled by using a group-specific set of distributional parameters (see Becker et al., 2011 for details). The adopted method can be used in high dimensionality since the clustering process employs the criterion of the sub-space clustering (Agrawal et al., 1998), which is adopted when one is only interested in considering the minimum number of variables needed for grouping objects, and thus to reduce the dimensionality. In detail, it is proposed herein to adopt the following path: (i) functional data outlier detection by likelihood ratio test (LRT) is adopted to remove the anomalous DDP rd , as proposed by Febrero-Bande et al. (2008); (ii) the Bouveyron et al. (2015) clustering method is applied using the funFEM package in R.
The aim of this strategy is to assign the elements in d = {d 1 , . . ., d n } (the days) to final clusters C F,1 , . . . , C F,Z (C F,i ∩ C F,j = Ø, ∀i = 1, . . ., Z, ∀j = 1, . . . , Z), with Z ≥ H . Thus, adopting these steps makes it possible to represent the dynamic of the MPUs' presence in terms of a representative DDP for each group of days. Representative here means that each group includes days that are similar in terms of index i (spatial distribution of MPUs) and index t (temporal dynamic of MPUs) and dissimilar from those included in other groups.

Population assessment
With the clustering strategy it is possible to display the number of mobile phone users in region r for a set of time instants in clusters of days. However, the estimate of the total amount of people is needed for developing dynamic exposure maps. Unfortunately, data availability regards only one mobile phone company. To have a reliable estimation of the total number of people actually present in the study area, users of other mobile phone providers must be considered as well. Collecting all these data is either unfeasible or unsustainably expensive. To perform analysis on a national scale, a convenient solution is represented by the use of the market share of the provider company, which can be applied to "correct" the DDP rd . Hence, an estimation of the total number of people can be obtained (e.g. let s n be the nationallevel share assuming values in the range [0,1]; the correct DDP would be DDP correct = DDP rd s n ). Country-level estimates are available from Il Sole 24 Ore newspaper (Il Sole 24 Ore, 2017). However, the market share usually varies significantly among cities, according to different socio-economic characteristics of users. Thus, to suitably estimate the market share, the smallest level of aggregation, represented by the "Sezioni di Censimento" (SCs) (i.e. population census districts), was used in this study (ISTAT, 2011). The following strategy is suggested: the number of residents from administrative archives is compared with the number of Telecom Italia Mobile (TIM) users in a residential area in the late evening hours. Bearing in mind the characteristics of the social dynamics of the analysed residential areas, it is reasonable to assume that, in the late evening hours, residential SCs are only populated by residents. The comparisons, using data from ISTAT (Anagrafe Comunale), were performed separately for each SC.
Since the MPU grid is made of square cells, while SCs are irregular polygons, the number of TIM users belonging to each SC was estimated by intersecting these spatial data. Thus, the portion of the cell belonging to the SC polygon was calculated in order to count how many TIM users are present in each polygon by using the function extract in the raster package in R. Let Cell j (j = 1, 2, . . ., J SC ) be the TIM cells (pixels) which overlap a chosen SC; the ratio A j in Eq. (2), represents how much of Cell j is included in the chosen SC (for each SC, A j >0; j = 1, 2, . . ., J SC ). If Cell j is completely covered by the SC, then A j = 1, otherwise A j <1. Let TUC j be the density of TIM users in Cell j ; the estimation of the number of TIM users in SC ETU SC is computed as shown in Eq. (3).
The estimated TIM market share in SC ETMS SC is thus given by the ratio in Eq. (4), where P SC is the resident number assessed by population census for the SC.
Differently from s n , the ETMS SC range is not necessarily [0,1] since P SC could be smaller than ETU SC . An application example of this procedure can be found in Metulini and Carpita (2019b). In the count of P SC , children (younger than 11 years) and elderly people (older than 80 years) were excluded, aiming to take into consideration only people bearing a smartphone. The distribution of ETMS SC can be used as a proxy for the TIM market share on a city level. More specifically, it appears to be convenient to use the median of the distribution of ETMS SC . The median is preferable to the mean in those cases when the distribution is symmetric. Metulini and Carpita (2019b) showed the presence of a strongly asymmetric distribution of ETMS SC for the case study of the city of Brescia. Moreover, Carpita (2019) showed good results in terms of the comparison between estimated people and official data by using the median for the case study of Lake Iseo during the Floating Piers. Let me(.) be the median statistics; the estimated DDP rd for a given region r for a given day d is finally given by Eq. (5).

Result representation
For the sake of result interpretation, a graphical representation is herein adopted. Let us consider the DDP rd vector to be a functional curve x rd (T d ) displaying, on the y axis, the sum of MPUs in region r and day d with respect to time instants T d ∈ (t 1 , . . ., t o ) on the x axis. Functional box plots (FBPs) (Sun andGenton, 2011, 2012) can be used to display the profile for representative days.
In FBPs, a cluster of curves is ordered in terms of its "band depth". A "median" value and an "envelope" are generally used to define the functional counterpart of traditional descriptive bands. Moreover, one can detect outlier curves. Specifically, a curve is an outlier if it exceeds the margins of the envelope by 1.5 in at least one considered time instant.
Separately for each final cluster, an FBP is derived. Let us consider cluster h, let d h = {d 1h , . . ., d nh } be the group of days that belong to cluster h, and let DDP rd,h = [ DDP r,d1h , . . ., DDP r,dnh ] be the matrix of dimension o · n h with a DDP rd in each column. By considering each vector as a curve, the FBP representing the profile plot of the total number of people (that we will call "city users", or simply "users") at different hours (with DBs) on representative days is computed using matrix DDP rd,h .

Case study description
The study area was selected as an emblematic and widespread situation of the unacceptable high flood risk affecting the foothill zone of the Po river valley (Lombardy Region, northern Italy). It lies on the western outskirts of Brescia ( Fig. 1) and is overall included in the watershed of the Oglio river, a primary left-bank tributary of the Po river. The main drainage is supplied by the Mella river, which is a left-bank tributary of the Oglio river and bounds the eastern side of the study area. As can be seen in Fig. 1, the study area features five natural streams originating from the southern boundary of the Alpine chain. From west to east, they are Laorna, Gandovere, Vaila, La Canale, and Solda.
Before the area was anthropized, most of such streams had probably flooded the alluvial plain swamping into marshes without a main outlet in the main river network. This was the result of both their almost ephemeral regime and the endorheic morphology of their watersheds. As the agricultural use of the alluvial plain grew, these streams were connected to the constructed irrigation-drainage network in order to exploit their low flow for irrigation purposes and to drain the flood flow into the Mella river. These constructed downstream reaches feature two main drainage canals: the Gandoverello canal and the Mandolossa canal, depicted in Fig. 1. They have become the artificial downstream reaches of the five watersheds, providing a drainage capacity in the southern direction both for the mountain watersheds and lowlands located in the alluvial plain. The total catchment area amounts to 112.3 km 2 , featured by an average imperviousness of about 22 %.
In particular, the Laorna drainage path was strongly manipulated by a constructed straight canal 7 km long that diverts its streamflow towards the Gandovere stream and intercepts additional surface run-off produced by the northward lowlands. Both streams thus converge into the Gandoverello canal. This was formerly the downstream reach of the Gandovere stream, whose limited conveyance capacity made it necessary to decrease the hydrologic load. Thus, a flow divider was constructed upstream of the Laorna confluence so that almost half of the Gandovere streamflow is diverted towards the Mandolossa canal inlet by means of a diversion canal that intercepts the Vaila stream. All these flow discharges, along with those coming from the La Canale stream and the Solda stream, converge into the inlet of the Mandolossa canal, which is characterized by the largest conveyance capacity in the study area.
This drainage network is also exploited by the irrigation system of Franciacorta, a vineyard agricultural district located west of the study area, for the final disposal of the residual flow discharges. In Fig. 1 two of the most important irrigation canals belonging to this system, Seriola Castrina and Seriola Nuova, are reported. Their discharges mainly affect the Gandovere and Laorna streamflows. Further contributions come from the western Brescia outskirts in terms of both urban storm waters and irrigation excess flows, which directly drain into the Mandolossa canal. Finally, water table resurgences of the high Po river valley are also present downstream of the Mandolossa canal inlet. Their fresh waters are however intercepted and drained by the downstream reach of the Mandolossa canal.

Hazard mapping
Since the late '50s of the last century, this area has been subject to a deep urbanization sprawl, yielding the present land cover condition depicted in Fig. 2. The dramatic increase in both the urban fabric and the industrial-commercial coverage has occurred at the expense of croplands and permanent crops so that sparse and isolated fabrics have evolved in a continuous and heterogeneous urbanization. In addition, various transport units have been upgraded to speedways, and two highways were constructed. The flood risk perception in this area has historically been related to the Mella river inundations, which affected the Brescia outskirts in the late '60s of the last century until its riverbed underwent important engineering works. Conversely, with reference to the secondary stream network, the absence of a clear risk perception allowed such an urbanization sprawl to occur regardless of the floodplain extents. As Fig. 2 clearly shows, most of the urban-fabric areas and the industrial and commercial settlements are adjacent to the stream network.
The increase in the land coverage yields huge impermeability degrees for the plain watersheds. Except for a combined sewer overflow located in the western Brescia watershed that discharges into the Mella river, all the storm waters produced by these areas discharge into this secondary stream network as well as those produced by many settlements in the Franciacorta district, which improperly exploits the irrigation system as a final receipt of combined sewer systems. Moreover, the low risk perception has led to a significant impairment of the functionality of these canals. The streamflow is now constrained to a number of narrow culverts and bridges with low decks and large piers. In addition, urbanized canals are no longer maintained so that the riparian vegetation grows in an uncontrolled manner. The combination of the increase in the peak flow discharges and in the exposure, along with the decrease in the stream network conveyance capacity, has led to a dramatic increase in the flood risk. Flood episodes explained by the secondary-hydrographic-network insufficiency have been observed since the late '90s, evidencing an empirical frequency of occurrence far less than 20 years. The hazard mapping was thus conducted by considering return periods spanning from 5 to 20 years, which are significantly less than those conventionally required in Italy for a secondary stream network to be considered verified (20-50 years). Flooding-hazard map of the study area comparing present land cover and flooding-area extensions referring to return periods varying between 5 and 20 years; exposed residential areas are (1) Laorna and Gandovere streams, (2) La Canale and Solda streams, (3) southern Gandovere canal, and (7) Mandolossa canal, and exposed industrial and commercial settlements are (4) Mandolossa canal, (5) Laorna and Gandovere streams, (6) La Canale and Solda streams, and (8) southern Gandovere canal; the base map is the Lombardy Regional Technical Map (CTR), 1 : 5000, provided by the Lombardy Region (http://www.geoportale.regione. lombardia.it/, last access: 3 March 2020).
The hazard analysis was herein conducted by using a design event method. As demonstrated by Balistrocchi et al. (2013) in this climatic context, a design event method is capable of providing results comparable to those of more sophisticated continuous approaches if it is based on the Chicago synthetic hyetograph with a duration equal to double the catchment time of concentration. A classical leaf hydrologic model was developed in accordance with the subcatchment subdivision illustrated in Fig. 1. Extensive surveys of the stream cross sections and the inline structures were carried out to assess the actual conveyance capacities of the stream reaches. Flooding volumes were hence estimated by limiting the flood hydrographs generated through the hydrologic model to the overflow threshold discharges. Surveys of the historical flooding extensions that occurred during the last 3 decades addressed the delimitation of the flood-prone areas. The total amount of exposed urban area is about 160 ha (return period 5 years), 230 ha (return period 10 years), and 330 ha (return period 20 years). Most of such areas are devoted to industrial-commercial settlements (70 %), whereas the remaining part includes residential areas of medium to low density featuring similar fabric types (detached or semi-detached houses). Distinguishing between these two classes was found to be useful in decreasing the estimate uncertainty in population estimates. The resulting flood hazard map is reported in Fig. 2 along with the land cover, and it highlights the large number of residential fabrics, and industrial and commercial settlements potentially affected by flood events featuring low to medium return periods. An unacceptably high flood risk is therefore evidenced for the study area. Most of such areas were too dispersed or small to have suitable intersections with MPU grid cells. Therefore, they were grouped into four macro-areas referring to the individual canals and the land use that was classified in urban fabrics (red) and commercial-industrial settlements (dark green). As shown in Fig. 2, they are the Laorna and Gandovere streams confluence (areas 1 and 5), the La Canale and Solda streams (areas 2 and 6), the southern Gandovere canal (areas 3 and 7), and the Mandolossa canal in Roncadelle Municipality (areas 4 and 8).

Available mobile phone data
In this work we use mobile phone data provided by Telecom Italia Mobile (TIM), which is currently the largest mobile phone operator in Italy. According to the national economic newspaper, TIM's national share amounted to 30.2 % in December 2016 (Il Sole 24 ore). In our analysis, erlang measure data represent the average number of both calling and noncalling mobile phone SIMs (assigned to a certain cell of the considered grid in a certain quarter. Statistical research in the area of urban planning using erlang measure data is increasing: for example Carpita and Simonetto (2014) and Metulini and Carpita (2019a) studied the dynamics of people's presence at big events in the city of Brescia; Zanini et al. (2016) applied an independent component analysis (ICA) method for separating the city of Milan into a few main areas. Other works (Manfredini et al., 2015;Secchi et al., 2015) used erlang measure data to study the dynamics of people's presence in Milan.
In this study, reliable erlang measures of MPUs recorded by the TIM company are available. The investigated area is marked by a solid black line in Fig. 1 (WGS 84 UTM 32 N coordinates: 5 040 920-5 049 980 N, 585 970-592 970 E, area about 64 km 2 ) and is centred on the Mandolossa-Gandovere network. The area is covered by a georeferenced grid of square cells with 150 m sides, which provides the number of TIM users every 15 min. More precisely, for each cell of the grid and for each interval of time, the recorded data refer to an average measure of the number of mobile phones simultaneously connected to the network. For instance, Fig. 3 shows a detail of the spatiotemporal distribution of TIM MPUs that occurred on Wednesday, 18 November 2015, in a sample area of 20×20 = 400 cells near the Mandolossa inlet. Therein, exposed areas, obtained by intersecting the urban covers with the flooding areas, were also reported. Thus, Fig. 3 provides a sequence of snapshots of a dynamic map of human exposure to floods. As can be seen, the spatial distribution of raw data is realistic as major densities are observed along the main street network and in the urban areas. The temporal variability is also reasonable; for instance, lower densities are evidenced during nighttime in industrial sectors and main streets; see, for instance, the industrial settlement near the confluence of the La Canale stream in the Mandolossa canal (flooding area marked with 6 in Fig. 2). In these data, the information about the user mobility is hidden, meaning that one cannot trace the path followed by a single MPU over time. Measures are available for the period 2014-2016, even though after data inspection a more limited subset was found to be suitable for the analysis (from 1 July 2015 to 11 August 2016) due to data collection issues.
4 Analysis procedure application

Procedure parameterization
The application of the HOG procedure to reduce the dataset dimensionality was performed for each quarter of a day in T d , by dividing the full grid into nine smaller grids G i , i = 1, . . ., 9. The parameter √ S was thus set at 3. For each G i , gradients and direction were then computed, and the histogram of oriented gradients choosing k = 4 bins corresponding, respectively, to angles 0-45, 45-90, 90-135, and 135-180 • was obtained. In general, one can improve the recognition of the analysed grid by increasing the number of bins. This value was chosen in order to maximize k but, at the same time, to avoid the presence of HOG features with zero values. In each quarter, the extracted features are 3 2 · 4 = 36. Therefore, the dimensionality reduction is of the order of 400/36 ≈ 11. The final vector κ d for the sample area near the Mandolossa inlet (sample area evidenced in Fig. 3), with all the quarters stacked for the same day and for day d, contains 36 · 96 = 3456 features.
The hierarchical k-means cluster analysis, with days used as the objects of the cluster and the features of κ d used as cluster variables, was performed on a total number of 360 d (d = 360) from 1 July 2015 to 11 August 2016. After data inspection, only the days of the last available year (from 1 July 2015 to 11 August 2016) were included in the analysis since the first year (April 2014 to 30 June 2015) features some collection problems. In effect, a configuration with three clusters sharply separating the days of the first year (until June 2015) and the days of the second year (by July 2015) was estimated by performing the cluster analysis by using the full set of data. In the final sample all holidays were removed in Figure 3. Snapshots of a dynamic map showing the spatiotemporal distribution of mobile phone users (MPUs) that occurred on 18 November 2015 (Wednesday) in urban areas exposed to 10-year return period flooding; the base map is the Lombardy Regional Technical Map (CTR), 1 : 5000, provided by Lombardy Region (http://www.geoportale.regione.lombardia.it/, last access: 3 March 2020).
consideration of their specific characteristics with respect to typical days. More precisely, 15 August, 1 and 2 November, 8 December, 24 to 26 December, 31 December, 1 January and 6 January, 27 and 28 March (Easter), 25 April, 1 May, and 2 June were removed. In addition, those days where a large number of data (>10 %) were missing were removed as well. Conversely, data on those days where missing data were less than 10 % were maintained. A test for the possible presence of curse of dimensionality, based on the distribution of the distances among pairs of objects, was performed. A unimodal distribution that suggests the absence of such a problem was derived. On the whole, the number of suitable data appears to be sufficient to get reliable estimates of people exposed to the flooding risk in the study area.
The number of first-step clusters was chosen according to the relative decreasing trend of the total within the sum of squares with the group number increase. Figure 4 shows this trend, evidencing that a splitting into four clusters would decrease the total within the sum of squares by half with respect to a one-cluster splitting. Since this decrease appears to be satisfactory, the sample of days was split into H = 4 clusters, where, C 1 corresponds to all the days mostly occurring in July, August, and September (green spine plots shown in Fig. 5); C 2 corresponds to working days mostly occurring from February to June (blue spine plots shown in Fig. 5); C 3 corresponds to working days mostly occurring from October to January (red spine plots shown in Fig. 5) and C 4 corresponds to the weekends except for those included in cluster C 1 (yellow spine plots shown in Fig. 5).
Hence, MPU variability over time instants was accounted for by considering the Mandolossa's DDP of each day as a functional curve. Firstly, those days that have to be considered outliers were removed by using the curve outlier detection method (Febrero-Bande et al., 2008) separately for each first-step cluster. Secondly, it was evaluated whether days should be further grouped in terms of dissimilarity in the DDP functional curve dynamic. To do so, the assumption of independence of our functional data was tested by using Portmanteau (Gabrys and Kokoszka, 2007) and distance correlation (Székely and Rizzo, 2013) tests. Model-based functional data-clustering techniques (Bouveyron et al., 2015) suggest splitting the "summer" group into three sub-groups containing, respectively, the days of July, the days of August, and the days of September. This second-step splitting leads to Z = 6 final clusters, where C F,1 includes days of July; C F,2 includes Figure 5. Spine plots representing the first-step clustering of days along (a) months and (b) days of the week (green: all days mostly occurring in July, August, and September; blue: working days mostly occurring from February to June; red: working days mostly occurring from October to January; yellow: weekends mostly occurring from October to June). days of August; C F,3 includes days of September; and C F,4 , C F,5 , and C F,6 match, respectively, C 2 , C 3 , and C 4 .
Illustrations of DDPs for representative days by using functional box plots are reported in Fig. 6 (residential fabrics) and in Fig. 7 (commercial and industrial settlements) for each of the eight flooding areas shown in Fig. 2 for the 10-year return period. Although functional data clustering suggests splitting into six groups, for the sake of clarity, the summer months (July, August, and September) were combined in a single final cluster (cluster 1, C1) as well as all the working days from October to June in a single final group (cluster 2, C2) and the weekends from October to June (cluster 3, C3), thus leading to three clusters.
To extract the number of people in each quarter of each day from the grid's cells to the irregular polygon of each area (i.e. to find DDP ri,d for area r i ; i = 1, . . ., 8; and day d), the procedure described in Sect. 2.2 was applied. Hence, MPUs were firstly divided by a constant c = 0.85 in order to consider children (>12 years) and old people (>80 years), who likely do not have smartphones (i.e. about 85 % of the people are in the age range [12,80] in Brescia). Then, by estimating the median value of the market share ratio at the SC level by adopting the methodological strategy in Sect. 2.2, which amounts to about 20 %, the estimated DDP s for each area and for each day were derived by applying Eq. (5). The estimated market share is also consistent with that found by Carpita (2019). Estimated DDP ri,d underwent the functional box plot strategy separately for days d in the three clusters (with outliers excluded) and for r i corresponding to the eight areas illustrated in Fig. 2. Figure 6. Functional box plots of exposed people ("city users") inside residential areas: (a) Moie di Sotto (area 1 in Fig. 2), (b) Villaggio Badia and Fantasina (area 2 in Fig. 2), (c) southern Gandovere canal (area 3 in Fig. 2), (d) Roncadelle (area 7 in Fig. 2). Cluster 1 (July, August, September; C1), cluster 2 (working days from October to June; C2), cluster 3 (weekends from October to June; C3). Figures 6 and 7 show, respectively, the resulting functional box plots for residential and productive areas, where the estimated number of city users in different time instants is indicated. The three clusters of days are reported separately. Overall, the number of city users is lower during the first hours of the day and increases in the morning, reaching a peak during working hours (09:00-13:00 and 14:00-18:00 local time) both in residential and in productive areas. In the Moie di Sotto residential area located at the confluence of the Laorna stream and the Gandovere stream (flooding area 1 in Fig. 2), the number of people is estimated at about 200 during the first hours of the day and during the night and increases to about 250 during working hours (inhabitant density is about 25-30 ha −1 ). The dynamic, similar in all the three clusters, shows irrelevant differences among different periods of the year.

Results and discussion
In the Villaggio Badia residential area located north of the Mandolossa canal inlet (flooding area 2 in Fig. 2), the city user number varies between a minimum of 1200 people and a maximum of 1400 people during an average day (inhabitant density is about 30-35 ha −1 ). During the working days of the months from October to June (C2), the peak reaches 1600 users. Moreover, the descriptive bands appear to be wider in summer (C1) and on weekends (C3) as compared to cluster 2, where bands are narrower (i.e. lower variability between days).
Residential areas along the southern Gandovere canal (flooding area 3 in Fig. 2) are not very populated. Only 50-70 users are there during an average summer day (C1) or during the weekend (C3) (inhabitant density is about 18-23 ha −1 ). The number amounts to more than 80 people during working hours of working days (C2). The number of city users in Roncadelle's residential area located along the Mandolossa canal (flooding area 7 in Fig. 2) is less sensitive to working hours, especially during summer. In summer, city users vary from a minimum of about 600 up to a maximum of 700. There are about 800 city users during working hours of working days and weekends (days belonging to clusters C2 and C3).
Industrial and commercial settlements of Moie di Sotto (flooding area 5 in Fig. 2) feature 1000-1500 people (minimum during the night and first hours of the day, maximum during the working hours) in summer. These numbers increase up to about 1200-1800 on working days and to about 1100-1600 at weekends. This high density is mainly due to the presence of a commercial outlet of regional interest in this area. Industrial and commercial settlements along the La Canale and Solda stream network (flooding area 6 in Fig. 2) are very highly populated by city users, and the difference between the number of people in summer and on working days is significant. Daily minimum and maximum values are included between 2000 and 3000 in summer and between 2500 and 3500 during working days. Weekends follow a stable dynamic (about 2500 people throughout the entire day).
Flooding areas related to the southern Gandovere canal (flooding area 8 in Fig. 2) involve a productive area with an average number of users varying from 250 to 380 in summer and from 300 to 420 during working days. In the same way as Villaggio Badia (southern part of flooding area 2 in Fig. 2), the number of users during the weekends is stable, Figure 7. Functional box plots of exposed people ("city users") inside industrial-commercial settlements: (a) Moie di Sotto (area 5 in Fig. 2), (b) Villaggio Badia and Fantasina (area 6 in Fig. 2), (c) southern Gandovere canal (area 8 in Fig. 2), (d) Roncadelle (area 4 in Fig. 2). Cluster 1 (July, August, September; C1), cluster 2 (working days from October to June; C2), cluster 3 (weekends from October to June; C3).
and it stands at about 320-330. The industrial sector of Roncadelle (flooding area 4 in Fig. 2) is another highly populated area, featuring more than 2000 people during the day. On some particular working days and weekends, this number reaches about 3000 (dashed red lines of outliers in Fig. 7). In this area, a remarkable difference in the number of users between working days, summer days, and weekends was not detected.

Conclusions
In this paper a novel approach to the assessment of the risk related to human exposure to flooding based on a geostatistical analysis of erlang measures was proposed. Such a procedure takes advantage of data reduction (histogram of oriented gradients discussed in Sect. 2.1) in order to face the dimensionality curse issue. Its suitability and potential were demonstrated with regard to a suburban area located near Brescia (Lombardy, northern Italy), which is affected by widespread and frequent flooding. In Fig. 3 the possibility of expressing the spatiotemporal variability in exposed people by using time-variable maps was illustrated. These data feature high spatial resolution (150 m) and a short time step (15 ), thus providing reliable assessments even for the smallest analysed areas (about 4-5 ha) and a precise evaluation of the temporal dynamics. Indeed, daily density profiles can be derived according to this procedure. Then, these profiles can be clustered, yielding groups of similar daily time patterns. Clustering results are definitively meaningful since working days and weekends are acknowledged to show differ-ent temporal dynamics when they belong to working months (from October to June). Conversely, daily dynamics in summer months (July, August, and September; usually exploited for the longest holidays in Italy) must be regarded as different from the others. In addition, working days and weekends feature more similar daily density profiles during such months. As can be seen in Figs. 6 and 7, the daily temporal variability in people exposed to flooding can be assessed with respect to the day cluster and the type of urban areas (residential or industrial-commercial), both in terms of expected value (the median) and uncertainty (confidence band), thus providing comprehensive information to agencies and authorities devoted to the flood risk management.
The need to assess the entire population would theoretically require the gathering of a huge number of datasets from all the providers that operate in the area of interest. This issue would lead to a remarkable increase in the data collection cost and would be difficult to overcome. Nevertheless, census data make it possible to infer the total population from the users of a single provider by means of local estimates of its market share, as discussed in Sect. 2.2.
It is worth underlining that this statistical support, along with the high spatiotemporal resolution and the reliability of the raw data, makes the proposed procedure particularly appealing in order to decrease the errors in exposed-people estimates. Such a support is not provided by crowdsourcing techniques, which are based on voluntary data supplies and commonly rely on very limited datasets with respect to the number of exposed people. A second advantage that must not be disregarded lies in the possibility of exploiting dynamic exposure maps, or alternatively the clustered daily density profiles, by directly implementing them in emergency plans, regardless of the potential malfunctioning of the mobile phone connection during the flood episode. Conversely, crowdsourcing could be strongly compromised by the difficulties of connecting to the network during the emergency period. Indeed, dynamic exposure maps derived by mobile phone data have strong potential to substantially improve emergency plans so that real-time rescues, relief supplies, and traffic management could be better addressed.
Future developments of the geostatistical approach proposed in this paper could be addressed towards multiple items which deserve further in-depth analyses. First, complete mobile phone data feature more additional information than those available for this study. Actually, datasets include the collection of matrices of origin-destination (OD) vectors in different seasons, days of the week, and hour of the day, which are constructed by using the SIM identification numbers. Hence, it would be possible to track users down. By knowing the density of origin-destination vectors of the paths around critical traffic nodes, it would be possible to forecast potential critical conditions for mobility and better manage traffic in a more precise manner.
Second, coupling traffic-management-decision support systems with real-time rainfall-run-off-flooding modelling is also a research perspective being considered. Presently, the exploitation of mobile phone data in real time is problematic. Nevertheless, in the future, a more common use of 5G and GPS technologies in mobile devices will facilitate the realtime assessments of the spatial distribution of people. From a prevention perspective, this could make the identification of preferential traffic flows possible, thus evidencing potential risks during inundation onsets or emergency situations. Alternative safe pathways could be identified and communicated to exposed people in order to facilitate their evacuation. Third, it would be possible to profile the SIM users, even if they remain anonymous and their privacy is respected. Users could be categorized in order to isolate specific targets from the whole user set, and their behaviours could be statistically analysed separately from the others. Thus, a future development of the statistical matching procedure between mobile phone data and census data could use demographic and socio-economic information about the SC areas, for example the ISTAT ARCH.I.M.E.DE database (https: //www.istat.it/it/archivio/190365, last access: 1 July 2020). Since it is likely to assume heterogeneous behaviours of individuals, this database will be beneficial in future works in order to organize individuals into classes in terms of their age, gender, income, or their job. In fact, different mobile phone companies have different costs, and this may affect the choice of different classes of individuals. Fourth, exposed human behaviours and habits can significantly change after hydro-climatic alarms or during flood events since people would be aware of flood hazard and limit their exposure in flood-prone areas. Mobile phone data make the identification of anomalous exposed human mobility dur-ing alarms possible, while the geostatistical approach proposed herein provides a tool to analyse whether and the extent to which human behaviours are different from those of common days. Actually, a sample far larger than 2 years would be beneficial to make such statistics more reliable since alarms involve a few days in a year. It is worth noting that, in the analysed area, the risk perception towards the secondary network is almost absent as well as a capillary local warning system. Flood risk perception is mainly related to the primary hydrographic network (i.e. Mella river). Therefore, regarding the specific test case, in this research the possibility of drastic changes in human behaviour during heavy-rainfall alarms is not expected. Indeed, virtuous human behaviours are usually the result of extensive campaigns to raise public awareness against flood risk, coupled with trusted and effective warning systems. Hence, a final objective of the ongoing research will take into consideration the effectiveness of the non-structure practices that will be adopted to mitigate flood risk in the test watershed.
Data availability. Mobile phone data can be shared after motivated request due to restrictions enforced by the provider. Geographical data can be freely downloaded at http://www.geoportale.regione. lombardia.it/ (Lombardy Region, 2020). Hydrological-simulation outcomes and models cannot be shared at this time because they are part of ongoing research.
Author contributions. RR and MC delineated the research topic, guided its development, and supervised the writing of the paper. MB and RM performed the hydrologic analyses and the geostatistical analyses, respectively, and wrote the paper.
Competing interests. The authors declare that they have no conflict of interest.
Disclaimer. The content of this article does not reflect the official opinion of the Cariplo Foundation. Responsibility for the information and views expressed in this paper lies entirely with the authors. The research will continue after additional funding by the Lombardy Region (Mo.So.Re. project on sustainable and resilient mobility).