INSPIRE standards as a framework for artificial intelligence applications: a landslide example

This study presents a landslide susceptibility map using an artificial intelligence (AI) approach based on standards set by the INSPIRE (Infrastructure for Spatial Information in the European Community) framework. INSPIRE is a European Union spatial data infrastructure (SDI) initiative to standardize spatial data across borders to ensure interoperability for management of cross-border infrastructure and environmental issues. However, despite the theoretical effectiveness of the SDI, few real-world applications make use of INSPIRE standards. In this study, we show how INSPIRE standards enhance the interoperability of geospatial data and enable deeper knowledge development for their interpretation and explainability in AI applications. We designed an ontology of landslides, embedded with INSPIRE vocabularies, and then aligned geology, stream network, and land cover datasets covering the Veneto region of Italy to the standards. INSPIRE was formally extended to include an extensive landslide type code list, a landslide size code list, and the concept of landslide susceptibility to describe map application inputs and outputs. Using the terms in the ontology, we defined conceptual scientific models of areas likely to generate different types of landslides as well as map polygons representing the land surface. Both landslide models and map polygons were encoded as semantic networks and, by qualitative probabilistic comparison between the two, a similarity score was assigned. The score was then used as a proxy for landslide susceptibility and displayed in a web map application. The use of INSPIRE-standardized vocabularies in ontologies that express scientific models promotes the adoption of the standards across the European Union and globally. Further, this application facilitates the explanation of the generated results. 
We conclude that public and private organizations, within and outside the European Union, can enhance the value of their data by making them INSPIRE-compliant for use in AI applications.

INSPIRE
Data accessibility and interoperability are key for multinational cross-border applications and fundamental for economic development (European Parliament and the Council, 2007). Different countries have different languages and data standards, hindering infrastructure planning, disaster risk reduction initiatives, and effective legislative implementation. To overcome these challenges, the European Union initiated INSPIRE (Infrastructure for Spatial Information in the European Community; Directive 2007/2/EC; European Parliament and the Council, 2007). INSPIRE is structured in 34 spatial data themes organized in three annexes. The themes span administrative (e.g. street addresses) and environmental domains (e.g. geology), and all EU countries are mandated by law to have implemented the data framework by 2021 (European Parliament and the Council, 2014). Each theme defines a data model and has adopted a set of vocabularies to populate interoperable datasets based on that data model. EU countries are aligning and serving INSPIRE data at a slow pace, and currently relatively few INSPIRE-compliant datasets are available across Europe (Cho and Crompvoets, 2019). Conferences and competitions are currently being organized to promote its implementation and to show the potential impact of real-world applications built on INSPIRE datasets (European Commission, 2019). This project was first presented at one of these conferences, the Inspire Helsinki 2019 data challenge under the "Let's make the most out of INSPIRE!" topic, where the project won first prize.

Artificial intelligence
Artificial intelligence (AI) studies "the synthesis and analysis of computational agents that act intelligently" (Poole and Mackworth, 2017). Part of acting intelligently is building models of the world that make predictions. Probabilistic predictions are the most useful ones for subsequent decision making and can be learned from data (Pearl, 1988). All models are based on human knowledge and data (observations of the world). For some problem domains, society has collected an overwhelming amount of data, and still, useful human knowledge of the domain can be very vague. Machine learning has made great progress recently for such cases, particularly with deep learning (Goodfellow et al., 2016). However, for domains in which data are relatively limited yet still large in volume, human knowledge (which may be represented in computers through the use of ontologies) can complement the data to make useful predictions (Pearl, 1988). Many environmental problems do not have enough data (e.g. lack of extensive landslide databases) to be solved by deep learning but do have enough data to generate useful products when combined with human expertise (expressed in ontologies; Poole and Mackworth, 2017). The term artificial intelligence is commonly used to indicate only the machine learning part of the field, especially in the landslide literature (e.g. Dieu and Gjermundsen, 2020). In this paper we use the term "AI" in its broader connotation, which also includes the ontological method used in this paper. See below for the description of the method and definition of ontologies.

The need for standards, ontologies, and taxonomies
Consistent, well-defined vocabularies and data standards are essential in computer science applications, especially in AI. For data to have meaning and to combine multiple datasets, vocabularies must be consistent and clearly defined. Deep learning techniques require meanings for the inputs and the outputs, but the internal representations do not have well-defined meanings, making the models very opaque (Marcus, 2018). Other representations, such as logical and probabilistic representations, support internal reasoning using symbols with well-defined meanings, which lend themselves to use in explanations (Marcus and Davis, 2019).
Ontologies are "a specification of the meanings of the symbols in an information system" (Poole and Mackworth, 2017). In particular, an ontology defines the vocabulary for individuals and relationships within a knowledge domain.
Individuals may be concrete entities (e.g. a rock) or abstract concepts (e.g. numbers); relationships are properties that describe how individuals are connected. Typical examples of relationships include "is-a-kind-of", "is-part-of", "is-superclass-of", "has-some-property"; the ontology also defines axioms controlling the use of the vocabulary for logical and thematic consistency (Poole and Mackworth, 2017). Given these axioms, the vocabulary can be unambiguously interpreted according to the rules of symbolic logic, and implicit relationships between entities or instances of those entities can be inferred.
Vocabularies can be Aristotelian taxonomies, which are logically consistent and multi-hierarchical. Aristotelian taxonomies are constructed by defining concepts from their relation to a more general parent concept (genus) and using differentiating properties (differentia) to distinguish concepts within the same genus (Aristotle, 350 BC). For example, "slides in soil" and "slides in rock" share the same parent concept "slides", and they are differentiated by the property dealing with the material type, "soil" and "rock", which make them uniquely identifiable. Taxonomies based on Aristotelian definitions support multi-hierarchical knowledge networks and can be used by computers to make logical inferences (Poole et al., 2009; Smith, 2003). The term "multi-hierarchical" implies that there is more than one way to move through a taxonomy to arrive at a particular node or term. For example, the landslide taxonomy can be arranged based on different properties. If the landslide types are firstly arranged based on the type of movement and then based on the type of material, one path within the taxonomy would be landslide > slides > slides in rock and slides in soil. Alternatively, if the landslide types are arranged first based on the material type and then on the movement type, the path of the taxonomy would be landslide > landslides in rock > slides in rock and flows in rock. Both paths are valid, but they reach the same concept in different ways. The natural hazard classification code list extension for landslides presented in this paper was prepared using the open-access Aristotelian Class Editor (ACE) software (Minerva Intelligence, 2019d). Knowledge stored in a domain-specific ontology (e.g. geohazards) can be accessed by computers, allowing for data investigation through various AI techniques, including probabilistic matching between semantic networks, the technique used in this study.
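The genus-differentia construction and the two traversal orders described above can be sketched in Python. This is a minimal illustration only and not the data model of the ACE software; the dictionaries and the paths_to helper are hypothetical names introduced here, and the concept labels are taken from the example in the text.

```python
from itertools import permutations

# Each concept is defined by its differentiating properties (differentia).
# Labels follow the example in the text; they are not code list identifiers.
CONCEPTS = {
    "slides in rock": {"movement": "slide", "material": "rock"},
    "slides in soil": {"movement": "slide", "material": "soil"},
    "flows in rock": {"movement": "flow", "material": "rock"},
}

# Label of the intermediate class obtained by fixing a single property.
LABELS = {
    ("movement", "slide"): "slides",
    ("movement", "flow"): "flows",
    ("material", "rock"): "landslides in rock",
    ("material", "soil"): "landslides in soil",
}


def paths_to(concept, genus="landslide"):
    """Return every path from the genus to a concept, one per property ordering."""
    differentia = CONCEPTS[concept]
    paths = []
    for order in permutations(differentia):
        first = order[0]
        paths.append([genus, LABELS[(first, differentia[first])], concept])
    return paths


for path in paths_to("slides in rock"):
    print(" > ".join(path))
```

Both printed paths end at the same concept, which is the sense in which the taxonomy is multi-hierarchical.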
Significant progress has been made in the development of taxonomies for geoscience information interchange by the International Union of Geological Sciences (IUGS) Commission for the Management and Application of Geoscience Information (CGI) Geoscience Terminology Working Group, which produced the GeoSciML standard along with the Open Geospatial Consortium (OGC; CGI, 2003). However, ontology applications in earth sciences are scarce. Notable exceptions are in economic geology (Smyth et al., 2007), geohazards (Jackson Jr et al., 2008), and disaster risk reduction domains (Phengsuwan et al., 2019; Sermet and Demir, 2019).
The INSPIRE framework, through its standardized vocabularies (code lists), provides a necessary foundation upon which AI applications with explainable output can be constructed. INSPIRE application examples in landslide studies include the LAND-deFeND Italian landslide database structure (Napolitano et al., 2018) and a deep learning algorithm to map landslide susceptibility (Hajimoradlou et al., 2020). In the implementation of deep learning by Hajimoradlou et al. (2020), training features were labelled with INSPIRE-compliant semantics to enable reproducibility of the experiment by other researchers.
In this study, we present an AI-based landslide susceptibility application using a natural hazard ontology. We do so by building from the ontology created by Jackson Jr et al. (2008) and by embedding INSPIRE code lists wherever possible as well as aligning input and output data to the INSPIRE data standards.

Landslide susceptibility and hazard
Landslide susceptibility is defined as the relative spatial probability of occurrence for a landslide based on the intrinsic properties of a site (SafeLand, 2011). The concept of susceptibility differs from hazard in that the temporal probability of occurrence, the triggering factors, and the magnitude of the event are not considered in the definition of a susceptibility map (SafeLand, 2011; Van Den Eeckhaut and Hervás, 2012). To produce landslide susceptibility maps, three approaches are usually applied: statistical, physical, and expert-based (SafeLand, 2011). Statistical methods rely on the analysis of landslide databases and their relation to landscape properties (see review by Reichenbach et al., 2018), physical methods calculate the limit equilibrium between failure-resisting and failure-driving forces in slopes (e.g. Baum et al., 2008), and expert-based methods rely on expert opinion and the assumption that influencing factors are known and are specified in the models (Dai et al., 2002). The AI approach used in this study is an example of the expert-based approach as the models follow rules that represent the reasoning process of a landslide expert, providing semi-quantitative susceptibility maps.

Methods
Figure 1 outlines the methodological workflow followed in this study to produce explainable landslide susceptibility assessments in the Veneto region of Italy. We extended INSPIRE (Sect. 2.1); we constructed an ontology (Sect. 2.2); and we defined expert models (Sect. 2.2.1) and instances, represented by mapping polygons (Sect. 2.2.2). We then compared the similarity of models and instances to produce a matching score, which is used as a susceptibility indicator (Sect. 2.2.3). Finally, the results are delivered in an interactive web map (Sect. 2.2.4).

Figure 1. The workflow followed in this study and corresponding method sections. We extended INSPIRE, defined an ontology, expert models and mapping instances. We compared models and instances to deliver a susceptibility map which is available online.

INSPIRE extension
Technical guideline documents outline the data structure for each theme within the INSPIRE directive, its encoding rules, its metadata standards, and some of its use cases. Data structures are formally represented using Unified Modeling Language (UML), modelling thematic entities as feature types, defining properties for each feature type, and characterizing relationships between feature types. Where applicable, standardized vocabularies are adopted for property value ranges. INSPIRE themes can be understood as an ontology (see Sect. 2.2 below) by defining various entities and the relationships between them.
INSPIRE data models are implemented as Geography Markup Language (GML) application schemas (https://inspire.ec.europa.eu/XML-Schemas/Data-Specifications/2892, last access: 26 October 2020) and serialized using Extensible Markup Language (XML). This enables data distribution provided as Open Geospatial Consortium (OGC)-compliant web services. Geospatial features are located using vector-based spatial data. Feature properties have value types (e.g. geometry for vector datasets); properties whose value ranges are controlled vocabularies have values implemented as code lists. Code lists incorporate vocabularies developed outside of INSPIRE (e.g. IUGS CGI rock type taxonomy). Some code lists within INSPIRE are not extensible, some are extensible with narrower values, and some allow additional values at any level. Code list values, definitions, and hierarchical structures are stored in the INSPIRE registry (https://inspire.ec.europa.eu/codelist, last access: 26 October 2020), making them accessible to and reusable by anyone. INSPIRE schemas can also be extended to include additional concepts and/or feature types. For this project, we worked with four INSPIRE themes: Geology, Land Cover, Hydrography, and Natural Risk Zones. The Natural Risk Zone application schema was not fully adequate for this application as it lacked the "landslide susceptibility" concept and "landslide type" code lists (Tomas et al., 2015). We addressed this issue by formally extending the Natural Risk Zone schema and the natural hazard code list.
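The three extensibility levels of code lists mentioned above can be illustrated with a short check of whether a proposed new value would be admissible. This is a sketch under stated assumptions: the Extensibility enumeration and the may_add function are hypothetical names introduced here and do not reproduce the INSPIRE registry software or its identifiers.

```python
from enum import Enum


class Extensibility(Enum):
    NONE = "not extensible"
    NARROWER = "extensible with narrower values"
    ANY = "extensible at any level"


def may_add(extensibility, existing_values, parent=None):
    """Return True if a new value placed under `parent` respects the rule."""
    if extensibility is Extensibility.NONE:
        return False
    if extensibility is Extensibility.NARROWER:
        # A narrower value must sit below a value that already exists.
        return parent in existing_values
    return True


existing = {"landslide"}
print(may_add(Extensibility.NARROWER, existing, parent="landslide"))
print(may_add(Extensibility.NARROWER, existing))
```

Under this reading, a landslide type code list can only be added to a "narrower" code list if every new term is attached beneath an existing value.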

Ontologically grounded probabilistic matching
The method used to produce INSPIRE-based landslide susceptibility maps uses qualitative probabilistic reasoning that incorporates expert knowledge, making qualitative predictions based on comparisons between models and instances (e.g. Sharma et al., 2010; Smyth et al., 2007; Poole and Smyth, 2005; Smyth and Poole, 2004). A model is a set of rules defined a priori by an expert, based on the scientific literature, making use of the entities and properties defined in the ontology. These models aim to represent expert conceptualized descriptions of a given phenomenon or entity (e.g. landslide susceptibility). The properties used in a model description are concepts stored in the ontology, along with frequency terms (e.g. soil slide - has slope - moderately steep - always). Frequency terms used in this study are "always", "usually", "sometimes", "rarely", and "never". These terms were chosen as they express experience-based judgements that geoscience practitioners may use in field assessments. The term "never" allows the system to explicitly deal with negation (e.g. soil slide - has surficial material - bedrock - never). The properties and the frequency terms are encoded in semantic triple format (W3C Working Group, 2014), and the resulting model is a semantic network. Semantic networks are a graph representation of knowledge, where nodes are concepts, and edges are the semantic relation between concepts (Shapiro, 1992); see Fig. 2 for example. Real-world areas on the ground (map units, more generally referred to as "instances") are also described by semantic networks using the same properties stored in the ontology, but triples are accompanied by Boolean qualifiers to represent presence or absence of a specific property (e.g. polygon - has slope - steep - present). Comparisons, referred to as matches, between instances and models are possible because models and instances all use the same structured terminology, as controlled by the ontology.
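To make the two kinds of semantic network concrete, the example triples given above can be written as simple Python records. This is an illustrative sketch and not the encoding used in the application; the ModelTriple and InstanceTriple types and the shared_properties helper are hypothetical names introduced here.

```python
from typing import NamedTuple

FREQUENCIES = ("always", "usually", "sometimes", "rarely", "never")


class ModelTriple(NamedTuple):
    subject: str
    prop: str
    value: str
    frequency: str  # one of FREQUENCIES


class InstanceTriple(NamedTuple):
    subject: str
    prop: str
    value: str
    present: bool  # Boolean qualifier: property present or absent


# Model triples taken from the examples in the text.
soil_slide_model = [
    ModelTriple("soil slide", "has slope", "moderately steep", "always"),
    ModelTriple("soil slide", "has surficial material", "bedrock", "never"),
]
# Instance triple taken from the example in the text.
polygon = [InstanceTriple("polygon", "has slope", "steep", True)]


def shared_properties(model, instance):
    """Properties that appear in both networks and can therefore be compared."""
    return sorted({m.prop for m in model} & {i.prop for i in instance})


print(shared_properties(soil_slide_model, polygon))
```

Because both records draw their property names from the same vocabulary, finding the comparable properties reduces to a set intersection.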
Similarity scores are awarded based on the type of match between instance and model properties, the semantic distance in the taxonomy of compared property values, and the model property frequency term (Fig. 2). Match types include "exact", "a kind of (AKO) exact", and "possible". An exact match indicates that the property value term used in the model is present in the instance (in Fig. 2a), in which case a full score is awarded for this component of the compared semantic networks. An AKO exact match indicates that the property value term found in the instance is a kind of the property value term found in the model (in Fig. 2b), in which case a full score is also awarded. A "possible" match occurs when the property value term in the instance is broader than the property value term in the model, based on the defined taxonomies, in which case the score is divided by the semantic distance between the two terms. For example, "forest" is a more specific type of "forest and semi-natural areas" (in Fig. 2c) and results in the score being divided by 2. The score is lower because the instance is only possibly the kind of value that the model is looking for.
In this study, an exact match or an AKO exact match of a property with frequency "always" scores 10 000, "usually" scores 9000, "sometimes" scores 1000, "rarely" scores 100, and "never" scores −10 000; unmatched attributes are awarded −10 points. These scores are an arbitrary representation of the degree of surprise that uses order-of-magnitude numbers to distinguish qualitative measures. For an extensive review of the probabilistic comparison method, see Smyth and Poole (2004), Poole and Smyth (2005), Smyth et al. (2007), and Sharma et al. (2010). This approach has been applied in economic geology to generate mineral deposit exploration targets (Smyth et al., 2007) and in geohazard mapping to produce landslide susceptibility maps (Jackson Jr et al., 2008).
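The scoring rules can be sketched for a single property as follows. The frequency scores and the three match types are those stated in the text; everything else is an assumption made for illustration. In particular, the semantic distance is assumed to be the number of child-to-parent steps plus one (so that a parent and child give a divisor of 2, as in the "forest" example), an instance value unrelated to the model value is treated as unmatched, and the PARENT excerpt and function names are hypothetical.

```python
# Child -> parent links: a tiny illustrative excerpt of the taxonomies.
PARENT = {
    "gully erosion": "erosional process",
    "forest": "forest and semi-natural areas",
}
FREQUENCY_SCORE = {"always": 10_000, "usually": 9_000, "sometimes": 1_000,
                   "rarely": 100, "never": -10_000}
UNMATCHED = -10


def steps_up(term, ancestor):
    """Child-to-parent steps from term to ancestor, or None if unrelated."""
    steps = 0
    while term is not None:
        if term == ancestor:
            return steps
        term = PARENT.get(term)
        steps += 1
    return None


def score_property(model_value, frequency, instance_value):
    """Score one model property against the instance value for that property."""
    base = FREQUENCY_SCORE[frequency]
    if instance_value is None:
        return UNMATCHED
    if steps_up(instance_value, model_value) is not None:
        return base  # exact (0 steps) or AKO exact (instance is narrower)
    broader = steps_up(model_value, instance_value)
    if broader is not None:
        # Possible match: assumed distance = steps + 1, so parent/child gives 2.
        return base / (broader + 1)
    return UNMATCHED  # assumption: unrelated values count as unmatched


print(score_property("colluvium", "always", "colluvium"))
print(score_property("erosional process", "usually", "gully erosion"))
print(score_property("forest", "always", "forest and semi-natural areas"))
```

Summing such per-property scores over all properties of a model would give a total match score for one instance, although the exact aggregation used in the application is described in the cited references rather than here.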

Landslide models
This paper presents an AI expert-based landslide susceptibility map for three different landslide types: debris flows, slides in soil, and slides in rock for the Veneto region of Italy. These three landslide types are conceptualizations of landslide models defined using knowledge recorded in the scientific literature. These landslide models are intended to be proof of concept of models that can be used in the semantic approach proposed in this paper. In particular, some of the properties used in the models are drafted from literature analysis of logging-related landslides in British Columbia, Canada (Jackson Jr, 2019). Here we briefly summarize the models; detailed explanations of each property-value-frequency combination are provided in Appendix C.
The "Debris Flow" model describes the channels that may generate a debris flow. Debris flows are flow-like landslides generated when saturated sediments move down a steep channel. They can originate when a slide in soil intersects a flowing body of water or when saturated bed sediments are mobilized and begin flowing downstream. Debris flows are usually triggered by intense and persistent rainfall (Hungr et al., 2014). To visualize the Debris Flow model, see the table in Appendix C or navigate to https://italy.minervageo.com/debris-flow-model/ (last access: 26 October 2020).
The "Slides in Rock" model describes slopes that may generate slides in rock. Slides in rock form when steep rock slopes and cliffs fail under the influence of gravity and are commonly triggered by intense rainfall or earthquakes. Slides in rock are usually very fast, and the failure can occur along planar, curved, and/or multiple surfaces. This model represents the collective class of landslides that have as material "rock" and as movement type "slide", including rotational, planar, compound, wedge, and irregular slides in rock (Hungr et al., 2014). Given the regional scale of this study, we do not have the data resolution to determine the possible failure plane geometry. For example, we cannot identify slopes more susceptible to planar rock slides than to rotational rock slides. To visualize the Slides in Rock model, see the table in Appendix C or navigate to https://italy.minervageo.com/the-roberti-slides-in-rock-model/ (last access: 26 October 2020).

Figure 2. Graphical representation of the matching process between expert-defined models and map polygon instances. Panel (a) is an example of an exact match between the property value "colluvium"; (b) is an example of a kind of (AKO) exact match because "gully erosion" is a more specific kind of "erosional process". The model is looking for an "erosional process" and found a "gully erosion"; (c) is an example of a possible exact match because "forest and semi-natural areas" is a broader concept of "forest". The model is looking for "forest", but we do not know whether the instance is a "forest". We only know that the instance is "forest and semi-natural areas". The vocabulary and the hierarchy are controlled by the ontology. Note that frequency terms for model properties are not shown in this figure.
The "Slides in Soil" model describes slopes that may generate slides in soil. Slides in soil are downslope movements of soil under the influence of gravity, commonly triggered by intense rainfall or earthquakes. They can be slow or fast, and the failure can occur along one or many planar or curved surfaces (Hungr et al., 2014). With slides in soil, we refer to the collective class representing all landslides that have as material "soil" and as movement type "slide", including rotational, planar, and compound clay, silt, sand, gravel, and debris slides. Given the regional scale of this study, we do not have the data resolution to determine the possible failure plane geometry and the specific kind of soil that is involved in the failure. To visualize the Slides in Soil model, see the table in Appendix C or navigate to https://italy.minervageo.com/slides-in-soil/ (last access: 26 October 2020).
In the presence of higher-resolution information such as rock bedding orientation or shear geometry and stratigraphy in soil masses, susceptibility to specific kinds of rock slides (e.g. planar vs. rotational) or different kinds of slides in soil (e.g. clay compound slide vs. clay planar slide) may be mapped.

Map polygon instances
The definition of the mapping unit is a critical step in any landslide susceptibility mapping application, and there are many different approaches to subdividing the area of interest to identify areas susceptible to slides in soil or rock (see review by Guzzetti et al., 1999). For this study, we used slope units, which are a geomorphic representation of single slopes bounded by drainage and divide lines (Guzzetti et al., 1999), as mapping units. We used the r.slopeunits software to automate the slope unit delineation (Alvioli et al., 2016, 2020). We used stream line vector shapefiles provided by the Veneto Regional Government, buffered by a distance of 5 m as mapping units to map debris flow susceptibility. In total, the region of Veneto was subdivided into 93 262 polygons, of which 9302 are stream buffer polygons, and 83 960 are slope unit polygons. We used a spatial overlay analysis to aggregate data describing the physical properties of the mapping units. The analysis aggregated the properties from all features that intersect the mapping units. For each property in an input layer, an aggregation type is specified as either (a) list, whereby all of the intersecting properties are concatenated into the mapping unit (e.g. multiple rock types), or (b) Boolean evaluation, which checks whether or not the mapping unit was intersected by a specific input feature (e.g. a fault). The properties describing each mapping unit polygon were converted into semantic networks, one network for each polygon. This conversion allows for semantic reasoning to compare and rank, based on similarity, the mapping units (hereafter instances) against the expert-defined landslide models to evaluate landslide susceptibility.
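The two aggregation types of the overlay analysis can be sketched in plain Python. This is a simplified illustration, not the GIS workflow used in the study: axis-aligned bounding boxes stand in for real polygon geometries, the layer dictionaries and rock type values are invented for the example, and the list aggregation here additionally removes duplicates and sorts the values.

```python
def intersects(a, b):
    """Axis-aligned boxes (xmin, ymin, xmax, ymax) stand in for real geometries."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def aggregate(unit_box, layers):
    """Collect the properties of all features intersecting one mapping unit."""
    properties = {}
    for layer in layers:
        hits = [f for f in layer["features"] if intersects(unit_box, f["box"])]
        if layer["aggregation"] == "list":
            # (a) list: keep every intersecting value (e.g. multiple rock types)
            properties[layer["property"]] = sorted({f["value"] for f in hits})
        elif layer["aggregation"] == "boolean":
            # (b) Boolean: was the unit intersected by any feature (e.g. a fault)?
            properties[layer["property"]] = bool(hits)
        else:
            raise ValueError(f"unknown aggregation: {layer['aggregation']}")
    return properties


# Hypothetical input layers for illustration only.
layers = [
    {"property": "rock type", "aggregation": "list", "features": [
        {"box": (0, 0, 5, 5), "value": "limestone"},
        {"box": (4, 4, 9, 9), "value": "dolomite"},
        {"box": (20, 20, 30, 30), "value": "granite"},
    ]},
    {"property": "fault", "aggregation": "boolean", "features": [
        {"box": (50, 50, 60, 60)},
    ]},
]
print(aggregate((3, 3, 6, 6), layers))
```

The resulting dictionary for each mapping unit is the kind of property set that would then be converted into a semantic network.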

Matching, susceptibility, and run-out
The similarity score between a given model and instance is used as a proxy of landslide susceptibility. A high similarity score between an instance and a landslide susceptibility model signals a high susceptibility to that type of landslide. We deliver the similarity score between models and instances as susceptibility on the output maps.
After the susceptibility assessment, a first-order estimate of hazard is provided by calculating the likely extent of landslide run-out for the most susceptible (99.9th percentile score, i.e. top 1 in 1000) instances for each model. Various physical methods have been developed to calculate potential landslide run-out given the physical properties of the material and the topography (see review by McDougall, 2016). To compute the potential run-out extents, we applied the r.avaflow code (Mergili et al., 2017), which is an open-source software package implementing the two-phase debris flow model developed by Pudasaini (2012). Physical model parameters for "Slides in Rock" are inferred from the back-calculations of the recent Mt. Joffre landslide, in British Columbia, Canada (Friele et al., 2020); "Slides in Soil" and "Debris Flow" parameters use the default r.avaflow parameters for those landslide types (Table 1).
Various landslide size classes were simulated for each map instance, ranging from class 4 to class 6 (Jakob, 2005). Classes 4 to 6 were chosen to provide a preliminary hazard assessment, where a class 4 event may have an approximate return interval of hundreds of years, and class 6 events are very unlikely and extreme events with return intervals on the order of thousands of years (Jakob, 2005).

Web map
This study's landslide susceptibility maps and hypothetical landslide run-outs for slides in soil, slides in rock, and debris flows are delivered as an interactive web map based on OpenLayers (MetaCarta, 2005) and React (Facebook, 2013). [...] (ISA, 2016). The registry service is packaged within a collection of Docker (Hykes, 2013) containers and hosted on a local server.
The Natural Risk Zone core (NZ-core) schema extension, which includes the Natural Risk Zone Susceptibility feature type, was based on SafeLand recommendations (SafeLand, 2011). The Natural Hazard classification code list was extended (Minerva Intelligence, 2019b) to include a classification of various landslide types using the updated Varnes landslide classification (Hungr et al., 2014), which is a landslide classification widely adopted within the scientific community, and a new code list of landslide size classes (Minerva Intelligence, 2019c) based on Jakob (2005). The landslide size code list contains 10 landslide size classes based on landslide volume and descriptions of approximate damage potential.

Code list extension
The Natural Hazard classification code list extension for landslides considers material type and failure movement, splitting the tree first on type of movement and then on type of material following Hungr et al. (2014) (Fig. 3). Other properties, such as water content, depth of failure, rate of movement, loading state, channelized state, and failure plane geometry (see Appendix B), are used to describe the individual landslide types as the unique combination of these properties allows for unambiguous classification in an Aristotelian taxonomy. We used these properties because, even if not shown in the final taxonomic tree, they are explicitly applied in the textual descriptions of landslide types by Hungr et al. (2014).
The formal extension registration process via the INSPIRE registry software does not enable the representation of such multi-hierarchical classifications. Because of this we had to work with a single tree hierarchy and consequently chose to first divide the classes based on type of movement, followed by a division based on the type of material (Fig. 3).

Schema extension: susceptibility
The INSPIRE Natural Risk Zone schema includes hazard and risk feature types, but the concept of susceptibility as a feature type is missing. To overcome this problem, we extended the INSPIRE Natural Risk Zone core XML schema, adding a Natural Risk Zone Susceptibility schema (Minerva Intelligence, 2019e). The Natural Risk Zone Susceptibility schema includes abstract susceptibility area and susceptibility area feature types (Fig. 4). The susceptibility area feature type is modelled following the structure of the hazard area and risk zone feature types in the INSPIRE Natural Risk Zone core schema. Susceptibility area has three elements: Geometry, Influencing Factor, and Relative Spatial Likelihood of Occurrence (Fig. 4). Geometry, as with all INSPIRE vector datasets, is the geometric representation of the extent of the feature on the earth's surface as a spatial feature. Influencing Factors are defined as the intrinsic, preparatory variables which make an area susceptible to a hazard (SafeLand, 2011). Influencing Factors are unbounded in multiplicity (i.e. can be as many as needed) and can be defined qualitatively or quantitatively. Qualitative Influencing Factors are expressed as a string, while quantitative Influencing Factors are expressed as GML:MeasureType (Fig. 4). Whether defined quantitatively or qualitatively, the Influencing Factor can also define a DataSetType attribute, such as slope or air quality. Influencing Factors are used in the calculation of Relative Spatial Likelihood of Occurrence, which is an element that can be quantitatively or qualitatively defined (Fig. 4). The Relative Spatial Likelihood of Occurrence refers to values that represent the spatial probability of occurrence of a specific hazard type given the influencing factors present in the area (SafeLand, 2011). The Influencing Factor element allows end users of susceptibility area datasets to understand which known conditions of the specific area led to the resultant susceptibility.
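The structure of the susceptibility area feature type can be mirrored in a few Python data classes. This is only a reading aid for the element structure described above; the actual extension is an XML schema, and the class names, attribute names, and example values below are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Measure:
    """Stand-in for a GML measure: a number with a unit of measure."""
    value: float
    uom: str


@dataclass
class InfluencingFactor:
    value: Union[str, Measure]          # qualitative string or quantitative measure
    dataset_type: Optional[str] = None  # e.g. "slope"


@dataclass
class SusceptibilityArea:
    geometry: List[Tuple[float, float]]             # polygon ring as (x, y) pairs
    relative_spatial_likelihood: Union[str, float]  # qualitative or quantitative
    # Unbounded multiplicity: as many influencing factors as needed.
    influencing_factors: List[InfluencingFactor] = field(default_factory=list)


# Invented example values, for illustration only.
area = SusceptibilityArea(
    geometry=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)],
    relative_spatial_likelihood="high",
    influencing_factors=[
        InfluencingFactor(Measure(35.0, "deg"), dataset_type="slope"),
        InfluencingFactor("colluvium"),
    ],
)
print(len(area.influencing_factors), area.relative_spatial_likelihood)
```

Keeping the influencing factors alongside the likelihood value is what lets an end user trace a susceptibility rating back to the conditions that produced it.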
Figure 5 shows how different tools in Hale Studio are used to align properties from the source dataset to the target dataset. For example, the field "eta" ("age" in Italian) of the original Veneto dataset was directly mapped to four different INSPIRE fields: the olderNamedAge.href and title and the youngerNamedAge.href and title. Note that olderNamedAge.href and youngerNamedAge.href are hyperlinks to the code list value ID, and the title is the actual code list term from the GeochronologicEraValue code list. This alignment is done with many classification methods, including Groovy scripts, formatted strings, and assign-alignment tools. For further explanation on term alignments, refer to the documentation of Hale Studio (WeTransform, 2008). Datasets used that were not compliant with INSPIRE include lakes, watersheds, permafrost, fire, slope angle, faults, soil, roads, and railways (Table 3).
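The one-to-four field mapping described above can be sketched outside Hale Studio as a simple lookup. This is an illustration under stated assumptions, not the alignment used in the study: the lookup table, the sample source value, the placeholder URI, and the function name `align_eta` are all hypothetical, and the sketch assumes a single source age populates both the older and the younger named age.

```python
# Hypothetical lookup from source "eta" values to (href, title) pairs.
# The URI is a placeholder, not a real registry identifier.
AGE_LOOKUP = {
    "giurassico": (
        "http://example.org/codelist/GeochronologicEraValue/jurassic",
        "Jurassic",
    ),
}


def align_eta(record):
    """Map the source 'eta' field to the four target fields.

    Returns None when the source value has no entry in the lookup,
    so unmapped terms can be reviewed instead of silently guessed.
    """
    key = record["eta"].strip().lower()
    if key not in AGE_LOOKUP:
        return None
    href, title = AGE_LOOKUP[key]
    return {
        "olderNamedAge.href": href,
        "olderNamedAge.title": title,
        "youngerNamedAge.href": href,
        "youngerNamedAge.title": title,
    }


aligned = align_eta({"eta": " Giurassico "})
```

Returning `None` for unknown values is a design choice of this sketch: it keeps gaps in the vocabulary mapping visible during alignment.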

Web map
The 83 960 slope units and 9302 stream buffer instances (Fig. 6) were encoded with the available data, then transformed from vector files into semantic network format. Then, each polygon was matched against the expert-based Slides in Soil, Slides in Rock, and Debris Flow models and colour-coded according to matching-score percentile to portray landslide susceptibility (Fig. 6). The left-side panel of the web map shows the landslide model layers, the reference layers, and different base maps (Fig. 7). Clicking on a polygon (instance) opens a pop-up window (Fig. 7). This window contains the name and hyperlink to the INSPIRE registry code list definition of the landslide type investigated, the attributes that are present in the mapping unit (e.g. bedrock lithology, erosional process), the instance percentile rank and total match score, the hyperlink to the comparison of the instance against other landslide models (e.g. the Slides in Rock model), and (only for the 99.9th percentile score, top 1 in 1000) buttons to turn on the display of landslide run-out for different landslide classes as well as the hyperlink to the match report.
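Colour-coding by matching-score percentile can be illustrated with a short sketch. The percentile definition used here (share of scores less than or equal to a given score) and the class breaks are assumptions made for illustration; the text above only fixes the 99.9th percentile as the threshold for the run-out display.

```python
from bisect import bisect_right


def percentile_ranks(scores):
    """Percentage of scores that are less than or equal to each score."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


def colour_class(percentile, breaks=(50.0, 90.0, 99.0, 99.9)):
    """Index of the colour bin: 0 is the lowest, len(breaks) the highest.

    The default breaks are hypothetical, apart from the 99.9 threshold.
    """
    return sum(percentile >= b for b in breaks)


ranks = percentile_ranks([10, 20, 30, 40])
```

With the hypothetical breaks above, only instances whose percentile reaches 99.9 fall into the top bin, which is where the run-out buttons would be offered.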
The match report is a detailed table showing the results of the semantic matching between model and instance, ensuring the explainability of the results. Each line corresponds to a property-value-frequency term (e.g. has slope - moderately steep - always) comparison between the model and the instance, how they match (with a hyperlink to a textual explanation of how the score was awarded), the numerical score value (see Table 4 for example), a textual explanation of why that attribute was chosen, and the original data value (Table 5). An "advice" button opens textual advice indicating which of the instance's unmatched attributes may change the score. This advice concerns the data: it invites the user to check in the field or in other databases whether, for example, a fault is present in that specific instance.
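The structure of such a report can be sketched as a comparison of property-value-frequency terms against the values recorded for an instance. The scoring below is purely hypothetical: the frequency weights, the three outcome labels, and the example property "has fault" are assumptions made for illustration and do not reproduce the scores in Table 4 or the qualitative probabilistic algorithm used in the study.

```python
# Hypothetical weights per frequency term; not the study's scoring.
FREQUENCY_WEIGHT = {
    "always": 1.0,
    "usually": 0.75,
    "sometimes": 0.5,
    "rarely": 0.25,
}


def match_report(model, instance):
    """Compare model terms with instance data, one report row per term.

    model: list of (property, value, frequency) tuples.
    instance: dict mapping a property to the set of observed values.
    Returns (rows, total_score, advice), where advice lists properties
    with no data for the instance, which are worth checking in the field.
    """
    rows, advice = [], []
    for prop, value, frequency in model:
        observed = instance.get(prop)
        if observed is None:
            outcome, score = "unknown", 0.0
            if prop not in advice:
                advice.append(prop)
        elif value in observed:
            outcome, score = "match", FREQUENCY_WEIGHT[frequency]
        else:
            outcome, score = "no match", 0.0
        rows.append({
            "property": prop,
            "value": value,
            "frequency": frequency,
            "outcome": outcome,
            "score": score,
        })
    total = sum(row["score"] for row in rows)
    return rows, total, advice


model = [
    ("has slope", "moderately steep", "always"),
    ("has stream order", "1", "always"),
    ("has fault", "fault", "usually"),
]
instance = {"has slope": {"moderately steep"}, "has stream order": {"3"}}
rows, total, advice = match_report(model, instance)
```

In this sketch the advice list plays the role of the "advice" button: it names the attributes for which the instance has no data and which could change the score if surveyed.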

INSPIRE as a framework for explainable AI
Across society, the use of numerous complex and nonstandardized earth science taxonomies results in interoperability limitations, which hinder the widespread implementation of explainable AI solutions to natural-hazard-related problems. This is evident in the landslide domain, where data layers for landslide susceptibility analysis, ranging from landslide databases (Van Den Eeckhaut et al., 2013) to geomorphology maps, vary across regions and countries. Consequently, despite the wealth of scientific literature on landslides in general and landslide susceptibility in particular (Reichenbach et al., 2018), broad-scale operational landslide hazard management systems are scarce, resulting in significant human and economic losses (Froude and Petley, 2018).
INSPIRE partially addresses this problem by providing standardized data structures for data hosting and standard terminology to use within those structures. This study illustrates that, once INSPIRE-compliant, European data can be subjected to analytical methods that can then be applied in practice to multiple other equivalent INSPIRE-compliant datasets.
By maintaining carefully curated standards, INSPIRE can play a critical role in AI applications that seek to be "explainable" (Gilpin et al., 2019). Its code lists can be mapped into ontology properties, enabling machines to make inferences of semantic and hierarchic relations based on data. The explainability in the application presented in this study is provided in the form of a comprehensive match report, which can be opened via an information pop-up for each instance. The match report provides the user with complete access to the logic that drives the AI reasoning engine, allowing interrogation of the results displayed on the map. By embedding explanations in a user-friendly interface, ontologically based AI can improve the understanding of complex geospatial problems by decision makers, insurance companies, and the general public.
Public and private organizations, within and outside the European Union, can significantly enhance the value of the data they collect and publish by using INSPIRE-compliant standards not only in natural hazard mapping but also in other domains. A comparative study of regional spatial data infrastructure (SDI) in the context of INSPIRE implementation (Craglia and Campagna, 2010) showed that inefficient data access and use at the European level results in annual economic losses in the EUR 100-200 million range. The same study shows that the regional SDI of Lombardy, Italy, allowed savings of EUR 3 million per year to companies working in environmental impact assessments (EIAs) and strategic environmental assessments (SEAs). Savings in the same order of magnitude can be expected by adopting INSPIRE standards in the domain of geological-hazard assessment.

INSPIRE extension and limitations
INSPIRE-compliant datasets are still rare across European countries in general and in Italy in particular (Cetl et al., 2017; Mijić and Bartha, 2018; Cho and Crompvoets, 2019). Consequently, we were unable to identify a jurisdiction in Europe with INSPIRE-compliant datasets for all the inputs necessary for this study. Therefore, instead of using already-compliant data, a region optimal for demonstrating the interrelationship between INSPIRE and explainable AI was chosen, and some of the data for that region were made INSPIRE-compliant. In doing so, the study provides both a case study of dealing with non-INSPIRE-compliant data and an illustration of the rewards achievable by making a coherent set of data INSPIRE-compliant.
The code lists and application schemas in the INSPIRE Natural Risk Zone theme lacked the level of detail necessary for this application. This is understandable as, given the broad scope of the directive, schemas lack the necessary granularity for specific applications. INSPIRE is intended to be used as an overarching umbrella under which domain-specific applications can find their place by extending it where necessary. The Natural Risk Zone theme (Tomas et al., 2015) and the extension presented in this work are an example of using this extension facility. Within the Natural Risk Zone theme, the Natural Hazard category value code list includes geological and hydrological hazards, including "flood" and "landslide", but the different subclasses of floods and landslides are not specified. For this kind of landslide susceptibility assessment, the clear definition of landslide types, landslide size classes, and susceptibility was fundamental. For example, a debris flow, which moves rapidly (metres per second), and an earth flow, which may move slowly (metres per year), present entirely different hazards; they can both destroy property, but it is unlikely for an earth flow to result in fatalities, while the opposite can be said of debris flows (Hungr et al., 2014). The definition of landslide sizes is also important: a size class 1 debris flow has a smaller impact area than a size class 6 event, but, by having a higher frequency, it may result in greater losses (Jakob, 2005).
From a data structure perspective, INSPIRE code lists cannot currently host multi-hierarchical taxonomies, that is, taxonomies in which a term can have more than one parent. This limits the nature of reasoning that can be brought to bear on them. We understand the technical difficulties in handling multi-hierarchical taxonomies but hope that future versions of the registry software will be able to handle these complex knowledge representations.
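The difference between a single hierarchy and a multi-hierarchical taxonomy can be shown with a small sketch in which a term may have several parents. The terms and parent links below are hypothetical examples chosen for illustration; they are not entries of the registered code list extension.

```python
# Hypothetical multi-parent taxonomy: each term maps to its parent terms.
PARENTS = {
    "debris flow": {"flow", "debris movement"},
    "flow": {"landslide"},
    "debris movement": {"landslide"},
    "landslide": set(),
}


def ancestors(term, parents=PARENTS):
    """Collect every broader term reachable through any parent link."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(term, candidate, parents=PARENTS):
    """True if term equals candidate or candidate is one of its ancestors."""
    return term == candidate or candidate in ancestors(term, parents)


broader_terms = ancestors("debris flow")
```

A structure that allows one parent per term would have to drop one of the two links of "debris flow" here, and with it the inferences that follow from that link.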
The INSPIRE Natural Risk Zone theme also lacks the definition of susceptibility as a concept and feature type. The term susceptibility is not implemented as a feature type because for most hazards (e.g. floods and earthquakes), the concept is embedded within the concept of hazard likelihood (Tomas et al., 2015). This does not apply in the landslide domain, where susceptibility and hazard are distinct concepts (e.g. Van Den Eeckhaut and Hervás, 2012). In this study, we implemented the susceptibility feature type. Although we applied this feature type in the landslide domain, it will be useful for other natural hazard applications, when the spatial likelihood of hazard occurrence must be expressed separately from the general concept of hazard likelihood.
The extensibility of INSPIRE allows for domain-specific applications, like the approach presented in this paper, to fit within the INSPIRE framework. However, problems may also arise from the fact that INSPIRE is extensible. Extensibility allows greater precision in terminology and schema for a specific application, but this allows different public and private institutions to implement separate and eventually incompatible extensions. For example, another landslide classification may be implemented by another institution: this implementation may not be interoperable with the one presented in this study but will have the same INSPIRE compliance, leading to two conflicting standards. Much work remains at the level of thematic clusters to implement as many standardized vocabularies and schemas as possible. Our extension is open and free, and we hope that other entities will adopt it for other landslide applications.

Ontological probabilistic matching for landslide susceptibility mapping
The semantic AI system applied in this study aimed to replicate the reasoning with uncertainties typical of geological assessments, applying the terminology that geological and geotechnical professionals use in their daily practice (Smyth et al., 2007). Since they are based on expert-defined models, the landslide susceptibility maps produced in this study are comparable to qualitative heuristic assessments (SafeLand, 2011). The choice of using a qualitative method for landslide susceptibility assessment is in contrast with recent recommendations for the application of quantitative methods (Corominas et al., 2014). However, in current professional geological assessments and geomorphological mapping applications, expert judgement is still widely applied (e.g. Association of Professional Engineers and Geoscientists of British Columbia, 2010; Guzzetti et al., 2012), and quantitative (statistically and physically based) methods rely on data that are not always available or are of unknown quality. For example, landslide databases necessary for statistically based susceptibility mapping are often incomplete, inaccurate, and geographically limited (Guzzetti et al., 2012). Further, the geotechnical parameters necessary for running physical models are usually approximated to carry out regional-scale studies (e.g. Mergili et al., 2014). The semantic AI system applied in this study can be used in cases of data scarcity and, if coupled with numerical methods, can improve the explainability of predictions. For example, by embedding the ontology concepts related to statistical parameters (e.g. receiver operating characteristic curves, confidence intervals) or physical parameters (e.g. friction angles, viscosity), it will be possible for the numerical outputs of quantitative methods to be explained in natural language, helping to reduce the gap between scientists and decision makers (Newman et al., 2017).
The main goal of this paper is not to present the semantic matching approach but to show an example of how to extend INSPIRE so that it can be used for landslide-specific applications. By suggesting these landslide-specific schema and code list extensions, we lay the foundation for INSPIRE-compliant landslide susceptibility studies. Other organizations can build on top of these extensions, and future landslide susceptibility applications can be compared as they formally refer to the same data structure and semantics. Note that we prescribe neither a specific selection of data and modelling variables nor a modelling approach for landslide susceptibility, hazard, or risk calculation. Such an effort is beyond the scope of this paper and, to some extent, has already been addressed by the SafeLand project (e.g. SafeLand, 2011); rather, we provide the data structure and semantics to store and share whichever method has been chosen by the modeller. For example, data selection for calculation of landslide susceptibility is encompassed in the schema structure under "Influencing Factor", which is "unbounded in multiplicity and can be defined qualitatively or quantitatively", leaving a broad range of possibilities to the modeller.
Data quality is discussed in the Natural Risk Zone schema, which refers to ISO standards (INSPIRE Thematic Working Group Natural Risk Zones, 2013). However, we recognize that specific code lists (semantics) dealing with data quality and model uncertainty are missing. We hope that the INSPIRE thematic group will address this point.
This study presents an AI method, based on semantic network comparison, to produce landslide susceptibility maps using an ontology and standardized taxonomies within the framework provided by the INSPIRE Natural Risk Zone theme. This method does not need an accurate landslide inventory to make predictions as it uses qualitative probabilistic reasoning that incorporates expert knowledge. We produced susceptibility maps for debris flow, slides in soil, and slides in rock for the Veneto region of Italy. To produce the maps for specific landslide types, we extended the Natural Risk Zone theme to encompass both the concept of susceptibility and the different types of landslides. In particular, we registered a landslide classification extension of the Natural Hazard category code list, a landslide size class code list, and susceptibility area and abstract susceptibility area feature type schema extensions. After defining the extension, we aligned key input layers (geology, streams, and land cover) to INSPIRE and, by using an ontologically grounded probabilistic matching algorithm, we produced the landslide susceptibility layers. The processing outputs were mapped to the Natural Risk Zone Susceptibility schema extension. Then, potential impact zones of landslides for multiple landslide size classes were physically modelled for a subset of the instances with the highest susceptibility scores. Finally, the results were presented in a user-friendly interface, embedding plain-language explanations of how the score was assigned and advising on how to improve the matching.
We have demonstrated the value of INSPIRE compliance by showing how it enhances information and knowledge interoperability and allows for explainability in AI applications by standardized interrogation of their inputs and outputs. Ontologies provide the formal structure for INSPIRE code lists to run algorithms similar to that applied here. The maps can explain the scientific results that they portray, and consequently improve the understanding of complex geospatial problems not only by domain experts but also by decision makers and other non-specialized interested parties.
This study also illustrates that, in their current state of development, the INSPIRE standards are not sufficiently expressive to support complex landslide susceptibility mapping. We provided an example of how INSPIRE's extension capabilities may be implemented to add the required expressivity. Through its Re3gistry register, this extension framework ensures that the expressivity extensions are documented and available to all interested parties for reuse. In doing so, it sets the context for the ongoing refinement of standards by the INSPIRE thematic committees.
Appendix A: Dictionary of terms

Term | Description
Code list | A dataset specifying terms for populating INSPIRE properties that require controlled vocabulary.
CLC | CORINE Land Cover, a classification system for land cover based on vegetation and land use.
Feature type | A data type representing a thematic entity in a domain of interest, typically with some geospatial location specified by vector-based spatial data.
IFFI | Italian Landslide Inventory.
Instance | A data item that represents an individual, specific real-world entity; for this application an instance is a spatial feature, either a slope unit polygon or a stream buffer polygon.
Model | A conceptualization of the entities, properties, and relationships in some domain of interest, in this case landslides. Three landslide models were used in this project: Debris Flow, Slides in Soil, and Slides in Rock.
Ontology | A formal representation of a conceptualization of the entities, properties, relationships, and rules describing the relation between the entities in some domain of interest.
Semantic network | A graph network of arcs and nodes that represent concepts in a domain of interest.
Schema | A representation of a data model describing the structure of a data theme.
Slope unit | A map unit polygon that is derived from the digital elevation model, defined by hydrologic drainage and divide lines.
Taxonomy | Hierarchical classification scheme based on shared characteristics between entities.
Triple | A semantic triple is a subject-predicate-object expression that asserts a fact, and it is the basic unit of a semantic network.

(Hungr et al., 2014)
Fall | A fall starts with the detachment of soil or rock from a steep slope along a surface on which little or no shear displacement takes place. The material then descends largely through the air by falling, saltation, or rolling (Cruden and Couture, 2011).
Topple | A topple is the forward rotation of material about a point or axis below the centre of gravity of the displaced mass (Cruden and Couture, 2011).
Slide | A slide is a downslope movement occurring dominantly on surfaces of rupture or relatively thin zones of intense shear strain (Cruden and Couture, 2011).
Spread | Spread is an extension of mass combined with a general subsidence of an upper fractured mass of material into softer underlying material (Cruden and Couture, 2011).
Flow | A flow is a spatially continuous movement in which surfaces of shear are short-lived, closely spaced, and not usually preserved (Cruden and Couture, 2011).
Slope deformation | Slow, sometimes unmeasurable deformation of slopes (Hungr et al., 2014).

Mud | Plastic, unsorted, and close-to-liquid-limit material. CL, CH, and CM unified soil classes (Hungr et al., 2014).
Clay | Plastic, can be molded into standard thread when moist, has dry strength. GC, SC, CL, MH, CH, OL, and OH unified soil classes (Hungr et al., 2014).
Sensitive | Sensitive or quick clay is a special type of clay prone to sudden strength loss upon disturbance. From a relatively stiff material in the undisturbed condition, an imposed stress can turn such clay into a liquid gel (Geertsema, 2013).
Soft | Easily molded with fingers. Point of geologic pick easily pushed into shaft of handle. Easily penetrated several centimetres by thumb (Hungr et al., 2014; USDA, 2012).
Stiff | Indented by thumb with great effort. Point of geologic pick can be pushed in up to 1 cm. Very difficult to mold with fingers. Just penetrated with hand spade (Hungr et al., 2014; USDA, 2012).

Property | Value | Frequency | Reference | Explanation
[...] | [...] | [...] | (Howes and Kenk, 1997) | Hummocky topography may be an indicator of landslide debris.
Has water | River/stream | Always | (Howes and Kenk, 1997) | Debris flows occur periodically on established paths, usually gullies and first- or second-order streams.

Has rainfall | Mild rainfall | Rarely | (Friele, 2012; Segoni et al., 2018) | Debris flows are triggered by intense rainfall (Segoni et al., 2018). Rainfall thresholds for this study are derived from Friele (2012).
Has geomorph process | Erosional process | Always | (Bovis and Jakob, 1999) | Streams with active erosional processes are more likely to experience debris flows than streams with less active erosional processes.
Has geomorph process | Mass movement | Always | (Guzzetti et al., 2012) | Landslides are more likely to occur on slopes or valleys that have experienced landslides before.
Has been logged within years | 5-10 years | Always | (Jackson Jr, 2019) | Landslides are extremely likely by 5 to 10 years after tree harvesting. Most tree roots have died, and new trees are too small to provide an anchoring effect with their roots on the slope.
Has been logged within years | 10-20 years | Usually | (Jackson Jr, 2019) | Landslides are likely by 10 to 20 years after tree harvesting as new trees are starting to provide an anchoring effect with their roots on the slope.
Has been logged within years | 0-5 years | Usually | (Jackson Jr, 2019) | Landslides are likely by 0 to 5 years after tree harvesting as the trees are dead, but some roots are still providing an anchoring effect on the slope.
Has fire within years | 0-2 years | Always | (Jackson Jr, 2019) | Debris flows are very likely for 2 years after a wildfire. Water cannot infiltrate; runoff and erosion increase as the soil becomes water-repellent and loses cohesion because of the fire heat.
Has fire within years | 3-5 years | Usually | (Jackson Jr, 2019) | Debris flows are likely between 3 and 5 years after a wildfire. The water-repellent soil horizon degrades, but the roots of dead trees are starting to rot, and they no longer support the slope with their anchoring effect.
Has fire within years | 5-10 years | Always | (Jackson Jr, 2019) | Debris flows are very likely between 5 and 10 years after a wildfire. Roots of dead trees decay and no longer support the soil, as in the case of tree harvesting.
Has fire within years | 10-20 years | Usually | (Jackson Jr, 2019) | Debris flows are likely between 10 and 20 years after a wildfire.
[...] | [...] | Always | (Bovis and Jakob, 1999) | Debris flows are common in areas with easily erodible material.
Has stream order | 1 | Always | (Hungr et al., 2014) | Debris flows occur periodically on established paths, usually gullies and first- or second-order streams.
Has stream order | 2 | Always | (Hungr et al., 2014) | Debris flows occur periodically on established paths, usually gullies and first- or second-order streams.
Has stream order | 3 | Rarely | (Hungr et al., 2014) | Debris flows occur periodically on established paths, usually gullies and first- or second-order streams.
Has stream order | 4 | Rarely | (Hungr et al., 2014) | Debris flows occur periodically on established paths, usually gullies and first- or second-order streams.
Has stream order | 5 | Rarely | (Hungr et al., 2014) | Debris flows occur periodically on established paths, usually gullies and first- or second-order streams.
Has been logged within years | > 20 years | Sometimes | (Jackson Jr, 2019) | By 20 years since logging, trees have regrown, and the roots are anchoring the soil again.
Has geomorph process | Debris flow | Always | (Bovis and Jakob, 1999; Wilford et al., 2004) | Melton ratio (number that takes into account relief and area of a watershed) and watershed length allow discrimination of debris flow, debris flood, and flood-prone fans.
Has landslide type | Debris flow | Always | (Hungr et al., 2014) | Debris flows occur periodically on established paths. Determining the frequency of events is a non-trivial task, but the fact that someone mapped a debris flow in a specific channel indicates that the channel is prone to debris flow events.
Has landslide type | Fall | Usually | (Bovis and Jakob, 1999 [...] | [...]
[...] | [...] | [...] | (Howes and Kenk, 1997) | The presence of blocks can be an indicator of landslide processes.
Has texture | Rubble | Always | (Howes and Kenk, 1997) | The presence of rubble is an indicator of landslide processes.

[...] | [...] | [...] | (Guzzetti et al., 2012) | Where there is soil, it is less likely that there will be steep slopes and rock slides. But soil slides are a sign of an unstable slope and therefore are not explicitly negatively correlated to rock slides.
Has surficial form | Cliff | Always | (Hungr et al., 2014) | Cliffs can generate rock slides.
Has texture | Rubble | Always | (Howes and Kenk, 1997) | The presence of rubble is an indicator of landslide processes.
Has texture | Blocks | Always | (Howes and Kenk, 1997) | The presence of blocks can be an indicator of landslide processes.
Has surficial form | Cones | Always | (Howes and Kenk, 1997) | Cones may be formed by rock slide debris; hence they can be considered an indicator of rock slide activity.

[...] | [...] | [...] | (Jackson Jr, 2019) | Roads are an aggravating factor for landslide activity as compared to undisturbed slopes.
Has thickness | Blanket | Always | (Jackson Jr et al., 2008) | Soil slides can occur when there is enough soil that can be mobilized on a slope.
Has thickness | Mantle of variable thickness | Usually | (Jackson Jr et al., 2008) | Soil slides can occur when there is enough soil that can be mobilized on a slope.
Has thickness | Veneer | Sometimes | (Jackson Jr et al., 2008) | Soil slides can occur when there is enough soil that can be mobilized on a slope.
Has thickness | Thin veneer | Rarely | (Jackson Jr et al., 2008) | Soil slides can occur when there is enough soil that can be mobilized on a slope.
Has rainfall | Extreme rainfall | Always | (Friele, 2012; Segoni et al., 2018) | Landslides can be triggered by intense rainfall (Segoni et al., 2018) or snowmelt. Rainfall thresholds for this study are derived from Friele (2012).
Has bed rock | Metamorphic rock | Always | (Bovis and Jakob, 1999) | Metamorphic foliated rocks usually have weak geotechnical properties. Basins underlain by these weak rocks are likely to experience more landslides compared to basins underlain by stronger lithologies.
Has texture | Blocks | Always | (Howes and Kenk, 1997) | The presence of blocks can be an indicator of mass movement processes.
Has texture | Rubble | Always | (Howes and Kenk, 1997) | The presence of rubble is an indicator of mass movement processes.
Has been logged within years | > 20 years | Sometimes | (Jackson Jr, 2019) | By 20 years since logging, trees have regrown, and the roots are anchoring the soil again.
Has been logged within years | 10-20 years | Usually | (Jackson Jr, 2019) | Landslides are likely by 10 to 20 years after tree harvesting as new trees are starting to provide an anchoring effect with their roots on the slope.
Has been logged within years | 5-10 years | Always | (Jackson Jr, 2019) | Landslides are extremely likely by 5 to 10 years after tree harvesting. Most tree roots have died, and new trees are too small to provide an anchoring effect with their roots on the slope.
Has been logged within years | 0-5 years | Usually | (Jackson Jr, 2019) | Landslides are likely by 0 to 5 years after tree harvesting as the trees are dead, but some roots are still providing an anchoring effect on the slope.
Has fire within years | > 20 years | Sometimes | (Jackson Jr, 2019) | After 20 years since a wildfire, trees have regrown, and the wildfire effects on slope stability have diminished.
-Data from the Veneto Geoportal are available under the "Italian Open Data License 2.0" (Regione del Veneto, 2020).
-CORINE Land Cover data are available under EEA standard reuse policy: reuse of content on the EEA website for commercial or non-commercial purposes is permitted free of charge, provided that the source is acknowledged (Feranec et al., 2016).
-The Tinitaly digital elevation model (DEM) is available upon request by sending an email to simone.tarquini@ingv.it with the subject of TINITALY DEM. Terms and conditions of use: data are provided for research purposes only. Data are provided solely to the person named on this application form and should not be given to third parties. Third parties who might need access to the same dataset are required to fill out their own application forms. Data from INGV are available under "Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)" (Tarquini et al., 2007).
Author contributions. -GR, JM, CS, and DP wrote the paper.
-GR conceptually designed the susceptibility schema, landslide extension, and the expert-based landslide models and expanded the geohazard ontology.
-JM implemented the INSPIRE schema and code list extension and designed the web map application.
-DP and CS designed the qualitative probabilistic method used to calculate susceptibility.
-SL and BB implemented and maintain the web map.
-VW implemented and maintained the geohazard ontology.
-BB and CA implemented the qualitative probabilistic algorithm.
-SR supported the semantic implementations and edited the manuscript.
-DB helped draft the manuscript and reviewed the landslide models and the code list extensions.