class: center, middle

[
](https://www.archesproject.org/wp-content/uploads/2020/02/arches-logo-r-only.svg)

# Basic Concepts for Modeling in Arches

The Arches Resource Model Working Group (ARM WG)

Documentation Updated: November 1, 2020

---
name: scope

.left-column[
## Learning Objectives
]
.right-column[
.extra-padding[To understand:

1. the basic [.orange[concepts]](#concepts) behind modeling in Arches
]]

---
name: concepts

# Table of Contents

.left-column[
## .orange[Concepts]
]
.right-column[
.orange[[Introduction](#basic_intro)

[.orange[What is a Data Model?]](#data_model)
* [Types of Data Models](#data_model_types)
* [Structured data](#structured_data)
* [Key elements of structured data](#structured_data_elements)
* [Clean Data](#clean_data)
* [Controlled Vocabularies](#controlled_vocabularies)
* [Universal Unique Identifiers](#UUID)
* [Triples](#triples)
* [Structured data in a data model](#structured_data_in_model)
]]

---
name: concepts2

# Table of Contents

.left-column[
## .orange[Concepts]
.orange[(continued)]
]
.right-column[
.orange[
[.orange[What is a Semantic Data Model?]](#semantic_data_model)
* [Semantic Standards](#semantic_standards)
* [Semantic Formats](#semantic_formats)
* [Semantic Ontologies](#semantic_ontologies)
* [Semantic Web](#semantic_web)
]]

---
name: concepts3

# Table of Contents

.left-column[
## .orange[Concepts]
.orange[(continued)]
]
.right-column[
.orange[
[.orange[What is an Arches Resource Model?]](#arches_resource_model)
* [Resource Model and Branches](#resourcemodels_branches)
* [Anatomy of an Arches Branch](#branch_anatomy)
* [Data Types](#datatypes)
* [Concept Collections](#conceptcollections)
* [Relationships between Resource Models](#resource-instance)
* [Arches Designer](#arches_designer)
* [Encoding the ontology](#resourcemodel_ontology)
* [Reference Data Manager](#RDM)

[.orange[The Modeling Process]](#modeling_process)
* [Data requirements](#data_requirements)
* [Data content standards](#data_content)
* [Creating a conceptual model](#conceptual_model_building)
* [Building Arches Resource Models](#arm_building)
]]

---
class: center, middle
name: basic_concepts

# What are the basic .orange[concepts] you should know for modeling in Arches?

---
name: basic_intro

# Introduction

The information in this section introduces some concepts that will be helpful to understand before engaging with the ARM WG methodology for Arches modeling. Generally, the concepts have a direct relationship to the modeling process and why it's important.

---
class: center, middle
name: basic_concepts1

# What are the basic .orange[concepts] you should know for modeling in Arches?

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Concepts_progress1_text.png)

---
name: data_model

# What is a Data Model?

A data model establishes the overall organization of data in any given database or system.

Data models can work on a conceptual level as well as a functional level. A conceptual data model provides the overall vision and framework for data organization but might not be adequately interpreted, structured and encoded for a particular software application. A data model that is also functional, such as an Arches Resource Model, is ready to be loaded into and expressed through a software application, in this case, Arches.

This documentation focuses on modeling functional data models, Arches Resource Models, that rely on conceptual frameworks.

---
name: data_model_types

# Types of Data Models

In addition, there are different types of data models, often corresponding to the type of database or system.

For example, a relational data model, which corresponds to relational databases, is based on a table structure. A graph data model, which often corresponds to graph databases, focuses on the data itself and the relationships between data.

Arches Resource Models are graph data models, so it is important to understand [.orange[structured data]](#structured_data) and how it forms the basis for graph data models.

---
name: structured_data

# Structured data

Structured data is data that is organized and formatted in a way that is machine-readable, or in other words, usable by computer applications for processing, analysis and other functions.

In relational databases, the structure of the data is created by tables. However, in order to increase portability, interoperability and longevity, data should be structured to be self-describing and independent of any particular software application.
To understand how structured data can achieve this, it is important to understand some key elements of structured data.

---
name: structured_data_elements

# Key Elements of Structured Data

The following elements help to form the foundation for data that is meaningful and useful:

1. [.orange[Clean Data]](#clean_data)
2. [.orange[Controlled Vocabularies]](#controlled_vocabularies)
3. [.orange[Universal Unique Identifiers]](#UUID)
4. [.orange[Triples]](#triples)

Structured data consists of clean and consistent data, with terminology that is defined by established controlled vocabularies. All database types can benefit from data that have these two elements.

Structured data also has elements that are essential in graph databases: the ability for each instance of data to be uniquely identified, preferably through a universal unique identifier, and the ability to create meaningful relationships between data instances through the use of triples.

---
name: clean_data

# Clean Data

Clean data is consistent in terminology, formatting, and structure through the entirety of the table or database.

For example:

* consistent date formatting, e.g. 2019-10-02 instead of 10/2/19
* preferred terminology, e.g. United States of America instead of USA

Organizational standards help determine preferred format and structure.

[OpenRefine](https://www.openrefine.org) is a great, open source tool to help clean data.

---
name: controlled_vocabularies

# Controlled Vocabularies

Controlled Vocabularies are the set of standards chosen for preferred terminology used within a database. They help create consistency when data can be incredibly messy, with misspellings, homonyms, and cultural/national differences. Preferred vocabularies are established and stored in thesauri that should be shared for enhanced data interoperability.
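
As a rough illustration of how a controlled vocabulary creates consistency, the sketch below maps variant spellings onto one preferred term. The `THESAURUS` contents and the `to_preferred` helper are hypothetical and written only for this example; they are not part of Arches or of any published thesaurus.

```python
# Hypothetical mini-thesaurus: each preferred term lists the variants it replaces.
THESAURUS = {
    "United States of America": ["USA", "U.S.A.", "United States"],
    "United Kingdom": ["UK", "U.K."],
}

# Build a case-insensitive lookup from every label to its preferred term.
LOOKUP = {}
for preferred, variants in THESAURUS.items():
    for label in [preferred, *variants]:
        LOOKUP[label.casefold()] = preferred


def to_preferred(term):
    """Return the preferred term, or None when the term is not in the vocabulary."""
    return LOOKUP.get(term.strip().casefold())


print(to_preferred(" usa "))
print(to_preferred("Atlantis"))
```

Returning `None` for unknown terms, rather than passing them through, makes it easy to flag values that still need review before they are loaded.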
Here is a [primer to controlled vocabularies](https://www.archesproject.org/wp-content/uploads/2020/01/Controlled-Vocabularies_Jan2020.pdf) posted in the [Implementation Considerations](https://www.archesproject.org/implementation-considerations/) for the Arches Project.

Some examples:

* [The Getty Art and Architecture Thesaurus (AAT)](https://www.getty.edu/research/tools/vocabularies/aat/)
* [Library of Congress Subject Headings (LCSH)](http://id.loc.gov/authorities/subjects.html)

The Arches Project manages its Controlled Vocabularies and Thesauri through the Reference Data Manager (RDM). For a more detailed guide on the RDM, see the [Reference Data Manager](#RDM) slide.

---
name: controlled_vocabularies2

# Controlled Vocabularies Links and Tutorials

* [Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and other Cultural Works (Online Edition) by Patricia Harpring](https://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/)
* [Controlled Vocabulary and Thesaurus Design from the Library of Congress Cataloger's Learning Workshop](https://www.loc.gov/catworkshop/courses/thesaurus/pdf/cont-vocab-thes-trnee-manual.pdf)
* [List of taxonomies and controlled vocabularies: University of Pittsburgh Libraries Subject Guide](https://pitt.libguides.com/metadatadiscovery/controlledvocabularies)
* [Discover Digital Libraries: Theory and Practice- Chapter 5: Metadata](https://www.sciencedirect.com/science/article/pii/B9780124171121000053) (PAYWALL)
* [Simple Knowledge Organization System (SKOS) Wikipedia entry](https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System#Overview)
* [W3C SKOS Primer](https://www.w3.org/TR/skos-primer/)

---
name: UUID

# Universal Unique Identifiers

Universal Unique Identifiers (UUIDs) are associated with any given entity within a database structure. Each entity has a uniquely coded identifier that establishes exactly what it is.
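
A minimal sketch of working with such identifiers, using Python's standard `uuid` module. The first identifier is the Arches example quoted on this slide; generating a new one with `uuid4()` is an illustrative assumption, not a statement about how Arches itself creates its UUIDs.

```python
import uuid

# Example identifier quoted on this slide, parsed from its text form.
example = uuid.UUID("662b53c0-2e26-4b87-a6d0-109b7f611e05")
print(example.version)         # version number encoded inside the identifier
print(example.int < 2 ** 128)  # the underlying value fits in 128 bits

# uuid4() builds a new identifier from random bits (assumption: used here
# for illustration only).
new_id = uuid.uuid4()
print(new_id)

# The text form converts back to an equal identifier, so it can be stored
# in a spreadsheet or database column as a plain string.
assert uuid.UUID(str(new_id)) == new_id
```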
A UUID is a 128-bit number that distinguishes an entity from any other, including any found online. Accidental duplication is extremely unlikely, as a 128-bit number has about 3.4 x 10^38 possible values (an extremely large number!).

Arches utilizes UUIDs, e.g. 662b53c0-2e26-4b87-a6d0-109b7f611e05

By contrast, a UID (Unique Identifier) or URI (Uniform Resource Identifier) may be unique within an organization, but not necessarily unique universally. For example, a university student identification number is a UID that will not be repeated within the context of that university. A Social Security number is also a UID because it is specific to one single person within one single context and cannot be repeated.

A UID can also link an entity to a controlled vocabulary. The AAT record number can replace the name of any entity because it links back to the original preferred authority record. For example, the AAT record for 'database' is http://vocab.getty.edu/aat/300028543, with the UID being 300028543.

Usage of a UID or UUID is important within a database or spreadsheet in order to facilitate sorting and filtering information, as well as to link back to a specific entity within the database system.

---
name: triples

# Triples

A triple is a data configuration in which a data entity is linked by a relationship property to another data entity.

For example, .red[a person (entity)] .blue[is identified by (property)] .darkgreen[a name (entity)]

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide7.jpeg)

---
name: structured_data_in_model

# Structured data in a data model

A graph data model essentially creates a network of interconnected triples.

This covers the structure of a graph data model. The next section deals with how to overlay the data model with semantic meaning.

---
class: center, middle
name: basic_concepts2

# What are the basic .orange[concepts] you should know for modeling in Arches?

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Concepts_progress2_text.png)

---
name: semantic_data_model

# What is a semantic data model?

When previously describing this triple:

.red[a person (entity)] .blue[is identified by (property)] .darkgreen[a name (entity)]

both entities and the property are understandable to humans who understand English because they are written in a natural language. However, a computer only sees strings of letters, so both entities and the property must be given additional machine-readable meaning, so that a computer can parse and interpret the data.

A semantic data model is a graph data model in which each entity and property is associated and encoded with semantic metadata that describes what it refers to, in a way that conforms to standard computer formats.

Further, to ensure that data can be more portable, interoperable, and reusable over the long term, it is essential that both the conceptual framework and the format of the semantic metadata be based on recognized standards.

---
name: semantic_standards

# Semantic Metadata Standards

There are two types of standards that interact when it comes to semantic metadata:

* Standards that define the data format (e.g. RDF/XML)
* Standards that define the conceptual data framework or ontology (e.g. CIDOC CRM)

---
name: semantic_formats

# Semantic Formats: RDF

RDF (Resource Description Framework): a standard model for expressing data as statements about resources.

RDF is a set of standards specified by the [World Wide Web Consortium (W3C)](https://www.w3.org/2001/sw/wiki/RDF).

An RDF description is built from a series of expressions about a resource, each formatted as a [semantic triple](https://www.w3.org/TR/2014/REC-n-triples-20140225/Overview.html). This semantic triple is structured: subject-predicate-object

RDF defines a model and a set of elements, expressed through a dedicated syntax, to encode information in a machine-readable format.
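
To make the subject-predicate-object structure concrete, here is a small sketch that writes triples as N-Triples lines using plain Python strings. The `http://example.org/` URIs, the property names, and the `n_triple` helper are all hypothetical placeholders chosen for this example, and the literal escaping is simplified; a real project would more likely rely on an RDF library and the URIs of its chosen ontology.

```python
# Hypothetical namespace; a real dataset would use its own published URIs.
EX = "http://example.org/"


def n_triple(subject, predicate, obj, literal=False):
    """Format one subject-predicate-object statement as an N-Triples line."""
    if literal:
        # Simplified escaping: only backslashes and double quotes are handled.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = f'"{escaped}"'
    else:
        obj_text = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_text} ."


statements = [
    # a person (entity) is identified by (property) a name (entity)
    n_triple(EX + "person/1", EX + "is_identified_by", EX + "name/1"),
    # the name entity carries its text as a literal value
    n_triple(EX + "name/1", EX + "has_content", "Mark Twain", literal=True),
]

for line in statements:
    print(line)
```

Because the object of the first statement is the subject of the second, the two lines already form a tiny graph.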
.footnote[Further reading:
* [RDFa 1.1 Primer- W3C](https://www.w3.org/TR/2015/NOTE-rdfa-primer-20150317/)]

---
name: semantic_formats_2

# Other Semantic Formats:

* [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/): The syntax used to express RDF graphs in XML.
* [OWL (Web Ontology Language)](https://www.w3.org/OWL/): A semantic web language for describing things and the relationships between them. Along with RDF and SPARQL, OWL is part of the W3C Semantic Web technology stack.
* [SPARQL (SPARQL Protocol and RDF Query Language)](https://www.w3.org/2001/sw/wiki/SPARQL): a query language used to retrieve and navigate information stored within an RDF graph. Its results can be returned as tabular data, making the complex web of interlinked data more easily accessible.
* [SKOS (Simple Knowledge Organization System)](https://www.w3.org/TR/skos-primer/): a simple way to express structure and content within a concept scheme. SKOS is built upon RDF for simple publication of data as Linked Data.

.footnote[Further reading:
* [RDF/XML- w3schools](https://www.w3schools.com/xml/xml_rdf.asp)
* [SPARQL Tutorial- Programming Historian](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL)
* [SKOS Reference- W3C](https://www.w3.org/TR/skos-reference/)]

---
name: semantic_ontologies

# Semantic Ontologies

An ontology is a formal organization of a data structure or knowledge graph based on standards, establishing relationships between entities and their properties.

An ontology is a way to structure semantic understanding shared by members of a specific domain.

In the cultural heritage domain, which is the main focus of the Arches Resource Model Working Group, there are prevalent existing ontologies, developed by domain experts, that institutions widely use to model and integrate their own data resources according to established structure and guidelines.

Examples:

* [CIDOC-CRM](http://www.cidoc-crm.org/) - The ontology that Arches comes preloaded with.
* [Linked.Art](https://linked.art/) - The ontology, based primarily on the CIDOC-CRM, that the Arches Resource Model Working Group uses.

Links:

* [Ontologies and the Semantic Web (2003)](https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/bult.283)

---
name: semantic_web

# Semantic Web

The Semantic Web is the vision to make the world wide web machine-readable, such that all online data and resources are linked by a Uniform Resource Identifier (URI). This promotes navigation between resources and improves automation for information retrieval online.

The goals of the Semantic Web are to:

* Structure meaningful content on web pages to promote machine automation in accessing and processing data
* Encode the semantics (meaning) of the data through designated frameworks (RDF and OWL)
* Promote Natural Language Processing and Semantic Search to support enhanced navigation and searchability on the web and from a variety of different sources.

.footnote[Further reading:
* ["The Semantic Web": First published article in 2001 by Tim Berners-Lee, et al](https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20American_%20Feature%20Article_%20The%20Semantic%20Web_%20May%202001.pdf)]

---
class: center, middle
name: basic_concepts3

# What are the basic .orange[concepts] you should know for modeling in Arches?

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Concepts_progress3_text.png)

---
name: arches_resource_model

# What is an Arches Resource Model?

An Arches Resource Model is a semantic graph data model formatted for use with the Arches Platform.

Arches Resource Models also include the information and formatting for the data entry interface (i.e. forms) to input the data and the report to display the data for each Resource Model.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide1.jpeg)

---
name: arches_resource_model2

For example, if your Arches instance records information on Buildings, People, and Activities, you would typically create a Resource Model for Buildings, a Resource Model for People and a Resource Model for Activities. This would result in a data entry interface and report template for Buildings, a data entry interface and report template for People, and a data entry interface and report template for Activities.

All of this is encoded in a JSON file that can be exported from one Arches implementation and imported into another. In other words, the same Arches Resource Model or set of Arches Resource Models can be used by different Arches implementations.

The Arches Resource Model Working Group focuses primarily on creating guidance on the semantic graph model portion of the Arches Resource Model and on providing sample Arches Resource Models and Branches that can be used and modified by various Arches implementers.

---
name: resourcemodels_branches

# Resource Models and Branches

(Arches) Resource Models consist of discrete Branches that function as smaller graph models within the larger whole.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide2.jpeg)

For example, a Resource Model for Person might be composed of a Name branch that contains all of the data related to a person's name and a Description branch that contains all of the data relating to a descriptive statement about a person.

---
name: resourcemodels_branches2

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide3.jpeg)

A Branch can be used by many different Resource Models. For example, basic Name and Description branches can be used by the respective Resource Models for People, Buildings, and Activities.

And similar to Resource Models, Branch information is encoded in a JSON file that can be exported from one Arches implementation and imported into another.

---
name: branch_anatomy

# Anatomy of an Arches Branch

An Arches Branch typically groups data together that is thematically related. In the example below, this simplified Name Branch consists of the triple formed by "Name"(entity) "has type"(property) "Name Type"(entity).

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide4.jpeg)

In Arches, the entities are represented by Nodes and the properties are the Relationships. "Name" and "Name Type" are Nodes and "has type" is the Relationship between the Nodes.

---
name: branch_anatomy2

In using the CIDOC CRM ontology, each Node is assigned a CRM Class and each Relationship is assigned a CRM Property. The semantic statement formed by the branch then reads as: "Name" (as defined by CRM Class E41 Appellation) "has type" (as defined by CRM Property P2 Has Type) "Name Type" (as defined by CRM Class E55 Type).

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide5.jpeg)

This defines the structure for a very simplified version of a Name branch. It is possible that more types of information would need to be represented in a Name branch, such as the language of Name or the source of a Name, and additional Nodes and Relationships can be added to accommodate such needs.

It is important to note that Arches can easily be configured to record as many instances (entries) of the data represented in the Name branch as needed.

---
name: branch_anatomy3

For example, below are two instances of the Name branch for the same Person.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide6.jpeg)

Mark Twain, the author of "The Adventures of Tom Sawyer", was given the name "Samuel Clemens" at birth. The example above shows how the Name branch structure can represent both names along with the type of name associated with each.

---
name: datatypes

# Data Types

Each Node in a Branch/Resource Model is assigned a data type that defines what kind of data can be entered for that Node and how data is entered. Here is a listing of some of the data types that can be assigned to a Node in Arches:

* **Semantic** - Nodes with a semantic data type carry no data, but serve to enforce the semantic structure and grouping of nodes.
* **String** - This data type supports the entry of alphanumeric text.
* **Number** - This data type supports the entry of numbers.
* **Date** - This data type supports the entry of date and time.
* **Geojson Feature Collection** - This data type supports the entry of geospatial data.
* **File** - This data type supports the upload of various file types and determines how they are visualized.
* **Resource-Instance** - This data type supports the connection of Resource Models to each other via a specific Node.
* **Concept** - This data type supports the selection of data from a controlled vocabulary managed by the Arches Reference Data Manager.
* **Domain** - This data type supports the selection of data from a list that is not managed by the Arches Reference Data Manager.
* **IIIF Annotation** - This data type supports the import and annotation of images served by a IIIF image server.
* **Extended Date Time Format** - This data type supports the input of date information according to the Library of Congress' Extended Date/Time Format (EDTF).

---
name: conceptcollections

# Concept Collections

The Concept data type connects a Node to a controlled vocabulary for data entry. In Arches, controlled vocabularies are called Concept Collections.
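
The idea can be sketched in a few lines of Python. The `NAME_TYPES` collection and the `NameInstance` class below are hypothetical stand-ins invented for this example; they do not reproduce the Arches JSON format or any Arches API. The sketch only shows how a Concept node restricts entry to terms from its collection, while a String node accepts free text.

```python
from dataclasses import dataclass

# Hypothetical Concept Collection backing the "Name Type" node.
NAME_TYPES = {"Birth Name", "Pen Name"}


@dataclass
class NameInstance:
    """One entry of a simplified Name branch."""

    name: str       # String data type: free alphanumeric text
    name_type: str  # Concept data type: must be a term from the collection

    def __post_init__(self):
        # Reject any value that is not a Concept in the collection.
        if self.name_type not in NAME_TYPES:
            raise ValueError(f"{self.name_type!r} is not in the Name Type collection")


# Two instances of the Name branch recorded for the same person.
names = [
    NameInstance("Samuel Clemens", "Birth Name"),
    NameInstance("Mark Twain", "Pen Name"),
]

for entry in names:
    print(f"{entry.name} ({entry.name_type})")
```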
In the Name branch example below, the node for Name Type would be assigned the Concept data type which would also necessitate the designation of a Concept Collection that contains the Concepts (or terms) for Name Type.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide6.jpeg)

The Concept Collection would include the Concepts of "Birth Name" and "Pen Name", as well as other types of names.

Concepts and Concept Collections are managed by Arches in the Reference Data Manager. As described in the section on Controlled Vocabularies, Concepts and Concept Collections have their own organizing principles and are an extension of the data model structure.

---
name: resource-instance

# Relationships between Resource Models

Through the resource-instance data type, relationships can be created between Resource Models.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Slide8.jpeg)

In the example above, the Person Resource Model has a Node for Building that the Person currently or formerly owned. This Node was assigned a resource-instance data type and links to the Resource Model for Building.

This allows instances of the Building Resource Model to be associated with many different instances of the Person Resource Model and other Resource Models.

---
name: arches_designer

# Arches Designer

Arches Designer is a tool that gives Arches administrators the ability to create Resource Models and in doing so, dynamically generate the Arches user interface and design the underlying data structure with no coding experience necessary.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/arches_designer.png)

.footnote[Further reading:
* [Arches documentation on how to use Arches Designer](https://arches.readthedocs.io/en/latest/designing-the-database/#arches-designer)]

---
name: resourcemodel_ontology

# Encoding the ontology

Most Arches packages use an ontology that is CIDOC CRM-based, and as a result, the Arches Designer generally has the classes and properties of the CIDOC CRM preloaded and restricts the selection of inappropriate class and property combinations.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/arches_designer2.png)

---
name: RDM

# Reference Data Manager

The Reference Data Manager is the Arches interface that gives Arches administrators the ability to manage the controlled vocabularies that power both concept search and data entry.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/RDM_Languages.jpg)

---

The RDM manages controlled vocabularies through broader thesaurus concepts.

[
](https://www.archesproject.org/wp-content/uploads/2020/08/RDM_Languages.jpg)

[
](https://www.archesproject.org/wp-content/uploads/2020/08/RDM_materials.jpg)

[
](https://www.archesproject.org/wp-content/uploads/2020/08/RDM_import.jpg)

Easily import authoritative thesauri, such as from the Getty Art and Architecture Thesaurus (AAT).

---
class: center, middle
name: basic_concepts4

# What are the basic .orange[concepts] you should know for modeling in Arches?

[
](https://www.archesproject.org/wp-content/uploads/2020/08/Concepts_progress4_text.png)

---
name: modeling_process

# The Modeling Process

To create one or more data models from scratch, one should generally start with a clear idea of the overall purpose and goals of the system that the data models will reside in. This is particularly important with Arches, since the Resource Models determine all interactions with the data: not only the data organization but also data entry and discovery.

Once the overall system requirements are defined, the modeling process can begin by assembling the data requirements.

---
name: data_requirements

# Data requirements

When gathering data requirements, here are some helpful questions to ask:

* Are there existing data that need to be incorporated in the system? In other words, are there any legacy data?
* Are there existing data standards that I have to comply with? Or will the system's data be interacting in some way with data housed in other systems?

Once you know what data you want to manage in your system and how it will or will not be interacting with data in other systems, you can start to create a conceptual model that encompasses all of your data.

---
name: data_content

# Data content standards

A data content standard defines the essential items of information that should form a dataset, specific to a use case.

An example of a data content standard is MIDAS Heritage. In addition to suggesting "the minimum level of information needed for recording heritage assets" in the United Kingdom, the MIDAS Heritage data standard also "covers the procedures involved in understanding, protecting and managing these assets."

CIDOC is now finalizing an international standard for the inventory of archaeological and architectural heritage known as the International Core Data Standard for Archaeological and Architectural Heritage, through the input of the ICOMOS International Documentation Committee (CIPA).
It is based on the earlier Core Data Index to Historic Buildings and Monuments of the Architectural Heritage adopted by the Council of Europe in 1992, and the Core Data Standard for Archaeological Sites and Monuments, which resulted from collaboration between CIDOC and the Council of Europe and was adopted in 1995.

An example of a data standard for metadata, regardless of domain, is the Dublin Core Metadata Initiative.

.footnote[Further reading:
* [MIDAS Heritage](https://historicengland.org.uk/images-books/publications/midas-heritage/)
* [Dublin Core Metadata Initiative](http://dublincore.org/specifications/dublin-core/)]

---
name: conceptual_model_building

# Creating a conceptual model

A conceptual model organizes all of the data in a system in order to make the relationships between data as explicit as possible.

Examples of creating simple conceptual models are forthcoming.

---
name: arm_building

# Building Arches Resource Models

Once you have a good conceptual framework for how you want your data to behave in Arches, here is some brief guidance on how to translate that into Arches Resource Models.

This covers the basic creation of Arches Resource Models. We will go into the Arches Resource Model Working Group's suggested modeling patterns in the next section.