Hi @ceiag,
Thanks for providing the graph view of your example from NCIt. I'll use that as my reference for what you are looking for when importing the Thesaurus.owl OWL ontology.
Your hunch that an incorrect graph configuration set prior to import is causing the issues you illustrated above is correct, but it isn't the only reason the naming in the NCIt graph view differs from what you are seeing in Neo4j. What you are seeing in Neo4j with the current configuration is actually the raw OWL ontology (apart from the URIs having been shortened by default and prefixed with an nsx__ prefix). This is a result of Neo4j adhering to the default value for the handleVocabUris parameter (see more here --> Configuring Neo4j to use RDF data).
The NCIt Graph View, on the other hand, is a transformed graph visualization that surfaces the rdfs:label of each association in this ontology rather than its URI or shortened URI (in Neo4j we are seeing the shortened URI). See the NCI Thesaurus documentation for how the metadata within this ontology translates to "human readable language" (Thesaurus.owl metadata documentation).
With that said, there is a way to perform this transformation within Neo4j! No worries! But first, let's take a quick look at your current graphConfig:
In your current graphConfig, handleMultival has been set to "ARRAY". This instructs Neo4j to import and store property values as arrays. If you don't also provide a list of property URIs as multivalPropList (within the graphConfig), all properties will be stored as arrays, including properties where that makes no sense (for example, single-value properties). So if handleMultival needs to be set to "ARRAY", you should also specify multivalPropList within the graphConfig. This isn't the reason you are seeing ns2__A31 rather than Has_GDC_Value, but it does store every node property value as an array when not all of them should be.
Easy Initial Solution to Node Properties as Array Problem: Change your graphConfig to either omit handleMultival entirely (if the ontology doesn't contain any multi-valued properties, or you don't care about the ones that are) OR specify the exact multi-valued properties that should be stored as arrays via multivalPropList within your graphConfig. Take a look below:
graphConfig:
CALL n10s.graphconfig.init( { handleRDFTypes: "LABELS_AND_NODES" } );
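If you do need some properties stored as arrays, the second option could look roughly like the sketch below. The choice of P90 as the multi-valued property is only an assumption for illustration; substitute the URIs of whichever properties are genuinely multi-valued in your data.

```cypher
// Sketch only: keep arrays for an explicit list of properties.
// The P90 URI is an illustrative assumption, not a recommendation.
CALL n10s.graphconfig.init({
  handleRDFTypes: "LABELS_AND_NODES",
  handleMultival: "ARRAY",
  multivalPropList: ["http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P90"]
});
```

Properties not named in multivalPropList should then be stored as single values. As with any graphConfig change, this needs to be in place before the import runs.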
Cypher Statement to Review:
MATCH (alpelisib)-[r:ns2__A32]->(pharmSub)
WHERE alpelisib.ns2__NHC0 = "C94214"
RETURN alpelisib, r, pharmSub;
Result Vis:
Note that this is 100% correct based on the OWL file:
<!-- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31 -->
<owl:AnnotationProperty rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31">
<NHC0>A31</NHC0>
<P106>Conceptual Entity</P106>
<P108>Has_GDC_Value</P108>
<P90>Has_GDC_Value</P90>
<P97>An association that connects a concept representing a GDC property to its dedicated permissible value concept(s).</P97>
<rdfs:label>Has_GDC_Value</rdfs:label>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#anyURI"/>
</owl:AnnotationProperty>
We can see that [http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31](http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31) has been shortened by Neo4j to ns2__A31 (as expected, since handleVocabUris within the graphConfig has defaulted to "SHORTEN").
As we can see in the OWL snippet, the AnnotationProperty has an rdfs:label value of Has_GDC_Value, but upon import using this graphConfig, Neo4j simply shortens the URI of the predicate. If you'd like to change what these relationship types are called (it sounds like you do, since you want to mirror the NCIt graph view), refer to Mapping Graph Models - Neosemantics (4.3). This will walk you through how to set the proper graph configuration so that you can use other neosemantics (n10s) procedures to add namespace prefix definitions and create mappings from individual elements in the ontology to the names used in the NCIt graph view.
To help you get going, I have provided the required steps below.
(Please note: You'll have to add each relationshipType as a distinct mapping using n10s.mapping.add()).
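Since writing every mapping by hand is tedious, one possible shortcut is to let the graph you have already imported (with shortened URIs) generate the statements for you. The query below is only a sketch under a few assumptions: that association URIs all start with Thesaurus.owl#A, that each association node carries its label in the rdfs__label property, and that the import was done without handleMultival set to "ARRAY" (so the label is a plain string).

```cypher
// Sketch: build one n10s.mapping.add(...) statement per association
// from an existing import that used the default "SHORTEN" setting.
// Assumes association URIs start with '...Thesaurus.owl#A' and that
// rdfs__label holds a single string value.
MATCH (p:Resource)
WHERE p.uri STARTS WITH 'http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A'
  AND p.rdfs__label IS NOT NULL
RETURN 'CALL n10s.mapping.add("' + p.uri + '", "' + p.rdfs__label + '");' AS statement
ORDER BY p.uri;
```

The returned strings can then be reviewed and pasted into the setup script in place of the hand-written mappings.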
Solution To Get You Started:
// Create Uniqueness Constraint
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
// Create GraphConfig --> handleVocabUris must be set to "MAP" so the mappings defined below are applied and Neo4j can mirror the NCIt Graph View
CALL n10s.graphconfig.init( {
handleVocabUris: "MAP"
});
// Create Prefix Definitions (using addFromText procedure from n10s)
CALL n10s.nsprefixes.addFromText('
<rdf:RDF xmlns="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
xml:base="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
xmlns:Thesaurus="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
');
// Create Mapping from http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31 to Has_GDC_Value
CALL n10s.mapping.add("http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31", "Has_GDC_Value");
// Create Mapping from http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A32 to Is_Value_For_GDC_Property
CALL n10s.mapping.add("http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A32", "Is_Value_For_GDC_Property");
// Add all mappings...
// Lastly... Import Thesaurus.owl
CALL n10s.rdf.import.fetch('file:///var/lib/neo4j/import/Thesaurus.owl', 'RDF/XML');
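To double-check the setup (ideally before kicking off a long import), the registered configuration, prefixes, and mappings can be inspected. This is a small sketch using the n10s listing procedures; the exact columns returned may differ between neosemantics versions.

```cypher
// Show the active graph configuration (handleVocabUris should be "MAP")
CALL n10s.graphconfig.show();

// List the namespace prefixes that were registered
CALL n10s.nsprefixes.list();

// List the mappings that were added with n10s.mapping.add()
CALL n10s.mapping.list();
```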
Now we can query the graph & see the transformation:
MATCH (x)-[r:Is_Value_For_GDC_Property]->(y)
WHERE x.NHC0 = 'C94214'
RETURN x, r, y;
Desired Result!:
I hope this is of help to you! Feel free to ping back if you need more help!
Best,
Rob