
Importing ncithesaurus ontology into neo4j

ceiag

Hi Neo4j community,

I'm looking for some general advice on the most efficient way to convert/import an ontology file (Thesaurus_22.06d.OWL, https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Thesaurus_22.06d.OWL.zip) into a Neo4j knowledge graph.

I have explored using the Neosemantics (n10s) plugin, and although I've had some success, I'm not convinced it's importing the data in its entirety. I used the following commands:

CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleMultival: "ARRAY" });

CALL n10s.onto.import.fetch("file:///home/xxxxx/.config/Neo4j Desktop/Application/relate-data/dbmss/dbms-785d18d6-677e-4238-b22b-a94227bc4930/import/Thesaurus.owl","RDF/XML");
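For context on the `handleMultival: "ARRAY"` setting above: as I understand the n10s docs, the default (`OVERWRITE`) keeps only the last value read for a given property on a node, while `ARRAY` collects the values into a list. Below is a simplified pure-Python model of that difference, not n10s code; the function `apply_literals` and the `ex:` triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, literal) triples; two values share one predicate.
TRIPLES = [
    ("ex:C1", "ex:synonym", "tumour"),
    ("ex:C1", "ex:synonym", "neoplasm"),
    ("ex:C1", "ex:label", "Neoplasm"),
]


def apply_literals(triples, handle_multival="OVERWRITE"):
    """Simplified model of storing literal triples as node properties.

    OVERWRITE: a later value for the same (node, property) replaces the earlier one.
    ARRAY: every value is appended to a list for that property.
    """
    nodes = defaultdict(dict)
    for subject, predicate, value in triples:
        props = nodes[subject]
        if handle_multival == "ARRAY":
            props.setdefault(predicate, []).append(value)
        else:
            props[predicate] = value
    return dict(nodes)


if __name__ == "__main__":
    print(apply_literals(TRIPLES, "OVERWRITE"))
    print(apply_literals(TRIPLES, "ARRAY"))
```

In this model, switching the mode changes how values are stored on a node, not how many triples are read.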


The result is below. The difference between triplesLoaded and triplesParsed is somewhat disconcerting, and as far as I can tell not all relationships are showing up.

I'm not sure whether it's related to the initial graph config; I have tried various variations to no avail.

╒═══════════════════╤═══════════════╤═══════════════╤════════════╤═══════════╤════════════╕
│"terminationStatus"│"triplesLoaded"│"triplesParsed"│"namespaces"│"extraInfo"│"callParams"│
╞═══════════════════╪═══════════════╪═══════════════╪════════════╪═══════════╪════════════╡
│"OK"               │921897         │8734522        │null        │""         │{}          │
└───────────────────┴───────────────┴───────────────┴────────────┴───────────┴────────────┘
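To put the gap in numbers, here is a small Python helper (the name `load_fraction` is hypothetical) applied to the summary row above. It shows that only about 10.6% of the parsed triples were loaded.

```python
# Summary row reported above by the n10s import call.
summary = {"terminationStatus": "OK", "triplesLoaded": 921897, "triplesParsed": 8734522}


def load_fraction(row):
    """Return the fraction of parsed triples that were actually loaded."""
    parsed = row["triplesParsed"]
    return row["triplesLoaded"] / parsed if parsed else 0.0


fraction = load_fraction(summary)
not_loaded = summary["triplesParsed"] - summary["triplesLoaded"]
print(f"{fraction:.1%} loaded, {not_loaded} triples not loaded")
```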


Any advice would be greatly appreciated. 

Using Neo4j Desktop 1.4.15

Thanks

Chris

Accepted solution

Rcolinp

Hey there @ceiag - maybe I can be of some help,

Right off the bat: you are correct that the import is not loading the whole ontology or preserving all of its triples.

That said, there are a few things you can do here. Take a look at the n10s documentation on importing ontologies; it notes that only the following six kinds of definitions are taken into account on import:

  1. Named class (category) declarations with both rdfs:Class and owl:Class.

  2. Explicit class hierarchies defined with rdf:subClassOf statements.

  3. Property definitions with owl:ObjectProperty, owl:DatatypeProperty and rdfs:Property

  4. Explicit property hierarchies defined with rdfs:subPropertyOf statements.

  5. Domain and range information for properties described as rdfs:domain and rdfs:range statements.

  6. Restrictions defined with owl:Restriction.

(I believe the rdf:subClassOf in item 2 is a typo and means rdfs:subClassOf.)
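To see why so many triples were skipped, here is a rough pure-Python model of the criteria listed above, applied to a few hypothetical NCIt-style triples. It is not n10s's implementation (which may treat other details differently) and it leaves item 6 (owl:Restriction) out; the concept codes `C1`/`C2` and the annotation property `P1` are made up for illustration. Annotation triples like the last one, which make up much of a thesaurus, match none of the criteria.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
NCIT = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"

# Items 1 and 3: rdf:type objects that declare classes or properties.
DECLARATION_TYPES = {
    OWL + "Class",
    RDFS + "Class",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
    RDF + "Property",  # listed as rdfs:Property above; the RDF vocabulary term is rdf:Property
}

# Items 2, 4 and 5: hierarchy and domain/range predicates.
STRUCTURAL_PREDICATES = {
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    RDFS + "domain",
    RDFS + "range",
}


def is_ontology_triple(subject, predicate, obj):
    """Rough check against the listed criteria (item 6, owl:Restriction, is not modelled)."""
    if predicate == RDF + "type":
        return obj in DECLARATION_TYPES
    return predicate in STRUCTURAL_PREDICATES


def partition(triples):
    """Split triples into (matching the criteria, skipped)."""
    kept, skipped = [], []
    for triple in triples:
        (kept if is_ontology_triple(*triple) else skipped).append(triple)
    return kept, skipped


# Hypothetical sample: C1, C2 and P1 are made-up identifiers.
SAMPLE = [
    (NCIT + "C1", RDF + "type", OWL + "Class"),
    (NCIT + "C1", RDFS + "subClassOf", NCIT + "C2"),
    (NCIT + "P1", RDFS + "domain", NCIT + "C1"),
    (NCIT + "C1", NCIT + "P1", "some annotation value"),
]

kept, skipped = partition(SAMPLE)
print(len(kept), "kept,", len(skipped), "skipped")
```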

A possible solution (or at least a helpful alternative):

Rather than using:

n10s.onto.import.fetch(url :: STRING?, format :: STRING?)

try using:

n10s.rdf.import.fetch(url :: STRING?, format :: STRING?)

This imports the OWL file as generic RDF instead of extracting only the ontology elements listed above, so every parsed triple is loaded into the graph.

Hope this helps! Feel free to loop back for more help.

I ran an example import using your configuration, but with n10s.rdf.import.fetch(), and here is the output:

terminationStatus: "OK" | triplesLoaded: 8734522 | triplesParsed: 8734522 | namespaces:
{
  "owl": "http://www.w3.org/2002/07/owl#",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "ns0": "http://purl.org/dc/elements/1.1/",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "ns2": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
  "ns1": "http://protege.stanford.edu/plugins/owl/protege#",
  "ns3": "http://www.geneontology.org/formats/oboInOwl#"
}

We can now see that triplesLoaded is equal to triplesParsed 🙂
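When exploring the imported data, it can help to translate between full URIs and the prefixed names n10s creates. The sketch below assumes n10s's default `handleVocabUris: "SHORTEN"` behaviour, where a URI becomes `prefix__localName` using the prefixes in the namespaces map above (e.g. `rdfs__subClassOf`); the helper names `expand`/`shorten` and the concept code `C0000` are hypothetical.

```python
# Namespace map returned by the rdf.import run above.
NAMESPACES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ns0": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ns2": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
    "ns1": "http://protege.stanford.edu/plugins/owl/protege#",
    "ns3": "http://www.geneontology.org/formats/oboInOwl#",
}


def expand(short_name, namespaces):
    """Turn an assumed 'prefix__localName' name back into a full URI."""
    prefix, sep, local = short_name.partition("__")
    if not sep or prefix not in namespaces:
        raise ValueError(f"not a shortened name: {short_name!r}")
    return namespaces[prefix] + local


def shorten(uri, namespaces):
    """Turn a full URI into 'prefix__localName', using the longest matching namespace."""
    matches = [(ns, prefix) for prefix, ns in namespaces.items() if uri.startswith(ns)]
    if not matches:
        raise ValueError(f"no known namespace for {uri!r}")
    ns, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}__{uri[len(ns):]}"


print(expand("rdfs__subClassOf", NAMESPACES))
print(shorten(NAMESPACES["ns2"] + "C0000", NAMESPACES))  # C0000 is a hypothetical code
```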

Best Regards,
Rob


Hi Rob,

Thanks for the reply, I really appreciate it, and yep, that appears to have done the trick!

I'm going to take a closer look at the data. I may have some further questions down the line, but for now, thanks so much.

Regards

 

Chris
