Importing ncithesaurus ontology into neo4j

ceiag
Node

Hi Neo4j community,

Looking for some general advice on the most efficient way of converting/importing an ontology file (Thesaurus_22.06d.OWL (https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Thesaurus_22.06d.OWL.zip)) into a neo4j knowledge graph. 

I have explored using the Neosemantics (n10s) plugin, and although I have had some success, I'm not convinced it is importing the data in its entirety. I used the following commands:

CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleMultival: "ARRAY" });

CALL n10s.onto.preview.fetch("file:///home/xxxxx/.config/Neo4j Desktop/Application/relate-data/dbmss/dbms-785d18d6-677e-4238-b22b-a94227bc4930/import/Thesaurus.owl","RDF/XML");


The result is shown below. The difference between triplesLoaded and triplesParsed is somewhat disconcerting, and as far as I can tell not all relationships are being displayed.

I'm not sure whether this is related to the initial graph config; I have tried several variations to no avail.

╒═══════════════════╤═══════════════╤═══════════════╤════════════╤═══════════╤════════════╕
│"terminationStatus"│"triplesLoaded"│"triplesParsed"│"namespaces"│"extraInfo"│"callParams"│
╞═══════════════════╪═══════════════╪═══════════════╪════════════╪═══════════╪════════════╡
│"OK"               │921897         │8734522        │null        │""         │{}          │
└───────────────────┴───────────────┴───────────────┴────────────┴───────────┴────────────┘


Any advice would be greatly appreciated. 

Using Neo4j Desktop 1.4.15

Thanks

Chris

1 ACCEPTED SOLUTION

Rcolinp
Ninja

Hey there @ceiag - maybe I can be of some help,

Right off the bat - you are correct that the import is not importing the ontology entirely or preserving all triples. 

That said:
There are a few things you can do here. Take a look at the n10s documentation for importing ontologies. It notes that only the following 6 kinds of statements are taken into account on import:

  1. Named class (category) declarations with both rdfs:Class and owl:Class.

  2. Explicit class hierarchies defined with rdf:subClassOf statements.

  3. Property definitions with owl:ObjectProperty, owl:DatatypeProperty and rdfs:Property

  4. Explicit property hierarchies defined with rdfs:subPropertyOf statements.

  5. Domain and range information for properties described as rdfs:domain and rdfs:range statements.

  6. Restrictions defined with owl:Restriction.

(I believe rdf:subClassOf in item 2 is a typo for rdfs:subClassOf.)

A Solution (possibly helpful alternative):

Rather than using:

n10s.onto.import.fetch(url :: STRING?, format :: STRING?)

Try using:

n10s.rdf.import.fetch(url :: STRING?, format :: STRING?)

This will import the ontology and preserve all triples. 
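
For reference, here is a minimal sketch of the full sequence, reusing the constraint and graph config from the question and changing only the import procedure. The file path is a placeholder (an assumption, not the path from the original post), and the constraint uses the Neo4j 4.x syntax shown in the question:

```cypher
// Uniqueness constraint on Resource.uri, as in the question (Neo4j 4.x syntax)
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;

// Same graph config as in the question: multivalued properties are stored as arrays
CALL n10s.graphconfig.init({ handleMultival: "ARRAY" });

// Full RDF import instead of the ontology-only import.
// Placeholder path: point it at the .owl file in your DBMS import folder.
CALL n10s.rdf.import.fetch("file:///path/to/import/Thesaurus.owl", "RDF/XML");
```

If the constraint and graph config already exist from the earlier attempt, running only the last statement is likely enough.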

Hope this helps! Feel free to loop back for more help.

I ran an example import using your configuration, but with n10s.rdf.import.fetch(). Here is the output:

terminationStatus: "OK" | triplesLoaded: 8734522 | triplesParsed: 8734522
{
  "owl": "http://www.w3.org/2002/07/owl#",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "ns0": "http://purl.org/dc/elements/1.1/",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "ns2": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
  "ns1": "http://protege.stanford.edu/plugins/owl/protege#",
  "ns3": "http://www.geneontology.org/formats/oboInOwl#"
}

We can now see that triplesLoaded is equal to triplesParsed 🙂
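
To double-check the result inside the graph rather than relying on the counters alone, queries along these lines may help. This is a rough sketch; it assumes the default n10s behaviour of labelling every imported node as Resource (the label the constraint in the question is defined on):

```cypher
// Count the nodes created by the import
MATCH (r:Resource)
RETURN count(r) AS resources;

// List the most common relationship types to see which predicates became relationships
MATCH ()-[rel]->()
RETURN type(rel) AS relType, count(*) AS occurrences
ORDER BY occurrences DESC
LIMIT 20;
```

Note that triples with literal values are stored as node properties rather than relationships, so the number of relationships will be lower than the number of triples.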

Best Regards,
Rob

Hi Rob,

Thanks for the reply, I really appreciate it, and yep, that appears to have done the trick!

I'm going to take a closer look at the data. I may have some further questions down the line, but for now thanks so much.

Regards

Chris