Importing ncithesaurus ontology into neo4j

Hi Neo4j community,

Looking for some general advice on the most efficient way of converting/importing an ontology file (Thesaurus_22.06d.OWL (https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Thesaurus_22.06d.OWL.zip)) into a neo4j knowledge graph.

I have explored using the Neosemantics (n10s) plugin, and although I have had some success, I'm not convinced it's importing the data in its entirety. I have used the following commands:

CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleMultival: "ARRAY" });

CALL n10s.onto.import.fetch("file:///home/xxxxx/.config/Neo4j Desktop/Application/relate-data/dbmss/dbms-785d18d6-677e-4238-b22b-a94227bc4930/import/Thesaurus.owl","RDF/XML");

The result is below. The gap between triplesLoaded and triplesParsed is somewhat disconcerting, and as far as I can tell not all relationships are showing up.

I'm not sure whether it's related to the initial graph config; I have tried various variations of it to no avail.

╒═══════════════════╤═══════════════╤═══════════════╤════════════╤═══════════╤════════════╕
│"terminationStatus"│"triplesLoaded"│"triplesParsed"│"namespaces"│"extraInfo"│"callParams"│
╞═══════════════════╪═══════════════╪═══════════════╪════════════╪═══════════╪════════════╡
│"OK"               │921897         │8734522        │null        │""         │{}          │
└───────────────────┴───────────────┴───────────────┴────────────┴───────────┴────────────┘

Any advice would be greatly appreciated.

Using Neo4j Desktop 1.4.15

Thanks

Chris

Hey there @ceiag - maybe I can be of some help,

Right off the bat: you are correct that the import is not loading the entire ontology or preserving all triples.

That said:
There are a few things you can do here. Take a look at the n10s documentation on importing ontologies: it notes that only the following 6 kinds of statements are taken into account on import.

  1. Named class (category) declarations with both rdfs:Class and owl:Class.

  2. Explicit class hierarchies defined with rdf:subClassOf statements.

  3. Property definitions with owl:ObjectProperty, owl:DatatypeProperty and rdfs:Property.

  4. Explicit property hierarchies defined with rdfs:subPropertyOf statements.

  5. Domain and range information for properties described as rdfs:domain and rdfs:range statements.

  6. Restrictions defined with owl:Restriction.
(I believe the rdf:subClassOf in item 2 is a typo for rdfs:subClassOf.)
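To see why triplesLoaded can be so much lower than triplesParsed, here is a rough Python sketch of an ontology-only filter. It is a simplification written for illustration, not the n10s implementation: it keeps rdf:type statements naming one of the schema types above plus the four schema predicates, and ignores the internals of restrictions (owl:onProperty and so on). The example.org IRIs are made up. Judging by the counts above, most triples in this file fall outside these criteria, and a filter like this simply skips them.

```python
# Rough, hypothetical model of an "ontology-only" import: keep schema-level
# triples matching the six criteria above, drop everything else (annotations,
# instance data). This is an illustration, NOT the n10s implementation.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# rdf:type objects covered by criteria 1, 3 and 6.
# rdf:Property is the RDF Schema term the docs' "rdfs:Property" presumably means.
SCHEMA_TYPES = {
    RDFS + "Class",
    OWL + "Class",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
    RDF + "Property",
    OWL + "Restriction",
}

# Predicates covered by criteria 2, 4 and 5.
SCHEMA_PREDICATES = {
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    RDFS + "domain",
    RDFS + "range",
}


def kept_by_onto_filter(triple):
    """True if this simplified filter would keep the (subject, predicate, object) triple."""
    _, predicate, obj = triple
    if predicate == RDF + "type":
        return obj in SCHEMA_TYPES
    return predicate in SCHEMA_PREDICATES


def split_triples(triples):
    """Partition triples into (kept, dropped) lists."""
    kept, dropped = [], []
    for t in triples:
        (kept if kept_by_onto_filter(t) else dropped).append(t)
    return kept, dropped


# Usage with made-up example.org IRIs.
EX = "http://example.org/"
sample = [
    (EX + "Animal", RDF + "type", OWL + "Class"),
    (EX + "Dog", RDFS + "subClassOf", EX + "Animal"),
    (EX + "Dog", EX + "synonym", "domestic dog"),  # annotation: dropped
]
kept, dropped = split_triples(sample)
print(len(kept), "kept,", len(dropped), "dropped")  # 2 kept, 1 dropped
```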

A solution (possibly a helpful alternative):

Rather than using:

n10s.onto.import.fetch(url :: STRING?, format :: STRING?)

Try using:

n10s.rdf.import.fetch(url :: STRING?, format :: STRING?)

This imports every triple in the file as generic RDF, not only the ontology constructs listed above, so nothing is filtered out.
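To make "generic RDF" a bit more concrete, here is a rough Python sketch of how a triple-by-triple import can lay data out as a property graph. It assumes what I understand to be the n10s defaults (rdf:type becomes a node label, literal objects become node properties, IRI objects become relationships, names shortened to prefix__localName) plus the handleMultival: "ARRAY" setting from your config. It is a model for reading the resulting graph, not n10s code, and C1, C2 and P1 are placeholder codes rather than real thesaurus entries.

```python
# Hypothetical, simplified model of a generic RDF-to-property-graph mapping:
#   rdf:type        -> node label
#   literal object  -> node property (a list, mimicking handleMultival "ARRAY")
#   IRI object      -> relationship between two nodes
# Names are shortened to prefix__localName. Real n10s also adds a :Resource
# label and a uri property to every node; that is omitted here.
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A subset of the namespace prefixes n10s reported for this file in this thread.
PREFIXES = {
    "http://www.w3.org/2002/07/owl#": "owl",
    "http://www.w3.org/2000/01/rdf-schema#": "rdfs",
    "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#": "ns2",
}


@dataclass(frozen=True)
class Literal:
    """Marks a triple object as a literal value rather than an IRI."""
    value: str


def shorten(uri):
    """Replace a known namespace with its prefix, e.g. owl#Class -> owl__Class."""
    for ns, prefix in PREFIXES.items():
        if uri.startswith(ns):
            return f"{prefix}__{uri[len(ns):]}"
    return uri  # unknown namespace: keep the full IRI


def to_graph(triples):
    """Return (nodes, rels): nodes maps IRI -> {"labels", "props"}; rels holds (from, type, to)."""
    nodes, rels = {}, []

    def node(iri):
        return nodes.setdefault(iri, {"labels": set(), "props": {}})

    for s, p, o in triples:
        if p == RDF_TYPE:
            node(s)["labels"].add(shorten(o))
        elif isinstance(o, Literal):
            node(s)["props"].setdefault(shorten(p), []).append(o.value)
        else:
            node(s)
            node(o)
            rels.append((s, shorten(p), o))
    return nodes, rels


# Usage with placeholder codes in the thesaurus namespace.
NCIT = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
nodes, rels = to_graph([
    (NCIT + "C1", RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
    (NCIT + "C1", NCIT + "P1", Literal("synonym a")),
    (NCIT + "C1", NCIT + "P1", Literal("synonym b")),
    (NCIT + "C2", "http://www.w3.org/2000/01/rdf-schema#subClassOf", NCIT + "C1"),
])
print(nodes[NCIT + "C1"])
print(rels)
```

In this model the annotation triples that an ontology-only import would skip survive as list-valued properties such as ns2__P1, which is where most of the extra triples end up.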

Hope this helps! Feel free to loop back for more help.

I ran an example import using your configuration but with n10s.rdf.import.fetch(), and here is the output:

terminationStatus: "OK" | triplesLoaded: 8734522 | triplesParsed: 8734522 | namespaces:
{
  "owl": "http://www.w3.org/2002/07/owl#",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "ns0": "http://purl.org/dc/elements/1.1/",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "ns2": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
  "ns1": "http://protege.stanford.edu/plugins/owl/protege#",
  "ns3": "http://www.geneontology.org/formats/oboInOwl#"
}

We can now see that triplesLoaded is equal to triplesParsed :slightly_smiling_face:
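Putting the two runs side by side, here is a small Python check of the reported counts. The numbers are copied from the two result tables in this thread; the helper name is just for illustration.

```python
def loaded_fraction(triples_loaded, triples_parsed):
    """Share of parsed triples that were actually written to the graph."""
    if triples_parsed <= 0:
        raise ValueError("triples_parsed must be positive")
    return triples_loaded / triples_parsed


# (triplesLoaded, triplesParsed) as reported in this thread.
runs = {
    "onto.import": (921897, 8734522),
    "rdf.import": (8734522, 8734522),
}

for name, (loaded, parsed) in runs.items():
    print(f"{name}: {loaded_fraction(loaded, parsed):.1%} of parsed triples loaded")
# onto.import: 10.6% of parsed triples loaded
# rdf.import: 100.0% of parsed triples loaded
```

So the ontology import kept roughly one triple in ten, which fits the explanation that the remainder fall outside the criteria it handles rather than failing to load.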

Best Regards,
Rob

Hi Rob,

Thanks for the reply, I really appreciate it, and yep, that appears to have done the trick!

I'm going to take a closer look at the data. I may have some further questions down the line, but for now, thanks so much.

Regards

Chris