Modelling open statistical data using Neo4j

Hello everyone : )
I am curious about the steps involved in creating a knowledge graph from open statistical data. My research has led me to conclude that the RDF Data Cube Vocabulary (W3C) is the way to go for modelling statistical data (e.g. population counts, averages etc.), so my question is what the best way to go about this with Neo4j would be.
Am I right in thinking that the steps below are along the right lines, or am I completely off the mark? Any help/advice would be greatly appreciated!

  1. Create an ontology based on the RDF Data Cube Vocabulary using Protégé
  2. Import the ontology into Neo4j
  3. Import the statistical data (multiple sources)
  4. Analyze the data (visualisations, pattern finding, ML etc.)
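
To make steps 2 and 3 a bit more concrete, here is a minimal sketch of how one CSV row could be mapped to a Data Cube style observation and then to a parameterised Cypher statement. Everything specific in it is an assumption for illustration only: the column names (year, sex, cases), the sample values, the labels DataSet, Observation and DimensionValue, and the relationship types IN_DATASET and HAS_DIMENSION are hypothetical and are not taken from the actual files or from the vocabulary itself. It only builds the query text and parameters; it does not connect to Neo4j.

```python
import csv
import io

# Made-up placeholder rows; the real exports may use different columns.
SAMPLE_CSV = """year,sex,cases
2015,Female,100
2015,Male,120
"""

DIMENSIONS = ["year", "sex"]  # play the role of qb:DimensionProperty
MEASURE = "cases"             # plays the role of qb:MeasureProperty


def rows_to_observations(text, dataset_id):
    """Turn each CSV row into one observation: dimension values plus a measure."""
    observations = []
    for row in csv.DictReader(io.StringIO(text)):
        observations.append({
            "dataset": dataset_id,
            "dimensions": {d: row[d] for d in DIMENSIONS},
            "measure": MEASURE,
            "value": float(row[MEASURE]),
        })
    return observations


def observation_to_cypher(obs):
    """Return (query, params) for one observation (hypothetical graph model)."""
    merges = ["MERGE (ds:DataSet {id: $dataset})"]
    creates = [
        "CREATE (o:Observation {measure: $measure, value: $value})",
        "CREATE (o)-[:IN_DATASET]->(ds)",
    ]
    params = {
        "dataset": obs["dataset"],
        "measure": obs["measure"],
        "value": obs["value"],
    }
    for i, (dim, val) in enumerate(obs["dimensions"].items()):
        # Dimension values are shared between observations, hence MERGE.
        merges.append(
            f"MERGE (d{i}:DimensionValue {{dimension: $dim{i}, value: $val{i}}})"
        )
        creates.append(f"CREATE (o)-[:HAS_DIMENSION]->(d{i})")
        params[f"dim{i}"] = dim
        params[f"val{i}"] = val
    return "\n".join(merges + creates), params


if __name__ == "__main__":
    for obs in rows_to_observations(SAMPLE_CSV, "demo-dataset"):
        query, params = observation_to_cypher(obs)
        print(query)
        print(params)
```

Each (query, params) pair could then be passed to whichever Neo4j client is in use. Keeping dimension values as shared nodes is one possible design choice; it would let observations from several sources meet at the same year or sex node, which seems to be the point of step 3.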

Many thanks!


Hi @Shin, what you describe sounds like a reasonable general approach, but I must admit I'm not super familiar with the Data Cube Vocabulary and I'm not finding it particularly easy to find examples.

Do you have a dataset at hand that I could have a look at, so I can see whether I have any specific recommendations for this type of data?



Hi JB,

Yes, that would be great, thank you!
The original files are CSV exports from the National Cancer Registry Ireland (Incidence statistics | National Cancer Registry Ireland), which I have attached below in txt format.
I have also provided the links to 3 good research papers I have read that provide more info on the RDF Data Cube Vocabulary.

Any recommendations would be greatly appreciated : )

All_female_cancers_by_county_region.txt (42.4 KB)
All_cancers_by_sex_year.txt (3.2 KB)
All_female_cancers_by_age.txt (11.4 KB)