Neosemantics (n10s) validations not scaling

n10s version: 4.4.0

I have SHACL validations set up through the call:
CALL apoc.trigger.add("shacl-validate", "call n10s.validation.shacl.validateTransaction($createdNodes, $createdRelationships, $assignedLabels, {}, $assignedNodeProperties, {}, $deletedRelationships, $deletedNodes)", { phase: "before" });

The SHACL file is pretty basic. It usually validates each property for datatype and min/max value, and it also validates relationships. Inserting nodes is extremely fast when the node count isn't very large. When there are, say, 5 million nodes, the schema validations take an incredibly long time (over a minute). If I am not mistaken, this validation is supposed to cover only the transaction and not the entire database, so I am unsure why it slows down as the number of nodes grows. Any suggestions would be greatly appreciated.

This is a subset of the SHACL file as an example:

with '
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix neovoc:  <neo4j://vocabulary#> .
@prefix n4sch:   <neo4j://graph.schema#> .

# Node Shapes

neovoc:PersonShape
    a sh:NodeShape ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:targetClass n4sch:Person;
    sh:property neovoc:IdProperty ;
    sh:property neovoc:NotRequiredExternalIdProperty ;
    sh:property neovoc:CreatedAtProperty ;
    sh:property neovoc:CreatedByProperty ;
    sh:property neovoc:UpdatedAtProperty ;
    sh:property neovoc:UpdatedByProperty .

neovoc:IdProperty
    a sh:PropertyShape ;
    sh:path n4sch:Id ;
    sh:pattern "[0-9a-fA-F]{8}\\-[0-9a-fA-F]{4}\\-[0-9a-fA-F]{4}\\-[0-9a-fA-F]{4}\\-[0-9a-fA-F]{12}" ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string .

neovoc:NotRequiredExternalIdProperty
    a sh:PropertyShape ;
    sh:path n4sch:ExternalId ;
    sh:datatype xsd:string .

neovoc:CreatedAtProperty
  a sh:PropertyShape ;
  sh:path n4sch:CreatedAt ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:datatype xsd:dateTime .

neovoc:CreatedByProperty
  a sh:PropertyShape ;
  sh:path n4sch:CreatedBy ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:datatype xsd:string .

neovoc:UpdatedAtProperty
  a sh:PropertyShape ;
  sh:path n4sch:UpdatedAt ;
  sh:datatype xsd:dateTime .

neovoc:UpdatedByProperty
  a sh:PropertyShape ;
  sh:path n4sch:UpdatedBy ;
  sh:datatype xsd:string .
' as shacl
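
The snippet above ends at `as shacl`, so the step that loads the shapes is not shown. As a rough sketch only, a statement of this form is commonly completed with the n10s inline import procedure. The call below, the `YIELD` column names, and the trimmed-down shape are assumptions for illustration and are not taken from the original post.

```cypher
// Assumed completion: pass the Turtle string to the n10s inline import.
// The shape here is a shortened stand-in for the full file above.
WITH '
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix neovoc: <neo4j://vocabulary#> .
@prefix n4sch:  <neo4j://graph.schema#> .

neovoc:PersonShape
    a sh:NodeShape ;
    sh:targetClass n4sch:Person ;
    sh:property [ sh:path n4sch:CreatedBy ; sh:datatype xsd:string ] .
' AS shacl
// Column names below are as I recall them from the n10s docs; verify locally.
CALL n10s.validation.shacl.import.inline(shacl, "Turtle")
YIELD target, propertyOrRelationshipPath, param, value
RETURN target, propertyOrRelationshipPath, param, value;
```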

Sorry for the late response, @MadGraphs.
So you're having the validations applied transactionally, correct?
Could you confirm that you set up the trigger through apoc.trigger.add, that you're writing to the graph using plain Cypher, and that the validations get slower as the graph grows?

You're right: the validation should only apply to the elements modified by the transaction, so unless it's a large transaction there's no apparent reason for the degradation.

Is there a way for us to get a sample of the graph you're working on in order to try and reproduce that behavior?

Thanks,

JB.
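
One way to check whether the slowdown really comes from the trigger is to time the same small write with the trigger active and then paused. This is a minimal sketch, assuming the trigger name `shacl-validate` from the original post and an APOC 4.x install where `apoc.trigger.pause` and `apoc.trigger.resume` are available. The property values are made up, and each statement is meant to be run separately.

```cypher
// 1. Confirm the trigger is registered and currently not paused.
CALL apoc.trigger.list();

// 2. Write a single node with the trigger active and note the elapsed time.
CREATE (p:Person {Id: randomUUID(), CreatedAt: datetime(), CreatedBy: 'timing-test'})
RETURN p.Id;

// 3. Pause the trigger and repeat the same write for comparison.
CALL apoc.trigger.pause('shacl-validate');

CREATE (p:Person {Id: randomUUID(), CreatedAt: datetime(), CreatedBy: 'timing-test'})
RETURN p.Id;

// 4. Re-enable validation afterwards.
CALL apoc.trigger.resume('shacl-validate');
```

If the two timings only diverge once the graph is large, that would point at the validation query rather than at the write itself.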

@jesus_barrasa

Yes, the validations should be applied transactionally and are set up through apoc.trigger.add.

The command we use:

CALL apoc.trigger.add("shacl-validate", "call n10s.validation.shacl.validateTransaction($createdNodes, $createdRelationships, $assignedLabels, {}, $assignedNodeProperties, {}, $deletedRelationships, $deletedNodes)", { phase: "before" });

The data in the graph is not public, so I need to ask whether I can share any of it. I wonder if I can create a very basic graph that reproduces the issue.

Thanks @MadGraphs!
Synthetic data would be fine.
It does not have to be the real data, as long as it helps us reproduce the behaviour you're seeing.

JB.
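
For reference, synthetic data of roughly the right shape could be generated along these lines. This is a sketch under assumptions: the property names follow the PersonShape posted earlier, while the node count, batch size, and property values are placeholders. It uses `apoc.periodic.iterate` so the load is split into many small transactions.

```cypher
// Generate synthetic Person nodes in batches of 10,000.
// Values are arbitrary; only the property names mirror the posted shape.
CALL apoc.periodic.iterate(
  "UNWIND range(1, 5000000) AS i RETURN i",
  "CREATE (:Person {
     Id: randomUUID(),
     ExternalId: 'ext-' + toString(i),
     CreatedAt: datetime(),
     CreatedBy: 'synthetic-loader'
   })",
  {batchSize: 10000, parallel: false}
);
```

If the validation trigger is active during the load, every batch passes through it, so it may be worth pausing the trigger while building the baseline graph and resuming it before timing individual inserts.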

@jesus_barrasa I will try to come up with something as soon as I have the time. How would you like me to send you the data?

Whatever works for you. You can put it on GitHub and share a link here, or just mail me directly at jesus at neo4j dot com.

Hey @MadGraphs , any luck getting hold of the data?
Cheers,

JB.

@jesus_barrasa it's not the most pressing priority right now, but I will let you know when it's done.

I was able to get it done and have sent it to your email.