Performing high-level validation of transaction data before commit

Hi folks,

I am interested in building a graph with some fairly rigid constraints. Some of these can be handled by the standard constraint mechanism, but some will require analysis of select nodes/relationships within the transaction scope, in some cases across the graph, and even outside the graph via external lookups.

The validation initially will be performed at an application layer just outside of the graph itself. However, I'm interested in eventually moving this validation into Neo4j so that we are able to eliminate the dependency on the application front end.

Based on my initial observations, this looks like a case for a custom procedure called from a trigger, or a custom TransactionEventListener (this will be a Java project using the Neo4j-OGM).

Are there any other options out there for this kind of thing?

Also, if this idea is making your skin crawl, I'd definitely like to hear what the primary concerns are. Performance impact is an obvious concern. I won't get into much detail on the use case, but updates will be few relative to reads (<100/day), and the overall graph will be relatively small (<100k nodes). For maintenance, I'm hoping to have a single validation procedure that can operate in both contexts by leveraging OGM.

Thanks

Automated data validation? Lots of ways to do that, as long as you carefully consider how to handle validation errors.

Were it me, I'd build my own plugin to handle it. (I'm almost done doing something very similar, which will be open-source July-ish). There's little you can't do with a plugin, though there's a bit of a learning curve.

The simplest option, which should address your use case, is apoc.trigger.*:

Property enforcement

CALL apoc.trigger.add("forceStringType",
"UNWIND apoc.trigger.propertiesByKey({assignedNodeProperties}, 'reference') AS prop
CALL apoc.util.validate(apoc.meta.type(prop) <> 'STRING', 'expected string property type, got %s', [apoc.meta.type(prop)]) RETURN null",
{phase:'before'})

Adding timestamps

CALL apoc.trigger.add('timestamp','UNWIND {createdNodes} AS n SET n.ts = timestamp()');

Autogenerate UUID

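CREATE CONSTRAINT ON (n:Label) ASSERT n.uuid IS UNIQUE;
// The uniqueness constraint already creates a backing index on :Label(uuid), so a separate CREATE INDEX is not needed (and is rejected while the constraint exists).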
CALL apoc.trigger.add('generate-uuid',"UNWIND {createdNodes} AS n set n.uuid=apoc.create.uuid()", {phase:'after'});

Hey Tony, thanks for the reply!

Plugins are definitely on the menu. My question wasn't super clear, but here's a (contrived unfortunately) example. Imagine this schema:

Assume we have the following business rules for adding a router to a graph:

  • No router may be added to the graph without being located in an existing cabinet.
  • No router may be added with zero interfaces
  • No routers may share an interface
  • No router may be added with an interface that is not attached to a new or existing subnet.
  • No interface may be added with an IP address that matches another interface
  • No subnet may be added with a cidr range that overlaps or is contained within another subnet
  • All new interfaces must have a valid IP address
  • No interfaces may be added to a subnet that doesn't contain their IP address

With the above, any 'add router to network' operation must meet the following business requirements:
(remember this is contrived so the business rationale for these rules might not exactly make sense)

  • create exactly one new :Router node
  • have one or more new :Interface nodes with a :ROUTES_VIA relationship back to :Router
  • have no :ROUTES_VIA relationships to existing :Interfaces (in other words, all :Interfaces in the graph have exactly 1 incoming :ROUTES_VIA relationship)
  • all :Interface nodes have an :ATTACHED_TO relationship to exactly one existing or new :Subnet (no secondary addresses for the router nerds)
  • all :Interface.ip properties have a value that fits within the CIDR range of the attached :Subnet.cidr property
  • at least one :Interface has an :ATTACHED_TO relationship to an existing :Subnet
  • any new :Subnet.cidr properties must not overlap with any Subnet.cidr property in the graph.
  • any new :Interface.ip properties are unique across the graph
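The two CIDR-related rules above (an :Interface.ip must fall inside its :Subnet.cidr, and a new :Subnet.cidr must not overlap an existing one) come down to integer range comparisons. Here is a minimal sketch in plain Java, assuming IPv4 only. The class name CidrRules and its methods are hypothetical helpers for illustration; they are not part of Neo4j, APOC, or OGM.

```java
/** Hypothetical IPv4-only helper for the ip/cidr rules (not a Neo4j or APOC API). */
final class CidrRules {
    private CidrRules() {}

    /** Parses dotted-quad text such as "10.4.0.8" into an unsigned 32-bit value held in a long. */
    static long parseIpv4(String ip) {
        String[] parts = ip.trim().split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            // Accept only 1-3 plain digits per octet, then range-check the number.
            if (!part.matches("\\d{1,3}") || Integer.parseInt(part) > 255) {
                throw new IllegalArgumentException("Not an IPv4 address: " + ip);
            }
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    /** Returns {first, last} address of a CIDR block such as "10.4.0.0/26", both inclusive. */
    static long[] range(String cidr) {
        String[] parts = cidr.trim().split("/", -1);
        if (parts.length != 2 || !parts[1].matches("\\d{1,2}")
                || Integer.parseInt(parts[1]) > 32) {
            throw new IllegalArgumentException("Not an IPv4 CIDR block: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        // Network mask with the top `prefix` bits set, truncated to 32 bits.
        long mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        long first = parseIpv4(parts[0]) & mask;
        long last = first | (~mask & 0xFFFFFFFFL);
        return new long[] {first, last};
    }

    /** True if the address lies inside the CIDR block. */
    static boolean contains(String cidr, String ip) {
        long[] r = range(cidr);
        long addr = parseIpv4(ip);
        return addr >= r[0] && addr <= r[1];
    }

    /** True if the two blocks share at least one address (this includes one containing the other). */
    static boolean overlaps(String cidrA, String cidrB) {
        long[] a = range(cidrA);
        long[] b = range(cidrB);
        return a[0] <= b[1] && b[0] <= a[1];
    }
}
```

For example, CidrRules.contains("10.4.0.0/26", "10.4.0.8") is true, while CidrRules.overlaps("10.4.0.0/26", "10.4.8.0/24") is false. Because the helper has no Neo4j dependency, the same code could in principle be called from the application layer or from a plugin.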

Again, network folks can pick apart the example; my main point is that the rules we want to validate are a) rather broad in scope and b) already implemented in the application that's feeding the graph.

So my question is whether we can create a plugin that looks at the collection of nodes that will be created, just before the transaction commits, to validate all of these rules. From what I'm seeing, it looks like we will be able to by looking at the transaction data in the 'beforeCommit' context.

Presumably, if we do this as a plugin, we would write a @Procedure that takes the TransactionData and Context to do the above validation. It looks like we could also do this via TransactionEventListener, but I'm having a difficult time finding build and deployment instructions for those.
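One way to keep a single set of rules usable from both the application and a plugin is to have the validator work on a plain snapshot of the pending change rather than on Neo4j types. The sketch below is a hypothetical illustration in plain Java: AddRouterChange and AddRouterValidator are made-up names, only three of the rules are covered (exactly one new :Router, at least one new :Interface, unique :Interface.ip), and the step that fills the snapshot from the transaction data or from the application request is assumed and not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, Neo4j-independent snapshot of one pending "add router" change. */
final class AddRouterChange {
    final int newRouterCount;           // number of :Router nodes the transaction creates
    final List<String> newInterfaceIps; // ip property of every new :Interface
    final Set<String> existingIps;      // ip values already present in the graph

    AddRouterChange(int newRouterCount, List<String> newInterfaceIps, Set<String> existingIps) {
        this.newRouterCount = newRouterCount;
        // Defensive copies so later changes by the caller do not affect validation.
        this.newInterfaceIps = new ArrayList<>(newInterfaceIps);
        this.existingIps = new HashSet<>(existingIps);
    }
}

/** Hypothetical validator: returns human-readable violations, empty when the change is acceptable. */
final class AddRouterValidator {
    private AddRouterValidator() {}

    static List<String> validate(AddRouterChange change) {
        List<String> violations = new ArrayList<>();
        if (change.newRouterCount != 1) {
            violations.add("expected exactly one new :Router, got " + change.newRouterCount);
        }
        if (change.newInterfaceIps.isEmpty()) {
            violations.add("a new :Router needs at least one new :Interface");
        }
        // An ip is a duplicate if it is already in the graph or repeated within this change.
        Set<String> seen = new HashSet<>(change.existingIps);
        for (String ip : change.newInterfaceIps) {
            if (!seen.add(ip)) {
                violations.add("duplicate :Interface.ip " + ip);
            }
        }
        return violations;
    }
}
```

Whichever hook ends up calling this, it would build an AddRouterChange, call AddRouterValidator.validate, and reject the transaction if the returned list is not empty. How the rejection is signalled depends on the hook and is left out here.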

Does this make sense? Any thoughts on the above?

Quick example 'add router' transaction:

:begin
match (c:Cabinet {location:"R6"})
create p1=(r:Router { serialNo:"78382", model: "asr9901", rackUnits:2, vendor: "Cisco" })-[:IS_IN]->(c)
with r
match (s1:Subnet {cidr: "10.4.0.0/26"})
create p2=(r)-[:ROUTES_VIA]->(i1:Interface{ip: "10.4.0.8"})-[:ATTACHED_TO]->(s1)
create p3=(r)-[:ROUTES_VIA]->(i2:Interface{ip: "10.4.8.1"})-[:ATTACHED_TO]->(s2:Subnet{cidr:"10.4.8.0/24"});
:commit

Graph after a few of them

Without spending a couple hours carefully analyzing your rules and requirements, it looks like this could be accomplished with a beforeCommit trigger.

However, considering the complexity involved, I'd probably have a plugin provide an API POST URI, so that your system can simply POST a request to your plugin, and you can put the graph-data logic in Java code. That might be easier to maintain as rules and requirements are added or changed.

I'd start with beforeCommit, as it should only take a day or two to get that doing what you need it to. A custom-API plugin will take closer to a month.

Get this thing running, and you'll be able to automate network security! With a real representation of the network, instead of giant abstract tables!

Honestly, that's pretty cool. :)

Great suggestions thank you!!!

One quick question:

With this, are you talking about a trigger running a Procedure in the 'before' phase, or a TransactionEventListener with beforeCommit() implemented? I think they are basically equivalent, but I'm not sure. I have been able to start tinkering with the Procedure, but I haven't found an example of a custom TransactionEventListener implementation beyond the basic guts of a class.

Edit: Never mind, this is an excellent example of a TransactionEventListener: maxdemarzi/neo_listens on GitHub (Sample Event Listener / Triggers)

You've got good intuition. :)

I was referring to apoc triggers:
http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/operational/triggers/

Got it, thank you Tony!