Performing high-level validation of transaction data before commit

bobrich · May 13, 2020, 2:55pm

Hi folks,

I am interested in building a graph with some fairly rigid constraints. Some of these can be handled by the standard constraint mechanism, but others will require analysis of selected nodes/relationships within the transaction scope, and in some cases across the graph or even outside it via external lookups.

The validation initially will be performed at an application layer just outside of the graph itself. However, I'm interested in eventually moving this validation into Neo4j so that we are able to eliminate the dependency on the application front end.

Based on my initial observations, this looks like a case for a custom procedure called from a trigger, or a custom TransactionEventListener (this will be a Java project using Neo4j-OGM).

Are there any other options out there for this kind of thing?

Also, if this idea is making your skin crawl, I'd definitely like to hear what the primary concerns are. Performance impact is an obvious concern. I won't go into much detail on the use case, but updates will be few relative to reads (<100/day) and the overall graph will be relatively small (<100k nodes). For maintainability, I'm hoping to have a single validation procedure that can operate in both contexts by leveraging OGM.

Thanks

tony.chiboucas · May 13, 2020, 9:24pm

Automated data validation? Lots of ways to do that, as long as you carefully consider how to handle validation errors.

Were it me, I'd build my own plugin to handle it. (I'm almost done with something very similar, which will be open-sourced around July.) There's little you can't do with a plugin, though there's a bit of a learning curve.

The simplest option, and one that should address your use case, is apoc.trigger.*:

Property enforcement

CALL apoc.trigger.add("forceStringType",
"UNWIND apoc.trigger.propertiesByKey($assignedNodeProperties, 'reference') AS prop
CALL apoc.util.validate(apoc.meta.type(prop) <> 'STRING', 'expected string property type, got %s', [apoc.meta.type(prop)]) RETURN null",
{phase:'before'});

Adding timestamps

CALL apoc.trigger.add('timestamp', 'UNWIND $createdNodes AS n SET n.ts = timestamp()');

Autogenerate UUID

CREATE CONSTRAINT ON (n:Label) ASSERT n.uuid IS UNIQUE;
// the uniqueness constraint already creates a backing index on :Label(uuid), so no separate CREATE INDEX is needed
CALL apoc.trigger.add('generate-uuid', "UNWIND $createdNodes AS n SET n.uuid = apoc.create.uuid()", {phase:'after'});

bobrich · May 14, 2020, 2:06pm

Hey Tony, thanks for the reply!

Plugins are definitely on the menu. My question wasn't super clear, but here's an (unfortunately contrived) example. Imagine this schema:

Assume we have the following business rules for adding a router to a graph:

No router may be added to the graph without being located in an existing cabinet.
No router may be added with zero interfaces.
No routers may share an interface.
No router may be added without at least one interface that is attached to an existing (not a new) subnet.
No interface may be added with an IP address that matches that of another interface.
No subnet may be added with a CIDR range that overlaps or is contained within another subnet.
All new interfaces must have a valid IP address.
No interface may be added to a subnet that doesn't contain its IP address.
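Several of these rules (valid IP address, IP inside the attached subnet, no overlapping subnets) come down to IPv4/CIDR arithmetic, which is easy to keep in plain Java so the same code could run in the application and inside a plugin. A minimal sketch, assuming IPv4 only and CIDR strings like "10.4.0.0/26"; the `Cidr` class and its methods are hypothetical helpers, not part of Neo4j or OGM:

```java
import java.util.List;

// Hypothetical helper (not a Neo4j or OGM API): IPv4 CIDR arithmetic only.
final class Cidr {
    final long network; // network address as an unsigned 32-bit value
    final int prefix;   // prefix length, 0..32

    private Cidr(long network, int prefix) {
        this.network = network;
        this.prefix = prefix;
    }

    // Parses a dotted-quad IPv4 address; throws IllegalArgumentException if invalid
    // (covers the "all new interfaces must have a valid IP address" rule).
    static long parseIp(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part); // NumberFormatException is an IllegalArgumentException
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range in " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Mask with the top `prefix` bits set, e.g. /26 -> 0xFFFFFFC0 (a /0 gives 0).
    static long mask(int prefix) {
        return (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
    }

    // Parses "10.4.0.0/26"; rejects a prefix outside 0..32 or host bits set ("10.4.0.8/26").
    static Cidr parse(String cidr) {
        String[] parts = cidr.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("not CIDR notation: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("prefix out of range in " + cidr);
        }
        long address = parseIp(parts[0]);
        if ((address & mask(prefix)) != address) {
            throw new IllegalArgumentException("host bits set in " + cidr);
        }
        return new Cidr(address, prefix);
    }

    // Rule: an interface's IP must fall inside the subnet it is attached to.
    boolean contains(String ip) {
        return (parseIp(ip) & mask(prefix)) == network;
    }

    // CIDR blocks are either nested or disjoint, so they overlap exactly when
    // their network addresses agree on the shorter of the two prefixes.
    boolean overlaps(Cidr other) {
        long m = mask(Math.min(prefix, other.prefix));
        return (network & m) == (other.network & m);
    }

    // Rule: a new subnet may not overlap (or sit inside) any existing subnet.
    static boolean overlapsAny(String newCidr, List<String> existingCidrs) {
        Cidr candidate = parse(newCidr);
        for (String existing : existingCidrs) {
            if (candidate.overlaps(parse(existing))) {
                return true;
            }
        }
        return false;
    }
}
```

Comparing network addresses under the shorter mask catches both the "overlaps" and the "is contained within" cases with a single test, since two CIDR blocks can never partially overlap.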

With the above, any 'add router to network' operation must meet the following business requirements:
(remember this is contrived, so the business rationale for these rules might not exactly make sense)

create exactly one new :Router node
have one or more new :Interface nodes connected to the :Router by a :ROUTES_VIA relationship
have no :ROUTES_VIA relationships to existing :Interfaces (in other words, every :Interface in the graph has exactly one incoming :ROUTES_VIA relationship)
all :Interface nodes have an :ATTACHED_TO relationship to exactly one existing or new :Subnet (no secondary addresses for the router nerds)
all :Interface.ip values fit within the CIDR range of the attached :Subnet.cidr
at least one :Interface has an :ATTACHED_TO relationship to an existing :Subnet
any new :Subnet.cidr values must not overlap with any :Subnet.cidr in the graph
any new :Interface.ip values must be unique across the graph
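The graph-shaped requirements above can be checked against a plain, storage-agnostic description of the change, which is one way to get a single validator usable from both the OGM application and a beforeCommit hook. A sketch under stated assumptions: `NewInterface`, `AddRouterChange`, and `AddRouterValidator` are hypothetical names; each new interface carries the CIDR of the one subnet it is attached to (so "exactly one subnet" is enforced by the shape of the data), and `existingIps` is assumed to have been looked up from the graph beforehand. The CIDR containment/overlap rules and the single-incoming-:ROUTES_VIA check are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not an OGM or Neo4j type): one :Interface created by the change.
final class NewInterface {
    final String ip;
    final String subnetCidr;   // CIDR of the :Subnet it is ATTACHED_TO, or null if none
    final boolean subnetIsNew; // true if that :Subnet is created in the same change

    NewInterface(String ip, String subnetCidr, boolean subnetIsNew) {
        this.ip = ip;
        this.subnetCidr = subnetCidr;
        this.subnetIsNew = subnetIsNew;
    }
}

// Hypothetical model: everything one "add router" operation creates.
final class AddRouterChange {
    final int newRouterCount;
    final List<NewInterface> interfaces; // interfaces reached via :ROUTES_VIA from the new router

    AddRouterChange(int newRouterCount, List<NewInterface> interfaces) {
        this.newRouterCount = newRouterCount;
        this.interfaces = interfaces;
    }
}

// Hypothetical validator: returns every violation found (empty list = valid).
final class AddRouterValidator {
    static List<String> validate(AddRouterChange change, Set<String> existingIps) {
        List<String> errors = new ArrayList<>();
        if (change.newRouterCount != 1) {
            errors.add("expected exactly one new :Router, got " + change.newRouterCount);
        }
        if (change.interfaces.isEmpty()) {
            errors.add("a new :Router needs at least one :Interface");
        }
        boolean attachedToExisting = false;
        Set<String> seenIps = new HashSet<>();
        for (NewInterface iface : change.interfaces) {
            if (iface.subnetCidr == null) {
                errors.add("interface " + iface.ip + " is not ATTACHED_TO a :Subnet");
            } else if (!iface.subnetIsNew) {
                attachedToExisting = true;
            }
            // IPs must be unique within the change and across the existing graph.
            boolean firstInChange = seenIps.add(iface.ip);
            if (!firstInChange || existingIps.contains(iface.ip)) {
                errors.add("duplicate :Interface.ip " + iface.ip);
            }
        }
        if (!change.interfaces.isEmpty() && !attachedToExisting) {
            errors.add("at least one :Interface must be ATTACHED_TO an existing :Subnet");
        }
        return errors;
    }
}
```

Returning every violation instead of stopping at the first one lets the application report all problems to the user, while a plugin can simply throw when the list is non-empty; throwing from a transaction listener's beforeCommit should cause the transaction to be rolled back.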

Again, network folks can pick apart the example; my main point is that the rules we want to validate are a) rather broad in scope and b) already implemented in the application that's feeding the graph.

So my question is whether we can create a plugin that looks at the collection of nodes that will be created, just before the transaction commits, and validates all of these rules. From what I'm seeing, it looks like we can, by inspecting the transaction data in the 'beforeCommit' context.

Presumably, if we do this as a plugin, we would write a @Procedure that takes the TransactionData and context to do the above validation. It looks like we could also do this via a TransactionEventListener, but I'm having a difficult time finding build and deployment instructions for those.

Does this make sense? Any thoughts on the above?

bobrich · May 14, 2020, 2:19pm

Quick example 'add router' transaction:

:begin
match (c:Cabinet {location:"R6"})
create p1=(r:Router { serialNo:"78382", model: "asr9901", rackUnits:2, vendor: "Cisco" })-[:IS_IN]->(c)
with r
match (s1:Subnet {cidr: "10.4.0.0/26"})
create p2=(r)-[:ROUTES_VIA]->(i1:Interface{ip: "10.4.0.8"})-[:ATTACHED_TO]->(s1)
create p3=(r)-[:ROUTES_VIA]->(i2:Interface{ip: "10.4.8.1"})-[:ATTACHED_TO]->(s2:Subnet{cidr:"10.4.8.0/24"});
:commit

Graph after a few of them

tony.chiboucas · May 14, 2020, 7:19pm

Without spending a couple of hours carefully analyzing your rules and requirements, it looks like this could be accomplished with a beforeCommit trigger.

However, considering the complexity involved, I'd probably have a plugin provide an API POST URI, so that your system can simply POST a request to your plugin, and you can put the graph-data logic in Java code. That might be easier to maintain as rules and requirements are added or changed.

I'd start with beforeCommit, as it should only take a day or two to get that doing what you need. A custom-API plugin will take closer to a month.

tony.chiboucas · May 14, 2020, 7:21pm

Get this thing running, and you'll be able to automate network security! With a real representation of the network, instead of giant abstract tables!

Honestly, that's pretty cool. :)

bobrich · May 15, 2020, 5:54am

Great suggestions thank you!!!

One quick question:

With this, are you talking about a trigger running a procedure in the 'before' phase, or a TransactionEventListener with beforeCommit() implemented? I think they are basically equivalent, but I'm not sure. I've been able to start tinkering with the procedure, but I haven't found an example of a custom TransactionEventListener implementation beyond the bare skeleton of a class.

Edit: Never mind, this is an excellent example of a TransactionEventListener: maxdemarzi/neo_listens on GitHub (Sample Event Listener / Triggers)

You've got good intuition. :)

tony.chiboucas · May 18, 2020, 5:19pm

I was referring to apoc triggers:
http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/operational/triggers/

bobrich · May 18, 2020, 5:28pm

Got it, thank you Tony!
