Help optimize query

mr_rustbot · May 17, 2021, 8:53pm

I am using Neo4j for analyzing LDAP servers, and would appreciate some help / feedback on the data model as well as the query for building relationships.

Currently, I am creating nodes for each LDAP entry with a label equal to its objectClass attribute (e.g. organizationalUnit, user, group, etc.).

The problem is, I want to create relationships between nodes based on the RDN, e.g. (OU=Users,DN=foo)-[:CONTAINS]->(CN=Bob,OU=Users,DN=foo). So far I have this:

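EXPLAIN MATCH (child)
// The predicate in the original post was cut off after "child";
// filtering on the presence of dn is an assumption.
WHERE child.dn IS NOT NULL
WITH
    child,
    // Drop the leftmost RDN and re-join the rest to get the parent's DN.
    substring(reduce(parent_dn = "", rdn IN tail(split(child.dn, ",")) | parent_dn + "," + rdn), 1) AS parent_dn
MATCH (parent {dn: parent_dn})
MERGE (parent)-[:CONTAINS]->(child)
RETURN count(parent);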

This is obviously not good: neither MATCH specifies a label, so the query can't use any indexes.

I'm stuck on how I should optimize this because of that. Should I not use objectClass as a label since I have maybe 20 unique values? Should I just use one label for everything?

I originally thought it would be good to use objectClass as a label since Neo4j would color them differently and allow me to differentiate between types.

david_allen · May 17, 2021, 9:07pm

I'm far from an LDAP master, but whenever you have embedded strings like this, your data model probably isn't right. You shouldn't ever need to parse text in Cypher as you're doing right now. This is a strong indication that what you need is 3 different properties and possibly label types.

For example, what you reference as (CN=Bob,OU=Users,DN=foo), you could consider modeling this as:

(c:CN { id: "Bob" }), (o:OU { id: 'Users' }), (d:DN { id: 'foo' }), (entry)-[:REF]->(c), (entry)-[:REF]->(o), (entry)-[:REF]->(d)

In other words, if you can do that text parsing, do it once upfront when you load the model, and then never again.
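
As a rough sketch of what "parse once at load time" could look like, the Python snippet below derives each entry's parent DN before the data reaches Neo4j, so it can be stored as a plain property and matched without any string manipulation in Cypher. The function names `split_dn` and `parent_dn` are hypothetical, and the splitting only respects backslash-escaped commas; it is not a full RFC 4514 parser.

```python
def split_dn(dn):
    """Split a DN into RDN strings on unescaped commas (simplified)."""
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            # Character after a backslash is kept verbatim, even a comma.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parent_dn(dn):
    """Return the DN with its leftmost RDN removed, or None for a root entry."""
    rdns = split_dn(dn)
    if len(rdns) < 2:
        return None
    return ",".join(rdns[1:])
```

For the example in this thread, `parent_dn("CN=Bob,OU=Users,DN=foo")` gives `"OU=Users,DN=foo"`. Storing that value next to `dn` on each node would let the relationship query look up the parent by exact match.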

ameyasoft · May 18, 2021, 6:05am

Hi David,

This is exactly the right approach. I want to emphasize the fact that the underlying benefit of this approach is 'Scalability'.

mr_rustbot · May 18, 2021, 12:38pm

@david_allen @ameyasoft The entire DN (CN=Bob,OU=Users,DN=foo) is actually the "ID", in the sense that CN=Bob,OU=Users,DN=foo should be unique across all types of nodes (not just CN), whereas "Bob" isn't necessarily unique across anything, even CN. There could be a CN=Bob,OU=Users,DN=foo as well as a CN=Bob,OU=SomethingElse,DN=foo.

Are you mainly just suggesting to use CN / OU / DN as node labels, so that I could do

(c:CN { dn: 'CN=Bob,OU=Users,DN=foo' }), (o:OU { dn: 'OU=Users,DN=foo' }), (d:DN { dn: 'DN=foo' }), (entry)-[:REF]->(c), (entry)-[:REF]->(o), (entry)-[:REF]->(d)

and have them indexed by dn (formerly id) / guaranteed to be unique? Or is there a benefit to keeping (c:CN { id: 'Bob' }) instead of the whole DN?

Also, regarding properties, I'm not sure how I would store them as such. The DN (the entire string) is kind of like a URL / path in the sense that the order matters. I may have OU=foo and OU=bar for one node, but OU=foo,OU=bar is totally different from OU=bar,OU=foo. Think of them as the components of a URL.

The relationships which would be created are really related to substrings. CN=Bob,OU=Users,DN=foo is part of OU=Users,DN=foo which is part of DN=foo.
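
To make the "part of" chain concrete, here is a minimal Python sketch that lists every (parent, child) pair implied by a single DN, which is the set of CONTAINS relationships described above. The name `containment_pairs` is hypothetical, and the plain `split(",")` assumes no RDN value contains an escaped comma.

```python
def containment_pairs(dn):
    """Return (parent_dn, child_dn) pairs from the entry up to the root."""
    rdns = dn.split(",")  # assumption: no escaped commas inside RDN values
    pairs = []
    for i in range(len(rdns) - 1):
        child = ",".join(rdns[i:])       # suffix starting at this RDN
        parent = ",".join(rdns[i + 1:])  # same suffix minus its leftmost RDN
        pairs.append((parent, child))
    return pairs


pairs = containment_pairs("CN=Bob,OU=Users,DN=foo")
# pairs == [("OU=Users,DN=foo", "CN=Bob,OU=Users,DN=foo"),
#           ("DN=foo", "OU=Users,DN=foo")]
```

Because each pair is built from an ordered suffix of the DN, OU=foo,OU=bar and OU=bar,OU=foo produce different parents, which matches the URL-like behaviour.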
