Return a Node from Neo4j DB to SDN with only a sub-set of properties (no Projections)

federico.buttieri · March 24, 2023, 8:20pm

Hello everybody,
I'm having a problem with Neo4j (v. 5.3.0) and Spring Data Neo4j (v. 6.3.7) for which I have not found a solution here so far.
I'm working on a DB with more than 15 million nodes, so I'm putting some effort into optimizing my queries for such a large dataset.
While doing some query tuning with PROFILE, I've identified a potential bottleneck in the ProduceResults step. My goal is to run a 5-hop exploration from a central node and right now, in a complex use case, I reach 16.5 million DB hits with a duration in the order of minutes (not so good). That's because, for each node and relationship in the explored paths, Neo4j reads and returns all the properties of these entities (each node can have up to 50 different properties!).
Since I'm only interested in a small subset of these properties, what I have in mind is to find a way to return a node (with its metadata) but with only some properties instead of all.

For simplicity, let's say that I want to run a query like this:

MATCH (p:Person)-[r:SUPERVISES]->(m:Person)
WHERE p.IDENTIFIER = '...'
RETURN p, collect(r), collect(m)

With SDN, I can get a Person object with a list of SUPERVISES relationships, each with a reference to the other Person related.

The reason I wrote no Projections in the title is because I don't think I can solve my problem with a simple Map Projection. If I write something like this:

RETURN p {.prop1, .prop2}, collect(r), collect(m {.prop1, .prop2})

the SDN object has the properties of my choice but no information about the relationships, because Neo4j is not returning a Node object but a list of properties.

Once I figured out the problem, I thought I solved it with queries like these:

RETURN {identity: id(p), labels: labels(p), properties: {prop1: p.prop1, prop2: p.prop2}} AS p,
collect(r), {identity: id(m), labels: labels(m), properties: {prop1: m.prop1, prop2: m.prop2}} AS m

or

RETURN p {__internalNeo4jId__: id(p), __nodeLabels__: labels(p), .prop1, .prop2},
collect(r), m {__internalNeo4jId__: id(m), __nodeLabels__: labels(m), .prop1, .prop2}

to match the results with the metadata Neo4j returns in the initial case. But, while I still get the properties of my choice in the SDN object, I can't get the relationships lists populated, which is essential for the application.

So that brings me back to the question: is it possible to get the information of a node from Neo4j to SDN, but with only a subset of its properties?

P.S.: the reason I want to solve this problem on the SDN side is that queries like the ones I showed you run just fine and helped me bring those 16.5 million DB hits down to only 1.7 million.

federico.buttieri · March 24, 2023, 8:27pm

For completeness, here are the Java classes I use (for the examples):

@Data
@Node("Person")
public class Person {
	@Relationship(type = "SUPERVISES", direction = Relationship.Direction.OUTGOING)
	private List<Supervises> peopleSupervised;

	@Id
	@Property("prop1")
	private String prop1;
	@Property("prop2")
	private String prop2;
}

@Data
@RelationshipProperties
public class Supervises {
	@TargetNode
	private Person relationshipWith;

	@Id
	@GeneratedValue
	private final Long generatedId;
	@Property("prop")
	private String prop;
}

public interface PersonRepository extends Neo4jRepository<Person, String> {
	@Query("...")
	List<Person> runQuery(String id);
}

gerrit.meier · March 27, 2023, 1:54pm

The underlying problem is that you are mixing maps that look like nodes with real relationships.
Good that you already invested some time in reading the documentation or going through the generated queries. There are only a few bits that need to be changed in your custom query:

MATCH (p:Person)
RETURN p{__internalNeo4jId__: id(p), __nodeLabels__: labels(p), .name,
Person_SUPERVISES_Person:
	[(p)-[Person__relationship__Person:SUPERVISES]->(p_peopleSupervised:Person)|p_peopleSupervised{
		__internalNeo4jId__: id(p_peopleSupervised), .name, __nodeLabels__: labels(p_peopleSupervised), Person__relationship__Person}]
	}

This looks more complicated at first sight, but it is needed. I'll try to explain it line by line.

1. The MATCH pattern can of course contain the additional condition for the specific identifier.
2. This is the map representation that you already have.
3. Now we add an additional field for the relationship you want to return. The name has to be <from node primary label>_<relationshipType>_<to node primary label>.
4. Pattern matching for the relationship. Here again, the naming convention is different but required: <from node primary label>__relationship__<to node primary label>. The matched pattern then goes into the map comprehension.
5. All fields and extra information like id and labels, as you have already done for the root node, plus the name of the relationship from the pattern matching (4).

The names are needed because SDN has an internal contract between the query generator and the later mapping, based on these names. This makes it faster than evaluating the content of the returned records every time.
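
The two naming rules used in the query above can be written down as a tiny helper. This is only a sketch of the convention as described in this post; the class and method names are hypothetical and are not part of the Spring Data Neo4j API.

```java
// Hypothetical helper, NOT part of Spring Data Neo4j. It only reproduces the
// key names that appear in the custom query above.
final class SdnQueryKeys {

    private SdnQueryKeys() {
    }

    // Key of the nested list of related nodes:
    // <from node primary label>_<relationshipType>_<to node primary label>
    static String relationshipListKey(String fromLabel, String type, String toLabel) {
        return fromLabel + "_" + type + "_" + toLabel;
    }

    // Variable name of the relationship inside the pattern:
    // <from node primary label>__relationship__<to node primary label>
    static String relationshipVariable(String fromLabel, String toLabel) {
        return fromLabel + "__relationship__" + toLabel;
    }

    public static void main(String[] args) {
        // prints "Person_SUPERVISES_Person"
        System.out.println(relationshipListKey("Person", "SUPERVISES", "Person"));
        // prints "Person__relationship__Person"
        System.out.println(relationshipVariable("Person", "Person"));
    }
}
```

Building the names in one place makes it harder to mistype an underscore when writing the custom query by hand.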

---- some thoughts ----
Since you are always expecting a list of only two linked nodes, I would say that projections might even help here. You can reduce the number of properties and relationships to the ones you need.
What makes it a little bit complicated for SDN to create queries like yours is the fact that you have a cyclic domain model.
In this case SDN will currently fall back to a data-driven approach, regardless of the projection. By data-driven I mean that it will fetch the Person and try to follow each defined relationship in a separate query. In this way, the reduced properties get ignored, because we are returning all matching nodes in the end and not just a "sub map". I think we could improve here, because the metadata of a projection (with a link to another Person projection that does not define any relationships) defines a limited horizon.

---- end of my thoughts ;) ----

federico.buttieri · April 3, 2023, 9:29pm

Hello Gerrit,
thank you very much for your response and sorry it took me some days to answer you. Last week I tried the approach you recommended and it worked! I was able to get nodes with the properties of my choice, with the references to relationships and nodes at the other end as well, according to the SDN model. For this reason I will mark your answer as a solution.
However, I still have two problems that I don't think I've been able to solve this way:

1. The first is that I was unable to apply the same filtering to the relationship properties, whose high number is still an issue.
2. The second is that the query you suggested works only with relationships of length 1; if I use the syntax [Person__relationship__Person:SUPERVISES*1..2] it no longer does the trick. As I need to explore relationships up to 5 hops, I would have to write 4 nested queries inside the first, each with a reference to the previous one, and I don't think that's practical enough.

After some trial and error, I decided to use another approach with a different model, much more similar to the one adopted by the front-end of my application (it uses Ogma Linkurious). I will post my solution in case it might be of help to someone.
Instead of returning a node with the references to its relationships, I return a plain list of all nodes in the paths (with the projections of the properties I'm interested in) plus another list of all relationships in the paths as well (for which I can use projections the same way). Basically I treat nodes and relationships as if they were the same type of entity. The query used is something like this:

MATCH path = (p:Person)<-[:SUPERVISES*1..5]-(:Person)
WHERE p.IDENTIFIER = '...'
UNWIND [n IN nodes(path) | n {
    identity: id(n),
    __nodeLabels__: ["<labelIWantToMarkNodes>"],
    .prop1, .prop2, ...
}] + [r IN relationships(path) | r {
    identity: id(r),
    __nodeLabels__: ["<labelIWantToMarkRelationships>"],
    sourceNode: id(startNode(r)),
    targetNode: id(endNode(r)),
    type: type(r),
    .relProp1, .relProp2, ...
}] AS entities
RETURN entities

Then, on the Spring side, I distinguish the two types using labels, with an abstract Entity class and two subclasses decorated with the @Node annotation. This way I also avoid a mapping phase to send the data to my front-end (lucky for me). I was surprised by the performance of this query: what used to take minutes now takes a few seconds.
Of course this solution doesn't come without a price, namely that it is now more complicated to navigate the path, but it is still doable. And I treat relationships as if they were nodes, which is not really in line with the SDN model. Nonetheless, this works for me!
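
For readers wondering what navigating such a flat result might look like, here is a minimal sketch in plain Java. It is not the code from my application: FlatNode, FlatRelationship and FlatGraph are hypothetical, simplified stand-ins for the mapped classes (no SDN annotations), and it assumes the identity, sourceNode, targetNode and type fields from the query above are available on the Java side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a node entry of the flat result.
final class FlatNode {
    final long identity;
    final String prop1;

    FlatNode(long identity, String prop1) {
        this.identity = identity;
        this.prop1 = prop1;
    }
}

// Hypothetical stand-in for a relationship entry of the flat result.
final class FlatRelationship {
    final long identity;
    final long sourceNode;
    final long targetNode;
    final String type;

    FlatRelationship(long identity, long sourceNode, long targetNode, String type) {
        this.identity = identity;
        this.sourceNode = sourceNode;
        this.targetNode = targetNode;
        this.type = type;
    }
}

// Rebuilds an adjacency index so the flat lists can be walked like a graph.
final class FlatGraph {
    private final Map<Long, FlatNode> nodesById = new HashMap<>();
    private final Map<Long, List<FlatRelationship>> outgoing = new HashMap<>();

    FlatGraph(List<FlatNode> nodes, List<FlatRelationship> relationships) {
        // The same entity is returned once per matching path, so deduplicate by id.
        for (FlatNode n : nodes) {
            nodesById.putIfAbsent(n.identity, n);
        }
        Set<Long> seen = new HashSet<>();
        for (FlatRelationship r : relationships) {
            if (seen.add(r.identity)) {
                outgoing.computeIfAbsent(r.sourceNode, k -> new ArrayList<>()).add(r);
            }
        }
    }

    // Nodes reachable from nodeId over one outgoing relationship of the given type.
    List<FlatNode> neighbours(long nodeId, String type) {
        List<FlatNode> result = new ArrayList<>();
        List<FlatRelationship> rels =
                outgoing.getOrDefault(nodeId, Collections.<FlatRelationship>emptyList());
        for (FlatRelationship r : rels) {
            FlatNode target = nodesById.get(r.targetNode);
            if (r.type.equals(type) && target != null) {
                result.add(target);
            }
        }
        return result;
    }
}
```

With this index, following a 5-hop path becomes a loop over neighbours(...) calls instead of a traversal of nested SDN objects.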

Thanks again, Gerrit, for the help and for this opportunity to learn something new about Neo4j. Hope to meet you again in the community.
Cheers!

gerrit.meier · April 4, 2023, 7:23am

Thanks for the insights. Combining relationships(path) / nodes(path) with the map projection is a really nice solution. Of course you (or at least I) have to wrap your head around it, but it looks really elegant in the end.
