Max Number of relationships to a node - Best Modelling

pierrick.ganon · November 8, 2019, 7:12pm

We have a service that process a potential "match" between a user (Profile) and a Project (a paid opportunity)

In the graph we create relationship with a score property.

The number of relationships created to the :Project node could be 20k or more

Some stats about the data

50k projects
500k profiles (growing 1000 a day)

Right now we have the following model:

MATCH (p:Project{id:""})<-[r:MATCHES]-(pm:ProjectMatch)
MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)
RETURN profile
order by r.score desc

r contains the score between the project and the profile
ProjectMatch is a node created for each month and year for a specific profile

year: 2019
month: 8
profileId: ""

We've experienced slow queries i.e to get all matches ordered by score which made us rethink the model and to potentially simplify it to just:

MATCH (p:Project{id:""})<-[r:MATCHES]-(pm:ProjectMatch)
MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)
RETURN profile
order by r.score desc

I am finding the same number of dbhits or very similar between the 2 models. Any advice?

Which data model is "better" or is supposed to perform better? Is it scalable in the long run?

Query we run

Get all profiles returned by score desc
Get count of all matches
Get all profiles returned by score desc that haven't received an email

WHERE NOT ((profile)-[:HAS_EMAILS]->(:Emails)-[:SENT]->(:Email{projectId: ""}))

lju · November 11, 2019, 2:03pm

You may find introducing some of the elements from your ProjectMatch node into relationships from the Profile to ProjectMatch and incorporating them into your queries will help speed up your queries.

For example, let's say we take the properties year and month from ProjectMatch, and have something like:
(Profile)-[:MATCH_PROJECT_2019_11]->(ProjectMatch) and we use that specific relationship type, you can filter down the number of relationships that need to be traversed to get to ProjectMatch and Project. Of course this only works if you can be specific with dates, but it gives you an idea of how you can use more fine-grained relationship types to speed up queries.

I would recommend you have a look at the following for some modelling tips and tricks:

Modeling Airline Flights in Neo4j | Max De Marzi

pierrick.ganon · November 12, 2019, 3:12pm

Thanks @lju

That wouldn't work for us for example:

What if you're on November 1st and you need matches from last month (a day ago)? You would need to dynamically generate the 2 relationships names

MATCH_PROJECT_2019_10|MATCH_PROJECT_2019_11

Does the current modelling make sense though?

Should we just keep the structure and filter ProjectMatch by year and month?

(Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch{year: 2019})
WHERE pm.month = 10 OR pm.month = 11

Topic		Replies	Views
Graph Data Modeling Question Modeling performance , neo4j-desktop , modeling , data-modeling	12	1247	May 4, 2021
Is Neo4j good enough for data models relationship oriented? Cypher	5	1227	April 26, 2019
Heavy relationships vs Node hops. Which is better? Modeling performance	5	3676	May 27, 2021
[Modeling question] Graph matching algorithm used Modeling	4	540	April 29, 2020
Comparing Mutual Connections Performance Issue Modeling performance , cypher	2	458	April 8, 2020

Max Number of relationships to a node - Best Modelling

Related topics