Performance Expectations - 55m nodes, 3000m relationships

phil · May 25, 2022, 11:57pm

Hey folks -
I'm new to graph databases and I'm trying to get an overall sense of performance and scalability and the knobs I can turn to speed things up. I have a model (image attached). I've indexed properties appropriately but I'm finding traversals joining across nodes and relationships are slow. Here's an example query:

	MATCH (cu:User)-[:IS]->(cc:Contact)<-[h:HAS]-(u:User)-[:IS]->(c:Contact)
	MATCH (cu)-[ph:HAS]->(pc:Contact)-[w:WORKS_FOR]->(co:Company)
	WHERE
		co.domain = $domain AND u.username = $username AND
		u <> cu AND pc <> c AND pc <> cc
	RETURN *, h.strength + ph.strength AS rstrength
	LIMIT $limit

Is there an equivalent of SQL compound indexes that would make these types of queries faster?

With 20GB of RAM and 9GB allocated to the page cache, results are still taking many minutes. PROFILE seems to most dislike these:

As you get to systems with hundreds of millions of nodes and relationships, what are the typical strategies for providing good response times, even with low concurrency?

bennu_neo · May 26, 2022, 11:19am

Hi @phil !

Just a couple of questions to better understand your model.

1. What's the actual difference between User and Contact?
2. What's the actual difference between HAS and IS?

About your question: yes, composite indexes do exist in Neo4j, and they have to be defined on properties of the same label.
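
As a rough illustration, here is a sketch of a composite index on a single label. The index name and the choice of email and title as the indexed properties are assumptions made for the example. An index like this covers one label (or one relationship type) only, so it cannot span the User, Contact and Company nodes of a single pattern.

```cypher
// Hypothetical composite index on two Contact properties.
// The index name and the property choice are illustrative assumptions.
CREATE INDEX contact_email_title IF NOT EXISTS
FOR (c:Contact) ON (c.email, c.title);
```

The planner generally only considers a composite index when the query has predicates on all of the indexed properties.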

Could you also send the entire profile of your query?
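
For reference, a profile is produced by prefixing the query with PROFILE, which runs it and reports rows and db hits per operator. A minimal sketch using the query from the first post:

```cypher
// PROFILE executes the query and annotates each operator with rows and db hits.
PROFILE
MATCH (cu:User)-[:IS]->(cc:Contact)<-[h:HAS]-(u:User)-[:IS]->(c:Contact)
MATCH (cu)-[ph:HAS]->(pc:Contact)-[w:WORKS_FOR]->(co:Company)
WHERE co.domain = $domain AND u.username = $username
  AND u <> cu AND pc <> c AND pc <> cc
RETURN *, h.strength + ph.strength AS rstrength
LIMIT $limit
```

Using EXPLAIN in place of PROFILE shows the planned operators without executing the query, which can be handy when the query itself takes minutes.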

Bennu

phil · May 26, 2022, 3:58pm

What's the actual difference between User and Contact?

Users have logins (OAuth tokens, and permissions and privileges).
Contacts are people who users know.

Two users can share a contact with a HAS relationship.

The IS relationship is unique between User and Contact.

Imagine you're using Google Contacts, you're the User, the people in your Contacts are the Contacts.

2. What's the actual difference between HAS and IS?
The IS relationship is the contact information for the user. The HAS relationship says that the user knows the Contact.

Following on from above, every user also has contact information (title, email, etc.) for themselves. If 2 users share the same contact, there is only one Contact record. In the base case:

Each User has an IS and a HAS relationship to their own Contact record, and additionally a HAS relationship to the Contact of each person they know.

The query answers the question: do I know any users who know someone (whom I don't know) who works at a company with the given $domain?
I am the $username. The other user is cu. The potential contact is pc.
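
Read that way, the pattern has two selective starting points: the User with the given username and the Company with the given domain. Below is a sketch of the same query written so that both lookups come first and the expansion runs between them. It keeps the original variable names and should return the same rows, but it is untested against this dataset, and the planner is free to choose its own order regardless of how the query is written.

```cypher
// Anchor on the two property lookups, then connect them.
MATCH (u:User {username: $username})-[:IS]->(c:Contact)
MATCH (co:Company {domain: $domain})<-[w:WORKS_FOR]-(pc:Contact)
MATCH (u)-[h:HAS]->(cc:Contact)<-[:IS]-(cu:User)-[ph:HAS]->(pc)
WHERE cu <> u AND pc <> c AND pc <> cc
RETURN *, h.strength + ph.strength AS rstrength
LIMIT $limit
```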

What's the best way to get the full query profile? I can't seem to get the UIs to output something that fits on a page.

Can you give a specific example of how I'd construct an index that includes both nodes and relationships?

I'm already doing this:
CREATE INDEX userRelationshipProperty FOR ()-[r:HAS]-() ON (r.strength);
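
An index on strength only helps when a query filters or sorts on that property, and the query above only reads it in RETURN. The predicates that anchor the query are u.username and co.domain, so the node indexes it relies on would look something like the sketch below. The index names are made up for the example, and IF NOT EXISTS leaves things unchanged if equivalent indexes already exist.

```cypher
// Single-property node indexes for the two lookups in the WHERE clause.
// Index names are illustrative assumptions.
CREATE INDEX user_username IF NOT EXISTS
FOR (u:User) ON (u.username);

CREATE INDEX company_domain IF NOT EXISTS
FOR (co:Company) ON (co.domain);
```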

glilienfield · May 26, 2022, 8:30pm

Do the user and contact have the same properties? Can you eliminate the contacts and just have users? You can have relationship HAS_CONTACT back to a user for a contact. You could add a label to the User node to indicate those users that also have login privileges.
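
To make the idea concrete, here is a rough sketch of the lookup under that merged model. It assumes every person is a User node, that users who can log in carry an extra label (called Login here, a made-up name), that HAS_CONTACT carries the strength property, and that WORKS_FOR now hangs off User. None of this is taken from the actual schema.

```cypher
// Hypothetical merged model: Contact nodes are folded into User,
// and :Login marks the users who can sign in.
MATCH (me:User:Login {username: $username})-[h:HAS_CONTACT]->(other:User:Login)
MATCH (other)-[ph:HAS_CONTACT]->(pc:User)-[:WORKS_FOR]->(co:Company {domain: $domain})
WHERE other <> me AND pc <> me AND pc <> other
RETURN other, pc, co, h.strength + ph.strength AS rstrength
LIMIT $limit
```

This removes the two IS hops from the traversal, at the cost of mixing login accounts and plain contacts under one label.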

Just a thought. Maybe it doesn’t work for your application.
