Performance Expectations - 55m nodes, 3000m relationships

Hey folks -
I'm new to graph databases and I'm trying to get an overall sense of performance and scalability and the knobs I can turn to speed things up. I have a model (image attached). I've indexed properties appropriately but I'm finding traversals joining across nodes and relationships are slow. Here's an example query:

	MATCH (cu:User)-[:IS]->(cc:Contact)<-[h:HAS]-(u:User)-[:IS]->(c:Contact)
	MATCH (cu)-[ph:HAS]->(pc:Contact)-[w:WORKS_FOR]->(co:Company)
	WHERE
		co.domain = $domain AND u.username = $username AND
		NOT u = cu AND NOT pc = c AND NOT pc = cc
	RETURN *, h.strength + ph.strength AS rstrength
	LIMIT $limit

Is there an equivalent of SQL compound indexes that would make these types of queries faster?

With 20GB of RAM and 9GB allocated to the page cache, results are still taking many minutes. PROFILE seems to most dislike these:
[image: PROFILE output]

As you get to systems with hundreds of millions of nodes and relationships, what are the typical strategies for providing good response times, even with low concurrency?

Hi @phil !

Just a couple of questions to better understand your model.

  1. What's the actual difference between User and Contact?
  2. What's the actual difference between HAS and IS?

About your question: yes, composite indexes do exist in Neo4j, but all of the indexed properties have to be on the same label.
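
To make that concrete, here is a minimal sketch. The index names and the `firstName` / `lastName` properties are hypothetical (they don't appear in your model), and the `IF NOT EXISTS` form assumes a reasonably recent Neo4j version. An index covers a single label or relationship type, so one index can't span both `User.username` and `Company.domain`; for the posted query, those two lookups would rely on separate single-property indexes. As far as I know, a composite index is also only picked up when the query has predicates on all of its properties.

```cypher
// Composite index: several properties, all on the same label.
// firstName / lastName are hypothetical property names used for illustration.
CREATE INDEX contact_full_name IF NOT EXISTS
FOR (c:Contact) ON (c.firstName, c.lastName);

// Separate single-property indexes for the two lookups in the posted query.
CREATE INDEX user_username IF NOT EXISTS
FOR (u:User) ON (u.username);

CREATE INDEX company_domain IF NOT EXISTS
FOR (co:Company) ON (co.domain);
```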

Could you also send the entire profile of your query?

Bennu

  1. What's the actual difference between User and Contact?
  • Users have logins (OAuth tokens, and permissions and privileges).
  • Contacts are people who users know.

2 Users can share a contact with a HAS relationship.

The IS relationship is unique between User and Contact.

Imagine you're using Google Contacts, you're the User, the people in your Contacts are the Contacts.

  2. What's the actual difference between HAS and IS?
The IS relationship is the contact information for the user. The HAS relationship says that the user knows the Contact.

Following on from above, every user also has contact information (title, email, etc.) for themselves. If 2 users share the same contact, there is only one Contact record. In the base case:

A User has an IS and a HAS relationship to their own Contact record, plus a HAS relationship to the Contact of each person they know.

The query I have answers the question: do I know any users who know someone (whom I don't know) who works at a company with the domain $domain?
I am $username (u). The other user is cu. The potential contact is pc.
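
Spelled out with that intent, the query might look something like the sketch below. It reuses the same labels, relationship types, variables, and parameters, but anchors on the two property lookups inline and adds an explicit "I don't already know pc" check, which is an assumption about the intent since the original query doesn't enforce it. It also assumes each User has exactly one IS relationship. Whether it changes the plan at all is something only PROFILE can confirm.

```cypher
// Sketch: same labels, relationship types, and parameters as the original query.
// u = me, cu = the other user, cc = cu's own contact record, pc = potential contact.
MATCH (u:User {username: $username})-[h:HAS]->(cc:Contact)<-[:IS]-(cu:User)
WHERE cu <> u
MATCH (cu)-[ph:HAS]->(pc:Contact)-[:WORKS_FOR]->(co:Company {domain: $domain})
WHERE pc <> cc
  AND NOT (u)-[:IS]->(pc)   // pc is not my own contact record
  AND NOT (u)-[:HAS]->(pc)  // assumption: "someone I don't know" = no HAS from me
RETURN cu, pc, co, h.strength + ph.strength AS rstrength
LIMIT $limit;
```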

What's the best way to get the full query profile? I can't seem to get the UIs to output something that fits on a page.

Can you give a specific example of how I'd construct an index that includes both nodes and relationships?

I'm already doing this:
CREATE INDEX userRelationshipProperty FOR ()-[r:HAS]-() ON (r.strength);

Do the user and contact have the same properties? Can you eliminate the contacts and just have users? Each contact could then be a HAS_CONTACT relationship pointing to another user. You could add a label to the User node to indicate those users that also have login privileges.

Just a thought. Maybe it doesn’t work for your application.
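
As a rough sketch of that idea, the query could then skip the IS hop entirely. The `:Login` label is an invented name for "users with login privileges", and this assumes `strength` carries over to HAS_CONTACT and that `username` stays on the login users.

```cypher
// Hypothetical remodel: every person is a :User node, HAS_CONTACT links users,
// and an extra label (here :Login, an invented name) marks users who can sign in.
MATCH (me:Login {username: $username})-[h:HAS_CONTACT]->(other:Login)
MATCH (other)-[ph:HAS_CONTACT]->(pc:User)-[:WORKS_FOR]->(co:Company {domain: $domain})
WHERE pc <> me
  AND NOT (me)-[:HAS_CONTACT]->(pc)  // assumption: only people I don't already know
RETURN other, pc, co, h.strength + ph.strength AS rstrength
LIMIT $limit;
```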
