I am new to Neo4j, to graph databases and, well, to databases in general. I am trying to return results from a query against my database, with no luck so far. Here is my use case:
I'm doing some IAM analysis, looking at the users on a platform and the groups they are a part of. My data set has 300K nodes (users and groups) and 10M relationships. What I need is a list of every user node along with every group node it is related to. Below is the query I wrote:
// Gather all user group membership
MATCH (u:User)
WITH u.dn AS n
MATCH (u {dn:n})-[*]->(g:Group)
RETURN DISTINCT u.dn, g.dn
Line 1 - Gather all users.
Line 2 - Bind each user node's "dn" property to the variable n. The DN is the unique name of every user node.
Line 3 - Find every user and its group memberships. The * in the relationship pattern accounts for membership of unknown depth, e.g. a user is a member of a group, that group is a member of another group, that second group is a member of a third group, and so on.
Line 4 - Return the unique names of all the users and groups (I don't need a graph at this point, just a table).
When I run this query, the Neo4j Browser crashes. From my research, I understand that Neo4j isn't meant to return millions of rows. My heap is currently 2 GB and my page cache is 1 GB. So, as a noob here, what are my options? Is there a better way to write this query? Thanks!
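One likely reason this query is so heavy: a variable-length pattern such as -[*]-> can match every distinct path between a user and a group, and DISTINCT only removes the duplicate (user, group) pairs afterwards. The small Python model below illustrates the gap on a made-up, acyclic nesting in which every group in one layer is a member of both groups in the next layer. It uses plain dictionaries, not Neo4j, and all node names are hypothetical; depending on the Neo4j version and query plan, the planner may be able to prune some of this work.

```python
from collections import deque


def layered_groups(layers, width=2):
    """Hypothetical nesting: every group in layer i is a member of every group in layer i+1."""
    edges = {"user": [f"g0_{j}" for j in range(width)]}
    for i in range(layers):
        nxt = [f"g{i + 1}_{j}" for j in range(width)] if i + 1 < layers else []
        for j in range(width):
            edges[f"g{i}_{j}"] = list(nxt)
    groups = {f"g{i}_{j}" for i in range(layers) for j in range(width)}
    return edges, groups


def count_paths(start, edges, memo=None):
    """Count directed paths of length >= 1 from start (what -[*]-> can enumerate).

    Assumes the graph is acyclic; memoisation only speeds up the counting.
    """
    if memo is None:
        memo = {}
    if start in memo:
        return memo[start]
    total = 0
    for nxt in edges.get(start, []):
        total += 1 + count_paths(nxt, edges, memo)
    memo[start] = total
    return total


def reachable_groups(start, edges, groups):
    """Breadth-first search with a visited set: each node is expanded only once."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen & groups


edges, groups = layered_groups(10)
print(count_paths("user", edges))                     # 2046 paths
print(len(reachable_groups("user", edges, groups)))   # 20 distinct groups
```

With only 20 groups, this single user already has 2,046 distinct paths; each extra layer roughly doubles the path count, while the number of answers grows by just two.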
Hi! As Dana said, the Browser's visualization doesn't handle rendering that much data well. However, there are a few things you can do to reduce the work the query and the rendering have to do.
You don't need the initial MATCH on all users; you can specify the whole pattern at once, as below. Your current statement selects every user in the graph, then passes all of them to the next part of the query (finding groups), where the label-less pattern (u {dn:n}) has to look each user up again by property, with no label to narrow the search. If you specify the whole pattern in the first line, Cypher doesn't have to work as hard and can filter out unnecessary parts of the pattern early on.
MATCH (u:User)-[*]->(g:Group)
You can also limit the number of hops. For instance, if people belong to groups that belong to other groups, you could traverse only up to 10 or 20 hops. I'm not sure how valuable a 30th-level subgroup would be for your use case (or whether subgroups that deep even exist in your data).
MATCH (u:User)-[*..10]->(g:Group)
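If a cap is acceptable, the effect of -[*..10]-> on which groups come back can be modelled with a breadth-first search that stops expanding at the hop limit. This is a sketch on a hypothetical 30-level chain of nested groups, not a description of how Neo4j executes the query.

```python
from collections import deque


def groups_within(start, edges, groups, max_hops):
    """Groups reachable from start in at most max_hops hops (like -[*..max_hops]->)."""
    depth = {start: 0}  # BFS records the shortest hop count to each node
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in edges.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return {n for n in depth if n in groups}


# Hypothetical chain: user -> g1 -> g2 -> ... -> g30
edges = {"user": ["g1"], **{f"g{i}": [f"g{i + 1}"] for i in range(1, 30)}}
groups = {f"g{i}" for i in range(1, 31)}

print(len(groups_within("user", edges, groups, 10)))   # 10
print(len(groups_within("user", edges, groups, 100)))  # 30
```

A group reachable by some path of at most 10 hops is exactly one whose shortest distance is at most 10, which is why a plain BFS with a depth check returns the same set of groups.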
You can also aggregate by user, so you see all the groups for each user in a single row. This reduces the number of rows returned for rendering, and I think it's easier to read. :)
MATCH (u:User)-[*..10]->(g:Group)
WITH u.dn AS userDn, collect(DISTINCT g.dn) AS groups
RETURN userDn, groups
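For reference, here is what per-user aggregation does to the row count, modelled in Python on made-up rows. Using a set per user mirrors collect(DISTINCT ...); with a plain collect(), a group reached by several paths would appear several times in a user's list.

```python
from collections import defaultdict


def collect_by_user(rows):
    """Group (user_dn, group_dn) rows into one entry per user, like collect(DISTINCT g.dn)."""
    grouped = defaultdict(set)
    for user_dn, group_dn in rows:
        grouped[user_dn].add(group_dn)
    # Sorted only for stable output; Cypher's collect() keeps whatever order rows arrive in.
    return {user: sorted(gs) for user, gs in grouped.items()}


# Hypothetical rows, including a duplicate pair reached via two different paths
rows = [("alice", "admins"), ("alice", "staff"), ("alice", "admins"), ("bob", "staff")]
print(collect_by_user(rows))  # {'alice': ['admins', 'staff'], 'bob': ['staff']}
```

Four input rows become two output rows, one per user.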
I will explore this option. Honestly, I would much rather use a shell than the browser. Is there a way to export the results to CSV? I tried APOC's CSV export, but it did not work as I expected.
Thanks! Again, I'm a noob when it comes to this. The only thing is that I need the * on the relationship, since the number of hops will always be unknown. I plan on reusing this code for other platforms and situations. I agree that, ideally, users shouldn't be nested 30 groups deep, but I want to make sure I can account for that.
I am running the code now and will let you know how it works out. Thanks again!
It took hours to write a couple of kilobytes to the CSV. Basically, I took the provided code and wrapped the APOC command around it. Maybe I didn't get the syntax right.
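One way to skip the browser entirely is a small script that streams the results to a file. The sketch below assumes the official Neo4j Python driver (pip install neo4j), a Bolt URI of bolt://localhost:7687 and placeholder credentials; the column names and the ';' separator for the group list are arbitrary choices. Rows are written as the driver streams them, so nothing has to be rendered, although the query itself still has to do all of the traversal work.

```python
import csv

QUERY = """
MATCH (u:User)-[*]->(g:Group)
RETURN u.dn AS user, collect(DISTINCT g.dn) AS groups
"""


def write_rows(records, out):
    """Write one CSV row per user; the groups are joined with ';'. Returns the row count."""
    writer = csv.writer(out)
    writer.writerow(["user", "groups"])
    count = 0
    for record in records:
        writer.writerow([record["user"], ";".join(record["groups"])])
        count += 1
    return count


def export(uri, user, password, path):
    """Stream the query results from Neo4j straight into a CSV file."""
    # Imported here so write_rows can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            with open(path, "w", newline="", encoding="utf-8") as fh:
                return write_rows(session.run(QUERY), fh)
```

For example: export("bolt://localhost:7687", "neo4j", "your-password", "memberships.csv").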
I have the same problem: the browser crashes when displaying 10K relationships.
When I use apoc.gephi.add as follows (the labels 用户 and 回帖 mean "user" and "reply"):
MATCH path = (:用户)-[:回帖]->(:用户)
CALL apoc.gephi.add(null,'workspace1',path) yield nodes
return *
it returns the following error:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.gephi.add: Caused by: java.lang.RuntimeException: Can't read url or key http://localhost:8080/workspace1?operation=updateGraph as json: Connection refused: connect
@weilong, that looks like a different error. It indicates that APOC can't read http://localhost:8080/workspace1?operation=updateGraph because the connection was refused, which usually means nothing (for example, Gephi's streaming server) is listening on that port. It is not related to displaying millions of nodes in the browser.
Thanks, everyone, for the help on this one. I found that using UNWIND together with the APOC command worked best. I really appreciate everyone's involvement in helping me figure this out!
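"Connection refused" means nothing accepted a TCP connection at localhost:8080, the address in the error. A quick stdlib-only Python check, run on the same machine as Neo4j before retrying apoc.gephi.add, could look like this; the host and port come from the error message, and the helper name is hypothetical.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False


print(is_listening("localhost", 8080))
```

If this prints False, apoc.gephi.add will keep failing with the same error until something is serving on that port.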