Querying a large graphDb

wumirose · June 10, 2022, 10:50am

Hi, great minds! I am new to Neo4j and am currently exploring an existing graph to extract data for downstream tasks.

I would like to get all pairs of nodes and their relationships from the graph:

MATCH (n)-[r]-(n1) WHERE n <> n1 AND id(n) < id(n1) RETURN *

This returns about 12,726,288 estimated rows.
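Why a filter on node ids is needed at all: an undirected pattern such as `(n)-[r]-(n1)` matches every relationship twice, once starting from each end. Keeping only the match where `id(n) < id(n1)` drops the mirrored copy. The pure-Python sketch below imitates this on a hypothetical three-relationship toy graph. `EDGES`, `undirected_matches` and `dedup_by_id` are illustrative names, not Neo4j APIs.

```python
from typing import List, Tuple

# Hypothetical toy graph: undirected relationships as (node_a, node_b, rel_id).
EDGES: List[Tuple[int, int, str]] = [(1, 2, "r1"), (2, 3, "r2"), (3, 1, "r3")]


def undirected_matches(edges):
    """Mimic MATCH (n)-[r]-(n1): each relationship is matched from both ends."""
    rows = []
    for a, b, rel in edges:
        rows.append((a, rel, b))  # relationship traversed a -> b
        rows.append((b, rel, a))  # same relationship traversed b -> a
    return rows


def dedup_by_id(rows):
    """Mimic WHERE id(n) < id(n1): keep one of the two mirrored rows."""
    return [(n, rel, n1) for n, rel, n1 in rows if n < n1]


all_rows = undirected_matches(EDGES)
unique_rows = dedup_by_id(all_rows)
print(len(all_rows), len(unique_rows))  # 6 3
```

One caveat applies when the two endpoints are constrained to different node types, as in the typed query that follows. In that case each relationship matches only once, with `n` always being the typeA node. An `id(n) < id(n1)` filter then no longer removes duplicates. Instead, it discards the pairs whose typeA node has the larger id.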

Instead, I decided to extract the pairwise information between two node types:

MATCH (n:Node {type: 'nodetypeA'})-[r]-(n1:Node {type: 'nodetypeB'}) WHERE n <> n1 AND id(n) < id(n1) RETURN *

This query has about 653,022 estimated rows, but sadly Neo4j keeps timing out. I have increased the connection timeout (ms) in the Neo4j Browser settings, but nothing changes.

Any suggestion will be highly appreciated.

bennu_neo · June 10, 2022, 2:09pm

Hello @wumirose !

Why do you think this is a timeout problem? It may instead be an out-of-memory (OOM) rendering problem in Neo4j Desktop, because you may be asking for too much data to be displayed. Have you tried a driver of your choice? Personally, I have used SDN6 (Spring Data Neo4j 6) with WebFlux without problems.

Bennu

wumirose · June 12, 2022, 11:20pm

I agree; the 653,022 rows are probably too many to extract.

Thanks a lot for your suggestion about SDN6 and WebFlux; it's my first time learning a bit about reactive programming. However, the reactive clients don't appear to support Python. I currently run my queries in a Python environment and connect to Neo4j with py2neo.

Any further suggestions will be greatly appreciated.

bennu_neo · June 13, 2022, 10:30pm

Hi @wumirose

Have you tried the official Python driver (https://neo4j.com/docs/api/python-driver/current/api.html#graphdatabase)? AFAIK, this one should stream the results.

Try something like:

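from neo4j import GraphDatabase

user = "yourUsername"
password = "yourPassword"
uri = "yourUri"  # e.g. "neo4j://localhost:7687"

query = (
    "MATCH (n:Node {type: $typeA})-[r]-(n1:Node {type: $typeB}) "
    "WHERE n <> n1 AND id(n) < id(n1) "
    "RETURN n AS node1, n1 AS node2, r AS rel"
)

driver = GraphDatabase.driver(uri, auth=(user, password))
with driver.session() as session:
    result = session.run(query, typeA="nodetypeA", typeB="nodetypeB")  # node types passed as parameters
    # Records are fetched from the server in batches as the loop consumes them
    for record in result:
        print("node1 {}".format(record["node1"]))
driver.close()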

Lemme know how it goes
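Streaming avoids holding the whole result in memory. For a real export, the `print` is usually replaced by writing each record out as it arrives. The following is a minimal sketch that assumes the query is changed to return scalar columns, for example `RETURN n.name AS name1, n1.name AS name2, type(r) AS rel_type`. Those column names and the helper `write_records_csv` are hypothetical. The sketch relies only on `record[key]` lookups, which work both for the driver's `Record` objects and for plain dicts.

```python
import csv
from typing import Any, Iterable, Sequence, TextIO


def write_records_csv(records: Iterable[Any], columns: Sequence[str], out: TextIO) -> int:
    """Write records to CSV one at a time and return how many were written.

    Each row is written as soon as it is consumed from `records`, so this
    function never accumulates the result set in a list.
    """
    writer = csv.writer(out)
    writer.writerow(columns)  # header row
    count = 0
    for record in records:
        # record[col] works for neo4j Record objects and for plain dicts
        writer.writerow([record[col] for col in columns])
        count += 1
    return count


# Hypothetical usage with a driver session like the one in the post above:
# with driver.session() as session:
#     result = session.run("MATCH ... RETURN n.name AS name1, n1.name AS name2, type(r) AS rel_type")
#     with open("pairs.csv", "w", newline="", encoding="utf-8") as f:
#         write_records_csv(result, ["name1", "name2", "rel_type"], f)
```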

wumirose · June 14, 2022, 3:06pm

The API does the trick! I'm so happy right now.

Thank you so much for your help, @bennu_neo. It means a lot!

bennu_neo · June 14, 2022, 4:22pm

@wumirose you are welcome! Enjoy it!

wumirose · June 16, 2022, 7:18pm

I noticed that the query gives only the first-order connection between 2 nodes; however, I will need at least the second-order relationship for my downstream application. I have tried:

MATCH (n)-[r*1..2]-(n1)
WHERE n <> n1 AND id(n) < id(n1)
WITH n.name AS Name1, n1.name AS Name2, r AS rel
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId

This query has about 2 million estimated rows.

I would like to skip some rows so that I end up with fewer than 1 million (~500,000). I have experimented with clauses such as SKIP and LIMIT, but I can't seem to get a useful result.
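The extra rows come from two sources. First, `[*1..2]` matches both direct (1-hop) and 2-hop paths, and a single path never reuses a relationship. Second, UNWIND emits one row per relationship on each path. The following pure-Python imitation runs on a hypothetical 4-node chain; the graph and `paths_1_to_2` are illustrative only. Three relationships already give five paths, which become seven rows once each path's relationships are unwound.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy graph: a chain 1 - 2 - 3 - 4, as (node_a, node_b, rel_id).
EDGES: List[Tuple[int, int, str]] = [(1, 2, "r1"), (2, 3, "r2"), (3, 4, "r3")]


def paths_1_to_2(edges):
    """Mimic MATCH (n)-[r*1..2]-(n1) WHERE id(n) < id(n1) on an undirected edge list."""
    adj: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for a, b, rel in edges:
        adj[a].append((rel, b))
        adj[b].append((rel, a))

    paths = []
    for start in list(adj):
        for rel1, mid in adj[start]:
            paths.append((start, mid, [rel1]))  # 1-hop path
            for rel2, end in adj[mid]:
                # Within one path, Cypher never traverses the same relationship twice
                if rel2 != rel1:
                    paths.append((start, end, [rel1, rel2]))  # 2-hop path
    # Keep one direction of each path, like WHERE id(n) < id(n1)
    return [p for p in paths if p[0] < p[1]]


for path in paths_1_to_2(EDGES):
    print(path)
```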

Your suggestions will be greatly appreciated.
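One way to end up with a fixed number of rows (say 500,000) without relying on SKIP/LIMIT is to sample the streamed result on the client. This uses a swapped-in technique, reservoir sampling (Algorithm R), which was not suggested in this thread. The helper name `reservoir_sample` is hypothetical. The helper keeps only `k` rows in memory and gives every streamed row the same chance of being kept, even when the total row count is unknown in advance.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream (Algorithm R).

    If the stream has fewer than k items, all of them are returned in order.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = row  # row i replaces a kept row with probability k / (i + 1)
    return reservoir


# Hypothetical usage inside a driver session:
# sample = reservoir_sample(session.run(query), 500_000)
```

The full query still runs, and all rows still travel to the client; sampling only bounds client memory. To reduce the data on the server instead, a Cypher filter such as `WHERE rand() < 0.25` keeps roughly a quarter of the rows, though not an exact count.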

bennu_neo · June 17, 2022, 1:31pm

Hi @wumirose !

This is technically another question, but let's do it.

In general, I'm not a fan of streaming a whole-DB export like this, but if it works for you, it's fine. Can you try a query like this?

MATCH p = (n)-[*1..2]-(n1) 
WHERE id(n)<id(n1) 
WITH n.name as Name1, n1.name as Name2, relationships(p) as rel
SKIP 10
LIMIT 10
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId

Keep in mind that SKIP and LIMIT apply at the WITH step. They count matched paths (one row per path), not the rows produced later by UNWIND.
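The following makes the point about WITH concrete. SKIP and LIMIT there count one row per matched path, and UNWIND then emits one row per relationship in each kept path. As a result, `LIMIT 10` over 1..2-hop paths can yield up to 20 output rows. This pure-Python imitation uses hypothetical data and an illustrative helper name.

```python
from itertools import islice
from typing import List, Tuple

# Hypothetical matched paths: (name1, name2, [relationship ids along the path]).
Path = Tuple[str, str, List[str]]

PATHS: List[Path] = [
    ("a", "b", ["r1"]),
    ("a", "c", ["r1", "r2"]),
    ("b", "d", ["r3", "r4"]),
    ("c", "d", ["r5"]),
]


def with_skip_limit_unwind(paths: List[Path], skip: int, limit: int) -> List[Tuple[str, str, str]]:
    """Mimic WITH ... SKIP skip LIMIT limit, then UNWIND rel AS rl."""
    kept = islice(paths, skip, skip + limit)  # SKIP/LIMIT act on whole paths
    # UNWIND: one output row per relationship of each kept path
    return [(n1, n2, rel) for n1, n2, rels in kept for rel in rels]


rows = with_skip_limit_unwind(PATHS, skip=1, limit=2)
print(len(rows))  # 4: two kept paths with two relationships each
```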

wumirose · June 17, 2022, 2:45pm

Actually, the match is between 2 node types, not the whole db😉.

MATCH p = (n:Node {type: 'typeA'})-[*1..2]-(n1:Node {type: 'typeB'})

This is so helpful! I hadn't explored using SKIP and LIMIT before RETURN. Thanks a bunch for that.
