Neo4j - Relationship between the node count is not matched with the data

babu_ganesh0708 · October 17, 2018, 8:13am

Hi All,

Neo4j version 3.4.1 community

I am trying to build a graph to find the most connected user for a specific topic. Below is the simple query I tried in order to find that user:

MATCH (p1:Username)-[:interest]->(p3:Topic)<-[:interest]-(p2:Username) WHERE(p3.topic STARTS WITH 'dc02 c drive high utilization')
RETURN *

Query response,

Below is the sample data I loaded:

date_time,keywords,message_subject,recipient_address,sender_address
2018-09-10T17:36:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,gavin.debeer@test.com,NarendraChoudaryB@test.com
2018-09-10T17:37:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,NarendraChoudaryB@test.com
2018-09-10T17:38:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,PrathamK@test.com,NarendraChoudaryB@test.com
2018-09-10T17:39:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,ArunR@test.com,NarendraChoudaryB@test.com
2018-09-10T17:40:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,RMCTeam@test.com,NarendraChoudaryB@test.com
2018-09-10T17:41:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,mike.blasberg@test.com,NarendraChoudaryB@test.com
2018-09-10T17:51:06.798Z,dc02 c drive high utilization,RE: PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,GauravSi1@test.com

In the sender address column, the user "Narendra" has sent mail to 6 different users, but in the graph the "interest" relationship is shown only 5 times.

Why is it shown like that? Please correct me if I am doing anything wrong.
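
One way to check the count independently of the graph visualization is to aggregate the relationships directly. This is a minimal sketch, assuming the labels and property names used in the query above; the aliases `user` and `interest_count` are only illustrative.

```cypher
// Count :interest relationships per user for the topic,
// so the numbers do not depend on what the browser draws.
MATCH (u:Username)-[r:interest]->(t:Topic)
WHERE t.topic STARTS WITH 'dc02 c drive high utilization'
RETURN u.mail_id AS user, count(r) AS interest_count
ORDER BY interest_count DESC
```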

Regards,
Ganeshbabu R

babu_ganesh0708 · October 17, 2018, 10:52am

I would also like to add one more thing here.

We tried loading different sender and recipient addresses instead of the data above, and in that case we see 6 interest relationships for Narendra in the graph. Below is the response,

I am stuck on understanding these relationships. Kindly share your thoughts; it would be really helpful.

Thanks,
Ganeshbabu R

andrew_bowman · October 17, 2018, 5:52pm

Can you provide the Cypher query used to generate the sample result data?

babu_ganesh0708 · October 18, 2018, 4:55am

Hi @andrew_bowman

Below is the query I used to load the sample data,

LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
WITH row WHERE row.message_subject <> 'none' AND row.keywords <> 'none'
MERGE (p1:Username {mail_id: row.sender_address}) ON CREATE SET p1.timestamp = row.date_time 
MERGE (p2:Username {mail_id: row.recipient_address})
MERGE (p3:Topic {topic: row.keywords})
WITH p1, p2, p3, row, COUNT(*) AS count
MERGE (p1)-[rel:sent]->(p2) ON CREATE SET rel.time = row.date_time
MERGE (p1)-[:interest]->(p3)<-[:interest]-(p2)
SET rel.count = count

To generate the sample result data we used Python, which creates a new CSV with these columns:
date_time,keywords,message_subject,recipient_address,sender_address

Sample result data

2018-09-10T17:36:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,gavin.debeer@test.com,NarendraChoudaryB@test.com
2018-09-10T17:37:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,NarendraChoudaryB@test.com
2018-09-10T17:38:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,PrathamK@test.com,NarendraChoudaryB@test.com
2018-09-10T17:39:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,ArunR@test.com,NarendraChoudaryB@test.com
2018-09-10T17:40:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,RMCTeam@test.com,NarendraChoudaryB@test.com
2018-09-10T17:41:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,mike.blasberg@test.com,NarendraChoudaryB@test.com
2018-09-10T17:51:06.798Z,dc02 c drive high utilization,RE: PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,GauravSi1@test.com

Thanks,
Ganeshbabu R

andrew_bowman · October 19, 2018, 12:15am

There's some buggy behavior in play here.

When I do your import in Neo4j 3.4.1 I get:

Added 9 labels, created 9 nodes, set 27 properties, created 19 relationships, completed after 107 ms.

12 :interest relationships are created.

When I do this in 3.4.8 I get:

Added 9 labels, created 9 nodes, set 25 properties, created 21 relationships, completed after 185 ms.

14 :interest relationships are created.

I'm not quite sure which bug is in play here, but in general it's important to be up to date with patch releases so you avoid buggy behavior that has since been fixed.
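
One reading of the 3.4.8 numbers, offered as an assumption rather than a confirmed diagnosis: MERGE either matches its entire pattern or creates the entire pattern. With both :interest relationships in a single MERGE, each of the 7 CSV rows creates two new relationships (7 x 2 = 14), so the sender accumulates parallel :interest relationships to the same topic. If one :interest relationship per user and topic is what is wanted, a sketch that merges each relationship separately could look like this (same file, labels, and properties as the load query earlier in the thread):

```cypher
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
WITH row WHERE row.message_subject <> 'none' AND row.keywords <> 'none'
MERGE (p1:Username {mail_id: row.sender_address})
  ON CREATE SET p1.timestamp = row.date_time
MERGE (p2:Username {mail_id: row.recipient_address})
MERGE (p3:Topic {topic: row.keywords})
MERGE (p1)-[rel:sent]->(p2)
  ON CREATE SET rel.time = row.date_time
// Two separate MERGE clauses: each relationship is matched or created
// on its own, so no parallel :interest relationships accumulate.
MERGE (p1)-[:interest]->(p3)
MERGE (p2)-[:interest]->(p3)
```

With this shape, the number of mails a user has sent is better read from the :sent relationships than from :interest, since each user keeps a single :interest relationship per topic.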

babu_ganesh0708 · October 19, 2018, 2:07am

Thanks @andrew_bowman

I will upgrade the Neo4j version, run the same query again, and share my feedback.

Regards,
Ganeshbabu R

ameyasoft · October 19, 2018, 4:14am

Hi,

Try this query:

LOAD CSV WITH HEADERS FROM "file:///data.csv" As row
WITH row

MERGE (p1:Username {mail_id: row.sender_address}) ON CREATE SET p1.timestamp = row.date_time
MERGE (p2:Username {mail_id: row.recipient_address})
MERGE (p3:Topic {topic: row.keywords})
WITH p1, p2, p3, row, COUNT(*) AS cnt
MERGE (p1)-[rel:sent]->(p2) ON CREATE SET rel.time = row.date_time,rel.count = cnt
MERGE (p1)-[:interest]->(p3)<-[:interest]-(p2);

Here is the result:
bganesh1

Also,

-Kamal

ameyasoft · October 19, 2018, 4:16am

Hi,

Sorry, I didn't finish the last sentence. The 'sent' relationship now has a count property, and it will always be 1 because COUNT(*) is evaluated separately for each row.
-Kamal
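
If the intent of rel.count is to record how many mails went from one sender to one recipient, one option is to increment it on each match instead of using COUNT(*). This is a sketch under that assumption, reusing the file, labels, and property names from the queries in this thread:

```cypher
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
MERGE (p1:Username {mail_id: row.sender_address})
MERGE (p2:Username {mail_id: row.recipient_address})
MERGE (p1)-[rel:sent]->(p2)
  // First mail between this pair: start the counter at 1.
  ON CREATE SET rel.time = row.date_time, rel.count = 1
  // Later mails between the same pair: add one.
  ON MATCH SET rel.count = rel.count + 1
```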

babu_ganesh0708 · October 22, 2018, 12:40pm

Hi @andrew_bowman,

I am trying the same in Neo4j version 3.4.9.

Below is the console response; I am getting the same result that I got in version 3.4.1:

Added 9 labels, created 9 nodes, set 29 properties, created 19 relationships, completed after 234 ms.

12 interest relationships are created.

I am using the community edition of 3.4.9. May I know which edition of Neo4j you tried?
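
To compare two installations on equal terms, counting relationships by type after the import is a quick check. This is a minimal sketch; the aliases `relationship_type` and `total` are illustrative, and it assumes the import was run against an empty database on both sides.

```cypher
// Break the "created N relationships" summary down by relationship type.
MATCH ()-[r]->()
RETURN type(r) AS relationship_type, count(r) AS total
ORDER BY relationship_type
```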

Regards,
Ganeshbabu R

babu_ganesh0708 · October 22, 2018, 12:45pm

Hi @ameyasoft

Yes, I tried the same but didn't get the expected output. Below is the response in the console,

Are you using the community edition of neo4j 3.4.9?

Let me know your thoughts.

Regards,
Ganeshbabu R

ameyasoft · October 22, 2018, 9:12pm

Hi,

My version was 3.3.1. I installed version 3.4.9 and got the same result as version 3.3.1. Here is the screenshot with version 3.4.9.

bganesh2

Make sure that you remove 'SET rel.count = count' from your query, if it is still there as in your original query.

If you are still not getting it right, send me your .csv file and I can check on my instance.

-Kamal

ameyasoft · October 22, 2018, 11:29pm

Hi,

I tried your original query in 3.4.9 and got the same result as with my query.
-Kamal
