Neo4j - Relationship between the node count is not matched with the data

cypher

(Babu Ganesh0708) #1

Hi All,

Neo4j version 3.4.1 community

I am trying to build a graph to find the most connected user for a specific topic. Below is the simple query I tried:

MATCH (p1:Username)-[:interest]->(p3:Topic)<-[:interest]-(p2:Username) WHERE(p3.topic STARTS WITH 'dc02 c drive high utilization')
RETURN *

Query response:

[image: query result graph]

Below is the sample data I loaded:

date_time,keywords,message_subject,recipient_address,sender_address
2018-09-10T17:36:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,gavin.debeer@test.com,NarendraChoudaryB@test.com
2018-09-10T17:37:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,NarendraChoudaryB@test.com
2018-09-10T17:38:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,PrathamK@test.com,NarendraChoudaryB@test.com
2018-09-10T17:39:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,ArunR@test.com,NarendraChoudaryB@test.com
2018-09-10T17:40:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,RMCTeam@test.com,NarendraChoudaryB@test.com
2018-09-10T17:41:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,mike.blasberg@test.com,NarendraChoudaryB@test.com
2018-09-10T17:51:06.798Z,dc02 c drive high utilization,RE: PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,GauravSi1@test.com

If you check the sender address in the data, the user "Narendra" has sent mail to 6 different users, but in the graph the "interest" relationship appears only 5 times.

Why does it show up like that? Please correct me if I am doing anything wrong.

Regards,
Ganeshbabu R


(Babu Ganesh0708) #2

I also want to add one more thing here.

We tried different sender and recipient addresses in place of the data above, and in that case we do see 6 interest relationships for Narendra in the graph.

I am stuck trying to understand these relationships. Your thoughts would be really helpful.

Thanks,
Ganeshbabu R


(Andrew Bowman) #3

Can you provide the Cypher query used to generate the sample result data?


(Babu Ganesh0708) #4

Hi @andrew.bowman

Below is the query I used to load the sample data,

LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
WITH row WHERE row.message_subject <> 'none' AND row.keywords <> 'none'
MERGE (p1:Username {mail_id: row.sender_address}) ON CREATE SET p1.timestamp = row.date_time 
MERGE (p2:Username {mail_id: row.recipient_address})
MERGE (p3:Topic {topic: row.keywords})
WITH p1, p2, p3, row, COUNT(*) AS count
MERGE (p1)-[rel:sent]->(p2) ON CREATE SET rel.time = row.date_time
MERGE (p1)-[:interest]->(p3)<-[:interest]-(p2)
SET rel.count = count

We used Python to generate the sample result data; it creates a new CSV with these columns:
date_time,keywords,message_subject,recipient_address,sender_address

Sample result data

2018-09-10T17:36:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,gavin.debeer@test.com,NarendraChoudaryB@test.com
2018-09-10T17:37:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,NarendraChoudaryB@test.com
2018-09-10T17:38:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,PrathamK@test.com,NarendraChoudaryB@test.com
2018-09-10T17:39:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,ArunR@test.com,NarendraChoudaryB@test.com
2018-09-10T17:40:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,RMCTeam@test.com,NarendraChoudaryB@test.com
2018-09-10T17:41:48.823Z,dc02 c drive high utilization,PDX-VWIN-DC02 C drive High utilization,mike.blasberg@test.com,NarendraChoudaryB@test.com
2018-09-10T17:51:06.798Z,dc02 c drive high utilization,RE: PDX-VWIN-DC02 C drive High utilization,Lisa.Sorenson@test.com,GauravSi1@test.com

Thanks,
Ganeshbabu R


(Andrew Bowman) #5

There's some buggy behavior in play here.

When I do your import in Neo4j 3.4.1 I get:

Added 9 labels, created 9 nodes, set 27 properties, created 19 relationships, completed after 107 ms.

12 :interest relationships are created.

When I do this in 3.4.8 I get:

Added 9 labels, created 9 nodes, set 25 properties, created 21 relationships, completed after 185 ms.

14 :interest relationships are created.

I'm not quite sure which bug is in play here, but in general it's important to be up to date with patch releases so you avoid buggy behavior that has since been fixed.
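
One way to check counts like the ones above is a per-user aggregate. This is only a minimal sketch that reuses the labels, relationship type, and property names from the queries in this thread; the aliases `user` and `interest_count` are made up for illustration.

```cypher
// Count :interest relationships per user for the topic in question.
MATCH (u:Username)-[r:interest]->(t:Topic)
WHERE t.topic STARTS WITH 'dc02 c drive high utilization'
// One result row per user; summing interest_count gives the overall total.
RETURN u.mail_id AS user, count(r) AS interest_count
ORDER BY interest_count DESC
```

Because `MERGE` on a multi-relationship pattern matches or creates the pattern as a whole, each of the seven sample rows (all with distinct sender/recipient pairs) would be expected to add two :interest relationships. That is consistent with the 14 reported for 3.4.8.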


(Babu Ganesh0708) #6

Thanks @andrew.bowman

I will upgrade Neo4j, run the same query again, and share my feedback.

Regards,
Ganeshbabu R


(Ameyasoft) #7

Hi,

Try this query:

LOAD CSV WITH HEADERS FROM "file:///data.csv" As row
WITH row

MERGE (p1:Username {mail_id: row.sender_address}) ON CREATE SET p1.timestamp = row.date_time
MERGE (p2:Username {mail_id: row.recipient_address})
MERGE (p3:Topic {topic: row.keywords})
WITH p1, p2, p3, row, COUNT(*) AS cnt
MERGE (p1)-[rel:sent]->(p2) ON CREATE SET rel.time = row.date_time,rel.count = cnt
MERGE (p1)-[:interest]->(p3)<-[:interest]-(p2);

Here is the result:
[image: bganesh1]

Also,

-Kamal


(Ameyasoft) #8

Hi,

Sorry, I didn't finish the last sentence. The 'sent' relationship now has a count property, and it will always be 1 because the count is evaluated per row.
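
To illustrate the per-row point: in the load query, `row` is one of the grouping keys in the `WITH`, so each distinct CSV row forms its own group and `COUNT(*)` comes out as 1. Below is a hedged sketch of an alternative, assuming the goal is the number of emails per sender/recipient pair. It groups by the two addresses instead of the whole row. The aliases `sender`, `recipient`, `cnt`, and `first_time` are illustrative names, not from the original query, and the Topic and :interest parts are left out to keep it short.

```cypher
// Sketch only: count emails per (sender, recipient) pair before merging.
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
WITH row WHERE row.message_subject <> 'none' AND row.keywords <> 'none'
// Grouping keys are the two addresses (not row), so count(*) can exceed 1.
WITH row.sender_address AS sender,
     row.recipient_address AS recipient,
     count(*) AS cnt,
     min(row.date_time) AS first_time  // earliest timestamp for the pair
MERGE (p1:Username {mail_id: sender})
MERGE (p2:Username {mail_id: recipient})
MERGE (p1)-[rel:sent]->(p2)
  ON CREATE SET rel.time = first_time
SET rel.count = cnt
```

In the sample data every sender/recipient pair occurs only once, so `rel.count` would still be 1 for each pair; the value only grows when a pair repeats. Re-running the load overwrites `rel.count` with the count from that file.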
-Kamal


(Babu Ganesh0708) #9

Hi @andrew.bowman,

I tried the same in Neo4j version 3.4.9.

Below is the console response; I am getting the same result as in version 3.4.1:

Added 9 labels, created 9 nodes, set 29 properties, created 19 relationships, completed after 234 ms.

12 interest relationships are created.

I am using the community edition of 3.4.9. May I know which edition of Neo4j you used?

Regards,
Ganeshbabu R


(Babu Ganesh0708) #10

Hi @ameyasoft

Yes, I tried the same but didn't get the expected output. Below is the response in the console:

Are you using the community edition of neo4j 3.4.9?

Let me know your thoughts.

Regards,
Ganeshbabu R


(Ameyasoft) #11

Hi,

My version was 3.3.1. I installed version 3.4.9 and got the same result as version 3.3.1. Here is the screenshot with version 3.4.9.

[image: bganesh2]

Make sure that you remove 'SET rel.count = count' from your query if it is still there, as in your original query.

If you are still not getting it right, send me your .csv file and I can check it on my instance.

-Kamal


(Ameyasoft) #12

Hi,

I tried your original query in 3.4.9 and got the same result as with my query.
-Kamal