Hi,
I am evaluating Neo4j 4.2 with data that has millions of nodes and relationships, but write performance is quite slow. My sample queries are given below:
create (session:Session {session_id: 'session_id1'})
;
match (s:Session) where s.session_id='session_id1' with s
create (e1:Event {insert_id: "insert_id1"}) set e1:SeenPage
create (s)-[:CONTAINS]->(e1)
create (s)-[:FIRST_EVENT]->(e1)
merge (pp1:Properties {value: "sample-url-1"}) set pp1:Page merge (e1)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e1)-[:RELATED_TO]->(pp2)
;
match (e1:SeenPage) where e1.insert_id='insert_id1' with e1
create (e2:Event {insert_id: "insert_id2"}) set e2:Show merge (e1)-[:NEXT]->(e2) with e2
match (s:Session) where s.session_id='session_id1' with s, e2
create (s)-[:CONTAINS]->(e2)
merge (pp1:Properties {value: "occasions"}) set pp1:Category merge (e2)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sample-url-2"}) set pp2:Page merge (e2)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e2)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "child category"}) set pp4:Sub_Category merge (e2)-[:RELATED_TO]->(pp4)
;
match (e2:Show) where e2.insert_id='insert_id2' with e2
create (e3:Event {insert_id: "insert_id3"}) set e3:SeenPage merge (e2)-[:NEXT]->(e3) with e3
match (s:Session) where s.session_id='session_id1' with s, e3
create (s)-[:CONTAINS]->(e3)
merge (pp1:Properties {value: "/p-page-0"}) set pp1:Page merge (e3)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e3)-[:RELATED_TO]->(pp2)
;
match (e3:SeenPage) where e3.insert_id='insert_id3' with e3
create (e4:Event {insert_id: "insert_id4"}) set e4:Show merge (e3)-[:NEXT]->(e4) with e4
match (s:Session) where s.session_id='session_id1' with s, e4
create (s)-[:CONTAINS]->(e4)
merge (pp1:Properties {value: "rect1"}) set pp1:Category merge (e4)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "/p-page-1"}) set pp2:Page merge (e4)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e4)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "him"}) set pp4:Sub_Category merge (e4)-[:RELATED_TO]->(pp4)
;
match (e4:Show) where e4.insert_id='insert_id4' with e4
create (e5:Event {insert_id: "insert_id5"}) set e5:SeenPage merge (e4)-[:NEXT]->(e5) with e5
match (s:Session) where s.session_id='session_id1' with s, e5
create (s)-[:CONTAINS]->(e5)
merge (pp1:Properties {value: "/p-page-2"}) set pp1:Page merge (e5)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e5)-[:RELATED_TO]->(pp2)
;
This data represents the journey of a user on a website. The user starts a session and browses pages. Each action taken by the user is recorded as an event, and each event has its own unique id. The sequence of events is then connected with the :NEXT relationship, and every event is attached to its session with :CONTAINS. Events are not unique, which is why I had to use create rather than merge. Properties of events are unique, so they are created as nodes and then linked to the event with a RELATED_TO relationship.
It's like this:
# session contains events
# events are connected with :NEXT
(Session)-[:CONTAINS]->(Event1)-[:NEXT]->(Event2)<-[:CONTAINS]-(Session)
A session can contain hundreds of events. The current write speed is quite slow: it takes about 4 hours to write 10k sessions, with each session containing roughly 10 events on average. I am writing the data event by event using the Python Bolt driver.
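For reference, my writer loop looks roughly like the sketch below. It is a minimal sketch using the official neo4j Python driver; the connection details and the sessions_to_load input structure are just placeholders, and it omits the :NEXT / :FIRST_EVENT relationships and the extra event labels, but the one-transaction-per-event pattern is the same as in my real code.

from neo4j import GraphDatabase

# placeholder connection details
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def write_event(tx, session_id, event):
    # attach the new event node to its session ...
    tx.run(
        "MATCH (s:Session {session_id: $session_id}) "
        "CREATE (e:Event {insert_id: $insert_id}) "
        "CREATE (s)-[:CONTAINS]->(e)",
        session_id=session_id, insert_id=event["insert_id"])
    # ... then merge each unique property node and link it to the event
    for value in event["properties"]:
        tx.run(
            "MATCH (e:Event {insert_id: $insert_id}) "
            "MERGE (p:Properties {value: $value}) "
            "MERGE (e)-[:RELATED_TO]->(p)",
            insert_id=event["insert_id"], value=value)

with driver.session() as db_session:
    for session_id, events in sessions_to_load:  # my input data, one entry per user session
        for event in events:
            # a separate write transaction for every event
            db_session.write_transaction(write_event, session_id, event)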
Any help would be really appreciated.