Find same sequences of nodes in a graph with multiple chains of nodes (Sequential Pattern Mining algorithm)

Given a graph with several letters with every letter modelled as a chain of word nodes connected with NEXT-edges. Is there a way to find sequences of nodes of a given length covering the same text without setting any start node for the query ?
Bests
Andreas

Hello @andreas.kuczera :slight_smile:

Are you looking for a Sequential Pattern Mining algorithm?

Regards,
Cobra

Yes, sounds good to me ;-)

I'm also looking for one ...

I asked Neo4j if they were going to add any in GDS in the future but they told me it was not planned. I know there is a pattern mining library in R but I am looking for a Cypher solution. I haven't started to think about how to do it, but maybe we can start to find a Cypher solution here.

Yes, we can start here.
My first try was to query with fixes path length.

What is your current cypher query?

I found a few topics which could be interested:

I understand the logic but I wonder if it is optimize for big graphs :confused:

There was an approach by Stefan Armbruster but it wasn't merged to apoc: Commits · sarmbruster/neo4j-apoc-procedures · GitHub

I don't know if I'm ready to develop my own plugin :sweat_smile: but it will definitely be quicker in java than in full cypher.

Cypher might be to slow ... thats why apoc could be a good solution.

Hi all!

If you have a small portion of the data with some examples I may like helping with a Pregel Implementation.

Bennu

Looks like someone developed its own plugin with the Apriori algorithm.

Hello @Bennu :slight_smile:

Thank you, I will try to create a good dataset this afternoon :slight_smile:

Regards,
Cobra

You can use this dataset from this source, it's a little social network dataset. Here is the cypher queries to create it:

CREATE (nAlice:User {id:'Alice', name:'Node Alice'})
,(nBridget:User {id:'Bridget', name:'Node Bridget'})
,(nCharles:User {id:'Charles', name:'Node Charles'})
,(nDoug:User {id:'Doug', name:'Node Doug'})
,(nMark:User {id:'Mark', name:'Node Mark'})
,(nNancy:User {id:'Nancy', name:'Node Nancy'})
,(nOliver:User {id:'Oliver', name:'Node Oliver'})

CREATE (nAlice)-[:FOLLOWS]->(nBridget)
,(nAlice)-[:FOLLOWS]->(nCharles)
,(nAlice)-[:FOLLOWS]->(nDoug)
,(nBridget)-[:FOLLOWS]->(nAlice)
,(nBridget)-[:FOLLOWS]->(nCharles)
,(nBridget)-[:FOLLOWS]->(nDoug)
,(nCharles)-[:FOLLOWS]->(nAlice)
,(nCharles)-[:FOLLOWS]->(nBridget)
,(nCharles)-[:FOLLOWS]->(nDoug)
,(nDoug)-[:FOLLOWS]->(nAlice)
,(nDoug)-[:FOLLOWS]->(nBridget)
,(nDoug)-[:FOLLOWS]->(nCharles)
,(nMark)-[:FOLLOWS]->(nNancy)
,(nMark)-[:FOLLOWS]->(nOliver)
,(nNancy)-[:FOLLOWS]->(nMark)
,(nNancy)-[:FOLLOWS]->(nOliver)
,(nOliver)-[:FOLLOWS]->(nMark)
,(nOliver)-[:FOLLOWS]->(nNancy)
,(nCharles)-[:FOLLOWS]->(nMark)
,(nMark)-[:FOLLOWS]->(nCharles)

CREATE (msg1:Message {id:'1', contents:'This is a message', day_sent: 'Monday'})
,(msg2:Message {id:'2', contents:'This is a message', day_sent: 'Monday'})
,(msg3:Message {id:'3', contents:'This is a message', day_sent: 'Monday'})
,(msg4:Message {id:'4', contents:'This is a message', day_sent: 'Monday'})
,(msg5:Message {id:'5', contents:'This is a message', day_sent: 'Monday'})
,(msg6:Message {id:'6', contents:'This is a message', day_sent: 'Monday'})
,(msg7:Message {id:'7', contents:'This is a message', day_sent: 'Monday'})
,(msg8:Message {id:'8', contents:'This is a message', day_sent: 'Monday'})
,(msg9:Message {id:'9', contents:'This is a message', day_sent: 'Tuesday'})
,(msg10:Message {id:'10', contents:'This is a message', day_sent: 'Tuesday'})
,(msg11:Message {id:'11', contents:'This is a message', day_sent: 'Tuesday'})
,(msg12:Message {id:'12', contents:'This is a message', day_sent: 'Tuesday'})
,(msg13:Message {id:'13', contents:'This is a message', day_sent: 'Tuesday'})
,(msg14:Message {id:'14', contents:'This is a message', day_sent: 'Monday'})
,(msg15:Message {id:'15', contents:'This is a message', day_sent: 'Thursday'})
,(msg16:Message {id:'16', contents:'This is a message', day_sent: 'Thursday'})
,(msg17:Message {id:'17', contents:'This is a message', day_sent: 'Thursday'})
,(msg18:Message {id:'18', contents:'This is a message', day_sent: 'Thursday'})
,(msg19:Message {id:'19', contents:'This is a message', day_sent: 'Thursday'})
,(msg20:Message {id:'20', contents:'Conversation Starter', day_sent: 'Friday'})
,(msg21:Message {id:'21', contents:'Conversation Starter', day_sent: 'Friday'})
,(msg22:Message {id:'22', contents:'Conversation Starter', day_sent: 'Friday'})
,(msg23:Message {id:'23', contents:'Conversation Starter', day_sent: 'Friday'})
,(msg24:Message {id:'24', contents:'Reply', day_sent: 'Friday'})
,(msg25:Message {id:'25', contents:'Reply', day_sent: 'Friday'})
,(msg26:Message {id:'26', contents:'Reply', day_sent: 'Friday'})
,(msg27:Message {id:'27', contents:'Reply', day_sent: 'Friday'})
,(msg28:Message {id:'28', contents:'Reply', day_sent: 'Friday'})
,(msg29:Message {id:'29', contents:'Reply', day_sent: 'Friday'})
,(msg30:Message {id:'30', contents:'Reply', day_sent: 'Friday'})
,(msg31:Message {id:'31', contents:'Reply', day_sent: 'Friday'})
,(msg32:Message {id:'32', contents:'Reply', day_sent: 'Friday'})
,(msg33:Message {id:'33', contents:'Reply', day_sent: 'Friday'})
,(msg34:Message {id:'34', contents:'Reply', day_sent: 'Friday'})
,(msg35:Message {id:'35', contents:'Conversation Starter', day_sent: 'Saturday'})
,(msg36:Message {id:'36', contents:'Conversation Starter', day_sent: 'Saturday'})
,(msg37:Message {id:'37', contents:'Reply', day_sent: 'Saturday'})
,(msg38:Message {id:'38', contents:'Reply', day_sent: 'Saturday'})
,(msg39:Message {id:'39', contents:'Reply', day_sent: 'Saturday'})
,(msg40:Message {id:'40', contents:'Reply', day_sent: 'Saturday'})
,(msg41:Message {id:'41', contents:'Reply', day_sent: 'Saturday'})
,(msg42:Message {id:'42', contents:'Reply', day_sent: 'Saturday'})
,(msg43:Message {id:'43', contents:'Reply', day_sent: 'Saturday'})
,(msg44:Message {id:'44', contents:'Reply', day_sent: 'Saturday'})
,(msg45:Message {id:'45', contents:'Reply', day_sent: 'Saturday'})
,(msg46:Message {id:'46', contents:'Reply', day_sent: 'Saturday'})
,(msg47:Message {id:'47', contents:'Reply', day_sent: 'Saturday'})
,(msg48:Message {id:'48', contents:'Reply', day_sent: 'Saturday'})
,(msg49:Message {id:'49', contents:'Reply', day_sent: 'Sunday'})
,(msg50:Message {id:'50', contents:'Reply', day_sent: 'Sunday'})
,(msg51:Message {id:'51', contents:'Reply', day_sent: 'Sunday'})
,(msg52:Message {id:'52', contents:'Reply', day_sent: 'Sunday'})
,(msg53:Message {id:'53', contents:'Reply', day_sent: 'Sunday'})
,(msg54:Message {id:'54', contents:'Conversation Starter', day_sent: 'Sunday'})
,(msg55:Message {id:'55', contents:'Conversation Starter', day_sent: 'Sunday'})
,(msg56:Message {id:'56', contents:'Conversation Starter', day_sent: 'Sunday'})
,(msg57:Message {id:'57', contents:'Reply', day_sent: 'Sunday'})
,(msg58:Message {id:'58', contents:'Reply', day_sent: 'Sunday'})
,(msg59:Message {id:'59', contents:'Reply', day_sent: 'Sunday'})
,(msg60:Message {id:'60', contents:'Reply', day_sent: 'Sunday'})
,(msg61:Message {id:'61', contents:'Reply', day_sent: 'Sunday'})
,(msg62:Message {id:'62', contents:'Reply', day_sent: 'Sunday'})
,(msg63:Message {id:'63', contents:'Reply', day_sent: 'Sunday'})
,(msg64:Message {id:'64', contents:'Reply', day_sent: 'Sunday'})
,(msg65:Message {id:'65', contents:'Reply', day_sent: 'Sunday'})
,(msg66:Message {id:'66', contents:'Reply', day_sent: 'Sunday'})
,(msg67:Message {id:'67', contents:'Reply', day_sent: 'Sunday'})
,(msg68:Message {id:'68', contents:'Reply', day_sent: 'Sunday'})
,(msg69:Message {id:'69', contents:'Reply', day_sent: 'Sunday'})
,(nAlice)-[:SENT]->(msg2)
,(nBridget)-[:SENT]->(msg3)
,(nBridget)-[:SENT]->(msg4)
,(nBridget)-[:SENT]->(msg5)
,(nCharles)-[:SENT]->(msg6)
,(nDoug)-[:SENT]->(msg7)
,(nDoug)-[:SENT]->(msg8)
,(nAlice)-[:SENT]->(msg9)
,(nBridget)-[:SENT]->(msg10)
,(nCharles)-[:SENT]->(msg11)
,(nDoug)-[:SENT]->(msg12)
,(nDoug)-[:SENT]->(msg13)
,(nAlice)-[:SENT]->(msg14)
,(nAlice)-[:SENT]->(msg15)
,(nAlice)-[:SENT]->(msg16)
,(nAlice)-[:SENT]->(msg17)
,(nBridget)-[:SENT]->(msg18)
,(nCharles)-[:SENT]->(msg19)
,(nBridget)-[:SENT]->(msg20)
,(nBridget)-[:SENT]->(msg21)
,(nNancy)-[:SENT]->(msg22)
,(nMark)-[:SENT]->(msg23)
,(nAlice)-[:SENT]->(msg24)
,(nBridget)-[:SENT]->(msg25)
,(nAlice)-[:SENT]->(msg26)
,(nBridget)-[:SENT]->(msg27)
,(nAlice)-[:SENT]->(msg28)
,(nNancy)-[:SENT]->(msg29)
,(nMark)-[:SENT]->(msg30)
,(nAlice)-[:SENT]->(msg31)
,(nBridget)-[:SENT]->(msg32)
,(nBridget)-[:SENT]->(msg33)
,(nMark)-[:SENT]->(msg34)
,(nMark)-[:SENT]->(msg35)
,(nAlice)-[:SENT]->(msg36)
,(nAlice)-[:SENT]->(msg37)
,(nBridget)-[:SENT]->(msg38)
,(nMark)-[:SENT]->(msg39)
,(nMark)-[:SENT]->(msg40)
,(nBridget)-[:SENT]->(msg41)
,(nCharles)-[:SENT]->(msg42)
,(nBridget)-[:SENT]->(msg43)
,(nAlice )-[:SENT]->(msg44)
,(nCharles)-[:SENT]->(msg45)
,(nDoug)-[:SENT]->(msg46)
,(nDoug)-[:SENT]->(msg47)
,(nMark)-[:SENT]->(msg48)
,(nAlice)-[:SENT]->(msg49)
,(nMark)-[:SENT]->(msg50)
,(nAlice)-[:SENT]->(msg51)
,(nBridget)-[:SENT]->(msg52)
,(nAlice)-[:SENT]->(msg53)
,(nAlice)-[:SENT]->(msg54)
,(nAlice)-[:SENT]->(msg55)
,(nAlice)-[:SENT]->(msg56)
,(nCharles)-[:SENT]->(msg57)
,(nAlice)-[:SENT]->(msg58)
,(nCharles)-[:SENT]->(msg59)
,(nAlice)-[:SENT]->(msg60)
,(nCharles)-[:SENT]->(msg61)
,(nCharles)-[:SENT]->(msg62)
,(nBridget)-[:SENT]->(msg63)
,(nCharles)-[:SENT]->(msg64)
,(nMark)-[:SENT]->(msg65)
,(nMark)-[:SENT]->(msg66)
,(nCharles)-[:SENT]->(msg67)
,(nBridget)-[:SENT]->(msg68)
,(nCharles)-[:SENT]->(msg69)
CREATE (msg5)-[:FORWARD]->(msg2)
,(msg6)-[:FORWARD]->(msg2)
,(msg7)-[:FORWARD]->(msg3)
,(msg8)-[:FORWARD]->(msg1)
,(msg11)-[:FORWARD]->(msg10)
,(msg12)-[:FORWARD]->(msg10)
,(msg13)-[:FORWARD]->(msg11)
,(msg14)-[:FORWARD]->(msg3)
,(msg15)-[:FORWARD]->(msg4)
,(msg16)-[:FORWARD]->(msg5)
,(msg17)-[:FORWARD]->(msg10)
,(msg18)-[:FORWARD]->(msg6)
,(msg46)-[:FORWARD]->(msg39)
,(msg47)-[:FORWARD]->(msg40)

CREATE (msg24)-[:REPLY_TO]->(msg20)
,(msg25)-[:REPLY_TO]->(msg24)
,(msg26)-[:REPLY_TO]->(msg25)
,(msg27)-[:REPLY_TO]->(msg26)
,(msg28)-[:REPLY_TO]->(msg27)
,(msg29)-[:REPLY_TO]->(msg21)
,(msg30)-[:REPLY_TO]->(msg21)
,(msg31)-[:REPLY_TO]->(msg21)
,(msg32)-[:REPLY_TO]->(msg28)
,(msg33)-[:REPLY_TO]->(msg30)
,(msg34)-[:REPLY_TO]->(msg33)
,(msg37)-[:REPLY_TO]->(msg35)
,(msg38)-[:REPLY_TO]->(msg35)
,(msg39)-[:REPLY_TO]->(msg37)
,(msg40)-[:REPLY_TO]->(msg38)
,(msg41)-[:REPLY_TO]->(msg38)
,(msg42)-[:REPLY_TO]->(msg37)
,(msg43)-[:REPLY_TO]->(msg37)
,(msg44)-[:REPLY_TO]->(msg43)
,(msg45)-[:REPLY_TO]->(msg42)
,(msg48)-[:REPLY_TO]->(msg36)
,(msg49)-[:REPLY_TO]->(msg48)
,(msg50)-[:REPLY_TO]->(msg49)
,(msg51)-[:REPLY_TO]->(msg41)
,(msg52)-[:REPLY_TO]->(msg50)
,(msg53)-[:REPLY_TO]->(msg52)
,(msg57)-[:REPLY_TO]->(msg54)
,(msg58)-[:REPLY_TO]->(msg57)
,(msg59)-[:REPLY_TO]->(msg58)
,(msg60)-[:REPLY_TO]->(msg59)
,(msg61)-[:REPLY_TO]->(msg60)
,(msg62)-[:REPLY_TO]->(msg55)
,(msg63)-[:REPLY_TO]->(msg62)
,(msg64)-[:REPLY_TO]->(msg63)
,(msg65)-[:REPLY_TO]->(msg64)
,(msg66)-[:REPLY_TO]->(msg56)
,(msg67)-[:REPLY_TO]->(msg66)
,(msg68)-[:REPLY_TO]->(msg67)
,(msg69)-[:REPLY_TO]->(msg68)

I think there are three output possibilities:

  • most frequent node sequence
  • most frequent sequence of relationships
  • most frequent path sequence (nodes & relations)

I have a running graphDB here: http://jlu-buster.mni.thm.de:9580/browser/
Login is neo4j/1234
There are chains of Token-Nodes which show the Words of letters.
The task is now to show sequences in these chains of the same pattern in the property text.

// 4 Gleiche
match (t1:Token)-[:NEXT_TOKEN]->
(t2:Token)-[:NEXT_TOKEN]->(t3:Token)-[:NEXT_TOKEN]->(t4:Token)
match (b1:Token)-[:NEXT_TOKEN]->
(b2:Token)-[:NEXT_TOKEN]->(b3:Token)-[:NEXT_TOKEN]->(b4:Token)
where t1.text = b1.text
and t2.text = b2.text
and t3.text = b3.text
and t4.text = b4.text
and t1.Briefnr <> b1.Briefnr
return t1.Briefnr, b2.Briefnr, b1.text, b2.text, b3.text, b4.text;

Runs forever ...

Hi @andreas.kuczera !

Is there any scheduler on the instance? It's not running right now :frowning:

Bennu

No i stopped it. Can you reach the DB ?

Hi Andreas,

I suggest to first prepare the data properly so that it fully can benefit from the power of the graph. Actually you do mainly text searches.

My suggestions are:

1.) There are 13840 Token with text equals " " - if not needed, delete them

2.) All text properties I have seen, ends with a space - if not needed, strip those spaces

3.) There was no index on the text property, I created one using CREATE INDEX ON :Token(text)

4.) The property BriefNr that is used in a condition in your cypher doesn't exist in the Token nodes

5.) There are many duplicates texts (15611 unique texts, 11319 multiple used texts blown up to 145427 nodes). I would suggest not using duplicate Tokens but just one 1 unique Token per "word". Then you could relate those unique Tokens with relations storing the source of the relation (BriefNr, startIndex, endIndex) on the relation.

With data structured like this, you could easily run a query that shows similar paths in seconds, because similar senctences will traverse exactly the same nodes in the path. Also you will not longer rely on slow heavy text searches.

Best,
Reiner

1 Like

Ok, i changed the Graph accordingly. Now every Token has a unique value-Property.
Does this work for your query ?
Best,
Andreas