Find most common paths between nodes having certain labels


(Er Saurabhbhadauria) #1

Hi,
I am working on an NLP project and need to find the most common patterns between certain kinds of words. I have created a node for each word in a sentence, with certain grammatical properties, and created dependency relationships between the words.
Now I need to find the most common paths between words of a certain type. Is there a way to find the most common paths between two nodes that have certain labels? A sample image is below.

Thanks,
Saurabh


(Andrew Bowman) #2

You may want to provide what the expected results should be for any example data/images, and what you've tried so far and why those attempts aren't satisfactory.


(Paul Thomas) #3

Yes, you can, using a WITH clause and the labels() function ...

Here are some simple sample data and queries, but you'll need to think about how to model and structure the data so that the results are useful.

1st query to create sample data ...

// Sample data: three people, three verbs, two foods.
// Each verb is a single node, so all three "knows" patterns share one Verb node.
CREATE
  (peter:Person {name:"Peter"}),
  (paul:Person {name:"Paul"}),
  (mary:Person {name:"Mary"}),
  (knows:Verb {name:"Knows"}),
  (drinks:Verb {name:"Drinks"}),
  (eats:Verb {name:"Eats"}),
  (milk:Food {name:"Milk"}),
  (bread:Food {name:"Beans"}),
  (peter)-[:Rel]->(knows)-[:Rel]->(paul),
  (paul)-[:Rel]->(knows)-[:Rel]->(mary),
  (mary)-[:Rel]->(knows)-[:Rel]->(peter),
  (peter)-[:Rel]->(eats)-[:Rel]->(bread),
  (mary)-[:Rel]->(drinks)-[:Rel]->(milk)

2nd query to get results ...

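// Count every directed two-hop path, grouped by the labels of its three nodes.
MATCH (n1)-->(n2)-->(n3)
WITH
  labels(n1) AS nodeType1,
  labels(n2) AS nodeType2,
  labels(n3) AS nodeType3,
  count(*) AS freq  // one row per matched path, same result as count(labels(n2))
RETURN nodeType1, nodeType2, nodeType3, freq
ORDER BY freq DESC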

nodeType1   nodeType2   nodeType3   freq
["Person"]  ["Verb"]    ["Person"]  9
["Verb"]    ["Person"]  ["Verb"]    5
["Person"]  ["Verb"]    ["Food"]    2

(Andrew Bowman) #4

Thanks, though keep in mind that the frequencies include paths that start and end at the same node. Is that desired? If not, you may need to add the predicate WHERE n1 <> n3.
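
For illustration, here is how that predicate could be added to the earlier query. This is only a sketch against the sample data in this thread; nothing else about the query changes.

```cypher
// Same two-hop pattern as before, but skip paths that loop back
// to the node they started from (e.g. Peter -> Knows -> Peter).
MATCH (n1)-->(n2)-->(n3)
WHERE n1 <> n3
WITH
  labels(n1) AS nodeType1,
  labels(n2) AS nodeType2,
  labels(n3) AS nodeType3,
  count(*) AS freq
RETURN nodeType1, nodeType2, nodeType3, freq
ORDER BY freq DESC
```

On the sample data, the three looping Person-Verb-Person paths and the three looping Verb-Person-Verb paths drop out, so the frequencies become 6, 2 and 2 instead of 9, 5 and 2.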

Your counts also include patterns where all of the nodes are the same, just in a different order (Mary KNOWS Peter; Peter KNOWS Mary). I'm assuming this is okay, since your pattern uses directed relationships.

In any case, are these results useful to you? If not, can you tell us why this isn't meeting what you need?


(Er Saurabhbhadauria) #5

Thanks for the reply, Paul.

Unfortunately, in my case there may be multiple nodes between the start and end nodes, and these intermediate nodes may differ from one set of data to another. I also want to take the type of relationship between the nodes into account.
For example, I may have paths like the following.

StartNode - Verb - Adverb - EndNode
StartNode - Noun - Verb - Adjective - EndNode
StartNode - Verb - EndNode
StartNode - Helping verb - Verb - Noun - EndNode

If I have a large number of sentences then these paths will be repeated for similar sentences.
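
For paths of varying length like those listed above, one possible approach is a variable-length pattern that groups on the sequence of node labels and relationship types along each path. This is a minimal sketch, not something tested against this data set: the labels :Start and :End and the upper bound of 5 hops are assumptions chosen purely for illustration.

```cypher
// :Start and :End are hypothetical labels for the two word types of interest.
// The 1..5 bound is an arbitrary cap that keeps the search from exploding.
MATCH p = (s:Start)-[*1..5]->(e:End)
WITH
  [n IN nodes(p) | labels(n)] AS nodeTypes,        // labels of each node, in path order
  [r IN relationships(p) | type(r)] AS relTypes    // relationship types, in path order
RETURN nodeTypes, relTypes, count(*) AS freq
ORDER BY freq DESC
LIMIT 10
```

Each result row is one distinct path pattern with the number of paths that match it. Longer hop bounds can become expensive on a large graph, so it is probably worth starting with a small bound.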

-Saurabh


(Er Saurabhbhadauria) #6

Hi Andrew,
As I mentioned, I am trying to find the most common path patterns between nodes with a particular label. I am very new to Neo4j, and I am sure there is a more effective way to do this.
So far, I am generating a string for each matching path; later I will group the strings and pick the most frequently occurring path pattern. I know this is not an efficient approach.
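
The string-per-path idea can also be carried out inside the database, so that the grouping happens in the same query. The following is a sketch under stated assumptions: :Start and :End are hypothetical labels, the 5-hop cap is arbitrary, and each node is assumed to have exactly one label, since only the first label is used.

```cypher
// Build one signature string per path, e.g. Label -[TYPE]-> Label -[TYPE]-> Label,
// then count how often each signature occurs.
MATCH p = (s:Start)-[*1..5]->(e:End)
WITH reduce(
       sig = head(labels(s)),
       i IN range(0, length(p) - 1) |
       sig + ' -[' + type(relationships(p)[i]) + ']-> ' + head(labels(nodes(p)[i + 1]))
     ) AS signature
RETURN signature, count(*) AS freq
ORDER BY freq DESC
LIMIT 10
```

If a node on the path has no label, head(labels(...)) is null and the whole signature becomes null, so unlabeled nodes would need separate handling.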

Thanks,
Saurabh