This website uses cookies. By clicking Accept, you consent to the use of cookies. Click Here to learn more about how we use cookies.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Neo4j
- Technical Discussions
- Neo4j Graph Platform
- Re: Best way of retrieving all paths between a sta...

Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Best way of retrieving all paths between a start and end node

Options

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-21-2022 01:41 AM

Hi, I am currently working on a project that involves machine learning over paths in a graph. For this, **I have to extract all possible paths between a known start and end node**. Unfortunately, the graphs that are involved tend to be rather large, such that normal cypher queries would be to slow to perform this kind of operation. I have been looking into the APOC- and the GDSL-libraries, but unfortunately I could not find an algorithm that would suit this kind of problem. The closest I could find in terms of the APOC-library, was the apoc.path.subgraphAll() procedure. As for the GDSL there appears to be no such procedure at all, since the BFS- and DFS-algorithms only return a single, but not all paths. As a last resort, I also considered implementing my own algorithm using the Pregel-API.

Did I miss any option or is that all there is? I'd be glad if there was a simple way performing this operation, as I do not have much time unfortunately. So if I could prevent having to resort to the Pregel-API, I'd be more than happy.

Thanks a ton!

Labels:

11 REPLIES 11

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-21-2022 02:28 AM - edited 06-21-2022 02:32 AM

Hi @lorenzh ,

Have you tried something like...?

```
MATCH(start : {id : $startId}), (end {id : $idEnd})
with start, end
CALL apoc.path.expandConfig(start, {
terminatorNodes : [end],
uniqueness : 'RELATIONSHIP_PATH' //You may like changing this
optional : true
}) yield path
return count(path)
```

Bennu

Oh, y’all wanted a twist, ey?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-22-2022 11:38 PM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2022 01:03 AM

Sounds fair. If you have a public instance/db dump that i can test I would be happy to help (Love this challenging problems) 😄

Oh, y’all wanted a twist, ey?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-24-2022 02:43 AM

Thanks a lot, that is very kind! I have made some progress since, by resorting to the findAllPaths() algorithm found in the standard graphdb-library. It seems to work at least reasonably fast for depths of up to 100 nodes in my example.

Ultimately, however, it would be great to have this kind of method also in the GDSL, such that one could leverage graph projections. In this regard, I have tried using the pregel-API to solve this problem, but I gave up for now due to some of it's limitations. As far as I understand, the order in which messages are sent is not retained with pregel and there is no way to tell from which node a message originated, right? If that's correct, then I would assume there is also no way one could construct paths using computations on the individual nodes alone. There might be a chance to deal with is kind of problem within the masterCompute()method, but as I said, the solution I found seems to be acceptable for now.

If you want to give it a try for yourself, I could provide you with a db-dump of course 🙂

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-29-2022 08:23 AM

Hi @lorenzh !

In general this a 'well known' problem.. the loopy lattices problem. It scales really bad, generally speaking, so it's always good knowing some context on every use case so it's possible to evaluate particular optimizations.

I'll take a look on the problem in general, but it could be good if you can share me that dump so we can land it on your use case. Pregel may be a good option (once you arrive to this levels you need to power of parallel computing). I saw some other post you made about, indeed handling everything with integer messages may be tricky, but lets try 😄

Oh, y’all wanted a twist, ey?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-30-2022 03:04 AM

Thanks, so the problem I am facing is called loopy lattices? I knew this was going to be hard, but taking a closer look at it, now that I know what it's called, it's definitely harder than I imagined. It may still be possible in this case, but let me explain a little more what I am trying to do here:

The purpose of all of this is to run some static code analysis on a graph representation of a given program. The type of graph in particular is a so-called Sparse Value Flow Graph (SVFG), generated using a tool called SVF. In short, this graph representation models the flows of values throughout a program, either directly through name-taken variables or indirectly through address-taken values. Using the available methods, it is easy to answer questions such as: "Does a value produced at some point in the program, reach another defined point in the program". This is also known as a taint-style analysis. Imagine we have a function that introduces a user defined value into the program (e.g. the gets()-function) we must make sure that it never reaches the call-site of, lets say, the malloc()-function.

Now, what I want to do is, I want to retrieve not one single path between two of such points, I want to retrieve all possible paths, through which a value could flow from point A to point B. As the "S" in "SVFG" stands for "Sparse", the graph is not that densely connected. I counted an average of 2.5 outgoing edges per node. Another characteristic that may be beneficial in this case is that while some nodes are highly connected, the maximum depth of the paths they define are mostly very short. Nodes on longer paths are usually very sparsely connected. See the example below:

If you're interested and you'd like to help me out, I'd be very thankful. I've create d a dump of a small-ish httpd server called "darkhttpd" and pushed it to this GitHub-repository that you could have a look at. I could of course also provide you with push access, if you want to. And yeah, as you've already inferred correctly from my other post, I am currently giving pregel a try, since I had some mild success wrt. another problem 😉

Thanks a ton,

Lorenz

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-30-2022 04:10 AM - edited 06-30-2022 04:11 AM

Hi @lorenzh!

I'll take a look on that dump later today of possible. Can you gimme an example of two nodes that you are using to calculate every possible path? (preferably an slow combination).

Another thing, technically, considering the sequentially of executions, direction matters, no?

PS: This problem/project/idea/homework looks fun (hard and fun)

Oh, y’all wanted a twist, ey?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-30-2022 07:02 AM

Hi, so I've run a couple of experiments and a particularly slow combination appears to be the following, where n is the source node and m is the destination node.

```
MATCH (n) WHERE n.n_hash = "42222368"
MATCH (m) WHERE m.n_hash IN ["43603232"]
RETURN n AS source, m AS sink
```

And yes, that's correct, the direction does matter. Using pregel, I thought it might be possible to do something along the lines of:

- Beginning at the start node, mark it as reached and send the own node id to all neighbors
- Neighbors that receive node ids mark themselves as reached and keep record of all ids sent to them
- In the masterCompute() method, after each superstep a list of pairs is built for each node reached in the last step, such that each pair contains a) the id sent to the node and b) the own id of the node.
- Now with these "puzzle pieces", paths can be constructed using the list of pairs from the last super step. For each node that occurs multiple times as a sender in the pairs, all paths ending in that node must be copied

```
SS 1 SS 2 SS 3 | SS n
A->B | B->C | E->F | ...
A->C | B->E | E->G | ...
A->D | C->E | E->D | ...
After SS 1
A->B
A->C
A->D
After SS 2
A->B->C
A->B->E
A->C->E
A->D
After SS 3
A->B->C
A->B->E->F
A->B->E->G
A->B->E->D
A->C->E->F
A->C->E->G
A->C->E->D
A->D
```

I haven't had the time to think all of that through and I am not sure how much it may speed up things, if it does at all. After all there is a lot of stuff going on the in masterCompute() method that is not being executed in parallel.

Thanks a lot,

Lorenz

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-30-2022 07:20 AM

Hi @lorenzh ,

Quick question. From sink to source, which relationship are you planning to traverse (with direction)? svfg? direct +indirect svfg edge?

Oh, y’all wanted a twist, ey?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-30-2022 07:38 AM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-30-2022 07:58 AM - edited 06-30-2022 07:59 AM

Well @lorenzh , how many path do you expect on ur previous example? Using the relationshigFilter you get one (and fast).

```
MATCH (start) WHERE start.n_hash = "42222368"
MATCH (end) WHERE end.n_hash IN ["43603232"]
with start, end
CALL apoc.path.expandConfig(start, {
terminatorNodes : [end],
uniqueness : 'NODE_PATH',
relationshipFilter: 'svfg>',
optional : true
}) yield path
return path
```

Bennu

Oh, y’all wanted a twist, ey?

Related Content

- Neo.ClientError.Statement.SyntaxError in Neo4j Graph Platform
- How to access node, relationship in variable in Neo4j Graph Platform
- return the start node and end node of every realtions in Neo4j Graph Platform
- Querying Linux Kernel Git history with neo4j in Neo4j Graph Platform
- apoc.import.graphml in 4.3 and above ignores relationship/edge properties in Neo4j Graph Platform