Hi there,
I have a couple of nodes, each with a path property representing a file path:
/foo/File1.java
/foo/bar/File2.java
/foo/bar/File3.java
I'd like to convert this into a tree structure consisting of nodes/relationships, i.e.
(foo)-[:CONTAINS]->(file1)
(foo)-[:CONTAINS]->(bar)
(bar)-[:CONTAINS]->(file2)
(bar)-[:CONTAINS]->(file3)
I'm looking for an elegant Cypher/APOC-based solution. Any suggestions?
Cheers
Dirk
In terms of pseudo-code, you can split the string by slash (split(path, "/")), then UNWIND the resulting array, create a node for each element, and create relationships between them.
It gets a bit more complicated if you want to say that /foo contains /foo/bar, rather than just /foo contains bar.
I'd recommend trying some things yourself, and then coming back with what you tried and what doesn't work about it. It's easier for the community to answer specific questions than to write the code from scratch.
Hi David,
thanks for your response. I posted the question in that form because I hoped that someone had already solved this and could provide a solution directly.
Before that I had already tried on my own and ran into problems: I used the split function to get the path segments, created nodes using apoc.create.node and linked them:
WITH split("foo/bar/File1.java","/") as segments
UNWIND segments AS segment
CALL apoc.create.node(['Path'], {path:segment}) YIELD node
WITH collect(node) as nodes
CALL apoc.nodes.link(nodes,'CONTAINS')
RETURN nodes
This looks good at first, but there is a problem: if I now feed it an overlapping path (e.g. "foo/bar/File2.java"), it creates a completely new list, whereas the existing "foo" and "bar" nodes should be re-used/merged. So the correct solution would be something like "merge on every fully qualified path segment", e.g. "foo", "foo/bar", "foo/bar/File.java".
As a workaround, I could create all those independent lists, run a reduce query over the created nodes to compute the fully qualified paths, and merge the duplicates afterwards. But this seems roundabout to me, so I'm looking for a more elegant solution.
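To make the "merge on every fully qualified path segment" idea concrete, the prefixes themselves could be computed along these lines. This is only a minimal sketch: the literal path is an example value, and apoc.text.join is used here for joining the segments, which assumes APOC is installed.

```cypher
// Split the example path into its segments: ["foo", "bar", "File2.java"]
WITH split("foo/bar/File2.java", "/") AS segments
// For i = 1..size(segments), join the first i segments back together,
// giving one fully qualified prefix per segment.
RETURN [i IN range(1, size(segments)) |
        apoc.text.join(segments[0..i], "/")] AS prefixes
```

Each entry of prefixes could then serve as the key to MERGE a :Path node on, so that overlapping paths share their common ancestors.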
Cheers
Dirk
Here's my current solution:
- Create linked lists of :Path labeled nodes for each relativePath property (e.g. "/foo/bar") found on :Git:File nodes, e.g. "(foo)-[:CONTAINS]->(bar)":
MATCH
(f:Git:File)
WHERE
exists(f.relativePath)
WITH
f, split(f.relativePath, "/") as segments
UNWIND
segments AS segment
CALL
apoc.create.node(['Path'], {path:segment}) YIELD node
WITH
f, collect(node) as nodes
CALL
apoc.nodes.link(nodes,'CONTAINS')
RETURN
count(nodes)
- Compute for each :Path node a relativePath property representing the path from the root, e.g. "/foo/bar":
MATCH
(root:Path)
WHERE NOT
()-[:CONTAINS]->(root)
WITH
root
MATCH
path=(root)-[:CONTAINS*0..]->(segment:Path)
SET
segment.relativePath = reduce(result = "", n in nodes(path) | result + "/" + n.path)
RETURN
count(path)
- Merge :Path duplicates using APOC:
MATCH
(p:Path)
WITH
p.relativePath as relativePath, collect(p) as paths
CALL
apoc.refactor.mergeNodes(paths, {mergeRels:true}) YIELD node
RETURN
relativePath, count(paths)
- Remove left-over duplicates of CONTAINS relationships between merged :Path nodes:
MATCH
(p:Path)-[r:CONTAINS]->(c:Path)
WITH
p,c, collect(r) as relations
WHERE
size(relations) > 1
UNWIND
tail(relations) as duplicate
DELETE
duplicate
RETURN
p,c
Any suggestions on how to improve that?
Cheers
Dirk
Here's a different approach that may work for you.
We use an APOC function to get the indexes of all slashes in the string, which gives us a list of indexes. A list comprehension over that list then yields the substring from the start of the path up to each index, and we append the full path at the end:
WITH "path/to/the/thing.txt" as path
WITH path, apoc.text.indexesOf(path, "/") as delimiters
WITH path, [del in delimiters | substring(path, 0, del)] + path as paths
RETURN paths
This results in: ["path", "path/to", "path/to/the", "path/to/the/thing.txt"]
Now that you have the absolute path to each node, you can MERGE the nodes (with a FOREACH), and then MERGE the relationships between each node.
To avoid creating duplicates, you can use apoc.coll.pairsMin() on the nodes to get a list of pairs of adjacent nodes, then UNWIND that list and MERGE the relationship between each pair.
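Putting these steps together might look roughly like the sketch below. The :Path label and the relativePath property are borrowed from earlier in the thread, and the literal path is just an example. Here pairsMin is applied to the list of path strings rather than to the nodes, and the nodes are looked up again by relativePath; that is one possible variation, not the only way to do it. Note also that a path with a leading slash would produce an empty first prefix, so this assumes paths without one.

```cypher
WITH "path/to/the/thing.txt" AS path
WITH path, apoc.text.indexesOf(path, "/") AS delimiters
// One prefix per slash, plus the full path at the end
WITH [del IN delimiters | substring(path, 0, del)] + path AS paths
// MERGE one :Path node per prefix, so shared ancestors are re-used
FOREACH (p IN paths | MERGE (:Path {relativePath: p}))
WITH paths
// Adjacent pairs of prefixes, e.g. ["path", "path/to"]
UNWIND apoc.coll.pairsMin(paths) AS pair
MATCH (parent:Path {relativePath: pair[0]})
MATCH (child:Path {relativePath: pair[1]})
// MERGE the relationship so re-running does not duplicate it
MERGE (parent)-[:CONTAINS]->(child)
RETURN count(*) AS pairsProcessed
```

Running the same query with an overlapping path such as "path/to/other.txt" should re-use the existing "path" and "path/to" nodes, since MERGE matches on relativePath. An index or uniqueness constraint on :Path(relativePath) would likely help once there are many paths.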
Looks good, will give it a try and come back with the results!
Thanks a lot,
Dirk