I have file/script nodes that are dependent on one another forming a chain (f:Filename)-[:DEPENDENT_ON]->(f1:Filename)
each file/script has Table nodes connected to it that are either being read from or written to by a script. In other words, a table either sends data to or accepts data from another table via a script. The diagram looks like this -
match p=(f:FileName)-[:DEPENDENT_ON]->(f1:FileName)
with nodes(p) as script
unwind script as scripts
match (scripts)-[:WRITES_TO]->(t:Table)
match (t1:Table)-[:SENDS_TO]->(scripts)
return scripts, t, t1
returns no rows.
Using subgraphAll
match p=(f:FileName)-[:DEPENDENT_ON]->(f1:FileName)
with nodes(p) as script
unwind script as scripts
call apoc.path.subgraphAll(scripts,{maxLevel:3,relationshipFilter:'SENDS_TO|WRITES_TO', bfs:true}) YIELD nodes, relationships
unwind relationships as r
unwind nodes as n
return scripts, r,n
also doesn't produce the desired results.
Note that not every script node has table nodes connected to it hence I'm performing a match instead of an optional match as I only want to see dependency chains that have tables connected to the scripts. When I do an optional match, I only get the chain but no table nodes.
First make sure there are no misspellings, especially with casing. Your first line says you're working with :Filename nodes, but your query uses :FileName nodes, so double-check the actual case of the label you're using.
Also, how do you want your results to look? Some example output may help us give better advice, as currently your queries would produce, per script, a cartesian product of all possible combinations of tables sending to the script with the tables the script is writing to.
So if tables t1, t2, and t3 were sending to the script, and t4, t5, and t6 were written to by the script, you would have 9 rows for that script for all 3x3 combinations of a table sending to with a table being written to.
You might want to consider using pattern comprehensions to collect the tables for each script. That would mean your results would be a row per script, with a list of the tables sending to it, and a list of tables being written to. (the same data may repeat, however, for every path using the that script in your graph...meaning if it is dependent upon another, and another depends upon it, it would occur twice).
The first line that shows :Filename as opposed to FileName is merely a typo.
Secondly I want to return results in a graph first of all, not rows and columns. Pretty much like the diagram I posted in my question. Once I have the graph for it, I can use different tricks to massage the tabular output.
I think I get it when you say pattern comprehension but could you please elaborate? At first I thought this would be a pretty simple query, but I'm stumped at the moment. Thank you.
Okay, for displaying the graph alone, something like this should work:
match p = (start:FileName)-[:DEPENDENT_ON*0..]->(f1:FileName)
where not ()-[:DEPENDENT_ON]->(start)
optional match tablePath = (f1)-[:WRITERS_TO|SENDS_TO*0..1]-(:Table)
RETURN p, tablePath
Thanks Andrew. At the moment the query you provided times out. I've consulted a few other people regarding this and those solutions eventually time out as well. I'm beginning to think this is a result of improper data loading. Since there are over 200k tables and about 10k script nodes, I'm going to inspect the files used for loading manually from our archives as opposed to searching in the graph. Will update with my findings.
So it was an underlying data issue. I've reloaded the correct data and fixed it. All versions of the query are working now. Thanks for the interesting options! Safe to mark this thread closed.