I have a query that is not working as expected. I'll first show the query in isolated parts that return expected results and counts and then I'll show the troublesome query.
Query 1 of 2 that Works
Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
Return row.TABLE_NAME
This returns 440 rows of table names
Query 2 of 2 that Works
Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
Return tv.Title as TableName
This returns 404 rows of table names
Query that Doesn't Work
What I want is for this query to return the 36 tables that are in the edw database (the 1 of 2 query) that don't yet exist in the Neo4j data set (the 2 of 2 query).
Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
With row.TABLE_NAME as SQLTable
Match (db:Database {Title: 'UWODS'})-->(t:Table)-->(tv:TableVersion)
Where NOT SQLTable = KNTVTitle //or Where SQLTable <> KNTVTItle... is one better?
Return SQLTable
What gets returned are 177760 rows with a lot of rows duplicated.
What must I do to see the expected 36 rows of SQL table names?
I've updated my original 2 of 2 query, I misled you when I typed "Return dbname". I do not want the database name in the 2 of 2 query, I want the names of all the tables in that specific db.
So are you suggesting that the following...
Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
Return apoc.coll.sort(collect(distinct tv.Title)) as TableNames
If so, I don't understand how to incorporate this bit into the greater query.
I'm trying to compare 2 lists of tables (1 from a SQL edw and 1 from a neo4j data set) and return the names of 36 tables sourced from the SQL edw query that don't exist in my neo4j data set.
I've managed to return the 36 tables using the following...
Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni.;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
with row.TABLE_NAME as SQLTable
Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
With COLLECT(DISTINCT SQLTable) AS SQLTables, COLLECT(DISTINCT tv.Title) AS TVTitles
With [n IN SQLTables WHERE NOT n IN TVTitles] as missingTab
Unwind missingTab as missingTables
Return missingTables
The result is 36 rows of the expected tables.
Can anyone confirm whether this is a best practice?
This cypher will end of with a cross product join between the SQLTable and the tv.Titles before they are collected. It would be better, especially if you have 1000's of tables to collect the SQLTables before the match and then collect the tv.Titles. Your cypher would performance much faster.
Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni.;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
with COLLECT(row.TABLE_NAME) as SQLTables
Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
With SQLTables, COLLECT(tv.Title) AS TVTitles
With [n IN SQLTables WHERE NOT n IN TVTitles] as missingTab
Unwind missingTab as missingTables
Return missingTables