Neo4j Query to Compare Lists

knuser · March 2, 2023, 9:46pm

Greetings,

I have a query that is not working as expected. I'll first show the query in isolated parts that return expected results and counts and then I'll show the troublesome query.

Query 1 of 2 that Works

Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
Return row.TABLE_NAME

This returns 440 rows of table names

Query 2 of 2 that Works

Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
Return tv.Title as TableName

This returns 404 rows of table names

Query that Doesn't Work

What I want is for this query to return the 36 tables that are in the edw database (the 1 of 2 query) that don't yet exist in the Neo4j data set (the 2 of 2 query).

Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
With row.TABLE_NAME as SQLTable
Match (db:Database {Title: 'UWODS'})-->(t:Table)-->(tv:TableVersion)
Where NOT SQLTable = KNTVTitle //or Where SQLTable <> KNTVTItle... is one better?
Return SQLTable

What gets returned are 177760 rows with a lot of rows duplicated.

What must I do to see the expected 36 rows of SQL table names?

ameyasoft · March 2, 2023, 11:19pm

Try this:

match (db:Database)
return apoc.coll.sort(collect(distinct db.dbname)) as dbnames
use this list to compare with tables in edw database

knuser · March 2, 2023, 11:43pm

Thanks @ameyasoft

I've updated my original 2 of 2 query, I misled you when I typed "Return dbname". I do not want the database name in the 2 of 2 query, I want the names of all the tables in that specific db.

So are you suggesting that the following...

Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
Return apoc.coll.sort(collect(distinct tv.Title)) as TableNames

If so, I don't understand how to incorporate this bit into the greater query.
I'm trying to compare 2 lists of tables (1 from a SQL edw and 1 from a neo4j data set) and return the names of 36 tables sourced from the SQL edw query that don't exist in my neo4j data set.

knuser · March 3, 2023, 12:58am

I've managed to return the 36 tables using the following...

Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni.;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
with row.TABLE_NAME as SQLTable

Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
With COLLECT(DISTINCT SQLTable) AS SQLTables, COLLECT(DISTINCT tv.Title) AS TVTitles

With [n IN SQLTables WHERE NOT n IN TVTitles] as missingTab
Unwind missingTab as missingTables
Return missingTables

The result is 36 rows of the expected tables.

Can anyone confirm whether this is a best practice?

ameyasoft · March 3, 2023, 6:47am

This is a very neat solution! Congratulations you solved your problem.

myron_higerd · March 8, 2023, 2:28pm

This cypher will end of with a cross product join between the SQLTable and the tv.Titles before they are collected. It would be better, especially if you have 1000's of tables to collect the SQLTables before the match and then collect the tv.Titles. Your cypher would performance much faster.

Call apoc.load.jdbc("jdbc:sqlserver://edw.s.uni.;databaseName=dbname;","select TABLE_NAME from INFORMATION_SCHEMA.TABLES
Where TABLE_CATALOG = 'dbname'
ORDER BY TABLE_NAME ASC") Yield row
with COLLECT(row.TABLE_NAME) as SQLTables

Match (db:Database {Title: 'dbname'})-->(t:Table)-->(tv:TableVersion)
With SQLTables, COLLECT(tv.Title) AS TVTitles

With [n IN SQLTables WHERE NOT n IN TVTitles] as missingTab
Unwind missingTab as missingTables
Return missingTables

knuser · March 8, 2023, 7:17pm

Thanks for the improvement @myron_higerd !

Topic		Replies	Views
Using `describe table` with `apoc.load.jdbc`` Neo4j Graph Platform migrated	4	138	July 27, 2022
Help re-writing query Cypher performance , cypher	7	1030	March 11, 2019
How to match multiple queries? Newbie Questions apoc , cypher	6	5923	April 28, 2020
Return all relationships from a set of nodes Cypher apoc , cypher , relationship , knowledge-base	2	4642	September 24, 2020
APOC - jdbcUpdate within periodic.iterate not executing Procedures & APOC apoc , jdbc , cypher , apocperiodiciterate	0	767	September 18, 2019

Neo4j Query to Compare Lists

Related topics