cancel
Showing results for 
Search instead for 
Did you mean: 

How do I use Cypher to connect to a RDBMS using JDBC

dana_canzano
Neo4j
Neo4j

With the inclusion of java stored procedures in Neo4j 3.x, one can run Cypher to connect to a RDBMS using JDBC.
To do so one needs to download and install https://github.com/neo4j-contrib/neo4j-apoc-procedures.

After installation of the Neo4j APOC kit, download the respective RDBMS JDBC driver .jar and install into $NEO4J_HOME\plugins.
The respective JDBC driver can be obtained from the RDBMS vendor. For example:

[width="40%",frame="topbot",options="header,footer"]
|======================
|Vendor | Download location | JDBC jar file
|MySQL | https://dev.mysql.com/downloads/connector/j/ | mysql-connector-java-5.1.34.jar
|Postgres | https://jdbc.postgresql.org/ | postgresql-9.4.1209.jar
|Oracle | http://www.oracle.com/technetwork/database/features/jdbc/ | ojdbc7.jar
|======================

After installing APOC and copying the RDBMS vendor .jar to $NEO4J_HOME\plugins, restart Neo4j.

The apoc.load.jdbc stored procedure is used to connect over JDBC and takes 2 arguments,namely:

connection string
SQL statement or table

The 'connection string' is vendor specific and as such one should consult the RDBMS vendor for syntax.
The 'SQL statement or table name' could be for example 'select * from movies' or simply 'movies'.
Usage of a single table name would result in 'select * from

'

The following example would connect to a mysql database named proddb1 as user root with password=football and select all movies from the 'movies' table where the studio column was defined to be 'MGM Studios'.
Using this data we would then create nodes in Neo4j for all the movies meeting this criteria and define the title property.

CALL apoc.load.jdbc('jdbc:mysql://localhost:3306/proddb1?user=root&password=football','select title from movies where studio=\'MGM Studios\'') YIELD row
CREATE (n:Movies {name:row.title})

If running the above results in error message similar to:

No suitable driver found for jdbc:mysql://localhost:3306/proddb1?user=root&password=football

you may need to first manually load the driver by calling:

call apoc.load.driver('com.mysql.jdbc.Driver')

where 'com.mysql.jdbc.Driver' is the class name for the MySQL JDBC driver.

If you want to hide/alias the connection string, this can be accomplished by adding to the conf/neo4j.conf a parameter similar to:

apoc.jdbc.myalias.url=jdbc:mysql://localhost:3306/proddb1?user=root&password=football

and now the above Cypher can be rewritten as:

CALL apoc.load.jdbc('myalias','select title from movies where studio=\'MGM Studios\'') YIELD row
CREATE (n:Movies {name:row.title})

It should be noted that the apoc.load.jdbc call is simply providing connectivity back to the RDBMS over JDBC.
The second argument can be any SQL statement, and that includes a SQL statement that may modify the source database through an UPDATE, DROP, TRUNCATE, etc.
If required, you might want to connect to the RDBMS with a user who only has SELECT privileges.

Also when loading data from JDBC, be mindful of datatypes and any necessary conversions; for example whereas MySQL supports a native DATE datatype, Neo4j does not.
For example, to get the column with DATE data type, convert the value to a String when importing into Neo4j.

2 REPLIES 2

When hiding the connection string in the neo4.conf, does this provide enough security to ensure passwords aren't accessible? I checked running an active query and found that it doesn't expose the string. However, I got a Neo.ClientError.Procedure.ProcedureCallFailed error for one of my tests and the message for it did list the connection string exposing the details of the string.

Are there any other protections that can be done to ensure that string doens't get exposed?

The only other option is to run client-side code to do the import on your machine.

e.g. using the etl tool.