How do you schedule and manage data integration with Neo4j?

I'd love to hear how others are doing this. I've got several cql scripts that I use regularly, and I've build some of my own tools to handle things like injecting credentials/tokens to keep them secure.

I'm largely just wrapping the scripts within powershell, then scheduling the running of these using Windows task scheduler, and logging results using a node label: (:Cypherlogentry) (I know! - It's certainly not very sophisticated)

As I'm growing the data I'm working with, it's getting challenging to manage all my schedules (what's working, what got broken, etc).

How are others managing import/exports and your n4j/cypher processes?

I've explored some of the 3rd party (ETL) data management tools, but sometimes I think they are adding additional complexity, rather than helping me manage the ETL processes I've already designed.

Do I just need to buckle down and learn a tool like Pentaho Kettle?

You might be at that time like you mentioned of learning an ETL tool. Don't worry they're not that scary and actual make life a lot easier because they're designed for process flow, step completion dependencies, etc... Pentaho is fairly easy to pick up and run. Apache Airflow is a new comer to the game. There's dozens out there to choose from. At home I run pentaho because it's free and easily integrates with the Neo4j JDBC driver. I have a couple of articles on my personal blog if you need any help getting setup.

1 Like

Hi Paul,

I mostly use Apache Airflow to ingest data or build new neo4j databases from scratch. If you need help feel free to ask.

Cheers Kris

Thanks for the starter info!

I've got Pentaho running, have my Advantage Database and Neo4j connection.
The query from Advantage is working, and I'm populating the parameters into the "Neo4j Cypher" step.
I'm just using the output to create a simple node & relationship:

MATCH (r:Recordstatus{state:$ACCOUNTSTATUS})
MERGE (a:Company {acctrecid:$RECID})
MERGE (a)-[:ACCOUNT_STATE]->(r)

When I preview The query works fine, but the Cypher fails, but without much helpful information on what I did wrong.

"Neo4j Cypher.0 - ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : Unexpected error"

How do you figure out what it doesn't like about the way you've written your transform... it is aggressively vague! ;)

I'd prefer to write CYPHER (more familiar to me) I had an old example working using "Graph Output" but the additional complexity of the models and mappings, makes it more challenging to use that method for me.

Thanks!

Made some progress, and now I'm trying to do some more advanced cypher within kettle.

I am trying to use the Neo4j Cypher kettle plugin (@matt.casters ) so I can make the parameters into a map that I can unwind. (and not have to deal with the indexed parameters, used one-at-a-time in the exact order challenge.

I've selected "Collect parameter values as map"
Name of values map list: companies.

If I simply do a test like this:

UNWIND $companies as c
RETURN c.RECID,c.COMPANY
;

I get 683 BLANK rows returned... The source is a simple Table Input (JDBC db query), and the preview there shows all the data as I expect...

If I use a MERGE statement, it errors complaining about null values...

Any ideas?

I am super happy with Nifi and Neo4j. You should take a look.