I have been using Cypher and APOC procedures to load CSV files into AuraDB (in the Query tab), using GitHub as my file location, but I need to load some sensitive data, so I need a better option.
I can import CSV files using the import tool, but then it appears that I need to use the GUI. Is there a way to access the files that I imported (using the Import tab) from Cypher and the APOC load procedures in the Query tab?
If you have your source files on something like AWS S3, would a pre-signed URL work and satisfy your infosec needs?
I think it is always worth asking the following questions:
- Is this a data integration/continuous flow for updating the graph? => A better implementation may be some code that runs as a lambda/function/event processor, etc.
- Is this a batch ETL flow? => Do you already use something like Spark/Databricks in your organisation for this?
I.e. LOAD CSV is not always the right option, and it is certainly not your only option.
Hi Hakan,
Thank you for your response.
No, my company does not want me to use a pre-signed URL.
Based on your response, I perhaps did not phrase my question generally enough. I have CSV that I want to load into the database. How do I get it there? I don't want to use the Import tool because I don't want to use the GUI. I don't want to use a pre-signed URL.
@dana_canzano mentioned that I can use FTP. I asked to confirm that I can use it to load to AuraDB, and am waiting on a response. If I can, then that works for me.
Let's say I use a lambda/function/event processor; I am still unclear on how I can securely transmit the data.
Other people in my organization use Spark. A Data Analyst told me that he is able to mask the data that way. But most of the official Neo4j literature states that my options are:
- Import a database (limit 4 GiB)
- Import a flat file from a public website
- Pre-signed URL (which is a variation on option 2)
- Use the import tool, which means using the GUI.
Can you point me to other Neo4j literature describing other alternatives?
So, returning to my question: is there a
Got you! Thanks for providing some extra context. I think we should rule out LOAD CSV. Why? Because then you end up in a situation of "how can AuraDB securely access my CSV/data", and it is better to flip that around to "how can my applications securely access Aura" (the answer may be to access the Aura database through a private endpoint, which is an Aura enterprise feature: Security - Neo4j Aura).
In terms of finding documentation and examples for how to build an application that writes to the database:
- Spark/Databricks connector: Neo4j Connector for Apache Spark - Neo4j Spark
- Write a small script/function to push data = use the neo4j driver for the programming language you choose: Create applications with Neo4j - Neo4j Documentation
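To make the "your application connects to Aura" direction a bit more concrete, here is a minimal sketch of how a script might pick up its connection details. The environment variable names (NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD) and the helper names aura_settings and connect are my own assumptions for illustration, not something Aura requires. The check for an encrypted URI scheme reflects that Aura connection URIs use neo4j+s://.

```python
import os

# URI schemes that give an encrypted connection with certificate checks.
ENCRYPTED_SCHEMES = ("neo4j+s://", "bolt+s://")


def aura_settings(env=os.environ):
    """Read connection details from the environment (hypothetical variable names).

    Keeping credentials out of the script means the script itself can live
    in version control without exposing them.
    """
    uri = env["NEO4J_URI"]
    if not uri.startswith(ENCRYPTED_SCHEMES):
        raise ValueError("refusing to connect over an unencrypted URI scheme")
    return uri, (env["NEO4J_USERNAME"], env["NEO4J_PASSWORD"])


def connect(env=os.environ):
    """Open a driver against the configured instance and check that it is reachable."""
    # Imported here so aura_settings can be used without the driver installed.
    from neo4j import GraphDatabase

    uri, auth = aura_settings(env)
    driver = GraphDatabase.driver(uri, auth=auth)
    driver.verify_connectivity()
    return driver
```

Wherever the script runs (server, lambda, etc.), the secrets would then be injected by that platform rather than stored next to the data.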
I have used the Python Neo4j driver, but in the method I used, I still load a CSV. At my organization's current stage of maturity, we still want to use CSVs. I am having trouble believing that this method, which is the most well-documented, cannot be done securely.
As to the FTP option that @dana_canzano proposed, does this work with AuraDB?
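Pick up a tool like pandas:
import pandas as pd
from neo4j import RoutingControl
Then load your CSV file using:
artists_df = pd.read_csv('artists.csv')
Then run your query (here driver is an already-connected neo4j driver, and the CSV is assumed to have name and year columns):
records, summary, keys = driver.execute_query(
'''
unwind $rows as row
merge (a:Artist {name: row.name, year: toInteger(row.year)})
return count(*) as rows_processed
''',
database_=DB_NAME,
routing_=RoutingControl.WRITE,
# each row is passed to Cypher as a map keyed by the CSV column names
rows=artists_df.to_dict('records')
)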
You can borrow bits and pieces from here notebooks/load-neo4j.ipynb at main · lqst/notebooks · GitHub
Then make sure your code runs on a server/lambda etc. that has the right permissions to your source data and to the database instance/private endpoint.
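If the CSV is large, one option is to send the rows in batches rather than in one big parameter. Below is a rough sketch of that idea using only the standard library csv module instead of pandas. The helper names (batched, load_artists), the batch size, and the name/year column names are assumptions for illustration; driver is expected to be an already-connected neo4j driver.

```python
import csv
from itertools import islice

# Same shape as the query above; assumes the CSV has "name" and "year" columns.
LOAD_ARTISTS = """
UNWIND $rows AS row
MERGE (a:Artist {name: row.name, year: toInteger(row.year)})
RETURN count(*) AS rows_processed
"""


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load_artists(driver, lines, database, batch_size=1000):
    """Send CSV rows to the database in batches and return how many rows were sent.

    `lines` is an open text file or any iterable of CSV lines; the first
    line must be the header row.
    """
    sent = 0
    for batch in batched(csv.DictReader(lines), batch_size):
        # Each batch becomes the $rows parameter: a list of maps keyed by column name.
        driver.execute_query(LOAD_ARTISTS, rows=batch, database_=database)
        sent += len(batch)
    return sent
```

Usage would look something like load_artists(driver, open('artists.csv', newline=''), DB_NAME). The CSV is only ever read by your own code, and the database just receives query parameters over the driver connection.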
Thanks for the reply.
I want to see if I am understanding your proposal correctly.
In this scenario, I am using Python to load the data into AuraDB, but instead of pointing to a CSV, I am assigning the content of the CSV to a pandas object (which is stored locally on my computer), and using Cypher (in the Python script) to load the contents of the CSV (assigned to the pandas object) into AuraDB?
Yes, exactly. You move the responsibility of "who can access your CSV files" from the database to your code (which you can deploy to a suitable place; your local computer is usually not the best place to run it if security is a concern).
I think that is an investment in data integration work that you have to do if you want to securely move data into your database.