Troubleshooting CSV Import from SageMaker to Neo4j on EC2

I'm trying to load a CSV file hosted on Amazon SageMaker into a Neo4j database running on an EC2 instance using a Python script. I have verified that simple Cypher queries work, so sending queries to the database is not an issue. The Neo4j version is 5.19.0.

I have made the following configuration changes to the neo4j.conf file:
● Commented out:
#server.directories.import=/var/lib/neo4j/import
● Uncommented:
dbms.security.allow_csv_import_from_file_urls=true

After making these changes, I restarted the Neo4j service:

sudo systemctl restart neo4j

I'm attempting to run the following Cypher query from my Python script to create nodes from the CSV file:

LOAD CSV WITH HEADERS FROM "file:///upload_2024-06-05/nodes_MUKYA.csv" AS line CREATE (u:MUKYA {type:line.type, label:line.label, nodeId:line.nodeId, rank:line.rank, JP:line.JP, source:line.source})

However, I'm receiving the following error:

Query failed: {code: Neo.ClientError.Statement.ExternalResourceFailed} {message: Cannot load from URL 'file:///upload_2024-06-05/nodes_MUKYA.csv': Couldn't load the external resource at: file:///upload_2024-06-05/nodes_MUKYA.csv ()}

Are there any additional configurations or steps required to successfully load the CSV file from Amazon SageMaker into the Neo4j database on the EC2 instance using a Python script on SageMaker?

I created the apoc.conf file myself and placed it in the correct location (/etc/neo4j/apoc.conf). Inside the apoc.conf, I added the following lines:

apoc.import.file.enabled=true
apoc.export.file.enabled=true

I then restarted Neo4j and tried the import again, but it still failed.

It looks like you are using Neo4j’s LOAD CSV clause, so the apoc.conf settings are not relevant.

You can use http or ftp to get the file remotely.
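The reason a remote URL helps is that LOAD CSV is executed by the Neo4j server, so a file:/// URL is looked up on the EC2 instance's own disk, not on the SageMaker side where the script runs. A minimal Python sketch of a pre-flight check that makes this explicit (the function name csv_source_location is hypothetical and not part of any Neo4j library):

```python
from urllib.parse import urlparse


def csv_source_location(url: str) -> str:
    """Report where the Neo4j server would look for the CSV behind this URL."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "file":
        # file:/// URLs are read from the Neo4j server's own file system.
        return "neo4j-server-disk"
    if scheme in ("http", "https", "ftp"):
        # Remote URLs are fetched over the network by the Neo4j server.
        return "remote"
    return "unsupported"


if __name__ == "__main__":
    print(csv_source_location("file:///upload_2024-06-05/nodes_MUKYA.csv"))
```

If the check reports the server's disk, the file has to exist on the EC2 instance itself for the load to succeed.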

You can also read it from an AWS S3 bucket, if you can push it to one.
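As a rough sketch of the S3 route, the CSV could be pushed from SageMaker with boto3 and then referenced by its HTTPS URL. The helper names, bucket name, and key below are placeholders, and the sketch assumes the object is readable by the Neo4j server without credentials (for example through a bucket policy); if it is not, a presigned URL from boto3's generate_presigned_url is one option to look at.

```python
from urllib.parse import quote


def s3_https_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style HTTPS URL for an S3 object."""
    # Percent-encode the key but keep "/" so folder prefixes stay readable.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def upload_csv(local_path: str, bucket: str, key: str) -> str:
    """Upload a local CSV to S3 and return the URL to pass to LOAD CSV."""
    import boto3  # imported lazily so the URL helper works without boto3

    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_https_url(bucket, key)


if __name__ == "__main__":
    # Placeholder bucket and key; replace with real values before running.
    print(s3_https_url("my-bucket", "upload_2024-06-05/nodes_MUKYA.csv"))
```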

When I specified the file as 'https://{bucket}.s3.amazonaws.com/{folder}/nodes_xxx.csv', I was able to import the data into AuraDB with LOAD CSV.
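For reference, a minimal sketch of running the same load from Python with the official neo4j driver, passing the HTTPS URL as a query parameter. The function name load_nodes and the connection details are placeholders, and the sketch assumes the server accepts a parameter in the FROM position of LOAD CSV.

```python
# The original CREATE statement, with the CSV location supplied as $url.
LOAD_NODES = """
LOAD CSV WITH HEADERS FROM $url AS line
CREATE (u:MUKYA {type: line.type, label: line.label, nodeId: line.nodeId,
                 rank: line.rank, JP: line.JP, source: line.source})
"""


def load_nodes(uri: str, user: str, password: str, csv_url: str) -> int:
    """Run LOAD CSV against the given URL and return the number of nodes created."""
    from neo4j import GraphDatabase  # imported lazily; requires the neo4j package

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # consume() discards any records and returns the result summary.
            summary = session.run(LOAD_NODES, url=csv_url).consume()
            return summary.counters.nodes_created


if __name__ == "__main__":
    # Placeholder connection details and URL; replace before running.
    created = load_nodes(
        "neo4j+s://<your-instance>.databases.neo4j.io",
        "neo4j",
        "<password>",
        "https://<bucket>.s3.amazonaws.com/<folder>/nodes_xxx.csv",
    )
    print(f"Created {created} nodes")
```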