APOC S3 URL isn't working for me

dbeaumon · June 26, 2019, 4:03pm

I'm trying to use APOC to load a CSV file from S3, but I can't get it to work.

My setup:
Neo4j 3.6.3 (Docker), tried both Enterprise and Community
APOC 3.5.0.4

The plugins directory (per the documentation) contains:
apoc-3.5.0.4-all.jar
aws-java-sdk-core-1.11.250.jar
aws-java-sdk-s3-1.11.250.jar
httpclient-4.5.4.jar
httpcore-4.4.8.jar
joda-time-2.9.9.jar

I can call dbms.procedures() and see that APOC loads successfully on startup. I set NEO4J_apoc_import_file_enabled=true, and loading a local file works (CALL apoc.load.csv('/test.csv') yield lineNo, map, list RETURN *).

When I try the same thing against S3, using the URL format s3://accessKey:secretKey@endpoint:port/bucket/key:

CALL apoc.load.csv("s3://MY_URL_STRING") yield lineNo, map, list RETURN *;

All I get is a variation on the theme of "cannot find the file locally in your import directory":
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.load.csv: Caused by: java.io.FileNotFoundException: /import/mybucket/misc/test.csv (No such file or directory)

What am I missing?
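To make the URL format above concrete, here is a minimal Java sketch that splits such a URL into its parts using only java.net.URI. The class S3UrlParts is hypothetical and this is not how APOC parses the URL internally. It assumes that a secret key containing '/' is percent-encoded as %2F, because an unencoded '/' would end the authority part of the URL.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: splits s3://accessKey:secretKey@endpoint:port/bucket/key
// into its parts. Illustrative only; this is not APOC's own parser.
final class S3UrlParts {
    final String accessKey;  // null when the URL carries no credentials
    final String secretKey;  // null when the URL carries no credentials
    final String endpoint;   // host part, e.g. s3.amazonaws.com
    final int port;          // -1 when the URL has no explicit port
    final String bucket;
    final String key;        // may contain further slashes, e.g. misc/test.csv

    private S3UrlParts(String accessKey, String secretKey, String endpoint,
                       int port, String bucket, String key) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.endpoint = endpoint;
        this.port = port;
        this.bucket = bucket;
        this.key = key;
    }

    static S3UrlParts parse(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URL: " + url, e);
        }
        if (!"s3".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException(
                "expected s3://[accessKey:secretKey@]endpoint[:port]/bucket/key");
        }
        String accessKey = null;
        String secretKey = null;
        // getUserInfo() percent-decodes, so a secret containing '/' can be passed as %2F.
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("credentials must be accessKey:secretKey");
            }
            accessKey = userInfo.substring(0, colon);
            secretKey = userInfo.substring(colon + 1);
        }
        // The path is "/bucket/key": the first segment is the bucket, the rest is the key.
        String path = uri.getPath() == null ? "" : uri.getPath();
        String rest = path.startsWith("/") ? path.substring(1) : path;
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("expected /bucket/key in: " + url);
        }
        return new S3UrlParts(accessKey, secretKey, uri.getHost(), uri.getPort(),
                rest.substring(0, slash), rest.substring(slash + 1));
    }
}
```

With example values chosen to match the error message above, parsing s3://AKIAEXAMPLE:abc%2Fdef@s3.amazonaws.com:443/mybucket/misc/test.csv yields bucket mybucket and key misc/test.csv, which is exactly the tail that ended up appended to /import.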

EDIT:
After poking through the code, I'd say FileUtils.isFile should probably treat an S3 URL as "not a file", so that changeFileUrlIfImportDirectoryConstrained doesn't mangle the URL when apoc.import.file.use_neo4j_config is set to true. As far as I can tell, the only way to make this work at the moment is to set apoc.import.file.use_neo4j_config to false, which may not be what you want.
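A minimal sketch of the fix suggested above, assuming the goal is simply that URLs with a remote scheme bypass the import-directory rewrite. The class ImportUrlResolver, its methods, and the set of remote schemes are hypothetical; this is not APOC's FileUtils.isFile or changeFileUrlIfImportDirectoryConstrained, only an illustration of the idea.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not APOC's FileUtils.
final class ImportUrlResolver {
    // Assumed list of schemes that must never be mapped into the import directory.
    private static final Set<String> REMOTE_SCHEMES = Set.of("s3", "http", "https", "ftp", "hdfs");

    // True for plain paths and file: URLs, false for s3:// and other remote schemes.
    static boolean isLocalFile(String url) {
        try {
            String scheme = new URI(url).getScheme();
            return scheme == null || !REMOTE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Not parseable as a URI (e.g. a path with spaces): treat it as a local path.
            return true;
        }
    }

    // Confines local paths to importDir; remote URLs are returned untouched.
    static String resolve(String url, Path importDir, boolean confineToImportDir) {
        if (!confineToImportDir || !isLocalFile(url)) {
            return url;
        }
        String path = url;
        if (url.toLowerCase(Locale.ROOT).startsWith("file:")) {
            String p = URI.create(url).getPath(); // handles file:/x and file:///x
            path = p == null ? url.substring("file:".length()) : p;
        }
        while (path.startsWith("/")) {
            path = path.substring(1); // resolve relative to importDir, not the filesystem root
        }
        Path base = importDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(path).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("path escapes the import directory: " + url);
        }
        return resolved.toString();
    }
}
```

With confinement on and an import directory of /import, /test.csv still resolves to /import/test.csv, while s3://ak:sk@s3.amazonaws.com:443/mybucket/misc/test.csv is passed through unchanged instead of becoming /import/mybucket/misc/test.csv.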

The larger question I have is this: is it a deliberate design decision to force the static credentials provider (credentials parsed from the URL, so S3 access is only possible when the client supplies its own credentials), or could I:
a) Add a config parameter that enables a server-config-based fallback for S3
b) Allow an S3 URL without credentials, and fall back to the server's default credentials provider, which would allow environment variables, credentials files, instance profiles, etc.

I agree this isn't a good default (hence making it something you have to enable), but for some POC work it would make my life easier.

If I do open a pull request, I'll probably do all of the above, and I'd rather not write the code if it's going to be rejected anyway.
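Option (b), gated by the opt-in flag from option (a), could look roughly like the following sketch. It only decides which kind of credentials provider to use; CredentialSource, S3CredentialPolicy and the allowServerCredentials flag are hypothetical names, and the flag merely stands in for a server config setting that is assumed here, not an existing APOC option. With the AWS SDK for Java 1.11 jars already in the plugins directory, the two cases would plausibly map to an AWSStaticCredentialsProvider wrapping BasicAWSCredentials, and to DefaultAWSCredentialsProviderChain, which looks at environment variables, Java system properties, the shared credentials file and instance profile credentials, among others.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical: which kind of AWS credentials an s3:// URL should be read with.
enum CredentialSource {
    // Real code might build: new AWSStaticCredentialsProvider(new BasicAWSCredentials(ak, sk))
    STATIC_FROM_URL,
    // Real code might use: DefaultAWSCredentialsProviderChain.getInstance()
    DEFAULT_PROVIDER_CHAIN
}

// Hypothetical policy: credentials in the URL always win; otherwise fall back to the
// server's own credentials only if the (assumed) opt-in setting is enabled.
final class S3CredentialPolicy {
    static CredentialSource choose(String s3Url, boolean allowServerCredentials) {
        String userInfo;
        try {
            userInfo = new URI(s3Url).getUserInfo();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URL: " + s3Url, e);
        }
        if (userInfo != null && !userInfo.isEmpty()) {
            return CredentialSource.STATIC_FROM_URL;  // current behaviour: client-supplied creds
        }
        if (allowServerCredentials) {
            return CredentialSource.DEFAULT_PROVIDER_CHAIN;  // option (b), enabled via option (a)
        }
        throw new IllegalStateException(
            "S3 URL has no credentials and server-side credentials are disabled");
    }
}
```

Keeping the fallback behind an explicit flag preserves today's behaviour for anyone who doesn't opt in: a URL without credentials is still rejected.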

dana_canzano · June 26, 2019, 8:15pm

Although this doesn't cover APOC, it does explain how to get LOAD CSV to read from S3: "Load CSV data in Neo4j from CSV files on Amazon S3 Bucket" (Knowledge Base).

Does this suffice?

dbeaumon · June 26, 2019, 8:59pm

Not really (although it's a neat trick regardless), since I want to both read from and write to an S3 bucket, and this only covers reading (via HTTP).

For now I'll just fork APOC core and make it do what I want for my POC. I'm comfortable with the AWS libraries, and I'm not sure anyone should follow my example on this one anyway.

