Mass File Import from Google Drive


(Michael McKenzie) #1

I have two previous topics on this site, "LOAD CSV & Google Sheets" and "UNWIND and apoc.periodic.iterate", in which I am working on loading many files currently stored in a Google Drive folder.

I am now at a point where I would like to "combine" these efforts. The issue I now face is that I converted some data from CSV files into Google Sheets to practice the Google Sheets method. Now, though, I have many more files (~50 files/folder x 20+ folders), and it doesn't seem efficient to convert all of the comma-delimited ".txt" files into Google Sheets files.

I am currently stuck on importing the ".txt" files from Google Drive. I suspect I am not sharing the files correctly from Google Drive.

Also, I am trying to import into a cloud instance from a Chromebook and a MacBook Pro (my 2 devices).

Any thoughts?


(M. David Allen) #2

Just a thought, but I sometimes use the q tool when I'm dealing with masses of CSV. It allows you to query text/csv basically like it was a relational database with SQL directly.

http://harelba.github.io/q/

In your situation, what I might do is combine all of the files of like schema so that I end up with one really big CSV file per schema type, and then LOAD CSV each of those USING PERIODIC COMMIT. This way you'd have relatively few URLs to deal with.
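As a rough sketch of the combining step, assuming Python is available and that every file in a folder shares the same header row, something like the following could merge them. The function name `combine_csv_files`, the `*.txt` pattern, and the output location are illustrative choices, not part of q or Neo4j.

```python
import csv
from pathlib import Path


def combine_csv_files(folder, output_path, pattern="*.txt"):
    """Concatenate delimited files that share a header into one CSV.

    Returns the number of data rows written (the header is not counted).
    Assumes every matching file starts with the same header row.
    """
    output_path = Path(output_path)
    header = None
    rows_written = 0
    with output_path.open("w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(Path(folder).glob(pattern)):
            # Skip the output file itself if it happens to match the pattern.
            if path.resolve() == output_path.resolve():
                continue
            with path.open(newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # The first file seen defines the schema; write its header once.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `combine_csv_files("folder_a", "combined_a.csv")` once per folder (both names are placeholders) would leave one file per schema to point LOAD CSV at. Raising on a header mismatch is deliberate: silently merging files with different columns would be harder to notice later.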

The other way to do it is to create a "meta manifest". If you have a pile of files in a directory, you can list them one per line like this:

ls -1

Then what you basically have is another CSV with a single column naming each file that you want to load. So then you call apoc.load.csv("file:///path/to/manifest.csv") and you end up with a dynamic list of files, which you then feed into the LOAD CSV process.
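If a script is easier than the shell, the manifest could also be produced along these lines. This is only a sketch: `write_manifest`, the `*.txt` pattern, and the `file` column name are assumptions for illustration. A header row is included on the assumption that the manifest will be read with headers enabled; plain `ls -1` output has no header, so the reader would need to be told that instead.

```python
import csv
from pathlib import Path


def write_manifest(folder, manifest_path, pattern="*.txt"):
    """Write a one-column CSV listing the files in `folder` that match `pattern`.

    Returns the list of file names written, in sorted order.
    """
    manifest_path = Path(manifest_path)
    names = sorted(
        p.name
        for p in Path(folder).glob(pattern)
        # Keep regular files only, and never list the manifest itself.
        if p.is_file() and p.resolve() != manifest_path.resolve()
    )
    with manifest_path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file"])  # hypothetical column name
        writer.writerows([name] for name in names)
    return names
```

Each row of the resulting file then carries one file name under the `file` column, which can be joined onto a base path or URL before being handed to the load step.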


(Michael McKenzie) #3

@david.allen I will have to check that out. What about just importing a .csv file from Google Drive? Is there a special trick as to what should be used for the URL in the LOAD CSV Cypher query?


(M. David Allen) #4

I don't think there's any special trick, except getting Google Drive to be OK with "link sharing". Doing this manually per file seems like it'd be a pain, unless it were automated with a Google API or something.
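One thing that may help once link sharing is on: the link Google Drive hands out normally points at a preview page rather than the raw file. A commonly used workaround is to rewrite it into the `uc?export=download&id=...` form. The sketch below assumes that URL form still serves the raw file for items shared with "anyone with the link", which is not guaranteed, and large files may return a confirmation page instead. The helper name `drive_direct_url` is made up for this example.

```python
import re

# Patterns for the two share-link shapes this sketch knows about:
#   https://drive.google.com/file/d/<id>/view?usp=sharing
#   https://drive.google.com/open?id=<id>
_FILE_ID_PATTERNS = (
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)


def drive_direct_url(share_link):
    """Turn a Google Drive share link into a direct-download style URL.

    Assumption: the uc?export=download form returns the file contents for
    publicly shared files. Raises ValueError if no file id can be found.
    """
    for pattern in _FILE_ID_PATTERNS:
        match = pattern.search(share_link)
        if match:
            file_id = match.group(1)
            return f"https://drive.google.com/uc?export=download&id={file_id}"
    raise ValueError(f"no file id found in {share_link!r}")
```

The rewritten URL is what would go into the LOAD CSV query in place of the share link. It is worth opening it in a private browser window first to confirm that it downloads the file without asking for a login.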