I'm currently working on a university project where I need to export my existing Neo4j graph as RDF triples.
Right now I'm trying to figure out the best way to do that, given that the database is quite large (11 million nodes, 1 billion relationships).
I already found the blog post series by Jesús Barrasa, where he exported small parts of a Neo4j database with the neosemantics plugin and its HTTP endpoints.
So here's my question: Would that also be the way to go for exporting the complete database as RDF triples? For example, defining a Cypher query that matches all the nodes with all their relationships and sending it in a POST request to the HTTP endpoint? Or is there a better way to do it?
Thankful for every response!
Hi, maybe a little late.
With a short Python script you can query Neo4j node by node through the RDF endpoint: /rdf/<database_name>/describe/<node_id> returns all the information associated with the node. Write the output to one or more files, since the result can be quite large (if you use more than one file, be sure to write the prefixes in each new file).
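To make that concrete, here is a rough sketch of what such a script could look like. It is not the exact script I used. The host and port, the database name, the credentials, the `Accept: text/turtle` header and all function names are assumptions for illustration; only the /rdf/<database_name>/describe/<node_id> path comes from the description above. The `fetch` argument is passed in so the file-splitting logic can be tried without a running server.

```python
import base64
import urllib.request

BASE_URL = "http://localhost:7474"  # assumption: default Neo4j HTTP address
DATABASE = "neo4j"                  # assumption: your database name


def describe_url(node_id, base_url=BASE_URL, database=DATABASE):
    """Build the describe URL for one node from the path pattern above."""
    return f"{base_url}/rdf/{database}/describe/{node_id}"


def fetch_turtle(node_id, user="neo4j", password="secret"):
    """Fetch one node's description. Credentials are placeholders, and the
    Accept header is an assumption about how to request Turtle output."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        describe_url(node_id),
        headers={"Accept": "text/turtle", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def export_nodes(node_ids, fetch, path_template, max_nodes_per_file=1_000_000):
    """Write node descriptions to numbered files, repeating prefixes per file."""
    paths = []
    out = None
    prefixes_written = set()
    for count, node_id in enumerate(node_ids):
        if count % max_nodes_per_file == 0:
            if out is not None:
                out.close()
            path = path_template.format(len(paths))
            out = open(path, "w", encoding="utf-8")
            paths.append(path)
            prefixes_written = set()  # every new file needs its own prefixes
        for line in fetch(node_id).splitlines():
            if line.startswith("@prefix"):
                if line in prefixes_written:
                    continue  # this prefix is already declared in this file
                prefixes_written.add(line)
            out.write(line + "\n")
    if out is not None:
        out.close()
    return paths
```

You would call it with something like `export_nodes(range(first_id, last_id + 1), fetch_turtle, "export-{}.ttl")`, where the ID range is whatever your database actually contains.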
My export was around 17 million nodes and 200 million relationships, which gave a 19 GB Turtle (.ttl) file. It compressed to 1.8 GB as a ZIP file.
Note: it is extremely slow. It took almost 24 hours of non-stop processing, as Python uses only one core. You can speed it up with threads, slicing the nodes to query by ID: e.g. with 2 threads, one takes the lower half of the node IDs and the other the upper half, so each thread queries different node IDs.
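The slicing idea could look roughly like this. Again, this is a sketch under assumptions: `split_id_range` and `export_in_parallel` are names I made up, and `export_slice` stands for whatever function exports one slice of IDs to its own file. Since each thread spends most of its time waiting for the HTTP response, threads should help even though Python itself runs on one core, but I have not measured how much.

```python
from concurrent.futures import ThreadPoolExecutor


def split_id_range(first_id, last_id, workers):
    """Split the inclusive range [first_id, last_id] into contiguous slices."""
    total = last_id - first_id + 1
    size, extra = divmod(total, workers)
    slices = []
    start = first_id
    for i in range(workers):
        # The first `extra` slices get one more ID so nothing is left over.
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue
        slices.append(range(start, start + length))
        start += length
    return slices


def export_in_parallel(first_id, last_id, workers, export_slice):
    """Run export_slice(index, ids) once per slice, each in its own thread."""
    slices = split_id_range(first_id, last_id, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(export_slice, index, ids)
            for index, ids in enumerate(slices)
        ]
        # Collect results in slice order; re-raises any worker exception.
        return [future.result() for future in futures]
```

Let each thread write to its own file (for example by using the slice index in the file name) so the threads never write to the same file at once.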