Hi @mark.needham ,
We managed to sort some of those issues out for now by loading the data with Cypher instead of Python.
How long did it take you to load the complete Yelp dataset? Loading business.json took me around 7 hours with the heap size configured to 12 GB and the page cache to 6 GB. I'm running Neo4j Desktop on my laptop (4 cores, 32 GB RAM).
This is what I ran. I'm wondering whether setting a batch size and parallel = true would have made a difference.
CALL apoc.load.json('file:///business.json')
YIELD value
MERGE (b:Business {id: value.business_id})
SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'], [])
WITH b,value.categories as categories
UNWIND categories as category
MERGE (c:Category{name:category})
MERGE (b)-[:IN_CATEGORY]->(c);
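
For reference, this is roughly how I'd expect the batched variant to look using APOC's apoc.periodic.iterate (a sketch; the batchSize value is illustrative, not tuned):

```cypher
CALL apoc.periodic.iterate(
  // outer query: stream one row per business from the JSON file
  "CALL apoc.load.json('file:///business.json') YIELD value RETURN value",
  // inner query: runs once per row, committed in batches
  "MERGE (b:Business {id: value.business_id})
   SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'], [])
   WITH b, value.categories AS categories
   UNWIND categories AS category
   MERGE (c:Category {name: category})
   MERGE (b)-[:IN_CATEGORY]->(c)",
  {batchSize: 10000, parallel: false}
);
```

Note that parallel: true is risky here: concurrent batches would MERGE the same shared :Category nodes and can deadlock, so parallel: false is the safer choice for this shape of load. Also worth checking before blaming batch size: without indexes on :Business(id) and :Category(name), every MERGE does a label scan, which on its own can account for multi-hour load times.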