Updates to the graph algorithms docs, chapter 3.4 import yelp dataset

Hi, I found that the following parts of chapter 3.4 (import) in the graph algorithms documentation need updating:

  • the file names are no longer correct: every .json file of the Yelp dataset now has the prefix yelp_academic_dataset_ in front of the old name
  • the import script for tips reads a map value that no longer exists: likes has been replaced by compliment_count

I'll add more errata in the comments if I find other things that don't work :)

link to the specific chapter of docs: https://neo4j.com/docs/graph-algorithms/current/yelp-example/?&_ga=2.73701264.321930910.1585048967-292239439.1580847145&_gac=1.14886786.1581886550.EAIaIQobChMIsuqZyfrW5wIVjLB7Ch2qtgDZEAEYASAAEgLwV_D_BwE#yelp-import
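
For reference, the two corrections above could look roughly like this in the tip import. This is only a sketch: the MATCH/MERGE pattern and the TIP relationship type are assumptions modeled on the style of the business import in the docs, and the dataset folder is assumed to be the same one used there. Only the file prefix and the compliment_count field come from the observations above.

```cypher
// Sketch only: TIP and the MATCH/MERGE layout are assumptions, not the docs' exact script.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/yelp_academic_dataset_tip.json') YIELD value RETURN value
","
MATCH (b:Business{id:value.business_id})
MERGE (u:User{id:value.user_id})
MERGE (u)-[:TIP{date:value.date, compliment_count:value.compliment_count}]->(b)
",{batchSize: 20000, iterateList: true});
```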

PS: I hope the style is OK and the information is sufficiently traceable. If not, feedback is always appreciated; it's my first post here.

Hi Florian,

Welcome to the community!!

Many thanks for reporting this - we'll get it reviewed and fixed!

Regards, David

Hi David!
Thanks, nice to meet you :)

I found another issue when working through the example: the step that extracts categories from the business.json file does not seem to finish.
I split the original command into two parts, from:

CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
MERGE (b:Business{id:value.business_id})
SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'],[])
WITH b,value.categories as categories
UNWIND categories as category
MERGE (c:Category{id:category})
MERGE (b)-[:IN_CATEGORY]->(c)
",{batchSize: 10000, iterateList: true});

to:

CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
MERGE (b:Business{id:value.business_id})
ON CREATE SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'],[])
",{batchSize: 10000, iterateList: true});

and

CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
WITH value.categories as categories
UNWIND categories as category
MERGE (c:Category{id:category})
",{batchSize: 10000, iterateList: true});

I left out creating the links from categories to businesses, so the second command only creates the categories. After waiting for one hour I cancelled the command. Unfortunately, no error message is shown. Having reviewed some of the entries manually, I can see there aren't thousands of categories per business, and since creating the businesses finished in less than 20 s, I conclude that something is wrong here, but I can't tell what.
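
One thing that might be worth ruling out (a guess, not a confirmed cause): a MERGE on a label and property without an index or uniqueness constraint has to scan all nodes with that label for every incoming row, which gets slower as the node count grows. A minimal sketch, assuming Neo4j 3.5 or 4.0 syntax and that the id properties are meant to be unique:

```cypher
// Assumption: Neo4j 3.5 / 4.0 syntax. Newer versions use
//   CREATE CONSTRAINT FOR (c:Category) REQUIRE c.id IS UNIQUE
CREATE CONSTRAINT ON (c:Category) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;

// List the indexes and constraints that currently exist
// (these procedures are deprecated or removed in later versions).
CALL db.indexes();
CALL db.constraints();
```

Creating a uniqueness constraint fails if duplicate ids already exist, so it is easiest to run these before the import.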

Sorry for not being able to provide a more conclusive observation.

The same happens for the user query: it does not complete, but it does not add any new users either.
I checked whether the query was still making progress with match(u:User) return count(u).
I currently have 1979059 users and 86727 categories. Neither number increases when I rerun the count, while the queries that should add users and categories are still running, even after hours.
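
A category count in the tens of thousands seems high for a list of business categories. One possible explanation, not confirmed in this thread, is that categories is stored as a single comma-separated string rather than a list, in which case UNWIND would yield one "category" per distinct string. A quick way to check, followed by a hedged variant that splits the string (the ', ' delimiter is an assumption about the file format):

```cypher
// Inspect a few raw values: is categories a list or one long string?
CALL apoc.load.json('file:///dataset/business.json') YIELD value
RETURN value.business_id AS id, value.categories AS categories
LIMIT 5;

// Only applicable if categories turns out to be a string;
// split() raises a type error when given a list.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
WITH value WHERE value.categories IS NOT NULL
UNWIND split(value.categories, ', ') AS category
MERGE (c:Category{id:trim(category)})
",{batchSize: 10000, iterateList: true});
```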

Hey Florian,

Thanks again! I've passed this on to the Graph Data Science engineering team to investigate.

Regards,
David

Hey David, thanks to you too :) After letting it run over the afternoon, the Neo4j Browser GUI is stuck, and so is the Neo4j Desktop interface; neither reacts anymore. However, it seems one of the queries produced more results, since connecting with a new browser window returns a few more categories. I cannot tell you how long the command took overall, since the original window no longer responds...
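
When the original window no longer responds, a fresh session can usually still show what is running and for how long. A minimal sketch, assuming Neo4j 3.5 or 4.0, where dbms.listQueries and dbms.killQuery exist (availability can depend on version and edition, and the query id shown is a placeholder):

```cypher
// Run from a new browser window or cypher-shell session.
CALL dbms.listQueries()
YIELD queryId, query, status, elapsedTimeMillis
RETURN queryId, status, elapsedTimeMillis, left(query, 80) AS queryStart
ORDER BY elapsedTimeMillis DESC;

// To stop a long-running import, pass its id (placeholder id below):
// CALL dbms.killQuery('query-123');
```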