Updates to the graph algorithms docs, chapter 3.4: importing the Yelp dataset

Hi, I found that the following parts of chapter 3.4 (import) in the graph algorithms documentation need updating:

  • the file names are no longer correct: every .json file of the Yelp dataset now has the prefix yelp_academic_dataset_ before the old name
  • the import script for tips reads a map key that no longer exists: likes has been replaced by compliment_count

I'll add more errata in the comments if I find other things that don't work :)
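
Applied together, the two corrections might look like the sketch below. The original tips script is not quoted in this thread, so the labels, the TIP relationship type, and the property names are assumptions modeled on the business import; only the file-name prefix and the compliment_count key come from the report above.

```cypher
// Sketch only: labels, relationship type and property names are assumptions.
// From the report: the yelp_academic_dataset_ prefix and compliment_count (formerly likes).
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/yelp_academic_dataset_tip.json') YIELD value RETURN value
","
MATCH (b:Business{id:value.business_id})
MERGE (u:User{id:value.user_id})
MERGE (u)-[:TIP{date:value.date, compliment_count:value.compliment_count}]->(b)
",{batchSize: 10000, iterateList: true});
```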

link to the specific chapter of docs: The Neo4j Graph Data Science Library Manual v2.2 - Neo4j Graph Data Science

PS: I hope the style is OK for you guys and the information is sufficiently traceable. If not, feedback is always appreciated; it's my first post here.

Hi Florian,

Welcome to the community!!

Many thanks for reporting this - we'll get it reviewed and fixed!

Regards, David

Hi David!
Thanks - nice to meet you :)

I found another issue when working through the example: extracting categories from the business.json file does not seem to complete.
To narrow it down, I split the original command into two parts.
From:

CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
MERGE (b:Business{id:value.business_id})
SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'],[])
WITH b,value.categories as categories
UNWIND categories as category
MERGE (c:Category{id:category})
MERGE (b)-[:IN_CATEGORY]->(c)
",{batchSize: 10000, iterateList: true});

to:

CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
MERGE (b:Business{id:value.business_id})
ON CREATE SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'],[])
",{batchSize: 10000, iterateList: true});

and

CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
WITH value.categories as categories
UNWIND categories as category
MERGE (c:Category{id:category})
",{batchSize: 10000, iterateList: true});

I left out creating the relationships between businesses and categories, so the second command only creates the categories. After waiting for one hour I cancelled the command. Unfortunately, no error message is shown. Having reviewed some of the entries manually, I can say there are not thousands of categories per business, and given that creating the businesses finished in under 20 s, I conclude that something is wrong here, but I can't tell what.
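
One thing that may be worth ruling out, offered here only as an assumption and not a confirmed diagnosis: without an index or uniqueness constraint on :Category(id), every MERGE has to scan all existing Category nodes, and if categories arrives as one comma-separated string instead of a list, UNWIND treats the whole string as a single category. A minimal sketch under those assumptions follows. The constraint uses the older ON ... ASSERT syntax; newer Neo4j versions use FOR ... REQUIRE instead.

```cypher
// Assumption: no constraint on Category.id exists yet (older constraint syntax).
CREATE CONSTRAINT ON (c:Category) ASSERT c.id IS UNIQUE;

// Assumption: value.categories is a string such as 'Golf, Active Life'.
// split() returns null for businesses without categories, and UNWIND null yields no rows.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///dataset/business.json') YIELD value RETURN value
","
UNWIND split(value.categories, ', ') AS category
MERGE (c:Category{id:category})
",{batchSize: 10000, iterateList: true});
```

If categories is in fact a list, the split() call would fail with a type error and the original UNWIND should be kept; the constraint is useful in both cases.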

Sorry for not being able to provide a more conclusive observation.

The same happens for the user query: it does not complete, but it does not add any new users either.
I checked whether the query was still making progress with match(u:User) return count(u).
I currently have 1979059 users and 86727 categories. Neither count increases when the count queries are rerun, while the queries that should add users and categories are still running, even after hours.
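
The progress check described above can be sketched as follows. The user count query comes from the post and the category count is its analogue; dbms.listQueries() is an addition that assumes a Neo4j 3.x or 4.x server (later versions replace it with SHOW TRANSACTIONS).

```cypher
// Node counts; rerun to see whether the import is still adding nodes.
MATCH (u:User) RETURN count(u) AS users;
MATCH (c:Category) RETURN count(c) AS categories;

// Assumption: Neo4j 3.x/4.x. Lists the currently running queries and how long each has run.
CALL dbms.listQueries() YIELD queryId, query, elapsedTimeMillis
RETURN queryId, query, elapsedTimeMillis;
```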

Hey Florian,

Thanks again! I've passed this onto the Graph Data Science engineering team to investigate.

Regards,
David

Hey David, thanks to you too :) After letting it run over the afternoon, the Neo4j Browser GUI is stuck, as is the Neo4j Desktop interface; neither responds anymore. However, it seems one of the commands produced more results, since connecting with a new browser window returns a few more categories. I cannot tell you how long the command took overall, since the original window no longer responds...

Hi @florian.schummer! We've actually just released the Graph Data Science library (GDS), which includes all the algorithms from the algos library, but with lots of improvements to performance, usability, and user experience.

As such, we won't be updating the older algos library -- but the new docs are here: The Neo4j Graph Data Science Library Manual v2.2 - Neo4j Graph Data Science

(you can download GDS from the Neo4j download center or from Desktop)

Hi Alicia,

Thanks for the update - I've actually already been checking it out and think it's great (also the data science playground, a really useful tool).

Is there an estimate yet for when a v4.0-compatible GDS library will be released?

April 23! Coming very soon -- I'll post an update on the community forum, and it will show up on the download center and desktop as soon as the preview release is out.