Batched Transaction

Hello everyone, I’m working on a recommendation engine usecase at production level. Though I’m new to neo4j, liking it so far.

I had a use case to fetch products sorted based on geography(nearest first), I thought to make use of spatial functions like distance and point to sort them.

For that we need latitude and longitude properties in Product nodes, which we don’t have yet, but we have a postal address, so we figured, we’ll make use of call apoc.spatial.geocodeOnce, fetch and set latitude and longitude properties in all the Product nodes.

Well, a bit of problem is it’s becoming an expensive operation, we have around 5000 product nodes, it’s taking an average 1000 ms to update each node, by that calculation, on a single core, it’d take around 90 min to update all the nodes. I was really curious to know if there’s a smarter way to handle the transaction in chunks(may be, updating 500 products in one tx then, next 500… and so on). I thought, apoc.periodic.iterate is a way to go but I was looking for suggestions how do we solve this problem efficiently?

P.S - When I tried out a few apoc.spatial.geocodeOnce calls to couple of products with the postal address we have in our db, I saw a couple of calls returned no result, what could be the possible reasons for this? (may be, we don’t have standardised postal addresses for such products?, if so how shall we address the problem, shall we make use of google geocode api for those products or there are other smarter ways embedded in neo4j)
Thanks in advance :slight_smile:

Adding cypher for a reference - it's taking quite high time to complete

CALL apoc.periodic.iterate(
  "MATCH (p:Product) return p", 
  "CALL apoc.spatial.geocodeOnce(p.postal_address) YIELD location SET p.latitude=location.latitude, p.longitude=location.longitude",
  {batchSize:500, iterateList:true, parallel:true}
);

You can configure the throttle in neo4j.conf

just be aware that this might block you from the API
otherwise you can also purchase an google maps API key