Bottleneck on apoc.when

guille_rr · September 24, 2021, 2:49pm

Hi all,
I don´t know where is the problem on the query, because the first part is too fast (a few ms) but the second has two when, which takes a long time to consult. (1min)
The first part filters a lot of data, it doesn't seem like the amount of data is the problem. I'm doing something wrong.
Thanks in advance!

PROFILE
//Codigo para sacar el valor minimo del campo cNivelIn de rapm
MATCH (cen:CENTRO{name:'C0010037'})<-[r5:APLICA_DIRECTO_INCLUYE]-(rapc:RAPC)
MATCH (rapc)<-[r4:APLICA_CENTROS]-(cab:CabeceraAP)
//WHERE (COALESCE(rapc.fIniEfe,"") + COALESCE(rapc.cHoraIni,"")) <= '201909050000' <= (COALESCE(rapc.fFinEfe,"") + COALESCE(rapc.cHoraFin,""))
MATCH (cab)-[r3:APLICA_MERCANCIA_C|APLICA_MERCANCIA_D]->(rapm:RAPM)
MATCH (rapm)-[r2]->(n)
MATCH p = (b:BARRA{id:'B001015215407584'})-[r6:PERTENECE_A*..3]->(n) WITH *, relationships(p) AS rel
//OPTIONAL MATCH (b)<-[:PERTENECE_A]-(t:TALLA)
MATCH (cab)-[:APLICA_MERCANCIA_C|APLICA_MERCANCIA_D]->(ra:RAPM)-[]->(n)
WITH ra,cab,n
//buscamos si una cabecera tiene mas de un camino
CALL apoc.when(
  size((cab)-[:APLICA_MERCANCIA_C|APLICA_MERCANCIA_D]->(:RAPM)-[]->(n)) > 1,
  //codigo para cuando una cabecera tiene mas de un rapm, sacar el min del grupo
  'WITH ra.cNivelIn as nivel, ra, cab
		//cambiar la M por 8,5
		CALL apoc.when(
          nivel = "M",
          "RETURN 8.5 as nivel",
          "RETURN nivel as nivel",
          {nivel:nivel}) YIELD value
          WITH min(value.nivel) as min, ra.name as ra, cab.name as cab 
          RETURN min, ra, cab
   ',
  //codigo para cuando una cabecera tiene un solo rapm
  'WITH ra.cNivelIn as nivel, ra.name as ra, cab.name as cab 
  RETURN nivel, ra, cab',
  {ra:ra,cab:cab,n:n}) YIELD value 
RETURN value

The PROFILE:

RESULT

Started streaming 46 records after 20 ms and completed after 65244 ms.

dhruv.sharma8826 · September 25, 2021, 2:23pm

Hi @guille_rr

I am wondering if the WITH is creating a lot of cartesian products that unintentionally increases the input to the next part. Can you check if there is cartesian product being created at that stage?

Regards.

guille_rr · September 27, 2021, 9:18am

Hi @dhruv.sharma8826,

you mean the WITH inside the apoc.when?
I have removed the WITH that is before the apoc because it did not make sense.
.
This is the volume that is generated

but the apoc.when get a lot of records

I can't see the problem.

Thanks a lot
Regards

dhruv.sharma8826 · September 27, 2021, 12:07pm

Hi @guille_rr

In the volume that you shared, you actually are doing a distinct over the nodes that are returned and that's why you aren't able to see the cartesian products being formed here.

ideally what happens is that when we either use distinct or we just return the nodes on the neo4j browser, Neo4j in background very smartly masks the complexity of cartesian products and show the data to us as if all is good.
But when we use the data in our query and we kind of pass that data for further processing(like in your case into an Apoc function), it is carried forward to the next step in form of rows of result. That's where the cartesian product gets created.

In order to exactly see if they are getting created in your case and how much records rows they are actually creating. You need to run the same query that you ran for volume with slight difference.
In the query use-> RETURN cab, ra, n
Instead of RETURN count (distinct cab), count (distinct ra), count (distinct n)

Then on the result section, on the left where we have some tabs(inside the result sub-section), there is an option to select as 'Text' just below 'Graph'. Select that, this shows the row wise result that gets passed on to the next section of the query, or to say the result of your volume query.
See the row count there at the bottom, this will show the actual number of rows getting generated. If they are too high(exponentially high), that means the cartesian product is getting generated at this stage.

Hope this helps.

Regards.

guille_rr · September 27, 2021, 2:24pm

Finally the problem was the size (), I have changed it to a node counter and it does it in 290ms
I do not know why this strange operation of the size ().

I have to take these cases into account:

that's why I used size ()

The final code is:

Thank you for everything!

dhruv.sharma8826 · October 2, 2021, 1:51pm

Great to hear you finally found the solution.

Ok, now I see, the size() would actually traverse the path and then calculated the size for each instance, so more operation to perform and hence takes a larger time, while on the other hand doing a count reduces that extra effort and directly has a count of nodes before its being passed to the apoc function.

Thanks for sharing your findings.

Regards.

Topic		Replies	Views
Tuning Cypher queries by understanding cardinality Cypher performance , cypher , knowledge-base	0	1054	August 23, 2018
Why don´t work this order by desc limit 1 Cypher	11	320	October 27, 2021
Performance problems with apoc.path.expand and apoc.run.cypher Procedures & APOC apoc , performance , cypher	3	351	May 13, 2021
Match query with relationship is taking too long to retrieve results does it mean we need to upgrade Neo4j Graph Platform migrated	11	285	January 5, 2023
Multiple match takes too long to execute Cypher	4	151	June 28, 2024

Take the Course Then Join The Aura Agent Hackathon

Bottleneck on apoc.when

Thank you for everything!

Related topics

Take the Course Then Join
The Aura Agent Hackathon