Bottleneck on apoc.when

Hi all,
I don´t know where is the problem on the query, because the first part is too fast (a few ms) but the second has two when, which takes a long time to consult. (1min)
The first part filters a lot of data, it doesn't seem like the amount of data is the problem. I'm doing something wrong.
Thanks in advance!

PROFILE
//Codigo para sacar el valor minimo del campo cNivelIn de rapm
MATCH (cen:CENTRO{name:'C0010037'})<-[r5:APLICA_DIRECTO_INCLUYE]-(rapc:RAPC)
MATCH (rapc)<-[r4:APLICA_CENTROS]-(cab:CabeceraAP)
//WHERE (COALESCE(rapc.fIniEfe,"") + COALESCE(rapc.cHoraIni,"")) <= '201909050000' <= (COALESCE(rapc.fFinEfe,"") + COALESCE(rapc.cHoraFin,""))
MATCH (cab)-[r3:APLICA_MERCANCIA_C|APLICA_MERCANCIA_D]->(rapm:RAPM)
MATCH (rapm)-[r2]->(n)
MATCH p = (b:BARRA{id:'B001015215407584'})-[r6:PERTENECE_A*..3]->(n) WITH *, relationships(p) AS rel
//OPTIONAL MATCH (b)<-[:PERTENECE_A]-(t:TALLA)
MATCH (cab)-[:APLICA_MERCANCIA_C|APLICA_MERCANCIA_D]->(ra:RAPM)-[]->(n)
WITH ra,cab,n
//buscamos si una cabecera tiene mas de un camino
CALL apoc.when(
  size((cab)-[:APLICA_MERCANCIA_C|APLICA_MERCANCIA_D]->(:RAPM)-[]->(n)) > 1,
  //codigo para cuando una cabecera tiene mas de un rapm, sacar el min del grupo
  'WITH ra.cNivelIn as nivel, ra, cab
		//cambiar la M por 8,5
		CALL apoc.when(
          nivel = "M",
          "RETURN 8.5 as nivel",
          "RETURN nivel as nivel",
          {nivel:nivel}) YIELD value
          WITH min(value.nivel) as min, ra.name as ra, cab.name as cab 
          RETURN min, ra, cab
   ',
  //codigo para cuando una cabecera tiene un solo rapm
  'WITH ra.cNivelIn as nivel, ra.name as ra, cab.name as cab 
  RETURN nivel, ra, cab',
  {ra:ra,cab:cab,n:n}) YIELD value 
RETURN value

The PROFILE:





image

RESULT

Started streaming 46 records after 20 ms and completed after 65244 ms.

Hi @guille_rr

I am wondering if the WITH is creating a lot of cartesian products that unintentionally increases the input to the next part. Can you check if there is cartesian product being created at that stage?

Regards.

Hi @dhruv.sharma8826,

you mean the WITH inside the apoc.when?
I have removed the WITH that is before the apoc because it did not make sense.
.
This is the volume that is generated

but the apoc.when get a lot of records

I can't see the problem.

Thanks a lot
Regards

Hi @guille_rr

In the volume that you shared, you actually are doing a distinct over the nodes that are returned and that's why you aren't able to see the cartesian products being formed here.

ideally what happens is that when we either use distinct or we just return the nodes on the neo4j browser, Neo4j in background very smartly masks the complexity of cartesian products and show the data to us as if all is good.
But when we use the data in our query and we kind of pass that data for further processing(like in your case into an Apoc function), it is carried forward to the next step in form of rows of result. That's where the cartesian product gets created.

In order to exactly see if they are getting created in your case and how much records rows they are actually creating. You need to run the same query that you ran for volume with slight difference.
In the query use-> RETURN cab, ra, n
Instead of RETURN count (distinct cab), count (distinct ra), count (distinct n)

Then on the result section, on the left where we have some tabs(inside the result sub-section), there is an option to select as 'Text' just below 'Graph'. Select that, this shows the row wise result that gets passed on to the next section of the query, or to say the result of your volume query.
See the row count there at the bottom, this will show the actual number of rows getting generated. If they are too high(exponentially high), that means the cartesian product is getting generated at this stage.

Hope this helps.

Regards.

Finally the problem was the size (), I have changed it to a node counter and it does it in 290ms
I do not know why this strange operation of the size ().

I have to take these cases into account:

image
that's why I used size ()

The final code is:

Thank you for everything!

1 Like

Great to hear you finally found the solution. :slight_smile:

Ok, now I see, the size() would actually traverse the path and then calculated the size for each instance, so more operation to perform and hence takes a larger time, while on the other hand doing a count reduces that extra effort and directly has a count of nodes before its being passed to the apoc function.

Thanks for sharing your findings.

Regards.