Neo4j: 5.26.7 EE, running on Ubuntu Pro 24.04 LTS
file
Well, good morning to everyone!
I got this very simple query
WITH "pikeen" AS owner
WITH pikeen.uuid5(owner,owner) as ownerUuid5
MATCH (pikeen:Owner { uuid: ownerUuid5 })
WITH pikeen, ownerUuid5
LOAD CSV WITH HEADERS
FROM "https://storage.googleapis.com/file.csv"
AS row FIELDTERMINATOR ';'
WITH row, pikeen, toLower(row.url) AS url
WHERE row.uuid <> "#"
WITH row, pikeen, url, pikeen.uuid5(url,row.shortName) as uuid5
MERGE (merchant:Merchant { uuid: uuid5})
SET merchant.shortName= row.shortName,
merchant.uuid= uuid5,
merchant.url= url
MERGE (pikeen)-[:HAS_MERCHANT]->(merchant)
return pikeen, merchant;
Well, if I run this query, it do nothing but returns
WITH "pikeen" AS owner
WITH pikeen.uuid5(owner,owner) as ownerUuid5
MATCH (pikeen:Owner { uuid: ownerUuid5 })
WITH pikeen, ownerUuid5
LOAD CSV WITH HEADERS
FROM "https://storage.googleapis.com/file.csv"
AS row FIELDTERMINATOR ';'
WITH row, pikeen, toLower(row.url) AS url
WHERE row.uuid <> "#"
WITH row, pikeen, url, pikeen.uuid5(url,row.shortName) as uuid5
CALL {
WITH row, pikeen, url, uuid5
MERGE (merchant:Merchant { uuid: uuid5})
ON CREATE
SET merchant.shortName= row.shortName,
merchant.url= url
MERGE (pikeen)-[:HAS_MERCHANT]->(merchant)
return merchant
}
return pikeen, merchant;
I don't have your pikeen module ... so i just substituted it with strings and I didn't get the error - you'll have to see if anyone from Neo picks it up.
You could just load all the csv inside a temporaryLabel and work it out from a MATCH instead (as I understand that EAGER is related to the LOAD CSV and the fact that it doesn't want to load everything into memory)
I didn't understand what do you mean with You could just load all the csv inside a temporaryLabel and work it out from a MATCH. Can you give me a sample?
BTW, the query in any case doesn't run after the EAGER message. It yust returns 0 managed records
The eager seems to be triggered by the SET operation. You could try to include the set as part of the merge with "on create" and "on match". I don't think that removes the eager though. In my plan, it just moved the eager operation.
WITH "pikeen" AS owner
WITH pikeen.uuid5(owner,owner) as ownerUuid5
MATCH (pikeen:Owner { uuid: ownerUuid5 })
WITH pikeen, ownerUuid5
LOAD CSV WITH HEADERS
FROM "https://storage.googleapis.com/file.csv"
AS row FIELDTERMINATOR ';'
WITH row, pikeen, toLower(row.url) AS url
WHERE row.uuid <> "#"
WITH row, pikeen, url, pikeen.uuid5(url,row.shortName) as uuid5
MERGE (merchant:Merchant { uuid: uuid5})
ON CREATE SET merchant.shortName= row.shortName,
merchant.uuid= uuid5,
merchant.url= url
ON MATCH SET merchant.shortName= row.shortName,
merchant.uuid= uuid5,
merchant.url= url
MERGE (pikeen)-[:HAS_MERCHANT]->(merchant)
return pikeen, merchant
As you just predict , it just moves the eager in another place.
2 Things:
CYPHER is becoming worst a day after the other: looks like a spaghetti code salad
if I put this statement CREATE CONSTRAINT Merchant_uuid IF NOT EXISTS FOR (n:Merchant) REQUIRE (n.uuid) IS UNIQUE; at the beginning of the query the Eager problem disappears, AND THEN. as the index already existed, I think this should be a CYPHER interpreter error.
CREATE CONSTRAINT Merchant_uuid IF NOT EXISTS FOR (n:Merchant) REQUIRE n.uuid IS UNIQUE;
It is like If the system create the Constraint (or not create because it already exists!) the following query knows the presence of it and doesn't return the EAGER error
But if I remove the creat Constraint, it doesn't recognize the index and seems to run.
Well, I'm here again:This query run but not execute!!!
the result is **No changes, no records. Completed after 340 ms**
I espected to add 3 or four merchants with their properties.
The real .csv file has 4 lines available
The execution plan for this query contains the Eager operator, which forces all dependent data to be materialized in main memory before proceeding
Using LOAD CSV with a large data set in a query where the execution plan contains the Eager operator could potentially consume a lot of memory and is likely to not perform well. See the Neo4j Manual entry on the Eager operator for more information and hints on how problems could be avoided.
LOAD CSV WITH HEADERS
FROM "https://storage.googleapis.com/pikeen-neo4j/gdpr/merchants.csv"
AS row FIELDTERMINATOR ';'
WITH row, pikeen.uuid5("pikeen","pikeen") as ownerUuid5, "pikeen" AS owner, toLower(row.url) AS url, toLower(row.shortName) AS shortName
MATCH (pikeen:Owner { uuid: ownerUuid5 })
WITH row, pikeen, url, shortName, ownerUuid5, pikeen.uuid5(shortName,url) as uuid5
WHERE row.uuid <> "//"
MERGE (merchant:Merchant { uuid: uuid5})
ON CREATE SET merchant.shortName= shortName,
merchant.uuid= uuid5,
merchant.url= url
ON MATCH SET merchant.shortName= shortName,
merchant.uuid= uuid5,
merchant.url= url
MERGE (pikeen)-[:HAS_MERCHANT]->(merchant)
return pikeen, merchant;
Status code Neo.ClientNotification.Statement.EagerOperator
We add an Eager operator in the plan to make sure that the query behaves correctly. This should not interfere with obtaining the correct results. So, if you did get other results before, maybe there was a bug, that we fixed. The only problem is that it might make your query consume a lot of memory. So, unless you experience Out-Of-Memory problems, the Eager operator is not your problem.
How to "fix" Eager
If you prepend EXPLAIN to your query and have a look at the textual representation of the query plan, you should get a fairly good explanation as to why the Eager operators were put there.
When I ran the query, I got two Eager operators:
One is added because you set merchant.uuid=uuid5, while it is MATCHed on in the MERGE statement above. I do not see why you cannot leave that out, as the MERGE should already make sure it has the right value.
The other is added because you return the whole entities RETURN pikeen, merchant and have set properties before. We have discovered some inconsistencies in how we return entities that requires this Eager. However, do you really need the whole entities? Or would a count(*) or a property suffice?
Regarding no changes
That being said, I understand that you still have a problem of the query not creating the nodes that you expected. I cannot really tell you why that is the problem just from looking at your query but my first hunch would be that the entities (nodes and relationships) already existed and MERGE did not need to change anything. Could that be?
Otherwise, the output from a PROFILE run may be useful together with some information about the data present in the database (Is it empty? Or what is there?)
but my first hunch would be that the entities (nodes and relationships) already existed and MERGE did not need to change anything. Could that be?
It can be true. this is a query running when we would have some new customer: sometimes the new customer is there, sometimes is not and in the latter case, there is nothing to change.
If this is the real problem, I coul relate the query execution at the result of the 'OPTIONAL MATCH (merchant:Merchant { uuid: uuid5})
if the result is not null, I could avoid to execute the rest of the query, while if it null, I can jump at the end.
The drawback is that this doesn't allow me to change attributes....
I'll try and let you know the result
At the same time thank you all for your collaboration and ideas, and to help me understand what really is and why there is the 'EAGER'.