Dear Neo4j staff,
Thank you for providing the helpful official tutorial comparing Cypher with SQL. As a beginner to graph databases, this tutorial helped me a lot. However, I found there is a potential issue in the example titled “Return customers without existing orders“: the SQL query is NOT equivalent to the Cypher query.
Link to the example: Comparing Cypher with SQL - Getting Started
To reproduce the counterexample.
Neo4j version: 5.27.0
Neo4j platform: latest Neo4j Sandbox.
With the following database:
CREATE TABLE customers (
CustomerID INTEGER,
CompanyName VARCHAR(40)
);
INSERT INTO customers VALUES (0, "Drachenblut Delikatessen");
INSERT INTO customers VALUES (1, NULL);
CREATE TABLE orders (
OrderID INTEGER,
CustomerID INTEGER
);
INSERT INTO orders VALUES (0, 0);
INSERT INTO orders VALUES (1, 0);
CREATE TABLE order_details (
OrderID INTEGER,
ProductID INTEGER,
UnitPrice INTEGER,
Quantity INTEGER
);
INSERT INTO order_details VALUES (0, 1, 10, 1);
INSERT INTO order_details VALUES (0, 0, -2, 1);
CREATE TABLE products (
ProductID INTEGER,
ProductName VARCHAR(40)
);
INSERT INTO products VALUES (0, "A");
INSERT INTO products VALUES (1, "B");
By executing the following SQL query,
SELECT p.ProductName, sum(od.UnitPrice * od.Quantity) AS Volume
FROM customers AS c
LEFT OUTER JOIN orders AS o ON (c.CustomerID = o.CustomerID)
LEFT OUTER JOIN order_details AS od ON (o.OrderID = od.OrderID)
LEFT OUTER JOIN products AS p ON (od.ProductID = p.ProductID)
WHERE c.CompanyName = 'Drachenblut Delikatessen'
GROUP BY p.ProductName
ORDER BY Volume DESC;
the output of the SQL query is:
+-------------+--------+
| ProductName | Volume |
+-------------+--------+
| B | 10 |
| A | -2 |
| NULL | NULL |
+-------------+--------+
However, if we create the same database in graph model
MATCH (n) DETACH DELETE n;
CREATE
(n1:Customer {customerId: 0, companyName: "Drachenblut Delikatessen"}),
(n2:Customer {customerId: 1, companyName: NULL}),
(n3:Order {orderId: 0}),
(n4:Order {orderId: 1}),
(n5:Product {productId: 0, productName: "A"}),
(n6:Product {productId: 1, productName: "B"}),
(n1)-[e1_1:PURCHASED]->(n3),
(n1)-[e2_1:PURCHASED]->(n4),
(n3)-[e3_1:ORDERS {unitPrice: 10, quantity: 1}]->(n6),
(n3)-[e4_1:ORDERS {unitPrice: -2, quantity: 1}]->(n5);
and execute the following Cypher query
MATCH (c:Customer {companyName:'Drachenblut Delikatessen'})
OPTIONAL MATCH (p:Product)<-[o:ORDERS]-(:Order)<-[:PURCHASED]-(c)
RETURN p.productName, toInteger(sum(o.unitPrice * o.quantity)) AS volume
ORDER BY volume DESC;
the output of the Cypher query is:
+-------------+--------+
| ProductName | Volume |
+-------------+--------+
| "B" | 10 |
| "A" | -2 |
+-------------+--------+
The main reason is that three continuous LEFT JOINs are non-equivalent to an OPTIONAL MATCH + a long path pattern, i.e.,
-- SQL
LEFT OUTER JOIN orders AS o ON (c.CustomerID = o.CustomerID)
LEFT OUTER JOIN order_details AS od ON (o.OrderID = od.OrderID)
LEFT OUTER JOIN products AS p ON (od.ProductID = p.ProductID)
-- Cypher
OPTIONAL MATCH (p:Product)<-[o:ORDERS]-(:Order)<-[:PURCHASED]-(c).
Instead, these LEFT JOINs are equivalent to those OPTIONAL MATCHs
OPTIONAL MATCH (order:Order)<-[:PURCHASED]-(c)
OPTIONAL MATCH (p:Product)<-[o:ORDERS]-(order)
You can check this Cypher query
MATCH (c:Customer {companyName:'Drachenblut Delikatessen'})
OPTIONAL MATCH (order:Order)<-[:PURCHASED]-(c)
OPTIONAL MATCH (p:Product)<-[o:ORDERS]-(order)
RETURN p.productName, sum(o.unitPrice * o.quantity) AS volume
ORDER BY volume DESC;
and it produce
+-------------+--------+
| ProductName | Volume |
+-------------+--------+
| "B" | 10 |
| NULL | 0 |
| "A" | -2 |
+-------------+--------+
The 2nd row’s Volume is 0 because Neo4j automatically converts SUM(NULL) to 0.
In your test cases, the CompanyName
attribute in the customers
table may not allow NULL values, which makes the Cypher query equivalent to the SQL query in this specific context. However, this could mislead beginners. I recommend updating the Cypher query to be more semantically consistent with the SQL query.
Thanks for your attention.