My graph DB has only 130,000 nodes and 5 labels. I run this simple count:
MATCH (m:Product) WHERE m.effect='coloring' AND m.source='web' RETURN count(m) as count
In the Neo4j Browser, the following appears after the query: Started streaming 1 records in less than 1 ms and completed after 107 ms.
107 ms seems long for a single query on such a small graph, and I am on a powerful Mac Pro. Is this normal for a Neo4j query? For offline queries, it's fine.
If I build indexes on the 2 properties in the WHERE clause, would it be much faster?
Hi, the query may take longer the first time it runs, but it should be faster afterwards. Indexes can improve the query; you could create a composite index.
Yes, that's normal. Putting an index on the two properties you're filtering on will speed that up.
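A minimal sketch of creating the composite index on effect and source from Python. It assumes Neo4j 4.4 or later (for the `CREATE INDEX ... IF NOT EXISTS FOR (n:Label) ON (n.a, n.b)` syntax) and the official `neo4j` Python driver; the index name `product_effect_source` and the helper names are hypothetical.

```python
import re

# Labels and property keys cannot be passed as query parameters, so they are
# interpolated into the statement; accept only plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def composite_index_statement(name, label, properties):
    """Build a CREATE INDEX statement for a composite (multi-property) index."""
    for ident in (name, label, *properties):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    props = ", ".join(f"n.{p}" for p in properties)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({props})"


def create_index(uri, user, password, statement):
    """Run the statement once against the database (requires the neo4j driver)."""
    # Imported here so the statement builder works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            session.run(statement).consume()
    finally:
        driver.close()
```

For example, `composite_index_statement("product_effect_source", "Product", ["effect", "source"])` yields `CREATE INDEX product_effect_source IF NOT EXISTS FOR (n:Product) ON (n.effect, n.source)`, which can then be passed to `create_index` with your own URI and credentials.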
As @jggomez mentioned, the first run of a query will typically be a lot slower. On the second and subsequent runs the query plan is cached, so the query returns much more quickly.
Also, to take advantage of query caching when you pass in values for effect or source other than just "coloring" and "web", you will want to use parameters, since they allow the query to be reused from the cache.
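To see the cold-versus-warm difference yourself, a small timing helper can run the same query several times and record each duration. This is a sketch: `time_runs` is a hypothetical helper, and the first measurement may also include effects other than planning (for example, cold caches).

```python
import time


def time_runs(run_once, runs=5):
    """Call run_once() `runs` times and return each wall-clock duration in ms.

    With identical query text, the first entry typically includes parsing
    and planning; later entries should benefit from the cached plan.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations
```

With an open `neo4j` driver session, something like `time_runs(lambda: session.run(query, effect="coloring", source="web").consume())` would time the full execution, since `consume()` waits for the result to be exhausted.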
Example of what I mean:
This is better
MATCH (m:Product) WHERE m.effect = $effect AND m.source = $source
RETURN count(*)
than different queries like this
MATCH (m:Product) WHERE m.effect = 'coloring' AND m.source = 'web'
RETURN count(*)
MATCH (m:Product) WHERE m.effect = 'coloring' AND m.source = 'print'
RETURN count(*)
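From Python, the parameterized form might look like the sketch below, assuming the official `neo4j` driver (the thread doesn't say which driver is used) and an already-open `session`; `COUNT_QUERY` and `count_products` are hypothetical names.

```python
# The query text is identical for every (effect, source) pair, so Neo4j can
# reuse the cached plan; only the parameter values change between calls.
COUNT_QUERY = (
    "MATCH (m:Product) WHERE m.effect = $effect AND m.source = $source "
    "RETURN count(*) AS count"
)


def count_products(session, effect, source):
    """Run COUNT_QUERY on an open neo4j session, passing values as parameters."""
    record = session.run(COUNT_QUERY, effect=effect, source=source).single()
    return record["count"]
```

Calling `count_products(session, "coloring", "web")` and then `count_products(session, "coloring", "print")` sends the same query text twice with different parameter values, unlike the two literal queries above.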
Do the $effect and $source parameters make a difference? In my code they are already variables, since they are passed into the function via the generic variable names 'property_name' and 'property_value'. In this case, property_name is 'effect' and property_value is 'coloring'.
Part of the function looks like this:
def create_where_clause(WHERE, channel, properties):
    property_clause = ''
    for property in properties:
        property_clause += " m." + property.name + "='" + property.value + "'" + " AND"
    if property_clause:
        if not WHERE:
            WHERE = "WHERE " + property_clause
        else:
            WHERE += property_clause
    if WHERE and WHERE.endswith(' AND'):
        WHERE = WHERE[:len(WHERE) - 4]
    return WHERE
I don't know whether this is what you meant by using variables. As I understand it, a Cypher statement composed this way is still a raw string in the end.
By the time it is executed by tx.run(cypher), it has already become an explicit Cypher query without variables, even though variable names are used for parsing and composing the query statement.
It has to still be a variable (a query parameter such as $effect) when it reaches Neo4j for it to work as one.
So what you have is, as you said, still an explicit string with no variables, and it would not take advantage of query caching.
That said, depending on the data size and the indexes, you might not need it.
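A minimal sketch of how a WHERE builder could emit parameters instead of literal values. It is a simplified, hypothetical variant of `create_where_clause`: the original `WHERE` and `channel` arguments are left out, and `Property` is a stand-in for the question's objects with `.name` and `.value`. Property keys still have to be interpolated (Cypher cannot parameterize them), so they are restricted to plain identifiers.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the question's property objects.
Property = namedtuple("Property", ["name", "value"])

# Property keys cannot be parameters, so accept only plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_where_clause(properties, var="m"):
    """Return (where_clause, params); values travel as $p0, $p1, ..."""
    conditions = []
    params = {}
    for i, prop in enumerate(properties or []):
        if not _IDENT.match(prop.name):
            raise ValueError(f"unsafe property name: {prop.name!r}")
        key = f"p{i}"
        conditions.append(f"{var}.{prop.name} = ${key}")
        params[key] = prop.value
    where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    return where, params
```

For `[Property("effect", "coloring"), Property("source", "web")]` this returns `"WHERE m.effect = $p0 AND m.source = $p1"` with `{"p0": "coloring", "p1": "web"}`, which could be run as `tx.run(f"MATCH (m:Product) {where} RETURN count(m) AS count", params)`. The query text now depends only on which property names are used, so different values reuse the same cached plan.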
So, as you can see, my whole Cypher statement is composed from function arguments, and these arguments are optional (they can be None). In the code above, for example, both 'properties' and 'channel' can be None; 'properties' is a list of property.name/property.value pairs, so the whole WHERE clause is optional. The templates of my query composition are:
Query nodes with a relation specification: any node pairs connected by this relation. The label is the node type.
node_of_relation(relation=RelType.brandHasTopProd, label=WBChannel.Computer, limit=3)
Query nodes (neighbors) with a property specification, whatever the relations between the node and the target nodes. The property identifies the starting node.
node_of_relation(property='id=00001', label=WBChannel.Computer, limit=3)
Query nodes with a specific relation specification, where the starting node is given by a property specification. The property string is parsed into a list of property_name/property_value pairs.
node_of_relation(relation=RelType.competitiveProduct, property='color=White;price=$100')
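A minimal sketch of a parameterized builder for these optional-argument templates. Several assumptions are made here, since the thread doesn't show `node_of_relation` itself: relation and label arrive as plain strings (for the RelType/WBChannel enums, e.g. their names), the property string identifies the starting node `s`, the label applies to the neighbor `m`, the pattern is undirected because the relation direction isn't stated, and all property values are kept as strings. `parse_property_string` and `node_of_relation_query` are hypothetical names; `property` mirrors the question's keyword.

```python
import re

# Labels, relationship types and property keys cannot be parameters,
# so they are interpolated after checking they are plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(ident):
    if not _IDENT.match(ident):
        raise ValueError(f"unsafe identifier: {ident!r}")
    return ident


def parse_property_string(text):
    """Parse 'color=White;price=$100' into [('color', 'White'), ('price', '$100')]."""
    pairs = []
    for part in (text or "").split(";"):
        if part.strip():
            name, _, value = part.partition("=")
            pairs.append((_check(name.strip()), value.strip()))
    return pairs


def node_of_relation_query(relation=None, property=None, label=None, limit=None):
    """Build (query, params); every argument is optional."""
    rel = f":{_check(relation)}" if relation else ""
    lbl = f":{_check(label)}" if label else ""
    conditions, params = [], {}
    for i, (name, value) in enumerate(parse_property_string(property)):
        conditions.append(f"s.{name} = $p{i}")
        params[f"p{i}"] = value  # values travel as parameters, not literals
    query = f"MATCH (s)-[r{rel}]-(m{lbl})"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    query += " RETURN s, r, m"
    if limit is not None:
        query += " LIMIT $limit"  # LIMIT accepts a parameter in Cypher
        params["limit"] = int(limit)
    return query, params
```

For the third template, `node_of_relation_query(relation="competitiveProduct", property="color=White;price=$100")` gives `MATCH (s)-[r:competitiveProduct]-(m) WHERE s.color = $p0 AND s.price = $p1 RETURN s, r, m` with `{"p0": "White", "p1": "$100"}`. Note that an undirected pattern can return each pair twice, once per orientation.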
Any of the node_of_relation arguments can be omitted, which makes it very flexible for users, and that is how my Cypher statement is composed, with the template of: