Concept Questions

minhan_lmh
Node

Hi everyone. I’m new to Neo4j, so please pardon me if this is a fundamental question.
I’m at the part where I’m learning about creating constraints and indexes.

  • When is it necessary to create constraints? I can’t quite understand the logic behind them.
  • When I upload data, does Neo4j automatically index it? If not, do I have to create an index manually for every single property?
  • Under the hood, does Neo4j work like Elasticsearch?
Accepted solution

It may help to know what your definition of "indexing" is. I suspect Elasticsearch uses the term differently from how we use it.

In Neo4j, indexes are used for fast lookups of nodes (or of relationships, in the case of the fulltext schema index) based on the node's label and one or more indexed properties. For example, if many of your queries look up :Person nodes by their name property, an index on :Person(name) will speed up those lookups.

When the indexed property (or properties) is given inline in the pattern or in a supported predicate in the associated WHERE clause, the planner will find the node(s) via an index lookup. This is much more efficient than a label scan (checking every node with the given label for a property match) or an all-nodes scan (the same check, but across every node in the graph).
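
To make the difference between the three lookup strategies concrete, here is a minimal toy sketch in Python. The ToyStore class and its methods are hypothetical and are not how Neo4j stores data internally; they only count how many nodes each strategy has to examine to find one :Person by name.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a graph store; not Neo4j internals."""

    def __init__(self):
        self.nodes = []      # node id -> (labels, properties)
        self.by_label = {}   # label -> ids of nodes carrying that label
        self.indexes = {}    # (label, prop) -> {value: [node ids]}

    def create_node(self, labels, props):
        node_id = len(self.nodes)
        self.nodes.append((frozenset(labels), dict(props)))
        for label in labels:
            self.by_label.setdefault(label, []).append(node_id)
        # Every write must also update each matching index.
        for (label, prop), index in self.indexes.items():
            if label in labels and prop in props:
                index.setdefault(props[prop], []).append(node_id)
        return node_id

    def create_index(self, label, prop):
        # Build the index from the nodes that already exist.
        index = {}
        for node_id in self.by_label.get(label, []):
            props = self.nodes[node_id][1]
            if prop in props:
                index.setdefault(props[prop], []).append(node_id)
        self.indexes[(label, prop)] = index

    def all_nodes_scan(self, label, prop, value):
        # Examine every node in the graph; returns (matches, nodes examined).
        hits = [i for i, (labels, props) in enumerate(self.nodes)
                if label in labels and props.get(prop) == value]
        return hits, len(self.nodes)

    def label_scan(self, label, prop, value):
        # Examine only the nodes that carry the label.
        ids = self.by_label.get(label, [])
        hits = [i for i in ids if self.nodes[i][1].get(prop) == value]
        return hits, len(ids)

    def index_lookup(self, label, prop, value):
        # Jump straight to the matching entries; nothing else is examined.
        hits = self.indexes[(label, prop)].get(value, [])
        return list(hits), len(hits)


store = ToyStore()
for i in range(1000):
    store.create_node(["Person"], {"name": f"person-{i}"})
for i in range(9000):
    store.create_node(["Movie"], {"title": f"movie-{i}"})
store.create_index("Person", "name")

print(store.all_nodes_scan("Person", "name", "person-42"))  # ([42], 10000)
print(store.label_scan("Person", "name", "person-42"))      # ([42], 1000)
print(store.index_lookup("Person", "name", "person-42"))    # ([42], 1)
```

All three strategies return the same node; what changes is the amount of work needed to get there, which is why the planner prefers an index lookup when one is available.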

Index lookups are only used to find starting points in the graph (if there are several candidates, the planner has to decide which one or ones to use). Once the starting nodes are found, the rest of the traversals and matches in the query are usually performed via index-free adjacency: pointer-hopping between node records and their connected relationship records. Indexes are not used for this, unlike table joins in an RDBMS.
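
The idea of index-free adjacency can be sketched with plain object references. This is a hypothetical toy model (the Node and Rel classes and the connect and neighbours helpers are illustrative names, not Neo4j's record format): each node keeps direct references to its relationship records, and each relationship references its two end nodes, so a traversal from an already-found start node just follows references and never consults an index.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    # Direct references to this node's relationship records.
    rels: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Rel:
    rel_type: str
    start: Node
    end: Node


def connect(a, rel_type, b):
    # Store the relationship on both endpoints so either side can hop across it.
    r = Rel(rel_type, a, b)
    a.rels.append(r)
    b.rels.append(r)
    return r


def neighbours(node, rel_type):
    # Pointer-hop: node -> its relationship records -> node at the other end.
    return [r.end for r in node.rels if r.rel_type == rel_type and r.start is node]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, "KNOWS", bob)
connect(bob, "KNOWS", carol)

# Two-hop traversal from a start node that was already found; no index lookups.
friends_of_friends = [fof.name
                      for f in neighbours(alice, "KNOWS")
                      for fof in neighbours(f, "KNOWS")]
print(friends_of_friends)  # ['Carol']
```

In a relational database the second hop would typically be a join resolved through an index on a foreign-key column; here the cost of each hop depends only on the relationships attached to the current node, not on the total size of the graph.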

Note that creating a unique constraint also creates an index on that label/property.



dana_canzano
Neo4j

A constraint is typically created when you want to enforce uniqueness. Let's say you have a dataset of Employee records, and each employee is assigned a unique employee ID. You might create the constraint CREATE CONSTRAINT ON (n:Employee) ASSERT n.emp_id IS UNIQUE; (on Neo4j 4.4 and later the equivalent syntax is CREATE CONSTRAINT FOR (n:Employee) REQUIRE n.emp_id IS UNIQUE;) so that if anyone tries to create multiple :Employee nodes with the same emp_id value, the constraint prevents the creation of the second and every subsequent node with that emp_id.
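
What a uniqueness constraint does on write can be sketched as a toy model in Python. EmployeeStore, create_employee and UniqueConstraintViolation are hypothetical names for illustration only; the dictionary plays the role of the index that backs a unique constraint, which is what lets the duplicate check be a single lookup rather than a scan over all :Employee nodes.

```python
class UniqueConstraintViolation(Exception):
    """Raised when a new node would reuse an emp_id that is already taken."""


class EmployeeStore:
    """Hypothetical toy store enforcing uniqueness of :Employee(emp_id)."""

    def __init__(self):
        self.employees = []     # created nodes, as property dicts
        self.emp_id_index = {}  # emp_id -> position in self.employees (backing index)

    def create_employee(self, emp_id, **props):
        # The backing index turns the uniqueness check into one dict lookup.
        if emp_id in self.emp_id_index:
            raise UniqueConstraintViolation(
                f"Employee with emp_id={emp_id!r} already exists")
        node = {"emp_id": emp_id, **props}
        self.emp_id_index[emp_id] = len(self.employees)
        self.employees.append(node)
        return node


store = EmployeeStore()
store.create_employee("00012", name="Ann")
try:
    store.create_employee("00012", name="Ben")  # same emp_id, so it is rejected
except UniqueConstraintViolation as err:
    print("rejected:", err)  # rejected: Employee with emp_id='00012' already exists
print(len(store.employees))  # 1
```

The first node is created normally; the second and every later attempt with the same emp_id fails before anything is written.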

Neo4j does not index your properties by default. You will want indexes, but you should not blindly index every property. As in any database, indexes cost performance during data load. If you are loading 10 rows of data and no indexes are involved, that is basically 10 writes. But if you take the same 10 rows and index all 50 properties of each row, it becomes roughly 510 writes (for each row, 1 write to create the row plus 50 writes to update the associated indexes).
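
The arithmetic in the example above can be written as a one-line estimate. This is only the rough model from the post (one write per row plus one index update per indexed property); real costs depend on the index type and storage engine, and estimated_writes is a hypothetical helper name.

```python
def estimated_writes(rows, indexed_props_per_row):
    # Rough model: 1 write to create each row, plus 1 index update
    # for every indexed property on that row.
    return rows * (1 + indexed_props_per_row)


print(estimated_writes(10, 0))   # 10  (no indexes)
print(estimated_writes(10, 50))  # 510 (all 50 properties indexed)
print(estimated_writes(10, 2))   # 30  (only the 2 properties you actually query on)
```

Indexing only the properties your queries filter on keeps the multiplier small.
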
Generally, you want to index the properties that are used in WHERE clauses.
For example, MATCH (n:Employee) WHERE n.emp_id = '00012' RETURN n; would benefit from an index on :Employee(emp_id).
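
A heavily simplified sketch of the decision the planner makes for a query like the one above follows. The operator names NodeIndexSeek, NodeByLabelScan and AllNodesScan are the ones Neo4j shows in execution plans, but choose_start_operator is a hypothetical helper: the real planner is cost-based and considers far more than whether an index exists. To see the operator Neo4j actually picks, prefix the query with EXPLAIN or PROFILE.

```python
def choose_start_operator(label, prop, indexes):
    # Toy rule for MATCH (n:Label) WHERE n.prop = $value:
    # no label at all -> every node must be checked;
    # indexed (label, prop) -> seek directly into the index;
    # otherwise -> check every node with that label.
    if label is None:
        return "AllNodesScan"
    if (label, prop) in indexes:
        return "NodeIndexSeek"
    return "NodeByLabelScan"


indexes = {("Employee", "emp_id")}
print(choose_start_operator("Employee", "emp_id", indexes))  # NodeIndexSeek
print(choose_start_operator("Employee", "name", indexes))    # NodeByLabelScan
print(choose_start_operator(None, "emp_id", indexes))        # AllNodesScan
```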
