Indexing mongo IDs

jaideep.das · July 24, 2019, 10:13am

I have nodes with properties as follows:

uid: this mongo object ID string.
typeName: a regular string field.

I have created the following indexes:

create index on :Document(uid)
create index on :Document(typeName)
create index on :Document(uid, typeName)

Now the issue I am facing is when I try to run the following query, its very slow with all the indexes in place.

profile match (doc1:Document)-[*]->(doc2:Document)
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2

Once I drop the following indexes and retain the typeName index the same query runs blazing fast.

drop index on :Document(uid)
drop index on :Document(uid, typeName)

Seems like when it tries to use the uid index its slowing things down. Why would that be.

dana_canzano · July 24, 2019, 4:44pm

What version of Neo4j?
Do you have the results of the PROFILE / EXPLAIN to show what it is doing?
I would expect your initial experience when having both indexes should result in to NodeIndexSeeks? if it does not you may want to add a USING INDEX hint and thus

profile match (doc1:Document)-[*]->(doc2:Document)
using index doc1:Document(uid) 
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2

andrew_bowman · July 24, 2019, 6:30pm

We'll need PROFILE plans of the queries to see what's going on.

My guess though is that when starting from doc2, index lookup is fast, and following the path back from that document to doc1 and filtering is an easy operation, likely there aren't many paths to evaluate.

We might see that if we start from doc1 that there could be a great many paths to consider when finding doc2, so it could be the graph structure, and the number of possible paths out, that may be slowing this down, and not something that's directly index related. The plans may confirm this.

I should also point out that this approach for the query likely won't meet your needs, as the query is looking for all possible paths between these nodes (not just one...it won't stop until all possible paths are explored and evaluated).

Since you're just trying to see if the two nodes are connected, try using shortestPath() instead (it will stop when the first path is found), it will do a bidirectional bfs expansion and may be faster, depending on the graph structure between them:

profile 
match shortestPath((doc1:Document)-[*]->(doc2:Document))
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2

jaideep.das · July 25, 2019, 9:15am

Here are the profile sceenshots.

with only typename index

with uid and typename index both

andrew_bowman · July 25, 2019, 7:44pm

Yep, that confirms my assumptions. Note that the expand when we start from doc1 generates 54k rows of results, and it takes a bit of time for that expand and filtering down, and when we expand from doc2 instead there are only 11 rows.

Now try profiling with the query I provided using shortestPath().

jaideep.das · July 26, 2019, 10:46am

With both uid and typename index in place

with only typename index in place

could it be that as typename index doesn't have as many index as uid index, with only typename index its faster. or could it be the mongo IDs in uid index is hitting some kinda size limitation.

andrew_bowman · July 26, 2019, 6:08pm

No, this has nothing to do with indexing itself, the lookups of just the nodes are fast in either case.

The problem is the number of possible path matches expanding the pattern from one node vs the other. There are 54k possible paths to evaluate when expanding from doc1, which filters down to only 9, but it's doing a ton of work that it doesn't need to do. You DON'T want to find and filter every possible path between the nodes, you just want to know if they're connected, and shortestPath() uses a bidirectional bfs expansion to do this, and once a single path is found it stops looking, where in your original query it's exhaustive and will find all possible paths between the pair of nodes, not stopping after the first is found.

jaideep.das · July 29, 2019, 8:45am

Understood thanks for the suggestion

Topic		Replies	Views
Performance question - how to work with indexes Cypher performance , cypher , data-modeling	5	397	November 25, 2021
Fast with Properties, Slow with elementId() — Why? Fixable? Cypher performance , cypher	5	179	June 19, 2025
Using indexed nodes and simple queries my cypher queries are still taking around 500ms. Can this be further optimized? Looking for advice Cypher performance , cypher	5	444	December 18, 2020
How to get query plan and how to create and use an index on id? Cypher	10	511	August 4, 2020
Indexes required to make this query fast Cypher	1	611	September 2, 2019

Indexing mongo IDs

Related topics