We've been slowly building up a Neo4j database (v.3.4.4 community edition) of genetic analyses. Up until the past week, we've been working with relatively small datasets and fleshing out the relationships, etc. Everything had been going quite well, so we decided to start loading considerably more data to see if the performance stayed consistent.
Now, some of the compound indexes in place appear to have stopped being used, which is quite disconcerting. I'm hoping with some information either the err of our ways can be pointed out or I can learn whether or not something else is wrong.
One of the node labels is :Variant
, which would look like so:
{
name: "X:843593:C:TG",
chromosome: "X",
position: 843593,
reference: "C",
alt: "TG"
}
The :Variant
label also has the following, compound index defined (and I can confirm its existence with call db.indexes()
):
CREATE INDEX ON :Variant(chromosome, position)
Again, up until recently, this seemed to be working fine with about 1M variant nodes in the database. After loading ~8M more variant nodes (and ~100M other analyses nodes), even a simple query like counting variants in a range takes way too long:
MATCH (n:Variant)
WHERE n.chromosome="8"
AND n.position>2000000
AND n.position<3000000
RETURN count(n)
The above query can take > 6 seconds to run, and only counted ~6000 nodes. When I go to explain/profile the query, I get the following output (note how chromosome is nowhere in the filter, but just a simple test on the position only):
It's possible that it's never used the compound index and we previously had so few nodes that the performance cost of counting everything was negligible. Using an index hint does not help either, although I wouldn't expect it to for a simple query like the above (it either knows of the index and uses it where appropriately or it doesn't think it can).
So... what are we doing wrong?
- Was the compound index created incorrectly?
- Do compound indexes not work w/ the community edition?
- Is there some quirkiness related to compound indexes to be aware of?
- Before uploading a slew of new nodes we deleted the old ones to make a minor change; could this have invalidated the old index if no
:Variant
nodes existed for a period of time? - Something else?
As a final point of reference, a MySQL table of variants with the same compound index, 10x the number of records, and on similar hardware returns the count in < 100 ms. This isn't to knock Neo4j, but rather it's my personal gut-check to know if what I'm seeing is within the realm of reasonable, and right now it isn't. So, I'm sure that something is either setup wrong or not correctly being used.
Thanks for any insights!