Is there an overview of how ONLINE indices are created and used in Neo4j?
I am importing a large number of items (on the order of millions), and I am wondering whether, in the long term, it would be worth triggering the indexing manually or breaking the data import into batches so that the indices include the latest items loaded.
The specific questions that I have are:
1. Is it possible to manually trigger the indexing process, so that it starts working in the background earlier than it would otherwise be scheduled?
2. Are the indices built in blocks or incrementally? That is, do indices need to operate on a block of data to be efficient, or do they update incrementally?
3. Is there a list that outlines which kinds of indices are used for specific data types? (For example, the spatial indices are space-filling curves. Is it safe to assume that anything else is probably a B-tree?)
If you create an index while there is no data, it becomes online immediately.
From then on, all operations and transactions will use the index for reading and writing.
If you already have data, the index first goes through a concurrent background population phase, after which it is switched online.
So if your import needs to read from the index (e.g. to create relationships or assert uniqueness), create the index upfront, before that part of the import, and make sure it is online.
There are procedures such as db.awaitIndexes(timeout) to wait for that.
To resample indexes (which affects the selectivity they report to the Cypher planner), you can use CALL db.resampleOutdatedIndexes().
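To make the order of operations concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes a session object with a run(query, **params) method, as provided by the official Neo4j Python driver. The label Item, the property id, the index name item_id, the 300-second timeout and the batch size are all hypothetical, and the CREATE INDEX statement uses the Neo4j 4.x+ syntax (on 3.x it would be CREATE INDEX ON :Item(id)).

```python
from itertools import islice

# Cypher statements. Label, property and index name are hypothetical.
CREATE_INDEX = "CREATE INDEX item_id FOR (n:Item) ON (n.id)"  # assumes Neo4j 4.x+ syntax
AWAIT_INDEXES = "CALL db.awaitIndexes(300)"  # assumed timeout of 300 seconds
IMPORT_BATCH = "UNWIND $rows AS row MERGE (n:Item {id: row.id}) SET n += row"
RESAMPLE = "CALL db.resampleOutdatedIndexes()"


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_items(session, rows, batch_size=10_000):
    """Create the index first, wait until it is online, then import in batches.

    Returns the number of batches sent.
    """
    session.run(CREATE_INDEX)   # index exists before any MERGE needs it
    session.run(AWAIT_INDEXES)  # block until population has finished
    count = 0
    for chunk in batched(rows, batch_size):
        # One statement per batch; MERGE can look nodes up through the index.
        session.run(IMPORT_BATCH, rows=chunk)
        count += 1
    session.run(RESAMPLE)       # refresh statistics for the Cypher planner
    return count
```

With the official driver, this would typically be called as import_items(session, rows) inside a "with driver.session() as session:" block. Because each session.run call here is its own auto-commit transaction, the schema change and the data writes stay in separate transactions.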
Thanks, this is useful, although it won't help too much with the specific problem that motivated this question.
What I meant by question 2 is: does the index get updated with each write operation, or does it wait for N write operations and then sort them out? But if they are k-trees, then they probably get updated on each write.