Node labels missing from scan store

Neo4j version: 4.0.0
Driver: Python (py2neo for import)

We use py2neo to import nodes from a CSV using code like:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM $csv_path
AS line
FIELDTERMINATOR ','
WITH %s
MERGE (n:has_uri {uri: uri})
ON CREATE SET n :«labels», %s
ON MATCH SET n :«labels», %s
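
For anyone trying to reproduce this, here is a minimal sketch of how a template like the one above might be filled in from Python. The post does not show what the `%s` placeholders or `«labels»` expand to, so the expansions here are assumptions: the `WITH` clause is taken to alias each CSV column, and the `SET` clauses are taken to add the labels and copy the non-key columns as properties. `build_import_query` is a hypothetical helper, not part of py2neo.

```python
def build_import_query(columns, labels):
    """Fill the import template for the given CSV columns and extra labels.

    Assumptions (not confirmed by the original post): `columns` are the CSV
    header names and include "uri"; every other column becomes a node
    property; names are trusted, simple identifiers (no escaping is done).
    """
    if "uri" not in columns:
        raise ValueError("the CSV must provide a 'uri' column")
    if not labels:
        raise ValueError("at least one label is required")

    # WITH clause: expose each CSV field as a Cypher variable.
    with_clause = ", ".join(f"line.{c} AS {c}" for c in columns)

    # SET items: add the labels (n:A:B), then copy every non-key column.
    label_part = "n" + "".join(f":{label}" for label in labels)
    props = [f"n.{c} = {c}" for c in columns if c != "uri"]
    set_clause = ", ".join([label_part] + props)

    return "\n".join([
        "USING PERIODIC COMMIT 10000",
        "LOAD CSV WITH HEADERS FROM $csv_path AS line FIELDTERMINATOR ','",
        f"WITH {with_clause}",
        "MERGE (n:has_uri {uri: uri})",
        f"ON CREATE SET {set_clause}",
        f"ON MATCH SET {set_clause}",
    ])


if __name__ == "__main__":
    print(build_import_query(["uri", "name"], ["Person", "Agent"]))
```

The resulting string would then be run the same way as the original template, with `csv_path` supplied as a query parameter.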

After the import there are some inconsistencies: some nodes are not returned from the indexes for labels that they do have. Here's an example:

When we run the consistency checker, we get lots of errors (we ran out of disk space at 10 GB...) of two types:

2020-04-15 11:36:52.499+0000 ERROR [o.n.c.ConsistencyCheckService] This node record has a label that is not found in the label scan store entry for this node

and

ERROR: This node was not found in the expected index.
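
One way the first error might be checked from Cypher is to compare the node ids returned by a label match against the ids of nodes whose own record lists that label. This is only a sketch: it assumes the planner answers the first query from the label scan store and the second from an all-nodes scan with a filter, which should be confirmed with `EXPLAIN` on your version. The helper names are hypothetical.

```python
def label_scan_check_queries(label):
    """Return two Cypher queries that should yield the same node ids.

    Assumption: the first is planned as a label scan, the second as an
    all-nodes scan filtered on labels(n). Check both plans with EXPLAIN.
    """
    if not label.isidentifier():
        raise ValueError(f"unexpected label name: {label!r}")
    via_label_match = f"MATCH (n:{label}) RETURN id(n) AS id"
    via_record_labels = "MATCH (n) WHERE $label IN labels(n) RETURN id(n) AS id"
    return via_label_match, via_record_labels


def ids_missing_from_label_scan(ids_by_label_match, ids_by_record_labels):
    """Ids that carry the label on the node record but were not matched by label."""
    return sorted(set(ids_by_record_labels) - set(ids_by_label_match))


if __name__ == "__main__":
    by_label, by_record = label_scan_check_queries("has_uri")
    print(by_label)
    print(by_record)
    # Illustrative id lists only, not results from a real database.
    print(ids_missing_from_label_scan([1, 2], [1, 2, 3, 5]))
```

The second query takes the label as a `$label` parameter. A non-empty result from `ids_missing_from_label_scan` would be consistent with the consistency checker's message, but on a graph of this size the full scan may be slow.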

Rebuilding the label scan store, as suggested in "Creating a subset graph" (post #12 by andrew.bowman), had no effect.

Upgrading to 4.0.3 had no effect.

We build the database from scratch frequently and never experienced this issue before upgrading to Neo4j version 4. We have also never successfully built a database on version 4, so I think the cause is probably a change in version 4, although we have not yet run an identical comparison between v3 and v4 to confirm.

Any help would be hugely appreciated. Thank you!

Can you provide a reproducible way showing this issue? Then it should be easily possible to understand what's going on.

Thanks for the response!

We can definitely try - is there any way we can force a reindex of the labels so we can experiment without having to rebuild the entire DB? It's a... slow process!

You can try simply deleting neostore.labeltokenstore.db. It should be rebuilt automatically on the next startup. Please take a backup first, since this could potentially be harmful.

Ah - gave that a try and the result was:

java.lang.RuntimeException: Store files neostore.labeltokenstore.db is(are) missing and recovery is not possible. Please restore from a consistent backup.

Does that mean our only option is to rebuild from scratch?

We can do that, it'll just take a while to isolate a minimal test case - there are ~20 million nodes.
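
A possible starting point for a smaller test case is to cut the CSV down before importing it. The sketch below copies the header and the first N data rows to a new file using only the standard library. Whether a small prefix of the data still triggers the problem is unknown, and `write_csv_sample` and the file names are hypothetical.

```python
import csv


def write_csv_sample(src_path, dst_path, max_rows=1000):
    """Copy the header plus the first `max_rows` data rows; return rows copied."""
    copied = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            # Empty input: nothing to copy.
            return 0
        writer.writerow(header)
        for row in reader:
            if copied >= max_rows:
                break
            writer.writerow(row)
            copied += 1
    return copied


if __name__ == "__main__":
    import sys
    # Usage: python sample_csv.py full.csv sample.csv 1000
    n = write_csv_sample(sys.argv[1], sys.argv[2], int(sys.argv[3]))
    print(f"copied {n} data rows")
```

Importing the sample into an empty database and running the consistency checker on that would show whether the errors depend on the data volume.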

Thanks again Stefan!

I guess rebuilding from scratch is the easiest option.
