Indexing a string property is taking several days and it doesn't seem near to completion

Hi,

We are indexing a property of a node and the server is doing something (I can see the index growing on disk), but it has been running for more than a week and doesn't seem near completion. To me, several days of indexing on a powerful machine suggests that something is wrong, but I don't know what, so I'm asking for help here to diagnose the problem.

  • The number of nodes is 999x10^6, roughly a billion.
  • The property is a string hash of 64 completely random characters.
  • We are using Neo4j Enterprise Causal Cluster created from the Google Cloud Marketplace template. The cluster has just 3 core members:
    • The leader has 32 CPU cores and 128 GB of RAM.
    • Followers have 4 CPU cores and 128 GB of RAM.
  • Memory is configured as:
dbms.memory.heap.initial_size=30600m
dbms.memory.heap.max_size=30600m
dbms.memory.pagecache.size=74800m

as was recommended by neo4j-admin memrec

  • Memory usage for the Java process on the leader is reported as 64%
  • CPU usage is consistently at just 10%
  • Index size keeps growing, although slowly
neo4j-enterprise-causal-cluster-1-core-vm-1:/var/lib/neo4j/data/databases/graph.db$ while true; do echo "$(date -Iseconds) $(du -ck schema/index/native-btree-1.0/* | grep total)" ; sleep 60; done
2019-04-18T07:06:07+00:00 145382404     total
2019-04-18T07:07:07+00:00 145384212     total
2019-04-18T07:08:07+00:00 145386312     total
2019-04-18T07:09:07+00:00 145388496     total
2019-04-18T07:10:07+00:00 145390420     total
2019-04-18T07:11:07+00:00 145392748     total
  • There is only one index being created at the moment and it has been running for 6 days.
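
To put the samples above in perspective, here is a rough calculation. It is only a sketch that reuses the six `du` readings quoted above (sizes in KB, as `du -k` reports them, one reading per minute); the variable names are made up for illustration.

```shell
# Readings copied from the du loop above: timestamp, size in KB, label.
samples='2019-04-18T07:06:07+00:00 145382404 total
2019-04-18T07:07:07+00:00 145384212 total
2019-04-18T07:08:07+00:00 145386312 total
2019-04-18T07:09:07+00:00 145388496 total
2019-04-18T07:10:07+00:00 145390420 total
2019-04-18T07:11:07+00:00 145392748 total'

# Average growth per one-minute interval between the first and last reading.
rate_kb_per_min=$(printf '%s\n' "$samples" |
  awk 'NR == 1 { first = $2 } { last = $2 } END { printf "%d", (last - first) / (NR - 1) }')

echo "index growth: ${rate_kb_per_min} KB/min"
```

That works out to a net index growth of about 2 MB per minute, or roughly 34 KB/s. Net growth is not the same as I/O load, since B-tree updates also rewrite existing pages, but it shows how slowly the population is advancing.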

I don't know if these numbers are "normal" for a problem of this size, but I would appreciate any help in diagnosing whether this is expected, or whether I should tweak some parameter or check a log to discover what is causing this slowdown.

Hello Toni,

No, this is definitely not normal.

Which Neo4j version is this running?
What kind of disk setup do you have? Do you have any information about the disk performance?
How did you create the index?

How is the I/O load? What kind of disk did you provision?

How big is that store on disk?

Can you add this to your config?

dbms.jvm.additional=-Dorg.neo4j.kernel.impl.index.schema.GenericNativeIndexPopulator.blockBasedPopulation=true
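
A minimal sketch of adding that setting without duplicating it on repeated runs. The config path in the usage comment is an assumption (it depends on how Neo4j was installed), and `add_conf_line` is a hypothetical helper name, not a Neo4j tool. Because this is a JVM option, Neo4j has to be restarted before it takes effect.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line is not already there.
add_conf_line() {
  file=$1
  line=$2
  # -x matches whole lines, -F treats the setting as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (the path is an assumption; adjust it to your installation):
# add_conf_line /etc/neo4j/neo4j.conf \
#   'dbms.jvm.additional=-Dorg.neo4j.kernel.impl.index.schema.GenericNativeIndexPopulator.blockBasedPopulation=true'
```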

Hi Michael,

  • I'm using Neo4j 3.5.3 from Google Cloud Marketplace (link)
  • For disks I'm using standard persistent disks from Google. You can check the specs here: https://cloud.google.com/compute/docs/disks/#pdspecs
  • I created the index by running a Cypher query from Neo4j Desktop: "CREATE INDEX ON :Transactions(hash)"
  • The store size of the DB is shown below; the disk size is 2 terabytes
neo4j-enterprise-causal-cluster-1-core-vm-1:/var/lib/neo4j/data/databases$ du -h *
8.0K    graph.db/schema/index/native-btree-1.0/1/profiles
41G     graph.db/schema/index/native-btree-1.0/1
1.6M    graph.db/schema/index/native-btree-1.0/3/profiles
99G     graph.db/schema/index/native-btree-1.0/3
139G    graph.db/schema/index/native-btree-1.0
139G    graph.db/schema/index
139G    graph.db/schema
156K    graph.db/profiles
4.0K    graph.db/index
736G    graph.db
0       store_lock

The DB was created using the neo4j-import tool, and after that I launched the index creation.

How could I check I/O load for the DB?

I'll add that config to the DB and I will report back.
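
On the I/O question: one way to get a rough picture on Linux is to compare two snapshots of `/proc/diskstats`. The sketch below assumes the standard field layout of that file (field 6 is sectors read, field 10 is sectors written, field 13 is milliseconds spent doing I/O, and sectors are always counted in 512-byte units). `disk_io_delta` is a hypothetical helper name and `sdb` in the usage comment is a placeholder device. Where the sysstat package is installed, `iostat -dx 5` reports similar figures directly.

```shell
# disk_io_delta DEVICE BEFORE AFTER SECONDS
# BEFORE and AFTER are two copies of /proc/diskstats taken SECONDS apart.
disk_io_delta() {
  awk -v dev="$1" -v secs="$4" '
    $3 != dev { next }                                  # skip other devices
    NR == FNR { r0 = $6; w0 = $10; b0 = $13; next }     # first snapshot
              { r1 = $6; w1 = $10; b1 = $13 }           # second snapshot
    END {
      printf "read_MBps=%.1f write_MBps=%.1f util_pct=%.0f\n",
        (r1 - r0) * 512 / 1048576 / secs,
        (w1 - w0) * 512 / 1048576 / secs,
        (b1 - b0) / (secs * 10)                         # busy ms / interval ms * 100
    }' "$2" "$3"
}

# Usage (device name is a placeholder):
# cat /proc/diskstats > /tmp/before; sleep 10; cat /proc/diskstats > /tmp/after
# disk_io_delta sdb /tmp/before /tmp/after 10
```

A utilisation close to 100% with low throughput would point at the disk being saturated by small random I/O rather than by bandwidth.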

Ok, yesterday I restarted the DB with

dbms.jvm.additional=-Dorg.neo4j.kernel.impl.index.schema.GenericNativeIndexPopulator.blockBasedPopulation=true

At the beginning it was very fast when I checked with call db.indexes, but now it is stuck. I have two clusters for testing ideas on how to solve this: one has a leader with 128 GB of RAM and the other with 64 GB.

  • In the first cluster, index creation is stuck (although slowly advancing) at 37%
  • In the second cluster, index creation is stuck (although slowly advancing) at 15%

Both were launched within the same hour, and index creation progress seems to follow the memory ratio. Does that ring a bell? I'm running out of ideas. I don't know if it is relevant, but this machine has no swap memory, just RAM.

Sorry for the delay, answer from our team:

Took a quick look - looks like they're on 3.5.3? I think they need to be on 3.5.5 or higher to get all the index population fixes.

(and will still need dbms.jvm.additional=-Dorg.neo4j.kernel.impl.index.schema.GenericNativeIndexPopulator.blockBasedPopulation=true)

Hi Michael. I'm running into a similar problem where my index building is taking a prohibitively long time (~7 hours to index a node property on 200M nodes). I've tried using this blockBasedPopulation but building the index with this setting enabled causes an out of memory crash. The index builds for a while (around an hour) with low memory usage. Then memory usage spikes, and the DB gets OOM-killed.

Any ideas on how this could be resolved? I'm using 3.5.6

It looks like this commit in 3.5.7 might fix my issue: https://github.com/neo4j/neo4j/commit/2095b378bcb4653e54a4855c804159f6206ff7c6

Giving it a try now.

It works! Index that was taking 7 hours to build now takes 45 minutes, and uses pretty much constant memory.


The type of disk, the OS, and the drivers/firmware also affect this.
Data imported on Ubuntu and then indexed took a few hours.
The same data and database imported onto AWS Linux or CentOS 7.4 took less than 10 minutes.
Both OSes had the same SSD and I/O provisioning.
Some have better drivers than others.

This was happening to my database as well, with 488 GB of RAM on an AWS i3.16xlarge: after 45 minutes the entire DB was killed. We are moving to 3.5.7 now to see if it fixes the issue.

Update - 3.5.7 has fixed it, the machine no longer dies on index creation.
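
Since the reports in this thread all come down to the Neo4j version (index population fixes from 3.5.5, the out-of-memory fix reported in 3.5.7), a quick check before rebuilding an index might look like the sketch below. `version_ge` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and the installed version is hard-coded to the one from the original post rather than read from a running server.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings numerically per component, so 3.5.10 sorts after 3.5.7.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="3.5.3"   # version from the original post; replace with your own
required="3.5.7"    # version reported in this thread as fixing the problem

if version_ge "$installed" "$required"; then
  echo "$installed is at least $required"
else
  echo "$installed is older than $required: upgrade before rebuilding the index"
fi
```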
