New Database Recreating Unwanted Indexes


(A Scerri) #1

Hi

I'm not sure if I have missed something in the documentation, or if some lingering transaction log is being replayed and causing this, but here is what I have done. This is all with 3.5.0 Community Edition.

I first created a database by importing data (neo4j-admin import) and then created a bunch of indexes (through cypher-shell). This all worked as expected and I tested the database successfully in various ways, stopping and starting it many times.

I then wanted to try an alternate graph structure, so on the same server I thought I should be able to just modify the active database name in the config (stopping the server first) and create the new database by importing the data again using the CLI import tool (neo4j-admin or neo4j-import). When I then launched the console command it all came up OK, but I did notice that when I tried to stop it later there was some error about not being able to rotate logs. The main thing, however, was that it had gone and created the same number of indexes as my old database (based on the schema/index/native-btree-1.0 directory contents).

What I don't know is why these got recreated, apparently using the old definitions, which are then broken because the graph nodes and relationships have different properties, amongst other changes. I've not found an explanation as to why they would be recreated, nor any way to stop it. I have tried removing the transaction log file from the configured directory and then removing the schema/index subdirectory, but I guess it's possible the database now has the definitions baked in and is simply rebuilding them when started.

Looking in the old transaction log, it seemed to contain what look like the definitions of the indexes, so I assumed it was replaying that part for some reason. I'm not sure (a) why this was even there, as I created the original database over a month ago and it was all OK, as I mentioned, or (b) why it would attempt to replay such old statements on a new database.

I'd love to hear from anyone who has a better understanding of what may have happened, and how to prevent this. Having dug around, I've assumed at this point that there is no other location where rogue definitions of the indexes could be hiding. Clearly the database name isn't playing a role in determining what is going on.

Thanks

Tony


(Michael Hunger) #2

Sounds like your two installations got mixed up?
Probably with your "active-database" setting?
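
One quick way to rule out a mix-up in the config is to list every uncommented line that sets the active database. This is only a sketch: the helper name active_db_settings is made up for illustration, and it assumes the setting is written as dbms.active_database=... in a plain-text conf file.

```shell
# Hypothetical helper: print every uncommented dbms.active_database line
# (with its line number) from the config file given as $1.
# Commented-out lines such as "#dbms.active_database=graph.db" are ignored.
active_db_settings() {
  grep -n '^[[:space:]]*dbms\.active_database[[:space:]]*=' "$1"
}

# Example call (path is an assumption about the installation layout):
# active_db_settings conf/neo4j.conf
```

If more than one line is printed, the setting has been added several times (for example by appending to the file), which is worth cleaning up before testing further.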

Would you have more details on what you did? Perhaps a script.

Also, do you have the debug.log with the output for the part where you said "there was some error about not being able to rotate logs"?


(A Scerri) #3

Hi

As I said, I modified the config to change the active database name, and running the import definitely put its output into the new database directory. Launching the neo4j console then clearly opened the new database directory; in the log I could see it start the index creation process, and the schema directory (with index/native-btree-1.0 etc.) got created in that directory too. So I'm pretty sure it wasn't referring to the old database directory at this point.

I was able to repeat the whole process, this time removing the single transaction log file that I had spotted, and it did not create any indexes when I got to the console launch step. So it appears that, at least when creating databases using the import tools, you want to clear out any transaction logs before launching the database, to prevent them being applied to your new database. Is there maybe some nuance to creating indexes as to when the transaction is considered completed?
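
The workaround described above could be scripted roughly as follows. This is a sketch, not an official procedure: the function name set_aside_tx_logs is hypothetical, it assumes the log files are named neostore.transaction.db.<N>, and it assumes the server is fully stopped. It moves the files to a backup directory rather than deleting them.

```shell
# Hypothetical helper: move leftover transaction log files out of the
# transaction log directory ($1) into a backup directory ($2).
# Assumption: log files match neostore.transaction.db.* in that directory.
# Prints the number of files moved.
set_aside_tx_logs() {
  tx_dir="$1"
  backup_dir="$2"
  moved=0
  mkdir -p "$backup_dir"
  for f in "$tx_dir"/neostore.transaction.db.*; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    mv "$f" "$backup_dir"/
    moved=$((moved + 1))
  done
  echo "$moved"
}

# Example call, with the server stopped (both paths are placeholders):
# set_aside_tx_logs /path/to/tx-logs /path/to/tx-logs.bak
```

Keeping the moved files around makes it possible to inspect them later if someone wants to reproduce the problem.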

Tony


(Michael Hunger) #4

That sounds really odd and dangerous.

Could it be that there was still a server running while you changed the setting or during the import?

It would be really good to be able to reproduce it.


(A Scerri) #5

Hi

No, I made sure to check for any running processes to avoid such situations. I believe the sequence to reproduce it should be something like this, assuming no processes are running to start with and a fresh installation with a completed config file and the necessary directories:

  1. Set config to db name 1
  2. "neo4j-admin import" OR "neo4j-import" to load data
  3. "neo4j console" to launch database
  4. "cypher-shell": connect to the database and issue the create index commands, wait until all are 100% complete (using "call db.indexes;" to check progress), then quit the shell
  5. then stop console

That should get the first one done. After this I'm mainly just running traversal queries; I don't think I'm doing anything else to provoke this, no additional updates etc. Then you can repeat steps 1, 2 and 3 using a different active database name in the config. After launching the console (step 3) you should see it attempt to recreate the indexes, and looking in the transaction log directory you should find a file.

I've not used this exact sequence to try to reproduce it myself, but that's essentially what I did, ignoring unrelated stuff. To fix it, after stopping everything and before switching database names, I removed the old transaction log file (about 3KB in my case). I don't have much time to look into this specifically; it was a blocker at the time until I tried removing the transaction log, which allowed me to continue.

Tony


(Michael Hunger) #6

I tried to reproduce it, but it doesn't happen for me.
Here is the script:

rm -rf data/databases/*
echo $':ID,name\n42,Douglas Adams' > people.csv
bin/neo4j-admin import --mode csv --nodes:Person people.csv
bin/neo4j-admin set-initial-password test
bin/neo4j start
sleep 30
echo "Creating index"
echo 'create index on :Person(name);' | bin/cypher-shell -u neo4j -p test
sleep 10
bin/neo4j stop
sleep 10
echo 'dbms.active_database=graph.db2' >> conf/neo4j.conf 
bin/neo4j-admin import --database=graph.db2 --mode csv --nodes:Person people.csv
bin/neo4j start
sleep 30
echo "Listing indexes"
echo 'call db.indexes;' | bin/cypher-shell -u neo4j -p test
bin/neo4j stop

(A Scerri) #7

Hi

I'll have a go with that script on my setup. One difference might be the user used to launch things. I also ran "neo4j console" rather than "neo4j start", but I'd hope that shouldn't make much difference to the running instance. I'll try it with the same data you used above, but will also try it with some bigger data in case the time required to build the indexes matters (the graph I was using has 500M nodes and 700M edges, with 700M+ properties). In my case, though, I waited for the indexes to complete before terminating the cypher-shell.

Thanks for looking at this; I'll see if I can pin it down to anything more specific, time permitting.

Tony


(Michael Hunger) #8

How did you wait for completion? With call db.awaitIndexes(1000000)?


(A Scerri) #9

I just monitored the results of call db.indexes until they all reached 100% progress.
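
For anyone wanting to script that kind of polling instead of watching it by hand, a rough sketch follows. The function name wait_for_indexes is made up, and it assumes that an index still being built shows the state POPULATING in the call db.indexes output; the listing command is passed in as a string so it can be swapped for whatever works on a given setup.

```shell
# Hypothetical helper: run a listing command ($1) up to $2 times,
# sleeping $3 seconds between attempts, until its output no longer
# contains POPULATING. Returns 0 when settled, 1 on timeout.
wait_for_indexes() {
  wfi_cmd="$1"
  wfi_max="$2"
  wfi_delay="$3"
  wfi_try=0
  while [ "$wfi_try" -lt "$wfi_max" ]; do
    # A failed listing (e.g. server not reachable yet) counts as "not ready".
    if wfi_out=$(sh -c "$wfi_cmd") && ! printf '%s\n' "$wfi_out" | grep -q 'POPULATING'; then
      return 0
    fi
    wfi_try=$((wfi_try + 1))
    sleep "$wfi_delay"
  done
  return 1
}

# Example call, reusing the credentials from the script earlier in the thread:
# wait_for_indexes "echo 'call db.indexes;' | bin/cypher-shell -u neo4j -p test" 60 10
```

Note that this only checks that nothing is still populating; an index in a failed state would also pass, so the final listing is still worth reading.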

Tony