I am looking to test different structures of my knowledge graph and one consistent piece of advice I have received is to turn otherwise categorical node properties into separate nodes.
However, the CSVs used to create my current graph were made by someone else and have since been changed/corrupted, so I can no longer edit them by hand to produce new CSV files with the new nodes, relationships, and appropriate headers for the import tool.
Recreating the CSVs would take an enormous amount of time due to the volume of data, so I am wondering if there is a simple Cypher query that can accomplish this task.
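For reference, the APOC procedure named in the errors below, apoc.refactor.categorize, is meant for this kind of refactoring. A call of roughly the following shape would match the MERGE statement that shows up in the log output further down. This is only a sketch: the label RegistryNum, the key registryNumber and the relationship type HAS_REGISTRYNUMBER are taken from that log line, while the source property name, the empty copiedKeys list and the batch size are assumptions. The signature of the installed version can be checked with CALL apoc.help('categorize').

```cypher
// Sketch only. Positional arguments are assumed to be:
//   sourceKey, type, outgoing, label, targetKey, copiedKeys, batchSize
// 'registryNumber' as the source property name is an assumption;
// the other names come from the MERGE statement in the query-cache log line.
CALL apoc.refactor.categorize(
  'registryNumber',      // sourceKey: property on the existing nodes (assumed name)
  'HAS_REGISTRYNUMBER',  // type: relationship type to create
  true,                  // outgoing: (n)-[:HAS_REGISTRYNUMBER]->(cat)
  'RegistryNum',         // label of the new category nodes
  'registryNumber',      // targetKey: property on the category nodes
  [],                    // copiedKeys: nothing else moved to the category (assumed)
  1000                   // batchSize (assumed)
);
```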
I tried applying the solution but received the following error:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: java.lang.StackOverflowError
For some context, I am working with Neo4j Community Edition 3.5.3. To test the APOC command above, I copied my graph and edited my .conf file to point the active database parameter to the copied db instead of the original, so I could test the property explosion. (graph_explode.db is the new graph, graph.db is the original graph.)
It looks like I'm getting two different sets of errors, one from this morning and one from yesterday evening.
Yesterday evening I was getting warnings from procedures regarding plugin loading failures such as:
WARN [o.n.k.i.p.Procedures] Failed to load apoc.util.s3.S3URLConnection from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: com/amazonaws/ClientConfigurat
WARN [o.n.k.i.p.Procedures] Failed to load apoc.data.email.ExtractEmail from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: javax/mail/internet/AddressException
WARN [o.n.k.i.p.Procedures] Failed to load com.jayway.jsonpath.spi.json.GsonJsonProvider from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: com/google/gson/JsonElement
WARN [o.n.k.i.p.Procedures] Failed to load org.neo4j.driver.internal.shaded.io.netty.handler.ssl.JettyAlpnSslEngine$ClientEngine from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: org/eclipse/jetty/alpn/ALPN$Provider
This morning I was getting the following warnings, along with a new APOC failure for the same refactor command:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: org.neo4j.kernel.DeadlockDetectedException: LockClient[1264920] can't wait on resource RWLock[NODE(72798004), hash=1583583385] since => LockClient[1264920] <-[:HELD_BY]- RWLock[NODE(72798452), hash=28924501] <-[:WAITING_FOR]- LockClient[1265128] <-[:HELD_BY]- RWLock[NODE(72798004), hash=1583583385]
WARN [o.n.b.r.DefaultBoltConnection] The client is unauthorized due to authentication failure.
INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 10 seconds: WITH {node} AS n MERGE (cat:RegistryNum {registryNumber: {value}}) MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat) RETURN cat
INFO [o.n.c.i.CommunityCompilerFactory] Discarded stale query from the query cache after 10 seconds: WITH {node} AS n MERGE (cat:RegistryNum {registryNumber: {value}}) MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat) RETURN cat
This command gave the following error during execution on the browser:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: java.lang.StackOverflowError
This command gave the following error during execution on the browser:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: org.neo4j.kernel.DeadlockDetectedException: LockClient[1264920] can't wait on resource RWLock[NODE(72798004), hash=1583583385] since => LockClient[1264920] <-[:HELD_BY]- RWLock[NODE(72798452), hash=28924501] <-[:WAITING_FOR]- LockClient[1265128] <-[:HELD_BY]- RWLock[NODE(72798004), hash=1583583385]
Essentially, I just applied the new APOC command to separate properties on two different nodes to test it out, and got the above errors.
The locking issues you're seeing in the logs indicate that you're running concurrent modifications, which should not happen if the apoc.refactor.categorize call is the only query running at that time.
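If the deadlocks persist, one thing worth trying is to do the same refactoring in explicitly single-threaded batches with apoc.periodic.iterate, so that no two transactions MERGE the same category node at once. The following is a minimal sketch rather than a tested fix. It assumes the source nodes carry a label :Record (hypothetical, substitute your own) and a registryNumber property (also assumed); the RegistryNum label and HAS_REGISTRYNUMBER relationship type are taken from the MERGE statement in your log. Run the two statements separately, since a schema change and data writes cannot share a transaction.

```cypher
// Statement 1: uniqueness constraint so that MERGE on the category node
// is index-backed and cannot create duplicates (Neo4j 3.5 syntax).
CREATE CONSTRAINT ON (c:RegistryNum) ASSERT c.registryNumber IS UNIQUE;

// Statement 2: batched, non-parallel refactoring.
// :Record and the source property name registryNumber are assumptions.
CALL apoc.periodic.iterate(
  "MATCH (n:Record) WHERE exists(n.registryNumber) RETURN n",
  "MERGE (cat:RegistryNum {registryNumber: n.registryNumber})
   MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat)",
  {batchSize: 10000, parallel: false}
);
```

With parallel set to false, the batches commit one after another, which trades speed for the absence of lock contention between batches. The sketch leaves the original property on the source nodes; it can be removed in a separate batched pass once the new category nodes have been checked.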