Turning a property of a node into a new node

I am looking to test different structures of my knowledge graph, and one consistent piece of advice I have received is to turn categorical node properties into separate nodes.

However, the CSVs that were imported to create my current graph were made by someone else and have since been changed or corrupted. As a result, I can no longer simply edit them to produce new CSV files that define the new nodes, relationships, and headers needed for an easy import with the import tool.

Recreating the CSVs would take an enormous amount of time due to the volume of data, so I am wondering if there is a simple Cypher query that can accomplish this task.

Thanks!

APOC to the rescue: use apoc.refactor.categorize. See http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/graph-refactoring/categorize/ for details.
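For example, a call for the substance-name property discussed in this thread could look like the sketch below. It assumes the APOC 3.5 argument order (sourceKey, relationship type, outgoing, new label, targetKey, copiedKeys, batchSize). The label SubstanceName and the type HAS_NAMEDSUBSTANCE are simply names chosen for illustration. The uniqueness constraint is an addition, not something the procedure requires: categorize MERGEs one category node per distinct value, and the constraint gives that MERGE an index to look up against. Run the two statements one after the other.

```cypher
// Optional: index the future category nodes so each MERGE is an index lookup
// rather than a label scan (Neo4j 3.5 constraint syntax).
CREATE CONSTRAINT ON (s:SubstanceName) ASSERT s.nameOfSubstance IS UNIQUE;

// Move each node's nameOfSubstance value onto a shared :SubstanceName node,
// linked by an outgoing HAS_NAMEDSUBSTANCE relationship.
// Arguments: sourceKey, type, outgoing, label, targetKey,
// copiedKeys (no extra properties copied here), batchSize.
CALL apoc.refactor.categorize(
  'nameOfSubstance', 'HAS_NAMEDSUBSTANCE', true,
  'SubstanceName', 'nameOfSubstance', [], 100);
```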


Hello, thank you for the update!

I tried applying the solution but received the following error: "Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: java.lang.StackOverflowError"

Any advice?

What statement did you use exactly? Is there a stack trace in debug.log?

Hello,

For some context, I am working with Neo4j Community Edition 3.5.3. To test the APOC procedure above, I copied my graph and edited my neo4j.conf file so that the active database parameter points to the copy instead of the original, which lets me test the property explosion without touching the original (graph_explode.db is the new graph; graph.db is the original graph).

It looks like I'm getting two different sets of errors, one from this morning and one from yesterday evening.

Yesterday evening I was getting warnings about classes that failed to load from the APOC plugin jar, such as:

  1. WARN [o.n.k.i.p.Procedures] Failed to load apoc.util.s3.S3URLConnection from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: com/amazonaws/ClientConfigurat
  2. WARN [o.n.k.i.p.Procedures] Failed to load apoc.data.email.ExtractEmail from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: javax/mail/internet/AddressException
  3. WARN [o.n.k.i.p.Procedures] Failed to load com.jayway.jsonpath.spi.json.GsonJsonProvider from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: com/google/gson/JsonElement
  4. WARN [o.n.k.i.p.Procedures] Failed to load org.neo4j.driver.internal.shaded.io.netty.handler.ssl.JettyAlpnSslEngine$ClientEngine from plugin jar /home/ubuntu/neo4j-community-3.5.3/plugins/apoc-3.5.0.2-all.jar: org/eclipse/jetty/alpn/ALPN$Provider

------------------------------------------------------------------------------------------------------------------------------------

This morning the same refactor procedure failed with a new APOC error, along with the following log messages:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: org.neo4j.kernel.DeadlockDetectedException: LockClient[1264920] can't wait on resource RWLock[NODE(72798004), hash=1583583385] since => LockClient[1264920] <-[:HELD_BY]- RWLock[NODE(72798452), hash=28924501] <-[:WAITING_FOR]- LockClient[1265128] <-[:HELD_BY]- RWLock[NODE(72798004), hash=1583583385]

  1. WARN [o.n.b.r.DefaultBoltConnection] The client is unauthorized due to authentication failure.
  2. INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 10 seconds: WITH {node} AS n MERGE (cat:RegistryNum {registryNumber: {value}}) MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat) RETURN cat
  3. INFO [o.n.c.i.CommunityCompilerFactory] Discarded stale query from the query cache after 10 seconds: WITH {node} AS n MERGE (cat:RegistryNum {registryNumber: {value}}) MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat) RETURN cat

Sorry, I forgot to include my actual statements. The statements I ran yesterday and today were:

1. Command: CALL apoc.refactor.categorize('nameOfSubstance','HAS_NAMEDSUBSTANCE',true,'SubstanceName','nameOfSubstance',[],100)

This command gave the following error when executed in the Browser:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: java.lang.StackOverflowError

2. Command: CALL apoc.refactor.categorize('registryNumber','HAS_REGISTRYNUMBER',true,'RegistryNum','registryNumber',[],100)

This command gave the following error when executed in the Browser:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.refactor.categorize: Caused by: org.neo4j.kernel.DeadlockDetectedException: LockClient[1264920] can't wait on resource RWLock[NODE(72798004), hash=1583583385] since => LockClient[1264920] <-[:HELD_BY]- RWLock[NODE(72798452), hash=28924501] <-[:WAITING_FOR]- LockClient[1265128] <-[:HELD_BY]- RWLock[NODE(72798004), hash=1583583385]

In short, I applied the new APOC procedure to two different properties to test it and got the errors above.

Can you make sure you only run one categorize command at a time? With that limitation you shouldn't see any DeadlockDetectedException.

The "Failed to load" messages are not critical; disregard them.

3.5.3 is a pretty old release; consider upgrading to 3.5.14 together with the latest APOC 3.5.x.x version.
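Since the advice is to keep the server and the APOC jar on matching 3.5 releases, here is a quick way to confirm what is actually loaded before and after the upgrade. Both calls are available on 3.5.

```cypher
// Neo4j kernel name, version(s) and edition (e.g. to confirm 3.5.3 vs 3.5.14).
CALL dbms.components() YIELD name, versions, edition
RETURN name, versions, edition;

// Version string of the APOC jar currently loaded from the plugins directory.
RETURN apoc.version() AS apocVersion;
```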

What do you mean by running one categorize command at a time? I ran it as a single query in the Neo4j Browser. Is there another way to run the command?

Thanks.

The locking issues you're seeing in the logs indicate that you're running concurrent modifications, which should not happen if apoc.refactor.categorize is the only query running at that time.
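If the procedure keeps failing, one alternative is to do the same refactoring in plain Cypher. The query cache entries in the log above show that categorize issues a per-node `MERGE (cat:RegistryNum {registryNumber: ...}) MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat)`. The sketch below performs the same work through apoc.periodic.iterate with `parallel: false`, so batches commit strictly one after another. This is a substitute technique, not what categorize does internally. It assumes the RegistryNum label and HAS_REGISTRYNUMBER type from the statements above, and the batch size of 1000 is an arbitrary choice.

```cypher
// Index the category nodes so each MERGE is an index lookup (3.5 syntax).
CREATE CONSTRAINT ON (r:RegistryNum) ASSERT r.registryNumber IS UNIQUE;

// Outer query streams the nodes that still carry the property. The category
// nodes also have registryNumber, so they are explicitly excluded.
// Inner query runs in sequential batches (parallel: false): create or reuse the
// category node, link to it, then drop the now-redundant property.
CALL apoc.periodic.iterate(
  "MATCH (n) WHERE exists(n.registryNumber) AND NOT n:RegistryNum RETURN n",
  "MERGE (cat:RegistryNum {registryNumber: n.registryNumber})
   MERGE (n)-[:HAS_REGISTRYNUMBER]->(cat)
   REMOVE n.registryNumber",
  {batchSize: 1000, parallel: false});
```

Because both steps use MERGE, re-running the query after a partial failure should not create duplicate category nodes or relationships. Nodes that an earlier run already processed no longer have the property and are skipped. For nameOfSubstance, swap in that property, label, and relationship type.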
