Create nodes + relationships in bulk?

I am using py2neo to create a knowledge graph (KG) from an external database.
Is there a way to create nodes + relationships in bulk?

My code looks like:

import py2neo as pn

for k in range(0,n):
     id = xxx
     node_properties = {...}
     node = pn.Node(label, id=id, **node_properties)

and something similar for the relationships.

For very large sets it takes ages to finish, as each node is created separately.
Is there a way to upload a table directly in bulk, without going through the loop?

If you use the official Python driver directly, you can pass a list of rows as a query parameter and have a single query create all the nodes in one transaction.

There is an example of using the driver shown here:


I can highly recommend the driver manual if you need an entry point for picking up the official Python driver.

As a matter of fact, it has a short section about batched node creation: https://neo4j.com/docs/python-manual/current/performance/#batch-data-creation

Py2neo is no longer maintained and I'm not very familiar with it, so I'm afraid I can't help you much should you decide to stick with it.
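For very large tables it may also help to split the rows into chunks and send one UNWIND query per chunk, so that no single transaction gets too big. A minimal sketch, assuming `driver` is a `neo4j.Driver` created with `GraphDatabase.driver(...)`; the `Item` label, the helper names and the batch size are made up for illustration:

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


# Hypothetical label; every key of a row becomes a node property.
CREATE_NODES = """
UNWIND $rows AS row
CREATE (n:Item)
SET n = row
"""


def create_nodes(driver, rows, batch_size=10_000):
    """Send `rows` (a list of property dicts) to the server, one query per batch."""
    total = 0
    for batch in chunked(rows, batch_size):
        # execute_query runs each batch in its own managed transaction.
        driver.execute_query(CREATE_NODES, rows=batch)
        total += len(batch)
    return total
```

Calling `create_nodes(driver, rows)` with the rows fetched from the external database replaces the per-node loop. The right batch size depends on the data and the server, so treat 10,000 as a starting point only.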

Thanks a lot, the following code works as expected:

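import random
from neo4j import GraphDatabase

# NEO4J_DB_URI, AUTH and generate_random_string() are defined elsewhere.
with GraphDatabase.driver(NEO4J_DB_URI, auth=AUTH) as driver:
  driver.verify_connectivity()
  # Make sure the parent node exists before attaching children to it.
  driver.execute_query("MERGE (g:Graph {physical_name: 'test_driver'})")
  numbers = [{"value": random.random(), "name": generate_random_string()} for _ in range(100)]
  # UNWIND turns the list parameter into rows, so one query creates all nodes.
  query = """
    MATCH (g:Graph {physical_name: 'test_driver'})
    WITH g, $numbers AS batch
    UNWIND batch AS node
    CREATE (g)-[:HAS_NUMBER]->(n:Number)
    SET n.value = node.value, n.name = node.name
    """
  # The context manager closes the session even if the query raises.
  with driver.session() as session:
    session.run(query, numbers=numbers)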

Another question:
Is it possible to pass the labels (node class) as a variable, for example?
I'd like to create nodes with different labels dynamically, not just the attributes.
I tried to pass something like

numbers = [{"labels": "NodeClass", "rel": "relation", "parent_id" : 4435, "value": random.random(), "name": generate_random_string()} for _ in range(k)]
query = """  
    WITH $nodes AS batch  
    UNWIND batch AS node  
    MATCH (g {global_id: node.parent_id})  
    CREATE (g)-[r:rel]->(n:node.labels)
    SET n.value = node.value, n.name = node.name  
    """  
    # SET n:node.labels
session = driver.session()
result = session.run(query, nodes=numbers)

but it seems Neo4j doesn't allow passing labels as variables:

CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '.': expected a parameter, '&', ')', ':', 'WHERE', '{' or '|' (line 5, column 32 (offset: 132))
"    CREATE (g)-[r:rel]->(n:node.labels)"

Right, you cannot use a variable for a label when matching or creating a node. You can use an APOC procedure such as apoc.create.node to create a node and specify the label(s) dynamically.

Create the node first, then create the relationship between the two nodes (apoc.create.relationship accepts the relationship type as a string).
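As a rough sketch of that APOC approach, applied to the batch from the question: it assumes the APOC plugin is installed on the server and that `driver` is a `neo4j.Driver`. The helper names are made up, and the UNWIND variable is called `row` here so it does not clash with the `node` column that `apoc.create.node` yields.

```python
# Assumes APOC is available; the MATCH has no label, as in the question,
# so adding one (plus an index on global_id) would likely speed it up.
CREATE_WITH_APOC = """
UNWIND $nodes AS row
MATCH (g {global_id: row.parent_id})
CALL apoc.create.node(row.labels, {value: row.value, name: row.name}) YIELD node AS n
CALL apoc.create.relationship(g, row.rel, {}, n) YIELD rel
RETURN count(rel) AS created
"""


def to_apoc_rows(rows):
    """Return copies of `rows` whose "labels" entry is always a list of strings."""
    prepared = []
    for row in rows:
        labels = row["labels"]
        if isinstance(labels, str):
            # apoc.create.node expects a list, so wrap a single label.
            labels = [labels]
        prepared.append({**row, "labels": list(labels)})
    return prepared


def create_with_apoc(driver, rows):
    """Create one node and one relationship per row, labels and type taken from the row."""
    return driver.execute_query(CREATE_WITH_APOC, nodes=to_apoc_rows(rows))
```

With this, the `numbers` list from the question can be passed as is: `create_with_apoc(driver, numbers)`.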

Starting with Neo4j 5.24, you can set a label dynamically in a SET clause, using the form SET n:$(expression).
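A sketch of how that could look for the batch from the question, without APOC. It assumes a server on 5.24 or later and that `driver` is a `neo4j.Driver`. The relationship type `HAS_CHILD` and the helper names are hypothetical; the type is kept fixed here because this sketch only relies on dynamic labels in SET.

```python
import re

# The node is created without a label, then the label is set from the row.
SET_DYNAMIC_LABEL = """
UNWIND $nodes AS row
MATCH (g {global_id: row.parent_id})
CREATE (g)-[:HAS_CHILD]->(n)
SET n:$(row.labels)
SET n.value = row.value, n.name = row.name
"""


def supports_dynamic_set(version):
    """Return True if a server version string such as "5.24.0" is 5.24 or newer."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (5, 24)


def create_with_dynamic_labels(driver, rows, server_version):
    """Run the query above, refusing early if the server is too old for it."""
    if not supports_dynamic_set(server_version):
        raise RuntimeError("dynamic labels in SET need Neo4j 5.24 or later")
    return driver.execute_query(SET_DYNAMIC_LABEL, nodes=rows)
```

The version string can be read from the server, for example with `CALL dbms.components()`. On older servers the APOC route remains the fallback.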
