Another "speed up the load" question from a relatively inexperienced Neo4j user

I'm so young :see_no_evil:, and no, I'm working for a startup, but we are open to consulting :slight_smile:

So now you need to send the data in batches instead of one query per row:

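import pandas as pd
from neo4j import GraphDatabase

# Assumption: local database with placeholder credentials, replace with your own.
graphDB_Driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

BATCH = {'batch': []}
BATCH_SIZE = 1000

RELATION_QUERY = "UNWIND $batch AS row MATCH (a:ProgNode{inode:row.a}) MATCH (b:ProgNode{inode:row.b}) CALL apoc.merge.relationship(a, 'PROGRAM', {}, apoc.map.removeKeys(properties(row), ['a', 'b']), b) YIELD rel RETURN 1"
NODE_QUERY = "UNWIND $batch AS row CALL apoc.merge.node(['ProgNode', row.nodetype], {inode:row.inode}, apoc.map.removeKeys(properties(row), ['nodetype', 'inode'])) YIELD node RETURN 1"


def reset_batch():
    """
    Function to reset the batch.
    """
    BATCH["batch"] = []


def flush_batch(query):
    """
    Function to send the pending rows to Neo4j, then reset the batch.
    """
    if BATCH['batch']:
        with graphDB_Driver.session() as ses:
            ses.run(query, batch=BATCH["batch"]).consume()
    reset_batch()


def merge_relation(args):
    """
    Function to create relations from a batch.
    """
    BATCH['batch'].append(args.to_dict())
    if len(BATCH['batch']) >= BATCH_SIZE:
        flush_batch(RELATION_QUERY)


def merge_node(args):
    """
    Function to create nodes from a batch.
    """
    BATCH['batch'].append(args.to_dict())
    if len(BATCH['batch']) >= BATCH_SIZE:
        flush_batch(NODE_QUERY)


# A multi-character separator is read by pandas as a regular expression,
# so the pipes are escaped and the Python engine is selected explicitly.
nodes = pd.read_csv(filepath_or_buffer='nodes.csv', header=0, sep=r'\|\|', engine='python', encoding='utf-8')
relations = pd.read_csv(filepath_or_buffer='relations.csv', header=0, sep=r'\|\|', engine='python', encoding='utf-8')

nodes.apply(merge_node, axis=1)
flush_batch(NODE_QUERY)  # write the last, partially filled batch of nodes
relations.apply(merge_relation, axis=1)
flush_batch(RELATION_QUERY)  # write the last, partially filled batch of relations
graphDB_Driver.close()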
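
If you prefer to avoid the global batch, the same idea can be written as a small generator. This is only a sketch: `iter_batches` and `load_in_batches` are names I made up, and `run_query` stands for anything that accepts a query string and a `batch` keyword, for example the `run` method of a session.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def iter_batches(rows: Iterable[Row], size: int = 1000) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows; the last one may be shorter."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(run_query: Callable[..., Any], query: str,
                    rows: Iterable[Row], size: int = 1000) -> int:
    """Call run_query(query, batch=...) once per batch and return the number of calls."""
    calls = 0
    for batch in iter_batches(rows, size):
        run_query(query, batch=batch)
        calls += 1
    return calls
```

With pandas you could pass `nodes.to_dict("records")` as `rows`, together with the UNWIND query for nodes. Because the generator always yields the final short batch, no rows are left behind at the end of the file.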

Don't forget to add the unique constraint before loading, otherwise every merge on inode has to scan all the ProgNode nodes:

CREATE CONSTRAINT constraint_inode ON (p:ProgNode) ASSERT p.inode IS UNIQUE
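
If you want to create the constraint from the Python script as well, something like the sketch below could work. The helper names are hypothetical. The statement above uses the `ON ... ASSERT` form; as far as I know, Neo4j 4.4 and later also accept `FOR ... REQUIRE`, and Neo4j 5 only accepts that newer form, so the sketch can build either one.

```python
from typing import Any, Callable


def constraint_statement(name: str, label: str, prop: str, modern_syntax: bool = True) -> str:
    """Build a uniqueness constraint statement for one label and one property."""
    if modern_syntax:
        # FOR ... REQUIRE form (Neo4j 4.4+); IF NOT EXISTS makes it safe to rerun.
        return (f"CREATE CONSTRAINT {name} IF NOT EXISTS "
                f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE")
    # ON ... ASSERT form used by older 4.x releases.
    return f"CREATE CONSTRAINT {name} ON (n:{label}) ASSERT n.{prop} IS UNIQUE"


def ensure_constraint(run_query: Callable[[str], Any], modern_syntax: bool = True) -> str:
    """Send the ProgNode.inode constraint through run_query and return the statement."""
    statement = constraint_statement("constraint_inode", "ProgNode", "inode", modern_syntax)
    run_query(statement)
    return statement
```

Here `run_query` would typically be `ses.run` from an open session, called once before the load starts.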

You also need to install the APOC plugin on your database.

Documentation:

I'm not sure the code works correctly as written, but the idea is there :slight_smile: I hope it helps you :slight_smile:

Regards,
Cobra