Scrapy Pipeline to Neo4j Bulk Store Very Slow

I need help speeding up the process of inserting items from a Scrapy pipeline into Neo4j. I am currently working on a project where I am scraping data for about a million patents and storing their information and connections with Neo4j. Each patent has on average 10 different connections, including assignees, inventors, classifications and, most importantly, connections to other patents.

Neo4j Server version: 4.0.4 (community)
Neo4j Browser version: 4.0.8
Py2Neo Version: 5.0b1

I am storing these items in Neo4j from Python using py2neo and UNWIND queries, but it takes WAY too long (several seconds) per item. Any suggestions on how to speed this up? Here's an example snippet from my code:

def assignee(item):
    user = item.get("user")
    for assignee in user['assignees']:
        assignee_user = parse_user(assignee)

        # .get() avoids KeyError, and drops the stray trailing commas
        # that previously turned these values into one-element tuples
        fullname = assignee_user.get('fullname', '')
        first_name = assignee_user.get('first_name', '')
        last_name = assignee_user.get('last_name', '')

        assignee = {
            "fullname": fullname,
            "first_name": first_name,
            "last_name": last_name
        }

        if assignee_user['status'] == 3:
            location = {
                "city": assignee_user['city'],
                "state": assignee_user['state'],
                "country": assignee_user['country']
            }

        elif assignee_user['status'] == 2:
            location = {
                "city": assignee_user['city'],
                "state": None,
                "country": assignee_user['country']
            }

        else:
            # status 0 or anything unexpected: no location data,
            # so `location` is always bound before the yield
            location = {
                "city": None,
                "state": None,
                "country": None
            }

        yield assignee, location



params = []
for assignee_props, location in assignee(item):
    params.append({
        'fullname': assignee_props['fullname'],
        'first_name': assignee_props['first_name'],
        'last_name': assignee_props['last_name'],
        'city': location['city'],
        'state': location['state'],
        'country': location['country']
    })

# Pass document_number as a parameter instead of concatenating strings,
# and use the Neo4j 4.x parameter syntax ($datas, not {$datas})
q = """
    MATCH (patent:Patent) WHERE patent.document_number = $document_number
    UNWIND $datas AS data
    MERGE (assignee:User {fullname: data.fullname})
    SET assignee.first_name = data.first_name,
        assignee.last_name = data.last_name
    MERGE (city:City {name: data.city})
    MERGE (patent)-[:ASSIGNEE]->(assignee)
    MERGE (assignee)-[:LOCATED_IN]->(city)
"""
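For a workload like this, the two usual fixes are (1) creating an index on every property you MERGE or MATCH on, since without one each MERGE scans all nodes of that label and gets slower as the graph grows, and (2) batching many rows into one transaction instead of one round trip per patent. A rough sketch of both, assuming py2neo's `Graph.run` API (the `batches` and `store` helpers are illustrative names, not part of py2neo):

```python
# Index creation statements (Neo4j 4.0 syntax); run each once with
# graph.run(stmt). MERGE without an index on the merged property
# does a full label scan on every row.
INDEX_STATEMENTS = [
    "CREATE INDEX FOR (p:Patent) ON (p.document_number)",
    "CREATE INDEX FOR (u:User) ON (u.fullname)",
    "CREATE INDEX FOR (c:City) ON (c.name)",
]


def batches(rows, size=1000):
    """Split accumulated parameter dicts into chunks so one UNWIND
    transaction carries many rows instead of one per item."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


def store(graph, query, rows, size=1000):
    # graph is a py2neo Graph; keyword arguments to run() become
    # Cypher parameters, so $datas receives each chunk
    for chunk in batches(rows, size):
        graph.run(query, datas=chunk)
```

One caveat: `MERGE (city:City {name: data.city})` will fail for rows where the city is null, so you may need to filter those rows out (or split the query) before batching.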
