I'm using the Python async driver to import data into my Neo4j database. However, it is rather slow, and when I tested with the sync driver, I found it takes the same amount of time to import the data. It processes 120k rows in 1 hour with 8 threads. Here is the code I'm using:
async def run_cql(self, cql, params):
    async with self._async_driver.session(**self.db_config) as session:
        # The batch dictionary is bound to the query as the $dict parameter
        await session.run(cql, dict=params)

async def import_htmls(self, htmls):
    rows = []
    process_params = []
    chunk_num = 0
    for i, html in enumerate(htmls):
        if i % 100 == 0:
            print("Processing article %.2f%%" % (i * 100 / len(htmls)))
        rows.append(html)
        if len(rows) == self.chunk_size:
            chunk_num += 1
            session_index = (chunk_num - 1) % self.thread_count
            rows_dict = {'rows': rows}
            process_params.append({'session_index': session_index,
                                   'cql': self.cql,
                                   'rows_dict': rows_dict})
            # Once one chunk per worker has been queued, run them all concurrently
            if session_index == self.thread_count - 1:
                tasks = []
                for p in process_params:
                    tasks.append(asyncio.create_task(
                        self.run_cql(p['cql'], p['rows_dict'])))
                await asyncio.gather(*tasks)
                process_params = []
            rows = []
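For context, the rows_dict = {'rows': rows} shape suggests that self.cql is an UNWIND-style batch statement, although the actual query isn't shown above. A hypothetical example of a statement that would consume the batch the way run_cql binds it (the Article label and properties are made up for illustration):

# Hypothetical example only: the real self.cql, node label, and properties are not shown here.
# With run_cql above, the batch dictionary is bound to the $dict Cypher parameter,
# so the row list is reachable as $dict.rows.
cql = """
UNWIND $dict.rows AS row
MERGE (a:Article {url: row.url})
SET a.html = row.html
"""

Whatever the real statement is, each run_cql call opens its own session and executes it once per chunk in an auto-commit transaction.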
Is there anything wrong with my code? Or is there any additional configuration I should make on the Neo4j side? I set the thread worker count to 8, but it didn't matter.