Hi everyone,
I'm using Python to extract various datapoints from a 100 GB+ JSON file. For every group of datapoints extracted, the data is saved with neomodel into a Neo4j DBMS running on the same system.
I monitor progress by calculating how many times per second the saving function is called. Depending on the system I'm running this on, the script starts to slow down either almost immediately or after a few minutes. I don't see any strain on the system, so I don't know what to make of this behaviour. This is the script that saves the data:
from neomodel import StructuredNode, StringProperty, Relationship, config, DateProperty, BooleanProperty

config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'


class Label1(StructuredNode):
    Prop1 = StringProperty(unique_index=True)
    Prop2 = StringProperty()
    Prop3 = StringProperty(unique_index=True)
    Prop4 = Relationship('Label2', 'REL1')
    Prop5 = Relationship('Label3', 'REL2')


class Label2(StructuredNode):
    Prop1 = DateProperty()
    Prop2 = BooleanProperty(default=False)


class Label3(StructuredNode):
    Prop1 = StringProperty()
    Prop2 = StringProperty()
    Prop3 = StringProperty()


def save_data(Data1, Data2, Data3):
    # LABEL 1: always created as a new node
    send_data1 = Label1(Prop1=Data1[0], Prop3=Data1[1], Prop2=Data1[2]).save()

    # LABEL 2: reuse a matching node if one exists, otherwise create it
    try:
        send_data2 = Label2.nodes.get(Prop1=Data2[0], Prop2=Data2[1])
    except Label2.DoesNotExist:
        send_data2 = Label2(Prop1=Data2[0], Prop2=Data2[1]).save()

    # LABEL 3: reuse a matching node if one exists, otherwise create it
    try:
        send_data3 = Label3.nodes.get(Prop1=Data3[0])
    except Label3.DoesNotExist:
        send_data3 = Label3(Prop1=Data3[0], Prop2=Data3[1], Prop3=Data3[2]).save()

    # RELATIONSHIPS: reached through the attribute names declared on Label1
    send_data1.Prop4.connect(send_data2)
    send_data1.Prop5.connect(send_data3)
I'm very new to Neo4j, but from what I've found, this probably has something to do with doing a lot of writes in quick succession. Is there a better way to do this using neomodel? Thanks in advance!
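One thing that could be tried for the "many writes in quick succession" theory is grouping several `save_data` calls into one transaction instead of committing each write separately. The sketch below is only that: `chunked` and `save_in_batches` are hypothetical helper names, the batch size of 500 is an arbitrary guess, and the transaction is passed in as a callable so the helper itself has no database dependency. Separately, the `nodes.get` lookups on `Label2` and `Label3` filter on properties declared without `index=True`, so they may get slower as the graph grows; that is a guess from the model definitions, not something measured here.

```python
from contextlib import nullcontext
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable, lazily."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def save_in_batches(groups, save_group, transaction=nullcontext, batch_size=500):
    """Call save_group(*group) for every group, one transaction per batch.

    `transaction` is a zero-argument callable that returns a context manager;
    the default (nullcontext) does nothing and is only useful for dry runs.
    Returns the number of groups saved.
    """
    saved = 0
    for batch in chunked(groups, batch_size):
        # Everything inside this block is committed together on exit.
        with transaction():
            for group in batch:
                save_group(*group)
                saved += 1
    return saved
```

With neomodel this might be called as `save_in_batches(groups, save_data, transaction=lambda: db.transaction)` after `from neomodel import db`, where `groups` is an iterable of `(Data1, Data2, Data3)` tuples produced by the JSON parser. Because `chunked` is lazy, the 100 GB+ input never has to be held in memory at once.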
Edit: Added the log. At around 5500 saved datagroups the decline begins. If I leave it to run long enough it will decline to 1 group/s and below.
saved 100 datagroups [groups/s: 12.57]
saved 200 datagroups [groups/s: 16.25]
saved 300 datagroups [groups/s: 18.01]
saved 400 datagroups [groups/s: 19.22]
saved 500 datagroups [groups/s: 19.89]
saved 600 datagroups [groups/s: 20.40]
found 616 datapoints among 1000 entities
saved 700 datagroups [groups/s: 20.81]
saved 800 datagroups [groups/s: 21.05]
saved 900 datagroups [groups/s: 21.28]
saved 1000 datagroups [groups/s: 21.42]
saved 1100 datagroups [groups/s: 21.54]
found 1150 datapoints among 2000 entities
saved 1200 datagroups [groups/s: 21.69]
saved 1300 datagroups [groups/s: 21.86]
saved 1400 datagroups [groups/s: 22.00]
saved 1500 datagroups [groups/s: 22.00]
found 1565 datapoints among 3000 entities
saved 1600 datagroups [groups/s: 22.10]
saved 1700 datagroups [groups/s: 22.20]
saved 1800 datagroups [groups/s: 22.25]
saved 1900 datagroups [groups/s: 22.34]
saved 2000 datagroups [groups/s: 22.41]
found 2096 datapoints among 4000 entities
saved 2100 datagroups [groups/s: 22.46]
saved 2200 datagroups [groups/s: 22.45]
saved 2300 datagroups [groups/s: 22.47]
saved 2400 datagroups [groups/s: 22.50]
found 2467 datapoints among 5000 entities
saved 2500 datagroups [groups/s: 22.53]
saved 2600 datagroups [groups/s: 22.54]
saved 2700 datagroups [groups/s: 22.58]
saved 2800 datagroups [groups/s: 22.61]
saved 2900 datagroups [groups/s: 22.61]
saved 3000 datagroups [groups/s: 22.66]
saved 3100 datagroups [groups/s: 22.76]
found 3113 datapoints among 6000 entities
saved 3200 datagroups [groups/s: 22.84]
saved 3300 datagroups [groups/s: 22.91]
saved 3400 datagroups [groups/s: 22.99]
saved 3500 datagroups [groups/s: 23.06]
saved 3600 datagroups [groups/s: 23.07]
saved 3700 datagroups [groups/s: 23.06]
found 3714 datapoints among 7000 entities
saved 3800 datagroups [groups/s: 23.07]
saved 3900 datagroups [groups/s: 23.09]
saved 4000 datagroups [groups/s: 23.09]
saved 4100 datagroups [groups/s: 23.10]
saved 4200 datagroups [groups/s: 23.11]
saved 4300 datagroups [groups/s: 23.11]
saved 4400 datagroups [groups/s: 23.12]
saved 4500 datagroups [groups/s: 23.13]
saved 4600 datagroups [groups/s: 23.11]
saved 4700 datagroups [groups/s: 23.08]
found 4776 datapoints among 8000 entities
saved 4800 datagroups [groups/s: 23.07]
saved 4900 datagroups [groups/s: 23.05]
saved 5000 datagroups [groups/s: 23.05]
saved 5100 datagroups [groups/s: 23.03]
saved 5200 datagroups [groups/s: 23.02]
saved 5300 datagroups [groups/s: 23.03]
saved 5400 datagroups [groups/s: 23.02]
saved 5500 datagroups [groups/s: 23.02]
saved 5600 datagroups [groups/s: 22.99]
saved 5700 datagroups [groups/s: 22.98]
saved 5800 datagroups [groups/s: 22.98]
saved 5900 datagroups [groups/s: 22.98]
found 5902 datapoints among 9000 entities
saved 6000 datagroups [groups/s: 22.97]
saved 6100 datagroups [groups/s: 22.96]
saved 6200 datagroups [groups/s: 22.95]
saved 6300 datagroups [groups/s: 22.95]
saved 6400 datagroups [groups/s: 22.94]
saved 6500 datagroups [groups/s: 22.95]
saved 6600 datagroups [groups/s: 22.94]
saved 6700 datagroups [groups/s: 22.93]
saved 6800 datagroups [groups/s: 22.92]
saved 6900 datagroups [groups/s: 22.90]
found 6920 datapoints among 10000 entities
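The groups/s figure in this log changes very smoothly, which would be consistent with a cumulative average since the start of the run. That is an assumption, since the monitoring code is not shown. A cumulative average lags behind the real throughput, so a sliding-window rate might show more clearly when the slowdown actually starts. A minimal sketch, with `RateMeter` and its method names being hypothetical:

```python
import time
from collections import deque


class RateMeter:
    """Track both a cumulative and a sliding-window call rate."""

    def __init__(self, window=100, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._count = 0
        # Timestamps of the most recent `window` calls only.
        self._recent = deque(maxlen=window)

    def tick(self):
        """Record one call, e.g. right after each save_data(...)."""
        self._count += 1
        self._recent.append(self._clock())

    def cumulative_rate(self):
        """Calls per second averaged over the whole run so far."""
        elapsed = self._clock() - self._start
        return self._count / elapsed if elapsed > 0 else 0.0

    def recent_rate(self):
        """Calls per second over the last `window` recorded calls."""
        if len(self._recent) < 2:
            return 0.0
        span = self._recent[-1] - self._recent[0]
        return (len(self._recent) - 1) / span if span > 0 else 0.0
```

Calling `tick()` after every saved group and printing both rates every 100 groups would show whether the recent rate drops well before the cumulative one does.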