Hello all!
This is gonna be a bit of a generic question.
I have a graph with nodes that have 20+ properties; let's say we have id, hash, groupA and groupB properties, plus some more that aren't relevant.
I have a uniqueness constraint on the id property.
I periodically pull data from an API and send it to the DB to CRUD the nodes: if the hash of a node has changed, I update its updateTime; if I send a node with an id that doesn't exist yet, it is created and its createTime is populated; and if the DB already has a node with an id that wasn't sent, that node should be "deleted" (actually I just update its label rather than really deleting it). All of this happens separately for different "groups" (A and B).
At first I did it all in Cypher: I sent all the nodes to the DB and had one big query that did all the logic. The problem was that most of the time I was sending lots of data that wasn't being updated, and the query took too long to run. I figured that was caused by sending all the data to the DB when it wasn't necessary, so we moved the CRUD logic into the server.
So now, in order to know what to create/update/delete, we run more queries. The first one looks something like this:
match (n:Label {groupA: 'a', groupB: 'b'}) return n.id as id, n.hash as hash
Then, on the server side, we check which nodes' hashes changed, which ids don't exist in the DB yet (to create them) and which ids no longer need to exist (to "delete" them).
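The server-side comparison can be sketched roughly like this. It is a minimal Python illustration, not the actual server code: it assumes the first query's rows arrive as (id, hash) pairs and that the API data for one group is a dict of id -> properties with a "hash" entry. The function name diff_nodes and both data shapes are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def diff_nodes(
    db_rows: Iterable[Tuple[str, str]],
    incoming: Dict[str, dict],
) -> Tuple[List[str], List[str], List[str]]:
    """Split ids into (to_create, to_update, to_delete) for one group.

    db_rows:  (id, hash) pairs, as returned by the per-group MATCH query.
    incoming: hypothetical shape of the API data, id -> node properties,
              where each properties dict has a "hash" entry.
    """
    db_hashes = dict(db_rows)  # id -> hash currently stored in the DB
    db_ids = set(db_hashes)
    incoming_ids = set(incoming)

    # Sent by the API but not in the DB: create these (and set createTime).
    to_create = sorted(incoming_ids - db_ids)

    # In both, but the hash differs: update these (and set updateTime).
    to_update = sorted(
        node_id
        for node_id in incoming_ids & db_ids
        if incoming[node_id]["hash"] != db_hashes[node_id]
    )

    # In the DB but no longer sent: "delete" these (e.g. by relabeling).
    to_delete = sorted(db_ids - incoming_ids)

    return to_create, to_update, to_delete
```

For example, with DB rows [("1", "aaa"), ("2", "bbb"), ("3", "ccc")] and API data {"1": {"hash": "aaa"}, "2": {"hash": "XXX"}, "4": {"hash": "ddd"}}, this returns (["4"], ["2"], ["3"]): 4 is new, 2 changed and 3 was not sent. Only those ids then need to go back to the DB.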
The problem is that this query takes a long time, and I was wondering how to speed it up.
My guess is that adding some indexes would help, but I didn't really understand from the docs how indexes work (we have around 150k+ nodes with the same label, groupA and groupB).
Should I add 3 separate indexes, one for each of groupA, groupB and hash (I guess id doesn't need an index because of the constraint)? Or should I add 1 index covering all 3 properties? And will that even help?
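One possible indexing setup is sketched below, assuming Neo4j 4.x or later (for the CREATE INDEX ... FOR ... ON syntax) and a session object with a run(query, **params) method, such as a session from the official Neo4j Python driver. The index name label_group_idx and the helper names are hypothetical. The reasoning: the MATCH filters on groupA and groupB together with equality, so a single composite index on those two properties fits that lookup. hash is only returned, not filtered on, so indexing it should not help this query, and id is already covered by the index that backs its uniqueness constraint. Passing the group values as parameters instead of literals is an extra assumption, not something the original query does.

```python
from typing import Any, Protocol


class CypherSession(Protocol):
    """Anything with a run(query, **params) method, e.g. a Neo4j driver session."""

    def run(self, query: str, **params: Any) -> Any: ...


# Hypothetical index name; label and property names come from the question.
# A single composite index on the two properties used in the MATCH filter.
# hash is not indexed (it is only returned), and id is already indexed
# through its uniqueness constraint.
CREATE_GROUP_INDEX = (
    "CREATE INDEX label_group_idx IF NOT EXISTS "
    "FOR (n:Label) ON (n.groupA, n.groupB)"
)

# The per-group lookup, with the group values passed as query parameters.
GROUP_QUERY = (
    "MATCH (n:Label {groupA: $groupA, groupB: $groupB}) "
    "RETURN n.id AS id, n.hash AS hash"
)


def ensure_group_index(session: CypherSession) -> None:
    """Create the composite index once; IF NOT EXISTS makes reruns a no-op."""
    session.run(CREATE_GROUP_INDEX)


def fetch_group(session: CypherSession, group_a: str, group_b: str) -> Any:
    """Run the per-group lookup (prefix the query with PROFILE to inspect the plan)."""
    return session.run(GROUP_QUERY, groupA=group_a, groupB=group_b)
```

If most nodes with this label share the same groupA/groupB values, the query still has to return around 150k rows, and the index may not change much. Running the lookup with PROFILE should show whether the planner uses an index seek instead of a label scan.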
Thanks a lot.