Nodes related to nodes of the same type cause idempotence problems – Spring Data Neo4j 7.4.7

Hello !

I am writing this topic because my team and I are running into an issue while using Spring Data Neo4j 7.4.7.

Here is some context to explain the issue.

  1. We have a node labeled Person :
@Getter
@NoArgsConstructor(access = AccessLevel.PROTECTED)
@RequiredArgsConstructor(access = AccessLevel.PROTECTED)
@Node(primaryLabel = "Person")
@EqualsAndHashCode(callSuper = false, onlyExplicitlyIncluded = true)
public class Person extends AggregateNode {

   @Id
   @NonNull
   @EqualsAndHashCode.Include
   private UUID id;	

   @Version
   private Long version;
   
   private Date dateOfBirth;
   
   private String firstName;
   
   private String lastName; 
   
   @Relationship(type = "WORK_IN", direction = Relationship.Direction.OUTGOING)
   private Company company;
   
   
   @Relationship(type = "RELATED_TO", direction = Relationship.Direction.OUTGOING)
   private Set<Person> relatives = new HashSet<>();
   
   // Other nodes related to Person, and more properties
   
}
  2. For a bit more context, our application follows an event-driven design: events are stored in aggregates via a domainEvent list held in AggregateNode,
    and published through the .save() method of the Spring Data repositories (everything relies on Spring Data event management).
    For example, when a company is modified, a companyModifiedDomainEvent is published and consumed by a handler that invalidates all its related workers (Person nodes).
    Each PersonInvalidatedDomainEvent is then consumed asynchronously by the same refreshPersonEventHandler.
    The problems occur during these processes. At some unpredictable moment, an incomplete company appears to be fetched from the Neo4j database and then saved, which breaks the company's relationships with its subnodes.
    Moreover, this happens even though the company repository is never called during these processes; only the personRepository is used to fetch and save Person nodes.

  3. The solution we eventually settled on was to create a kind of 'identifierNode'.
    The class was this one:

@Getter
@NoArgsConstructor(access = AccessLevel.PROTECTED)
@RequiredArgsConstructor(access = AccessLevel.PROTECTED)
@Node(primaryLabel = "PersonId")
@EqualsAndHashCode(callSuper = false, onlyExplicitlyIncluded = true)
public class PersonId {
    @Id
    @NonNull
    @EqualsAndHashCode.Include
    private UUID id;	

    @Version
    private Long version;
}

Because we added this node, we modified the Person class as follows (adding the "PersonId" label):

@Getter
@NoArgsConstructor(access = AccessLevel.PROTECTED)
@RequiredArgsConstructor(access = AccessLevel.PROTECTED)
@Node(primaryLabel = "Person", labels ={"PersonId"})
@EqualsAndHashCode(callSuper = false, onlyExplicitlyIncluded = true)
public class Person extends AggregateNode {

// Same as before

    @Relationship(type = "RELATED_TO", direction = Relationship.Direction.OUTGOING)
    private Set<PersonId> relatives = new HashSet<>();

// Same as before

}

As intended, every Person node created in the graph database (GDB) now carries both the Person and PersonId labels.
We simplified our Person aggregate so that it now has only simplified connections to nodes of its own type.

With this change, the state of the GDB is idempotent during the event consumption processes described above, whatever concurrency the app encounters (event consumption, parallel workflows, etc.).

However, we ran into other issues with the SDN cache. We found the problem when requesting Persons with a repository .findAllByIdIn(uuids) method.

Consider this GDB configuration:

(a:Person {id: "aUuid"})-[:RELATED_TO]->(b:Person {id: "bUuid"})

and

(b:Person)-[:RELATED_TO]->(a:Person)

When we call this.personRepo.findAllByIdIn(['aUuid', 'bUuid']), we hit a mapping problem. 'a' is fetched first and stored in the SDN cache as an 'aUuid'/Object map entry, where the Object is actually a Person. When 'b' is fetched, SDN tries to populate its relatives Set with a PersonId for 'a'. But SDN finds that this id is already present in its cache and tries to cast the cached Object (actually a Person) to PersonId. So it crashes.
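
To make the failure mode concrete, here is a minimal plain-Java sketch of it. It is a simplified model under our own assumptions, not SDN's actual mapping code: the class IdentityCacheSketch, its cache keyed by id only, and the two load methods are all hypothetical. It only shows why a cache that ignores the target type breaks once the same id is mapped to two unrelated classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Simplified model only: this is NOT Spring Data Neo4j's mapping code.
final class IdentityCacheSketch {

    // Hypothetical stand-ins for the two mapped classes from the post.
    static final class PersonId {
        final UUID id;
        PersonId(UUID id) { this.id = id; }
    }

    static final class Person {
        final UUID id;
        Person(UUID id) { this.id = id; }
    }

    // Assumption: one cache per load operation, keyed by node id only.
    private final Map<UUID, Object> cache = new HashMap<>();

    Person loadAsPerson(UUID id) {
        return (Person) cache.computeIfAbsent(id, Person::new);
    }

    PersonId loadAsPersonId(UUID id) {
        // Throws ClassCastException if this id was already cached as a Person.
        return (PersonId) cache.computeIfAbsent(id, PersonId::new);
    }

    public static void main(String[] args) {
        IdentityCacheSketch session = new IdentityCacheSketch();
        UUID a = UUID.randomUUID();

        session.loadAsPerson(a); // 'a' is fetched first, as a full Person
        try {
            session.loadAsPersonId(a); // 'b' then needs 'a' as a PersonId
            System.out.println("no collision");
        } catch (ClassCastException e) {
            System.out.println("collision: cached Person cannot be used as PersonId");
        }
    }
}
```

In this model the collision disappears if the cache key includes the target type, or if Person and PersonId share a type hierarchy. Whether either option is available in SDN is not something this sketch can answer.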

We loved this implementation because it let us keep Spring Data repositories, with seamless domain event handling.

We considered other solutions that we dropped. Here are the ones we found:

  • Use of projections:
    • to control save depth and avoid breaking relationships unintentionally
    • and to control reads, improving node read performance.
    • But we lost the great Spring Data synergy (events + simplified repositories).
  • Use of an intermediary node:
    • Between two Persons we introduced a Relative node. For our domain, the implementation would look like this:
(a:Person)<-[:SOURCES]-(r1:Relative)-[:TARGETS]->(b:Person)
(b:Person)<-[:SOURCES]-(r2:Relative)-[:TARGETS]->(a:Person)

Note that the Relative node now owns the relationships in the Java classes.


We would like advice from your team regarding our app. Do you recommend a specific approach? If so, which one?

Thanks for reading it all. I hope you can give us a bit of help to tackle this problem.

Thanks for reaching out. It might take some days to take a deeper look at your problem and come up with a suggestion (or questions). Just wanted to make you aware that I’ve seen it and it’s in my queue.

Please correct me if I am wrong, but the major problem (that also leads to your follow up problems) is:

"it appears an incomplete company is fetched"

Right?

Could you also post this entity or at least the parts that -as I assume- declare the link back to the Person?

Hi Gerrit,

First of all, thanks a lot for your prompt response to our post. My team and I are glad to get such feedback!

You got it right. We concluded that the issue lies in the reconstruction of the Company or, more generically, of any node that lies ‘inside’ the base node (i.e. the one queried at the beginning of our workflow and saved at the end) via relationships.
By ‘inside’, I mean that it sits at a depth greater than 0 relative to the base node.
Indeed, we did not show all our code, but it is not always the Company node that is corrupted; it can be other ‘inside’ nodes. The corruption is not deterministic. Concurrency may be the cause of all this.

In our code, no ‘deep’ node declares a link back to Person other than Person itself, through its RELATED_TO @Relationship.

To give a better picture, here is the Company node:

@Getter
@Setter
@NoArgsConstructor
@RequiredArgsConstructor
@Node("Company")
public class Company extends AggregateNode { 
	
    // other fields (not relationships)

    @Relationship(type = "WANTS_BILLING_MODE", direction = Relationship.Direction.OUTGOING)
    private BillingFrequency billingMatchingFrequency; // a node that only contains an enum

    @Relationship(type = "MANAGED_BY", direction = Relationship.Direction.INCOMING)
    private Corporation headCorp;

    // other fields (not relationships)

}

Finally, I would add that all our nodes have one-directional relationships. In our example, the Person node owns the relationship with the Company node; the reverse is not declared in the Company node. This is the case for every relationship in every node. We were careful about that, so that no circular fetch would occur. Except for the Person node, of course: a Person P is connected to another Person Q, and nothing prevents Q from having a relationship back to P.

If you need more info, feel free to ask.

Have a good day

Maybe I am oversimplifying this because I don't have all the details, but do you always have the related company node in the person Java objects when saving an updated person node? If not, the relationship will be dropped when the person node is updated. Finding, modifying, and saving a company node will maintain the relationship to the person node, since that relationship is not part of the company definition (as you pointed out).
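
The point about dropped relationships can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: RelationshipSaveSketch, its two maps, and the save method are hypothetical and only mimic the idea that a save writes the object's full state, so a company that is missing on the object is removed from the stored graph.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy in-memory model; it only mimics "save writes the object's full state".
final class RelationshipSaveSketch {

    static final class Person {
        final UUID id;
        final String firstName;
        final UUID companyId; // stands in for the mapped Company; null = no company on this object

        Person(UUID id, String firstName, UUID companyId) {
            this.id = id;
            this.firstName = firstName;
            this.companyId = companyId;
        }
    }

    private final Map<UUID, String> names = new HashMap<>();
    private final Map<UUID, UUID> workIn = new HashMap<>(); // person id -> company id

    // The object is treated as the complete truth, including absent relationships.
    void save(Person p) {
        names.put(p.id, p.firstName);
        if (p.companyId == null) {
            workIn.remove(p.id);
        } else {
            workIn.put(p.id, p.companyId);
        }
    }

    boolean worksSomewhere(UUID personId) {
        return workIn.containsKey(personId);
    }

    public static void main(String[] args) {
        RelationshipSaveSketch db = new RelationshipSaveSketch();
        UUID personId = UUID.randomUUID();

        db.save(new Person(personId, "Alice", UUID.randomUUID()));
        System.out.println(db.worksSomewhere(personId)); // true

        // Same person, but the object was built (or loaded) without its company.
        db.save(new Person(personId, "Alicia", null));
        System.out.println(db.worksSomewhere(personId)); // false
    }
}
```

The second save is perfectly valid from the store's point of view, which is why an incomplete object is enough to lose the relationship without any code explicitly deleting it.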

What does it mean to "invalidate workers(person nodes)" when a company is modified?

Thanks @glilienfield, I was about to ask the same thing, with this ^^ in mind.

This would mean that there is a custom repository method that does not load all nodes (through the relationships) of the modelled graph. I don’t know whether you have a test for this that fails at a reliable rate, but it would be worth checking, because it would be good to know whether the problem occurs when loading (so that an already faulty entity graph gets persisted) or only in the save call.

We figured out that the relationship was dropped because a Person node is loaded by SDN with a null value for company, whereas it is not null in the database, nor set to null by any of our workflows in a transactional context. When that Person is then saved, the link between company and person is dropped.

The problem is indeed the improper load of the Person object, because it does not always recover all of its links.

I have logged the Cypher driver calls, and the 'fetch' calls are consistent: the Cypher query generated from the Person findById(UUID id) method is correct. But it generates multiple Cypher calls; the fetch is not done with a single Cypher query.


This behaviour is different when we fetch nodes whose direct or deeper relationships never circle back to the base fetched node: in that case there is a single Cypher query, and only one call to the DB is made to fetch the node.


Even though the queries seem complete, the resulting POJOs still have null values for related nodes. And because the save behaves properly, it drops these relationships.

In the end, my team and I believe the problem lies in the fetching part, for a node that is potentially 'circular through relationships'.

When SDN queries for a specific node, it will reproduce the entire entity in your java classes. In your case, you have a Person node that contains a list of Person nodes. This will cause a recursive traversal of the Person graph to get all related nodes at each level. This is the cause of the multiple queries. SDN should properly load all related Person nodes, regardless of depth. I have not seen the behavior you mentioned.

Are you stating you have a Person node in the graph of your root Person node that has a Company related to it, but when retrieved in a query for the root Person, that this Person does not have a related Company object? And, if you query for this Person by its ID, it will return the Person node with the associated Company node?

Here is an example, showing a Person node with a graph two levels deep:

The screenshot shows the "gary" node returned by the findById method after saving it. All the Person nodes and associated Company nodes are included.

Isn't it possible that one of your update operations is nulling out the Company node for a related Person node, thus losing the relationship?

I did that in this case, and the result now comes back with that Person node without a Company node, as expected. You can see the Person node's name attribute was correctly updated to 'Santa' and the company is now null.

Just a note: because SDN wants to operate over an entire entity, I use SDN when I have independent entities. This provides a very quick and convenient method for retrieving and updating these entities. I use the driver and write my own repository methods when I have a network of entities, where I want to create and modify specific parts of the network.

Hi @glilienfield,

First of all, my apologies for taking so long to reply. My team and I have had a lot of work over the past weeks/months.

No workflow is nulling out the company. Or at least no workflow we are aware of ^^'.
We suspect the problem arises from the concurrency induced by the use of event-driven design: a lot of events are consumed at the same time. In the end, the state of the node should be the same regardless of the event consumption order. Theoretically it should be idempotent, but at random it is not, and null values appear.

We found out that the company was somehow nulled by adding a dumb if (company == null) to our code and putting a breakpoint inside that statement.

How it ends up null is something we could not figure out. We could not reproduce the problem with unit/integration tests. It only appears in the full dev environment, with the event-consuming part running.
When the "null" breakpoint is reached and we walk back up the stack, we see that a Person node was retrieved by the repository, in a specific event handler, with a null value for the company.
(We added @Version on entities to prevent concurrency problems, as recommended by the docs.)
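
As a side note on why a version check alone may not have helped here, the following plain-Java sketch models optimistic locking in general terms. Everything in it is an assumption made for illustration (the class VersionCheckSketch, its Row type, string ids, and a save that returns false where a real implementation would typically raise an error). It shows that a version check rejects stale writers, but accepts a writer that holds the current version together with an incomplete object.

```java
import java.util.HashMap;
import java.util.Map;

// Generic optimistic-locking model; not SDN's implementation.
final class VersionCheckSketch {

    static final class Row {
        final long version;
        final String company; // null = no WORK_IN relationship

        Row(long version, String company) {
            this.version = version;
            this.company = company;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    void insert(String personId, String company) {
        rows.put(personId, new Row(0, company));
    }

    Row load(String personId) {
        return rows.get(personId);
    }

    // Accepts the write only if the caller read the latest version.
    boolean save(String personId, long readVersion, String company) {
        Row current = rows.get(personId);
        if (current == null || current.version != readVersion) {
            return false; // stale write rejected
        }
        rows.put(personId, new Row(readVersion + 1, company));
        return true;
    }

    public static void main(String[] args) {
        VersionCheckSketch db = new VersionCheckSketch();
        db.insert("p1", "c1");

        Row first = db.load("p1");
        Row second = db.load("p1"); // concurrent reader, same version

        System.out.println(db.save("p1", first.version, first.company));   // true
        System.out.println(db.save("p1", second.version, second.company)); // false

        // A writer with the current version but an incomplete object is accepted.
        Row latest = db.load("p1");
        System.out.println(db.save("p1", latest.version, null)); // true
        System.out.println(db.load("p1").company);               // null
    }
}
```

If the incomplete load is itself the root cause, as suspected in this thread, the version number on the loaded object is still the current one, so the check has nothing to object to.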

Anyway, we eventually managed to avoid the problem. We rewrote our domain to have "anemic" nodes. We no longer have the full Company node inside a Person node; we only have a CompanyId node. The link between the Company node and its anemic version is built using labels. No more circular loading/saving, and event consumption seems to work perfectly now!
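
For readers who want to picture the "anemic" reference, here is a minimal plain-Java sketch of the shape of the model. SDN annotations are left out on purpose, and all names apart from Person, Company and CompanyId are hypothetical (the resolve helper and the map standing in for a company repository). The idea is that a Person carries only the identity of its company, and the full Company is looked up separately when needed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Plain-Java model of the "anemic reference" idea; SDN annotations are omitted.
final class AnemicReferenceSketch {

    // Id-only view of a company: there is nothing here that can be saved incompletely.
    static final class CompanyId {
        final UUID id;
        CompanyId(UUID id) { this.id = id; }
    }

    // Full aggregate, loaded and saved only through its own (hypothetical) store.
    static final class Company {
        final UUID id;
        final String name;
        Company(UUID id, String name) { this.id = id; this.name = name; }
    }

    static final class Person {
        final UUID id;
        final CompanyId company; // reference by identity instead of by full object
        Person(UUID id, CompanyId company) { this.id = id; this.company = company; }
    }

    // Hypothetical lookup standing in for a company repository.
    static Company resolve(Map<UUID, Company> companies, Person person) {
        return companies.get(person.company.id);
    }

    public static void main(String[] args) {
        UUID companyUuid = UUID.randomUUID();
        Map<UUID, Company> companies = new HashMap<>();
        companies.put(companyUuid, new Company(companyUuid, "Acme"));

        Person worker = new Person(UUID.randomUUID(), new CompanyId(companyUuid));

        // The full company is fetched explicitly, only when it is needed.
        System.out.println(resolve(companies, worker).name); // Acme
    }
}
```

Because a Person object never holds the company's properties or sub-relationships, saving a Person has no way to overwrite them in this model.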

To conclude, I think we can close the thread, as our problem is now solved.

Again, thanks a lot for your diligence in answering me.
All the best!

I think that is the best way to handle relationships between two domain entities: reference the other entity by ID rather than hard-coding it as a mapped relationship.

Good to hear you got it resolved.