I'm trying to migrate JSON documents from Couchbase into Neo4j. After receiving a document, I determine which type of object to create by reading some of its fields. Each node class inherits common node properties, such as the ID, labels, and Version (used for optimistic locking), from a Transaction base class.
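The Transaction base class looks roughly like this (simplified; the exact annotations and property names may differ slightly from my real code):

public abstract class Transaction {

    @Id
    @GeneratedValue
    private Long id;              // internal Neo4j ID

    @Version
    private Long version;         // used for optimistic locking

    @DynamicLabels
    private Set<String> labels;   // additional labels on the node

    @Property(name = "FeatureID")
    private String featureID;

    @Property(name = "FeatureVariantID")
    private String featureVariantID;

    @Property(name = "TransactionID")
    private String transactionID;

    @Property(name = "TenantID")
    private String tenantID;

    public Transaction() {
    }

    public Transaction(String featureID, String featureVariantID, String transactionID, String tenantID) {
        this.featureID = featureID;
        this.featureVariantID = featureVariantID;
        this.transactionID = transactionID;
        this.tenantID = tenantID;
        this.labels = new LinkedHashSet<>();
    }

    // Getters and Setters
}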
Example of a node class:
@Node
public class User extends Transaction {

    public static final String featureID = "12109";
    public static final String featureVariantID = "000";

    @Property("FullName")
    private String fullName;

    @Property(name = "Alias")
    private String alias;

    @Property(name = "EmploymentType")
    private String employmentType;

    @Property(name = "EmployeeCode")
    private String employeeCode;

    @Relationship("LineManager")
    Set<RelatedTo<User>> lineManagers;

    @Relationship("FunctionalManager")
    Set<RelatedTo<User>> functionalManagers;

    @Relationship("LegalEntity")
    Set<RelatedTo<LegalEntity>> legalEntities;

    // More relationships like these

    public User() {
    }

    public User(String transactionID, String tenantID) {
        super(featureID, featureVariantID, transactionID, tenantID);
        this.fullName = "";
        this.alias = "";
        this.employmentType = "";
        this.employeeCode = "";
        lineManagers = new LinkedHashSet<>();
        functionalManagers = new LinkedHashSet<>();
        legalEntities = new LinkedHashSet<>();
        // ....
    }

    // Getters and Setters

    // Maps each relationship type to the JSON document field whose value is used
    // to populate that relationship set
    private static final Map<String, String> allowedRelationships = new HashMap<>();
    static {
        allowedRelationships.put("LineManager", "Data.LineManagerUserID");
        allowedRelationships.put("FunctionalManager", "Data.FunctionalManagerUserID");
        allowedRelationships.put("LegalEntity", "Data.EmployeeLegalEntityID");
    }

    public Map<String, String> getAllowedRelationships() {
        return User.allowedRelationships;
    }
}
Apart from User, there are 12 other classes with a similar structure.
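Each class's allowedRelationships map is consumed generically while processing a document, roughly like this (simplified; readJsonPath and the surrounding variables are illustrative placeholders, not my exact code):

// For every allowed relationship type, read the target node's ID from the JSON document
for (Map.Entry<String, String> entry : node.getAllowedRelationships().entrySet()) {
    String relationshipType = entry.getKey();                    // e.g. "LineManager"
    String targetID = readJsonPath(document, entry.getValue());  // e.g. the value at "Data.LineManagerUserID"
    if (targetID != null) {
        // fetch or create the target node, then add or refresh a RelatedTo entry
        // in the matching relationship set of the current node
    }
}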
The structure of the relationship entity is:
@RelationshipProperties
public class RelatedTo<T extends Transaction> extends BaseRelationship<T> {

    @Property("DocumentID")
    private String documentID;

    public RelatedTo() {
    }

    public RelatedTo(String effectiveFromTimestamp, String effectiveTillTimestamp, String status, T target, String documentID) {
        super(effectiveTillTimestamp, effectiveFromTimestamp, status, target);
        this.documentID = documentID;
    }

    // Getter and Setter
}
It inherits from the parent class:
@RelationshipProperties
public class BaseRelationship<Target extends Transaction> {

    @Id
    @GeneratedValue
    private Long id;

    @Property("EffectiveTillTimestamp")
    private String effectiveTillTimestamp;

    @Property("EffectiveFromTimestamp")
    private String effectiveFromTimestamp;

    @Property("Status")
    private String status;

    @TargetNode
    private Target targetNode;

    public BaseRelationship(String effectiveTillTimestamp, String effectiveFromTimestamp, String status, Target targetNode) {
        this.effectiveTillTimestamp = effectiveTillTimestamp;
        this.effectiveFromTimestamp = effectiveFromTimestamp;
        this.status = status;
        this.targetNode = targetNode;
    }

    public BaseRelationship() {
    }

    // Getters and Setters
}
To determine which type of object to create, I'm making use of reflection, which returns a
<T extends Transaction> Class<T>
object, after which I cast the entity to that particular type. I then create and save the node in Neo4j. After that I read the document again to create the relationships for that node, do some processing of the data, and save the node again. For example:
{
    // fetch (or create) the related node from the database
    LegalEntity parentNode = transactionRepository.getOrCreateNode(transactionMap, tenantID, relTransactionID, LegalEntity.class);

    RelatedTo<LegalEntity> newParentRelationship = new RelatedTo<>(effectiveFromTimestamp, effectiveTillTimestamp, status, parentNode, documentID);
    Set<RelatedTo<LegalEntity>> existingRelationships = user.getLegalEntities();

    // some processing of data and then updating the above set
}
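The reflection-based type lookup itself is roughly this (simplified; the field name and package below are placeholders):

// Resolve the concrete node class from the document's type fields
@SuppressWarnings("unchecked")
private <T extends Transaction> Class<T> resolveNodeType(Map<String, Object> document) throws ClassNotFoundException {
    String entityName = (String) document.get("EntityType");             // placeholder field name
    return (Class<T>) Class.forName("com.example.model." + entityName);  // placeholder package
}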
The problem is that each save is taking 4-10 seconds, including the processing, fetching and saving. Each node can have 10-15 relationships, which means each relationship takes more than 400 ms to create.
Is there some way to improve the speed? I have to import more than 50,000 documents from Couchbase.
Currently I'm updating each node and relationship in a single thread. I've tried using an ExecutorService, but the save operations would get stuck in a deadlock and throw either an optimistic locking exception or a transient data access exception.
I also wonder whether there is any way to optimize the code itself. For example, I have to loop through the node's existing relationships, validate them against the newly received document, and then create new relationships or update the previous ones (roughly sketched below); I don't think I can skip or simplify this step. Also, how does a saving time of 400 ms per relationship compare to the average? Is it normal for a database with 14,000 nodes and 70,000 relationships?
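For reference, the reconciliation step looks roughly like this, reusing the variables from the earlier snippet (simplified; matchesNewDocument stands in for my actual matching check):

// Update a matching existing relationship, or add the new one
boolean alreadyPresent = false;
for (RelatedTo<LegalEntity> existing : existingRelationships) {
    if (matchesNewDocument(existing, parentNode, documentID)) {   // illustrative placeholder check
        existing.setEffectiveTillTimestamp(effectiveTillTimestamp);
        existing.setStatus(status);
        alreadyPresent = true;
    }
}
if (!alreadyPresent) {
    existingRelationships.add(newParentRelationship);
}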