GDS and Causal Cluster Lack of Integration

stu_v_kerr · April 26, 2021, 4:48pm

Now that we have been using the Neo4j graph database for a few years and have integrated the GDS library, we are moving to a causal cluster. But Neo4j has not integrated the GDS library to run on a cluster. What? GDS algorithms are allegedly parallelized, but they are unable to run on a cluster? I remember that being the entire point of using Hadoop and Spark: eliminate the transport of data across the network. Yet here we are again? Starting to look at TigerGraph.

When will this integration issue be fixed?

alicia_frame1 · April 26, 2021, 6:02pm

Neo4j causal clusters are built for fault tolerance and high availability (they're ACID compliant); they're not built for scale out (and don't use Hadoop or Spark). If you're looking for a big data scale out solution, that would be Neo4j Fabric. The parallelization of the algorithms is intended to leverage multiple CPUs on a single machine; if you have an enterprise license you can simply set the concurrency parameter and see speedups in your algorithm execution time.
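To make the concurrency setting concrete, here is a minimal sketch of how a client program might build such a call. The helper name `pagerank_stream_query` and the graph name `myGraph` are made up for illustration, and PageRank is only an example algorithm; the sketch assumes a named in-memory graph has already been created on the instance.

```python
def pagerank_stream_query(graph_name, concurrency):
    """Build a parameterized Cypher call that streams PageRank scores.

    Hypothetical helper for illustration: it only assembles the query text
    and its parameters, it does not talk to a database.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    cypher = (
        "CALL gds.pageRank.stream($graphName, {concurrency: $concurrency}) "
        "YIELD nodeId, score "
        "RETURN nodeId, score "
        "ORDER BY score DESC"
    )
    params = {"graphName": graph_name, "concurrency": concurrency}
    return cypher, params


# Example: ask the algorithm to use 8 threads on the named graph 'myGraph'.
cypher, params = pagerank_stream_query("myGraph", 8)
```

The query text and parameters can then be passed to whichever driver you use. The thread count only helps up to the number of cores available on that one machine.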

GDS does not run on core members of a causal cluster because the algorithms are extremely memory hungry and operate on a longer time scale than simple queries. In a cluster, this causes problems with leader election, and leads to instability.

If you're looking to "integrate" GDS with a causal cluster, you can run GDS on a read replica (and consume the results in a separate program, or use Kafka to apply writes to the leader), or you can detach a single instance with the same data from your cluster.
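As a rough sketch of the "consume the results in a separate program" option, the following assumes the official Neo4j Python driver, a read replica reachable directly over `bolt://`, and a named graph that has already been projected on that replica. The function names, host, credentials, and graph name are placeholders, not anything from your setup.

```python
QUERY = (
    "CALL gds.pageRank.stream($graphName, {}) "
    "YIELD nodeId, score "
    "RETURN nodeId, score "
    "ORDER BY score DESC LIMIT $limit"
)


def top_scores(session, graph_name, limit=10):
    """Run a streaming algorithm in the given session and collect the rows.

    `session` is anything with a driver-style run(query, **params) method.
    """
    result = session.run(QUERY, graphName=graph_name, limit=limit)
    return [(record["nodeId"], record["score"]) for record in result]


def run_on_replica(uri, user, password, graph_name):
    """Connect straight to one read replica and fetch the top scores.

    Assumption: `uri` uses the bolt:// scheme (e.g. bolt://replica-host:7687,
    a placeholder) so the driver talks to that instance only, without routing.
    """
    import neo4j  # imported lazily so top_scores stays usable without the driver

    driver = neo4j.GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session(default_access_mode=neo4j.READ_ACCESS) as session:
            return top_scores(session, graph_name)
    finally:
        driver.close()
```

Because the results are streamed rather than written, nothing is modified on the replica; the calling program decides what to do with the scores, for example handing them to whatever process applies writes to the leader.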

A thorough discussion of why GDS shouldn't run on a causal cluster, and your options, is available here: https://neo4j.com/docs/graph-data-science/current/installation/#installation-causal-cluster

If you're primarily interested in availability, we'll be introducing warm backups/read replicas to GDS with Neo4j 4.3.

If what you're actually after is scale out, then we recommend either using a bigger box (we have customers running GDS on tens of billions of nodes in production) or leveraging Fabric. We document how to use GDS with Fabric here.
