Causal clustering plugins and write transactions

yurippe · August 27, 2018, 8:15am

I am having a hard time figuring out how to programmatically detect whether I am running in single-instance or clustering mode, and how to transparently proxy queries made through GraphDatabaseService to the leader of a cluster.

Is it possible to have a core always redirect writes using the Neo4j Java API (not a driver) from a plugin?

In reality, what I need is read-only copies, but the HA clustering section says that HA clustering is somewhat deprecated. Should I use it regardless?

michael.hunger · August 27, 2018, 8:35am

Could you explain what you need this functionality for?

For redirection you'd still need to use the Java driver between instances.
Also the Java API itself doesn't support causal clusters, because you cannot determine upfront if you're running a read- or write-transaction.

yurippe · August 27, 2018, 10:01am

It is not a problem if I have to specify that it is a write operation; I'd just like to be able to transparently execute a write query on any member of the core cluster.

Right now I just want read copies, but as far as I know, causal clusters need a minimum of 2 cores and should really have at least 3. The extra complexity of writing to the database is therefore not something I am a big fan of.

What is the recommended way of setting up a main database (reads/writes) with additional read-only slaves?
I do not need the guarantee of being able to write at all times, but I would like to have backups and the ability to offload some computationally heavy tasks to read-only slaves.

michael.hunger · August 27, 2018, 10:25am

That's where/why you use a driver with the bolt+routing protocol, which is aware of the cluster topology and of whether you're running a read or a write transaction, and routes accordingly (it even retries if the cluster topology changed during your operation).

It also takes care of load balancing between core and read-replica instances.
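
To make the routing idea concrete, here is a minimal sketch in plain Java. It is a toy model, not the real driver: the class `RoutingSketch`, its `AccessMode` enum, and the member addresses are all hypothetical names made up for illustration, and the retry-on-topology-change behaviour is left out. It only shows the core decision: writes go to the leader, reads rotate over the other members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of driver-side routing. Hypothetical; not the Neo4j driver API. */
class RoutingSketch {
    enum AccessMode { READ, WRITE }

    private final String leader;          // the only member that accepts writes
    private final List<String> readers;   // followers and read replicas
    private int next = 0;                 // round-robin cursor for reads

    RoutingSketch(String leader, List<String> readers) {
        this.leader = leader;
        this.readers = new ArrayList<>(readers);
    }

    /** Writes always go to the leader; reads rotate over the readers. */
    String route(AccessMode mode) {
        if (mode == AccessMode.WRITE || readers.isEmpty()) {
            return leader;
        }
        String target = readers.get(next);
        next = (next + 1) % readers.size();
        return target;
    }

    public static void main(String[] args) {
        // Addresses are made-up examples.
        RoutingSketch table = new RoutingSketch(
                "core-1:7687", Arrays.asList("core-2:7687", "replica-1:7687"));
        System.out.println("write -> " + table.route(AccessMode.WRITE));
        System.out.println("read  -> " + table.route(AccessMode.READ));
        System.out.println("read  -> " + table.route(AccessMode.READ));
    }
}
```

The caller only declares whether the work is a read or a write; it never needs to know which member currently holds the leader role.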

yurippe · August 27, 2018, 11:18am

So when making plugins, one should never use GraphDatabaseService to do transactions, but instead use a driver?

Also, for the setup I described, does it make sense to use a causal cluster? Or is there some other way of getting read replicas?

david_allen · August 27, 2018, 11:55am

There's no need to transparently proxy things to the leader; this is done for you by a bolt+routing driver, as Michael says. But it is also OK to use a GraphDatabaseService to do transactions. I think the missing piece here is that if I did transactions inside of a GraphDatabaseService, I'd generally be only doing them on the leader, automatically, with no extra code needed, because whatever that code is would only be invoked on the leader.

For example: you write a stored procedure with the annotation @Procedure(name = "myPlugin.writeSomeStuff", mode = Mode.WRITE)

Inside of that procedure, you use a GraphDatabaseService to write some stuff, and then stream some results back. All good.

Now, that plugin is installed on all 3 nodes, but the procedure never gets called anywhere but the leader, because the client sends explicit write transactions (and autocommit transactions) to the leader. So Cypher that calls the procedure, combined with bolt+routing, basically abstracts this away so you don't have to worry about it.

If you did manually call that write procedure on a follower, it would fail -- because followers cannot accept writes. Fortunately if you set things up right, this just won't arise. Extra cores and read replicas scale out your read workload.

yurippe · August 27, 2018, 12:30pm

This would not be true for asynchronous procedures though.

And what about transaction listeners? If a transaction listener mutated the database on certain queries, wouldn't that pose a problem? This last issue is hypothetical, and it could be solved by calling dbms.cluster.overview() and checking the result, but that is overly complicated. There should be an easy way of telling whether you are the leader or a follower.

This also makes it more complicated to use the Neo4j Browser, as you have to manually find the leader and then execute queries, which I think is a poor user experience. It requires knowledge of the underlying cluster structure, and even though the Neo4j instances may communicate over a network, that does not mean all replicas are reachable by all clients.

david_allen · August 27, 2018, 1:08pm

To find whether a node is leader or follower, CALL dbms.cluster.role(); (Docs: Monitor servers - Operations Manual)
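
As a rough sketch of how plugin code might act on that result: the helper below takes the role string (for example the value returned by the call above) and decides whether the local instance can accept writes. The class `ClusterRoleSketch` and its `SINGLE` value are hypothetical, and it assumes the caller passes an empty Optional when the procedure is not available, treating that case as a single instance. How the string is obtained is left out on purpose.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for interpreting a cluster role string. */
class ClusterRoleSketch {
    // SINGLE is not a cluster role; it is this sketch's label for "no role reported".
    enum Role { LEADER, FOLLOWER, READ_REPLICA, SINGLE }

    /**
     * Maps the reported role to an enum value.
     * An empty Optional is assumed to mean "not running in a cluster".
     * An unrecognised role string throws IllegalArgumentException.
     */
    static Role parse(Optional<String> reported) {
        return reported
                .map(s -> Role.valueOf(s.trim().toUpperCase(Locale.ROOT)))
                .orElse(Role.SINGLE);
    }

    /** Only the leader (or a standalone instance) can accept writes. */
    static boolean acceptsWrites(Role role) {
        return role == Role.LEADER || role == Role.SINGLE;
    }

    public static void main(String[] args) {
        Role role = parse(Optional.of("FOLLOWER"));
        System.out.println(role + " accepts writes: " + acceptsWrites(role));
    }
}
```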

The Neo4j Browser can also accept bolt+routing as the address you connect to. By default, say it attempts to connect to bolt://my-cluster. If you instead connect to bolt+routing://my-cluster, the cluster topology becomes transparent to you. You can run both read and write queries, and the browser will route them wherever is appropriate.

On the transaction listener, I'm not sure. I may look into this.

michael.hunger · August 27, 2018, 9:20pm

It is best to write your extensions as procedures; then you can call them from Cypher, and they are executed in the right context (read vs. write) and transaction.

If you want to check within a procedure what state the current instance has, you can use something like this:

github.com

neo4j-contrib/neo4j-apoc-procedures/blob/3.4/src/main/java/apoc/util/Util.java#L546-L561

        combined.putAll(second);
        return combined;
    }

    public static Map<String,Object> map(Object ... values) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i+=2) {
            if (values[i] == null) continue;
            map.put(values[i].toString(),values[i+1]);
        }
        return map;
    }

    public static Map<String, Object> map(List<Object> pairs) {
        Map<String, Object> res = new LinkedHashMap<>(pairs.size() / 2);
        Iterator<Object> it = pairs.iterator();

sakshiitw09 · February 7, 2019, 9:02am

I am creating a causal cluster of 2 nodes following the instructions in Deploy a basic cluster - Operations Manual, but I am getting this error:

2019-02-07 06:24:32.198+0000 INFO ======== Neo4j 3.5.2 ========
2019-02-07 06:24:32.201+0000 INFO Starting...
2019-02-07 06:24:33.080+0000 INFO Initiating metrics...
2019-02-07 06:24:33.111+0000 INFO My connection info: [
Discovery: listen=172.31.38.35:5000, advertised=172.31.38.35:5000,
Transaction: listen=172.31.38.35:6000, advertised=172.31.38.35:6000,
Raft: listen=172.31.38.35:7000, advertised=172.31.38.35:7000,
Client Connector Addresses: bolt://172.31.38.35:7687,http://172.31.38.35:7474,https://172.31.38.35:7473
]
2019-02-07 06:24:33.111+0000 INFO Discovering other core members in initial members set: [172.31.38.24:5000, 172.31.38.35:5000]
2019-02-07 06:24:41.893+0000 INFO Bound to cluster with id 0e47df3c-4d53-4d27-86ef-7a3ca706be66
2019-02-07 06:24:41.911+0000 INFO Discovered core member at 172.31.38.24:5000
2019-02-07 06:24:55.744+0000 INFO Connected to /172.31.38.24:7000 [raft version:2]
2019-02-07 06:25:10.539+0000 INFO Waiting to hear from leader...
2019-02-07 06:25:38.541+0000 INFO Waiting to hear from leader...
2019-02-07 06:26:06.542+0000 INFO Waiting to hear from leader...
2019-02-07 06:26:34.543+0000 INFO Waiting to hear from leader...
2019-02-07 06:27:02.543+0000 INFO Waiting to hear from leader...
2019-02-07 06:27:30.544+0000 INFO Waiting to hear from leader...
2019-02-07 06:27:58.545+0000 INFO Waiting to hear from leader...
2019-02-07 06:28:26.546+0000 INFO Waiting to hear from leader...
2019-02-07 06:28:54.547+0000 INFO Waiting to hear from leader...
2019-02-07 06:29:22.548+0000 INFO Waiting to hear from leader...
2019-02-07 06:29:50.549+0000 INFO Waiting to hear from leader...
2019-02-07 06:30:18.550+0000 INFO Waiting to hear from leader...
2019-02-07 06:30:46.551+0000 INFO Waiting to hear from leader...
2019-02-07 06:31:14.551+0000 INFO Waiting to hear from leader...
2019-02-07 06:31:42.552+0000 INFO Waiting to hear from leader...
2019-02-07 06:32:10.554+0000 INFO Waiting to hear from leader...
2019-02-07 06:32:38.555+0000 INFO Waiting to hear from leader...
2019-02-07 06:33:06.555+0000 INFO Waiting to hear from leader...
2019-02-07 06:33:34.556+0000 INFO Waiting to hear from leader...
2019-02-07 06:34:02.557+0000 INFO Waiting to hear from leader...
2019-02-07 06:34:30.558+0000 INFO Waiting to hear from leader...
2019-02-07 06:34:50.673+0000 INFO Lost connection to /172.31.38.24:7000 [raft version:2]
2019-02-07 06:34:58.558+0000 INFO Waiting to hear from leader...
2019-02-07 06:35:26.559+0000 INFO Waiting to hear from leader...
2019-02-07 06:35:54.560+0000 INFO Waiting to hear from leader...
2019-02-07 06:36:22.560+0000 INFO Waiting to hear from leader...
2019-02-07 06:36:50.561+0000 INFO Waiting to hear from leader...
2019-02-07 06:37:18.562+0000 INFO Waiting to hear from leader...
2019-02-07 06:37:39.238+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:184)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:123)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:177)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /home/sakshi/neo4j-enterprise-3.5.2/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:216)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:28)
at com.neo4j.server.database.CommercialGraphFactory.newGraphDatabase(CommercialGraphFactory.java:36)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:78)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle@5ec4ff02' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
... 9 more
Caused by: java.lang.RuntimeException: Server failed to join cluster within catchup time limit [600000 ms]
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:55)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 11 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:43)
... 12 more
2019-02-07 06:37:39.239+0000 INFO Neo4j Server shutdown initiated by request

Please suggest where I am going wrong.

david_allen · February 7, 2019, 10:13pm

Please consider taking cluster formation issues to another thread, but briefly I should say that forming a cluster of 2 is not a good idea; clusters should have odd numbers of members, with 3 as a minimum. This allows HA guarantees on the data. In order to write you require a quorum of members to agree, and if one machine out of a 2 node cluster goes away, you cannot get a majority anymore and you'll have a read-only database.

Additionally -- neo4j has some configuration items that look for a minimum cluster size before forming, which for these reasons is typically 3.
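
The quorum arithmetic behind this can be sketched in a few lines of Java. This is generic majority math, not Neo4j code; the class and method names are made up for illustration.

```java
/** Majority arithmetic for a cluster of core members. Hypothetical names. */
class QuorumSketch {
    /** Smallest number of members that forms a strict majority. */
    static int majority(int coreMembers) {
        if (coreMembers < 1) {
            throw new IllegalArgumentException("need at least one core member");
        }
        return coreMembers / 2 + 1;
    }

    /** How many members can be lost while a majority is still reachable. */
    static int toleratedFailures(int coreMembers) {
        return coreMembers - majority(coreMembers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("cores=%d majority=%d tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

With 2 cores the majority is 2, so losing either one blocks writes; with 3 cores the majority is still 2, so one can be lost. Going from 3 to 4 cores does not buy any extra tolerance, which is why odd sizes are preferred.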

sakshiitw09 · February 8, 2019, 6:58am

Sir, using three nodes to create the cluster, I am getting a similar error.

2019-02-08 06:45:25.715+0000 INFO Discovered core member at 172.31.38.37:5000
2019-02-08 06:45:38.780+0000 INFO Connected to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:45:54.197+0000 INFO Waiting to hear from leader...
2019-02-08 06:45:58.325+0000 INFO Lost connection to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:45:58.635+0000 WARN Lost core member at 172.31.38.24:5000
2019-02-08 06:46:22.198+0000 INFO Waiting to hear from leader...
2019-02-08 06:46:50.199+0000 INFO Waiting to hear from leader...
2019-02-08 06:47:08.624+0000 INFO Connected to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:47:12.226+0000 INFO Discovered core member at 172.31.38.24:5000
2019-02-08 06:47:18.200+0000 INFO Waiting to hear from leader...
2019-02-08 06:47:23.515+0000 INFO Connected to /172.31.38.37:7000 [raft version:2]
2019-02-08 06:47:46.201+0000 INFO Waiting to hear from leader...
2019-02-08 06:48:14.201+0000 INFO Waiting to hear from leader...
2019-02-08 06:48:42.202+0000 INFO Waiting to hear from leader...
2019-02-08 06:49:10.203+0000 INFO Waiting to hear from leader...
2019-02-08 06:49:38.204+0000 INFO Waiting to hear from leader...
2019-02-08 06:50:06.204+0000 INFO Waiting to hear from leader...
2019-02-08 06:50:34.205+0000 INFO Waiting to hear from leader...
2019-02-08 06:51:02.206+0000 INFO Waiting to hear from leader...
2019-02-08 06:51:30.207+0000 INFO Waiting to hear from leader...
2019-02-08 06:51:58.208+0000 INFO Waiting to hear from leader...
2019-02-08 06:52:26.209+0000 INFO Waiting to hear from leader...
2019-02-08 06:52:54.210+0000 INFO Waiting to hear from leader...
2019-02-08 06:53:22.210+0000 INFO Waiting to hear from leader...
2019-02-08 06:53:50.211+0000 INFO Waiting to hear from leader...
2019-02-08 06:54:18.212+0000 INFO Waiting to hear from leader...
2019-02-08 06:54:46.213+0000 INFO Waiting to hear from leader...
2019-02-08 06:55:14.214+0000 INFO Waiting to hear from leader...
2019-02-08 06:55:33.394+0000 INFO Lost connection to /172.31.38.37:7000 [raft version:2]
2019-02-08 06:55:33.394+0000 INFO Lost connection to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:55:42.215+0000 INFO Waiting to hear from leader...
2019-02-08 06:55:45.499+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:184)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:123)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:177)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /home/sakshi/neo4j-enterprise-3.5.2/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:216)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:28)
at com.neo4j.server.database.CommercialGraphFactory.newGraphDatabase(CommercialGraphFactory.java:36)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:78)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle@5ec4ff02' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
... 9 more
Caused by: java.lang.RuntimeException: Server failed to join cluster within catchup time limit [600000 ms]
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:55)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 11 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:43)
... 12 more
2019-02-08 06:55:45.500+0000 INFO Neo4j Server shutdown initiated by request
