EC2 Clustering, Hazelcast and Hello!

Hi!

I'm a student using Neo4j, and I'm looking forward to becoming more versed in it. I have experience with a few SQL databases, but I chose Neo4j because my current dataset and use case are inherently relationship-based.

I've set up a cluster on EC2 but hit a snag with Hazelcast that I'd appreciate any advice on. I apologize if this is outside the scope of Neo4j itself, but I don't know where else to go yet, and Neo4j has been friendly about reaching out so far.

My machines are Amazon Linux t2.micro instances running Java 8 and Neo4j 3.5.6. My neo4j.conf follows all the settings listed in the tutorial. When I turn off the causal cluster options, I've been able to connect to each machine as a single-core instance through the browser interface, using the instance's IP address on port 7474.

Here is my error:

2019-06-17 00:45:12.725+0000 INFO  Discovering other core members in initial members set: [34.209.242.210:5000]
Exception in thread "HZ Starting Thread" java.lang.RuntimeException: Hazelcast CANNOT start on this node. No matching network interface found.
Interface matching must be either disabled or updated in the hazelcast.xml config file.

After that, the instances log that they are waiting for a total of 2 core members (I'm trying 2 for now), but they eventually give up.

After reading this Hazelcast documentation, I ran sudo find / hazelcast.xml and sudo find / hazelcast-default.xml, and the files don't seem to exist. Any thoughts or guidance?

Thank you for reading!


Sorry you're running into this. Pasting the uncommented bits of your config would be a big help in trying to find the issue. In particular, I'm wondering what your default_advertised_address is, and what your causal_clustering.* settings look like.

Also, when you encounter this error, what comes next? Does the DB fail to start? If not, there should be more to this error dump.

Hi David, thank you for taking a look at this with me.

This is the rest of my config file:

dbms.active_database=graph.db
dbms.directories.data=/var/lib/neo4j/data
dbms.directories.plugins=/var/lib/neo4j/plugins
dbms.directories.certificates=/var/lib/neo4j/certificates
dbms.directories.logs=/var/log/neo4j
dbms.directories.lib=/usr/share/neo4j/lib
dbms.directories.run=/var/run/neo4j
dbms.directories.metrics=/var/lib/neo4j/metrics
dbms.directories.import=/var/lib/neo4j/import
dbms.security.auth_enabled=false

dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=54.xx.xxx.xx
dbms.connector.bolt.enabled=true
dbms.connector.http.enabled=true
dbms.connector.https.enabled=true

dbms.mode=CORE
causal_clustering.minimum_core_cluster_size_at_runtime=3
causal_clustering.initial_discovery_members=34.xx.xxx.xx:5000,54.xx.xxx.xxx:5000,54.xx.xxx.xx:5000
causal_clustering.discovery_listen_address=54.xx.xxx.xx:5000

dbms.jvm.additional=-XX:+UseG1GC
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
dbms.jvm.additional=-XX:+DisableExplicitGC
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048
dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true
dbms.windows_service_name=neo4j
dbms.jvm.additional=-Dunsupported.dbms.udc.source=rpm
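A likely cause, offered as a suggestion rather than a confirmed diagnosis: on EC2 a public IPv4 address is mapped to the instance through NAT. It is not assigned to any network interface inside the instance; the hostname ip-172-31-16-75 reflects the private address. So when causal_clustering.discovery_listen_address points at the public IP, Hazelcast can find no local interface that matches it.

The sketch below shows the cluster section for one core. It assumes 54.xx.xxx.xx stands for that instance's own public IP, as in the config above. The idea is to listen on all interfaces (or on the instance's private IP) and advertise the public IP to the other members.

```properties
# Sketch for one EC2 core; 54.xx.xxx.xx is a placeholder for this instance's public IP.
dbms.mode=CORE
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=54.xx.xxx.xx

# Bind discovery to every local interface (or to the private IP) instead of the
# public IP, which is not assigned to any interface inside the instance.
causal_clustering.discovery_listen_address=0.0.0.0:5000

# Optional: this matches the default (default_advertised_address plus port 5000).
# It is the address the other cores use to reach this member.
causal_clustering.discovery_advertised_address=54.xx.xxx.xx:5000
```

The same change would apply on every core. The members also need to reach each other on ports 5000, 6000 and 7000, so the EC2 security group has to allow traffic on those ports between the instances.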

I had been uncommenting the lines for the raft and transaction listen addresses. Leaving them commented out cleared away some errors, which suggested that setting those addresses explicitly is redundant unless I want to change them from their defaults. Is it correct that the raft and transaction options default to their standard ports on my listen_address?
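The debug.log further down in this thread appears to confirm that. With the raft and transaction lines commented out, it reports Transaction listen=0.0.0.0:6000 and Raft listen=0.0.0.0:7000, each advertised on the public address.

The sketch below spells out the explicit equivalents of those defaults. It assumes the Neo4j 3.5 shorthand, where a bare :port inherits its host from dbms.connectors.default_listen_address (for listen settings) or from dbms.connectors.default_advertised_address (for advertised settings).

```properties
# Explicit equivalents of the defaults (sketch); leaving these lines commented
# out should behave the same way. A bare ":port" takes its host from the
# matching dbms.connectors.default_* setting.
causal_clustering.transaction_listen_address=:6000
causal_clustering.transaction_advertised_address=:6000
causal_clustering.raft_listen_address=:7000
causal_clustering.raft_advertised_address=:7000
```

Uncommenting them should only matter when a port or host needs to differ from these values.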

I also used to start the service with sudo neo4j start, but I've learned that this is bad practice (it was also causing an error about a missing or uncreated pid file). Now I use sudo systemctl start neo4j, and I no longer get that pid error. I don't know whether this is relevant, but I'll be retrying the connection today.

All your notes and advice are greatly appreciated! My classmates are all working with Postgres and Mongo, so I only have online resources.


This is the output of sudo systemctl status neo4j:

[ec2-user@ip-172-31-16-75 ~]$ sudo systemctl status neo4j
● neo4j.service - Neo4j Graph Database
   Loaded: loaded (/usr/lib/systemd/system/neo4j.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-19 20:53:15 UTC; 10s ago
 Main PID: 17048 (java)
   CGroup: /system.slice/neo4j.service
           └─17048 /usr/bin/java -cp /var/lib/neo4j/plugins:/etc/neo4j:/usr/share/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -XX:+UseG1GC -XX:-OmitStackTraceInFastTh...

Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at com.hazelcast.instance.HazelcastInstanceImpl.createNode(HazelcastInstanceImpl.java:153)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:125)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(Hazelcast...va:218)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstan...va:176)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstan...va:126)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:58)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at org.neo4j.causalclustering.discovery.HazelcastCoreTopologyService.createHazelcastIns...va:283)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at org.neo4j.causalclustering.discovery.HazelcastCoreTopologyService.lambda$start0$0(Ha...va:163)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: at java.lang.Thread.run(Thread.java:748)
Jun 19 20:53:23 ip-172-31-16-75.us-west-2.compute.internal neo4j[17048]: 2019-06-19 20:53:23.409+0000 INFO  Waiting for a total of 2 core members...

And this is from the debug.log:

2019-06-19 20:56:13.491+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
2019-06-19 20:56:13.554+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Cluster discovery service starting
2019-06-19 20:56:13.638+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] My connection info: [
	Discovery:   listen=52.xx.xxx.xxx:5000, advertised=52.xx.xxx.xxx:5000,
	Transaction: listen=0.0.0.0:6000, advertised=52.xx.xx.xxx:6000, 
	Raft:        listen=0.0.0.0:7000, advertised=52.xx.xx.xxx:7000, 
	Client Connector Addresses: bolt://52.xx.xxx.xxx:7687,http://52.xx.xxx.xxx:7474,https://52.xx.xxx.xxx:7473
]
2019-06-19 20:56:13.639+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Discovering other core members in initial members set: [52.xx.xxx.xxx:5000, 54.xxx.xxx.xxx:5000]
2019-06-19 20:56:13.699+0000 INFO [o.n.c.c.c.l.s.SegmentedRaftLog] log started with recovered state State{prevIndex=-1, prevTerm=-1, appendIndex=-1}
2019-06-19 20:56:13.700+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Membership state before recovery: RaftMembershipState{committed=null, appended=null, ordinal=-1}
2019-06-19 20:56:13.700+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Recovering from: -1 to: -1
2019-06-19 20:56:13.701+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Membership state after recovery: RaftMembershipState{committed=null, appended=null, ordinal=-1}
2019-06-19 20:56:13.701+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Target membership: []
2019-06-19 20:56:13.856+0000 INFO [o.n.c.n.Server] raft-server: bound to 0.0.0.0:7000
2019-06-19 20:56:13.859+0000 INFO [o.n.c.d.CoreMonitor] Waiting for a total of 2 core members...
2019-06-19 20:56:23.887+0000 INFO [o.n.c.d.CoreMonitor] Waiting for a total of 2 core members...
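This debug.log lists two addresses in the initial members set and waits for 2 core members. The config pasted earlier, however, lists three discovery members and sets causal_clustering.minimum_core_cluster_size_at_runtime=3.

If the goal is a two-core test, the sketch below shows settings that agree with each other. It assumes identical values on both instances. The angle-bracket placeholders are hypothetical stand-ins for each instance's public IP.

```properties
# Hypothetical two-core test; use the same values on both instances.
dbms.mode=CORE
causal_clustering.minimum_core_cluster_size_at_formation=2
causal_clustering.minimum_core_cluster_size_at_runtime=2
causal_clustering.initial_discovery_members=<core-1-public-ip>:5000,<core-2-public-ip>:5000
```

With only two cores, Raft needs both members for a majority, so the cluster cannot keep accepting writes if either one is lost. Three cores remain the usual minimum for fault tolerance.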