High CPU load at 1k to 1.5k requests/sec on 3 x r4.large VMs on AWS

Hi Guys,

We have a production cluster running on 3 x r4.large machines based on this template:
https://aws.amazon.com/marketplace/pp/B07D441G55

Usage of our APIs has been increasing, and we are now seeing around 1k to 1.5k requests per second. During these periods the CPU comes under high load and response times become slower.

As seen in the report below, response time increases to as much as 20 seconds when the total API count rises above 1k/sec.

This also drives CPU load above 100% at the same time.

[image: Logs report]

[image: CPU Load]

We would like to know whether there are recommended server specs for this load.

Any help is much appreciated!

Thanks,
Arnab

Most of the time, high CPU usage is related to query inefficiency or memory usage. If the page cache is too small, it can cause many page cache faults, which in turn cause disk access. Also, if memory is being allocated and deallocated heavily, Java GC can take up too much CPU time.

Can you provide these details so we can assist?

  1. Run ":sysinfo" and provide the system details.
  2. Run ":schema" to show the indexes.
  3. Share sample queries being run and their profile details.

Thanks for the response. However, the queries have been profiled and they respond fine under lower load. The page cache is also configured correctly based on Halin monitoring recommendations.

Here are the details:

  1. Store Sizes
    Count Store 16.38 KiB
    Label Store 16.02 KiB
    Index Store 218.74 MiB
    Schema Store 8.01 KiB
    Array Store 26.27 MiB
    Logical Log 226.13 MiB
    Node Store 157.06 MiB
    Property Store 1.21 GiB
    Relationship Store 449.86 MiB
    String Store 8.46 GiB
    Total Store Size 14.14 GiB

ID Allocation
Node ID 10978645
Property ID 31570607
Relationship ID 13774798
Relationship Type ID 40

Page Cache
Faults 328607
Evictions 0
File Mappings 1564
Bytes Read 2523463369
Flushes 8981477
Eviction Exceptions 0
File Unmappings 1533
Bytes Written 125153474134
Hit Ratio 100.00%
Usage Ratio 25.77%

Transactions
Last Tx Id 18484578
Current 2
Peak 9
Opened 36053580
Committed 36044407

Causal Cluster Members
Roles Addresses Actions
LEADER bolt://10.2.x.x:7687, http://10.2.x.x:7474, https://10.2.x.x:7473 Open
FOLLOWER bolt://10.2.x.x:7687, http://10.2.x.x:7474, https://10.2.x.x:7473 Open
FOLLOWER bolt://10.2.x.x:7687, http://10.2.x.x:7474, https://10.2.x.x:7473 Open

  2. Indexes
    ON :app(uuid) ONLINE
    ON :brand(st_id) ONLINE
    ON :brand(uuid) ONLINE
    ON :business_entity(uuid) ONLINE
    ON :location(uuid) ONLINE
    ON :platform(uuid) ONLINE
    ON :profile(customer_id) ONLINE
    ON :profile(uuid) ONLINE
    ON :profile_attribute(key) ONLINE
    ON :role(name) ONLINE
    ON :service(uuid) ONLINE
    ON :store(uuid) ONLINE
    ON :store_attribute(key) ONLINE
    ON :store_attribute(uuid) ONLINE

No constraints

  3. Sample query
    MATCH (st:store {uuid: 'c5cd8c60-c3da-11e9-bff4-038e23a94533'})-[:HAS_BRAND]->(s_br:brand {uuid: 'b2a8c210-e70f-11e8-8ac5-e735f388a4d7'}),
          (s:app {uuid: 'dd669910-e0d3-11e8-b431-1d83bfd113a3'})-[:HAS_PROFILE]->(p:profile {customer_id: '1009798810--dd669910-e0d3-11e8-b431-1d83bfd113a3'})-[rel:ROLE]->(r:role {name: 'Consumer'})
    MERGE (p)-[rel_offline_pref_brand:OFFLINE_PREFERRED_BRAND]->(s_br)
      ON CREATE SET rel_offline_pref_brand.created = '1583391765447', rel_offline_pref_brand.last_updated = '1583391765447'
      ON MATCH SET rel_offline_pref_brand.last_updated = '1583391765447'
    RETURN COLLECT(DISTINCT s_br.uuid) AS offline_preferred_brands
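
For reference, the sample query above embeds every UUID and timestamp as a literal, so each request sends a different query string. A parameterized form is one thing that might be worth trying under load, since it lets the server see the same query text on every call. This is only a sketch and not a confirmed cause of the CPU load: the parameter names, `build_params`, and `run_upsert` are made up for illustration, and `session` is assumed to be an open session from the Neo4j Python driver.

```python
# Parameterized form of the sample query. Parameter names are illustrative only.
UPSERT_PREFERRED_BRAND = """
MATCH (st:store {uuid: $store_uuid})-[:HAS_BRAND]->(s_br:brand {uuid: $brand_uuid}),
      (s:app {uuid: $app_uuid})-[:HAS_PROFILE]->(p:profile {customer_id: $customer_id})
      -[rel:ROLE]->(r:role {name: $role_name})
MERGE (p)-[pref:OFFLINE_PREFERRED_BRAND]->(s_br)
  ON CREATE SET pref.created = $now, pref.last_updated = $now
  ON MATCH SET pref.last_updated = $now
RETURN collect(DISTINCT s_br.uuid) AS offline_preferred_brands
"""


def build_params(store_uuid, brand_uuid, app_uuid, customer_id, now_ms,
                 role_name="Consumer"):
    """Build the parameter map for UPSERT_PREFERRED_BRAND (hypothetical helper)."""
    return {
        "store_uuid": store_uuid,
        "brand_uuid": brand_uuid,
        "app_uuid": app_uuid,
        "customer_id": customer_id,
        "role_name": role_name,
        # The original query stores timestamps as strings, so keep that format.
        "now": str(now_ms),
    }


def run_upsert(session, params):
    """Run the query on an already-open driver session and return the single result row."""
    return session.run(UPSERT_PREFERRED_BRAND, params).single()
```

The query text stays constant across requests and only the parameter map changes. Whether this helps here would need to be measured at the 1k to 1.5k requests/sec level.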

Thanks!
Arnab