Query slows down?

I am running an import query using LOAD CSV using PERIODIC COMMIT on an aws using the provided AMI community version.


I am trying to debug a query that has some performance issues.
The query imports overall about 8mio. relations, that it is expected to take some time.

While monitoring the progress through the data directory, I notice that transaction files (250mb) are being created in the beginning first every minute for the first 6 files, but then the next one takes about 20 minutes before the speed again pick ups and goes about back to normal.

I have not systematically verified this, but to me it seems, when the data directory has many transaction files this problem is more likely to occur. After cleaning the transactions files, it seems that this slow down happens later (but still happens).

Debug.log shows a lot of these warnings:

2020-04-03 05:05:08.875+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause {pauseTime=114, gcTime=163, gcCount=1}

heap size is 18G

Question is, if this is expected behaviour, and how I can avoid it.
In the beginning I believed it was related to my query or the data, but this slow down happens at random stages during the import.

Thanks a lot!

GC pauses are expected, especially when processing many or large transactions over a short amount of time, but most of the time they are subsecond such as your example and not worth worrying about.

Be on the lookout when you start seeing frequent multisecond GCs, however, that can be an indicator of poorly tuned queries (usually due to lack of supporting indexes, or lack of batching when it's needed).

Use an EXPLAIN on your periodic commit LOAD CSV query to make sure there are no Eager operators present, as those disable batching and can put pressure on the heap.