Query optimalisation


(Tomswinkels) #7

Sure!


(Tomswinkels) #8

I have running the query that you give me.


(Michael Hunger) #9

Sorry please run it with PROFILE


(Tomswinkels) #10

No problem :slight_smile:

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 864 total db hits in 68 ms.


(Tomswinkels) #11

I think the count(results) is the problem in my case. But how can i get than the count of de total results of a filter?


(Michael Hunger) #12

You mean the total count? Yep that requires pulling through all data, which is not effective.
Esp. with the regexp filters which are very inefficient as they require each record to be touched.

Do you really need the total counts, or would be "more than xxx" be enough?


(Michael Hunger) #13

If you instead used a Neo4j 3.5 fulltext index on your messages, you could try to use this:

PROFILE
CALL db.index.fulltext.queryNodes("messages", "someren brand") yield node as message1
MATCH (message:TS_P2000Message)
WHERE 1546281754000 <= message.sended <= 1548960154000
AND message1 = message
RETURN count(*)

(Tomswinkels) #14

I use the total counts now for pagination.

The fulltext index takes very long time, there are 4 mil TS_P2000Message notes.

Maybe the best way is to use

PROFILE
MATCH (message:TS_P2000Message) 
WHERE 1546281754000 <= message.sended <= 1548960154000 
AND message.message =~ ("(?i).*\\b"+"(someren|brand)"+"\\b.*") 
RETURN message
ORDER BY message.sended desc 
SKIP 0 LIMIT 10

But in this case i get all the messages with someren or brand. But in some cases i want say only if someren and rosdaal. And in some cases i even want filter on codes (that is a relation between the message and code). When the word match is ok than i make some new examples.

Another thing what we do is give the relationships of something directly back (in one query). But is it better to do that, or a new request? I think it is better to do a new query but than you have with a limit of 50 results sometimes 1+50*(4 subqueries) = 201 queries for one request.

Now the indexes not used, this takes more time.

PROFILE
MATCH (message:TS_P2000Message)
WHERE 1546281754000 <= message.sended <= 1548960154000
OPTIONAL MATCH (message)-[:HAS_URGENCY]->(urgency:P2000Urgency)
WITH {urgency: {urgency: urgency}, message: message} AS message
RETURN message
ORDER BY message.sended desc
SKIP 0 LIMIT 10

(Michael Hunger) #15

That last statement won't work as you need to use message.message.sended but then you loose all benefits of the index backed ordering.

So you should not shadow the name.

I can't believe that fetching results from an index with only 4M entries takes so long.
How big are your messages usually?


(Tomswinkels) #16

First i have create the fulltext index

CALL db.index.fulltext.createNodeIndex("messages",["TS_P2000Message"],["message"])

PROFILE
CALL db.index.fulltext.queryNodes("messages", "someren brand") yield node as message1
MATCH (message:TS_P2000Message)
WHERE 1546281754000 <= message.sended <= 1548960154000
AND message1 = message
RETURN count(*)

It takes more than 1035732 ms after that time i have killed the query.


(Michael Hunger) #17

What does the EXPLAIN look like?

And what does this look like:

PROFILE
CALL db.index.fulltext.queryNodes("messages", "someren brand") yield node 
RETURN count(*)

(Michael Hunger) #18

Would it be possible to share your db with us for testing?
How big are your message texts usually?


(Tomswinkels) #19

I have run that query:


Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 1 total db hits in 3049 ms.

I will share my db with Neo4j, how can i do that? I can send a download link.

The messages texts are not big, max 200 characters.


(Michael Hunger) #20

Thanks I also meant the explain from the other statement that didn't finish.
You can send me the link to michael at neo4j.com or via PM.

Thanks so much.


(Tomswinkels) #21

Sorry, see here the explain.

I have send the link to you're mail.


(Michael Hunger) #22

I just ran this with your data on my laptop (with 4G PC and 2G heap configured).

PROFILE
CALL db.index.fulltext.queryNodes("messages", "someren brand") yield node as message
WHERE 1546281754000 <= message.sended <= 1548960154000
RETURN message.uuid

It finished in less than 200ms


(Tomswinkels) #23

Hmmm thats sounds good.

But then i run that query:

Started streaming 2310 records after 39 ms and completed after 22869 ms, displaying first 1000 rows.


(Michael Hunger) #24

What is your memory (heap + page-cache config again)?
What kind of disk are you running?


(Tomswinkels) #25

This are the settings that i use now:

dbms.memory.heap.initial_size=3000m
dbms.memory.heap.max_size=3000m
dbms.memory.pagecache.size=1000m

150GB SSD disk.


(Tomswinkels) #26

I have change the settings to:

dbms.memory.heap.initial_size=2000m
dbms.memory.heap.max_size=2000m
dbms.memory.pagecache.size=4000m

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 121489 total db hits in 17191 ms.

Maybe you result is from the second (cache) run?