Hi,
Since upgrading my database from 3.15.14 to 4.1.0 didn't work, I went ahead and created a new 4.1.0 database. I am noticing that full-text search in 4.1.0 is significantly slower than in 3.15.14: less than a second in 3.15.14 as compared to ~20 sec in 4.1.0, which limits its usefulness. My intent was just to reproduce what I had before, but in 4.1.0, so I could take advantage of the latest gds procedures and not have different ones for different data sets.
Should I expect this radically different performance? And is there any way to improve it?
In 3.15.14 I have this schema
Indexes
ON :cpc(art) ONLINE
ON :company(name) ONLINE (for uniqueness constraint)
ON :cpc(cpcClass) ONLINE (for uniqueness constraint)
ON :patent(patNum) ONLINE (for uniqueness constraint)
Constraints
ON ( company:company ) ASSERT company.name IS UNIQUE
ON ( cpc:cpc ) ASSERT cpc.cpcClass IS UNIQUE
ON ( patent:patent ) ASSERT patent.patNum IS UNIQUE
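For reference, a schema like the one listed above would typically come from statements along these lines in 3.x syntax. This is only a sketch reconstructed from the listing, not necessarily the exact statements that were run; the variable names are arbitrary.

```cypher
// Plain (non-full-text) schema index on the art property of :cpc nodes
CREATE INDEX ON :cpc(art);

// Uniqueness constraints; each one also creates a backing index
CREATE CONSTRAINT ON (company:company) ASSERT company.name IS UNIQUE;
CREATE CONSTRAINT ON (cpc:cpc) ASSERT cpc.cpcClass IS UNIQUE;
CREATE CONSTRAINT ON (patent:patent) ASSERT patent.patNum IS UNIQUE;
```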
And in 4.1.0 I have this
Index Name           Type      Uniqueness  EntityType  LabelsOrTypes  Properties      State
constraint_475f8f44  BTREE     UNIQUE      NODE        [ "company" ]  [ "name" ]      ONLINE
constraint_5cd72616  BTREE     UNIQUE      NODE        [ "patent" ]   [ "num" ]       ONLINE
constraint_f3b7a871  BTREE     UNIQUE      NODE        [ "cpc" ]      [ "subgroup" ]  ONLINE
cpcArt               FULLTEXT  NONUNIQUE   NODE        [ "cpc" ]      [ "art" ]       ONLINE
Constraints
ON ( company:company ) ASSERT (company.name) IS UNIQUE
ON ( patent:patent ) ASSERT (patent.num) IS UNIQUE
ON ( cpc:cpc ) ASSERT (cpc.subgroup) IS UNIQUE
Note: I changed some property names to make more sense for the user: cpcClass is now subgroup and patNum is now num.
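For comparison, a full-text index like cpcArt can be created in 4.1 with the built-in procedure below, and the index list can be checked with `db.indexes()`. This is a sketch that assumes the index was created this way with default settings; the analyzer and any other configuration actually used are not shown in the listing.

```cypher
// Full-text index named cpcArt over the art property of :cpc nodes
CALL db.index.fulltext.createNodeIndex("cpcArt", ["cpc"], ["art"]);

// List the indexes to confirm the type and state of each one
CALL db.indexes()
YIELD name, type, labelsOrTypes, properties, state
RETURN name, type, labelsOrTypes, properties, state;
```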
Andy
Could you share examples of the queries that are running much slower in 4.1?
Hi,
I will try to outline what is happening.
I am focusing on Bloom. I run this query within Bloom with the 3.15.14 database and get results in 1.76 seconds
In 4.1.0 I run this query and get results in 21.76 seconds
Digging deeper, I think the index schemas may not be exactly the same.
For 3.15.14 Database
Indexes
ON :cpc(art) ONLINE
ON :company(name) ONLINE (for uniqueness constraint)
ON :cpc(cpcClass) ONLINE (for uniqueness constraint)
ON :patent(patNum) ONLINE (for uniqueness constraint)
Constraints
ON ( company:company ) ASSERT company.name IS UNIQUE
ON ( cpc:cpc ) ASSERT cpc.cpcClass IS UNIQUE
ON ( patent:patent ) ASSERT patent.patNum IS UNIQUE
Digging a bit deeper, it appears that the cpc(art) index in 3.15.14 is a node label/property index and is not listed as full text.
{
"description": "INDEX ON :cpc(art)",
"indexName": "index_22",
"tokenNames": [
"cpc"
],
"properties": [
"art"
],
"state": "ONLINE",
"type": "node_label_property",
"progress": 100.0,
"provider": {
"version": "1.0",
And for 4.1.0 I have
Index Name           Type      Uniqueness  EntityType  LabelsOrTypes  Properties      State
constraint_475f8f44  BTREE     UNIQUE      NODE        [ "company" ]  [ "name" ]      ONLINE
constraint_5cd72616  BTREE     UNIQUE      NODE        [ "patent" ]   [ "num" ]       ONLINE
constraint_f3b7a871  BTREE     UNIQUE      NODE        [ "cpc" ]      [ "subgroup" ]  ONLINE
cpcArt               FULLTEXT  NONUNIQUE   NODE        [ "cpc" ]      [ "art" ]       ONLINE
Constraints
ON ( company:company ) ASSERT (company.name) IS UNIQUE
ON ( patent:patent ) ASSERT (patent.num) IS UNIQUE
ON ( cpc:cpc ) ASSERT (cpc.subgroup) IS UNIQUE
If I run the queries directly in Browser via Cypher in 4.1.0 (note: I tried both ways of quoting the search text), I get:
CALL db.index.fulltext.queryNodes("cpcArt", "\"atomic layer deposition\"") YIELD node RETURN node.subgroup
Started streaming 3 records after 3 ms and completed after 40 ms
CALL db.index.fulltext.queryNodes("cpcArt", "atomic layer deposition") YIELD node RETURN node.subgroup
Started streaming 2107 records after 3 ms and completed after 6 ms, displaying first 1000 rows.
Both queries return much faster in Browser than the same search does in Bloom.
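The difference in record counts between the two calls comes from Lucene query syntax: the inner double quotes make it a phrase query, while the unquoted form matches nodes containing any of the three terms. A minimal sketch of both forms, adding the score and a LIMIT (the limit of 25 is an arbitrary choice for illustration):

```cypher
// Phrase query: the three words must appear together, in order
CALL db.index.fulltext.queryNodes("cpcArt", "\"atomic layer deposition\"")
YIELD node, score
RETURN node.subgroup, score
ORDER BY score DESC
LIMIT 25;

// Term query: matches any of atomic, layer, or deposition
CALL db.index.fulltext.queryNodes("cpcArt", "atomic layer deposition")
YIELD node, score
RETURN node.subgroup, score
ORDER BY score DESC
LIMIT 25;
```

Prefixing either call with PROFILE shows where the time is spent on the database side, which may help separate index lookup time from whatever Bloom does around the search.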
Andy