Really weird: query not returning expected data after upgrade from 3.5.16 community to 3.5.19 enterprise

Hi there, encountered a strange situation after my Neo4j upgrade.

This query works as expected:

match (n:AWSTag) where n.key contains "aws:auto" return n.key, n.value order by n.key limit 300

as it returns a list of nodes with n.key = aws:autoscaling:groupName

However, when I just add an "s" to the key filter by running

match (n:AWSTag) where n.key contains "aws:autos" return n.key, n.value order by n.key limit 300

I get 0 results.

I have a second Neo4j server running similar data, and it shows exactly the same problem.

Things I've tried:

  • I have tried deleting all indexes on this :AWSTag node but that didn't work.
  • I then tried turning off the server, deleting /var/lib/neo4j/data/databases/graph.db/schema/*, and turning it on again.
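For anyone retracing these steps: in Neo4j 3.5 the schema indexes can be inspected, dropped, and rebuilt from Cypher instead of deleting files under the database directory (assuming, as later in the thread, that the index is on :AWSTag(id)):

```cypher
// List all schema indexes and their population state (Neo4j 3.5 syntax)
CALL db.indexes;

// Drop and recreate the index on :AWSTag(id) to force a full rebuild
DROP INDEX ON :AWSTag(id);
CREATE INDEX ON :AWSTag(id);
```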

How I installed Neo4j Enterprise:

  1. service neo4j stop
  2. Uninstall Neo4j Community 3.5.16, keeping the old data
  3. apt-get install neo4j-enterprise=3.5.19
  4. service neo4j start

All other queries except those involving :AWSTag nodes work as expected.

which OS do you want to install on, use tags for that too

Ubuntu; whatever AWS is using

neo4j version, desktop version, browser version

3.5.19 Enterprise

what kind of API / driver do you use

The script that loads the data to the graph uses Python driver 1.7.6

screenshot of PROFILE or EXPLAIN with boxes expanded (lower right corner)

PROFILE of the successful query:

PROFILE of the failing query:

which plugins / extensions / procedures do you use

None

Is behaviour different when you first upgrade your community instance to the latest 3.5.x (.20 at the moment) and THEN upgrade to the enterprise version?

Thanks for your reply. Both my instances are on enterprise already so this might be a bit late. I think I could roll back to the original community 3.5.16, test, go to community 3.5.20, test, and then do enterprise 3.5.20.

If I roll back to community 3.5.16 and the issue is still present, what could this mean? Have you seen a similar problem before?

I suspect a bug in how Enterprise parses the CONTAINS argument.

Try using STARTS WITH instead:
https://neo4j.com/docs/cypher-manual/current/clauses/where/#match-string-start

MATCH (n:AWSTag)
WHERE n.key STARTS WITH "aws:autos"
RETURN n.key, n.value
ORDER BY n.key
LIMIT 300

Gave that a try, 0 results.

Here's some other weird behavior:

match (a:AWSTag) where a.key starts with "aws:auto" return distinct a.key
returns all these other results that don't start with 'aws:auto':

If I try an exact field search with
match (a:AWSTag{key:"aws:autoscaling:groupName"}) return a and match (a:AWSTag{key:"aws:ec2:fleet-id"}) return a,


the former returns nothing, and the latter works.
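To be clear about what should happen here: CONTAINS, STARTS WITH, and exact equality are plain string operations, so a key like aws:autoscaling:groupName should match all three filters used above. A quick Python sketch of the expected semantics, using a made-up key list:

```python
# Hypothetical sample of :AWSTag key values
keys = ["aws:autoscaling:groupName", "aws:ec2:fleet-id", "Name"]

# CONTAINS "aws:autos"  ->  substring test
contains = [k for k in keys if "aws:autos" in k]

# STARTS WITH "aws:auto"  ->  prefix test
starts = [k for k in keys if k.startswith("aws:auto")]

# Exact equality on the full key
exact = [k for k in keys if k == "aws:autoscaling:groupName"]

print(contains, starts, exact)
```

All three lists should contain "aws:autoscaling:groupName", which is exactly what the server is failing to return.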

This is more than a little bizarre. I don't know where to start.

Have you tried explicitly defining and querying an index?

Yes, there is an id field on my :AWSTag node that I have created an index for.

I originally discovered this issue when I ran a query to find all tags of an EC2Instance node with
match(n:EC2Instance{publicipaddress:"1.2.3.4"})--(a:AWSTag) return a.id

and noticed that queries on a.id with prefixes of aws:autoscaling:groupName* returned nothing, even though those nodes show up when reached through connected nodes. For example, this returns 0 results even though I know the node exists:

match (t:AWSTag{id:"aws:autoscaling:groupName:MY_GROUP_NAME"}) return t

Alex, that looks like a bug, can you please raise it as an issue at

And link to this thread as well?

Did you try to drop and recreate the index?

Could you run: CALL db.stats.retrieve("GRAPH COUNTS") either via cypher-shell or via http:

curl -H accept:application/json -H content-type:application/json -d '{"statements":[{"statement":"CALL db.stats.retrieve(\"GRAPH COUNTS\")"}]}' http://________:7474/db/data/transaction/commit > graphCounts.json

and share the results

can you please raise it as an issue

Sure, will do!

Did you try to drop and recreate the index?

Yup. With and without the index I get the same behavior.

Could you run: CALL db.stats.retrieve("GRAPH COUNTS")

Here are the count stats; I'm only including the ones related to AWSTags:

{
  "relationships": [
    {
      "count": 8882403
    },
    { ... snip ... },
    {
      "relationshipType": "TAGGED",
      "count": 97277,
      "endLabel": "AWSTag"
    },
    {
      "relationshipType": "TAGGED",
      "count": 97277,
      "endLabel": "Tag"
    },
    { ... snip ... }
  ],
  "nodes": [
    {
      "count": 2855520
    },
    { ... snip ... },
    {
      "count": 66181,
      "label": "AWSTag"
    },
    {
      "count": 66181,
      "label": "Tag"
    },
    { ... snip ... }
  ],
  "indexes": [
    { ... snip ... },
    {
      "updatesSinceEstimation": 2134,
      "totalSize": 64047,
      "properties": [
        "id"
      ],
      "labels": [
        "AWSTag"
      ],
      "estimatedUniqueSize": 64047
    },
    { ... snip ... }
  ],
  "constraints": []
}

Submitted https://github.com/neo4j/neo4j/issues/12552.

Hi there, I faced the same issue and ended up using regular expressions instead, though I'm not sure how much more resource-intensive this is.

Look at the CASE WHEN clause:

	FOREACH (ignoreMe IN CASE WHEN (m.body =~ '(?ms).+Esta informaci.n te result. .til.+') AND (m)-[:SENT_BY]->(:Nubot) THEN [1] ELSE [] END |
		SET m:Recommendation
	)

I hope this helps, in the meantime.
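For reference, Cypher's =~ uses Java regex syntax; the inline (?ms) flags turn on MULTILINE and DOTALL so that . also matches newlines, and the single . wildcards stand in for accented characters (the Spanish message means roughly "Was this information useful to you?"). Python's re module accepts the same inline flags, so the behavior can be sketched like this (the message body is made up):

```python
import re

# The same pattern the workaround uses: (?m) = MULTILINE, (?s) = DOTALL,
# with "." standing in for accented characters such as "ó" and "ú".
pattern = r'(?ms).+Esta informaci.n te result. .til.+'

# Hypothetical multi-line message body
body = "Hola!\n¿Esta información te resultó útil?\nGracias"

match = re.search(pattern, body)
print(bool(match))
```

Note that with DOTALL the leading and trailing .+ each require at least one extra character around the phrase, so the pattern only matches when the phrase sits inside a larger message.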

Cheers,

Just to add another data point, I noticed that this query does not work in Neo4j:

But it does work in a Linkurious instance connected to that same Neo4j server!

I saw this other post (Differences in Full Text between 3.5 and 4.1 in Bloom) on query result differences in Bloom. Since Linkurious also uses a full text index, I wonder if this is related.

I looked into this issue and found the problem: we were removing ":auto" from Cypher queries. The fix will make it into this or the next release. Thanks for reporting this issue!

For a little more context: in the Browser you can prefix a query with :auto to change how it's executed (this is required when you run a USING PERIODIC COMMIT LOAD CSV kind of query). Since :auto is a command that only the Browser should interpret, it has to be stripped from the query before the Cypher is sent to the server, and it looks like we were a little careless about how we removed it.
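That explanation matches the symptoms in this thread exactly: stripping every occurrence of ":auto" turns contains "aws:autos" into contains "awss", which matches nothing. A minimal Python sketch of the difference (the function names are hypothetical, not Browser's actual code):

```python
import re

QUERY = 'match (n:AWSTag) where n.key contains "aws:autos" return n.key'

def strip_auto_buggy(query: str) -> str:
    # Removes every ":auto" substring, mangling string literals too
    return query.replace(":auto", "")

def strip_auto_fixed(query: str) -> str:
    # Only removes a ":auto" command prefix at the start of the input
    return re.sub(r"^\s*:auto\s+", "", query)

print(strip_auto_buggy(QUERY))  # the literal becomes "awss"
print(strip_auto_fixed(QUERY))  # query is left unchanged
```

The anchored version still strips a genuine leading ":auto " command while leaving any ":auto" that happens to appear inside the query text alone.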