Update full-text search index with new node properties

Hi,

My graph schema is partly dynamic, and nodes can have different properties under the same label. For example, a Product node could have an ISBN property if it is a Book, a Serial Number if it is an Electronic Device, or some other product-specific attributes.

I want to use full-text search across all Products and their properties. However, I would need to specify the searchable properties up front and cannot include properties that are only created at runtime. Is it somehow possible to update an existing full-text index with new properties?

As an alternative, I came up with the idea of defining a MATCH string property on each and every node that is simply a concatenation of all other properties. I would then need to make sure that every node creation/change updates this MATCH property, and I would create a full-text index on just that one MATCH property. Is this a good idea?

The way indexes work is that they have to be "pre-populated". That is, when you create an index, the database looks through all of the data you're indexing and builds an index representation. This makes dynamic properties in an index really tough: if there were a way of specifying an index like that, keeping it current with your database would be quite complex. Usually people are advised to drop the old index and create a new one when the data model itself changes.
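To make that concrete, creating a full-text index means naming the properties up front, roughly like this (procedure syntax for Neo4j 3.5/4.x; the index name and property list are just examples):

    CALL db.index.fulltext.createNodeIndex(
      'productSearch',           // index name
      ['Product'],               // labels to index
      ['Title', 'ISBN', 'EAN']   // property list is fixed at creation time
    );

    // Queries against the index only ever see those listed properties.
    CALL db.index.fulltext.queryNodes('productSearch', 'Learning Neo4j')
    YIELD node, score
    RETURN node, score;

A property added to Product nodes later simply isn't part of that list, so the index never sees it.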

Now, if you have truly dynamic properties, the second-order question is why the data model is shifting so frequently. If the properties you expect on a node change regularly, you can expect other problems too (like having to adapt your Cypher queries frequently in order to find the right records in your database).

Having a MATCH statement dynamically fetch the list of properties is fine, and you can do this to feed the index creation process. It's just that after the index is created, that MATCH isn't going to be dynamically executed again. You're still in the same situation where you'd later have to drop the index and re-create it for a different property set.
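For example, you could collect the keys that exist right now and pass them into the creation call (a sketch; the $productKeys parameter name is mine, and any existing index with that name would have to be dropped first):

    // Step 1: collect whatever property keys Product nodes have today.
    MATCH (p:Product)
    UNWIND keys(p) AS key
    RETURN collect(DISTINCT key) AS productKeys;

    // Step 2: feed that list into index creation, e.g. as a $productKeys
    // parameter from your application. It is read exactly once, here;
    // properties that appear on nodes later are not picked up.
    CALL db.index.fulltext.createNodeIndex('productSearch', ['Product'], $productKeys);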

Sorry this isn't the ideal answer, but the scenario that you outline makes me think that the better thing to do might be to take a step back and question why the property set is changing so frequently, and see what can be done about that.


Hi David,

thank you for the explanation. I understand that this use case is not optimal for indexing and a static property model would be much better.

Let me explain my alternative idea again, as I think it wasn't clear what I meant. The idea is to define a property called _MATCHER (could be any other name) which exists on every single node in the graph, no matter which label. This property would be a concatenated string of all other properties on each node. For example, two nodes:

  1. [Product] ID: 314; Title: Learning Neo4j; ISBN: 9781849517164; _MATCHER: 314 Learning Neo4j 9781849517164
  2. [Product] ID: 628; Name: Apple iPhone 11 Pro; EAN: 0190199388765; _MATCHER: 628 Apple iPhone 11 Pro 0190199388765

In this case I only need to create one index on the label Product and the property _MATCHER, and I need to make sure that this property is updated on every node change. So if I add another property Weight: 200g to the second node, this value must be appended to the _MATCHER string.
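In Cypher, the update step could look roughly like this (just a sketch; it assumes APOC is available for apoc.text.join and that all properties are scalar values):

    // Rebuild _MATCHER from all other properties of the node, e.g. after
    // my application changes the node.
    MATCH (p:Product {ID: 628})
    SET p._MATCHER = apoc.text.join(
          [k IN keys(p) WHERE k <> '_MATCHER' | toString(p[k])], ' ')
    RETURN p._MATCHER;

    // Then a single full-text index over _MATCHER covers every current
    // and future property.
    CALL db.index.fulltext.createNodeIndex('productMatcher', ['Product'], ['_MATCHER']);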

Could be that this idea is absolute nonsense. I am familiar with this type of solution from my ERP background, where it is common to have an additional column on each table that holds a normalised, uppercased value to use as a search value.


This could work. The cost will be data duplication (you'll store everything twice) and the possibility that the text matching could be affected by bad concatenation. For example:

ProductID: 21
Title: "Guns and Ammo"
_MATCHER: 21 Guns and Ammo

If you search for "21 Gun Salute", this product will probably match when it shouldn't, because the concatenation mixes the ID into the searchable text and the "21" token alone is enough to produce a hit.
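To illustrate with the single-property index sketched above:

    // The "21" token contributed by the ID is enough for the full-text
    // query to return this node, even though the product has nothing to
    // do with a 21 gun salute.
    CALL db.index.fulltext.queryNodes('productMatcher', '21 Gun Salute')
    YIELD node, score
    RETURN node.Title, score;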


I'm also facing the same issue with dropping the old index and creating a new one every time the data syncs with Neo4j. If you find an alternative solution, please let us know.

Unfortunately the only solution is to drop the index and re-create it every time.
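In procedure syntax that amounts to something like this (the index name and property list are just the examples from this thread; newer Neo4j versions use DROP INDEX / CREATE FULLTEXT INDEX instead):

    // Drop the existing index...
    CALL db.index.fulltext.drop('productSearch');

    // ...and re-create it with the current property set.
    CALL db.index.fulltext.createNodeIndex('productSearch', ['Product'],
         ['ID', 'Title', 'ISBN', 'EAN', 'Weight']);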