Fulltext indexing that can include context

Neo4j + Lucene are a powerful combination. There is one feature I would love to see in this integration. Lucene was built for document stores, where a single document contains a collection of key-value pairs. When we read a document we expect all related information to be contained in that document. So when you index documents in a MongoDB database, or records in a MySQL database (i.e. rows of a single table), in both cases you are limited to the key:value pairs contained inside the container.
But a graph database differs from a document store; in a graph you generally benefit from a more exploded view of everything. So a Person node may not contain the fields describing that person. Instead, Person may have relationships to other nodes like Hobbies, Employment, Address...

When I index a Person, I would like to be able to use more context when I describe that Person to the Lucene index. I would like to be able to write a query that composes the set of fields for my Person node index, so that it includes Address.city, Employment.currentEmployer, Hobbies.favorite.

I can't see any way to do this other than to run a query that creates actual fields in my Person node derived from those related nodes, and then base my index on those materialised fields. As far as I can tell, the index will accept only one label at a time, and there is no place to specify a query.
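For illustration, the materialise-then-index workaround described above might look roughly like the following. This is only a sketch: the relationship types (LIVES_AT, WORKS_AT, HAS_HOBBY), the copied property names (addressCity, currentEmployer, favoriteHobby), the firstName/lastName properties and the index name PersonContext are all assumptions, and it presumes each Person has at most one Address, Employment and Hobbies node. It uses the db.index.fulltext.createNodeIndex procedure that appears later in this thread; newer Neo4j versions use CREATE FULLTEXT INDEX instead.

```cypher
// Copy selected properties from related nodes onto each Person.
// Relationship types and target property names are hypothetical.
MATCH (p:Person)
OPTIONAL MATCH (p)-[:LIVES_AT]->(a:Address)
OPTIONAL MATCH (p)-[:WORKS_AT]->(e:Employment)
OPTIONAL MATCH (p)-[:HAS_HOBBY]->(h:Hobbies)
SET p.addressCity     = a.city,
    p.currentEmployer = e.currentEmployer,
    p.favoriteHobby   = h.favorite;

// Index Person on its own fields plus the materialised ones.
CALL db.index.fulltext.createNodeIndex(
  "PersonContext",
  ["Person"],
  ["firstName", "lastName", "addressCity", "currentEmployer", "favoriteHobby"]
);
```

The copied properties go stale whenever a related node changes, so the first statement would have to be re-run (or narrowed to the affected Person nodes) after such updates.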

Perhaps a great feature would be to allow the index creator to include fields from connected nodes.

PS: I do see a challenge in implementing this. The index for the Person node above would have to be aware when the Address node changes, so that any Person node connected to it would be re-indexed.

Hi,
Say you have (Person)-->(Address): do you want to index the combination of the two?

When creating a full-text index in Neo4j you can give more than one label. One caveat is that it works at the node level, not at the path level.

CALL db.index.fulltext.createNodeIndex("PersonAddr",["Person", "Address"],["firstName", "lastName", "line1", "city","state"])

This creates an index that can encompass both Person and Address. If you search for "Smith", you can get Person nodes that have Smith in their name and Address nodes that have Smith in line1 or city.

So if you have a Person named Smith living at "1 Smith ln", the search response will include both the Person node and the Address node.
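As a rough sketch of what such a search could look like against the PersonAddr index created above, using the db.index.fulltext.queryNodes procedure (returning labels(node) is just one way to tell the Person hits from the Address hits):

```cypher
// Search the combined index; each hit is a single node, either a Person or an Address.
CALL db.index.fulltext.queryNodes("PersonAddr", "Smith")
YIELD node, score
RETURN labels(node) AS labels, node, score
ORDER BY score DESC;
```

Each row is scored on its own, so a Person and the Address it is connected to come back as two unrelated hits.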

If you do need this at the path level, one way is to have the properties on the Person node, as you surmised. Either you add those to the Person node manually, or you implement a transaction handler that updates these properties in beforeCommit or afterCommit.

I get it. That index may return one result that points to a Person node plus one result that points to an Address node, but the Address node could very well be connected to some other Person node. I think in that case I should search only on Address, and traverse to the Person node.
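A minimal sketch of that approach, assuming a separate Address-only index (the name AddressText is made up here) and the (Person)-->(Address) direction mentioned earlier in the thread:

```cypher
// Hypothetical Address-only full-text index.
CALL db.index.fulltext.createNodeIndex(
  "AddressText",
  ["Address"],
  ["line1", "city", "state"]
);

// Search addresses, then traverse back to the Person nodes connected to each hit.
CALL db.index.fulltext.queryNodes("AddressText", "Smith")
YIELD node AS address, score
MATCH (p:Person)-->(address)
RETURN p, address, score
ORDER BY score DESC;
```

An Address shared by several Person nodes yields one row per Person, all with the same score.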

And perhaps one day, Neo4j will find a way to allow us to specify a mother node, connected nodes, and fields on mother node and connected nodes, and then return to us just the mother node.