Indexing values in a node list property {list: ['index me']}

tony.chiboucas · August 29, 2019, 6:26pm

Indexing nodes by each entry in a list property

Indices are primarily used to improve performance.
Performant Cypher on large and complex data requires careful and clever management of lists, maps, and collections.
Neo4j 4 will support only basic indices and fulltext indices.

However, there is no longer any path for indexing values in a list property.

{ mylistprop: ['name1', 'name2','indexme2'] }

There are multiple uses and applications for indexing such that name1, name2, and indexme2 are each a key in the index pointing to a single node.

Example simple graph

CREATE (:Thing {name: 'thing1', listprop: ['thing1', 'alias', 'alias 2']})
CREATE (:Thing {name: 'thing2', listprop: ['thing2', 'aka', 'another thing']})

Desired index:

'alias' → (thing1)
'alias 2' → (thing1)
'aka' → (thing2)
'another thing' → (thing2)

Intended Use

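// Intended behavior (not supported today): an index on :Thing(listprop)
// that matches a node when value.name equals any entry in the list.
CALL apoc.load.json(url) YIELD value
WHERE exists(value.name)
OPTIONAL MATCH (prime:Thing {listprop: value.name})
USING INDEX prime:Thing(listprop)

// Fall back to the imported map when no existing node matches.
WITH value AS imported, CASE WHEN prime IS NOT NULL THEN prime ELSE value END AS target
MERGE (x:Thing {name: target.name})
SET x = target
SET x.listprop = apoc.coll.toSet(target.listprop + imported.name)
// Attach to x, since target may be a plain map rather than a node.
MERGE (:Meta {usefuldetail: 'graph-power'})-[:ABOUT]->(x)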

DEPRECATED (will be removed in Neo4j 4.x):
Manual (explicit) indexing in Neo4j and Cypher, and apoc.index.*.

That leaves four ways to accomplish this goal, all of which are bad options:

1. Create a node for every entry in the lists being indexed.
   - This significantly inflates the DB and is not ideal for large and complex graphs, which are the primary application for this kind of index.
2. Use manual indexing, locking the application to Neo4j 3.
3. Build a Lucene analyzer specifically for the purpose.
4. Convert the list to a single string, replacing spaces with underscores, and use the whitespace fulltext index analyzer.
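
For reference, the last option could look roughly like the sketch below. It assumes the fulltext procedures `db.index.fulltext.createNodeIndex` and `db.index.fulltext.queryNodes` and APOC's `apoc.text.join` are available; the property name `aliasText` and the index name `thingAliases` are made up for illustration.

```cypher
// Flatten each list into one whitespace-separated string,
// replacing spaces inside an entry with underscores.
MATCH (t:Thing)
SET t.aliasText = apoc.text.join([a IN t.listprop | replace(a, ' ', '_')], ' ');

// Fulltext index with the whitespace analyzer, so each entry stays one token.
CALL db.index.fulltext.createNodeIndex('thingAliases', ['Thing'], ['aliasText'], {analyzer: 'whitespace'});

// Look up by a single entry ('alias 2' was stored as 'alias_2').
CALL db.index.fulltext.queryNodes('thingAliases', 'alias_2') YIELD node
RETURN node.name;
```

The query string is parsed as Lucene query syntax, so entries containing characters such as `:` would need escaping, and the flattened property has to be kept in sync with the list by hand.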

I suspect I'm missing something simple, and I may simply go with option 1 in the interest of preserving data. However, in many cases, including mine, this creates an n-to-n problem: the result grows to roughly Nodes^n nodes and relationships, many times larger than necessary.

Am I missing something obvious to anyone?

tony.chiboucas · August 30, 2019, 6:09pm

I guess for now I'll go with the "blow up my database" option, and hopefully find some time to explore adding a better solution into Neo4j 4.5-ish at some point.
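
A minimal sketch of that option against the example graph, assuming a made-up `:Alias` label, `value` property, and `:ALIAS_OF` relationship type. The index statement uses Neo4j 4.x syntax; on 3.5 it would be `CREATE INDEX ON :Alias(value)`.

```cypher
// One-time: index the alias value (Neo4j 4.x syntax).
CREATE INDEX alias_value FOR (a:Alias) ON (a.value);

// Materialize one :Alias node per list entry, linked back to its :Thing.
MATCH (t:Thing)
UNWIND t.listprop AS entry
MERGE (a:Alias {value: entry})
MERGE (a)-[:ALIAS_OF]->(t);

// Indexed lookup by any entry.
MATCH (:Alias {value: 'aka'})-[:ALIAS_OF]->(t:Thing)
RETURN t.name;
```

Each statement is meant to run on its own, since schema changes and data changes cannot share a transaction. The cost is the one described above: every list entry becomes an extra node and relationship.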

genealogy · May 11, 2020, 12:46pm

I've just started using Neo4j 4.03 and it looks like lists in nodes are stored as strings when the nodes are created by LOAD CSV. I could unwind them in v3.x, but cannot in v4.x.
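
For context, LOAD CSV reads every field as a string, so a list column has to be split explicitly before it can be unwound. A minimal sketch, assuming a hypothetical `things.csv` with `name` and `aliases` columns where the aliases are separated by `;`:

```cypher
// Hypothetical file with columns `name` and `aliases` (entries separated by ';').
LOAD CSV WITH HEADERS FROM 'file:///things.csv' AS row
MERGE (t:Thing {name: row.name})
// split() turns the delimited string into a list of strings.
SET t.listprop = split(row.aliases, ';');
```

After this, `UNWIND t.listprop AS entry` should iterate over the individual entries rather than failing on a single string.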
