List lookup vs index property lookup

pieterpaul.strybol · April 18, 2020, 10:26am

Hi all,

I am struggling with an optimization of my Graph Model and I was hoping you could help me!

All Cypher queries and profiles were run on Neo4j Desktop 1.2.7 with Neo4j 4.0.3.

I have a graph model in which I create my gene nodes like this: CREATE (g:Symbol:Gene {gname:'Gene1'}).
As some of you may know, genes can have quite a large number of aliases, and I am on the fence about whether I should model them in one of the following ways:

(1) (g:Symbol:Gene)-[:HAS_ALIASES]->(a:Aliases {synonyms:['Alias1', 'Alias2', ...]})
(2) (g:Symbol:Gene)-[:HAS_ALIAS]->(a1:Alias {synonym:'Alias1'}), (g)-[:HAS_ALIAS]->(a2:Alias {synonym:'Alias2'}), ...

It is important for me that these lookups are as fast as possible, because you never know beforehand whether an incoming dataset contains only official symbols (e.g. HGNC), only synonyms, or a combination of both.

I have run a quick profile using both approaches and the results were a bit surprising to me. For the test I created 1 gene with 4 aliases, once using approach (1) and once using approach (2). When I profiled my query, approach (2) had fewer db hits but took longer than approach (1). And when I indexed the 'synonym' property, it took even longer, with even fewer db hits.

I thought approach (2) would win for sure, because Neo4j is optimized for traversals and not for retrieving a long list of properties. Can someone explain to me why this is happening? Or suggest a better way of modelling this? The same problem also applies to other IDs, especially Ensembl gene and protein IDs.

Thanks in advance for your feedback!

andrew_bowman · April 20, 2020, 3:45am

If aliases are only used for retrieval, never lookup, then route 1 is what you want, as that will require only a single traversal and a single property lookup, versus n traversals and n property lookups for n aliases.

If you need to look up a :Gene or :Symbol node via an alias, then you need to go with route 2, since you can index :Alias(synonym) to speed up the lookup. You cannot apply an index to speed up route 1, since elements in a list property can't be individually indexed at this time.
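
As a rough sketch of the difference (the index name alias_synonym and the $name parameter are placeholders I made up for illustration, and the labels and properties are the ones from the question), the two lookups could be profiled like this on Neo4j 4.0. In Neo4j Browser, $name can be set beforehand with :param name => 'Alias2'.

```cypher
// Route 2: index the single-valued synonym property.
// "alias_synonym" is an assumed index name; any name works.
CREATE INDEX alias_synonym FOR (a:Alias) ON (a.synonym);

// Route 2 lookup: the planner can seek the alias through the index,
// then take one hop to the gene.
PROFILE
MATCH (a:Alias {synonym: $name})<-[:HAS_ALIAS]-(g:Gene)
RETURN g.gname;

// Route 1 lookup: no index can serve the IN check on a list property,
// so every :Aliases node has to be read and filtered.
PROFILE
MATCH (g:Gene)-[:HAS_ALIASES]->(a:Aliases)
WHERE $name IN a.synonyms
RETURN g.gname;
```

Comparing the two plans should show an index seek for the first query and a label scan followed by a filter for the second.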

Also, using only 4 entries as a test won't be a good indicator of real performance with actual data. You need to consider how this needs to scale as the number of :Alias nodes increases. An index lookup will always beat a label scan + filter at scale.
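
One way to get a more realistic test graph, as a minimal sketch: generate synthetic genes and aliases with UNWIND. The counts (50000 genes, 4 aliases each) and the generated names are arbitrary assumptions, not real gene data, so adjust them to something closer to your actual dataset.

```cypher
// Synthetic route 2 data: 50000 genes with 4 aliases each (arbitrary sizes).
UNWIND range(1, 50000) AS i
CREATE (g:Symbol:Gene {gname: 'Gene' + toString(i)})
WITH g, i
UNWIND range(1, 4) AS j
// Made-up alias names such as 'Alias17_3', unique per gene and position.
CREATE (g)-[:HAS_ALIAS]->(:Alias {synonym: 'Alias' + toString(i) + '_' + toString(j)});
```

Profiling the alias lookup against a graph of this size, with and without an index on :Alias(synonym), should give a much clearer picture than a single gene.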

pieterpaul.strybol · April 27, 2020, 8:32am

Hi Andrew,

I thought as much, thank you very much for your answer!

I'll increase the size of my test graph for further performance profiling.
