Does the number of properties on a node impact query speed even if the node is only reached through a relationship from an indexed node?

founders
Node

I am in the process of modeling a graph made up of :User, :Event, :Venue, :Schedule, :Subscription, :Day, :Month and :Year nodes. Here is the hierarchy: (:User)-[:CREATED]->(:Subscription)-[:FOR]->(:Schedule)<-[:BELONGS_TO]-(:Event)-[:OCCURS_ON]->(:Day)-[:OF]->(:Month)-[:OF]->(:Year).

I currently have just 4 indexes, set on :User.uuid, :Year.number, :Month.name and :Day.number.

I have 2 main queries.

  1. Query for all of the schedules a user is subscribed to.
  2. Query for the events occurring in a given month that belong to schedules the user is subscribed to.

The first query is really fast. I am in the process of setting up the schema for the :Event nodes.
I am pulling information from 4 different APIs and merging it into individual :Event nodes. An :Event node could be a sports game, TV show episode, artist performance, podcast episode, or a local event from Ticketmaster or Eventbrite. No matter where the event comes from, it has a start date and time, an end date and time, a name and a description. However, for certain types of events, such as a sports game, there is additional metadata that I would like to add to the graph: scores, stats, game leaders, line scores, highlights and news.

I have 2 paths to go down. I can turn each type of metadata into its own node with its own label, like :Leader, :Stat, :LineScore etc., or I can flatten all those properties and add them directly to the :Event node. The :Event node would end up having 40+ properties.

The graph is being queried using GraphQL. The reason I am hesitant to break out the metadata into individual nodes is because of the way GraphQL would resolve the nodes. It would query for events, and then run additional queries to get the additional properties if requested, such as resolving game leaders or stats.

I have read elsewhere that having too many properties can result in slow execution times, since Neo4j would iterate over each of the properties until it comes across the property to MATCH against. I don't believe this applies in my case, since I am not querying for :Event nodes directly. I am querying for a :Day first and then finding the events occurring on that day that belong to a schedule the user is subscribed to. The only indexes needed here would be on :Year.number, :Month.number, :Day.number and :User.uuid. There are a very small number of year, month and day nodes, so it should be quick to find the day. Then I can find the user relatively quickly since it's indexed on uuid. At this point the number of events has been filtered down significantly, making it easy to find the events that belong to a schedule the user is subscribed to.

Would it be okay for only the sports :Event nodes to have 40+ metadata properties, given that no query would ever look up an :Event directly other than by uuid, for which I can create an index on :Event.uuid?

Here is an example query:
MATCH (d:Day)-[:OF]->(m:Month)-[:OF]->(y:Year)
WHERE m.number = $month AND y.number = $year
WITH d
MATCH (u:User{uuid: $userUUID})-[:CREATED]->(sub:Subscription)-[:FOR]->(s:Schedule)
WITH d, s
MATCH (s)<-[:BELONGS_TO]-(e:Event)-[:OCCURS_ON]->(d)
RETURN { event: e }

In summary:
If :Event nodes had 30+ properties, would it significantly slow down the query above?

In conclusion the 3 additional questions I have are:

  1. Does the size of a node in KB impact response/lookup times?
  2. If a node is indexed by a field, will the number of properties still impact the lookup time?
  3. Would breaking out metadata into separate nodes result in a significant increase in query response times?

Thanks in advance for reading the essay above. Any help would be greatly appreciated. I am leaning towards flattening the metadata and creating 30+ fields only on the sports game nodes, in order to save time, and also because I will never run a query like "find other events with similar stats" or "find other games with similar leaders".

1 REPLY

Hello,

When simply traversing to or through a node and not accessing its properties in any way, the number or size of its properties does not matter. During traversal, node values are lightweight, mostly a wrapper around a graph id with some info about its labels and attached relationships. Only when projecting or filtering on properties (and this includes the return of the node itself) will property access happen, and that's when you could be impacted if there is a larger number of properties.
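
As a rough illustration of the point about projection, here is a sketch of the query from the question that returns a few named properties instead of the whole :Event node. The property names (name, startTime, endTime) are assumptions, not taken from the original schema, and the relationship types are assumed to be BELONGS_TO and OCCURS_ON. Whether this is measurably faster than returning the node depends on the data and store format, so it is worth checking with PROFILE.

```cypher
// Sketch only: traversal does not touch :Event properties;
// property access happens in the RETURN projection.
// name, startTime and endTime are assumed property names.
MATCH (d:Day)-[:OF]->(m:Month)-[:OF]->(y:Year)
WHERE m.number = $month AND y.number = $year
MATCH (u:User {uuid: $userUUID})-[:CREATED]->(:Subscription)-[:FOR]->(s:Schedule)
MATCH (s)<-[:BELONGS_TO]-(e:Event)-[:OCCURS_ON]->(d)
RETURN e.uuid AS uuid,
       e.name AS name,
       e.startTime AS startTime,
       e.endTime AS endTime
```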

Index lookup does not need to perform property access to function. The index itself already has the property (or properties) for the lookup, we don't have to get the properties from the node during lookup.
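
For the lookup by uuid mentioned in the question, a minimal sketch could look like the following. The index name event_uuid and the parameter $eventUUID are made up for illustration, and the CREATE INDEX syntax shown assumes Neo4j 4.x or later.

```cypher
// Hypothetical index name; creates a single-property index on :Event(uuid).
CREATE INDEX event_uuid IF NOT EXISTS
FOR (e:Event) ON (e.uuid);

// PROFILE shows the plan; an index-backed lookup should appear as a
// NodeIndexSeek operator rather than a label scan followed by a filter.
PROFILE
MATCH (e:Event {uuid: $eventUUID})
RETURN e.name;
```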

Breaking out metadata into separate nodes shouldn't have any impact, especially if the relationships used to reach those separate nodes are not used in your filtering patterns to get the event nodes in question. Selection of the relationships to traverse when matching a pattern is very efficient, so even if there were 1 million other relationships on a node, if your pattern doesn't traverse those relationships, you won't be impacted.
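
If the metadata were broken out, fetching it on demand might look roughly like this. The :Stat label comes from the question, while the HAS_STAT relationship type and the $eventUUID parameter are assumptions made for the sake of the example. Queries whose patterns never mention HAS_STAT, such as the monthly events query, would not traverse these relationships.

```cypher
// Sketch under assumptions: (:Event)-[:HAS_STAT]->(:Stat) is a hypothetical model.
MATCH (e:Event {uuid: $eventUUID})
// OPTIONAL MATCH keeps events that have no stats attached.
OPTIONAL MATCH (e)-[:HAS_STAT]->(st:Stat)
RETURN e.uuid AS uuid, collect(st) AS stats
```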
