Multiple interest/tag match performance

urubatan · February 21, 2020, 9:15pm

Hello everyone, I'm pretty new to Neo4j. Thank you in advance for any help.
Let me try to explain my problem here:
I have stories that have many tags; each tag has one element (an element might be a company, a location, ...).
Then I have newsletters; each newsletter has many interests, and every interest has one or more (has_many) elements.
Each element also has the inverse ends of the newsletter and tag relationships mapped.

to build a newsletter with simple interests (simple means each interest has only one element), the performance right now is really fast and the query is pretty easy:
newsletter.interests.element.tags.story.as(:s).where('s.published_date > ?', Date.yesterday) works fine and fast
(it maps to the graph query below)

Gbrief#interests#elements#tags#story 
  MATCH (gbrief363386)
  WHERE (ID(gbrief363386) = {ID_gbrief363386})
  MATCH (gbrief363386)-[rel1:`interested_in`]->(node3:`Ginterest`)
  MATCH (node3)-[rel2:`interested`]->(node4:`Gelement`)
  MATCH (node4)<-[rel3:`element_tag`]-(node5:`GstoryTag`)
  MATCH (node5)-[rel4:`storytag`]->(s:`Gstory`)
  WHERE (s.published_date > {question_mark_param})
  RETURN s | {:question_mark_param=>Tue, 12 Nov 2019 21:14:45 UTC +00:00, :ID_gbrief363386=>363386}

But that is not good enough: if an interest has more than one element, a story matches that interest only when all of the interest's elements are among the story's tagged elements.

for example, one interest has elements A and C
storyA is tagged with A and B
storyB is tagged with A, B and C
only storyB should match that interest
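
The rule in this example is a subset test: every element of the interest has to be among the story's tagged elements. A minimal in-memory sketch in plain Ruby, with no Neo4j involved; the helper name `interest_matches?` and the hash layout are made up for illustration:

```ruby
require "set"

# Hypothetical helper: true only when every element of the interest
# appears among the story's tagged elements (a subset test).
def interest_matches?(interest_elements, story_elements)
  Set.new(interest_elements).subset?(Set.new(story_elements))
end

# Data from the example above.
interest = %w[A C]
stories = {
  "storyA" => %w[A B],
  "storyB" => %w[A B C]
}

matching = stories.select { |_name, tags| interest_matches?(interest, tags) }.keys
puts matching.inspect # prints ["storyB"]
```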

and I'm using this code:
newsletter.interests.query_as(:int)
  .match('(int)-[intelem:interested]->(elem:Gelement)')
  .with('int,collect(distinct elem) as ielements')
  .match('(elem)<-[rel3:element_tag]-(node5:GstoryTag),(node5)-[rel4:storytag]->(s:Gstory),(s)<-[rel5:storytag]-(node7:GstoryTag),(node7)-[rel6:element_tag]->(selem:Gelement)')
  .with('ielements, s, collect(distinct selem) as selements')
  .where('all(e in ielements where e in selements)')
  .where('s.published_date > ?', dt)
  .pluck(:s)

(it maps to the graph query below)

Gbrief#interests 
  MATCH (gbrief363386)
  WHERE (ID(gbrief363386) = {ID_gbrief363386})
  MATCH 
    (gbrief363386)-[rel1:`interested_in`]->(int:`Ginterest`), 
    (int)-[intelem:`interested`]->(elem:`Gelement`)
  WITH int,collect(distinct elem) as ielements
  MATCH (elem)<-[rel3:`element_tag`]-(node5:`GstoryTag`),(node5)-[rel4:`storytag`]->(s:`Gstory`),(s)<-[rel5:`storytag`]-(node7:`GstoryTag`),(node7)-[rel6:`element_tag`]->(selem:`Gelement`)
  WITH ielements, s, collect(distinct selem) as selements
  WHERE 
    (all(e in ielements where e in selements)) AND 
    (s.published_date > {question_mark_param})
  RETURN s | {:question_mark_param=>Tue, 12 Nov 2019 21:14:45 UTC +00:00, :ID_gbrief363386=>363386}

and this is really, really slow.
Any ideas on how I can improve this? I'm open to changing either the mapping or the query.

Thank you very much

yyyguy · February 21, 2020, 10:44pm

Hey @urubatan,

Welcome to the community.

As it relates to your issue, I am trying to make sense of the graph structure that you currently have.

**(Newsletter)-[:INTERESTED_IN]->(Interest)**
**(Interest)-[:HAS]->(Element)**
**(Element)-[:HAS]->(Tag)**
**(Story)-[:HAS]->(Tag)**

Is that about right?

I guess without some real data, it is a bit challenging to know the relevance of each node label.

For example, I understand what each of the node labels is for, except for the Element node label. It seems to me that you might be able to get by without Element nodes in your graph. Again, without understanding your data a bit better, that is one thing that stands out for me.

Is it possible to get a sample of your data (screenshot, etc.) to provide some better direction?

Let us know.

-yyyguy

urubatan · February 21, 2020, 11:01pm

my graph is:

**(Newsletter)-[:INTERESTED_IN]->(Interest)**
**(Interest)-[:INTERESTED]->(Element)** #because one interest might be one element or more, for example I might be interested in stories about Apple, or only stories about Apple that happened in China
**(Story)-[:TAGGED_WITH]->(Tag)**
**(Tag)-[:HAS_ONE]->(Element)**
**(Element)<-[:INTERESTED]-(Interest)**
**(Element)<-[:HAS_MANY]-(Tag)** # (the inverse of the other relationship)

Interest has a name property, that is used as a label in the newsletter
for example:

Newsletter#1
  Interest{name: 'Local News'} -[:INTERESTED]->(Element {name: 'Brazil'})
  Interest{name: 'Mercosul Tech'} -[:INTERESTED]->(Element {name: 'Latin America'}, Element{name: 'Technology'})

Story#1
  Tag-[]->(Element {name: 'Uruguai'})
Story#2
  Tag-[]->(Element {name: 'Technology'})
Story#3
  Tag-[]->(Element {name: 'Technology'})
  Tag-[]->(Element {name: 'Brazil'})
Story#4
  Tag-[]->(Element {name: 'Technology'})
  Tag-[]->(Element {name: 'Latin America'})
  Tag-[]->(Element {name: 'Uruguai'})

Story#3 will be in the "Local News" section of the newsletter
Story#4 will be in the "Mercosul Tech" section
Stories 1 and 2 will not be in the newsletter
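
To make the expected result concrete, here is a small plain-Ruby sketch that reproduces it from the example data. It is only an in-memory illustration, not the real model: the hash layout and the helper names `stories_by_element` and `newsletter_sections` are made up. It walks from each interest's elements to the stories tagged with them and keeps the stories that hit every element of the interest.

```ruby
# Example data from this post; the hash layout is an assumption for illustration.
interests = {
  "Local News"    => ["Brazil"],
  "Mercosul Tech" => ["Latin America", "Technology"]
}

story_tags = {
  "Story#1" => ["Uruguai"],
  "Story#2" => ["Technology"],
  "Story#3" => ["Technology", "Brazil"],
  "Story#4" => ["Technology", "Latin America", "Uruguai"]
}

# Inverted index: element name => stories tagged with that element.
def stories_by_element(story_tags)
  index = Hash.new { |hash, key| hash[key] = [] }
  story_tags.each do |story, elements|
    elements.uniq.each { |element| index[element] << story }
  end
  index
end

# For each interest, count how many of its elements each story hits and keep
# the stories whose hit count equals the number of elements in the interest.
def newsletter_sections(interests, story_tags)
  index = stories_by_element(story_tags)
  interests.transform_values do |elements|
    wanted = elements.uniq
    hits = Hash.new(0)
    wanted.each { |element| index[element].each { |story| hits[story] += 1 } }
    hits.select { |_story, count| count == wanted.size }.keys
  end
end

sections = newsletter_sections(interests, story_tags)
sections.each { |name, stories| puts "#{name}: #{stories.join(', ')}" }
```

Starting from the interest's own elements, instead of comparing every story against every interest, keeps the work proportional to the stories that share at least one element with the interest.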

this is a very simple example, but I think it is enough to show the problem

Thank you very much for any help on this

yyyguy · February 21, 2020, 11:20pm

Thanks for the example you provided. It appears that an Element node holds information relevant to the associated tag. Is this accurate?

It does seem to me that you are using relationships to indicate the cardinality between nodes. Neo4j does not need or use cardinality, so I would not recommend naming any relationship after it. A relationship type should express the intent of the relationship, not whether it is one-to-one, one-to-many, or many-to-one. You can easily find out whether a node has one or several relationships of a given type.

It seems like you could collapse the Element name into the Tag node if I am understanding what is intended with a tag.

Let me know if any of this makes sense to you.

Cheers,

-yyyguy

urubatan · February 22, 2020, 12:33am

I can remove the tag and associate the story directly with the element.
Thanks for pointing that out, I'll update the model.

I was only using the cardinality names to explain the associations.

My actual problem is matching all elements/tags when an interest has more than one element.

Any help is welcome
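
For the slow multi-element query earlier in the thread, one likely cause is variable scope: after `WITH int, collect(distinct elem) as ielements`, the name `elem` is no longer bound, so the following `MATCH (elem)<-[...]` starts from every node that has an `element_tag` relationship, and the date filter only runs after everything has been collected. Below is a hedged sketch of an alternative, kept as a Cypher string in Ruby. The labels and relationship types are copied from the generated queries above; the parameter names `$brief_id` and `$since` are made up, and the `$name` parameter syntax is used in place of the `{name}` form shown in the logs. It has not been run against the real data.

```ruby
# Sketch only, not a tested query. Labels and relationship types come from the
# generated Cypher in this thread; $brief_id and $since are hypothetical names.
stories_per_interest = <<~'CYPHER'
  MATCH (brief)-[:interested_in]->(int:Ginterest)-[:interested]->(elem:Gelement)
  WHERE ID(brief) = $brief_id
  WITH int, collect(DISTINCT elem) AS ielements
  // Re-bind elem from the collected list so the next MATCH starts from
  // this interest's elements rather than from every element in the graph.
  UNWIND ielements AS elem
  MATCH (elem)<-[:element_tag]-(:GstoryTag)-[:storytag]->(s:Gstory)
  WHERE s.published_date > $since
  // A story qualifies when it hits every element of the interest.
  WITH int, s, size(ielements) AS wanted, count(DISTINCT elem) AS matched
  WHERE matched = wanted
  RETURN int, s
CYPHER

puts stories_per_interest
```

The string can be passed to whatever raw-Cypher entry point the driver in use provides. Counting distinct matched elements per story avoids collecting every element of every candidate story, and filtering on `published_date` before the aggregation keeps old stories out of the count.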
