Finding duplicate nodes in subgraphs

janezic · January 27, 2024, 3:08pm

I have a graph which contains the following path:

(:Book)-[:HAS_BOOK_ISSUE]->(:Book_Issue)-[:HAS_BOOK_PAGES]->(:Book_Page)

I would like to find duplicate (:Book_Page) nodes, BUT only within each specific (:Book_Issue), because of course every book (and every issue of a book) has a page "1".

But I would like to check all (:Book)-nodes in one query.

Any idea?

THX, JJJ

glilienfield · January 27, 2024, 4:59pm

Sorry, I don't understand where the duplication is.

Does a single Book_Issue have multiple HAS_BOOK_PAGES relationships to the same Book_Page node, so you want to find those Book_Page nodes that are related to the same Book_Issue? This scenario doesn't make sense to me.

The alternative is that a single book has multiple different issues, where some issues are related to the same page. If this is the case, I assume you want to know which pages have multiple issues.

The following should give you the results for the first scenario:

match (book:Book)-[:HAS_BOOK_ISSUE]->(issue:Book_Issue)-[:HAS_BOOK_PAGES]->(page:Book_Page)
with book, issue, page, count(page) as pageCount
where pageCount > 1
return *
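To make the grouping in the scenario 1 query concrete, here is a small Python model (not Cypher, and not tied to any Neo4j driver) of what "with book, issue, page, count(page)" computes. It assumes each matched row is a hypothetical (book, issue, page node) tuple, with one row per HAS_BOOK_PAGES relationship, so a count above 1 can only come from several relationships between the same issue and the same page node.

```python
from collections import Counter

# Hypothetical match rows: one (book, issue, page_node_id) tuple per
# HAS_BOOK_PAGES relationship traversed by the MATCH pattern.
rows = [
    ("titleA", "issue_0", "p0"),
    ("titleA", "issue_0", "p1"),
    ("titleA", "issue_0", "p1"),  # a second relationship to the same page node
]

# Models: WITH book, issue, page, count(page) AS pageCount
page_counts = Counter(rows)

# Models: WHERE pageCount > 1
duplicates = {key: n for key, n in page_counts.items() if n > 1}
print(duplicates)  # {('titleA', 'issue_0', 'p1'): 2}
```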

The following should work for scenario 2:

match (book:Book)-[:HAS_BOOK_ISSUE]->(issue:Book_Issue)-[:HAS_BOOK_PAGES]->(page:Book_Page)
with book, page, count(page) as pageCount
where pageCount > 1
return page
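For scenario 2, the issue is dropped from the grouping key, so rows coming from different issues of the same book collapse into one group per (book, page node). A minimal Python sketch of that grouping, using hypothetical rows rather than a real query result:

```python
from collections import Counter

# Hypothetical match rows: (book, issue, page_node_id)
rows = [
    ("titleB", "issue_1", "p3"),
    ("titleB", "issue_2", "p3"),  # a second issue of the same book shares page node p3
    ("titleB", "issue_2", "p7"),
]

# Models: WITH book, page, count(page) AS pageCount
# The issue is not part of the key, so rows from different issues merge.
page_counts = Counter((book, page) for book, _issue, page in rows)

# Models: WHERE pageCount > 1 RETURN page
shared_pages = [page for (_book, page), n in page_counts.items() if n > 1]
print(shared_pages)  # ['p3']
```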

Please provide a little more clarity if neither of these is the correct scenario. I feel I don't have a full understanding.

janezic · January 28, 2024, 7:17am

Dear Gary,

THX for your efforts. Let me clarify:

I am trying to find (:Book_Page) nodes that were mistakenly created in the DB; the goal is to find these duplicates and eliminate them.

Of course each (:Book_Issue) should have only one (:Book_Page) with the property {page_no} = 1, but perhaps we (somehow) made a mistake and created a duplicate. Here is a figure showing such a mistake:

Finding all (:Book_Page) nodes, counting them, and returning all those with a count > 1 (the usual way to find duplicates) does not work, because every (:Book_Issue) node legitimately has such a (:Book_Page), as you can see in the figure.

So I would like to find only the duplicates within a specific path (or subgraph).

So it's neither of your examples:

I'm not looking for multiple relationships
It's not that more than one (:Book_Issue) is related to the same (:Book_Page)

I hope this clarifies my challenge a bit.

THX a lot.

JJJ
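A minimal Python sketch (hypothetical data, not a Neo4j query) of why a plain duplicate count fails here and why the count has to be scoped to the issue: counting by page number alone flags every page number, while counting (issue, page number) pairs flags only the page that was created twice.

```python
from collections import Counter

# Hypothetical (issue, page_no) pairs: every issue legitimately has a page 1,
# but issue_2 was given a second page 1 by mistake.
pages = [
    ("issue_0", 1), ("issue_0", 2),
    ("issue_1", 1), ("issue_1", 2),
    ("issue_2", 1), ("issue_2", 2), ("issue_2", 1),
]

# Counting by page number alone: every page number looks duplicated.
global_counts = Counter(page_no for _issue, page_no in pages)
globally_flagged = sorted(n for n, c in global_counts.items() if c > 1)
print(globally_flagged)  # [1, 2]

# Scoping the count to the issue: only the real duplicate remains.
scoped_counts = Counter(pages)
scoped_flagged = [key for key, c in scoped_counts.items() if c > 1]
print(scoped_flagged)  # [('issue_2', 1)]
```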

glilienfield · January 28, 2024, 12:59pm

In summary, are you looking for pages related to the same book issue that have the same page number? This works in that scenario:

data:

create(c:User{name:"User"})
create(ba:Book{title:"titleA"})
create(bb:Book{title:"titleB"})
create(c)-[:HAS_BOOK]->(ba)
create(c)-[:HAS_BOOK]->(bb)
create(i0:Book_Issue{name:"issue_0"})
create(i1:Book_Issue{name:"issue_1"})
create(i2:Book_Issue{name:"issue_2"})
create(p0:Book_Page{page:1})
create(p1:Book_Page{page:2})
create(p2:Book_Page{page:3})
create(p3:Book_Page{page:1})
create(p4:Book_Page{page:2})
create(p5:Book_Page{page:3})
create(p6:Book_Page{page:1})
create(p7:Book_Page{page:2})
create(p8:Book_Page{page:3})
create(p9:Book_Page{page:1})
create(ba)-[:HAS_BOOK_ISSUE]->(i0)
create(bb)-[:HAS_BOOK_ISSUE]->(i1)
create(bb)-[:HAS_BOOK_ISSUE]->(i2)
create(i0)-[:HAS_BOOK_PAGE]->(p0)
create(i0)-[:HAS_BOOK_PAGE]->(p1)
create(i0)-[:HAS_BOOK_PAGE]->(p2)
create(i1)-[:HAS_BOOK_PAGE]->(p3)
create(i1)-[:HAS_BOOK_PAGE]->(p4)
create(i1)-[:HAS_BOOK_PAGE]->(p5)
create(i2)-[:HAS_BOOK_PAGE]->(p6)
create(i2)-[:HAS_BOOK_PAGE]->(p7)
create(i2)-[:HAS_BOOK_PAGE]->(p8)
create(i2)-[:HAS_BOOK_PAGE]->(p9)

match(b:Book)-[:HAS_BOOK_ISSUE]->(i:Book_Issue)-[:HAS_BOOK_PAGE]->(p:Book_Page)
with b, i, p.page as page, collect(p) as pages
where size(pages) > 1
return b.title as bookTitle, i.name as issueDescription, page
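The query above relies on Cypher's implicit grouping: the non-aggregated expressions in the WITH clause (b, i, p.page) form the grouping key, and collect(p) gathers the page nodes in each group. The following Python model replays that on the sample data from the CREATE statements. It is only an illustration; book titles and issue names stand in for node identity, which works here because they are unique in the sample.

```python
from collections import defaultdict

# Sample data from the CREATE statements: issue -> book title, plus one
# (issue, page node, page number) tuple per HAS_BOOK_PAGE relationship.
issue_book = {"issue_0": "titleA", "issue_1": "titleB", "issue_2": "titleB"}
has_page = [
    ("issue_0", "p0", 1), ("issue_0", "p1", 2), ("issue_0", "p2", 3),
    ("issue_1", "p3", 1), ("issue_1", "p4", 2), ("issue_1", "p5", 3),
    ("issue_2", "p6", 1), ("issue_2", "p7", 2), ("issue_2", "p8", 3),
    ("issue_2", "p9", 1),
]

# Models: WITH b, i, p.page AS page, collect(p) AS pages
# (b, i, p.page) is the grouping key; collect() builds the list per group.
groups = defaultdict(list)
for issue, node, page_no in has_page:
    groups[(issue_book[issue], issue, page_no)].append(node)

# Models: WHERE size(pages) > 1 RETURN bookTitle, issueDescription, page
result = [key for key, nodes in groups.items() if len(nodes) > 1]
print(result)  # [('titleB', 'issue_2', 1)]
```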

Let me know if I missed the mark again.

janezic · January 28, 2024, 2:15pm

Great, thank you very much!!!

The idea of collecting them and counting with "size" did not occur to me...

glilienfield · January 28, 2024, 11:18pm

As a note, count would work as well.

match(b:Book)-[:HAS_BOOK_ISSUE]->(i:Book_Issue)-[:HAS_BOOK_PAGE]->(p:Book_Page)
with b, i, p.page as page, count(p) as pages
where pages > 1
return b.title as bookTitle, i.name as issueDescription, page
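Since the original goal was to eliminate the duplicates, note one difference between the two variants: count(p) only returns how many pages share a number, whereas collect(p) returns the nodes themselves, which is what a cleanup needs. The Python sketch below (nodes_to_delete is a hypothetical helper, not part of any library) keeps the first page in each (issue, page number) group and lists the rest for deletion; in Cypher the same idea would be slicing the collected list, for example UNWIND pages[1..] AS dup followed by DETACH DELETE dup. Which copy survives is arbitrary in this sketch, so check that the duplicates really are identical (including their relationships) before deleting anything.

```python
from collections import defaultdict

def nodes_to_delete(has_page):
    """Hypothetical helper: given (issue, page_node, page_no) rows, return the
    page nodes that repeat an earlier page number within the same issue.

    The first node seen in each (issue, page_no) group is kept; which copy
    survives is an arbitrary choice in this sketch.
    """
    groups = defaultdict(list)
    for issue, node, page_no in has_page:
        groups[(issue, page_no)].append(node)
    # Everything after the first node in each group, like pages[1..] in Cypher.
    return [node for nodes in groups.values() for node in nodes[1:]]

rows = [("issue_2", "p6", 1), ("issue_2", "p7", 2), ("issue_2", "p9", 1)]
print(nodes_to_delete(rows))  # ['p9']
```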
