Finding duplicate nodes in subgraphs

I have a graph which contains the following path:

(:Book)-[:HAS_BOOK_ISSUE]->(:Book_Issue)-[:HAS_BOOK_PAGES]->(:Book_Page)

I would like to find duplicate (:Book_Page) nodes, but only within the same (:Book_Issue), because every issue of every book naturally has a page "1".

But I would like to check all (:Book)-nodes in one query.

Any idea?

THX, JJJ

Sorry, I don't understand where the duplication is.

Does a single Book_Issue have multiple HAS_BOOK_PAGES relationships to the same Book_Page node, so you want to find those Book_Page nodes that are related to the same Book_Issue? This scenario doesn't make sense to me.

The alternative is that a single book has multiple different issues, where some issues relate to the same page. If this is the case, I assume you want to know which pages have multiple issues.

The following should give you the results for the first scenario:

match (book:Book)-[:HAS_BOOK_ISSUE]->(issue:Book_Issue)-[:HAS_BOOK_PAGES]->(page:Book_Page)
with book, issue, page, count(page) as pageCount
where pageCount > 1
return *

The following should work for scenario 2:

match (book:Book)-[:HAS_BOOK_ISSUE]->(issue:Book_Issue)-[:HAS_BOOK_PAGES]->(page:Book_Page)
with book, page, count(page) as pageCount
where pageCount > 1
return page

Please provide a little more clarity if neither of these is the correct scenario. I feel I don't have a full understanding.

Dear Gary,

THX for your efforts. Let me clarify:

I am trying to find (:Book_Page) nodes which were mistakenly created in the DB, so the goal is to find these duplicates and eliminate them.

Of course each (:Book_Issue) should have only one (:Book_Page) with the property {page_no} = 1, but perhaps we (somehow) made a mistake and created a duplicate. Here is a figure with such a mistake:

Counting all (:Book_Page) nodes and returning those with a count > 1 (the usual way to find duplicates) does not work, because every (:Book_Issue) node has such a (:Book_Page), as you can see in the figure.

So I would like to obtain only those duplicates in a specific path (or subgraph).

So it's neither of your examples:

  • I'm not looking for multiple relationships
  • It's not that more than one (:Book_Issue) is related to the same (:Book_Page)

I hope this clarifies my challenge a bit.

THX a lot.

JJJ

In summary, are you looking for the pages related to the same book issue that have the same page number? The following works for that scenario.

data:

create(c:User{name:"User"})
create(ba:Book{title:"titleA"})
create(bb:Book{title:"titleB"})
create(c)-[:HAS_BOOK]->(ba)
create(c)-[:HAS_BOOK]->(bb)
create(i0:Book_Issue{name:"issue_0"})
create(i1:Book_Issue{name:"issue_1"})
create(i2:Book_Issue{name:"issue_2"})
create(p0:Book_Page{page:1})
create(p1:Book_Page{page:2})
create(p2:Book_Page{page:3})
create(p3:Book_Page{page:1})
create(p4:Book_Page{page:2})
create(p5:Book_Page{page:3})
create(p6:Book_Page{page:1})
create(p7:Book_Page{page:2})
create(p8:Book_Page{page:3})
create(p9:Book_Page{page:1})
create(ba)-[:HAS_BOOK_ISSUE]->(i0)
create(bb)-[:HAS_BOOK_ISSUE]->(i1)
create(bb)-[:HAS_BOOK_ISSUE]->(i2)
create(i0)-[:HAS_BOOK_PAGE]->(p0)
create(i0)-[:HAS_BOOK_PAGE]->(p1)
create(i0)-[:HAS_BOOK_PAGE]->(p2)
create(i1)-[:HAS_BOOK_PAGE]->(p3)
create(i1)-[:HAS_BOOK_PAGE]->(p4)
create(i1)-[:HAS_BOOK_PAGE]->(p5)
create(i2)-[:HAS_BOOK_PAGE]->(p6)
create(i2)-[:HAS_BOOK_PAGE]->(p7)
create(i2)-[:HAS_BOOK_PAGE]->(p8)
create(i2)-[:HAS_BOOK_PAGE]->(p9)

match(b:Book)-[:HAS_BOOK_ISSUE]->(i:Book_Issue)-[:HAS_BOOK_PAGE]->(p:Book_Page)
with b, i, p.page as page, collect(p) as pages
where size(pages) > 1
return b.title as bookTitle, i.name as issueDescription, page
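
Against the sample data above, the only duplicate group should be page 1 of issue_2 in titleB (nodes p6 and p9 both carry page: 1). If you want to inspect the duplicate nodes themselves rather than just the page number, a small variant can also return the collected nodes. This is only a sketch; the `copies` alias is a name made up for illustration:

```cypher
match (b:Book)-[:HAS_BOOK_ISSUE]->(i:Book_Issue)-[:HAS_BOOK_PAGE]->(p:Book_Page)
// group by book, issue and page number; collect the nodes in each group
with b, i, p.page as page, collect(p) as pages
where size(pages) > 1
// also return how many copies exist and the duplicate nodes themselves
return b.title as bookTitle, i.name as issueDescription, page, size(pages) as copies, pages
```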

Let me know if I missed the mark again.

Great, thank you very much!!!

The idea to collect them and count using "size" did not occur to me...

As a note, count would work as well.

match(b:Book)-[:HAS_BOOK_ISSUE]->(i:Book_Issue)-[:HAS_BOOK_PAGE]->(p:Book_Page)
with b, i, p.page as page, count(p) as pages
where pages > 1
return b.title as bookTitle, i.name as issueDescription, page
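
Since the stated goal is also to eliminate the duplicates, the same grouping can feed a delete. The following is only a sketch under assumptions the thread does not confirm: that the duplicate nodes in a group are interchangeable (which one survives is arbitrary), and that the sample schema above is used (relationship HAS_BOOK_PAGE, property page). The original graph appears to use HAS_BOOK_PAGES and page_no, so adjust the names accordingly, and run the find query first to check what would be removed.

```cypher
match (i:Book_Issue)-[:HAS_BOOK_PAGE]->(p:Book_Page)
// distinct guards against the same node appearing twice via parallel relationships
with i, p.page as page, collect(distinct p) as pages
where size(pages) > 1
// tail(pages) is every node in the group except the first, which is kept
unwind tail(pages) as duplicate
// detach delete also removes the relationships attached to each duplicate
detach delete duplicate
```

With the sample data this should remove exactly one of the two page-1 nodes under issue_2 and leave every other page untouched.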