Hey, I have a data set with ~1 million nodes and ~1 million relationships. My goal is to return all of the data using Neo4j and Cypher, but I get a lot of redundancy in the result. My data set contains Posts, Authors, Comments and InLinks. The relationships are:
- Post ->WRITTEN_BY->Author;
- Comment ->POSTED_ON->Post;
- InLinks ->LINKS->Post.
My Cypher query is this:
MATCH (p:Posts)-[:WRITTEN_BY]->(author),
      (c:Comments)-[:POSTED_ON]->(p),
      (i:InLinks)-[:LINKS]->(p)
RETURN p AS Post, c AS Comment, i AS Inlink, author
LIMIT 100
and I get an output like:
Post     Comment     InLink     Author
PostID1  CommentID1  InLinkID1  AuthorID1  (for example)
PostID1  CommentID1  InLinkID2  AuthorID1  (for example)
PostID1  CommentID2  InLinkID3  AuthorID1  (for example)
PostID1  CommentID2  InLinkID4  AuthorID1  (for example)
PostID1  CommentID1  InLinkID5  AuthorID1  (for example)
As you can see, for the same Post and Author but different InLinks and Comments, I'm getting redundancy, i.e. data that is repeated unnecessarily: every Comment on a Post is paired with every InLink to that Post. What I'm trying to achieve is:
Post     Comment     InLink     Author
PostID1  CommentID1  InLinkID1  AuthorID1  (for example)
                     InLinkID2  AuthorID1  (for example)
                     InLinkID5  AuthorID1  (for example)
-------------------------------------------------------------------------
PostID1  CommentID2  InLinkID3  AuthorID1  (for example)
                     InLinkID2  AuthorID4  (for example)
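For reference, here is a rough sketch of the direction I think this could go, not a confirmed solution. It assumes the same labels and relationship types as my query above and uses collect() to fold the Comments and InLinks of each Post into lists, so each Post appears on one row instead of once per Comment/InLink combination. The aliases comments, Comments and InLinks are just names I made up for illustration. Note that it gives one row per Post rather than the exact layout above, and that OPTIONAL MATCH also keeps Posts that have no Comments or no InLinks, which my original query would drop.

// Sketch only: one row per Post, with its Comments and InLinks as lists.
MATCH (p:Posts)-[:WRITTEN_BY]->(author)
// Collect the comments first, so they are already a single list
// before the inlinks are matched (avoids the Comment x InLink blow-up).
OPTIONAL MATCH (c:Comments)-[:POSTED_ON]->(p)
WITH p, author, collect(c) AS comments
OPTIONAL MATCH (i:InLinks)-[:LINKS]->(p)
RETURN p AS Post, author AS Author, comments AS Comments, collect(i) AS InLinks
LIMIT 100

Since collect() skips null values, a Post without Comments or InLinks should simply get an empty list in that column.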
Any tips on how to do this?
Thank you