User->post schema design

Hello,

I just started using Neo4j and I have some doubts about how I should model a users->posts->comments schema. So far I have done something like this:

type User {
  uuid: ID!
  username: String
  posts: [Post] @relation(name: "HAS_POSTS", direction: "OUT")
  comments: [Comment] @relation(name: "POST_COMMENTS", direction: "OUT")
}

type Post {
  uuid: ID!
  text: String
  owner: User @relation(name: "HAS_POSTS", direction: "IN")
  comments: [Comment] @relation(name: "HAS_COMMENTS", direction: "OUT")
}

type Comment {
  uuid: ID!
  text: String
  owner: User @relation(name: "POST_COMMENTS", direction: "IN")
}

I am also saving the uuid of the referenced object: for example, each post has the owner's uuid as a property and it has the relationship too (the same goes for the comments), but I'm not 100% sure that this is correct. Reading this article:

I understand why using the relationship is better than the property. But if I want to edit a post and be sure that only its owner has permission to do that, I was thinking of matching the post by both the uuid of the post and the uuid of the user, and then setting the data on that particular node, something like this:

MATCH (p:Post) WHERE p.uuid = post.uuid AND p.owner = $cypherParams.user.uuid
      SET p += post
      RETURN p

Is this pattern good? Or is saving the owner property useless, and could it break the consistency of my data?

many thanks

You can do either, but following the relationship would be more efficient, as it wouldn't need a secondary index lookup.

Just adapt your statement slightly:

MATCH (p:Post)<-[:HAS_POSTS]-(u) WHERE p.uuid = post.uuid AND u.uuid = $cypherParams.user.uuid
      SET p += post
      RETURN p

OK, many thanks. Just one question: shouldn't it be something like this?

      MATCH (p:Post)<-[:HAS_POSTS]-(u:User { uuid: $cypherParams.user.uuid }) WHERE p.uuid = post.uuid
      SET p += post
      RETURN p

using the label too

Francesco

While technically both are correct and you'll get the same result, Michael's omission of the :User label is on purpose, as it should ensure the resulting query plan is ideal.

In Michael's query, the only possible index lookup is on :Post(uuid), since the :User label is missing. This will be a unique index lookup of the :Post with that uuid, and from there a single traversal of :HAS_POSTS since there should only be a single relationship of that type from a post to its creator. Then it would validate that the u node has that uuid.

If both labels are present, then the planner has more options on how to start and traverse, but they likely won't be more efficient. If it started at the u node with a lookup on :User(uuid), that would be a single unique lookup, but then it would have to expand all :HAS_POSTS relationships and either do an ExpandInto operation (if the planner decided to do a unique lookup of p by uuid first, so we would filter on the resulting node itself), or it would do an ExpandAll operation and filter based on the uuid property. Both of these approaches are far less efficient than the plan resulting from Michael's query, as they basically have to expand all posts of a user and filter (either on the node or, in the worst case, on all the resulting nodes' properties).
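
The unique lookups described above assume that uuid is backed by a uniqueness constraint on each label. A minimal sketch of how that could be declared, assuming the FOR ... REQUIRE syntax of Neo4j 4.4 and later (the constraint names post_uuid and user_uuid are made up for illustration):

```cypher
// Assumption: Neo4j 4.4+ syntax; the constraint names are hypothetical.
CREATE CONSTRAINT post_uuid IF NOT EXISTS
FOR (p:Post) REQUIRE p.uuid IS UNIQUE;

CREATE CONSTRAINT user_uuid IF NOT EXISTS
FOR (u:User) REQUIRE u.uuid IS UNIQUE;

// Older releases (3.x / early 4.x) express the same thing roughly as:
// CREATE CONSTRAINT ON (p:Post) ASSERT p.uuid IS UNIQUE;
```

A uniqueness constraint also creates the index that the planner uses for the lookup, so no separate index is needed for uuid.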

OK, understood... I probably need to study a bit more :sweat_smile:!
Actually, I was thinking that giving the query some more info would make it faster.

Many thanks to both of you
Francesco

It really depends on your modeling, and with time we'll make the planner even better. It is true that the planner has more options the more info is available in the query, and the planner does have access to statistics, including whether it's looking at indexes or unique constraints, as well as some relationship cardinality stats. You can always run an EXPLAIN of the query to check what the resulting query plan will be. It may in fact use the same sort of plan as Michael's from earlier.
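
For example, prefixing the earlier query with EXPLAIN shows the plan without running it. This is only a sketch: the two uuid values are placeholder literals, and the SET is left out since only the lookup is of interest.

```cypher
// EXPLAIN only plans the query; nothing is read or written.
// Both uuid values are placeholders for illustration.
EXPLAIN
MATCH (p:Post)<-[:HAS_POSTS]-(u)
WHERE p.uuid = 'post-uuid-placeholder'
  AND u.uuid = 'user-uuid-placeholder'
RETURN p
```

With a uniqueness constraint on :Post(uuid), a unique index seek on p followed by an expand and a filter on u would be the expected shape, though the exact operators can differ between versions.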

That said, if you know your model you may know some things about your graph that the planner does not. The key fact is that a :Post has only one :HAS_POSTS relationship to its :User, but a :User has many :HAS_POSTS relationships, one to each of its :Post nodes. Since we know that, and we know we have the uuids of both nodes, we know that the most efficient approach is to start at the :Post and expand (its single relationship) to its :User (and filter on that single node if needed) instead of the other way around, which can expand many more nodes and thus require much more filtering.

Compare the plans and see how it looks!
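
One way to make that comparison, sketched with placeholder uuids (made up here): run the labelled variant under PROFILE, then remove :User and run it again. Unlike EXPLAIN, PROFILE actually executes the query, so the SET is replaced by a plain RETURN to keep it read-only.

```cypher
// Variant with the :User label, so the planner may start from either node.
// Remove ":User" and re-run to get the plan for the label-free version.
// Both uuid values are placeholders for illustration.
PROFILE
MATCH (p:Post)<-[:HAS_POSTS]-(u:User {uuid: 'user-uuid-placeholder'})
WHERE p.uuid = 'post-uuid-placeholder'
RETURN p
```

The db hits and rows reported for each operator show which plan touches fewer nodes and relationships.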