Modelling 'University -> Department -> Program -> Course' hierarchical relationship

I come from the RDBMS world and am very new to graph databases. I have been struggling to model a Neo4j database involving Student, University, Department, Program and Course entities. Some real-world examples of these entities:

  • 'Carnegie Mellon University' has a 'Department of Computer Science' that offers a 'M.S in Human Assisted Information Systems' program that includes a course in 'Deep Learning'. Jane is a student of this program.

  • 'Massachusetts Institute of Technology' has a 'Department of Information Technology' that offers a 'M.S in Software Engineering' program that includes a course in 'Deep Learning'. Jack is a student of this program and is friends with Jane.

Suppose I start with a query like 'Show me all universities where friends of Jane are studying a course in Deep Learning'. This is what I tried:
Student - [:IS_STUDYING] -> University
University - [:HAS_DEPARTMENT] -> Department
Department - [:OFFERS] -> Program
Program - [:INCLUDES] -> Course

The challenge I am facing is that Departments, Programs and Courses at different universities can have the same names. Is it recommended to use an identifier property to differentiate the same programs (or courses) offered by different universities? What would be the best way to model this?

Welcome to the Neo4j community!

To address the unique identifier question: It's always good to have a unique identifier for specific nodes. We can use the course name as an example. I'd set an identifier that will identify a specific course at a specific university for a specific section in a specific year. Then you can include properties like year, semester, section number, and course name. You'd index whichever of the properties you want to search on, like course name. This way you can search by course name and return results from different years, semesters, and universities.
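As a rough sketch of the identifier idea, assuming Neo4j 5 syntax (older 4.x releases spell the constraint differently) and a hypothetical `courseId` property whose format I made up for illustration, the constraint and index could look something like this. The `course` label and `title` property follow the queries further down in this answer:

```cypher
// Assumption: courseId is a made-up identifier that encodes
// university, course, section and term.
CREATE CONSTRAINT course_id_unique IF NOT EXISTS
FOR (c:course) REQUIRE c.courseId IS UNIQUE;

// Index the property you search on; many offerings can share one title.
CREATE INDEX course_title IF NOT EXISTS
FOR (c:course) ON (c.title);

// MERGE on the identifier so loading the same offering twice does not duplicate it.
MERGE (c:course {courseId: 'cmu-dl-001-2024-fall'})
SET c.title = 'Deep Learning',
    c.year = 2024,
    c.semester = 'Fall',
    c.section = '001';
```

With this in place, two universities can both have a course titled 'Deep Learning' and they stay separate nodes, because only `courseId` has to be unique.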

When it comes to modeling the schema of the graph, a lot depends on what kinds of questions you want to ask and how you want to ask them.

Graph database queries are similar to those in relational databases, but different in important ways. The difference I think you'll find most helpful for this problem is to treat one node as your starting point and branch out from there along relationships, instead of the relational way of pulling data and then filtering it (this is explained in more detail in the Patterns section of the Cypher Manual).

For example, for your query 'show all universities where friends of Jane are studying a course in Deep Learning', you'd start by finding Jane, then move out from there along the edges/relationships. I'd set up the schema so I could do a query like this:

MATCH (jane:student {name: 'Jane'})-[:friend_of]-(friends:student)-[:taking_class]-(deep_learning:course {title: 'Deep Learning'})

The first node (jane) pulls the specific student you're interested in. The relationship [:friend_of] connects you to the nodes (friends) for all her friends. From there you continue the pattern via the relationship [:taking_class], which filters her friends down to just the ones taking a Deep Learning class (deep_learning).

From there you can continue the pattern using your schema to find the university name. Based on your example, it'd probably be something like:

...(deep_learning:course {title: 'Deep Learning'})-[:INCLUDES]-(program)-[:OFFERS]-(department)-[:HAS_DEPARTMENT]-(u:University)
RETURN DISTINCT u.name

Alternatively, you could go from the friends straight to their university:

MATCH (jane:student {name: 'Jane'})-[:friend_of]-(friends:student)-[:taking_class]-(deep_learning:course {title: 'Deep Learning'})
WITH friends
MATCH (friends)-[:IS_STUDYING]-(u:University)
RETURN DISTINCT u.name
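To try this end to end, here is a minimal sketch that loads the two examples from your question and runs the longer path query in one piece. The `name` and `title` properties, the `Department` and `Program` labels, and the relationship directions are my assumptions; adjust them to your own model:

```cypher
// Sample data taken from the question. Each university gets its own
// 'Deep Learning' course node, so same-named courses stay distinct.
CREATE (cmu:University {name: 'Carnegie Mellon University'}),
       (mit:University {name: 'Massachusetts Institute of Technology'}),
       (cs:Department {name: 'Department of Computer Science'}),
       (it:Department {name: 'Department of Information Technology'}),
       (hais:Program {name: 'M.S in Human Assisted Information Systems'}),
       (se:Program {name: 'M.S in Software Engineering'}),
       (dlCmu:course {title: 'Deep Learning'}),
       (dlMit:course {title: 'Deep Learning'}),
       (jane:student {name: 'Jane'}),
       (jack:student {name: 'Jack'}),
       (cmu)-[:HAS_DEPARTMENT]->(cs)-[:OFFERS]->(hais)-[:INCLUDES]->(dlCmu),
       (mit)-[:HAS_DEPARTMENT]->(it)-[:OFFERS]->(se)-[:INCLUDES]->(dlMit),
       (jane)-[:taking_class]->(dlCmu),
       (jack)-[:taking_class]->(dlMit),
       (jane)-[:IS_STUDYING]->(cmu),
       (jack)-[:IS_STUDYING]->(mit),
       (jack)-[:friend_of]->(jane);

// Start at Jane, step to her friends, keep the ones taking Deep Learning,
// then walk back up the hierarchy to the university offering that course.
MATCH (jane:student {name: 'Jane'})-[:friend_of]-(friend:student)
      -[:taking_class]-(:course {title: 'Deep Learning'})
      -[:INCLUDES]-(program)-[:OFFERS]-(department)
      -[:HAS_DEPARTMENT]-(u:University)
RETURN DISTINCT u.name AS university;
```

On this sample data the query returns a single row, 'Massachusetts Institute of Technology'. Walking up through the course ties the result to the university that actually offers the class, whereas the [:IS_STUDYING] shortcut relies on each student being linked directly to a university.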