Database design question

Noob question here. I've been playing around the Neo4j for a few days and I'm trying to wrap my head around database structure. As a test app I'm trying to figure out a good way to represent a peoples careers.

My thought was to create 3 nodes.

  • People
  • Company
  • Position

People -> HAVE_A -> Position
People -> WORK_AT -> Company
Company -> HAS -> Position -> AT -> Company

MY QUESTION
How should you include the time a person works within a Position (ie. Start date, end date)?

My first thought was to include a startDate and an endDate within the Position node, however, that means that I could have the same position multiple times for each person. So I'm not sure if Position should be unique or should I have unique positions and add in a Date node.

Very new to this, so please be kind if I'm missing something fundamental in my thought process.

1 Like

First of all welcome
I would suggest to create a person and company node and reflect the position in this case CEO in the relation between the two even you can add a date property to the relation as well
Rgds
Arthur

So you would create something like

People -> POSITION -> Company

1 Like

Yes correct; and you can add then different properties to the postion relation such as since; or role etc
Rgds
Arthur

I would not use relations for that kind of information, due to the simple fact that properties on relations can't be indexed. Rather go for an extra node in that case: (Person)-[:HAS_POSITION]->(Position)-[:POSITION_AT]->(Company)

2 Likes

Does this mean Position would not be unique and startDate/endDate would be properties within the Position node.

Depending on what you plan to use the data for, yes :)

Hi @blakepascoe ,
you can create an intermediate node to store the dates and you can store some related meta data too in this intermediate node and connect this intermediate node to position and company .
(:Person) - [RELATION]-> (I:IntermediateNode{all properties})
(I)-[:POSITION]-> (:Position)
(I)-[:WORK_AT]->(:Company)

Here the intemediate node wont be unique .. only one person connected to each intermediate node ... this will help if you have queries on startdate and enddate

Why can't we have startDate and endDate as the properties of HAVE_A

You can have startDate and endDate properties on the HAVE_A relationship. But as a previous poster (@Thomas_Silkjaer) was pointing out, if you do this you can't index the properties. Choosing to put the properties on the relationship is IMHO fine, but if you start doing very selective queries that are cutting down your dataset based on criteria dealing with relationship properties, this is where you'll miss the relationship property indexing and where performance may suffer.

  1. @blakepascoe , has not given requirement of querying on the startDate /endDate , so we can place them at the edge property. However we have one post on the index on relationship
    How can i use index in relationship?
  2. Else we can place them in the People Node and store them as Map., where key will be fixed (based on the positions available in the organization.

Can someone please help me out with this. In this situation, I have an array of objects that I'm parsing over to populate the database with. Within each object, there are experiences arrays. I keep getting the error. This might be a javascript error. I'm not 100% sure.

Unhandled promise rejection. This error originated either by throwing
inside of an async function without a catch block, or by rejecting a
promise which was not handled with .catch()

I think the error is related to the loop experiences.forEach. It's a loop within the loop. I'm not sure how I can overcome this.

// Create a person node
function addPerson(tx, name) {
  return tx.run("MERGE (a:Person {name: $name})", { name: name });
}

// Create a position node
async function addPosition(tx, name) {
  return tx.run("CREATE (a:Position {name: $name})", { name: name });
}

// Create an employment relationship to a pre-existing Person node.
// This relies on the person first having been created.
function addPositionConnection(tx, personName, positionTitle) {
  return tx.run(
    "MATCH (person:Person {name: $personName}) " + "MATCH (position:Position {name: $positionTitle}) " + "CREATE (person)-[:HAS_POSITION]->(position)",
    {
      personName: personName,
      positionTitle: positionTitle
    }
  );
}

(async function neo4jInsertData() {
  try {
    const profiles = [
      {
        userProfile: {
          fullName: "Test name 1"
        },
        experiences: [
          {
            title: "CEO"
          }
        ]
      },
      {
        userProfile: {
          fullName: "Test name 2"
        },
        experiences: [
          {
            title: "CTO"
          }
        ]
      }
    ];

    for (var x = 0; x < profiles.length; x++) {
      const name = profiles[x].userProfile.fullName;
      const experiences = profiles[x].experiences;

      const session1 = driver.session(neo4j.WRITE);

      const first = session1
        .writeTransaction(tx => addPerson(tx, name))
        .then(tx => {
          try {
            experiences.forEach((tx, experience, index) => {
              addPosition(tx, experience.title).then(() => addPositionConnection(tx, name, experience.title));
            });
          } catch (err) {
            console.log("err", err);
          }
        })
        .then(() => {
          console.log("closing");

          return session1.close();
        });
    }
  } catch (err) {
    console.log("there is an error", err);
  }
})();

In general, when you're using an API like forEach that doesn't do anything with the return value (or doesn't expect a promise as a return value), either:

  1. Don't pass it an async function, or
  2. Ensure that you handle errors within the function itself

Otherwise, you'll get unhandled errors if something goes wrong in the function.

Tries to use a normal for loop or for...of

When it comes to the design, you want to plan for efficient queries, and a well structured graph. There's a few things I would first point out:

  1. Label names: what kind of thing is this one node? Person rather than People.
  2. Relationships directions are the first method of efficient queries:
    • Many in, few out.
      • MATCH (:A)-[:X]->(:B) will fetch all out relationships and nodes, then filter out non :X matches, then filter out all non :B matches
    • Arrow from child to parent. (:Child)-[]->(:Parent)
    • When in doubt, arrow in the direction you'll be querying.
  3. As with all things, first carefully check that you're asking the right questions, and thinking about all the different things you're representing.

Company -> HAS -> Position -> AT -> Company
You're creating an unnecessary bi-directional relationship, the :HAS and :AT mean exactly the same thing. Following the Rules above, it can be simplified to Position -> AT -> Company.

What are you representing?

Positions in companies.
People in positions.
People in positions at companies.
People assigned to positions at companies.

If we follow the rules above:
Person ← Assignment → Position → Company

Set the person's start/end date, schedule, pay, etc. in an Assignment. Even availability dates in the Position. This could then be expanded to include titles, rolls, meetings, scheduling, etc. More, it makes your queries simple.

Available positions?

OPTIONAL MATCH (p:Position)<-[]-(a:Assignment)
WHERE a = null OR date(a.endDate) < date()
WITH p
MATCH (p)-[]->(c:Company)
RETURN p.name, c.name; 
1 Like

Thanks for this @tony.chiboucas. Some great explanations in here.