Database design question

blakepascoe · December 23, 2019, 10:59pm

Noob question here. I've been playing around the Neo4j for a few days and I'm trying to wrap my head around database structure. As a test app I'm trying to figure out a good way to represent a peoples careers.

My thought was to create 3 nodes.

People
Company
Position

People -> HAVE_A -> Position
People -> WORK_AT -> Company
Company -> HAS -> Position -> AT -> Company

MY QUESTION
How should you include the time a person works within a Position (ie. Start date, end date)?

My first thought was to include a startDate and an endDate within the Position node, however, that means that I could have the same position multiple times for each person. So I'm not sure if Position should be unique or should I have unique positions and add in a Date node.

Very new to this, so please be kind if I'm missing something fundamental in my thought process.

arthurnamiasdecrasto · December 23, 2019, 11:29pm

First of all welcome
I would suggest to create a person and company node and reflect the position in this case CEO in the relation between the two even you can add a date property to the relation as well
Rgds
Arthur

blakepascoe · December 23, 2019, 11:42pm

So you would create something like

People -> POSITION -> Company

arthurnamiasdecrasto · December 24, 2019, 12:07am

Yes correct; and you can add then different properties to the postion relation such as since; or role etc
Rgds
Arthur

Thomas_Silkjaer · December 24, 2019, 12:10am

I would not use relations for that kind of information, due to the simple fact that properties on relations can't be indexed. Rather go for an extra node in that case: (Person)-[:HAS_POSITION]->(Position)-[:POSITION_AT]->(Company)

blakepascoe · December 24, 2019, 12:44am

Does this mean Position would not be unique and startDate/endDate would be properties within the Position node.

Thomas_Silkjaer · December 24, 2019, 8:38am

Depending on what you plan to use the data for, yes :)

ganesanmithun323 · December 24, 2019, 9:44am

Hi @blakepascoe ,
you can create an intermediate node to store the dates and you can store some related meta data too in this intermediate node and connect this intermediate node to position and company .
(:Person) - [RELATION]-> (I:IntermediateNode{all properties})
(I)-[:POSITION]-> (:Position)
(I)-[:WORK_AT]->(:Company)

Here the intemediate node wont be unique .. only one person connected to each intermediate node ... this will help if you have queries on startdate and enddate

intouch_vivek · December 24, 2019, 1:52pm

Why can't we have startDate and endDate as the properties of HAVE_A

david_allen · December 24, 2019, 2:23pm

You can have startDate and endDate properties on the HAVE_A relationship. But as a previous poster (@Thomas_Silkjaer) was pointing out, if you do this you can't index the properties. Choosing to put the properties on the relationship is IMHO fine, but if you start doing very selective queries that are cutting down your dataset based on criteria dealing with relationship properties, this is where you'll miss the relationship property indexing and where performance may suffer.

intouch_vivek · December 24, 2019, 2:45pm

@blakepascoe , has not given requirement of querying on the startDate /endDate , so we can place them at the edge property. However we have one post on the index on relationship
How can i use index in relationship? - #3 by rjpeters
Else we can place them in the People Node and store them as Map., where key will be fixed (based on the positions available in the organization.

blakepascoe · January 2, 2020, 6:00am

Can someone please help me out with this. In this situation, I have an array of objects that I'm parsing over to populate the database with. Within each object, there are experiences arrays. I keep getting the error. This might be a javascript error. I'm not 100% sure.

Unhandled promise rejection. This error originated either by throwing
inside of an async function without a catch block, or by rejecting a
promise which was not handled with .catch()

I think the error is related to the loop experiences.forEach. It's a loop within the loop. I'm not sure how I can overcome this.

// Create a person node
function addPerson(tx, name) {
  return tx.run("MERGE (a:Person {name: $name})", { name: name });
}

// Create a position node
async function addPosition(tx, name) {
  return tx.run("CREATE (a:Position {name: $name})", { name: name });
}

// Create an employment relationship to a pre-existing Person node.
// This relies on the person first having been created.
function addPositionConnection(tx, personName, positionTitle) {
  return tx.run(
    "MATCH (person:Person {name: $personName}) " + "MATCH (position:Position {name: $positionTitle}) " + "CREATE (person)-[:HAS_POSITION]->(position)",
    {
      personName: personName,
      positionTitle: positionTitle
    }
  );
}

(async function neo4jInsertData() {
  try {
    const profiles = [
      {
        userProfile: {
          fullName: "Test name 1"
        },
        experiences: [
          {
            title: "CEO"
          }
        ]
      },
      {
        userProfile: {
          fullName: "Test name 2"
        },
        experiences: [
          {
            title: "CTO"
          }
        ]
      }
    ];

    for (var x = 0; x < profiles.length; x++) {
      const name = profiles[x].userProfile.fullName;
      const experiences = profiles[x].experiences;

      const session1 = driver.session(neo4j.WRITE);

      const first = session1
        .writeTransaction(tx => addPerson(tx, name))
        .then(tx => {
          try {
            experiences.forEach((tx, experience, index) => {
              addPosition(tx, experience.title).then(() => addPositionConnection(tx, name, experience.title));
            });
          } catch (err) {
            console.log("err", err);
          }
        })
        .then(() => {
          console.log("closing");

          return session1.close();
        });
    }
  } catch (err) {
    console.log("there is an error", err);
  }
})();

cnhx27 · January 2, 2020, 10:51am

In general, when you're using an API like forEach that doesn't do anything with the return value (or doesn't expect a promise as a return value), either:

Don't pass it an async function, or
Ensure that you handle errors within the function itself

Otherwise, you'll get unhandled errors if something goes wrong in the function.

Tries to use a normal for loop or for...of

tony.chiboucas · January 6, 2020, 11:02pm

When it comes to the design, you want to plan for efficient queries, and a well structured graph. There's a few things I would first point out:

Label names: what kind of thing is this one node? Person rather than People.
Relationships directions are the first method of efficient queries:
- Many in, few out.
  - MATCH (:A)-[:X]->(:B) will fetch all out relationships and nodes, then filter out non :X matches, then filter out all non :B matches
- Arrow from child to parent. (:Child)-[]->(:Parent)
- When in doubt, arrow in the direction you'll be querying.
As with all things, first carefully check that you're asking the right questions, and thinking about all the different things you're representing.

Company -> HAS -> Position -> AT -> Company
You're creating an unnecessary bi-directional relationship, the :HAS and :AT mean exactly the same thing. Following the Rules above, it can be simplified to Position -> AT -> Company.

What are you representing?

Positions in companies.
People in positions.
People in positions at companies.
People assigned to positions at companies.

If we follow the rules above:
Person ← Assignment → Position → Company

Set the person's start/end date, schedule, pay, etc. in an Assignment. Even availability dates in the Position. This could then be expanded to include titles, rolls, meetings, scheduling, etc. More, it makes your queries simple.

Available positions?

OPTIONAL MATCH (p:Position)<-[]-(a:Assignment)
WHERE a = null OR date(a.endDate) < date()
WITH p
MATCH (p)-[]->(c:Company)
RETURN p.name, c.name;

blakepascoe · January 8, 2020, 2:47am

Thanks for this @tony.chiboucas. Some great explanations in here.

Topic		Replies	Views
Help with modeling customer - employer - employment details Modeling performance , knowledge-base	0	291	May 8, 2022
Graph Data Modeling Question Modeling performance , neo4j-desktop , modeling , data-modeling	12	1276	May 4, 2021
Use cases for node properties vs relationship properties Newbie Questions performance	4	600	January 8, 2021
Guide on data modeling Neo4j Graph Platform	1	898	May 3, 2019
Use node-data in relation Modeling	2	912	September 18, 2018

Database design question

What are you representing?

Available positions?

Related topics