How do I create relationships based on conditions?

sean.c.townsend · July 1, 2020, 11:32am

I'm attempting to setup my first database with imported data from a CSV but I'm having an issue with creating some relationships. Initially, I had the spreadsheet set up like this:
ID Ingredient Country Price Inclusions Source
1 Something USA 10 4 Book 1,2
1 Other USA 35 29 Book 1,3
1 Something USA 12 9 Book 1,2,3

With my code, though, it not only created a node for each source book, it then created nodes like 'Book 1,2' which scrambled everything. There are multiple items that are present between books. I then tried making new columns to see if that would help,as shown here.

ID Ingredient Country Price Inclusions Source1 Source2 Source3
1 Something USA 10 4 Book 1 Book 2
2 Other USA 35 29 Book 1 Book 3
3 Something France 12 9 Book 1 Book 2 Book 3
4 Other Finland 70 16 Book 2 Book 3
5 Something USA 30 9 Book 2
6 Other Spain 44 192 Book 3

But each time a query has worked, it creates something like 15,000 connections with one node. After extensive trial and error, this is what gets me all the initial relationships to Book 1.

LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
CREATE (n:Ingredient {id: toInteger(csvLine.id), ingredient: csvLine.ingredient, price: toInteger(csvLine.price), inclusions: toInteger(csvLine.inclusions)})
MERGE (country:Country {name: csvLine.country})
CREATE (n)-[:MADE_IN]->(country)
MERGE (source:Source {name: csvLine.source})
CREATE (n)-[:SOURCED_FROM]->(source)

How can I set a conditional to create the additional relationships for existing books that ignores null values?

mdfrenchman · July 1, 2020, 2:07pm

Hi @sean.c.townsend,

In your first dataset, are you wanting source to have an Id or name with 1 or Book 1?
Really what you need to do is preformat and parse that field.

Also, you'll want to look at your graph model. price looks to be country AND ingredient dependent. So you don't want that as an ingredient property. Same for inclusions (although I can't even guess what that's related to).

LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
WITH csvLine, split(csvLine.Source,',') as sourceArray
UNWIND sourceArray as sourceName
MERGE (n:Ingredient {name: csvLine.Ingredient})
MERGE (country:Country {name: csvLine.country})
MERGE (n)-[:MADE_IN]->(country)
MERGE (source:Source {name: sourceName})
MERGE (n)-[:SOURCED_FROM]->(source)

Another approach, and a more efficient one, is to loop over the csv multiple times.
One loop for each relationship you create (so minimum two).
That will prevent the MADE_IN merge from being hit for each source on a row.

LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
MERGE (n:Ingredient {name: csvLine.Ingredient})
MERGE (country:Country {name: csvLine.country})
MERGE (n)-[:MADE_IN]->(country);

// Second cypherstatement, we already created the Ingredient so can do a straight match.
LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
WITH csvLine, split(csvLine.Source,',') as sourceArray
UNWIND sourceArray as sourceName
MATCH (n:Ingredient {name: csvLine.Ingredient})
MERGE (source:Source {name: sourceName})
MERGE (n)-[:SOURCED_FROM]->(source);

Have a good day!
Hope that helps!

-Mike

sean.c.townsend · July 1, 2020, 3:54pm

Thank you for the quick reply. To answer some of your questions, I was just using Book 1 as a placeholder; I'm using the names of the books as nodes, so Book 1 as a source would be "The Dead Rabbit Grocery & Grog", Book 2 is "Death & Co." (this is a mixology project). Second, inclusions is a category to show how many times an ingredient is used per book; so like, Angostura Bitters is used in 178 recipes, hence why I've made inclusions a property of the ingredient. But, Angostura bitters is used in all three books so it has to have relationships to all three nodes.

Here's a screenshot from my csv and you can see how things look.

mdfrenchman · July 1, 2020, 4:08pm

If you want to track something like inclusion that is dependent on both the ingredient and book and have the data at the per book level. Then you would want to store that property in the relationship. Based on your screenshot, that's not the data you have, so inclusion def belongs on the ingredient.

If you had data for inclusion per book. Then this would make more sense.

(:Ingredient)-[:INCLUDED_IN {timesMentioned}]->(:Source)

One other tip:
For your source, when you use split you'll want to make sure to take into account that a book title might contain a comma. That's a frequent gotcha when loading csv data and dealing with comma and splits.

Looks like an interesting project! Good luck!

-Mike

sean.c.townsend · July 1, 2020, 4:17pm

That's actually a big help, thank you. I do have the values for each book in separate spreadsheets, but didn't know how to properly do that sort of thing, so I added all values together and just tried to do the relationships. If I use

(:Ingredient)-[:INCLUDED_IN {timesMentioned}]->(:Source)

would I need to preface that with MERGE or CREATE?

Second, using the code you gave me:

// Second cypherstatement, we already created the Ingredient so can do a straight match.
LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
WITH csvLine, split(csvLine.Source,',') as sourceArray
UNWIND sourceArray as sourceName
MATCH (n:Ingredient {name: csvLine.Ingredient})
MERGE (source:Source {name: sourceName})
MERGE (n)-[:SOURCED_FROM]->(source);

It runs but returns with (no changes, no records). I tried this with and without [ ] around the sources.

sean.c.townsend · July 1, 2020, 4:58pm

Second edit (I feel like chat would be so much easier):

Running the following returns: Added 3 labels, set 3 properties, created 1590 relationships.

LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
WITH csvLine, split(csvLine.source,',') as sourceArray
UNWIND sourceArray as sourceName
MATCH (n:Ingredient)
MERGE (source:Source {name: sourceName})
MERGE (n)-[:SOURCED_FROM]->(source)

But here's where it seems...off. Looking at the table for the entries, each one has multiple "identity" entries for a single ingredient.

{
  "start": {
"identity": 1688,
"labels": [
      "Ingredient"
    ],
"properties": {
"ingredient": "Ardbeg 10-Year",
"id": 464,
"price": 50,
"inclusions": 1
    }
  },
  "end": {
"identity": 1756,
"labels": [
      "Source"
    ],
"properties": {
"name": "Dead Rabbit"
    }
  },
  "segments": [
    {
      "start": {
"identity": 1688,
"labels": [
          "Ingredient"
        ],
"properties": {
"ingredient": "Ardbeg 10-Year",
"id": 464,
"price": 50,
"inclusions": 1
        }
      },
      "relationship": {
"identity": 7883,
"start": 1688,
"end": 1756,
"type": "SOURCED_FROM",
"properties": {

        }
      },
      "end": {
"identity": 1756,
"labels": [
          "Source"
        ],
"properties": {
"name": "Dead Rabbit"
        }
      }
    }
  ],
  "length": 1.0
}

It's showing a relationship to "Dead Rabbit" twice but the spreadsheet has it listed as being from "PDT"; it's not in Dead Rabbit at all. I'm not sure how that's even occurring.

Edit: It appears that every single ingredient has a relationship with every source.

mdfrenchman · July 2, 2020, 11:07am

Hey @sean.c.townsend
Yes chat might be easier there is a slack channel for neo4j.

If you have already created the Ingredient and Source nodes.

LOAD CSV WITH HEADERS FROM "file:///CombinedTotals.csv" as csvLine
MATCH (n:Ingredient {name: csvLine.Ingredient})
MATCH (source:Source {name: csvLine.Source})
MERGE (n)-[:INCLUDED_IN {timesMentioned: csvLine.Mentions}]->(source)

(you could also convert these MATCHes to MERGEs and it work, it'd just be slightly slower).

Question 2:

Two things:
First you've got no property to match n:Ingredient to. So it's going to connect every sourceName to every Ingredient.

Second, it's very likely its also failing because of the comma gotcha. If your csv is using comma as the delimiter (which by default it usually is), then a comma separated list isn't going to play nice. Only the first book is going to show up as csvLine.source.

A good way to check is replace the lines after the UNWIND with RETURN csvLine, sourceName LIMIT 25 and see if it matches up with what you expect.

The csv gotcha is probably the best reason for leaving each row as a single Ingredient, mention, Source.

Hope that works out the issue, or at least gets you in the right direction

Mike

Topic		Replies	Views
Create relationship based on column value from CSV Cypher querying , cypher	3	697	July 13, 2021
Create relationships between nodes from different CSV Cypher	6	580	September 8, 2020
Creating relationships conditionally among nodes of same label Cypher cypher	7	8462	December 18, 2020
Create relationship based on column value from CSV Neo4j Graph Platform migrated	1	156	July 8, 2022
Import csv and create relations between product nodes Import / Export cypher , operations , relationship , import	6	361	September 2, 2021

July Summer Fun!

How do I create relationships based on conditions?

Related topics