Load CSV - Loading relationships in one column, but same type of node

Hello,

I am working through understanding the Load CSV options available in the Neo4j Desktop. I have been able to load most of the CSV information that I have in the file, but I currently have a situation I've not been able to figure out. I have the IDs of all my product nodes in my first column of the CSV. Most of the next columns represent properties on those product nodes. However, in one column, I have the ID of another product node that has a relationship with that row's product node.

With my basic cypher understanding, it's essentially something like this:

Match (P from first column)
Match (P from other column)
Merge (P with ID in first column) <-[rel: attached_to]- (P with ID in another column)
Return count(rel)

The documentation I've read has helped me to create mass relationships between product nodes and other types of nodes, like merchant nodes, but not product nodes and other product nodes on the same row but in a different column. I could do it one by one manually, but given I have 1000s of nodes, doing a mass CSV upload would be much easier.

Thoughts? Sorry, I'm new to this section and there is likely a very easy explanation. I'd appreciate any help you might have!

Hi Isaac,

welcome to the Neo4j community!

I think the easiest solution here is to

  1. read in all properties of one line into a product node (even the id of another product it is connected with)
  2. Match on products that have a connected product and connect them.

Something like

LOAD CSV WITH HEADERS FROM 'file:///products.csv' as row
CREATE (p:Product {id:row.id, name: row.name, idConnected: row.idConnected})

and

MATCH (p1:Product), (p2:Product {id:p1.idConnected}) 
CREATE (p1)<-[:ATTACHED_TO]-(p2)

This also has a further advantage. You do not have to make sure that the ids in your other column only relate to products that have already been loaded, e.g. what if the product with id 1 is attached to product with id 2. At the moment that you load the product with id 1, the product with id 2 is not yet in your database and hence no connection is created between the two. That way you would loose data.

Regards,
Elena

Hi Elena,

So, here's what I've got at this point. Not sure where to go from here:

LOAD CSV WITH HEADERS FROM 'file:///simplified.csv' AS row
WITH toInteger(row.id) AS id, toInteger(row.offers) as offers
MATCH (p1:Product {id:id})
MATCH (p2:Product {id:p1.offers})
MERGE (p1)<-[rel:OFFERED_TO]-(p2)
RETURN count(rel)

Please post some consecutive records from your .csv file so that I can offer some help. Please include column headers.

You may need to call toInteger() function on numeric ID fields that you are importing.

CSV files treat all column values as string. Syntactically with or without toInteger is always correct. Only difference is that using toInteger function explicitly stores the value as integer.

Hi there,

Thank you for the information. I was using the toInteger function according to how I understood it. I ended up changing up the naming convention on the "id" column to be "productId" so I could differentiate between the id I assigned and id that NEO4j natively assigns the nodes. I re-ran the script an all the sudden it worked as I had written it.... I'll be honest, I don't know why it is working now and it wasn't working previously. Thank you both for your input. Here is the script that ended up doing what I needed it to do:

LOAD CSV WITH HEADERS FROM 'file:///simplified.csv' AS row

WITH toInteger(row.productId) AS productId, toInteger(row.offers) as offers

MATCH (p1:Product {productId:productId})

MATCH (p2:Product {productId:p1.offers})

MERGE (p1)<-[rel:OFFERED_TO]-(p2)

RETURN count(rel)

Glad to hear that! All is well that ends well!!