Hi, I am a newbie to Neo4j and am stuck on the following problem.
My dataset is as follows:
Object, Size, Color
A,200,White
A,300,Black
A,200,Pink
B,300,White
B,300,Black
The expected output is two nodes, A and B, each with two properties, Size and Color, but no duplicate sizes should be displayed. For example, node A has size 200 in both White and Pink, but the Size property should show 200 only once, i.e. (200,300), and the Color property should show (White, Black, Pink). Can somebody please help me with the Cypher code?
How have you modelled this problem? Or is that actually the question?
Are you planning to create an instance of each object, with a property/tag that gives you the type, and the specific size/color combinations stored on each of them? Or dense Object nodes with HAS_SIZE and HAS_COLOR relationships to other dense nodes?
Hi bennu - thanks for your reply, but I want to write a generic query. I have a spreadsheet with thousands of rows like this. I wrote the following code, but the problem is that it keeps appending each new value regardless of whether it is already present, so A gets the size property (200,300,200) instead of the expected (200,300).
LOAD CSV WITH HEADERS FROM "file:///myfile.csv" AS nodeRecord
MERGE (n:Object { object: nodeRecord.object })
ON CREATE SET n.size = [ nodeRecord.size ]
ON CREATE SET n.color = [ nodeRecord.color ]
ON MATCH SET n.size = n.size + [ nodeRecord.size ]
ON MATCH SET n.color = n.color + [ nodeRecord.color ]
Hi Bennu, I am sorry, I don't know much about using APOC. I am just doing a small PoC to learn more about Neo4j and Cypher, and my dataset has the values I mentioned in my original post.
I don't see anything wrong with your query that would cause duplicate A and B nodes. I ran it locally and got a single A node and a single B node, each with the aggregated data.
As a note, the query creates size and color lists that contain duplicate values because every row's values are appended to the lists, whether or not they are already there.
You could use a query like the following to avoid this. It processes the entire import file first to calculate the aggregates of distinct values, then creates the nodes. Note that this requires more memory if you have huge files.
LOAD CSV WITH HEADERS FROM "file:///myfile.csv" AS nodeRecord
WITH nodeRecord.object AS object, collect(DISTINCT nodeRecord.size) AS size, collect(DISTINCT nodeRecord.color) AS color
MERGE (n:Object { object: object }) SET n.size = size, n.color = color
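If aggregating the whole file in one pass takes too much memory, another option is to keep the row-by-row MERGE from the original query and append a value only when it is not already in the list. This is only a sketch and I have not run it against your data. It assumes the header row of the file uses the lowercase names object, size and color, as the queries in this thread do; map keys are case-sensitive, so a header of "Object" would have to be read as nodeRecord.Object.

```cypher
// Row-by-row variant: append a value only if the list does not contain it yet.
// Assumes CSV headers named exactly: object, size, color (lowercase).
LOAD CSV WITH HEADERS FROM "file:///myfile.csv" AS nodeRecord
MERGE (n:Object { object: nodeRecord.object })
// First time this object is seen: start each list with the row's value.
ON CREATE SET
  n.size  = [ nodeRecord.size ],
  n.color = [ nodeRecord.color ]
// Object already exists: keep the list as-is when the value is present,
// otherwise append the new value.
ON MATCH SET
  n.size  = CASE WHEN nodeRecord.size IN n.size
                 THEN n.size
                 ELSE n.size + [ nodeRecord.size ] END,
  n.color = CASE WHEN nodeRecord.color IN n.color
                 THEN n.color
                 ELSE n.color + [ nodeRecord.color ] END
```

Keep in mind that LOAD CSV reads every field as a string, so the sizes end up stored as "200" and "300". If you want numbers instead, you could wrap the value in toInteger(...) everywhere it is used.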