Hi everybody!
I'm new at Neo4j and i'm trying to learn it in order to start a project for my thesis, i spent a lot of time to manually reorganize a large dataset from xml to csv, then i imported a piece of this db inside Neo4j, i would avoid those duplicates but i canno't find a proper query, can you help me?
Thanks!
Please tell me if you need to know anything about this database or other stuff to undestand better the situation.
Hi @alessio9899,
Welcome to the Community!!
I guess post Creating Nodes using Create Statement your applied
CALL apoc.refactor.mergeNodes(nodes, {properties: "combine"}) YIELD node
If that is the case try something like
MATCH (p:Person)
WITH p ORDER BY p.created DESC // newest one first
WITH p.email, collect(p) as nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {name:'discard', age:'overwrite', kids:'combine', `addr.*`, 'overwrite',`.*`: 'discard'}}) YIELD node
RETURN node
given @ 15.3. Merge Nodes - Chapter 15. Graph Refactoring
If your already have that data in the DB then you can try split function on the property and take first index [0]
thanks you for answer!
i'll try to explaine you step by step what exactly i did:
- first of all i used this query to load from .CSV:
LOAD CSV WITH HEADERS FROM
"file:///C:/opere_prova.csv"
AS line FIELDTERMINATOR ';'
merge(o:Work{
shortTitle:COALESCE(line.ShortTitle,$defaultValue),
longTitle:COALESCE(line.LongTitle,$defaultValue),
academyName:COALESCE(line.ItacAcademyItem,$defaultValue),
academyId:COALESCE(toInteger(line.AcademyId),$defaultInteger),
format:COALESCE(line.Format,$defaultValue),
cityPublicationNameIt:COALESCE(line.CityPublicationPlaceItalianName,$defaultValue),
cityPublicationNameLat:COALESCE(line.CityPublicationPlaceLatinName,$defaultValue),
cityPublicationNameEng:COALESCE(line.CityPublicationPlaceEnglishName,$defaultValue),
cityPublicationPlaceFictitiousName:COALESCE(line.CityPublicationPlaceFictitiousName,$defaultValue),
cityId:COALESCE(toInteger(line.CityId),$defaultInteger),
publicationYear:COALESCE(line.PublicationYear,$defaultValue),
dedicationPlaceDate:COALESCE(line.DedicationPlaceDate,$defaultValue),
shelfmark:COALESCE(line.Shelfmark,$defaultValue),
content:COALESCE(line.Content,$defaultValue),
subjects:COALESCE(line.Subjects,$defaultValue),
language:COALESCE(line.Language,$defaultValue),
illustration:COALESCE(line.Illustration,$defaultValue),
pagination:COALESCE(line.Pagination,$defaultValue),
marginalia:COALESCE(line.Marginalia,$defaultValue),
authorInfo:COALESCE(line.AuthorsInfo,$defaultValue),
authorPersonId:COALESCE(toInteger(line.AuthorsPersonId),$defaultInteger),
edition:COALESCE(line.Edition,$defaultValue),
contributorInfo:COALESCE(line.ContributorsInfo,$defaultValue),
contributorPersonId:COALESCE(toInteger(line.ContributorsPersonId),$defaultInteger),
censorInfo:COALESCE(line.CensorsInfo,$defaultValue),
censorPersonId:COALESCE(toInteger(line.CensorsPersonId),$defaultInteger),
censorAgreement:COALESCE(line.CensorsAgreement,$defaultValue),
dedicateeInfo:COALESCE(line.DedicateesInfo,$defaultValue),
dedicateePersonId:COALESCE(toInteger(line.DedicateesPersonId),$defaultInteger),
editorInfo:COALESCE(line.EditorsInfo,$defaultValue),
editorPersonId:COALESCE(toInteger(line.EditorsPersonId),$defaultInteger),
artistInfo:COALESCE(line.ArtistsInfo,$defaultValue),
artistPersonId:COALESCE(toInteger(line.ArtistsPersonId),$defaultInteger),
illustratorInfo:COALESCE(line.IllustratorsInfo,$defaultValue),
illustratorPersonId:COALESCE(toInteger(line.IllustratorsPersonId),$defaultInteger),
printerInfo:COALESCE(line.PrintersInfo,$defaultValue),
printerPersonId:COALESCE(toInteger(line.PrintersPersonId),$defaultInteger),
printerOrnament:COALESCE(line.PrinterOrnament,$defaultValue),
engraverReference:COALESCE(line.EngraverReference,$defaultValue),
essayTitle:COALESCE(line.EssayTitle,$defaultValue),
essayContributorInfo:COALESCE(line.EssayContributorInfo,$defaultValue),
essayContributorPersonId:COALESCE(toInteger(line.EssayContributorPersonId),$defaultInteger),
citation:COALESCE(line.Citation,$defaultValue),
workId:COALESCE(toInteger(line.WorkId),$defaultInteger),
notes:COALESCE(line.Notes,$defaultValue)
})
RETURN o
-
i obtained this result: https://ibb.co/6mvYKgS
as you can see there is a lot of nodes while i just need 3 nodes in total, based on uniqueness of WorkId. -
then i executed this:
MATCH (o:Work)
WITH o.shortTitle AS title, collect(o) AS node2Merge
WITH node2Merge, extract(x IN node2Merge | x.dedicateePersonId) AS dedicateePersonId,
extract(x IN node2Merge | x.dedicateeInfo) AS dedicateeInfo,
extract(x IN node2Merge | x.authorPersonId) AS authorPersonId,
extract(x IN node2Merge | x.authorInfo) AS authorInfo,
extract(x IN node2Merge | x.censorPersonId) AS censorPersonId,
extract(x IN node2Merge | x.censorInfo) AS censorInfo,
extract(x IN node2Merge | x.contributorPersonId) AS contributorPersonId,
extract(x IN node2Merge | x.contributorInfo) AS contributorInfo,
extract(x IN node2Merge | x.editorPersonId) AS editorPersonId,
extract(x IN node2Merge | x.editorInfo) AS editorInfo,
extract(x IN node2Merge | x.artistPersonId) AS artistPersonId,
extract(x IN node2Merge | x.artistInfo) AS artistInfo,
extract(x IN node2Merge | x.illustratorPersonId) AS illustratorPersonId,
extract(x IN node2Merge | x.illustratorInfo) AS illustratorInfo,
extract(x IN node2Merge | x.printerPersonId) AS printerPersonId,
extract(x IN node2Merge | x.printerInfo) AS printerInfo,
extract(x IN node2Merge | x.essayContributorPersonId) AS essayContributorPersonId,
extract(x IN node2Merge | x.essayContributorInfo) AS essayContributorInfo,
extract(x IN node2Merge | x.essayTitle) AS essayTitle
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.dedicateePersonId = dedicateePersonId
SET node.dedicateeInfo = dedicateeInfo
SET node.authorPersonId = authorPersonId
SET node.authorInfo = authorInfo
SET node.contributorPersonId = contributorPersonId
SET node.contributorInfo = contributorInfo
SET node.censorPersonId = censorPersonId
SET node.censorInfo = censorInfo
SET node.editorPersonId = editorPersonId
SET node.editorInfo = editorInfo
SET node.artistPersonId = artistPersonId
SET node.artistInfo = artistInfo
SET node.illustratorPersonId = illustratorPersonId
SET node.illustratorInfo = illustratorInfo
SET node.printerPersonId = printerPersonId
SET node.printerInfo = printerInfo
SET node.essayContributorPersonId = essayContributorPersonId
SET node.essayContributorInfo = essayContributorInfo
SET node.essayTitle = essayTitle
- i obtained this:
https://ibb.co/DQCntDn
as you can see there are a lot of repetitions, not for every properties but for many of them
any help?