Loading nested XML elements not working properly

rubinho101 · August 30, 2021, 6:04pm

I am trying to load an XML file with APOC and merge different nested elements into nodes. Unfortunately, when elements are nested, my query returns multiple rows per unique ID where I only want one. A different Cypher query only returns a type mismatch error.

XML file:

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <objects type="w">
    <spec>
      <id>some_id</id>
      <shortdesc>sd</shortdesc>
      <status>approved</status>
      <version>1</version>
      <sourcefile>source</sourcefile>
      <sourceline>1</sourceline>
      <description>description</description>
      <needs>
        <needso>f</needso>
      </needs>
      <provides>
        <provcov>
          <L2>some_random_number</L2>
          <dstv>1</dstv>
        </provcov>
        <provcov>
          <L2>some_random_number_1</L2>
          <dstv>1</dstv>
        </provcov>
      </provides>
    </spec>
  </objects>
</document>

Cypher query 1:

call apoc.load.xml("file:/import.xml",'/document/objects',{}, false) yield value as ids
unwind ids._children as RA
unwind RA._children as RA2
unwind RA2._children as RA3
return ids.type as Type,
[item in RA3._children where item._type = 'L2'|item._text] as L2,
[item in RA._children where item._type = 'id'|item._text] as id

As you can see in the results, there are two rows (nodes) for the same ID where I actually want only one. I know that unwind turns the elements into rows, but without it I get an error (see Cypher query 2 at the end for details).

Type	L2	id
sw	null	[some_id]
sw	[some_random_number]	[some_id]

Any idea how to get only one row (node) per parent (id)? Skipping the 'null' cells is not an option because some IDs legitimately have no L2 value.

Thank you in advance.

Cypher query 2:

call apoc.load.xml("file:/import.xml",'/document/objects',{}, true) yield value
unwind value as RA
unwind RA._objects as RA1
return RA.type as Type,
    [item in RA1._spec where item._type='id'|item._text] as ID,
    [item in RA1._spec._provides._provcov where item._type='L2'] as L2;

Result:

Type mismatch: expected a map but was List

Bennu · August 30, 2021, 11:21pm

Hi,

It's not really clear from your post what your expected rows are.

If the row you don't want is the one with null for L2, you can try:

call apoc.load.xml("file:/t.xml",'/document/objects',{}, false) yield value as ids
unwind ids._children as RA
unwind RA._children as RA2
unwind RA2._children as RA3
with *
where any(item in RA3._children where item._type = 'L2')
return ids.type as Type,
[item in RA3._children where item._type = 'L2'|item._text] as L2,
[item in RA._children where item._type = 'id'|item._text] as id

Bennu

rubinho101 · August 31, 2021, 9:40am

Thanks for your reply. My expected result would look something like this:

Type	L2	id
sw	null, [some_random_number, some_random_number_1, ...]	[some_id]

Basically, all duplicated rows for the same ID should be reduced and aggregated into one row so I can process them more consistently. Skipping the null rows by default is not an option because some elements have no L2 entry.
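
One possible way to get a single row per spec is to unwind only down to the spec level and gather the nested L2 values with list comprehensions and reduce instead of further unwinds. The following is a minimal sketch, not a tested solution: it assumes the non-simple mode structure used in query 1 (`_children`, `_type`, `_text`), and the variable names (`objects`, `spec`, `acc`, `inner`) are made up for illustration. The `coalesce` calls are a guard for elements that carry no `_children` key.

```cypher
// Sketch: one row per <spec>, with all nested <L2> values in one list.
// Assumes non-simple mode (last argument false), as in query 1.
CALL apoc.load.xml("file:/import.xml", '/document/objects', {}, false) YIELD value AS objects
// Unwind only once, so each <spec> stays a single row.
UNWIND objects._children AS spec
WITH objects, spec
WHERE spec._type = 'spec'
RETURN objects.type AS Type,
       // First (and presumably only) <id> text of this spec.
       head([c IN spec._children WHERE c._type = 'id' | c._text]) AS id,
       // Walk <provides> -> <provcov> -> <L2> without unwinding,
       // concatenating the L2 texts into one flat list.
       reduce(acc = [],
              p IN [c IN spec._children WHERE c._type = 'provides'] |
              acc + reduce(inner = [],
                           pc IN [x IN coalesce(p._children, []) WHERE x._type = 'provcov'] |
                           inner + [l IN coalesce(pc._children, []) WHERE l._type = 'L2' | l._text])) AS L2
```

With this shape, a spec that has no provcov entries should come back with an empty list for L2 rather than being dropped, so no rows need to be filtered out. If the rows are unwound as in query 1 instead, grouping by id and applying `collect()` to the L2 values is the other common way to fold them back into one row.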
