I agree, the XML parsing is not intuitive. I modified your data to have multiple prices as shown in your example. Here is the test data. I extracted each attribute into a map that represents a single book. The query products a list of books.
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<prices>
<price1>44.95</price1>
<price2>77.95</price2>
</prices>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<prices>
<price1>144.95</price1>
<price2>177.95</price2>
</prices>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<prices>
<price1>244.95</price1>
<price2>277.95</price2>
</prices>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
</catalog>
Query:
call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children,
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices | x._text] as prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: prices
} as books
thank you so much for your valuable help, it works ! Just an other question: How could I access the two prices separatelly ( I tried a lot of combinations - no success ) . I can access them with [0] and [1] , but it is not secure way... I want to know, which one was price1 and price2.
Try this. I broke out each price and showed them separately.
call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children,
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices where x._type = 'price1' | x._text][0] as price1,
[x in book_prices where x._type = 'price2' | x._text][0] as price2,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: {price1: price1, price2: price2}
} as books
Ok, I think this is getting to become very convoluted. I would suggest you write a utility in java or other language in your skill set to extract the xml data into csv format for easier importing. That being said, I got something to work. It gets more complex the greater the hierarchy. Look at the price extraction logic.
call apoc.load.xml("file:///Sample2.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book.id as id,
[z in [y in [x in children where x._type = 'prices'| x._children][0] | y._children] | {
currency: [i in z where i._type = 'currency' | i._text][0],
price: [i in z where i._type = 'netprice' | i._text][0],
vat: [i in z where i._type = 'add' | i._children][0][0]._text
}] as book_prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: book_prices
} as books