Help with ingesting an HTML table using apoc.load.html

I would like to dynamically create some product information by scraping a version table.

I'd like to create (:Esxiversion) nodes with the properties .version, .name, .releasedate, .buildnumber, and .installerbuild from
https://kb.vmware.com/s/article/2143832

I'm not exactly sure how to use the tags to consume the table data within the HTML. Any hints to get started would be helpful.


With "https://kb.vmware.com/s/article/2143832" as url
call apoc.load.html(url) yield value
return value

Hi, @pdrangeid

I believe you can't fetch information from that site, as it uses a Content Security Policy (CSP) that protects the site's content.

If you run the query

WITH "https://kb.vmware.com/s/article/2143832/" as url
CALL apoc.load.html(url,{target: 'meta'}) YIELD value
RETURN value

the response includes

    "http-equiv": "Content-Security-Policy"

I consulted andrea.larus on the neo4j-ninjas Slack channel, and he explained that the real issue seems to be that the page is generated by JavaScript at runtime.

If you enter view-source:https://kb.vmware.com/s/article/2143832 in Chrome's URL bar, you will see that the page source contains no table tag at all.
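
The same check can be sketched from Cypher. This is a minimal sketch, assuming apoc.load.html only sees the HTML the server sends (no JavaScript is run); the map keys tables and rows are names picked for illustration, and their values are CSS selectors. If the table is only built at runtime, both counts should come back as 0.

```cypher
// Count <table> and <tr> elements in the HTML as delivered by the server.
WITH "https://kb.vmware.com/s/article/2143832" AS url
CALL apoc.load.html(url, {tables: "table", rows: "table tr"}) YIELD value
// coalesce guards against a missing key if a selector matches nothing
RETURN size(coalesce(value.tables, [])) AS tableCount,
       size(coalesce(value.rows, [])) AS rowCount
```

On a page that ships the table in its static HTML, each entry in value.rows would be a map with the element's tagName, text, and attributes, which could then be turned into (:Esxiversion) nodes.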

I created an issue in the APOC repo: "apoc.load.html ability to read runtime structure of the page" (Issue #1372, neo4j-contrib/neo4j-apoc-procedures on GitHub).


Paul,

Thanks for the follow-up. I was looking at the page content with a colleague and had determined that it was dynamically delivered content, but I wasn't sure how that affected the ability to collect it with APOC.

Thanks for submitting the issue!