I just posted the first post in a series, Exploring the U.S. National Bridge Inventory with Neo4j, over on Medium. This first one is just a very brief intro to why I wanted to do this. Each successive post will cover the next step: importing the data, building the model, running analysis, and making a GRANDstack application. I gave a lightning talk at NODES 2019 on this project. I have been making adjustments since then and will elaborate on my process with each post.
Just posted Part 2 here.
Much more to come.
Out of curiosity, why store all those file/row details at all, when you could go straight to the representation you really care about?
I'm thinking I'll show you what I mean after you post Part 3.
That's a fair question. I made the decision to store the raw data like this in order to do some data comparisons and to make the importation easier.
With regard to the data comparisons, when I first started working with this data I was looking at both the delimited and non-delimited files. What I found initially (though admittedly not from a thorough exploration) was that the files didn't match as one would expect. I wanted to do a deeper dive to understand what the differences were. I figured if the data looked different between two files that should technically be the same, then it is possible the analysis could be different depending on which file you chose to use. I only have a basic query for that comparison. I may cover that in a later post.
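To give a rough idea of the kind of comparison I mean, here is a minimal Python sketch. It is not the query from the series, just an illustration under assumptions: the field names (`id`, `year_built`), the column positions in `layout`, and the sample rows are all made up and do not reflect the real NBI file layout.

```python
import csv
import io


def load_delimited(text, key):
    """Parse comma-delimited text with a header row into {key value: row dict}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key].strip(): {k: v.strip() for k, v in row.items()} for row in reader}


def load_fixed_width(text, layout, key):
    """Parse fixed-width text; layout maps field name -> (start, end) slice."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = {name: line[start:end].strip() for name, (start, end) in layout.items()}
        records[row[key]] = row
    return records


def compare(a, b):
    """Return keys only in a, keys only in b, and per-field differences for shared keys."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatches = {}
    for k in sorted(set(a) & set(b)):
        diffs = {f: (a[k].get(f), b[k].get(f)) for f in a[k] if a[k].get(f) != b[k].get(f)}
        if diffs:
            mismatches[k] = diffs
    return only_a, only_b, mismatches


# Hypothetical sample data: the same three-ish bridges in two file formats.
delimited_text = "id,year_built\nB001,1965\nB002,1978\nB003,1990\n"
fixed_text = "B0011965\nB0021987\nB0042001\n"
layout = {"id": (0, 4), "year_built": (4, 8)}  # assumed column positions

only_delimited, only_fixed, mismatches = compare(
    load_delimited(delimited_text, "id"),
    load_fixed_width(fixed_text, layout, "id"),
)
print(only_delimited, only_fixed, mismatches)
```

The three results separate records missing from one file or the other from records that exist in both but disagree on a field, which is the distinction that matters when deciding which file to trust.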
As for storing the raw data, this is something I picked up in my current job. My exploration of the data has only gone so far. That is one reason I wanted to do the series: to have people take the journey with me. Since I do not know the final schema, I need to be able to build quickly, and storing the raw data is just one solution for that.
Part 3 is up on Medium, if you are interested.