Best Graph model for travel-price-table: [Provider - Geo - Plan - Age - Duration - Price]

marvin-hansen · December 20, 2019, 1:58pm

Hey, I am modeling various insurance products across different domains and so far modeling these as graphs were way easier than doing so as an ER model.

However, for the travel domain, I got over 550 different prices in a long Excel sheet that follow a certain standard format:

The goal is to find for a given travel of a given length, to a given destination region the plan with the lowest price.

Conventionally, this is just a look-up from a data table and then just sort the results by price.

In a graph, well, here we go.

Common sense, to me, would be to align the look-up graph with the question one would ask a customer, like:

Domestic / International
Region: Asia / US / Etc
Duration i.e. number of total days
Number of travelers i.e. alone, couple, group
Any extras?

And from there, traversing the graph and returning all the nodes, sorted by rate. From the >550 combinations, I may reduce it to say 5 - 10 at most.

And here is the big question:

Should I link each result node to the corresponding insurance plan or is it better just placing the plan id as property and just look it up on demand?

Due to the complex pricing structure, adding a relation to each matching plan, I get easily several dozens of relations to each plan but afaik, there is a real cost to nodes with a large number of relations.

So what's the best way of modeling the case w.r.t. to sustaining performance in an online system?

yyyguy · December 20, 2019, 6:49pm

Hey @marvin-hansen,

Due to the complex pricing structure, adding a relation to each matching plan, I get easily several dozens of relations to each plan but afaik, there is a real cost to nodes with a large number of relations.

The number of relationships you are referring to in this data set will not create a node density problem. Neo4j can handle what you are planning in this use case quite nicely. It shouldn't be a problem.

With that said, here is one take on a data model for this use case (see below). In short, I would create relationships from the plan node (Plan ID as a property) to the other nodes (i.e. geography, region, age group, travel duration)

And then, for a potential Travel node and the relationships to the other node (i.e. geography, region, age group, travel duration), the matching becomes straightforward.

There are other ways obviously, this is just my take on this use case.

Cheers,

yyyguy

marvin-hansen · December 20, 2019, 7:27pm

@yyyguy Essentially, you are saying the Travel node contains the price information as a property, right?

How do I query this thing to find all plans for a given journey to find the one with the cheapest rate?

Would that look something like:

MATCH (n:Travel)
WHERE n.REGION = X
AND n.DURATION = Z 
...
RETURN n ORDERED BY n.rate

yyyguy · December 20, 2019, 7:46pm

@marvin-hansen No, the rate (price) would be in the Plan node.

The Travel node (if you used it - could forego it if you are not retaining information about the potential trip) would be to hold the potential trip (Travel) for a customer.

For example, a potential trip.

Geography = Domestic
Region = Washington, DC USA
Duration = 10 days
Travellers = 2 Adults

Then you would query the database.
MATCH (p:Plan)-[:FOR_GEOGRAPHY]->(g:Geography {type: 'Domestic})
MATCH (p)-[:FOR_REGION]->(r:Region {region_name: 'USA'})
MATCH (p)-[:TRAVELLERS]->(a:AgeGroup {age_group_id: X})
MATCH (p)-[:TRAVEL_DURATION]->(d:Duration {duration_group_id: Y})
RETURN p.plan_name, p.plan_rate, ...

Does that makes sense?

-yyyguy

marvin-hansen · December 20, 2019, 8:46pm

Well, I have 9 plans in total, a few regions, ten age groups, and about 585(!) prices.

Duplicating each plan 65 times to match 585 prices makes no sense. Nobody does that.

That said, I think it makes sense to do a central unified graph structure for traversal, adding plan and price at the bottom with a relationship from each price to the matching plan.

That way, I still have 9 plans, albeit with 585 inbound relations. Assuming an even distribution, that equates to 65 relations per plan.

The initial concern about node/relation density was expressed for that very reason. I guess, I have to measure it to find out if there is an impact.

M

yyyguy · December 20, 2019, 8:48pm

Given a better understanding of your data set. I would stick with a spreadsheet. (rows and columns) make more sense for this use case.

Topic		Replies	Views
Graph modelling question: many relationships between two same nodes, or separate relationships and have more nodes? Modeling	1	390	March 9, 2021
Need some help related to implementation of Journey Planning application Modeling	0	720	July 15, 2019
Better graph model for GDS Modeling performance	3	445	December 22, 2020
Optimizing Hierarchical Transit Time Queries in Neo4j Modeling	0	48	June 13, 2024
Flight with engine usage cost Neo4j Graph Platform migrated	3	57	June 16, 2022

Best Graph model for travel-price-table: [Provider - Geo - Plan - Age - Duration - Price]

Related topics