Best Graph model for travel-price-table: [Provider - Geo - Plan - Age - Duration - Price]

Hey, I am modeling various insurance products across different domains and so far modeling these as graphs were way easier than doing so as an ER model.

However, for the travel domain, I got over 550 different prices in a long Excel sheet that follow a certain standard format:

The goal is to find for a given travel of a given length, to a given destination region the plan with the lowest price.

Conventionally, this is just a look-up from a data table and then just sort the results by price.

In a graph, well, here we go.

Common sense, to me, would be to align the look-up graph with the question one would ask a customer, like:

  1. Domestic / International
  2. Region: Asia / US / Etc
  3. Duration i.e. number of total days
  4. Number of travelers i.e. alone, couple, group
  5. Any extras?

And from there, traversing the graph and returning all the nodes, sorted by rate. From the >550 combinations, I may reduce it to say 5 - 10 at most.

And here is the big question:

Should I link each result node to the corresponding insurance plan or is it better just placing the plan id as property and just look it up on demand?

Due to the complex pricing structure, adding a relation to each matching plan, I get easily several dozens of relations to each plan but afaik, there is a real cost to nodes with a large number of relations.

So what's the best way of modeling the case w.r.t. to sustaining performance in an online system?

Hey @marvin-hansen,

Due to the complex pricing structure, adding a relation to each matching plan, I get easily several dozens of relations to each plan but afaik, there is a real cost to nodes with a large number of relations.

The number of relationships you are referring to in this data set will not create a node density problem. Neo4j can handle what you are planning in this use case quite nicely. It shouldn't be a problem.

With that said, here is one take on a data model for this use case (see below). In short, I would create relationships from the plan node (Plan ID as a property) to the other nodes (i.e. geography, region, age group, travel duration)

And then, for a potential Travel node and the relationships to the other node (i.e. geography, region, age group, travel duration), the matching becomes straightforward.

There are other ways obviously, this is just my take on this use case.

Cheers,

yyyguy

@yyyguy Essentially, you are saying the Travel node contains the price information as a property, right?

How do I query this thing to find all plans for a given journey to find the one with the cheapest rate?

Would that look something like:

MATCH (n:Travel)
WHERE n.REGION = X
AND n.DURATION = Z 
...
RETURN n ORDERED BY n.rate

@marvin-hansen No, the rate (price) would be in the Plan node.

The Travel node (if you used it - could forego it if you are not retaining information about the potential trip) would be to hold the potential trip (Travel) for a customer.

For example, a potential trip.

Geography = Domestic
Region = Washington, DC USA
Duration = 10 days
Travellers = 2 Adults

Then you would query the database.
MATCH (p:Plan)-[:FOR_GEOGRAPHY]->(g:Geography {type: 'Domestic})
MATCH (p)-[:FOR_REGION]->(r:Region {region_name: 'USA'})
MATCH (p)-[:TRAVELLERS]->(a:AgeGroup {age_group_id: X})
MATCH (p)-[:TRAVEL_DURATION]->(d:Duration {duration_group_id: Y})
RETURN p.plan_name, p.plan_rate, ...

Does that makes sense?

-yyyguy

Well, I have 9 plans in total, a few regions, ten age groups, and about 585(!) prices.

Duplicating each plan 65 times to match 585 prices makes no sense. Nobody does that.

That said, I think it makes sense to do a central unified graph structure for traversal, adding plan and price at the bottom with a relationship from each price to the matching plan.

That way, I still have 9 plans, albeit with 585 inbound relations. Assuming an even distribution, that equates to 65 relations per plan.

The initial concern about node/relation density was expressed for that very reason. I guess, I have to measure it to find out if there is an impact.

M

Given a better understanding of your data set. I would stick with a spreadsheet. (rows and columns) make more sense for this use case.