TL;DR: For a generic ETL tool, should single- and multi-select choice fields be modeled as nodes connected by relationships, or as labels applied directly to the parent/starting node? I realize the optimal modeling decision depends on the specific schema and the questions being asked, but I am trying to settle on a generic rule that is right in most cases, because my ETL tool won't have the necessary context about the data model it is transforming.
Detail:
I am thinking through an ETL tool/workflow I am building that pulls from an API source to create graphs on the fly. The API generally returns data structured as individual entries that belong to "lists" such as People, Company, City, etc., with individual attributes/properties also present on the entries.
I know that I want to create a node label for each "list". However, multiple list types have single- and multi-select choice fields on an individual entry. For example, the Company list has a company type field whose options might include institution, bank, school, investor, real estate company, etc., and a company can have one or many types.
Ideally, I want a generic rule that can handle all choice fields across all lists when writing to the Neo4j graph. (Another choice field is "Tier", for example, with Tier 1, Tier 2, Tier 3 and Tier 4 as potential values.)
Would it be best practice to:
- create a node label for each choice field category (i.e. a node label called "CompanyType" with one node per company type, each with a "name" property equal to the company type), and then link the Company node to its one or many CompanyType nodes via a "company_type" relationship
or
- create a node label for each choice field option and apply those labels to the Company node in question (so a Company node might then have 5-6 labels, such as Company:Tier1:Tier2:Investor:Bank)?
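
To make the two options concrete, here is a rough sketch of the Cypher each rule might generate for one entry. It only builds query strings and parameters and does not talk to a database. The function names, the `id` property used to find the entry, and the label-cleaning helper are all assumptions for illustration, not part of any existing tool. Labels and relationship types are interpolated into the query text because they are not passed as ordinary query parameters here, so option names would need to be sanitized first.

```python
import re


def _identifier(text: str) -> str:
    """Hypothetical helper: 'real estate company' -> 'RealEstateCompany'."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(w[:1].upper() + w[1:] for w in words)


def choice_as_nodes(list_label, field_name, entry_id, options):
    """Option 1: one node per choice option, linked by a relationship.

    Assumes each entry node carries an `id` property (an assumption).
    """
    choice_label = _identifier(field_name)  # e.g. "CompanyType"
    rel_type = "_".join(re.findall(r"[A-Za-z0-9]+", field_name)).lower()
    query = (
        f"MATCH (e:{list_label} {{id: $entry_id}}) "
        "UNWIND $options AS option "
        f"MERGE (c:{choice_label} {{name: option}}) "
        f"MERGE (e)-[:{rel_type}]->(c)"
    )
    return query, {"entry_id": entry_id, "options": list(options)}


def choice_as_labels(list_label, entry_id, options):
    """Option 2: one extra label per selected option on the entry node."""
    labels = [_identifier(o) for o in options]
    if not labels:
        raise ValueError("at least one option is required to set labels")
    query = (
        f"MATCH (e:{list_label} {{id: $entry_id}}) "
        f"SET e:{':'.join(labels)}"
    )
    return query, {"entry_id": entry_id}


if __name__ == "__main__":
    print(choice_as_nodes("Company", "Company Type", 42, ["Investor", "Bank"])[0])
    print(choice_as_labels("Company", 42, ["Tier 1", "Investor", "Bank"])[0])
```

In this sketch the first option keeps the option values as data (query parameters), while the second turns them into schema-level labels baked into the query text, which is part of what I am unsure about for a generic rule.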