Dear community members,
I have been working for several years with a rather large graph, which now has about 14 million nodes and 300 million relationships. So far it has been implemented in an RDBMS, and I would like to turn it into a Neo4j graph structure.
What I am wondering is whether an import mechanism exists that would let me specify my relationship types and labels in separate CSV tables, each with an ID, and have them referenced by ID in both the node and relationship CSV tables.
Let me give you an example of what my CSV dump files look like.
Nodes:
Id, Content, type, weight
4803, Content1, 4, 50
4766, content2, 4, 50
4396, content3, 1, 50
Node types (i.e. Neo4j labels):
id, content, extendedContent, info
1, n_type1, extendedType1 , "here is a description"
4, n_type2, extendedType2 , "here is another description"
Relationships:
id, source, destination, type, weight
1, 1, 7, 2, 50
2, 7, 1, 1, 50
Relationship types:
id, content, extendedContent, info
1, r_type1, type1, "To model stuff"
2, r_type2, type2, "To model other stuff"
I have about 50 different node labels and over 150 relationship types, and growing.
Of course I could pre-process my files, so as to resolve the type references and create only 2 files, for nodes and relationships, where the types (and labels) would be the plain ones. But as you may have noticed, an interesting reason for having separate tables for the types and node labels is to have properties associated with each of them, such as a textual description. And of course I would very much like to keep these attached properties.
Do I have a way to import my data structure with no data loss? As far as I understand from the documentation and tutorials there isn't any, but since I am a newbie I wanted to make sure of it before I go any further.
I guess another way of wording the same question would be to ask whether it is possible to attach properties to node labels and relationship types?
Also, the import procedure will have to be automated.
I am working with Neo4j Enterprise 4.1.3 on Linux, experimenting locally for the time being.
I recommend you preprocess the files and use neo4j-admin import. If you want to keep them separate, you should investigate the APOC library; it has a number of helper procedures that let you set labels dynamically.
You can't "attach properties to node labels and relationship types", but you can attach properties to nodes with specific labels or to relationships with specific relationship types.
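For what it's worth, the preprocessing step could look roughly like the sketch below. It takes the four sample dumps from the first post, replaces each numeric type reference with the type's name, and lays the result out with the header fields that neo4j-admin import reads (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The function names, the `nodeId` and `relId` property names, and the choice of the `content` column as the label or type name are my assumptions, not something taken from your setup.

```python
import csv
import io

# Sample dumps copied from the first post; real files would be read from disk.
NODES = """Id, Content, type, weight
4803, Content1, 4, 50
4766, content2, 4, 50
4396, content3, 1, 50
"""
NODE_TYPES = """id, content, extendedContent, info
1, n_type1, extendedType1 , "here is a description"
4, n_type2, extendedType2 , "here is another description"
"""
RELS = """id, source, destination, type, weight
1, 1, 7, 2, 50
2, 7, 1, 1, 50
"""
REL_TYPES = """id, content, extendedContent, info
1, r_type1, type1, "To model stuff"
2, r_type2, type2, "To model other stuff"
"""


def read_rows(text):
    """Parse a dump into dicts with lower-cased keys and trimmed values."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{k.strip().lower(): v.strip() for k, v in row.items()} for row in reader]


def resolve_nodes(nodes, node_types):
    """Replace each node's numeric type with the label name (assumed: 'content')."""
    label_of = {t["id"]: t["content"] for t in node_types}
    header = ["nodeId:ID", "content", "weight:int", ":LABEL"]
    return [header] + [
        [n["id"], n["content"], n["weight"], label_of[n["type"]]] for n in nodes
    ]


def resolve_relationships(rels, rel_types):
    """Replace each relationship's numeric type with the type name."""
    type_of = {t["id"]: t["content"] for t in rel_types}
    header = [":START_ID", ":END_ID", "relId:int", "weight:int", ":TYPE"]
    return [header] + [
        [r["source"], r["destination"], r["id"], r["weight"], type_of[r["type"]]]
        for r in rels
    ]


def write_csv(path, table):
    """Write a header-plus-rows table to disk for the importer (path is hypothetical)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(table)


node_table = resolve_nodes(read_rows(NODES), read_rows(NODE_TYPES))
rel_table = resolve_relationships(read_rows(RELS), read_rows(REL_TYPES))
```

After writing both tables with `write_csv`, the import itself would be something along the lines of `neo4j-admin import --nodes=nodes.csv --relationships=relationships.csv`; please check the exact options against the documentation for your version. Note that the descriptions in the type tables are not carried over by this flattening, so that part would still have to be stored somewhere else.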
Thanks. I will check APOC, then.
I'm trying to avoid redundancy, basically. Attaching properties to nodes with specific labels would introduce too much redundant information. Same thing for the relationships.
Given the size of my graph I'm afraid that such redundancy would waste too much room.
I can understand that. It wasn't clear what the cardinality between a label and its properties was. For nodes you can use "category nodes": store the properties there and link every node with a given label to the matching category node. I don't really like an "intervening node" for relationships though, so it is not a consistent solution. I suspect (but haven't run the tests) that redundant data is going to perform better than any solution that factors it away.
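To make the "category nodes" idea a bit more concrete, here is a minimal sketch of how the two type tables could be turned into extra import files. It assumes the data nodes are exported with an ID space called `Node` (that is, a `:ID(Node)` header field), so that type IDs such as 1 and 4 cannot clash with node IDs. The ID space names, the `NodeType` and `RelationshipType` labels, and the `HAS_CATEGORY` relationship type are all hypothetical names picked for illustration.

```python
import csv
import io

# Type tables from the first post, already parsed into dicts.
NODE_TYPES = [
    {"id": "1", "content": "n_type1", "extendedContent": "extendedType1",
     "info": "here is a description"},
    {"id": "4", "content": "n_type2", "extendedContent": "extendedType2",
     "info": "here is another description"},
]
REL_TYPES = [
    {"id": "1", "content": "r_type1", "extendedContent": "type1",
     "info": "To model stuff"},
    {"id": "2", "content": "r_type2", "extendedContent": "type2",
     "info": "To model other stuff"},
]
# Only the columns needed for linking; the other node properties are omitted here.
NODES = [
    {"id": "4803", "type": "4"},
    {"id": "4766", "type": "4"},
    {"id": "4396", "type": "1"},
]


def category_table(types, id_space, label):
    """One category node per type, carrying the shared metadata exactly once."""
    header = [f"typeId:ID({id_space})", "name", "extendedContent", "info", ":LABEL"]
    return [header] + [
        [t["id"], t["content"], t["extendedContent"], t["info"], label] for t in types
    ]


def category_links(nodes, node_space="Node", type_space="NodeType",
                   rel_type="HAS_CATEGORY"):
    """One relationship from each data node to the category node of its type."""
    header = [f":START_ID({node_space})", f":END_ID({type_space})", ":TYPE"]
    return [header] + [[n["id"], n["type"], rel_type] for n in nodes]


def to_csv(table):
    """Render a table as CSV text; a real run would write it to a file instead."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


node_categories = category_table(NODE_TYPES, "NodeType", "NodeType")
# Relationship-type category nodes get no links: they stay disconnected and are
# looked up by their 'name' property when the metadata is needed.
rel_categories = category_table(REL_TYPES, "RelationshipType", "RelationshipType")
links = category_links(NODES)
```

The link file adds one relationship per data node, which is not free at 14 million nodes, so it may be worth measuring against simply looking the category node up by name.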
Thank you for your replies, and sorry for the delayed reaction. I'm multitasking and unfortunately can't always make this project my top priority.
In the end I preprocessed my data files and imported everything with neo4j-admin import. It worked out well.
As for the structure itself, I decided to go for the "category nodes" that you mentioned earlier. I know it is not the best option, but they are meant for storing metadata, such as the semantics of relationship types. I suspect that the alternative would make consistency troublesome to enforce, since every single instance of a given relationship type would have to carry all these common key-value pairs. Unless there is a way to put constraints on specific key-value pairs, but I don't think that is possible, is it? I know about node/relationship property constraints, but I don't see a way to force a specific value for a given property. And even that would make the creation process quite tedious anyway.
So I will stick with the category nodes, experiment with them for a while and see how it goes. The main problem, as you mentioned, is that the ones describing relationship types will have to stay disconnected from the rest of the graph. I can still consult them when needed, though. I'll see if that's enough.
Thanks again for your help.