I am relatively new to Neo4j and graph databases. Maybe someone could help and steer me in the right direction.
In general, we need to classify nodes with multiple labels according to certain criteria/rules, in order to build a normalized reasoning mechanism between node classes. Between the classified nodes there will be weighted edges.
Example:
Node A has classes/labels A' and A''; node B has only class B'.
In our knowledge domain:
- A' -> links to, weight = 0.8 -> B'
- A'' -> links to, weight = 0.4 -> B'
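To make this concrete, here is a minimal Cypher sketch of the example above (the label, relationship type, and property names are just placeholders we made up):

```
// Node A carries two class labels, node B carries one; the weighted edges
// between the classified nodes record which class pair they belong to.
CREATE (a:Item:APrime:ADoublePrime {name: 'A'}),
       (b:Item:BPrime {name: 'B'}),
       (a)-[:LINKS_TO {via: "A' -> B'",  weight: 0.8}]->(b),
       (a)-[:LINKS_TO {via: "A'' -> B'", weight: 0.4}]->(b);
```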
Our issues/questions
How can we normalize path lengths or manually assigned weights for our reasoning approach? By normalize we mean mapping them onto the range [0, 1].
Maybe we could also employ a kind of cluster similarity between classes. But then, for each class, we would need to compute its similarity to every other class, which we suspect would be slow.
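For illustration, the kind of all-pairs comparison we mean would look roughly like this in plain Cypher (Jaccard similarity over shared member nodes; all names are made up, and the query touches every pair of classes, which is exactly our performance worry):

```
// Pairwise Jaccard similarity between classes, based on the nodes they share.
MATCH (c1:Class), (c2:Class)
WHERE c1.name < c2.name
OPTIONAL MATCH (c1)<-[:CLASSIFIED_AS]-(shared)-[:CLASSIFIED_AS]->(c2)
WITH c1, c2, count(DISTINCT shared) AS overlap
MATCH (c1)<-[:CLASSIFIED_AS]-(m1)
WITH c1, c2, overlap, count(DISTINCT m1) AS size1
MATCH (c2)<-[:CLASSIFIED_AS]-(m2)
WITH c1, c2, overlap, size1, count(DISTINCT m2) AS size2
RETURN c1.name AS classA, c2.name AS classB,
       toFloat(overlap) / (size1 + size2 - overlap) AS jaccard
ORDER BY jaccard DESC;
```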
Maybe someone who had to do a similar thing can give me some thoughts or point me to some resources. That would be very much appreciated.
You can have multiple labels on a node. For your case, maybe you have 'A' and 'B' labels (and another classification). You can then mark any of them with a 'Prime' or 'DoublePrime' label. Your match for an A' node would be MATCH (n:A:Prime).
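For example (the label and property names here are just illustrative):

```
// Nodes carry two labels: the class ('A') plus a marker ('Prime' / 'DoublePrime').
CREATE (:A:Prime {name: 'firstNode'});
CREATE (:A:DoublePrime {name: 'secondNode'});

// All A' nodes:
MATCH (n:A:Prime)
RETURN n;
```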
It seems there is a small problem with the rendering on Windows machines, so I just wanted to clarify: if some weird symbols appear in the first two bullet points, they should be an arrow pointing to the right (->).
At the moment we are thinking about having a node for each class and connecting the classified nodes to them via a relationship, or multiple relationships if a node is classified as more than one class. This way we could give the relationships a weight property. Currently we are trying to figure out how best to classify nodes and how to do the weighting.
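A rough sketch of what we have in mind, with made-up label, relationship, and property names:

```
// One node per class; a classified node gets one weighted relationship
// per class it belongs to.
MERGE (c1:Class {name: "A'"})
MERGE (c2:Class {name: "A''"})
MERGE (n:Item {id: 'A'})
MERGE (n)-[r1:CLASSIFIED_AS]->(c1) SET r1.weight = 0.8
MERGE (n)-[r2:CLASSIFIED_AS]->(c2) SET r2.weight = 0.4;
```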
The weighting is also what we want to normalize, because if we want to compare different taxonomies the weights need to be on a common scale. The idea is that the longer the path, the more specific something is. But if in one taxonomy the maximum path length is 3 and in another it is 5, the raw path lengths are hard to compare directly (a depth of 3 means "maximally specific" in the first taxonomy but not in the second).
I think you will have to perform the normalization at the time when you want to compare multiple taxonomies or extract metrics from a taxonomy. Otherwise, every time you modify a taxonomy you would have to renormalize every weight whenever its maximum path length changes. This would not be hard using a custom procedure.
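As a sketch, assuming the taxonomy is modelled as :Class nodes linked to a single root via :SUBCLASS_OF relationships (both names are assumptions), the read-time normalization could also be done in plain Cypher:

```
// Depth of each class below the root, rescaled by the maximum depth
// found in this taxonomy at query time.
MATCH p = (c:Class)-[:SUBCLASS_OF*]->(root:Class {name: 'Root'})
WITH c, length(p) AS depth
WITH collect({cls: c, depth: depth}) AS rows, max(depth) AS maxDepth
UNWIND rows AS row
WITH row.cls AS cls, row.depth AS depth, maxDepth
RETURN cls.name AS className, depth, toFloat(depth) / maxDepth AS normalizedDepth
ORDER BY normalizedDepth DESC;
```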
So basically the general graph for this task is the one with the round nodes. Based on the classification, it should produce a somewhat sophisticated recommendation of products for the user, driven by personal characteristics. For example, we classify the user as overweight (the general class) based on weight and height. On the other side, the products have to be classified as well, primarily based on their descriptions and keywords, so that products relevant to overweight people end up in the same general class. We then bring these two general classes together so that we can recommend those products. The weighting should determine the order in which the products are recommended. A person can be associated with more than one general class, and a general product class is of course associated with many products, so we need some kind of ordering besides one based purely on recency (what is the newest thing we know about the person).
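With the class-node model above, the recommendation query we are aiming for would be roughly the following (all names are made up, and multiplying the two weights is just one possible way to combine them):

```
// Products that share a general class with the user, ordered by combined weight.
MATCH (u:User {id: $userId})-[uc:CLASSIFIED_AS]->(c:Class)<-[pc:CLASSIFIED_AS]-(p:Product)
RETURN p.name AS product, c.name AS sharedClass, uc.weight * pc.weight AS score
ORDER BY score DESC
LIMIT 10;
```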