Why machine learning on graphs?

Hello, I would be grateful if someone could clarify this for me. For a while I have been reading about knowledge graphs and graph embeddings. I learnt that people first build a knowledge graph or property graph to power machine learning, and then represent it in a low-dimensional space using graph embeddings. Graphs provide great analytics, but I read that working on them directly is computationally expensive, so they have to be converted to a low-dimensional representation. Besides context, what is the benefit of network-based machine learning compared to traditional machine learning?

Thank you so much for your time. Perhaps my question is badly posed, but an answer would really help me a lot.

Hi chim3yy,

first of all, you need to separate machine learning into 3 main areas:

  • supervised learning
  • unsupervised learning
  • reinforcement learning

Algorithms of the 1st and 2nd types rely, to my knowledge, on tabular data as input and then process it in several different ways, using neural networks, search trees, support vector machines, etc.
I definitely see a possibility of developing neural networks or search trees using a graph database. Regarding performance, I really can't tell you whether that would run faster than the current algorithms that don't use graph databases. My guess is that it wouldn't, because most of those probably keep their data in RAM while executing. With a graph DB you will probably have a lot of I/O operations, slowing down the process.

The 3rd one is a whole different world. Here you have something called a Markov Decision Process (MDP), where you need to find policies that make an agent achieve a goal in an environment based on rewards. I'm currently working on my MSc thesis, which is precisely this type of implementation. Since I don't have my system running yet and also don't know the performance of similar MDPs built on other platforms, I can't tell you whether a graph database will be the best option. My gut feeling tells me it is.
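
To make the MDP idea a bit more concrete, here is a minimal sketch in plain Python. It is not taken from my thesis system: the four-state corridor, the reward of 1 for reaching the goal, and the discount factor of 0.9 are all made-up assumptions, and value iteration is just one common way to find a policy. Note that states and transitions already look like nodes and edges, which is part of why a graph representation feels natural here.

```python
# Toy MDP (illustrative assumption): states 0..3 in a line, state 3 is the goal.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GOAL = 3
GAMMA = 0.9  # discount factor, chosen arbitrarily for the example


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if state == GOAL:
        return state, 0.0  # the goal is terminal
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def action_value(state, action, values):
    """Immediate reward plus discounted value of the next state."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]


def value_iteration(tol=1e-9):
    """Repeat Bellman optimality updates until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue
            best = max(action_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in every non-goal state, the action with the highest value."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values))
        for s in STATES
        if s != GOAL
    }


values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

In this toy setup the agent should learn to walk right from every state, with values shrinking by the discount factor the further a state is from the goal.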

José Salvador

These are truly exciting days to be a data scientist, in large part because the industry is innovating so quickly. There is no question that graph has immense potential in the AI/ML world. Where it all goes, we shall see. But let's face it, it's not like the "traditional" side (Python or R libraries) has it all figured out yet, either. Look at PyTorch. It's just now getting fully warmed up. The entire industry is in massive development. Everything is in flux, and the data science world we see 5-10 years down the road will be unrecognizable to us today.

My opinion is that graph data science has a significant role to play. Whether that is primarily stand-alone or more in conjunction with traditional approaches (adding graph features to existing models), there's no way to tell as of yet. But I will say one thing: Neo4j has proven itself at the highest levels of tech (and business). And they are excited enough about graph data science to have just recently developed a new library from scratch and to ramp up their innovation in the field even further. Plus, they seem to be making major strides in neural networks. All that makes me tremendously excited. This is a story that has just begun to be told.
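
As a rough illustration of what "adding graph features to existing models" can look like, here is a small sketch in plain Python. It does not use Neo4j or any graph library: the names, ages, and edges are invented, and the PageRank here is a simple hand-written power iteration that assumes an undirected graph with no isolated nodes. The point is only that each entity's tabular row gains columns that describe its position in the network.

```python
from collections import defaultdict

# Hypothetical toy data: undirected "knows" relationships between people.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")]

# Existing tabular features, one row per person (illustrative values).
rows = {
    "alice": {"age": 34},
    "bob": {"age": 51},
    "carol": {"age": 27},
    "dave": {"age": 45},
}

# Build an adjacency map from the edge list.
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def pagerank(neighbors, damping=0.85, iterations=50):
    """Simple power iteration; assumes every node has at least one neighbor."""
    nodes = list(neighbors)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


ranks = pagerank(neighbors)

# Append the graph-derived columns to each tabular row.
features = {
    name: {**row, "degree": len(neighbors[name]), "pagerank": ranks[name]}
    for name, row in rows.items()
}

for name, row in features.items():
    print(name, row)
```

The enriched rows can then go into whatever tabular model is already in use; whether the extra columns actually help depends entirely on the data and the task.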