At the 'Add Features' step I want to do the following:
CALL gds.beta.pipeline.linkPrediction.addFeature('webnovel', 'hadamard', {
nodeProperties: ['embedding', 'MainGenre','SubGenre'],
edgeProperties: ['TimeStr', 'ReadDuration']
})
YIELD featureSteps
Is this possible, or is ChatGPT messing with me? I cannot find any documentation for this on the Neo4j site.
If you have any other alternative or advice on how to proceed, I would be most grateful.
After that you specify the features of the links, which are some combination of the properties of the two end nodes. Which node properties to use, and the algebraic operation used to combine them, are configured in the addFeature step (Configuring the pipeline - Neo4j Graph Data Science).
The parameters for the addFeature step are in the documentation link above.
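For reference, a valid addFeature call takes nodeProperties only; a minimal sketch against your 'webnovel' pipeline would look like this (the property name is just an example):

CALL gds.beta.pipeline.linkPrediction.addFeature('webnovel', 'hadamard', {
  nodeProperties: ['embedding']
})
YIELD featureSteps

There is no edgeProperties key in the configuration map, which is why you could not find it documented.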
You were right in the other post, I forgot to add my Novel node properties in the projected graph.
The INTERACTS_WITH edge shows how each SessionID interacted with a novel. There were multiple interactions over months, and I thought I could build the pipeline focusing on these features.
However, per the documentation, the Neo4j link prediction pipeline takes in NODE PROPERTIES ONLY; edge properties are left out.
At least, that is what I understand from the link you shared.
I see, yes then this is unfortunately a current limitation. The link features are some featureType combination of node properties.
Depending on your graph schema, it might be possible to encode the current edge properties as node properties, on some nodes, which could be a workaround.
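For example, a sketch of aggregating edge properties onto the session nodes might look like this (the written property names are hypothetical; only SessionID, Novel, INTERACTS_WITH, and ReadDuration come from your description):

MATCH (s:SessionID)-[r:INTERACTS_WITH]->(:Novel)
WITH s, sum(r.ReadDuration) AS totalRead, count(r) AS interactions
SET s.totalReadDuration = totalRead, s.interactionCount = interactions

Those new node properties could then be projected into the graph and used in the addFeature step.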
I felt that the TimeStr of when the interaction occurred was important; it also shows how many times a SessionId has interacted with a Novel...
I had the thought to combine the INTERACTS_WITH edges by creating a weight from the number of proper interactions (i.e. Click=1, RealRead=1, Intro=1, Read=1, Collect=1, Expose=1, plus sum(ReadDuration)), but unfortunately I lost the other interactions as a result of that attempt.
I lost the interactions that could help me identify bot users (Click=1, Read/RealRead=0, Intro/Collect/Expose=0, ReadDuration > 1 day), as well as the binge-readers...
Sorry for the info dump -- this project is stressing me out
I agree; with this schema, maintaining all the information and fitting it into the LP pipeline is not easy. The workaround you suggest here partly works, at the expense of losing some information to aggregation.
I think it requires thinking more about the various ways of representing this graph, possibly by introducing new node types or new relationship types. This will depend on your end goal and deep domain knowledge of your graph.
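For instance, one hypothetical restructuring is to reify each interaction as its own node, so that the edge properties become node properties (the Interaction label and the relationship types PERFORMED/ON are just a sketch, not an established schema):

MATCH (s:SessionID)-[r:INTERACTS_WITH]->(n:Novel)
CREATE (i:Interaction {readDuration: r.ReadDuration, click: r.Click, timeStr: r.TimeStr})
CREATE (s)-[:PERFORMED]->(i)
CREATE (i)-[:ON]->(n)

This keeps every individual interaction (including the bot-like ones you want to detect), at the cost of a larger graph and a different link prediction target.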
Another possibility: once you have a suitable schema, you can run embedding algorithms (node2vec, FastRP, GraphSAGE, etc.), export the graph through the GDS Python client (The graph object - Neo4j Graph Data Science Client), and afterwards write a small Python model of your own, if that's possible for you. It does mean the benefits of the automatic splits and evaluations provided by the GDS pipelines are gone.
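As a sketch, computing FastRP embeddings on your projected graph might look like this (the embedding dimension and property name are placeholders to tune for your data):

CALL gds.fastRP.mutate('webnovel', {
  embeddingDimension: 128,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten

The mutated 'embedding' property then lives on the in-memory graph and can be streamed or exported through the Python client for your own model.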