At the 'Add Features' step I want to do the following:
CALL gds.beta.pipeline.linkPrediction.addFeature('webnovel', 'hadamard', {
nodeProperties: ['embedding', 'MainGenre','SubGenre'],
edgeProperties: ['TimeStr', 'ReadDuration']
})
YIELD featureSteps
Is this possible, or is ChatGPT messing with me? I cannot find any documentation for this on the Neo4j site.
If you have any other alternative or advice on how to proceed, I would be most grateful.
After that you specify the features of the links, which are some combination of the properties of the two end nodes. Which node properties to use, and the algebraic operation used to combine them, are configured in the addFeature step (Configuring the pipeline - Neo4j Graph Data Science).
The parameters for the addFeature step are in the documentation link above.
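For reference, a valid addFeature call takes nodeProperties only; a minimal sketch against your 'webnovel' pipeline would look like this (the property name is just an example):

CALL gds.beta.pipeline.linkPrediction.addFeature('webnovel', 'hadamard', {
  nodeProperties: ['embedding']
})
YIELD featureSteps

There is no edgeProperties key in the configuration map, which is why you could not find it documented.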
You were right in the other post, I forgot to add my Novel node properties in the projected graph.
The INTERACTS_WITH edge shows how each SessionID interacted with a novel. There were multiple interactions over months, and I thought I could build the pipeline focusing on these features.
However, per the documentation, the Neo4j link prediction pipeline takes in NODE PROPERTIES ONLY; edge properties are left out.
At least, that is what I understand from the link you shared.
I see, yes then this is unfortunately a current limitation. The link features are some featureType combination of node properties.
Depending on your graph schema, it might be possible to encode the current edge properties as node properties, on some nodes, which could be a workaround.
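For example, a sketch of aggregating edge properties onto the session nodes might look like this (the written property names are hypothetical; only SessionID, Novel, INTERACTS_WITH, and ReadDuration come from your description):

MATCH (s:SessionID)-[r:INTERACTS_WITH]->(:Novel)
WITH s, sum(r.ReadDuration) AS totalRead, count(r) AS interactions
SET s.totalReadDuration = totalRead, s.interactionCount = interactions

Those new node properties could then be projected into the graph and used in the addFeature step.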
I felt that the TimeStr of when the interaction occurred was important; it also shows how many times a SessionId has interacted with a Novel...
I had the thought to combine the INTERACTS_WITH edges by creating a weight from the number of proper interactions (i.e. Click=1, RealRead=1, Intro=1, Read=1, Collect=1, Expose=1, plus sum(ReadDuration)), but unfortunately I lost the other interactions as a result of that attempt.
I lost the interactions that could help me identify bot users (Click=1, Read/RealRead=0, Intro/Collect/Expose=0, ReadDuration > 1 day), as well as the binge-readers...
Sorry for the info dump -- this project is stressing me out
I agree; with this schema, maintaining all the information and fitting it into the LP pipeline is not easy. The workaround you suggest here partly works, at the expense of losing some information to aggregation.
I think it requires thinking more about the various ways of representing this graph, possibly by introducing new node types or new relationship types. This will depend on your end goal and deep domain knowledge of your graph.
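For instance, one hypothetical restructuring is to reify each interaction as its own node, so that the edge properties become node properties (the Interaction label and the relationship types PERFORMED/ON are just a sketch, not an established schema):

MATCH (s:SessionID)-[r:INTERACTS_WITH]->(n:Novel)
CREATE (i:Interaction {readDuration: r.ReadDuration, click: r.Click, timeStr: r.TimeStr})
CREATE (s)-[:PERFORMED]->(i)
CREATE (i)-[:ON]->(n)

This keeps every individual interaction (including the bot-like ones you want to detect), at the cost of a larger graph and a different link prediction target.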
Another possibility: once you have a suitable schema, you can run embedding algorithms (node2vec, FastRP, GraphSAGE, etc.), export the graph through the GDS Python client (The graph object - Neo4j Graph Data Science Client), and afterwards write a small Python model of your own, if that's possible for you. It does mean the benefits of the automatic splits and evaluations provided by the GDS pipelines are gone.
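As a sketch, computing FastRP embeddings on your projected graph might look like this (the embedding dimension and property name are placeholders to tune for your data):

CALL gds.fastRP.mutate('webnovel', {
  embeddingDimension: 128,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten

The mutated 'embedding' property then lives on the in-memory graph and can be streamed or exported through the Python client for your own model.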