I'm trying to build a link prediction pipeline to find novel links between entity nodes. My objective is to predict future links between a protein and a target, given known positive and negative links. I referred to the co-author link prediction tutorial, in which every pair of nodes that has no relationship is treated as the negative class. In my case, however, I have two CSV files: one with the positive class (i.e., proteins binding to a target) and one with the negative class (i.e., proteins not binding to a target). I created the network as (P:protein_id)-[:POSITIVE]->(T:target_id) and (P:protein_id)-[:NEGATIVE]->(T:target_id). Is that approach correct for link prediction?
I also want to include tested_species, target_measure_scale, target_measure_units and target_measure_val (which I will use as a weight property). All of these are string values except target_measure_val. Would it improve the prediction if I add them as properties on the target_id node, or should I add them as separate nodes? Can someone guide me on how to proceed? Thanks in advance.
| protein_id | target_id | tested_species | target_measure_scale | target_measure_units | target_measure_val | target |
|---|---|---|---|---|---|---|
| A0JP26 | mus musculus | homo sapiens | ic50 | ug/ml | 0.01 | POSITIVE |
| A1L190 | hiv inhibition | trypanosoma cruzi | mc50 | um | 10 | POSITIVE |

| protein_id | target_id | tested_species | target_measure_scale | target_measure_units | target_measure_val | target |
|---|---|---|---|---|---|---|
| A2RUB6 | venom activity | homo sapiens | ic50 | um | 1000 | NEGATIVE |
| A4D1B5 | signalling activity | rattus norvegicus | mc50 | ug/ml | 250 | NEGATIVE |
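
To make the setup concrete, here is a minimal sketch of how the two files could be merged into a single labeled edge list (1 for POSITIVE, 0 for NEGATIVE) before loading it into the graph. This is only an illustration under assumptions: the inline CSV text stands in for the two files and reuses the sample rows from this post, the comma-separated layout is assumed, and the function name `load_labeled_pairs` and the `weight`/`label` keys are hypothetical.

```python
import csv
import io

# Stand-ins for the two CSV files; rows are the sample rows from this post.
# The comma-separated layout is an assumption about the real files.
POSITIVE_CSV = """protein_id,target_id,tested_species,target_measure_scale,target_measure_units,target_measure_val,target
A0JP26,mus musculus,homo sapiens,ic50,ug/ml,0.01,POSITIVE
A1L190,hiv inhibition,trypanosoma cruzi,mc50,um,10,POSITIVE
"""

NEGATIVE_CSV = """protein_id,target_id,tested_species,target_measure_scale,target_measure_units,target_measure_val,target
A2RUB6,venom activity,homo sapiens,ic50,um,1000,NEGATIVE
A4D1B5,signalling activity,rattus norvegicus,mc50,ug/ml,250,NEGATIVE
"""


def load_labeled_pairs(*csv_texts):
    """Hypothetical helper: read each CSV text and return one dict per
    (protein, target) pair with a numeric weight and a binary label."""
    pairs = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            pairs.append({
                "protein_id": row["protein_id"],
                "target_id": row["target_id"],
                # target_measure_val is the only numeric column
                "weight": float(row["target_measure_val"]),
                # explicit negatives come from the second file,
                # not from sampling unconnected node pairs
                "label": 1 if row["target"] == "POSITIVE" else 0,
            })
    return pairs


pairs = load_labeled_pairs(POSITIVE_CSV, NEGATIVE_CSV)
for pair in pairs:
    print(pair)
```

Each resulting dict corresponds to one relationship in the network described above, with `label` distinguishing the POSITIVE and NEGATIVE relationship types.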