
Link Prediction Pipeline for Protein Target Binding

Kevin6482

I'm trying to build a link prediction pipeline to find novel links between entity nodes. My objective is to predict future links between proteins and targets, given known positive and negative links. I referred to the co-author link prediction tutorial, which treats every pair of nodes without a relationship as a negative example. In my case, however, I have two CSV files: one with the positive class (proteins that bind to a target) and one with the negative class (proteins that do not bind to a target). I created the network as (P:protein_id)-[:POSITIVE]->(T:target_id) and (P:protein_id)-[:NEGATIVE]->(T:target_id). Is this approach correct for link prediction?
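
A minimal sketch of the difference between the two ways of getting negatives, assuming both files are comma-separated with a header row (the sample rows here are trimmed to the three columns needed). The explicit negative file supplies label-0 training pairs directly, whereas the tutorial's approach would draw negatives from every protein/target combination that has no relationship. The function names are hypothetical and only the Python standard library is used.

```python
import csv
import io
from itertools import product

# Sample rows from the two files in this post, trimmed to the columns needed here.
POSITIVE_CSV = """protein_id,target_id,target
A0JP26,mus musculus,POSITIVE
A1L190,hiv inhibition,POSITIVE
"""
NEGATIVE_CSV = """protein_id,target_id,target
A2RUB6,venom activity,NEGATIVE
A4D1B5,signalling activity,NEGATIVE
"""


def labeled_pairs(*csv_texts):
    """Return (protein_id, target_id, label) triples; 1 = binds, 0 = does not bind."""
    pairs = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            label = 1 if row["target"] == "POSITIVE" else 0
            pairs.append((row["protein_id"], row["target_id"], label))
    return pairs


def unobserved_pairs(pairs):
    """Tutorial-style candidates: every protein/target combination absent from both files."""
    proteins = sorted({p for p, _, _ in pairs})
    targets = sorted({t for _, t, _ in pairs})
    seen = {(p, t) for p, t, _ in pairs}
    return [(p, t) for p, t in product(proteins, targets) if (p, t) not in seen]


pairs = labeled_pairs(POSITIVE_CSV, NEGATIVE_CSV)  # explicit training examples
candidates = unobserved_pairs(pairs)               # pairs a trained model would score
```

With explicit negatives, the unobserved combinations become the pairs a trained classifier would score to propose novel links, rather than being treated as negatives. One point to watch: if the NEGATIVE relationships stay in the graph used to compute node features, the label being predicted can leak into those features.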

I also want to include tested_species, target_measure_scale, target_measure_units and target_measure_val (which I plan to use as a weight property). All of these are strings except target_measure_val. Would adding them as properties of the target_id node improve the prediction, or should I model them as separate nodes? Any guidance on how to proceed would be appreciated. Thanks in advance.
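
One way to feed these attributes to a classifier, sketched in plain Python under the assumption that each row describes one protein/target measurement: one-hot encode the string columns and append target_measure_val as a float. The field names follow the table headers in this post; build_vocab and encode are hypothetical helpers, not part of any library.

```python
# Measurement attributes taken from two of the sample rows in this post.
records = [
    {"tested_species": "homo sapiens", "target_measure_scale": "ic50",
     "target_measure_units": "ug/ml", "target_measure_val": "0.01"},
    {"tested_species": "trypanosoma cruzi", "target_measure_scale": "mc50",
     "target_measure_units": "um", "target_measure_val": "10"},
]

CATEGORICAL = ["tested_species", "target_measure_scale", "target_measure_units"]


def build_vocab(rows, fields):
    """Map each (field, value) pair seen in the data to a column index."""
    vocab = {}
    for field in fields:
        for value in sorted({row[field] for row in rows}):
            vocab[(field, value)] = len(vocab)
    return vocab


def encode(row, vocab, fields):
    """One-hot encode the string fields, then append the numeric value as a float."""
    vec = [0.0] * len(vocab)
    for field in fields:
        key = (field, row[field])
        if key in vocab:  # values unseen when the vocabulary was built stay all zeros
            vec[vocab[key]] = 1.0
    vec.append(float(row["target_measure_val"]))
    return vec


vocab = build_vocab(records, CATEGORICAL)
features = [encode(r, vocab, CATEGORICAL) for r in records]
```

Because these attributes vary per measurement rather than per target, storing them on the target_id node would overwrite values whenever the same target appears in several rows. Also, assuming "um" means micromolar, ug/ml (mass concentration) and um (molar concentration) are not directly comparable; converting between them requires each compound's molecular weight, so the raw value may need normalizing before it is used as a weight.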

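Sample rows from the positive file:

| protein_id | target_id | tested_species | target_measure_scale | target_measure_units | target_measure_val | target |
|---|---|---|---|---|---|---|
| A0JP26 | mus musculus | homo sapiens | ic50 | ug/ml | 0.01 | POSITIVE |
| A1L190 | hiv inhibition | trypanosoma cruzi | mc50 | um | 10 | POSITIVE |

Sample rows from the negative file:

| protein_id | target_id | tested_species | target_measure_scale | target_measure_units | target_measure_val | target |
|---|---|---|---|---|---|---|
| A2RUB6 | venom activity | homo sapiens | ic50 | um | 1000 | NEGATIVE |
| A4D1B5 | signalling activity | rattus norvegicus | mc50 | ug/ml | 250 | NEGATIVE |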
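
A hedged sketch of loading rows like these with the measurement attributes stored on the relationship rather than on the target node. The labels Protein and Target, the id property, and the relationship property names are hypothetical. Because Cypher does not accept a relationship type as a parameter, the type is taken from a fixed whitelist and interpolated into the query string, while the row data travels as the $rows parameter.

```python
import csv
import io

ALLOWED_TYPES = {"POSITIVE", "NEGATIVE"}

# Hypothetical labels and property names; the relationship type is filled in via %s.
QUERY_TEMPLATE = """
UNWIND $rows AS row
MERGE (p:Protein {id: row.protein_id})
MERGE (t:Target {id: row.target_id})
MERGE (p)-[r:%s]->(t)
SET r.tested_species = row.tested_species,
    r.scale = row.target_measure_scale,
    r.units = row.target_measure_units,
    r.value = toFloat(row.target_measure_val)
"""


def build_load_batches(csv_text):
    """Group CSV rows by their `target` column into (query, parameters) pairs."""
    batches = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rel_type = row["target"]
        if rel_type not in ALLOWED_TYPES:  # never interpolate unchecked input into Cypher
            raise ValueError(f"unexpected relationship type: {rel_type!r}")
        batches.setdefault(rel_type, []).append(row)
    return [(QUERY_TEMPLATE % rel_type, {"rows": rows})
            for rel_type, rows in sorted(batches.items())]


SAMPLE = """protein_id,target_id,tested_species,target_measure_scale,target_measure_units,target_measure_val,target
A0JP26,mus musculus,homo sapiens,ic50,ug/ml,0.01,POSITIVE
A2RUB6,venom activity,homo sapiens,ic50,um,1000,NEGATIVE
"""

batches = build_load_batches(SAMPLE)
```

Each (query, parameters) pair could then be passed to, for example, session.run(query, parameters) in the official Neo4j Python driver; the sketch itself only builds the statements and does not connect to a database.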
