How to get topN predictions for each node with link prediction in GDS
Hi! I am using the link prediction algorithm from the GDS library to predict links in a network. I generate the fastRP embeddings from 10 node properties. It takes 5-6 minutes to train the model on 27k nodes and 4 million relationships (only 2 relationship types and 1 node type). I have 2 questions.
- When I predict the links after training, can I get the topN for every node? (Let's say I want the top 10 possible links for every node, even if the probability is as low as 0.00001.)
- Is there a better way to write the prediction query so that it takes less time? I am using the following code to predict the links after training.
WITH "CALL gds.alpha.ml.linkPrediction.predict.stream('Mygraph', {relationshipTypes: ['connected'],modelName: 'linkpredict_with_embedding',topN: 1800, threshold: 0.00001}) YIELD node1, node2, probability MATCH (n), (m) WHERE id(n) = node1 AND id(m) = node2 RETURN n.nodeid AS node1, m.nodeid AS node2, probability;" AS query
CALL apoc.export.csv.query(query, "predicted_links.csv", {}) YIELD file RETURN file;