I have a monopartite graph projection (one node type, one undirected relationship type) filtered to include only one connected component (the largest) so all nodes are reachable from all others. The largest hop distance is 12.
In Python, I run the embedding with a given set of parameters, compute the pairwise embedding distance matrix, compute the Pearson r correlation between hop distance and embedding distance over all node pairs, and plot a heatmap of the 2D histogram of embedding distance vs. hop distance.
My objective is to embed the nodes such that Euclidean distances in the embedding are maximally correlated with hop distances in the graph. I'm looking for suggestions to improve the results. Any tips appreciated. Thanks in advance!
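For anyone who wants to reproduce the evaluation, here is a minimal sketch of the kind of pipeline described above, not my exact script. The function names (`hop_distances`, `embedding_distances`, `evaluate`) and the toy path graph are made up for illustration; in a real run the adjacency lists would come from the projected graph and the embedding matrix from the FastRP output.

```python
from collections import deque

import numpy as np


def hop_distances(adjacency):
    """All-pairs hop distances, one BFS per node (assumes a connected graph)."""
    n = len(adjacency)
    dist = np.full((n, n), -1, dtype=int)  # -1 marks "not visited yet"
    for start in range(n):
        dist[start, start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start, v] < 0:
                    dist[start, v] = dist[start, u] + 1
                    queue.append(v)
    return dist


def embedding_distances(embeddings):
    """All-pairs Euclidean distances between the rows of `embeddings`."""
    sq = (embeddings ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    return np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negative round-off


def evaluate(adjacency, embeddings, bins=20):
    """Pearson r and 2D histogram counts over all unordered node pairs."""
    hops = hop_distances(adjacency)
    emb = embedding_distances(embeddings)
    iu = np.triu_indices(len(adjacency), k=1)  # count each pair once
    x, y = hops[iu], emb[iu]
    r = np.corrcoef(x, y)[0, 1]
    # One histogram bin per integer hop distance, `bins` bins for embedding distance.
    hop_edges = np.arange(0.5, x.max() + 1.5)
    counts, _, emb_edges = np.histogram2d(x, y, bins=[hop_edges, bins])
    return r, counts, hop_edges, emb_edges


# Toy data (hypothetical): path graph 0-1-2-3-4 embedded on a line,
# so embedding distance equals hop distance by construction.
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
embeddings = np.arange(5, dtype=float).reshape(-1, 1)
r, counts, hop_edges, emb_edges = evaluate(adjacency, embeddings, bins=4)
print(f"Pearson r = {r:.3f}")
```

The `counts` array is what would be passed to a heatmap plot, with hop distance on one axis and embedding distance on the other.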
The projection is made as follows:
MATCH (source)
WHERE source:Organization AND source.componentId IN [0] //[8,13,5,6,7,9,10,11,12,4,3,2,1,0]
OPTIONAL MATCH (source)-[r:R]-(target)
WHERE target:Organization AND target.componentId IN [0] //[8,13,5,6,7,9,10,11,12,4,3,2,1,0]
WITH gds.graph.project(
  'testComps',
  source,
  target,
  {
    sourceNodeProperties: source { componentId: source.componentId },
    targetNodeProperties: target { componentId: target.componentId },
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(r)
  },
  {undirectedRelationshipTypes: ['R']}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels;
An example of the embedding parameters and the FastRP call:
fastRP_params = {
    'embeddingDimension': 1024,
    'iterationWeights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    'propertyRatio': 0.0,
    'nodeSelfInfluence': 1.0
}
cypher = f"""CALL gds.fastRP.stream(
    'testComps',
    {{
        embeddingDimension: {fastRP_params['embeddingDimension']},
        iterationWeights: {fastRP_params['iterationWeights']},
        randomSeed: 42,
        propertyRatio: {fastRP_params['propertyRatio']},
        nodeSelfInfluence: {fastRP_params['nodeSelfInfluence']}
    }}
);
"""
which results in