Error: Neo.ClientError.Procedure.ProcedureCallFailed LPA community detection relationship issue

Hi,
I'm trying to project a graph so that I can run the Label Propagation (LPA) community detection algorithm, but I'm getting this error:

Failed to invoke procedure `gds.graph.project.cypher`: Caused by: java.lang.IllegalArgumentException: Failed to load a relationship because its source-node with id 7 is not part of the node query or projection. To ignore the relationship, set the configuration parameter `validateRelationships` to false.

I'm trying to update code written in the older Graph Algorithms (GA) library syntax but am running into difficulties. Here is the original code:

CALL algo.labelPropagation.stream(
'MATCH (p:Publication) RETURN id(p) as id',
'MATCH (p1:Publication)-[r1:HAS_WORD]->(w)<-[r2:HAS_WORD]-(p2:Publication) 
WHERE r1.occurrence > 5 AND r2.occurrence > 5
RETURN id(p1) as source, id(p2) as target, count(w) as weight',
{graph:'cypher',write:false, weightProperty : "weight"}) yield nodeId, label

with label, collect(algo.asNode(nodeId)) as nodes where size(nodes) > 2
MERGE (c:PublicationLPACommunity {id : label})
FOREACH (n in nodes |
   MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
)
return label, nodes

And the projection code I'm trying looks like this:

CALL gds.graph.project.cypher(
  'publicationsAndDocuments',
  'MATCH (n) WHERE n:Publication OR n:Document RETURN id(n) AS id, labels(n) AS labels', 
  'MATCH (n1)-[r1:HAS_WORD]->(w)<-[r2:HAS_WORD]-(n2) RETURN id(n1) AS source, id(n2) AS target', {validateRelationships:TRUE})
YIELD
  graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels

It works if I set validateRelationships to FALSE, but I need these relationships to perform the community detection. I'm not sure whether this is correct, or even close to correct, at this stage. As you can see, I haven't yet managed to fit in the weighting or the word-occurrence filter.
I would be grateful for any help on this matter.

Hello @stephflint :blush:

The error message means that you are trying to add a relationship but at least one of its nodes has not been projected. You must either set the validateRelationships parameter to false, if you are fine with skipping such relationships, or modify the node/relationship queries so that every source and target node is part of the projection.
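As a sketch of the second option, the relationship query can be restricted to the same labels that the node query returns, so every source and target id is guaranteed to be projected. This assumes the failing relationships come from HAS_WORD paths that start at nodes other than Publication or Document. The weight column and the occurrence > 5 filter are carried over from the original algo.labelPropagation call and have not been checked against the actual data.

```cypher
CALL gds.graph.project.cypher(
  'publicationsAndDocuments',
  // Node query: only Publication and Document nodes are projected.
  'MATCH (n) WHERE n:Publication OR n:Document RETURN id(n) AS id, labels(n) AS labels',
  // Relationship query: both endpoints are limited to the projected labels.
  // count(w) is returned as an extra column and becomes the "weight" property.
  'MATCH (n1)-[r1:HAS_WORD]->(w)<-[r2:HAS_WORD]-(n2)
   WHERE (n1:Publication OR n1:Document)
     AND (n2:Publication OR n2:Document)
     AND r1.occurrence > 5 AND r2.occurrence > 5
   RETURN id(n1) AS source, id(n2) AS target, count(w) AS weight'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```

With both endpoints filtered this way, validateRelationships can stay at its default of true.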

Can you share your data model?

CALL db.schema.visualization()

What sub-graph are you trying to project?

Regards,
Cobra

Hi @Cobra , sure here is the data model:

The sub-graph - I'm not entirely sure(!?). What I'm trying to do is create a graph projection of Publication and Document nodes that contain the same keyword, then perform community detection with LPA (Label Propagation), taking the weights (number of occurrences) into account (>5). So in the end I would have the relationship IN_LPA_COMMUNITY between either a Publication or a Document and the community, i.e.:

(n)-[:IN_LPA_COMMUNITY]->(c)

My issue is that I don't understand how to recreate the GA-style projection in GDS. I haven't yet been able to figure it out from the documentation, videos, or tutorials.

To make this easier, you should first create a new relationship type:

CALL apoc.periodic.iterate("
    MATCH (d:Document)-[:HAS_WORD]->(w)<-[:HAS_WORD]-(p:Publication) 
    RETURN d, p, count(w) AS total
    ", "
    MERGE (d)-[r:COOCCURRED]->(p) 
    SET r.total = total
    ", {batchSize: 10000, parallel: true}
)

Then, create the graph projection:

CALL gds.graph.project(
    'publicationsAndDocuments', 
    ['Publication', 'Document'], 
    {COOCCURRED: {orientation: 'UNDIRECTED'}}    
)
YIELD graphName AS graph, nodeProjection, nodeCount AS nodes, relationshipCount AS rels

Then, you can execute Label Propagation:

CALL gds.labelPropagation.write(
    'publicationsAndDocuments', 
    {writeProperty: 'community'}
)
YIELD communityCount, ranIterations, didConverge

Then, create a uniqueness constraint on the id property of Community nodes:

CREATE CONSTRAINT constraint_Community_id IF NOT EXISTS FOR (n:Community) REQUIRE n.id IS UNIQUE;

Finally, create the Community nodes:

CALL apoc.periodic.iterate("
    MATCH (n) 
    WHERE n:Document OR n:Publication 
    RETURN n
    ", "
    MERGE (c:Community {id: n.community}) 
    MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
    ", {batchSize: 10000, parallel: true}
)

It should work and perform better. If you run into trouble with the parallel setting, you can set parallel to false.
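The queries above run Label Propagation unweighted. If the co-occurrence count should act as a weight, as in the original GA call, a minimal sketch could look like the following. It assumes the COOCCURRED relationships carry the total property set in the first step; the graph name 'publicationsAndDocumentsWeighted' is a hypothetical name chosen to avoid clashing with the existing projection.

```cypher
// Project the co-occurrence graph again, this time loading the weight.
// 'publicationsAndDocumentsWeighted' is a hypothetical graph name.
CALL gds.graph.project(
    'publicationsAndDocumentsWeighted',
    ['Publication', 'Document'],
    {COOCCURRED: {orientation: 'UNDIRECTED', properties: 'total'}}
)
YIELD graphName, nodeCount, relationshipCount;

// Run Label Propagation, using the projected property as relationship weight.
CALL gds.labelPropagation.write(
    'publicationsAndDocumentsWeighted',
    {writeProperty: 'community', relationshipWeightProperty: 'total'}
)
YIELD communityCount, ranIterations, didConverge;
```

The occurrence > 5 threshold from the original query is not applied here; it would have to be added as a WHERE clause when the COOCCURRED relationships are created.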

@Cobra cool. I can see how this would help with identifying the connections. One thought: will it affect LPA (or other algorithms) if these new relationships already exist in the graph? I'm essentially trying to first replicate what worked on the old GA version in the new GDS version. So this adds a bit more complexity for me in understanding how to make the LPA stream call to visualise communities. Or does it make it simpler? I'm new to this, so it's a learning curve at the moment...

I updated my previous answer with all the queries you need. GDS algorithms only use what is in the graph projection, so my solution should be simpler and better :)
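For the stream call specifically, a rough GDS counterpart of the old algo.labelPropagation.stream query might look like this. It is a sketch that assumes the 'publicationsAndDocuments' projection from the earlier step still exists; it only reads results and writes nothing back.

```cypher
// Stream community assignments and group nodes per community.
CALL gds.labelPropagation.stream('publicationsAndDocuments')
YIELD nodeId, communityId
// gds.util.asNode replaces algo.asNode from the GA library.
WITH communityId, collect(gds.util.asNode(nodeId)) AS nodes
// Keep only communities with more than two members, as in the original query.
WHERE size(nodes) > 2
RETURN communityId, nodes
ORDER BY size(nodes) DESC
```

Note that the result column is called communityId in GDS, where the GA version yielded label.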

@Cobra amazing - thank you. The last part, however, fails to commit any operations or complete any batches.

I forgot a }; I've updated the answer :slightly_smiling_face:

CALL apoc.periodic.iterate("
    MATCH (n) 
    WHERE n:Document OR n:Publication 
    RETURN n
    ", "
    MERGE (c:Community {id: n.community}) 
    MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
    ", {batchSize: 10000, parallel: true}
)