Call apoc.meta.graph() expected runtime

JaHo · March 2, 2021, 10:32am

Can anyone tell me roughly how long I should expect call apoc.meta.graph() to take on a graph with 500 million nodes and 1.5 billion relationships?

I ran into the bug with call db.schema.visualize(); apoc.meta.graph() gave the correct answer before, but it is taking a while this time.

Thanks!

markhneedham · March 4, 2021, 12:07pm

It's probably gonna take ages to return. db.schema.visualize is using pre-computed data, whereas apoc.meta.graph is computing it all from scratch. Maybe you can take a look at apoc.meta.graphSample instead?

JaHo · March 4, 2021, 12:35pm

I tried apoc.meta.graphSample, but like db.schema.visualize (and as stated in the documentation) it returned extra relationships.
I also played around with apoc.meta.subGraph a bit and eventually got it to yield a satisfactory result. I'm still a bit confused about where the computational cost comes from, though: for many subsets of nodes and relationships the result was instant, while including some labels with fairly small sets of nodes/relationships resulted in long runtimes that I stopped after a while.
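
For readers who want to try the same approach, a minimal sketch of restricting the meta graph to a subset of labels and relationship types with apoc.meta.subGraph might look like the following. The label and relationship type names are hypothetical placeholders, and the labels and rels config keys are assumed from the APOC 4.x documentation, so check them against the installed version.

```cypher
// Hypothetical labels and relationship types; replace with names from your own graph.
CALL apoc.meta.subGraph({
  labels: ['Person', 'Company'],
  rels: ['WORKS_AT']
})
YIELD nodes, relationships
RETURN nodes, relationships;
```

Adding labels to the list one at a time makes it easier to see which one triggers the long runtime.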

markhneedham · March 4, 2021, 12:38pm

I don't know this code by heart, but this is the function that it's calling:

https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/4.2/core/src/main/java/apoc/meta/Meta.java#L993

                Pattern pattern = (Pattern) o;
                return from.equals(pattern.from) && type.equals(pattern.type) && to.equals(pattern.to);
            }
            return false;
        }

        @Override
        public int hashCode() {
            return 31 * (31 * from.hashCode() + type.hashCode()) + to.hashCode();
        }

        public Label labelTo() {
            return Label.label(to);
        }
        public Label labelFrom() {
            return Label.label(from);
        }
        public RelationshipType relationshipType() {
            return RelationshipType.withName(type);
        }
    }

that then calls the metaGraph function:

https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/f3a42f8b6e344a0ca1a2b7d497be7c02853b4ca9/core/src/main/java/apoc/meta/Meta.java#L888

            return RelationshipType.withName(type);
        }
    }
    @Procedure
    @Description("apoc.meta.graph - examines the full graph to create the meta-graph")
    public Stream<GraphResult> graph(@Name(value = "config", defaultValue = "{}") Map<String,Object> config) {
        MetaConfig metaConfig = new MetaConfig(config);
        return metaGraph(null, null, true, metaConfig);
    }

    private Stream<GraphResult> metaGraph(Collection<String> labelNames, Collection<String> relTypeNames, boolean removeMissing, MetaConfig metaConfig) {
        Read read = kernelTx.dataRead();
        TokenRead tokenRead = kernelTx.tokenRead();

        Map<String, Integer> labels = labelsInUse(tokenRead, labelNames);
        Map<String, Integer> relTypes = relTypesInUse(tokenRead, relTypeNames);

        Map<String, Node> vNodes = new TreeMap<>();
        Map<Pattern, Relationship> vRels = new HashMap<>(relTypes.size() * 2);

        labels.forEach((labelName, id) -> {

And actually it doesn't look like it computes everything from scratch like I thought it did. It's kinda hard to say why it would be working better for some labels than others.

JaHo · March 4, 2021, 12:58pm

Thanks for the pointer, though I don't really know any Java.
It only started being slow after I recently added some new labels that roughly doubled the number of nodes. Before that, even with an already pretty large number of nodes, it ran instantly and returned the correct result.

markhneedham · March 4, 2021, 2:07pm

And on that graph you said apoc.meta.graphSample returns quickly but has extra relationships?

The only difference between apoc.meta.graphSample and apoc.meta.graph is a post-processing step where missing relationships are removed (or not), so that's where the time must be spent.

Reading the code of that function I can see that it's doing a scan of all the nodes with each label and then checking all of the relationships for 1 in 1000 of those nodes, which would be time-consuming. You can configure the sampling rate via the sample key e.g. sample: 10000 would make it sample every 10,000 nodes instead of every 1,000 nodes.
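
As a sketch of the configuration described above, the sample key goes in the config map passed to the procedure. The value 10000 is just the example figure from this post, not a recommendation.

```cypher
// Per the description above, a larger sample value means fewer nodes per label are checked.
CALL apoc.meta.graph({sample: 10000});
```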

JaHo · March 10, 2021, 10:14am

Sorry for the delay.
Yes, for the full graph, apoc.meta.graphSample runs quickly but has extra relationships. I tried running it with different sample sizes, but I must have been doing it wrong, as there was no difference in either runtime or result. Is call apoc.met.graphSample({sample: 1000}) the correct syntax?

markhneedham · March 10, 2021, 11:11am

Yup, just gotta fix the typo on here:

call apoc.meta.graphSample({sample: 1000})

JaHo · March 10, 2021, 2:32pm

Ah my bad. Still, even if I call it with sample: 1 (which I guess would mean it checks every node), it returns instantly and contains additional relationships.

markhneedham · March 10, 2021, 2:58pm

Can you try:

call apoc.meta.graph({sample: 1000})

JaHo · March 10, 2021, 3:22pm

That seems to run slowly irrespective of what I set sample to. I haven't let it run longer than a minute or so, though.

markhneedham · March 10, 2021, 4:43pm

I'm playing around with it on a dummy graph with 40m nodes/relationships and I can see different speeds of response when specifying sample.
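
One way to make such a comparison, sketched here under the assumption that apoc.meta.graph yields nodes and relationships columns as in APOC 4.x, is to return only the sizes of the result so that runs with different sample values are easy to compare. These are not the exact queries used on the dummy graph.

```cypher
// Run once per sample value and compare the elapsed time and the counts.
CALL apoc.meta.graph({sample: 100})
YIELD nodes, relationships
RETURN size(nodes) AS metaNodes, size(relationships) AS metaRels;
```

Repeating the query with a much larger value, for example sample: 100000, should show whether the setting has any effect on a given graph.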

JaHo · March 11, 2021, 8:42am

That's strange, not exactly sure what's going on. Anyways, it's not a pressing issue for me at the moment so I don't want to steal too much of your time. If I can help by providing more info I'd be happy to. Thanks again for your help!
