Deleting older relationships

DanielGittx
Node Link

I'm new to graph databases and I'm evaluating whether Neo4j would fit my use case.

I have 2 CSV files as follows:

  1. Persons file (phone number, name columns)
  2. Calls file (callerNumber, recipientNumber, callDate columns).

I anticipate >50 million nodes and >20 billion relationships.

I have been able to create nodes from the Persons file and relationships from the Calls file using neo4j-admin import.

The challenge comes when deleting relationships for a certain callDate so that I can add newer ones; it's painfully slow for large datasets.
MATCH ()-[r:CALLS {callDate: 20200101}]->() DELETE r;

I found out I can't index relationship properties.

Is there a way to optimize this cypher? How could I possibly re-model my CSVs?
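For context, the bulk load used headers roughly along these lines (a rough sketch only; the file names and 3.5-style flags here are placeholders, not my exact setup):

# persons.csv header: phone number as the node key, name as a property (sketch)
phoneNumber:ID(Person),name

# calls.csv header: caller and recipient keys plus the call date (sketch)
:START_ID(Person),:END_ID(Person),callDate:int

# bulk load (placeholder paths)
neo4j-admin import --nodes:Person=persons.csv --relationships:CALLS=calls.csv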


10 REPLIES

Cobra
Ninja

Hello @DanielGittx,

Yes, it's possible. This query should work:

CALL apoc.periodic.iterate('MATCH ()-[r:CALLS {callDate: 20200101}]->() RETURN r', 'DELETE r', {batchSize: 1000, iterateList: true})

It deletes the relationships in batches of 1000.

Regards,
Cobra

Hi @Cobra,

Thanks much. Indeed the APOC procedure you shared works (I just refactored the syntax a bit), but it's still slow for the roughly 10 billion relationships I'm working with (6 months of data).

I came across db.index.fulltext.createRelationshipIndex as a way of indexing a relationship property.
The index is currently populating; hopefully the Cypher will gain some speed once it's done.
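Once it finishes populating, the plan is to feed the index lookup into the same batching procedure, roughly like this (a sketch; "callDateRelationship" is the name I gave the index, and it assumes the call date is stored as a string, since full-text indexes only cover string properties):

// Sketch: batched delete driven by the full-text relationship index
CALL apoc.periodic.iterate(
  'CALL db.index.fulltext.queryRelationships("callDateRelationship", "CALL_DATE:20200101") YIELD relationship RETURN relationship AS r',
  'DELETE r',
  {batchSize: 1000, iterateList: true}
)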

Nice, happy to hear this 🙂

The apoc procedure and the index should really speed up your query

Regards,
Cobra

Just an update...
The indexing process is very slow.

Considering:

  • Database size: 1.2 TB

Server configs:

  • Heap: 230 GB
  • Page cache: 1.182 TB

Neo4j version:

  • Neo4j Browser version: 4.0.3
  • Neo4j Server version: 3.5.15

It has taken 3 hours just to get to 12% (index populating):

CALL db.index.fulltext.createRelationshipIndex("callDateRelationship", ["CALLS"], ["CALL_DATE"], { analyzer: "url_or_email", eventually_consistent: "true" })

Why is this, and is it possible to fast-track it?

Hello @DanielGittx,

Yeah, because it has to index your whole database; that's why it's better to create the index when you create the database.

Regards,
Cobra

DanielGittx
Node Link

Agreed, however I had initially done a bulk import (neo4j-admin import).
Will neo4j-admin import preserve indexes if I create them in advance and then do the bulk import?

If I'm right, the index is set up at import time.

Regards,
Cobra

I don't think so, especially for relationship indexes

I marked one of your messages as the solution because I tested it with a subset of the graph and it worked (it was fast), and also because I'm solving a different issue now.

I don't know much more about this topic, but I think you're right. According to the docs,

Full-text indexes are powered by the Apache Lucene indexing and search library

so it has to be built and populated separately from the import.

Regards,
Cobra

Daniel,
I would suggest changing your data model to have the day of the call as a node, so it would look like:
(:Person)-[:CALLED_ON]->(:DayOfCall)<-[:RECEIVED_CALL]-(:Person)
Then you can index the DayOfCall nodes on the date property, and the DELETE will work much faster. Please note that you will still need to use apoc.periodic.iterate().
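For example, something along these lines (just a sketch; the 3.5 index syntax and the integer date format are assumptions):

// Index the day nodes on their date property (Neo4j 3.5 syntax)
CREATE INDEX ON :DayOfCall(date);

// Remove one day's call relationships in batches, starting from the indexed day node
CALL apoc.periodic.iterate(
  'MATCH (d:DayOfCall {date: 20200101})<-[r:CALLED_ON|RECEIVED_CALL]-(:Person) RETURN r',
  'DELETE r',
  {batchSize: 1000}
)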