Neo4j for real-time anomaly detection

Hello, I'm a PhD student from VCU. I'm planning to use Neo4j to store network data from a power system. The system consists of substations, and data will be collected from roughly 1,200 data points at a 120 Hz sampling rate.
I have a couple of questions.

  1. What is the best way to store real-time data into Neo4j? Is it Kafka, or is there another recommendation?
  2. Can I do multiple concurrent data injections into Neo4j using Community Edition?
  3. Can I run multiple Cypher queries simultaneously?
  4. Is it a good idea to store high-speed network .pcap data in Neo4j?

Any help will be highly appreciated. Thank you.

  1. The best way to store real-time data in Neo4j will depend on where it's coming from and what format it's in, but yes, in general Kafka is a good option, and the fact that you can use a managed cloud Kafka makes it a lot easier.
  2. Not sure what you mean by multiple data injection, but these general techniques should work with Neo4j Community Edition.
  3. Absolutely, though there are some caveats about how locking works in Neo4j. Running lots of parallel queries that modify the same node at the same time is not a good idea, but Neo4j can handle many parallel transactions.
  4. It's OK to do this, but keep in mind that while Neo4j supports the byte type, using Neo4j as a blob store is a "worst practice". If you want to store information about packets, routing data, and topology, go for it. But if you want to store the content of packets, I'd strongly suggest not doing that, and instead combining Neo4j with something else: use Neo4j to store the topology aspects of the graph, and something like Redis to look up the binary content of a packet by a corresponding ID.
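To make point 1 concrete: however the data arrives (Kafka or otherwise), you'll usually want to write it in batches rather than one `CREATE` per sample. This is a minimal sketch; the `DataPoint`/`Reading` labels, property names, and batch size are illustrative assumptions, not anything from the thread.

```python
# Sketch: batching sensor readings for periodic ingestion into Neo4j.
# UNWIND lets a whole batch land in a single transaction; the labels
# and property names below are hypothetical, not from the thread.

BATCH_CYPHER = """
UNWIND $batch AS row
MERGE (p:DataPoint {id: row.point_id})
CREATE (r:Reading {ts: row.ts, value: row.value})
CREATE (p)-[:MEASURED]->(r)
"""

def chunk(readings, size=1000):
    """Split a stream of readings into fixed-size batches."""
    for i in range(0, len(readings), size):
        yield readings[i:i + size]

# 1,200 points at 120 Hz = 144,000 readings/sec, i.e. 144 of these
# thousand-row transactions per second.
readings = [{"point_id": i % 1200, "ts": i, "value": 0.0} for i in range(5000)]
batches = list(chunk(readings))
print(len(batches), len(batches[0]))  # 5 1000
```

Each batch would then be passed as the `$batch` parameter in one driver transaction.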
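On the locking caveat in point 3, one way to keep parallel writers from contending for the same node's lock is to route updates so that each node is only ever written by one worker. A rough sketch, where the worker count and hash-based routing are assumptions rather than a Neo4j API:

```python
# Sketch: partition updates by target node id so that no two parallel
# writers ever modify the same node concurrently (avoiding lock waits).
from collections import defaultdict

def partition_by_node(updates, n_workers=4):
    """Route each update to a worker shard by hashing its node id."""
    shards = defaultdict(list)
    for u in updates:
        shards[hash(u["node_id"]) % n_workers].append(u)
    return shards

updates = [{"node_id": f"sub-{i % 10}", "value": i} for i in range(100)]
shards = partition_by_node(updates)

# Each node id lands in exactly one shard:
seen = defaultdict(set)
for worker, batch in shards.items():
    for u in batch:
        seen[u["node_id"]].add(worker)
assert all(len(workers) == 1 for workers in seen.values())
```

Each shard can then run its own transaction in parallel without touching the others' nodes.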
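The split suggested in point 4 might look like this in outline: small, indexable packet metadata goes onto the Neo4j node, while the raw payload bytes go to a key-value store. A plain dict stands in for Redis here; in a real setup you'd replace it with a Redis client.

```python
# Sketch: keep packet metadata in Neo4j, payload bytes elsewhere.
# `payload_store` is a stand-in for Redis; the key scheme "pkt:<id>"
# and the field names are illustrative assumptions.

payload_store = {}  # stand-in for a Redis instance

def ingest_packet(pkt_id, src, dst, payload: bytes):
    """Store the payload outside the graph; return only the small
    properties that belong on the Neo4j node."""
    payload_store[f"pkt:{pkt_id}"] = payload
    # These properties would go into a Cypher CREATE/MERGE:
    return {"id": pkt_id, "src": src, "dst": dst, "size": len(payload)}

node_props = ingest_packet(1, "10.0.0.1", "10.0.0.2", b"\x00" * 64)
print(node_props["size"])           # 64
print(len(payload_store["pkt:1"]))  # 64
```

The graph then stays small and queryable, and a packet's content is always one key lookup away via its ID.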

Also: I'm a VCU alum; are you in the CS program? Feel free to send me a DM if you want to talk about Neo4j more. I've previously been engaged with VCU, helping students of all sorts through the alumni board, so it'd be fun to learn more about what you're doing.

See also: https://www.confluent.io/blog/stream-analyze-visualize-data-with-kafka-ksqldb-and-friends/?utm_source=twitter&utm_medium=rmoff&utm_campaign=ty.community.con.rmoff_twitter_2020-06-12&utm_term=rmoff-devx

First of all, thank you very much for the reply. I'll DM you with further details.

  1. I'm mainly concerned about the amount of data that needs to be transferred, i.e. the bandwidth. From the pcap files, I'll be extracting 120 × 1,000 points per second.
  2. Several subsystems need to access the Neo4j database simultaneously. Does Community Edition support simultaneous access from multiple clients?
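For a rough sense of that bandwidth, here is a back-of-the-envelope estimate using the 1,200 points at 120 Hz from the original post; the 32 bytes per serialized sample is a hypothetical figure (timestamp + id + value plus overhead), not a measurement.

```python
# Back-of-the-envelope throughput estimate for the stream described above.
# bytes_per_sample is an assumed serialized size, not a measured number.

points = 1200          # data points per snapshot (from the original post)
rate_hz = 120          # samples per second per point
bytes_per_sample = 32  # assumed: timestamp + point id + value + overhead

samples_per_sec = points * rate_hz
mb_per_sec = samples_per_sec * bytes_per_sample / 1e6

print(samples_per_sec)       # 144000
print(round(mb_per_sec, 2))  # 4.61
```

A few MB/s is well within what Kafka handles comfortably; the harder constraint is usually Neo4j's write-transaction throughput, which is why batching matters.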

Yes, I'm in the CS program. I would really appreciate your help, and I'll DM you with further details.

Thank you.