Data encryption for Neo4j with AWS EC2

Hello Community! I hope you guys are doing well :slight_smile:

I have a question about data encryption for Neo4j on AWS. Does anyone have experience with this? If I need to mask or encrypt the data in the database, it seems like Neo4j doesn't have features to do this.

What is the best practice for this case?
What tools or steps are you guys using, or what is available?

Any advice, recommendation or resource would be really helpful and appreciated.
Thank you for your time in advance.

Respectfully,

Lee

@plee It depends on what layer you want to encrypt: over the wire, at the storage layer, or at the property layer. There may also be other layers I'm overlooking, but these are not mutually exclusive and can be used in conjunction.

Encrypting over the wire is a matter of setting up TLS/SSL certificates and should be done regardless of which other solutions you choose here.

If you need to query the cleartext data, you'll need access to it within the query. If you're encrypting at the property layer, this would likely involve decrypting the property using a key you inject via the query parameters. Something like this:

MATCH (n:MyNode)
WHERE decrypt(n.encrypted_value, $encryption_key) = $decrypted_value
RETURN n

… where decrypt could be a function provided by a Neo4j plugin. I wouldn't store the encryption key in the same database. Keep in mind, this would be an unindexed query (an index on :MyNode.encrypted_value would not be used) so you should plan your queries around that.
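
To make the "inject the key via query parameters" idea a bit more concrete, here is a minimal Python sketch. It only builds the query text and the parameter map; it does not talk to Neo4j. The environment variable name NEO4J_PROPERTY_KEY and the helper build_lookup are assumptions made up for illustration, and decrypt is still the hypothetical plugin function from the query above, not something built into Cypher.

```python
import os

# Same query shape as in the post; `decrypt` is the hypothetical plugin
# function mentioned there, not a built-in Cypher function.
FIND_BY_CLEARTEXT = (
    "MATCH (n:MyNode) "
    "WHERE decrypt(n.encrypted_value, $encryption_key) = $decrypted_value "
    "RETURN n"
)


def build_lookup(cleartext, env_var="NEO4J_PROPERTY_KEY"):
    """Return (query, parameters) for a lookup by cleartext value.

    The key is read from the environment (assumed variable name), so it is
    never stored in the database and never appears in the query text.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"encryption key not found in ${env_var}")
    # Both values travel as parameters instead of being concatenated
    # into the Cypher string.
    return FIND_BY_CLEARTEXT, {"encryption_key": key, "decrypted_value": cleartext}
```

The returned pair can be handed to whatever driver you use; with the official Python driver that would be something like session.run(query, params). In practice the key would more likely come from a secrets manager than a plain environment variable, but the shape of the call stays the same.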

The most common scenario I've seen is to encrypt the disk partition at the OS level. This way people who have direct access to the disk partition (for example, if it is reused when you remove the EC2 instance or EBS volume from your account without it being zeroed out) can't mine the disk for data, such as PII, passwords, etc. Encrypting a disk partition has a runtime I/O cost, though, so make sure it'll be fast enough for your needs if you do this.

Just a thought...
I come from the world of business applications. In my experience, customer PII is the most common form of encrypted data, and not all PII is worthy of encryption. It's hard to imagine a scenario where using a sensitive piece of data as a search key makes sense. I'm more likely to use some piece of info that doesn't need encryption, like a name or ID.

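If for some reason you really, really just have to search by some encrypted thing, you could pass the encrypted version of that thing as a search key, instead of the clear text version. That way, you could index the encrypted thing. Note that this only works if the encryption is deterministic, meaning the same input always produces the same ciphertext.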
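
One way to sketch the "search by the protected value" idea in Python is below. To be clear about what is swapped in: this uses a keyed hash (HMAC-SHA256) as a deterministic search token rather than real encryption, so the token is one-way and the actual value would still need to be stored encrypted in a separate property. The property name value_token, the helper names, and the lowercase/strip normalization are all assumptions for illustration.

```python
import hashlib
import hmac


def search_token(value: str, index_key: bytes) -> str:
    """Deterministic keyed token: same value and same key give the same token."""
    # Normalization is an assumption; drop it if case or whitespace matters.
    normalized = value.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical property name. Because the comparison is a plain equality on a
# stored property, an ordinary index on :MyNode(value_token) could serve it.
FIND_BY_TOKEN = "MATCH (n:MyNode) WHERE n.value_token = $token RETURN n"


def build_token_lookup(value: str, index_key: bytes):
    """Return (query, parameters) that search by token instead of cleartext."""
    return FIND_BY_TOKEN, {"token": search_token(value, index_key)}
```

The same search_token call would be used when writing the node, so the stored token and the lookup token match. The trade-off is that equal values produce equal tokens, which leaks which nodes share a value, and only exact-match lookups are possible.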