Help in understanding why this query works please

hi

I am new to Cypher and implementing a Financial Crime DB.

This is for a superannuation product and my schema is -
(Member), (BankAccount), (EmailAddress), (MobileNumber) and (IPAddress).
And relationships between all these nodes of USES.

I needed to derive for an email pattern any members with this email and did they have payments, this Cypher gave accurate results ( as the DBAs ran theirs in their OLTP database) but I am confused on what it is doing as I have not mentioned any 'real' nodes and in particular returns the EmailAddress node and has filtered the list to only those who have a BankAccount node.

It is just searching everything maybe ?

WITH ['.aol.com.', '.yandex.com.', '.fastemail.com.'] AS emaillist
UNWIND emaillist AS emailextension
match (email) -- (member) -- (bankaccount)
WITH emailextension, email
WHERE email.name =~ emailextension
and exists (bankaccount.name )
and member.status = 'Current'
RETURN * limit 10

Thanks in advance :)

WITH ['.aol.com.', '.yandex.com.', '.fastemail.com.'] AS emaillist
UNWIND emaillist AS emailextension

This is part of your query is assigning values to your variable
and looping over it and it can be explained with help of a psuedo code

emailList = ['.aol.com.', '.yandex.com.', '.fastemail.com.'],
emailList.foreach((emailExtension) => {
    // you will get  indiviual email extenstion value in it  
)

and you can think rest of your query is executed inside of this forEach.

Now coming to the MATCH clause

match (email) -- (member) -- (bankaccount)
WITH emailextension, email

This returns a nodes with the relationship as you have defined in your schema.
Relationship is expressed as a member having a email and a bank account.

WITH clause passes the variables you have defined to next operations.

WHERE email.name =~ emailextension
and exists (bankaccount.name )
and member.status = 'Current'
RETURN * limit 10

WHERE clause is used perform a regex check on the email, so that it has a following email extension in it and EXISTS clause is used here to check the if the email is existing in the bank account. Rest of cypher is use to look if Bank account associated with that member has a status of current and returns the first 10 nodes.

Please familiarize yourself with labels! These are like types for nodes!

Right now in all your queries, you aren't using labels at all! This means that Cypher will perform all nodes scans to find nodes in your database, and won't restrict nodes to the types that you want!

Labels start with a colon, so use (e:Email) for an :Email labeled node bound to the e variable, Cypher will be able to check only :Email nodes in the database to find it, which is much faster than having to try every node in the database.

Also, when you have created indexes (to speed up matching by a property on nodes of a certain label), then you NEED labels in your MATCH pattern in order for the index to be used.

hi Andrew,

Thanks for the reply. I tried to use labels, and found an example like the one i have used in neo4j help. I understand that i was not using labels at all, which is why i wondered what it was actually doing and you have confirmed 'scanning everything'.
I was just having trouble as i could not get CONTAINS and 'wildcard' approach to work.
Basically needed to find any 'current' members with these types of emails and have a bank account...

Being extremely experienced in SQL, still converting my mind to neo4j..

Thanks
Felicity

Well first of all let's fix up the original query. I've added labels where appropriate, and changed around the order of some of the lines:

WITH ['.aol.com.', '.yandex.com.', '.fastemail.com.'] AS emaillist
UNWIND emaillist AS emailextension
MATCH (email:Email) -- (member:Member) -- (account:BankAccount)
WHERE email.name =~ emailextension
  and exists (account.name )
  and member.status = 'Current'
RETURN emailextension, email
LIMIT 10

Some things to note:
The MATCH here shows a pattern between :Email, :Member, and :BankAccount nodes, but only that they are connected to each other in this way, no information is provided as to the type or direction of the relationships that connect them. Maybe this is intended, but if certain restrictions exist on the relationship types or directions, then that should be included in the pattern.

The first WHERE clause uses a regular expression check on email.name. That won't leverage any indexes that you set up for faster matching. Since these are extensions, a better check might be to use ENDS WITH, as in: WHERE email.name ENDS WITH emailextension, however you would need to alter your inputs accordingly.

Usually * isn't used all that often, you'll normally want to be specific about the variables you are returning, or are ferrying along in WITH clauses.

2 Likes