Can there be multiple relationships of the same type, between two specific nodes? (timestamp)

Let's say I want to make this relationship:

a:User(john) -b:CREATED-> c:File(abc.exe)
SET b.DateTime = now()

Can I keep doing this over and over each time john creates abc.exe, and have multiple relationships between those two exact nodes?

I'm trying to avoid making a new node for "abc.exe" each time it's created with a new unique timestamp. If I follow that design, the number of nodes in my db will be outrageous.

Put another way, if the multiple relationships of the same type are possible: Is it more performant to have a ton of nodes, or a ton of relationships? What about in terms of ease of querying later?

Hi Nodeynode! Great name, btw.

Being new here, I will try to answer your question and hope someone corrects me if I say anything not quite right. :sweat_smile:

You can definitely have multiple relationships of the same type between the exact same nodes. You can see this yourself by running this code multiple times:
MERGE (t1:Temp {tempid: 1})
MERGE (t2:Temp {tempid: 2})
CREATE (t1)-[:TEMP_REL]->(t2)

If you MATCH/MERGE the nodes before creating the relationship, you will avoid creating your abc.exe node every time. But by using CREATE in the relationship path, it will always create a new relationship.
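Applied to the user/file case from the question, that pattern might look like the sketch below. The `Username` and `FileName` property keys are assumptions for illustration (the question doesn't name them), `DateTime` is the property from the question, and `datetime()` assumes a Neo4j version with temporal types (3.4 or later).

```cypher
// Reuse the existing nodes, or create them on the first run only.
MERGE (u:User {Username: 'john'})
MERGE (f:File {FileName: 'abc.exe'})
// CREATE always adds a new relationship, so each run records one more creation event.
CREATE (u)-[r:CREATED {DateTime: datetime()}]->(f)
RETURN r.DateTime
```

Running it several times should leave one User node, one File node, and one CREATED relationship per run.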

The best performance and ease of querying depends on the use case and the number of nodes/relationships involved.

For example, say node A has 40000 relationships (of the same type) to node B, you want just one of them, identified by a property, and you start from node A via its indexed identifier. The query finds node A very quickly, but then has to search all 40000 relationships and check their properties to find the one you want.

If you have split out node B into B1, B2, B3, B4, etc. then node A still has 40000 relationships. If you put the property you're trying to find on B, then it still has to search 40000 properties.

A principle I've seen in several examples has been to use nodes that split out parts of the graph. Think of the example in the documentation where they use AirportDay nodes to avoid having a node for each airport that will eventually have tens of thousands of relationships.

If your user John is repeatedly creating the same file, you could also create B1-NEXT->B2-NEXT->B3-NEXT->B4, etc. and then node A has something like a MOST_RECENT relationship to the last node in the chain. Then you can find the most recent creation of abc.exe very fast.
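A rough sketch of appending to such a chain is below. Everything named here is an assumption for illustration: the `FileCreation` label for the B nodes, the `Username`, `FileName` and `DateTime` property keys, and the idea that each user has at most one MOST_RECENT relationship per file name.

```cypher
MATCH (u:User {Username: 'John'})
// Find the current end of the chain for this file, if there is one.
OPTIONAL MATCH (u)-[old:MOST_RECENT]->(prev:FileCreation {FileName: 'abc.exe'})
// Record the new creation event as its own node.
CREATE (e:FileCreation {FileName: 'abc.exe', DateTime: datetime()})
// Link the previous event to the new one, but only if a previous event exists.
FOREACH (ignored IN CASE WHEN prev IS NULL THEN [] ELSE [1] END |
  CREATE (prev)-[:NEXT]->(e)
)
// Move the MOST_RECENT pointer to the new event (deleting a null is a no-op).
DELETE old
CREATE (u)-[:MOST_RECENT]->(e)
```

Reading the latest event is then a single hop along MOST_RECENT from the user, and older events can be reached by walking NEXT backwards from there.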

Or maybe you have some combination of model where A-->B and then B goes to separate CreationDay nodes. It'll depend on your exact use case and how many items you want/need in the graph.

That's all I've got. Hopefully it's a little helpful. :smiley:

Happy graphing!
Vincent


Thanks Vincent! I still need to digest your entire post, but is the index number always incrementing? If so, it may be safe to say that it's quick to find the latest timestamped relationship between a user and a file he created. The latest time will always be the highest indexed attribute. (Edit: nope, never mind. You state that it still has to search the rest anyway, which makes sense).

However, it doesn't sound like there's any efficient way around finding all of the file creation events by a user over a given time span. You will have to crawl all of the creation relationships and check whether they are in the time range, whether it's 40k relationships to one node or 1 relationship to each of 40k nodes.
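For reference, the crawl described above might be written like this. It's only a sketch: `Username` and `FileName` are assumed property keys, `DateTime` is assumed to hold a Cypher datetime value, and the 30 to 60 minute window is just an example.

```cypher
MATCH (u:User {Username: 'John'})
// Capture the current time once so both bounds use the same instant.
WITH u, datetime() AS ts
MATCH (u)-[r:CREATED]->(f:File)
// Every CREATED relationship of this user is checked against the window.
WHERE r.DateTime >= ts - duration({minutes: 60})
  AND r.DateTime <= ts - duration({minutes: 30})
RETURN f.FileName AS file, r.DateTime AS createdAt
ORDER BY createdAt
```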

I also couldn't find any proof of Neo4j supporting a time-based index, sadly. And I'd really rather not mimic this by making day/hour/minute partition nodes as well. It just won't make sense when querying or manually traversing the graph, especially when different types of actions and node types are added later.

That said, maybe partitioning by "day" nodes would still meet our use case if it doesn't get in the way of manually analyzing or querying the graph too much.

Does anyone know if time indexing is a feature being worked on? I think that's the best solution.

Please provide a little more info on your data model (or use case) so that I can offer a solution.

Q: Why does the same file have different timestamps? Is it getting updated at each timestamp? Even then, the file is just being replaced with a new version. In that case you can have two relationships: one for 'created date' and the other for 'last updated date'.

File names are common and can be created by multiple users at different times and on their own unique host machines, and/or get recreated due to updates.

Or, you could think of it like processes running, and capturing the start timestamp.

It will be completely irrelevant to an investigation if a process ran two minutes or two weeks ago, or on a random host. For example, we may only care if a process ran in the investigation time frame of interest, say, 30-60 minutes ago, and only on a specific host we're interested in.

I don't have a full schema yet because using time in graph DBs seems to be an unsolved problem as of today, but there will be nodes such as:

  • Username
  • Hostname
  • IPAddress
  • ProcessName
  • ProcessPath
  • FileName
  • FilePath
  • CommandLineArguments
  • Registry Path/Key/Data

Where users have hosts. Hosts have IP addresses. Hosts start processes. Processes create files.

And the list continues for whatever else is in the process explorer tool. It just depends on what kernel API is being called and monitored by process explorer.

Thanks for the reply. I will look into this info and let you know. On using time in a graph DB, please see this link:

wherein I modeled Call Detail Records with the date and time of calls that were incoming, outgoing, or going to voicemail.

Hi Nodey,

If John is always creating the same file, you can consider a data model like this:

(did that picture work?)

The benefit of this model is that if you are starting your query at (User {Username: "John"}), you can find the most recent version of the file very quickly.

If you don't want to have a node for every version of the file, consider a data model like this:
(new users can put only one embedded media in a post, so see my next reply for this image)

The benefit of that model is you can search only the CREATED_MOST_RECENTLY relationship. The way Neo4j saves that, it won't even look at the CREATED relationships.
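Since the image didn't come through, here is one way the write side of that model could be sketched. The `Username`, `FileName` and `DateTime` property keys are assumptions, and it assumes the User and File nodes already exist.

```cypher
MATCH (u:User {Username: 'John'}), (f:File {FileName: 'abc.exe'})
WITH u, f, datetime() AS ts
// Keep the full history: one CREATED relationship per creation event.
CREATE (u)-[:CREATED {DateTime: ts}]->(f)
// Keep a single CREATED_MOST_RECENTLY relationship and overwrite its timestamp.
MERGE (u)-[latest:CREATED_MOST_RECENTLY]->(f)
SET latest.DateTime = ts
```

Because MERGE matches the existing CREATED_MOST_RECENTLY relationship instead of adding another one, there should only ever be one of them between the two nodes.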

It's generally not good practice to have two nodes with tons of relationships between them, but whatever works for your use case!

Hope the pictures help! :slight_smile:

-Vincent

Here's the other data model I mentioned.

Hi, I've come up with one more question. Given that Neo4j lets us create multiple relationships of the same type between two nodes, is there any way to overwrite an existing relationship of the same type with different properties? Thanks!