Can we do compare Datetime with only day and month or only month in Cypher?
For a question like: What events occurred from 1st January to 15th March? In my Graph, I want to match all events occurred between Jan and Mar except the years (mean every possible years) I have start_at and end_at relation.
In addition to @dana_canzano suggestion, consider how you'd do this in any database. If it needs to be done "often" and with "high performance", you could consider adding day+month and/or month to your data model.
@dana_canzano@john.stegeman I don't know if my question is confused but above answer I think is not relevant with my question. I ask about how to compare datetime variable but only with day and month in cypher? Because I know it can do greater or less operation with datetime, but it has to do with (day, month and year) I haven't figured out how to do that with only (day and month) I think it can do with if else statement
Here is an example:
I have 3 events: to make it simple, let assume I have only one relation: occur_on
A - occur_on 28th Feb 1975
B - occur_on 20th March 2000
C - occur_on 12 Jan 1966
Question: Which events occurred between 1st Jan to 15th March (doesn't consider the year)? I mean this question sound non-sense but somehow users still want to know that.
For above database, the query should return A and C
Both Dana and I gave you some building blocks to come up with the solution. For example, you could use Dana's techniques to get the month and day from the dates and use those as filters in your query. You could use my technique to do the same thing except write the values to the database, enabling you to use indexes when you filter.
This works, but I don't think it will perform well over a large graph, as it will be forced to perform a full search. I don't see how a date index would help.
If you know you are going to do this, you could create another date property that stores the truncated date to a normalized year (as below). Then you could use a date range index.
I have used the date.truncate() method to set each date to a common year (the year does not matter) and then searched for those dates that are within your interval (using the common date for your given dates).
unwind [
date({year:1975, month:2, day:28}),
date({year:2000, month:3, day:20}),
date({year:1966, month:1, day:12})
] as aDate
with aDate, date.truncate("day", aDate, {year: 0}) as monthDay
where date({year:0, month:1, day:1}) <= monthDay <= date({year:0, month:3, day:15})
return aDate
From you recent posts, I feel you are trying to use a graph database (Neo4j) for data analytics. I feel this is a mismatch. A graph database is ideal for managing networks for related entities, allowing you go gain insights into these relationships or perform complex transversal algorithms that would be impossible with a relational database. I will be interested in your insights.
@glilienfield
I don't know how to express this, but you're really kind.
To explain more about my project. I'm not doing about data analytics, but yes, it's a little forcing here. I'm working on Education domain, especially on History Subject to help the students or teacher learn History. Again, yes, it's a complicated domain we have to design to structure knowledge from unstructured data (text) (war, events, battle, organization, historical character,.. are what we're trying to structure) . But it's still work just a bit unnatural.
Again, I appreciated a lot to have your perspective view about graph usecases.
@glilienfield
As you mentioned, I was thinking, actually we're just in developing stage, we haven't research about the ability of Neo4j if the number of query reach 1000 at a time. Then can it adapt?
I think performance is very dependent on your data and the type of query. The larger the graph and the longer the query pattern is, the slower the performance. I have seen some people complain about performance when their variable length queries start exceeding lengths of 5 or so and they are looking for all paths in a large database.
In my use case, I have an anchor node search that utilizes an index, then I execute a custom traversal algorithm originating from the node. It’s very fast. I suspect any time you start with an achieved search and then a query from that node, performance will be good and independent to the size of the database, as long as the subgraph originating from your anchor node is not complex. This is because neo4j can find the anchor node quickly with and index, even with a high graph. The remaining speed of your query depends on the complexity of the graph isolated to the anchor node.
I am still developing my application, so I have not gotten to stress testing with large number of concurrent users, so I don’t have experience to leverage.
I am sure 1000’s of basic queries will be ok. You will want to utilize indexes where possible. Writing smart cypher is also important. A good group of community questions is on cypher optimization recommendations.
Blockquote From you recent posts, I feel you are trying to use a graph database (Neo4j) for data analytics. I feel this is a mismatch.
@glilienfield are you saying it is not a good idea to use graph database in order to analyze data ?
I'm not English native speaker, making sure I'm not missing sthg due to language, and I'm not sure about the exact scope of the term "analytics"
but if that is what you are saying, I would be interested in reading why you say so (because I'm saying the exact opposite.) maybe we do not mean the same by "analysis" ?
Yes, that will work too. We routinely represent dates as strings at work in YYYY-MM-DD format because it’s easy and the strings are in both alphanumeric order and chronological order. The disadvantage is you are not validating someone passed a valid date. You solution is similar in that it is MMDD format, so the same properties apply.
I do agree that the query conditions is easier with your approach. You can probably use the methods of the Date object as shown by @dana_canzano to extract the month and day, instead of depending g on apoc.
This is a common problem, in healthcare and genealogy. Not only may you have an incomplete date, but also approximate dates: before 1970, after 2010, between 1970 and 1975, about Jan 2011, etc.
The GEDCOM standard has a notion system for this. For some Neo4j solutions, see this function for a User Defined Function:
Hey sorry for continuing this post, but I met some trouble.
As by default datetime gonna set missing field (day or month) by 1. Then we don't know actually if that event occur on 1st day of the month or not.
Actually I handle a time expression (string) by 2 time stamp (for start and end ). For example: Jan,2000 will be expressed by 2 time stamp (1st Jan 2000 and 31st Jan 2000)
This is quite tricky and hard to process about until which range you gonna choose to separate into start and end. Like even day you still can parse into 00:00:00 to 23:59:00. But I only choose time stamp from Month
Then if the user ask, which event occur on 1st Jan, 2000. Then my query will check if start = end then it actually happened on 1st Jan 2000 (I set start = end for specific datetime (had day, month and year) , if it not then we not sure if that event actually happened on 1st Jan 2000 (but in Jan 2000), I don't return these uncertain cases
I don't know if you've met this case. And how you solve this @glilienfield
It’s not clear to me what your use cases are. If you need flexibility, you can create properties to handle each individual type of date range query. For instance, one property that has the full timestamp, one with the date, one with just the year and month, etc. Then you can query for each different scenario.
Let me know if I missed the mark and we can try again. Some examples would help.
one of my rule of thumb is to (try and) never encode something that is not correct. as you say an unknown day of the month is not the same as the first of the month , so I would not encode this as a date type, the format of which forces you to encode something wrong or ambiguous
since the format checking of date() gets in your way, it pleads for a string property where you can have "YYYYMM" or "YYYYMM00" for dates with unspecified day in the month
if you need to extract year months days, chose for a format with a separators
unwind ["2000-01","2024-04-17","-04-","-04-12"] as date
return split(date,"-")[0] as year,split(date,"-")[1] as month,split(date,"-")[2] as day
the format checking, you can do otherwise at import time and if you need to calculate time differences or duration, as glilienfield pointed out, you can have several properties to query in different scenario. beware of too many duplicates that can get tricky to keep track of though.
Try this:
with toString(date()) as dte
return REPLACE(dte, "-", "") as dt
The result: 20240417. For 2023 year this will be 20230417.
To search on a particular day say 0417, run a query like where dt ends with "0417".