Data type of a property

In Cypher, how can I check the data type of a property?

Once a property is used for one data type, do all properties of that name on that node label have to be the same type? I'm assuming not. So if I have a node label :testnode with a property called mystery, I could mix and match numbers, strings, dates, etc. across the nodes. That's obviously a data quality nightmare, but it leads me back to my first question: I'd like to be able to query my nodes and look for data quality issues.

Property values in Cypher always have a type, but Neo4j doesn't constrain the type of a property. That is to say, if you have a node property called mystery, it's possible to make it sometimes a string and sometimes an integer.

For example, this is OK:

CREATE (:testnode { mystery: 1 });
CREATE (:testnode { mystery: "Hello" });

One way to profile for data quality issues is to check types like this:

MATCH (t:testnode)
WHERE t.mystery = toString(t.mystery)
RETURN count(t);

That will tell you how many strings there are: if the property value were an integer, it wouldn't match. You could do a similar thing with other types too, to build up a table of how many instances of an attribute have each type.
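
As a minimal sketch of that tally, assuming the property only ever holds scalar values (toString may fail on lists in some versions) and that comparing values of different types with = yields false, the string check can be turned into a grouping key:

```cypher
// Split :testnode nodes into "mystery is a string" vs. "mystery is something else".
MATCH (t:testnode)
WHERE t.mystery IS NOT NULL          // ignore nodes that lack the property
RETURN t.mystery = toString(t.mystery) AS isString,
       count(*) AS nodes;
```

With the two example nodes created above, this should return one row for each value of isString.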

UPDATE August 2020

In recent versions of APOC, there are CALL apoc.meta.nodeTypeProperties() and CALL apoc.meta.relTypeProperties(), which sample the database and output the schema for all labels and relationship types. This is a great way to detect (for example) whether any property in the database ever has more than one distinct type. So in the example I gave above with the "mystery" property, the results of those procedures would report that the "mystery" property is of type ["String", "Long"].
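
For instance, a sketch of filtering that output down to the suspicious properties, assuming an APOC version that includes the procedure and that it yields the nodeType, propertyName, and propertyTypes columns:

```cypher
// List every node property that was observed with more than one value type.
CALL apoc.meta.nodeTypeProperties()
YIELD nodeType, propertyName, propertyTypes
WHERE size(propertyTypes) > 1
RETURN nodeType, propertyName, propertyTypes;
```

Because the procedure samples rather than scans everything, an empty result suggests consistent typing but does not prove it.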

Thanks!

I think you may have given me an idea for a way I could contribute to the APOC library. Have a function like DataType() that would go through a series of case statements testing each data type and return a string value of the data type that it was determined to be.

Hi Mike,

In fact it already exists: take a look at the apoc.meta.type function.

Example:

WITH [true, 42, 'foo', 1.2] AS data
UNWIND data AS value
RETURN apoc.meta.type(value)

Result:

"BOOLEAN"
"INTEGER"
"STRING"
"FLOAT" 
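
The same function can be pointed at stored data. A sketch, reusing the :testnode label and mystery property from earlier in the thread (and assuming APOC is installed), that counts how many nodes hold each type:

```cypher
// Tally the value types found in the mystery property across all :testnode nodes.
MATCH (t:testnode)
RETURN apoc.meta.type(t.mystery) AS type, count(*) AS nodes
ORDER BY nodes DESC;
```

Nodes that lack the property are grouped under whatever apoc.meta.type returns for null, so they show up in the tally as well.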

Cheers

Benoit's got a good idea. I would just caution you that telling the type of an individual property value is not the same thing as a property having a type. Properties don't have types, or at least the types of their values can vary from node to node.

The reason I bring this up is that you need to do some kind of sampling, for example MATCH (n:Node) RETURN n.mystery LIMIT 100 or MATCH (n:Node) WHERE id(n) % 3 = 0 RETURN n.mystery LIMIT 100. Only if the types of all the sampled values agree is it probably safe to assume that's the type of the property.
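
A sketch of that sampling combined with apoc.meta.type, assuming APOC is installed and keeping the Node label and mystery property from the queries above:

```cypher
// Take a cheap sample of :Node nodes, then count the value types seen in it.
MATCH (n:Node)
WHERE n.mystery IS NOT NULL
WITH n LIMIT 100
RETURN apoc.meta.type(n.mystery) AS type, count(*) AS samples;
```

More than one row in the result means the sample already disagrees on the type. A single row is only as trustworthy as the sample is representative.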

Once a property is set as INT, how can I make sure new values for that property are INT too?

Example:
I created a node with age as a property and cast it to INT.
When I create nodes using LOAD CSV or JDBC, this age property is loaded as a string, not an INT.
Do I need to cast it every time I load?

Properties do not have types. When you load the data, you must convert the data to the type that you want for that property.
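
A minimal sketch of that conversion with LOAD CSV, which reads every field as a string. The file name people.csv, the Person label, and the name and age columns are hypothetical:

```cypher
// Convert the age column explicitly so it is stored as an integer, not a string.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (:Person { name: row.name, age: toInteger(row.age) });
```

toInteger returns null for a value it cannot parse, in which case the age property is simply not set on that node, so a follow-up check for missing ages can catch bad rows.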

Is this always true? I seem to be ingesting CSV files using neo4j-admin import with arguments like --nodes "import/nodes_header.csv,import/nodes.csv" where nodes_header.csv specifies data types, e.g. value:int and live:Boolean.

Properties themselves do not have types, but the values they hold do. Numeric values, booleans, and strings are all stored differently, which is why the import needs to be told what type each value is. However, you can have a property X that is assigned a string value on one node and a numeric value on another. We don't enforce typing for a property. Of course, it would not make sense to give the same property values of different types in your graph.

Elaine

Is there any way to constrain node properties to a type, like some kind of schema? It would save a lot of boilerplate code, increase the readability of Cypher queries, and give us a safer database.

Either that, or some kind of database action that validates the type before the node is committed (for which we could define the rules ourselves, as a hacky solution).
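
One possible shape for such a hacky validation step is an APOC trigger. This is only a sketch under several assumptions: APOC 4.x is installed with apoc.trigger.enabled=true, the trigger name bookIdIsInteger is made up, and the Book label and id property are illustrative. It only inspects newly created nodes, not later property updates:

```cypher
// Register a trigger that runs before commit and fails the transaction
// when a newly created :Book node has an id that is not an integer.
CALL apoc.trigger.add(
  'bookIdIsInteger',
  'UNWIND $createdNodes AS n
   WITH n WHERE n:Book
   CALL apoc.util.validate(
     apoc.meta.type(n.id) <> "INTEGER",
     "Book.id must be an integer, got %s",
     [apoc.meta.type(n.id)])
   RETURN count(*)',
  { phase: 'before' }
);
```

apoc.util.validate raises an error when its first argument is true, which should roll back the offending transaction. A Book created without any id would be rejected too, since its type is not "INTEGER".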

The syntax for creating constraints was designed with future extensibility in mind. With a growing interest in more specific and rigid constraints, it might be worth revisiting the Cypher constraint parser to add more flexible syntax.

The existing constraint syntax suggests that ASSERT xxx is parsed expecting xxx to be a boolean expression. If it were, we could do something like the following (hypothetical syntax, not currently supported):

CREATE CONSTRAINT ON (book:Book) ASSERT apoc.meta.type(book.id) = "INTEGER";

Currently supported constraints are only those documented in 5.4 Constraints:

  • ASSERT (node.property [, ...]) IS UNIQUE
  • ASSERT exists(node.property)
  • ASSERT (node.property [, ...]) IS NODE KEY

These constraints are a boon for defining production data, but have not yet been expanded beyond the initial definition.

So, if anyone out there is looking for a way to contribute to Neo4j in a big way, this would be a good one. (Might be me, once I finish my current efforts)
