How to query multiple branches, constrain using aggregations, limit on the number of branches

Hello, I have the following structure:

(building)-[:HAS]->(:Floor)-[:HAS]-(:Room)-[:HAS_OBJECT]->(:Object),
(:Room)-[:HAS_REQUIREMENT]-(:Requirement)

I've attached an image of the dataset, and a create script at the bottom

I would like to output this data as a json object, but with a few constraints

{
  "Name": "Building one",
  "floors": [
    {
      "rooms": [
        {
          "requirements": [
            {
              "Name": "Three requirement 2"
            },
            {
              "Name": "Three requirement"
            }
          ],
          "Name": "Three",
          "objects": [
            {
              "Name": "Three object 2"
            },
            {
... etc

So the path to match is:
Give me all buildings, with their floors, with their rooms, with their objects and requirements.

And now i want to be able to constraint my data, e.g.: only give me two rooms per floor, ordered by room.Name.

Another case would be be: give me the two biggest rooms on the floor.

I've tried a lot of things, but it just seems impossible in Neo4J.

I tried:

  • Map projections (https://neo4j.com/docs/cypher-manual/current/syntax/maps/#cypher-map-projection) . Using map projections doesn't work because aggregating functions don't work in the constraint, also, limiting the resultset cannot be done
  • apoc.convert.toTree. I can't get apoc.convert.toTree to work at the query part where the query splits (room)->(object), (room)->(requirement). I've tried chaining the different paths together, but I can't get that to work at the point were the query splits (at the room node)
  • subqueries using CALL {}. This doesnt work because I can't pass variables from the outer scope to the subquery.
  • A combination of unwind / collect becomes doesn't work in combination where some rooms don't have requirements, but do have objects.

It seems like a very trivial task, but I can't get it to work. So, any help with this would be greatly appreciated.



UNWIND [{_id:1, properties:{NumberOfRooms:5, Name:"First floor"}}, {_id:8, properties:{NumberOfRooms:0, Name:"Floor X"}}, {_id:9, properties:{NumberOfRooms:1, Name:"Floor Y"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Floor;
UNWIND [{_id:12, properties:{Name:"Zero requirement"}}, {_id:13, properties:{Name:"One requirement"}}, {_id:14, properties:{Name:"Two requirement"}}, {_id:15, properties:{Name:"Three requirement"}}, {_id:18, properties:{Name:"Zero requirement 2"}}, {_id:19, properties:{Name:"One requirement 2"}}, {_id:20, properties:{Name:"Two requirement 2"}}, {_id:21, properties:{Name:"Three requirement 2"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Requirement;
UNWIND [{_id:0, properties:{Name:"Building one"}}, {_id:7, properties:{Name:"Building two"}}, {_id:11, properties:{Name:"Empty building"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Building;
UNWIND [{_id:22, properties:{Name:"Zero object"}}, {_id:23, properties:{Name:"One object"}}, {_id:24, properties:{Name:"Two object"}}, {_id:25, properties:{Name:"Three object"}}, {_id:26, properties:{Name:"Four object"}}, {_id:27, properties:{Name:"Room second building floor Y object"}}, {_id:28, properties:{Name:"Zero object 2"}}, {_id:29, properties:{Name:"One object 2"}}, {_id:30, properties:{Name:"Two object 2"}}, {_id:31, properties:{Name:"Three object 2"}}, {_id:32, properties:{Name:"Four object 2"}}, {_id:33, properties:{Name:"Room second building floor Y object 2"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Object;
UNWIND [{_id:2, properties:{Depth:4.8, Name:"Zero"}}, {_id:3, properties:{Area:10.8, Width:3, Depth:3.5999999999999996, Name:"One"}}, {_id:4, properties:{Area:10.8, Width:3, Depth:3.5999999999999996, Name:"Two"}}, {_id:5, properties:{Area:30, Width:5, Depth:6, Name:"Three"}}, {_id:6, properties:{Area:19.2, Width:4, Depth:4.8, Name:"Four"}}, {_id:10, properties:{Width:4.4, Depth:2.2, Name:"Room second building floor Y"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Room;

UNWIND [{start: {_id:0}, end: {_id:1}, properties:{}}, {start: {_id:7}, end: {_id:8}, properties:{}}, {start: {_id:7}, end: {_id:9}, properties:{}}] AS row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})
CREATE (start)-[r:HAS]->(end) SET r += row.properties;
UNWIND [{start: {_id:2}, end: {_id:22}, properties:{}}, {start: {_id:3}, end: {_id:23}, properties:{}}, {start: {_id:4}, end: {_id:24}, properties:{}}, {start: {_id:5}, end: {_id:25}, properties:{}}, {start: {_id:6}, end: {_id:26}, properties:{}}, {start: {_id:10}, end: {_id:27}, properties:{}}, {start: {_id:2}, end: {_id:28}, properties:{}}, {start: {_id:3}, end: {_id:29}, properties:{}}, {start: {_id:4}, end: {_id:30}, properties:{}}, {start: {_id:5}, end: {_id:31}, properties:{}}, {start: {_id:6}, end: {_id:32}, properties:{}}, {start: {_id:10}, end: {_id:33}, properties:{}}] AS row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})
CREATE (start)-[r:HAS_OBJECT]->(end) SET r += row.properties;
UNWIND [{start: {_id:2}, end: {_id:12}, properties:{}}, {start: {_id:3}, end: {_id:13}, properties:{}}, {start: {_id:4}, end: {_id:14}, properties:{}}, {start: {_id:5}, end: {_id:15}, properties:{}}, {start: {_id:2}, end: {_id:18}, properties:{}}, {start: {_id:3}, end: {_id:19}, properties:{}}, {start: {_id:4}, end: {_id:20}, properties:{}}, {start: {_id:5}, end: {_id:21}, properties:{}}] AS row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})
CREATE (start)-[r:HAS_REQUIREMENT]->(end) SET r += row.properties;
UNWIND [{start: {_id:1}, end: {_id:2}, properties:{}}, {start: {_id:1}, end: {_id:3}, properties:{}}, {start: {_id:1}, end: {_id:4}, properties:{}}, {start: {_id:1}, end: {_id:5}, properties:{}}, {start: {_id:1}, end: {_id:6}, properties:{}}, {start: {_id:9}, end: {_id:10}, properties:{}}] AS row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})
CREATE (start)-[r:HAS]->(end) SET r += row.properties;

MATCH (n:`UNIQUE IMPORT LABEL`)  WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;

@sylvestermachielse, welcome to the community! Fun post. Loads to explore.

Let's first look at making things a bit simpler and end with a "gotcha". To begin with, you do not need to delete the 'UNIQUE IMPORT LABEL' at the end of your CREATE script because you didn't need to create it in the first place. The schema can be created in 1 line and you could then just import the data in a CSV. Also, the number of rooms is the count of outgoing relationships from the floor, so no need for that attribute on floor, which in a large dataset, could become a nightmare for updates…and there are a few other complexities that you can consider removing.

Next, this is the taxonomy:

  1. Building
    1.1 Floor
    1.1.1 Room
    1.1.1.1 Object (optional)
    1.1.1.2 Requirement (optional)

I've added optional to object and requirement, which I think is ok, but in your design, all but the building is "optional". You've a building node with no floors nor rooms. I do not think this is correct. A graph represents relationships and I would not recommend making a building with no floors, a building. It can be a “lot”; “scaffolding” or some other entity, but I don’t think it’s a building. A graph is a representative model and its actuality and/or believability, drives, in part, some of its strength from that consistency with “real” – in other words, the query will make more sense. And because this specific problem space is so dependent on the taxonomy of the “building” - from which you want to know about it’s object and requirements - you’ll gain a lot by having that consistency up and down the building’s taxonomy – i.e., you will worry less about the floor and room and focus on what the room cares about. What makes a taxonomy so powerful is the consistencies in the top levels of that taxonomy design because those leave us free to focus on the important details without worry. The classic examples here are how little graphs worry about JOINs.

So, taxonomies are classic graph models because of how well graphs handle recursion and consistent hierarchies. But here's the gotcha. Below, in the side-by-side picture, I've added an updated model highlighting a rather interesting tidbit about relationship names. When you created some of your relationships, you used the same "HAS" as the name on relationships between distinct nodes. Because "HAS" is used in various places, the schema will take the logical path to fill in what’s being said partially. I.e., if a building “has” a floor and a floor “has” a room, a building “has” a room. And it goes on from there leading to a “room having a room”. Cool gotcha no? Imagine what queries could inadvertently produce if you’re not aware of this! Remember, graphs are NOT simple, they are elegant. Big diff.

IMHO, relationships are more important in a graph than nodes. And the naming of them is a critical success factor. I spend 10x more on naming relationships than nodes and always remain worried I’ve missed something. A clear relationship facilitates the creation of a clear PATH - which is one of the major power offerings of a graph query.

I know I ended up not answering your question because I think, once you update this, you'll have what you need to create your own answer. You'll use WITH and WHERE after that WITH and probably a collect() and order by to boot. But, give this a try and see how that works.

HTH.

Thanks for your reply. I only included the model as an example, the query to create the data is generated by apoc, therefor it contained that UNIQUE IMPORT LABEL Your analysis of the model showed me that meta analysis functions of Neo4j seem to union the meta behavior of relationships based on their label. Hence, the floor pointing to floor with the HAS relation.

My question however is really about how to write that query that can get me the results I want. Even with your proposed changes to the model, and the data. I can't get it to work.

I would really like an example of:
Give me all buildings, with their floors, for each floor, give me the number of rooms. For each floor, give me all the rooms, with their top 2 requirements, ordered by name, and with all their objects. And for the fun of it, all joins are optional. Also, lets say, I only want want floors that either have 0 rooms, or more than 2 rooms. I come close to writing this query using map projections, but it fails on the aggregate functions in the WHERE statement of the pattern comprehensions, also I can't sort there.

I believe you're going to need to split up your matches instead of generating all paths and filtering.

For this example I will use a taxonomy similar to what Llewellyn provided.

MATCH (building:Building)
OPTIONAL MATCH (building)-[:HAS_FLOOR]->(floor)
WHERE NOT (floor)-[:HAS_ROOM]->() OR size((floor)-[:HAS_ROOM]->()) > 2
OPTIONAL MATCH (floor)-[:HAS_ROOM]->(room)
WITH building, floor, room, [(room)-[:HAS_REQUIREMENT]->(requirement) | requirement.name] as requirements
WITH building, floor, room, apoc.coll.sort(requirements)[..2] as topRequirements, [(room)-[:HAS_OBJECT]->(object) | object.name] as objects
WITH building, floor, collect(room {.name, topRequirements, objects}) as rooms
WITH building, collect(floor {.name, rooms}) as floors
RETURN building {.name, floors} as building

This also assumes you are using conventions of ALL_CAPS for relationship types, Capitalized for node labels, and lowercase for property names.