Cypher: returning custom JSON from a graph projection query

It looks like a primary use case for me when querying is going to involve
a) obtaining a graph projection containing the information the user wants to see
b) formatting this into a custom JSON format so that the client UI can visualize it directly.

The dataviz format the client UI uses is a collection of nodes, each with id and data attributes, and a collection of edges with from and to attributes referring to the id values in the nodes:

{
   nodes: [
      {id: id1,
       data:{...}
      },
      {id: id2,
       data:{...}
      },
      ...
   ],
   edges:[
      {from:id1, to:id2},
      ...
   ]
}

I've got some way towards achieving this with a Cypher query (see example below), but the node format isn't right. I require, for example:

{id: EDI,
 data: {
   name:'Edinburgh'
  }
}

My questions are:
i) can this be done?
ii) should I be attempting to do this in the Cypher query, or am I better off leaving this as post-processing in the software that makes the call to Neo4j? My thinking is that Neo4j excels at graph operations, and adding result formatting piles extra work onto the database engine that could be done just as easily by, say, a Java program.

Example:

// Two separate statements (note the semicolons): both reuse the variables s1, s2, s3, ts1 and ts2.
CREATE (s1:Station {stationCode:'EDI', name:'Edinburgh'})-[ts1:TRAIN_SERVICE]->(s2:Station {stationCode:'NCL', name:'Newcastle'})-[ts2:TRAIN_SERVICE]->(s3:Station {stationCode:'YRK', name:'York'});
CREATE (s1:Station {stationCode:'GLW', name:'Glasgow'})-[ts1:TRAIN_SERVICE]->(s2:Station {stationCode:'CAR', name:'Carlisle'})-[ts2:TRAIN_SERVICE]->(s3:Station {stationCode:'PNR', name:'Penrith'});

MATCH (s1:Station)-[:TRAIN_SERVICE]->(s2:Station)-[:TRAIN_SERVICE]-(s3:Station)
CALL apoc.create.vRelationship(s1, 'ROUTE_EXISTS', {from:s1.stationCode, to:s3.stationCode}, s3) YIELD rel
WITH COLLECT(s1)+COLLECT(s3) AS stations, COLLECT(rel) AS existingRoutes
RETURN {nodes:stations, edges:existingRoutes}

This returns a collection of nodes and edges, as follows:

{
"nodes":[
  {"name":"Glasgow","stationCode":"GLW"},
...
],
"edges":[
  {"from":"GLW","to":"PNR"},
  {"from":"EDI","to":"YRK"}
]
}

Hello and welcome to the community. Here's a very good thread about this topic that will probably get you the answers you're after: "Extracting subgraph into JSON format" (#5 by andrew.bowman)

Thanks for that, Michael. The thread gave me a hint that pointed me in the right direction. To anyone reading this: the hint was not the subgraph, it was the use of WITH plus a list comprehension (the pipe symbol) plus apoc.create.

In addition to the virtual relationships I had created, I realised I also needed virtual nodes that contained objects as attributes. Once I introduced them, the answer fell out.

In answer to my other question about whether I should be using Neo4j CPU cycles to format results: the approach below doesn't seem unduly complex, but if we suspect performance is being hit by excessive query formatting then we'll run some benchmarks.

MATCH (s1:Station)-[:TRAIN_SERVICE]->(s2:Station)-[:TRAIN_SERVICE]-(s3:Station)
CALL apoc.create.vRelationship(s1, 'ROUTE_EXISTS', {from:s1.stationCode, to:s3.stationCode}, s3) YIELD rel
WITH COLLECT(s1)+COLLECT(s3) AS stationsCollected, COLLECT(rel) AS vRoutes
WITH [station IN stationsCollected | apoc.create.vNode(['vStation'], {id: station.stationCode, data:{name: station.name}})] AS vStation, vRoutes
WITH COLLECT(vStation) AS vStations, vRoutes 
RETURN {nodes:vStations, edges:vRoutes}

giving the result as desired (apart from it being an array within an array for nodes - if I can work out what's causing that then I'll edit the answer):

{
"nodes":[[{"data":{"name":"Glasgow"},"id":"GLW"},{"data":{"name":"Edinburgh"},"id":"EDI"},{"data":{"name":"Penrith"},"id":"PNR"},{"data":{"name":"York"},"id":"YRK"}]],
"edges":[{"from":"GLW","to":"PNR"},{"from":"EDI","to":"YRK"}]
}
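
The extra level of nesting can be reproduced in isolation. A minimal sketch, using made-up numbers rather than the station data: a list comprehension already yields a list, so passing that list to COLLECT wraps it in a second list.

```cypher
// Illustrative values only; not part of the station example.
WITH [x IN [1, 2, 3] | x * 10] AS xs
RETURN xs AS alreadyAList,       // [10, 20, 30]
       collect(xs) AS nestedList // [[10, 20, 30]]
```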

Don't seem to be able to edit the answer, but the double brackets were occurring because I was collecting the virtual nodes when I didn't need to. This is the new code:

MATCH (s1:Station)-[:TRAIN_SERVICE]->(s2:Station)-[:TRAIN_SERVICE]-(s3:Station)
CALL apoc.create.vRelationship(s1, 'ROUTE_EXISTS', {from:s1.stationCode, to:s3.stationCode}, s3) YIELD rel
WITH COLLECT(s1)+COLLECT(s3) AS stationsCollected, COLLECT(rel) AS vRoutes
WITH [station IN stationsCollected | apoc.create.vNode(['vStation'], {id: station.stationCode, data:{name: station.name}})] AS vStations, vRoutes
RETURN {nodes:vStations, edges:vRoutes}

resulting in the required response:

{"nodes":[{"data":{"name":"Glasgow"},"id":"GLW"},{"data":{"name":"Edinburgh"},"id":"EDI"},{"data":{"name":"Penrith"},"id":"PNR"},
{"data":{"name":"York"},"id":"YRK"}],"edges":[{"from":"GLW","to":"PNR"},{"from":"EDI","to":"YRK"}]}
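
For anyone without APOC, the same shape can be built from plain map literals, since Cypher maps can be nested. This is only a sketch against the station data from the question. The directed second hop and the alias graph are assumptions of mine, not part of the original query. It does not remove duplicates, so a station that appears in more than one route would be listed more than once.

```cypher
// Sketch without APOC: nodes and edges are built as plain maps.
// Assumption: the route is followed in one direction (both hops use ->).
MATCH (s1:Station)-[:TRAIN_SERVICE]->(:Station)-[:TRAIN_SERVICE]->(s3:Station)
WITH collect(s1) + collect(s3) AS stations,
     collect({from: s1.stationCode, to: s3.stationCode}) AS edges
RETURN {
  // Reshape each station node into {id, data:{name}}.
  nodes: [s IN stations | {id: s.stationCode, data: {name: s.name}}],
  edges: edges
} AS graph
```

Because both lists are produced by aggregation with no grouping key, the query returns a single row even when nothing matches (with empty nodes and edges lists).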

Hello, first I want to thank you for the solutions; they helped me adapt these steps to solve my own problem, so I wanted to share another solution. In my case I wanted to retrieve all the nodes and edges:

{
nodes:{label,id}
edges:{from,to}
}

and I achieved this using this query:

MATCH (m)-[r]->(n)
MATCH (o)
RETURN collect(DISTINCT {from: ID(m), to: ID(n)}) AS edges, collect(DISTINCT {id: ID(o), label: o.name}) AS nodes

I had to do it this way because I had no access to install APOC on the Neo4j database.

This explanation is for newcomers: the first MATCH returns every pair of nodes that is connected by a relationship.
The second MATCH retrieves all the existing nodes, bound to a separate variable.
Then comes the RETURN, which is where the magic happens: collect lets us output the data in a custom shape, and DISTINCT prevents node or edge duplication, so each node and each edge appears only once.

That is how I solved it; I hope this helps other people avoid getting bogged down in this issue.
I almost forgot: the Neo4j version is 4.3.1.
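
One caveat with the query above: the two MATCH clauses share no variables, so together they produce every combination of relationship and node, and DISTINCT then has to discard the repeats. On a larger graph that intermediate result can get big. A possible alternative, sketched here on the assumption that CALL { ... } subqueries are available (they should be on 4.3.1), aggregates nodes and edges separately so no such product is built. Property and alias names are kept from the query above.

```cypher
// Sketch: aggregate nodes and edges in independent subqueries.
CALL {
  MATCH (o)
  RETURN collect({id: id(o), label: o.name}) AS nodes
}
CALL {
  MATCH (m)-[r]->(n)
  // DISTINCT still merges parallel relationships between the same pair.
  RETURN collect(DISTINCT {from: id(m), to: id(n)}) AS edges
}
RETURN nodes, edges
```

Each subquery returns exactly one row, so the result is a single row even if the graph has no relationships, in which case edges is an empty list.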