Creating procedure in APOC with the same behaviour as cube function in OLAP

Hi,
I am new to Neo4j and Cypher and I am trying to create the behaviour of cube function (OLAP function) for Neo4j by using APOC library. More specifically, I am trying to have the same result I have using this query :

MATCH(p:Product)
RETURN p.reorderLevel as ReordLevel, p.supplierID as Supplier, sum(p.unitsIn Stock) as SommeUnits
ORDER BY ReordLevel 

UNION ALL

MATCH(p:Product)
RETURN 'null' as ReordLevel, p.supplierID as Supplier, sum(p.unitsIn- Stock) as SommeUnits
ORDER BY ReordLevel

UNION ALL

MATCH(p:Product)
RETURN p.reorderLevel as ReordLevel,'null' as Supplier, sum(p.unitsIn- Stock) as SommeUnits
ORDER BY ReordLevel 

UNION ALL

MATCH(p:Product)
RETURN 'null' as ReordLevel, 'null' as Supplier, sum(p.unitsInStock) as SommeUnits
ORDER BY ReordLevel

I would like to obtain something like that :

MATCH (p:Product, ....)
CUBE (p.reorderLevel , p.supplierID, ....)
ORDER BY p.reorderLevel , p.supplierID, ....;

Could someone help me with some ideas ?

I think you should be able to do that with a procedure that:

  1. takes a label
  2. takes a list of properties to cube
  3. takes a list of properties to aggregate

and then either generate and run that cypher code internally.
or use the java api to do the same.

or you could use apoc.cypher.run with a loop/combination on field-name + null

unwind ['p.reorderLevel',null] as ReordLevel
unwind ['p.supplierID', null] as Supplier
call apoc.cypher.run('
MATCH(p:Product)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsIn Stock) as SommeUnits
ORDER BY ReordLevel 
') yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits

Thank you for your answer, @michael.hunger. I am gonna take into consideration your ideas in order to create the procedure. I tried to execute your function for apoc.cypher.run but I get an error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: org.opencypher.v9_0.util.SyntaxException: Variable p.reorderLevel not defined (line 1, column 25 (offset: 24))

I tried to fix it, but I am new to Neo4j so I couldn't fix it.

What exactly did you run? I don't see where this could come from in the code I've shared.

I ran again the code you have shared and now I have this

I even tried to add null as second param but it stil doesn't work and I don't understand why

unwind ['p.reorderLevel',null] as ReordLevel
unwind ['p.supplierID', null] as Supplier
call apoc.cypher.run('
MATCH(p:Product)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsIn Stock) as SommeUnits ORDER BY ReordLevel', null ) yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: org.opencypher.v9_0.util.SyntaxException: Invalid input 'o': expected 'a/A' (line 1, column 98 (offset: 97))

can you remove the space there?

@michael.hunger, already done with all empty spaces, but still the same error

Neo.ClientError.Statement.SyntaxError: Procedure call does not provide the required number of arguments: got 1 expected 2.

Procedure apoc.cypher.run has signature: apoc.cypher.run(cypher :: STRING?, params :: MAP?) :: value :: MAP?
meaning that it expects 2 arguments of type STRING?, MAP?
Description: apoc.cypher.run(fragment, params) yield value - executes reading fragment with the given parameters (line 3, column 1 (offset: 87))
"call apoc.cypher.run('MATCH(p:Product) RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsInStock) as SommeUnits ORDER BY ReordLevel') yield value as row"
^

If if add null as second param I have null pointer error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: java.lang.NullPointerException

Then pass in an empty map {} as 2nd argument.

I tried it already and this gives another syntax error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: org.opencypher.v9_0.util.SyntaxException: Invalid input 'o': expected 'a/A' (line 1, column 97 (offset: 96))

I also tried to check online for other syntax errors that are similar to mine but I still have no answer to it.

Do you have some example data for me to try this on?

Michael

I got it. It seems that the aproc procedure doesn't like to group on null, so I changed that to 'null' but you can change it also to a numeric value like -1 or whatever works for you.

unwind ['p.reorderLevel','null'] as ReordLevel
unwind ['s.supplierId', 'null'] as Supplier
call apoc.cypher.run('
MATCH (p:Product)-[:SUPPLIERS]->(s:Supplier)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsInStock) as SommeUnits ORDER BY ReordLevel', {} ) 
yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits
order by ReordLevel, Supplier

I ran this on a northwind import and it worked fine. Sorry that it took so long.

Thank you so much @michael.hunger, it works !

This sounds actually like a great topic to write up. Would you mind creating a short blog post/article about it? With your original intention / source of inspiration to how it was solved and can be generalized.

This would be a great idea, to write an article about this subject.
Now I am trying to compare the output obtained with Cube function in SQL with the output obtained with the apoc.cypher.run function. I just observed that there are two lines at the end of the output table corresponding to :

I don't understant the second line corresponding to null null 6238 and I didn't find how to get rid of it. Do you have any idea ?

Can it be that you have a few NaN in your data? Which might be rendered as null?

I am not sure there are NaN in the Northwind data set. But what suprises me is that 6238 is exactely the double of 3119 so I think it does somewhere 3119 + 3119 and I still don't know why.

[EDIT]

In fact I think I have an idea. It computes this output because it does this operation

MATCH(p:Product)
RETURN 'null' as ReordLevel, 'null' as Supplier, sum(p.unitsInStock) as SommeUnits
ORDER BY ReordLevel

And this corresponds exactely to the line null null 6238. But I don't know how to get rid of this operation.

Could you by accident imported the data twice?
Or there is another dimension that differentiates the two sets?

Oh and can you differentiate between our null string and an actual null stored?
you could pass instead of our 'null' e.g. '"total"' to see the difference.

The problem is solved ! You were right, I accidentally imported the data twice.

Now I am trying to generalize the apoc.cypher.run function :

unwind ['p.reorderLevel','null'] as ReordLevel
unwind ['s.supplierId', 'null'] as Supplier
call apoc.cypher.run('
MATCH (p:Product)-[:SUPPLIERS]->(s:Supplier)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsInStock) as SommeUnits ORDER BY ReordLevel', {} ) 
yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits
order by ReordLevel, Supplier

by creating my own function in Java.

As first approach I tried to generalize every row of the apoc.cypher.run function and I wrote this :

package example;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserFunction;

/**
 * This is an example how you can create a simple user-defined function for Neo4j.
 */
public class Agreggate
{
	@Context public GraphDatabaseService graphDb;
    @UserFunction
    @Description("example.cube(['s1','s2',...], operation)")
    public String cube(
            @Name("strings") List<String> strings,
            @Name("operation") String operation,
            @Name("aggregateName") String aggregateName,
            @Name("table") String table) {
    	Map<String,Object> parameters = new HashMap<String, Object>();
    	String str1, str3, str4;
    	
    	str1 = "unwind ['p.strings.get(0)', 'null'] as strings.get(0)";
    	str3 = "RETURN '+strings.get(0)+' as strings.get(0) ";
    	str4 = "RETURN row.strings.get(0) as strings.get(0) ";
    	parameters.put("strings.get(0)", strings.get(0));
    	for(int i = 1; i<strings.size(); i++) {
    		parameters.put("strings.get(i)",strings.get(i));
    		str1 = str1 + "\n" + "unwind ['strings.get(i)', 'null'] as strings.get(i)";
    		str3 = str3 + ", '+strings.get(i)+' as strings.get(i) ";
    		str4 = str4 + ", row.strings.get(0) as strings.get(0) ";
    	}
    	
    	str3 = str3 + " ORDER BY " + strings.get(0);
    	str4 = str4 + ", row.aggregateName as aggregateName";
    	parameters.put("operation", operation);
    	parameters.put("aggregateName", aggregateName);
    	parameters.put("table", table);    	 
    	
    	Result result = graphDb.execute(str1 + 
    			"call apoc.cypher.run('\n" + 
    			"MATCH (p:table)\n" + 
    			str3 + ", {} ) \n" + 
    			"yield value as row\n" + 
    			str4, parameters);
    	return result.toString();
    } 

The function doesn't work, because I have some problems with what should I return and also because I don't know if this is the best approach in order to generalize the apoc.cypher.run function. Could you help me with a suggestion ?