Creating procedure in APOC with the same behaviour as cube function in OLAP

apoc
(Gabriela) #1

Hi,
I am new to Neo4j and Cypher and I am trying to create the behaviour of cube function (OLAP function) for Neo4j by using APOC library. More specifically, I am trying to have the same result I have using this query :

MATCH(p:Product)
RETURN p.reorderLevel as ReordLevel, p.supplierID as Supplier, sum(p.unitsIn Stock) as SommeUnits
ORDER BY ReordLevel 

UNION ALL

MATCH(p:Product)
RETURN 'null' as ReordLevel, p.supplierID as Supplier, sum(p.unitsIn- Stock) as SommeUnits
ORDER BY ReordLevel

UNION ALL

MATCH(p:Product)
RETURN p.reorderLevel as ReordLevel,'null' as Supplier, sum(p.unitsIn- Stock) as SommeUnits
ORDER BY ReordLevel 

UNION ALL

MATCH(p:Product)
RETURN 'null' as ReordLevel, 'null' as Supplier, sum(p.unitsInStock) as SommeUnits
ORDER BY ReordLevel

I would like to obtain something like that :

MATCH (p:Product, ....)
CUBE (p.reorderLevel , p.supplierID, ....)
ORDER BY p.reorderLevel , p.supplierID, ....;

Could someone help me with some ideas ?

(Michael Hunger) #2

I think you should be able to do that with a procedure that:

  1. takes a label
  2. takes a list of properties to cube
  3. takes a list of properties to aggregate

and then either generate and run that cypher code internally.
or use the java api to do the same.

or you could use apoc.cypher.run with a loop/combination on field-name + null

unwind ['p.reorderLevel',null] as ReordLevel
unwind ['p.supplierID', null] as Supplier
call apoc.cypher.run('
MATCH(p:Product)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsIn Stock) as SommeUnits
ORDER BY ReordLevel 
') yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits
(Gabriela) #3

Thank you for your answer, @michael.hunger. I am gonna take into consideration your ideas in order to create the procedure. I tried to execute your function for apoc.cypher.run but I get an error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: org.opencypher.v9_0.util.SyntaxException: Variable p.reorderLevel not defined (line 1, column 25 (offset: 24))

I tried to fix it, but I am new to Neo4j so I couldn't fix it.

(Michael Hunger) #4

What exactly did you run? I don't see where this could come from in the code I've shared.

(Gabriela) #5

I ran again the code you have shared and now I have this

I even tried to add null as second param but it stil doesn't work and I don't understand why

unwind ['p.reorderLevel',null] as ReordLevel
unwind ['p.supplierID', null] as Supplier
call apoc.cypher.run('
MATCH(p:Product)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsIn Stock) as SommeUnits ORDER BY ReordLevel', null ) yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: org.opencypher.v9_0.util.SyntaxException: Invalid input 'o': expected 'a/A' (line 1, column 98 (offset: 97))

(Michael Hunger) #6

can you remove the space there?

(Gabriela) #7

@michael.hunger, already done with all empty spaces, but still the same error

Neo.ClientError.Statement.SyntaxError: Procedure call does not provide the required number of arguments: got 1 expected 2.

Procedure apoc.cypher.run has signature: apoc.cypher.run(cypher :: STRING?, params :: MAP?) :: value :: MAP?
meaning that it expects 2 arguments of type STRING?, MAP?
Description: apoc.cypher.run(fragment, params) yield value - executes reading fragment with the given parameters (line 3, column 1 (offset: 87))
"call apoc.cypher.run('MATCH(p:Product) RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsInStock) as SommeUnits ORDER BY ReordLevel') yield value as row"
^

If if add null as second param I have null pointer error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: java.lang.NullPointerException

(Michael Hunger) #8

Then pass in an empty map {} as 2nd argument.

(Gabriela) #9

I tried it already and this gives another syntax error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.cypher.run: Caused by: org.opencypher.v9_0.util.SyntaxException: Invalid input 'o': expected 'a/A' (line 1, column 97 (offset: 96))

I also tried to check online for other syntax errors that are similar to mine but I still have no answer to it.

(Michael Hunger) #10

Do you have some example data for me to try this on?

Michael

(Michael Hunger) #11

I got it. It seems that the aproc procedure doesn't like to group on null, so I changed that to 'null' but you can change it also to a numeric value like -1 or whatever works for you.

unwind ['p.reorderLevel','null'] as ReordLevel
unwind ['s.supplierId', 'null'] as Supplier
call apoc.cypher.run('
MATCH (p:Product)-[:SUPPLIERS]->(s:Supplier)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsInStock) as SommeUnits ORDER BY ReordLevel', {} ) 
yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits
order by ReordLevel, Supplier

I ran this on a northwind import and it worked fine. Sorry that it took so long.

(Gabriela) #12

Thank you so much @michael.hunger, it works !

(Michael Hunger) #13

This sounds actually like a great topic to write up. Would you mind creating a short blog post/article about it? With your original intention / source of inspiration to how it was solved and can be generalized.

(Gabriela) #14

This would be a great idea, to write an article about this subject.
Now I am trying to compare the output obtained with Cube function in SQL with the output obtained with the apoc.cypher.run function. I just observed that there are two lines at the end of the output table corresponding to :

I don't understant the second line corresponding to null null 6238 and I didn't find how to get rid of it. Do you have any idea ?

(Michael Hunger) #15

Can it be that you have a few NaN in your data? Which might be rendered as null?

(Gabriela) #16

I am not sure there are NaN in the Northwind data set. But what suprises me is that 6238 is exactely the double of 3119 so I think it does somewhere 3119 + 3119 and I still don't know why.

[EDIT]

In fact I think I have an idea. It computes this output because it does this operation

MATCH(p:Product)
RETURN 'null' as ReordLevel, 'null' as Supplier, sum(p.unitsInStock) as SommeUnits
ORDER BY ReordLevel

And this corresponds exactely to the line null null 6238. But I don't know how to get rid of this operation.

(Michael Hunger) #17

Could you by accident imported the data twice?
Or there is another dimension that differentiates the two sets?

(Michael Hunger) #18

Oh and can you differentiate between our null string and an actual null stored?
you could pass instead of our 'null' e.g. '"total"' to see the difference.

(Gabriela) #19

The problem is solved ! You were right, I accidentally imported the data twice.

(Gabriela) #20

Now I am trying to generalize the apoc.cypher.run function :

unwind ['p.reorderLevel','null'] as ReordLevel
unwind ['s.supplierId', 'null'] as Supplier
call apoc.cypher.run('
MATCH (p:Product)-[:SUPPLIERS]->(s:Supplier)
RETURN '+ReordLevel+' as ReordLevel, '+Supplier+' as Supplier, sum(p.unitsInStock) as SommeUnits ORDER BY ReordLevel', {} ) 
yield value as row
return row. ReordLevel as ReordLevel, row.Supplier as Supplier, row.SommeUnits as SommeUnits
order by ReordLevel, Supplier

by creating my own function in Java.

As first approach I tried to generalize every row of the apoc.cypher.run function and I wrote this :

package example;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserFunction;

/**
 * This is an example how you can create a simple user-defined function for Neo4j.
 */
public class Agreggate
{
	@Context public GraphDatabaseService graphDb;
    @UserFunction
    @Description("example.cube(['s1','s2',...], operation)")
    public String cube(
            @Name("strings") List<String> strings,
            @Name("operation") String operation,
            @Name("aggregateName") String aggregateName,
            @Name("table") String table) {
    	Map<String,Object> parameters = new HashMap<String, Object>();
    	String str1, str3, str4;
    	
    	str1 = "unwind ['p.strings.get(0)', 'null'] as strings.get(0)";
    	str3 = "RETURN '+strings.get(0)+' as strings.get(0) ";
    	str4 = "RETURN row.strings.get(0) as strings.get(0) ";
    	parameters.put("strings.get(0)", strings.get(0));
    	for(int i = 1; i<strings.size(); i++) {
    		parameters.put("strings.get(i)",strings.get(i));
    		str1 = str1 + "\n" + "unwind ['strings.get(i)', 'null'] as strings.get(i)";
    		str3 = str3 + ", '+strings.get(i)+' as strings.get(i) ";
    		str4 = str4 + ", row.strings.get(0) as strings.get(0) ";
    	}
    	
    	str3 = str3 + " ORDER BY " + strings.get(0);
    	str4 = str4 + ", row.aggregateName as aggregateName";
    	parameters.put("operation", operation);
    	parameters.put("aggregateName", aggregateName);
    	parameters.put("table", table);    	 
    	
    	Result result = graphDb.execute(str1 + 
    			"call apoc.cypher.run('\n" + 
    			"MATCH (p:table)\n" + 
    			str3 + ", {} ) \n" + 
    			"yield value as row\n" + 
    			str4, parameters);
    	return result.toString();
    } 

The function doesn't work, because I have some problems with what should I return and also because I don't know if this is the best approach in order to generalize the apoc.cypher.run function. Could you help me with a suggestion ?