
Best way to run a Multinomial Conditional Logistic Regression?


Would I need to write a custom user-defined procedure to run a multinomial conditional logistic regression (MCLR) on Neo4j data? I'm trying to work out the best way to run an MCLR on the data in my database. Right now I'm planning to use the ConditionalMNLogit class from statsmodels in Python: query the database from a Python script and fit the model on the query results. I'm guessing the main limitation of this approach would be the amount of data I can pull from Neo4j.
Does anyone have any suggestions?


We have a native node classification pipeline that can handle more than two classes, and logistic regression is one of its modeling options. It is closer to ordinary multinomial logistic regression than to the conditional model you describe, and it is geared more towards prediction and machine learning use cases.

You can also pull data into Python using the GDS Python client and do your modeling there. If it is helpful, here is a notebook with an example of doing just that: generating features in Neo4j GDS, reading them back into Python, and fitting a statsmodels logit model on the data.
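For the conditional model specifically, here is a minimal sketch of how the query results might be shaped before handing them to statsmodels. The rows, the column names (`stratum`, `choice`, `x1`, `x2`) and both helper functions are made up for illustration; in practice the rows could come from something like `gds.run_cypher(query).to_dict("records")`. It assumes the outcome is already coded as integers starting at 0, which is what `ConditionalMNLogit` expects.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical rows, shaped like the records a Cypher query might return.
# Column names and values are invented for this sketch.
ROWS: List[Dict] = [
    {"stratum": "a", "choice": 0, "x1": 0.2, "x2": 1.0},
    {"stratum": "a", "choice": 1, "x1": 0.7, "x2": 0.0},
    {"stratum": "a", "choice": 2, "x1": 1.1, "x2": 1.0},
    {"stratum": "b", "choice": 1, "x1": 0.4, "x2": 0.0},
    {"stratum": "b", "choice": 1, "x1": 0.9, "x2": 1.0},
    {"stratum": "c", "choice": 0, "x1": 0.1, "x2": 0.0},
    {"stratum": "c", "choice": 2, "x1": 1.3, "x2": 1.0},
]


def to_design(
    rows: Sequence[Dict],
    group_col: str,
    outcome_col: str,
    feature_cols: Sequence[str],
) -> Tuple[List[int], List[List[float]], List]:
    """Split rows into (endog, exog, groups) lists.

    Strata in which every row has the same outcome are dropped here,
    since they contribute nothing to a conditional likelihood.
    """
    by_group: Dict = {}
    for row in rows:
        by_group.setdefault(row[group_col], []).append(row)

    endog: List[int] = []
    exog: List[List[float]] = []
    groups: List = []
    for key, members in by_group.items():
        if len({m[outcome_col] for m in members}) < 2:
            continue  # no within-stratum variation in the outcome
        for m in members:
            endog.append(int(m[outcome_col]))
            exog.append([float(m[c]) for c in feature_cols])
            groups.append(key)
    return endog, exog, groups


def fit_conditional_mnlogit(endog, exog, groups):
    """Fit statsmodels' ConditionalMNLogit (needs numpy and statsmodels)."""
    import numpy as np
    from statsmodels.discrete.conditional_models import ConditionalMNLogit

    # No intercept column: it is absorbed by the stratum-level conditioning.
    model = ConditionalMNLogit(
        np.asarray(endog), np.asarray(exog), groups=np.asarray(groups)
    )
    return model.fit()


endog, exog, groups = to_design(ROWS, "stratum", "choice", ["x1", "x2"])
```

With real data you would then call `fit_conditional_mnlogit(endog, exog, groups)` and inspect `result.summary()`. The toy rows above are far too few for a meaningful fit, so the sketch stops at the shaping step.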

There are several other ways to get your data from Neo4j into Python, so if you run into any performance bottlenecks, please feel free to reach back out!
