Would I need to write a custom user-defined procedure to run a Multinomial Conditional Logistic Regression (MCLR) on Neo4j data? I'm trying to think through the best way to run an MCLR on the data in my database. Right now I'm planning to use the ConditionalMNLogit model from statsmodels in Python: I'd query the database from a Python script and fit the model on the results of my query. I'm guessing the main limitation of this approach would be the amount of data I have to pull out of Neo4j.
Would anyone have any suggestions? https://www.statsmodels.org/dev/generated/statsmodels.discrete.conditional_models.ConditionalMNLogit...
We have a native node classification pipeline that can handle multiple classes, and logistic regression is one of its modeling options. It is closer to ordinary multinomial logistic regression than to a conditional model, though, and is generally geared more towards prediction and machine learning use cases.
You can also pull the data into Python using the GDS Python client and do your modeling there. If it is helpful, here is a notebook with an example of doing just that: generating features in Neo4j GDS, reading them back into Python, and fitting a statsmodels Logit model on the data.
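For the conditional model specifically, a minimal sketch of that workflow might look like the following. The graph schema, the Cypher query, and the column names (`g`, `y`, `x1`, `x2`) are assumptions made up for illustration, as are the helper names `fetch_choices`, `fit_mclr`, and `synthetic_choices`. The fit is run on synthetic data so the snippet works without a database.

```python
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalMNLogit


def fetch_choices(uri, user, password):
    """Hypothetical helper: pull one row per observation from Neo4j."""
    # Imported lazily so the rest of the sketch runs without a live database.
    from graphdatascience import GraphDataScience

    gds = GraphDataScience(uri, auth=(user, password))
    # Assumed schema: (:Observation {y, x1, x2})-[:IN_GROUP]->(:Group {id})
    return gds.run_cypher(
        """
        MATCH (o:Observation)-[:IN_GROUP]->(g:Group)
        RETURN g.id AS g, o.y AS y, o.x1 AS x1, o.x2 AS x2
        """
    )


def fit_mclr(df):
    """Fit a conditional multinomial logit with groups taken from column g."""
    # No intercept column: conditional models reject a constant in exog.
    model = ConditionalMNLogit(df["y"], df[["x1", "x2"]], groups=df["g"])
    return model.fit()


def synthetic_choices(n_groups=30, group_size=5, seed=0):
    """Stand-in for the query result: 3 outcome classes, 2 covariates."""
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    g = np.repeat(np.arange(n_groups), group_size)
    x = rng.normal(size=(n, 2))
    # Class 0 is the reference category, so its coefficients are zero.
    coefs = np.array([[0.0, 1.0, -1.0], [0.0, 2.0, -1.0]])
    logits = x @ coefs
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    y = np.array([rng.choice(3, p=p) for p in probs])
    return pd.DataFrame({"g": g, "y": y, "x1": x[:, 0], "x2": x[:, 1]})


np.random.seed(0)  # fit() draws random start values when none are given
df = synthetic_choices()
result = fit_mclr(df)
print(result.params)  # one row per covariate, one column per non-reference class
```

With a real database you would replace `synthetic_choices()` with something like `fetch_choices(...)`. The groups are kept small here on purpose, since the cost of fitting a conditional model can grow quickly with group size, so that may matter as much as the total number of rows you query.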
There are also several other ways to get your data from Neo4j into Python, so if you run into any performance bottlenecks, please feel free to reach back out!