Import data into Neo4j from MS SQL or Elasticsearch

Hello everyone! I have a very large amount of call detail records (CDR) stored in an MS SQL Server 2017 database.
I was impressed by Neo4j's ability to present this information as a graph and then work with it, so I want to try it with my dataset. But the problem is how to migrate the data from the MS SQL database to Neo4j. Are there any tools to do it without pain and suffering :)?

I can also import the MS SQL data into Elasticsearch. Would that simplify the task?
Has anyone encountered the same problem?

You can use a JDBC driver to connect to SQL Server and get the data. Or you can export the data to a .csv file and import it.
Check this link: https://neo4j.com/blog/neo4j-call-detail-records-analytics/
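For the CSV route, a minimal sketch of a streaming export is shown here. It assumes a Python DB-API cursor (for SQL Server that could come from a driver such as pyodbc, which is an assumption and not something named in this thread) and reads the result in chunks, so the full table never has to fit in memory. The function name, query, and chunk size are hypothetical.

```python
import csv


def export_query_to_csv(cursor, query, path, chunk_size=50_000):
    """Stream the result of `query` into a CSV file, chunk by chunk.

    `cursor` is any DB-API 2.0 cursor; the driver behind it is up to you.
    Returns the number of data rows written.
    """
    cursor.execute(query)
    # Column names come from the standard DB-API cursor description.
    header = [column[0] for column in cursor.description]
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(chunk_size)  # bounded memory per round trip
            if not rows:
                break
            writer.writerows(rows)
            written += len(rows)
    return written
```

Because only one chunk is held in memory at a time, the same function should work for a small test table and for a very large one; the limiting factor becomes disk and network throughput.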

Thank you for your answer!

CSV is not suitable, because there are a lot of rows in my MS SQL Server tables. If I use CentOS, how can I use the JDBC driver, and from which application?

And another question, please: can I detect groups of potential suspects in my CDR data, and similarities between several CDR datasets, with Neo4j functionality?

For the initial load, neo4j-admin import is going to be the fastest way to get your data into the graph. After that, to keep the data up to date, you can write a program in your favorite language to insert new rows into your database. Most Linux distros come with Python already installed, and it would be an easy language for writing a little program to connect and transfer. Then there's the option of ETL tools such as Pentaho, or message bus systems such as Apache Kafka, for more robust designs.
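As a rough illustration of the "little program" idea for keeping the graph up to date, the sketch below sends new rows to Neo4j in batches through a single parameterized Cypher statement. The graph model (Phone nodes, CALLED relationships), the property names, and the batch size are assumptions made for the example. The `run` argument is any callable with the shape of the Neo4j Python driver's `session.run(query, **parameters)`; passing it in keeps the sketch independent of a live database.

```python
from itertools import islice

# Hypothetical model: (:Phone {number})-[:CALLED {started_at, duration}]->(:Phone)
UPSERT_CALLS = """
UNWIND $rows AS row
MERGE (a:Phone {number: row.caller})
MERGE (b:Phone {number: row.callee})
CREATE (a)-[:CALLED {started_at: row.started_at, duration: row.duration}]->(b)
"""


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_calls(run, rows, batch_size=10_000):
    """Send `rows` (dicts) to Neo4j in batches; returns the number of rows sent.

    `run` is a callable such as `session.run` from the Neo4j Python driver.
    """
    total = 0
    for batch in batched(rows, batch_size):
        run(UPSERT_CALLS, rows=batch)  # one round trip per batch
        total += len(batch)
    return total
```

Batching with UNWIND keeps the number of round trips low; a uniqueness constraint on the hypothetical `Phone.number` property would usually be wanted so that the MERGE lookups stay fast.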

In the end it's the same challenge as if you were going from (MS-SQL)-[:to]->(Oracle), or from any DB to any other DB. You need an ETL process to select from one database and insert into another. Neo4j is just another (super awesome) database.

mike.r.black, thank you for your answer. But do I understand correctly: neo4j-admin can only import from .csv files, but my MS SQL database has billions of rows, so I will spend a lot of time exporting to .csv and then loading into Neo4j. Can neo4j-admin handle large .csv files (hundreds of GB)?

And about the other question, please: I want to understand whether the tasks of finding groups of potential suspects in CDR data, and similarities between several CDR datasets, can be solved with Neo4j in general.
I'm not asking how to do it; I just want to understand this before I buy hardware for a Neo4j server and export my data from MS SQL Server.

thanks

I'm not aware of any technical limit on the size of a neo4j-admin import, just whatever your machine can handle. More power = faster load times, but no file size limits that I know of.

I've used it to initialize a database of over 100 GB with billions of nodes & relationships. The source was an MS-SQL db; I used SSIS to export the data as CSVs, transforming it from the original star schema model into a graph model. Think of neo4j-admin as doing a "bulk" insert, skipping the transaction and rollback logs and inserting data straight into the data file. You have to get your data formatted just right, ensuring referential integrity and all that good stuff, because it validates the graph structure as it imports, but it is the fastest way to initialize a database.
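To make "formatted just right" a bit more concrete, here is a minimal sketch that turns flat CDR rows into one node file and one relationship file using the header conventions of the neo4j-admin import tool (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The row layout, the Phone label, the CALLED type, and the property names are assumptions for illustration, not the schema used in this thread.

```python
import csv


def write_import_files(cdr_rows, nodes_path, rels_path):
    """Write a node CSV and a relationship CSV for a bulk import.

    `cdr_rows` yields (caller, callee, started_at, duration_seconds) tuples;
    this layout is hypothetical. Returns the number of distinct phone nodes.
    """
    seen = set()  # numbers already written, so each node ID appears once
    with open(nodes_path, "w", newline="", encoding="utf-8") as nodes_fh, \
         open(rels_path, "w", newline="", encoding="utf-8") as rels_fh:
        nodes = csv.writer(nodes_fh)
        rels = csv.writer(rels_fh)
        nodes.writerow(["number:ID", ":LABEL"])
        rels.writerow([":START_ID", ":END_ID", ":TYPE", "started_at", "duration:int"])
        for caller, callee, started_at, duration in cdr_rows:
            # Referential integrity: both endpoints must exist in the node file.
            for number in (caller, callee):
                if number not in seen:
                    seen.add(number)
                    nodes.writerow([number, "Phone"])
            rels.writerow([caller, callee, "CALLED", started_at, duration])
    return len(seen)
```

The two files would then be handed to the import tool through its `--nodes` and `--relationships` options; the exact command line differs between Neo4j versions, so check the documentation for yours. The in-memory `seen` set is fine when distinct phone numbers are far fewer than calls; otherwise the distinct node list could be produced on the SQL side instead.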

To answer your second question: yes, Neo4j can do pattern matching and community detection. I originally stumbled across graph DBs when I was searching for pattern matching solutions, and Cypher is tailored to writing queries that look for patterns. You'll want to read about Graph Data Science; there's a whole section on Community Detection.
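To give a feel for what the simplest kind of community detection computes, here is a toy sketch in plain Python: it groups phone numbers that are connected by any chain of calls (weakly connected components, done with union-find). This is only an illustration of the idea on hypothetical data; it is not how Neo4j or the Graph Data Science library implements it, and real analyses would more likely use the library's own algorithms.

```python
from collections import defaultdict


def call_groups(calls):
    """Group phone numbers linked by any chain of calls.

    `calls` is an iterable of (caller, callee) pairs. Returns a list of
    sorted groups, ordered by their first member.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for caller, callee in calls:
        root_a, root_b = find(caller), find(callee)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for number in list(parent):
        groups[find(number)].add(number)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


# Hypothetical data: two separate calling circles.
print(call_groups([("111", "222"), ("222", "333"), ("444", "555")]))
# prints [['111', '222', '333'], ['444', '555']]
```

Connected components only say who is reachable from whom; finer-grained algorithms in the Community Detection section also weigh how densely the members of a group call each other.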

If you want to save yourself from buying expensive hardware, or want the flexibility of scaling up or down, Neo4j has a cloud-hosted service, Aura, that I'd check out before buying hardware.

Thank you for your answer. It will help me decide whether to choose Neo4j for my purpose.
But one little question, please:

How does Neo4j db size differ from MS SQL db size for the same data? I.e., I have a 1 TB MS SQL table and I put this data into Neo4j: what size will the Neo4j db be? I ask because the MS SQL db contains indexes, and they can take up to 3-5% of the db size. I care about the number of disks in my RAID (or is a stripe better? Is read/write speed really important for working with Neo4j?)

I've never done a size-on-disk comparison, but I would anticipate at least the same as in MS SQL. There are a lot of factors that go into calculating how much disk your graph will take. Maybe a good idea for someone to write up a Neo4j App plugin to calculate this? I did find one blog post on calculating disk size, but it's 5 years old: http://sgerogia.github.io/Disk-Capacity-Planning-for-Neo4J/ .
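For a very rough first number, a back-of-the-envelope sketch along these lines may help. The per-record sizes below are the ones documented for Neo4j's older fixed-size record store (15 bytes per node, 34 per relationship, 41 per property record); they are assumptions here and may not match your Neo4j version or store format. The counts in the example are made up, indexes, transaction logs, and long strings are ignored, and counting one property record per property probably overestimates, since small properties can share a record.

```python
# Assumed per-record sizes (classic fixed-size record store); verify these
# against the documentation for your Neo4j version and store format.
NODE_BYTES = 15
RELATIONSHIP_BYTES = 34
PROPERTY_BYTES = 41


def estimate_store_bytes(nodes, relationships, properties):
    """Rough record-store size: no indexes, logs, or long string values."""
    return (nodes * NODE_BYTES
            + relationships * RELATIONSHIP_BYTES
            + properties * PROPERTY_BYTES)


# Hypothetical dataset: 50 million phones with 1 property each,
# 2 billion calls with 2 properties each.
phones = 50_000_000
calls = 2_000_000_000
total = estimate_store_bytes(phones, calls, phones * 1 + calls * 2)
print(f"{total / 1e9:.1f} GB")  # prints 234.8 GB
```

The arithmetic mainly shows that relationships and their properties dominate for CDR-style data, so the modeling choice of what becomes a property matters more than the number of phone nodes.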

Depending on how it is modeled out, the graph can also end up much smaller: Adobe actually ended up reducing their disk size from 50 TB to 40 GB. Jim Webber also has a YouTube video about Neo4j at scale.

mike.r.black, thank you for your answers. I am very inspired by the opportunities that Neo4j gives when working with data. I already have an HP ProLiant DL320e Gen8 server with 32 GB DDR3, a quad-core Intel Xeon E3-1220 v2 CPU, and a 4 TB stripe array. Do you think this hardware is good enough for analytics on 1-2 TB of call detail records data?