CDR analysis

Hello,

I'm doing some analysis on Call Detail Records (CDR). My dataset is similar to the one described here: https://neo4j.com/blog/neo4j-call-detail-records-analytics/

Here are the fields in my dataset:

  • source (operator)

  • called_number

  • calling_number

  • calling_date

  • country_code_from

  • country_code_to

  • usage

  • service_name (SMS, DATA, VOICE)

    • SMS-OUTGOING
    • SMS-OUTGOING-ROAMING
    • SMS-INCOMING
    • DATA-OUTGOING
    • DATA-OUTGOING-ROAMING
    • VOICE-OUTGOING
    • VOICE-OUTGOING-ROAMING
    • VOICE-INCOMING
    • VOICE-INCOMING-ROAMING

If the service_name is an SMS type, the usage value is always set to 1.
If the service_name is a DATA type, the called_number and country_code_to fields are empty.
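
The two rules above could be checked with something like the following. This is only a minimal sketch: it assumes each record is a plain Python dict keyed by the field names listed above and that values may arrive as strings. The helper names `service_family` and `check_record` are made up for illustration and are not part of py2neo or MLlib.

```python
from typing import Any, Dict, List


def service_family(service_name: str) -> str:
    """Return the leading part of a service_name, e.g. 'SMS' for 'SMS-OUTGOING-ROAMING'."""
    return service_name.split("-", 1)[0]


def _is_one(value: Any) -> bool:
    """True if value is numerically 1, whether stored as 1, 1.0 or '1'."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        return False


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations for one CDR (empty list if none)."""
    problems: List[str] = []
    family = service_family(record.get("service_name", ""))

    # Rule 1: SMS records always carry usage == 1.
    if family == "SMS" and not _is_one(record.get("usage")):
        problems.append("SMS record with usage != 1")

    # Rule 2: DATA records have no called number and no destination country.
    if family == "DATA":
        for field in ("called_number", "country_code_to"):
            if record.get(field):
                problems.append(f"DATA record with non-empty {field}")

    return problems
```

Records that break these rules are probably data-quality problems rather than fraud, so it may be worth filtering them out before training anything.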

I'd like to apply some machine learning algorithms to this data for fraud/anomaly detection and prediction. Which one would be best for my use case: K-means, Random Forest, Naive Bayes, or a time-series approach?

I found this:

I'm using py2neo and MLlib.

What kinds of fraud or anomalies are you looking for in this data set? I think understanding a bit more about your use case would help me narrow down the better options.

Cheers,
Jennifer

Hi dimespi, were you able to find any example code for CDR analytics? It would be great if you could share a link to an example. Also, do you have a sample dataset for this? Thanks in advance.