Hello,
I'm doing some analysis on Call Detail Records (CDR). My dataset is similar to the one in this post: Using Neo4j for Call Detail Records (CDR) Analytics [Community Post]
Here are the fields from my dataset:
- source (operator)
- called_number
- calling_number
- calling_date
- country_code_from
- country_code_to
- usage
- service_name (SMS, DATA, VOICE)
  - SMS-OUTGOING
  - SMS-OUTGOING-ROAMING
  - SMS-INCOMING
  - DATA-OUTGOING
  - DATA-OUTGOING-ROAMING
  - VOICE-OUTGOING
  - VOICE-OUTGOING-ROAMING
  - VOICE-INCOMING
  - VOICE-INCOMING-ROAMING
If the service_name is one of the SMS values, usage is always set to 1.
If the service_name is one of the DATA values, called_number and country_code_to are empty.
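To make the two rules above concrete, here is a minimal sketch in plain Python of how a record could be checked against them. The function names (`split_service_name`, `check_record`) and the dict-per-record layout are assumptions for illustration only. They are not part of py2neo or MLlib.

```python
from typing import Dict, List, Tuple

# The nine service_name values listed above.
SERVICE_NAMES = {
    "SMS-OUTGOING", "SMS-OUTGOING-ROAMING", "SMS-INCOMING",
    "DATA-OUTGOING", "DATA-OUTGOING-ROAMING",
    "VOICE-OUTGOING", "VOICE-OUTGOING-ROAMING",
    "VOICE-INCOMING", "VOICE-INCOMING-ROAMING",
}


def split_service_name(service_name: str) -> Tuple[str, str, bool]:
    """Split e.g. 'VOICE-OUTGOING-ROAMING' into (family, direction, roaming)."""
    parts = service_name.split("-")
    roaming = len(parts) == 3 and parts[2] == "ROAMING"
    return parts[0], parts[1], roaming


def usage_is_one(value) -> bool:
    """True if the usage field parses as the number 1."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        return False


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of rule violations for one CDR record (hypothetical helper)."""
    name = record.get("service_name", "")
    if name not in SERVICE_NAMES:
        return [f"unknown service_name: {name!r}"]

    problems = []
    family, _, _ = split_service_name(name)
    # Rule 1: SMS records always carry usage = 1.
    if family == "SMS" and not usage_is_one(record.get("usage")):
        problems.append("SMS record with usage != 1")
    # Rule 2: DATA records have no called_number and no country_code_to.
    if family == "DATA":
        for field in ("called_number", "country_code_to"):
            if record.get(field):
                problems.append(f"DATA record with non-empty {field}")
    return problems
```

Records that come back with a non-empty list break the dataset's own rules, so they are probably worth looking at before any model is trained.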
I'd like to apply machine learning algorithms for fraud/anomaly detection. I'm wondering which one would be best for my use case: KMeans, RandomForest, NaiveBayes, or a time-series approach?
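Whichever algorithm turns out to fit, I assume the raw records first need to be aggregated into one numeric feature vector per calling_number. Here is a rough sketch in plain Python. The chosen features and the `build_features` helper are my own assumptions, not taken from any library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_features(records: Iterable[Dict[str, str]]) -> Dict[str, List[float]]:
    """Aggregate CDR records into one feature vector per calling_number.

    Vector layout (an assumption for illustration):
    [sms_count, voice_usage, data_usage, roaming_share, distinct_dest_countries]
    """
    acc = defaultdict(lambda: {
        "sms": 0, "voice": 0.0, "data": 0.0,
        "roaming": 0, "total": 0, "countries": set(),
    })

    for r in records:
        parts = r["service_name"].split("-")
        family = parts[0]
        a = acc[r["calling_number"]]
        a["total"] += 1
        if parts[-1] == "ROAMING":
            a["roaming"] += 1
        usage = float(r.get("usage") or 0)
        if family == "SMS":
            a["sms"] += 1
        elif family == "VOICE":
            a["voice"] += usage
        elif family == "DATA":
            a["data"] += usage
        # country_code_to is empty for DATA records, so skip blanks.
        if r.get("country_code_to"):
            a["countries"].add(r["country_code_to"])

    return {
        number: [
            a["sms"],
            a["voice"],
            a["data"],
            a["roaming"] / a["total"],
            len(a["countries"]),
        ]
        for number, a in acc.items()
    }
```

The resulting vectors would still need scaling before a distance-based method such as KMeans, because the usage totals are on a very different scale from the counts.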
I found these resources:
- telco/kmeansCDRwithNormalizationOfFeatures2019.pdf at master · git4impatient/telco · GitHub
- RPubs - Call Data Record Analysis
- machine-learning-for-telecommunications/source/industry/telecom/notebooks at master · aws-solutions/machine-learning-for-telecommunications · GitHub
I'm using py2neo and MLlib.