Typo or correct in 04_Predictions.ipynb of Data Science with Neo4j 3.5

Hey,

No, it isn't a typo.

So we split the graph into EARLY (train) and LATE (test) graphs to help pick the pairs of positive and negative examples that go into the feature matrices.
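To make the split concrete, here's a minimal plain-Python sketch. It isn't the notebook's code (that works in Cypher against Neo4j). The author names, years and `CUTOFF_YEAR` are made up for illustration. Negatives are simplified to "every pair of authors with no edge in that graph", which is looser than the sampling a real pipeline would use.

```python
from itertools import combinations

# Hypothetical co-authorship records: (author_a, author_b, year)
collaborations = [
    ("alice", "bob", 2001),
    ("bob", "carol", 2003),
    ("carol", "dave", 2008),
    ("alice", "carol", 2009),
]

CUTOFF_YEAR = 2006  # hypothetical split point between EARLY and LATE


def split_edges(records, cutoff):
    """Split undirected co-author edges into EARLY (< cutoff) and LATE (>= cutoff)."""
    early = {frozenset((a, b)) for a, b, year in records if year < cutoff}
    late = {frozenset((a, b)) for a, b, year in records if year >= cutoff}
    return early, late


def labelled_pairs(edges, authors):
    """Positives: pairs joined by an edge. Negatives (simplified): all other pairs."""
    positives = sorted(tuple(sorted(edge)) for edge in edges)
    negatives = [pair for pair in combinations(sorted(authors), 2)
                 if frozenset(pair) not in edges]
    return positives, negatives


authors = {name for a, b, _ in collaborations for name in (a, b)}
early, late = split_edges(collaborations, CUTOFF_YEAR)
train_pos, train_neg = labelled_pairs(early, authors)
```

With this toy data the EARLY graph gives the positives `("alice", "bob")` and `("bob", "carol")`. Every other author pair becomes a negative.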

Then, when we compute the scores for the train matrix, we need to make sure that we don't look at any data that's in the test graph, which is why we use CO_AUTHOR_EARLY for all of those computations.

When we compute the scores for the test matrix, though, we don't need to worry about that. It also wouldn't make sense to compute those scores from the LATE graph alone, because we'd be missing all of the collaborations that had already happened.
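Here's a small plain-Python illustration of that reasoning, using common neighbours as an example score. The edge sets and author names are hypothetical, and common neighbours stands in for whichever link-prediction scores you compute. The train-side score only sees the EARLY edges. The test-side score sees EARLY plus LATE. Scoring from LATE alone would drop earlier collaborations.

```python
from collections import defaultdict

# Hypothetical EARLY and LATE co-author edges (undirected, stored as frozensets)
early = {frozenset(p) for p in [("alice", "bob"), ("bob", "carol")]}
late = {frozenset(p) for p in [("carol", "dave"), ("alice", "carol")]}


def adjacency(edges):
    """Build an undirected adjacency map from a set of 2-element frozensets."""
    graph = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        graph[a].add(b)
        graph[b].add(a)
    return graph


def common_neighbors(edges, a, b):
    """Number of authors who have co-authored with both a and b in this edge set."""
    graph = adjacency(edges)
    return len(graph[a] & graph[b])


# Train features: EARLY graph only, so nothing from the test period leaks in
train_score = common_neighbors(early, "alice", "carol")

# Test features: the whole graph (EARLY + LATE)
test_score = common_neighbors(early | late, "bob", "dave")

# For comparison: LATE alone misses bob's earlier collaborations
late_only_score = common_neighbors(late, "bob", "dave")
```

Here `bob` and `dave` share `carol` as a neighbour in the full graph, so the score is 1. From LATE alone it drops to 0 because the bob–carol collaboration happened in the EARLY period.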

Hope that makes sense.

Cheers, Mark
