How to use the Jaccard similarity algorithm in Neo4j to find similar nodes

Hi All,

We are trying to build a graph from software and hardware information. Each hardware node has a list of installed software, and I am using the Jaccard similarity algorithm to show hardware that has similar software installed.

I followed this link to write a Cypher query to get similar hardware,

Below is the query I tried executing in the Neo4j browser, but I didn't get any response.

MATCH (s:Software)-[:installed]->(Hardware)
WITH {item:id(s), categories: collect(id(Hardware))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard.stream(data)
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.getNodeById(item1).name AS from, algo.getNodeById(item2).name AS to, intersection, similarity
ORDER BY similarity DESC LIMIT 20

Output Response

Please correct me if I am doing anything wrong.

Attached is a screenshot of the EXPLAIN output for the Cypher query,


Thanks,
Ganeshbabu R

Can you run the data collection query on its own and see if that works?
i.e.

MATCH (s:Software)-[:installed]->(Hardware) 
WITH {item:id(s), categories: collect(id(Hardware))} as userData 
WITH collect(userData) as data 
CALL algo.similarity.jaccard.stream(data) YIELD item1, item2, count1, count2, intersection, similarity
RETURN count(*);

What happens if you run your query with PROFILE instead of EXPLAIN?

@michael.hunger

Below is the response when I ran the query with PROFILE




Below is the response I got,

Correct me if I am doing anything wrong, and let me know your thoughts.

Regards,
Ganeshbabu R

But is there overlap between the software and the installed hardware?

What happens if you run:

MATCH (s:Software)-[:installed]->(Hardware) 
WITH {item:id(s), categories: collect(id(Hardware))} as userData
RETURN userData LIMIT 100;

If you look at the data (especially the categories), are there any overlaps?
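One way to check for overlaps outside the database is to compare the category lists pairwise. The following is only a rough Python sketch, not part of the Neo4j library: the function name overlapping_pairs is made up for illustration, and the four rows are copied from the userData output posted later in this thread.

```python
from itertools import combinations

# A few rows copied from the userData output shared in this thread.
rows = [
    {"item": 200, "categories": [3273, 2927, 112, 2774, 2680, 1899]},
    {"item": 2685, "categories": [2680, 2774]},
    {"item": 2775, "categories": [2774]},
    {"item": 3340, "categories": [3334]},
]

def overlapping_pairs(rows):
    """Return (item1, item2, shared) for every pair sharing at least one category.

    Hypothetical helper for illustration; it is not a Neo4j API.
    """
    found = []
    for left, right in combinations(rows, 2):
        shared = set(left["categories"]) & set(right["categories"])
        if shared:
            found.append((left["item"], right["item"], shared))
    return found

for item1, item2, shared in overlapping_pairs(rows):
    print(item1, item2, sorted(shared))
```

Any pair that shows up here has a non-empty intersection and therefore a Jaccard similarity greater than 0; item 3340 shares nothing with the other three rows, so it does not appear.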

Hi @michael.hunger,

Below is the response when I ran the above query,

Please check and let me know your thoughts. Also, I am not sure how to check whether there is any overlap in the categories data.

Regards,
Ganeshbabu R

Odd, I cannot access pastebin.

Sorry about that, I thought it was publicly viewable. Below is the response of the query:

userData
{item:2990,categories:[2972]}
{item:2685,categories:[2680,2774]}
{item:3340,categories:[3334]}
{item:2344,categories:[2338]}
{item:2012,categories:[3587,2193,3086,2007,3504]}
{item:137,categories:[1899,2774,3112,112,3273,2927,3133,3272,2680]}
{item:3107,categories:[3479,3106]}
{item:2021,categories:[2927,2016]}
{item:891,categories:[2689,2742,2648,3097,2149,1862,1690,2602,2152,3232,3571,1743,2170,3565,1456,2181,925,1733,2423,1962,3315,2633,857,2385,2716,3302,3589,1918,1970,1542,3173]}
{item:1205,categories:[2715,3589,2580,1870,1201,3417,2777,2967,3272]}
{item:550,categories:[2607,1646,1456,517]}
{item:864,categories:[857]}
{item:146,categories:[2774,112,2927,2680]}
{item:2775,categories:[2774]}
{item:559,categories:[557,2347,2379]}
{item:218,categories:[2088,3496,2723,3151,298,3112,292,2733,215,3546,1103,3285,3152,2715,1944,3173,3176,989,1611,3101,711,1576]}
{item:568,categories:[557]}
{item:227,categories:[2131,2327,3497,2193,711,3614,944,1494,2727,2729,3546,2916,2159,2824,3509,1947,1097,2623,1946,2750,2602,1226,989,2521,1637,2520,1568,598,3505,1576,1671,2591,3504,664,681,215,2809,3534,298,1956,3382,2949,914,742,2185,3061,3515,2915,1813,1812,3151,1051,1250,3320,3157,1177,2755,332]}
{item:1752,categories:[3254,1862,3233,1743,2964,2378,3101,2742,3255,2114]}
{item:2981,categories:[2972]}
{item:2640,categories:[2639]}
{item:765,categories:[762]}
{item:2649,categories:[2648]}
{item:1833,categories:[1831]}
{item:1115,categories:[3159,1690,1110]}
{item:774,categories:[2407,762,1025,836,1882,2347,3353,3546,2841,1056,2379]}
{item:433,categories:[3232,351]}
{item:92,categories:[2533,2405,3498,557,662,3479,3552,762,3497,681,614,956,2927,3086,112,3230,2639,944,2548,3609,818,1712,3441,1630,3179,2100,2002,2918,1611,2155,635,2575,789,3315,3587,2804,1795,3380,3489,1726,1220,2826,2558,1663,1949,2607,3159,1177,1926,920,3305,3504,1690,1069,1051,1354,2809,2423,2774,1952,3418,2159,1171,3211,2653,1910,1882,2249,2916,3078,298,1226,3450,3454,1247,1890,3226,78,3272,1941,2076,1193,3065,3310]}
{item:3403,categories:[3401]}
{item:1528,categories:[1527]}
{item:1187,categories:[1181]}
{item:1501,categories:[2418,1497,1962,3429,2358]}
{item:846,categories:[836,3159]}
{item:1160,categories:[1158]}
{item:442,categories:[2804,2868,3211,1807,1497,2639,351]}
{item:101,categories:[2916,666,3226,1171,944,2159,3489,2249,1630,1226,1663,3078,2918,1247,3272,2076,1051,920,1690,1882,78,1949,3418,1354,2405,1910,1726,3315,762,614,3310,557,1193,1177,956,3305,2002,662,818,2716,2620,1611,2653,2575,3159,635,2155]}
{item:1196,categories:[2116,3096,3143,1193]}
{item:200,categories:[3273,2927,112,2774,2680,1899]}
{item:514,categories:[496]}
{item:3062,categories:[3061]}
{item:173,categories:[3272,2680,1899,3273,3133,2774,3112,112,2927]}
{item:209,categories:[2607,2558,2121,3086,2575,2723,2100,3331,2701,2533,2620,1970,3302,1831,246,2423,2672,112,2456,2916,2380,3353,2584,3429,2526,944,2580,2645,1949,1887,1768,1527,2569,2193,1429,2152,836,2407]}
{item:523,categories:[1456,2607,1646,517,1726]}
{item:182,categories:[2680,3272,1899,3112,3273,2774,112,2927,3133]}
{item:1393,categories:[2322,1373,3085,3052,3049]}
{item:3618,categories:[3617]}
{item:3277,categories:[3273]}
{item:254,categories:[3520,1795,2602,2860,3550,3000,2648,3145,3441,1497,2805,3026,2100,3545,3180,3417,246,1566,3454,2803,3539,2607,1247,1148,2972,1690,3401,3395,3450,1918,3583,2599,3302,3571,2596,1106,2423,2002,1646,3389,2531,3211,2171,1573,2526,1354,789,1862,1080,3029,2321,2003,3199,2380,2548,2307,2916,1910,1063,1056,3462,2170,2479,3173,3522,3459,3282,986,2840,3052,925,1762,2650,3546,3609,2214,2185,2181,1359,3382,3387,3587,3498,3096,2772,3495,1949,2474,3259,2316,2728,2958,2716,2653,1527,1578,1051,3315,1726,2443,3617,635,3067,2131,2959,3380,3001,2633,2347,2621,2239,2752,2639,3204,1373,662,2522,3589,920,2456,3563,2146,3232,2742,557,1568,2703,2604,3540,3552,1250,2757,914,711,3097,2821,1890,1970,3565,2966,857,2808,2385,1923,3367,1712,1743,3591,2198,2868,1707,2580,1456,3226,3463,1542,2809,2628,3224,303,3078,2969,1171,847,3505,2277,1952,2322,3256,2584,2241,1962,1887,2149,2801,1725,762,1232,2822,2229,3202,3298,2689,3085,3479,3060,2152,3334,1822,1733,1429,3106,2437,2949,1292,855,3322,2771]}
{item:191,categories:[1429,3386,2680,2607,2774,3418,2007,1043,2456,3112,2575,2953,2844,2927,1415,989,3067,2569,1949,1066,3502,3498,1926,3522,2700,3310,3331,3239,3133,1882,3273,112,987,664,2407,2964,2797,3305,2566,3272,3071,956,3442,2722,1899]}
{item:2461,categories:[2456]}
{item:1402,categories:[3085,3052,2959,3049,1373,2322]}
{item:1061,categories:[3502,2241,2953,1059,1292]}
{item:720,categories:[2626,818,711]}
{item:2156,categories:[2155]}
{item:1815,categories:[1813,2078,2729,2116]}
{item:2470,categories:[2456]}
{item:1474,categories:[1456,2607]}
{item:1411,categories:[2476,1407,2680,3589,2453]}
{item:2129,categories:[2121]}
{item:1070,categories:[1069]}
{item:729,categories:[833,3459,2964,2146,3247,711,2912]}
{item:2165,categories:[2164]}
{item:1824,categories:[1822]}
{item:1483,categories:[2152,1456,1918,3302,2689]}
{item:828,categories:[818]}
{item:1142,categories:[3591,1110,2016]}
{item:424,categories:[2569,1373,3056,3232,3001,2584,351,3204,2822,3054,2239,3052,2868]}
{item:738,categories:[711]}
{item:83,categories:[78,3311,2744]}
{item:3349,categories:[3334]}
{item:1178,categories:[1637,1177,1956,3509,1807,1947,1708,1812]}
{item:1492,categories:[1456]}
{item:837,categories:[836,2249,2597,3565,2131,1807]}
{item:1151,categories:[1148]}
{item:155,categories:[2927,2680,3272,1899,956,2774,3112,112,3273,3133]}
{item:810,categories:[789]}
{item:469,categories:[2076,2832,1816,2702,2358,1454,2327,1676,2733,453,3031,3247,2729,2338,3056,1201,742,1230,581,2777,3285,2706,1884,3427,2006,2723,1569]}
{item:3017,categories:[3001]}
{item:505,categories:[3454,1962,3179,1419,1361,2307,2233,2918,2321,3395,2804,3389,2772,3199,1762,1086,3000,3026,2969,3112,3455,3285,3583,2324,2322,3085,3202,1250,3298,2229,2808,2869,3031,2155,2437,2966,2821,2277,2239,1295,2959,3353,1226,789,742,3387,2752,496,956,3563,3197,2840,1816,2379,2771,3462,3052,2972,1102,2443,2241,3029]}
{item:819,categories:[3450,818,2723,1106]}
{item:164,categories:[2680,956,3272,1899,3273,3112,2774,3133,112,2927]}
{item:3430,categories:[3429]}
{item:478,categories:[2768,2279,453,2214,2607,2832,1795]}
{item:2371,categories:[2706,2358]}
{item:2030,categories:[2164,2358,2016,2100]}
{item:1689,categories:[2378,3101,1676,2742,3243,3255,3254,2114,1862,3233,1743]}
{item:577,categories:[3232,557,2016,2358,2068,2378,3243,2385,2569,3455,2584,1648,2526]}
{item:2784,categories:[2777]}
{item:3439,categories:[3429]}
{item:3098,categories:[3097]}
{item:2039,categories:[2016]}
{item:1698,categories:[3097,3232,3173,2602,1862,2742,1743,1690,3315,2716]}
{item:245,categories:[3230,3272,1232,2661,1676,3243,3540,2347,2701,2378,3310,2121,3246,1454,3254,3386,603,3239,2474,1816,3255,3562,3417,3442,3441,2076,2114,3427,2566,3558,1690,1638,1415,3443,1884,2607,1831,1743,2742,496,2774,2152,2083,2672,2379,2797,1527,1862,2476,2964,3418,3247,2620,2722,2453,1826,3610,2802,246,3259]}
{item:2452,categories:[2450]}
{item:2111,categories:[2456,2100]}
{item:1052,categories:[2648,1051,2088,2650,3617]}
{item:1770,categories:[1768]}
{item:2425,categories:[3029,2822,3179,2423]}
{item:1366,categories:[1361]}
{item:2084,categories:[2772,2439,2083]}

Just looking at the data visually I see several overlaps.
If I take it and run it just with your data it also returns the appropriate data:

WITH [
{item:2990,categories:[2972]},
{item:2685,categories:[2680,2774]},
{item:3340,categories:[3334]},
{item:2344,categories:[2338]},
{item:2012,categories:[3587,2193,3086,2007,3504]},
{item:137,categories:[1899,2774,3112,112,3273,2927,3133,3272,2680]},
{item:3107,categories:[3479,3106]},
{item:2021,categories:[2927,2016]},
{item:891,categories:[2689,2742,2648,3097,2149,1862,1690,2602,2152,3232,3571,1743,2170,3565,1456,2181,925,1733,2423,1962,3315,2633,857,2385,2716,3302,3589,1918,1970,1542,3173]},
{item:1205,categories:[2715,3589,2580,1870,1201,3417,2777,2967,3272]},
{item:550,categories:[2607,1646,1456,517]},
{item:864,categories:[857]},
{item:146,categories:[2774,112,2927,2680]},
{item:2775,categories:[2774]},
{item:559,categories:[557,2347,2379]},
{item:218,categories:[2088,3496,2723,3151,298,3112,292,2733,215,3546,1103,3285,3152,2715,1944,3173,3176,989,1611,3101,711,1576]},
{item:568,categories:[557]},
{item:227,categories:[2131,2327,3497,2193,711,3614,944,1494,2727,2729,3546,2916,2159,2824,3509,1947,1097,2623,1946,2750,2602,1226,989,2521,1637,2520,1568,598,3505,1576,1671,2591,3504,664,681,215,2809,3534,298,1956,3382,2949,914,742,2185,3061,3515,2915,1813,1812,3151,1051,1250,3320,3157,1177,2755,332]},
{item:1752,categories:[3254,1862,3233,1743,2964,2378,3101,2742,3255,2114]},
{item:2981,categories:[2972]},
{item:2640,categories:[2639]},
{item:765,categories:[762]},
{item:2649,categories:[2648]},
{item:1833,categories:[1831]},
{item:1115,categories:[3159,1690,1110]},
{item:774,categories:[2407,762,1025,836,1882,2347,3353,3546,2841,1056,2379]},
{item:433,categories:[3232,351]},
{item:92,categories:[2533,2405,3498,557,662,3479,3552,762,3497,681,614,956,2927,3086,112,3230,2639,944,2548,3609,818,1712,3441,1630,3179,2100,2002,2918,1611,2155,635,2575,789,3315,3587,2804,1795,3380,3489,1726,1220,2826,2558,1663,1949,2607,3159,1177,1926,920,3305,3504,1690,1069,1051,1354,2809,2423,2774,1952,3418,2159,1171,3211,2653,1910,1882,2249,2916,3078,298,1226,3450,3454,1247,1890,3226,78,3272,1941,2076,1193,3065,3310]},
{item:3403,categories:[3401]},
{item:1528,categories:[1527]},
{item:1187,categories:[1181]},
{item:1501,categories:[2418,1497,1962,3429,2358]},
{item:846,categories:[836,3159]},
{item:1160,categories:[1158]},
{item:442,categories:[2804,2868,3211,1807,1497,2639,351]},
{item:101,categories:[2916,666,3226,1171,944,2159,3489,2249,1630,1226,1663,3078,2918,1247,3272,2076,1051,920,1690,1882,78,1949,3418,1354,2405,1910,1726,3315,762,614,3310,557,1193,1177,956,3305,2002,662,818,2716,2620,1611,2653,2575,3159,635,2155]},
{item:1196,categories:[2116,3096,3143,1193]},
{item:200,categories:[3273,2927,112,2774,2680,1899]},
{item:514,categories:[496]},
{item:3062,categories:[3061]},
{item:173,categories:[3272,2680,1899,3273,3133,2774,3112,112,2927]},
{item:209,categories:[2607,2558,2121,3086,2575,2723,2100,3331,2701,2533,2620,1970,3302,1831,246,2423,2672,112,2456,2916,2380,3353,2584,3429,2526,944,2580,2645,1949,1887,1768,1527,2569,2193,1429,2152,836,2407]},
{item:523,categories:[1456,2607,1646,517,1726]},
{item:182,categories:[2680,3272,1899,3112,3273,2774,112,2927,3133]},
{item:1393,categories:[2322,1373,3085,3052,3049]},
{item:3618,categories:[3617]},
{item:3277,categories:[3273]},
{item:254,categories:[3520,1795,2602,2860,3550,3000,2648,3145,3441,1497,2805,3026,2100,3545,3180,3417,246,1566,3454,2803,3539,2607,1247,1148,2972,1690,3401,3395,3450,1918,3583,2599,3302,3571,2596,1106,2423,2002,1646,3389,2531,3211,2171,1573,2526,1354,789,1862,1080,3029,2321,2003,3199,2380,2548,2307,2916,1910,1063,1056,3462,2170,2479,3173,3522,3459,3282,986,2840,3052,925,1762,2650,3546,3609,2214,2185,2181,1359,3382,3387,3587,3498,3096,2772,3495,1949,2474,3259,2316,2728,2958,2716,2653,1527,1578,1051,3315,1726,2443,3617,635,3067,2131,2959,3380,3001,2633,2347,2621,2239,2752,2639,3204,1373,662,2522,3589,920,2456,3563,2146,3232,2742,557,1568,2703,2604,3540,3552,1250,2757,914,711,3097,2821,1890,1970,3565,2966,857,2808,2385,1923,3367,1712,1743,3591,2198,2868,1707,2580,1456,3226,3463,1542,2809,2628,3224,303,3078,2969,1171,847,3505,2277,1952,2322,3256,2584,2241,1962,1887,2149,2801,1725,762,1232,2822,2229,3202,3298,2689,3085,3479,3060,2152,3334,1822,1733,1429,3106,2437,2949,1292,855,3322,2771]},
{item:191,categories:[1429,3386,2680,2607,2774,3418,2007,1043,2456,3112,2575,2953,2844,2927,1415,989,3067,2569,1949,1066,3502,3498,1926,3522,2700,3310,3331,3239,3133,1882,3273,112,987,664,2407,2964,2797,3305,2566,3272,3071,956,3442,2722,1899]},
{item:2461,categories:[2456]},
{item:1402,categories:[3085,3052,2959,3049,1373,2322]},
{item:1061,categories:[3502,2241,2953,1059,1292]},
{item:720,categories:[2626,818,711]},
{item:2156,categories:[2155]},
{item:1815,categories:[1813,2078,2729,2116]},
{item:2470,categories:[2456]},
{item:1474,categories:[1456,2607]},
{item:1411,categories:[2476,1407,2680,3589,2453]},
{item:2129,categories:[2121]},
{item:1070,categories:[1069]},
{item:729,categories:[833,3459,2964,2146,3247,711,2912]},
{item:2165,categories:[2164]},
{item:1824,categories:[1822]},
{item:1483,categories:[2152,1456,1918,3302,2689]},
{item:828,categories:[818]},
{item:1142,categories:[3591,1110,2016]},
{item:424,categories:[2569,1373,3056,3232,3001,2584,351,3204,2822,3054,2239,3052,2868]},
{item:738,categories:[711]},
{item:83,categories:[78,3311,2744]},
{item:3349,categories:[3334]},
{item:1178,categories:[1637,1177,1956,3509,1807,1947,1708,1812]},
{item:1492,categories:[1456]},
{item:837,categories:[836,2249,2597,3565,2131,1807]},
{item:1151,categories:[1148]},
{item:155,categories:[2927,2680,3272,1899,956,2774,3112,112,3273,3133]},
{item:810,categories:[789]},
{item:469,categories:[2076,2832,1816,2702,2358,1454,2327,1676,2733,453,3031,3247,2729,2338,3056,1201,742,1230,581,2777,3285,2706,1884,3427,2006,2723,1569]},
{item:3017,categories:[3001]},
{item:505,categories:[3454,1962,3179,1419,1361,2307,2233,2918,2321,3395,2804,3389,2772,3199,1762,1086,3000,3026,2969,3112,3455,3285,3583,2324,2322,3085,3202,1250,3298,2229,2808,2869,3031,2155,2437,2966,2821,2277,2239,1295,2959,3353,1226,789,742,3387,2752,496,956,3563,3197,2840,1816,2379,2771,3462,3052,2972,1102,2443,2241,3029]},
{item:819,categories:[3450,818,2723,1106]},
{item:164,categories:[2680,956,3272,1899,3273,3112,2774,3133,112,2927]},
{item:3430,categories:[3429]},
{item:478,categories:[2768,2279,453,2214,2607,2832,1795]},
{item:2371,categories:[2706,2358]},
{item:2030,categories:[2164,2358,2016,2100]},
{item:1689,categories:[2378,3101,1676,2742,3243,3255,3254,2114,1862,3233,1743]},
{item:577,categories:[3232,557,2016,2358,2068,2378,3243,2385,2569,3455,2584,1648,2526]},
{item:2784,categories:[2777]},
{item:3439,categories:[3429]},
{item:3098,categories:[3097]},
{item:2039,categories:[2016]},
{item:1698,categories:[3097,3232,3173,2602,1862,2742,1743,1690,3315,2716]},
{item:245,categories:[3230,3272,1232,2661,1676,3243,3540,2347,2701,2378,3310,2121,3246,1454,3254,3386,603,3239,2474,1816,3255,3562,3417,3442,3441,2076,2114,3427,2566,3558,1690,1638,1415,3443,1884,2607,1831,1743,2742,496,2774,2152,2083,2672,2379,2797,1527,1862,2476,2964,3418,3247,2620,2722,2453,1826,3610,2802,246,3259]},
{item:2452,categories:[2450]},
{item:2111,categories:[2456,2100]},
{item:1052,categories:[2648,1051,2088,2650,3617]},
{item:1770,categories:[1768]},
{item:2425,categories:[3029,2822,3179,2423]},
{item:1366,categories:[1361]},
{item:2084,categories:[2772,2439,2083]}
] as data 
CALL algo.similarity.jaccard.stream(data, {similarityCutoff:0.1}) YIELD item1, item2, count1, count2, intersection, similarity
RETURN item1, item2, count1, count2, intersection, similarity LIMIT 10

To get more meaningful results I added a cutoff, but even without it you get proper results.

╒═══════╤═══════╤════════╤════════╤══════════════╤═══════════════════╕
│"item1"│"item2"│"count1"│"count2"│"intersection"│"similarity"       │
╞═══════╪═══════╪════════╪════════╪══════════════╪═══════════════════╡
│92     │101    │84      │47      │44            │0.5057471264367817 │
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│191    │200    │45      │6       │6             │0.13333333333333333│
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│191    │209    │45      │38      │9             │0.12162162162162163│
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│191    │245    │45      │60      │13            │0.14130434782608695│
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│92     │191    │84      │45      │14            │0.12173913043478261│
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│92     │254    │84      │198     │40            │0.1652892561983471 │
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│200    │1411   │6       │5       │1             │0.1                │
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│200    │2021   │6       │2       │1             │0.14285714285714285│
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│200    │2685   │6       │2       │2             │0.3333333333333333 │
├───────┼───────┼────────┼────────┼──────────────┼───────────────────┤
│200    │2775   │6       │1       │1             │0.16666666666666666│
└───────┴───────┴────────┴────────┴──────────────┴───────────────────┘
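For anyone who wants to sanity-check these numbers by hand: the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. Below is a small Python sketch (my own helper functions, not the library's code) that reproduces two rows of the table, once from the raw category lists and once from the count1, count2 and intersection columns.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Guard against two empty sets, where the ratio is undefined.
    return len(a & b) / len(union) if union else 0.0

def jaccard_from_counts(count1, count2, intersection):
    """Same value, computed from the columns shown in the result table."""
    union_size = count1 + count2 - intersection
    return intersection / union_size if union_size else 0.0

# Category lists for items 200 and 2685, copied from the data above.
cats_200 = [3273, 2927, 112, 2774, 2680, 1899]
cats_2685 = [2680, 2774]

print(jaccard(cats_200, cats_2685))         # row 200 / 2685: 2 shared out of 6
print(jaccard_from_counts(84, 47, 44))      # row 92 / 101: 44 / (84 + 47 - 44)
```

The first call gives 2/6 and the second gives 44/87, which agree with the similarity column for those two rows.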

Hi Michael,

Regarding the usage of the Jaccard algorithm:
My graph looks like this -
api_mining_jaccard_q1

[1]
Here, when I run this query:

MATCH (s:claimIntimationRequestHeader)-[:Request]-(claimIntimationRequestBody)
WITH {item:id(s), categories: collect(id(claimIntimationRequestBody))} as userData
RETURN userData

I get this output -


userData
{
  "item": 1929,
  "categories": [
    1928
  ]
}

So using it like this -

WITH[{
  item: 1929,
  categories: [
    1928
  ]
}] as data 
CALL algo.similarity.jaccard.stream(data, {similarityCutoff:0.1}) YIELD item1, item2, count1, count2, intersection, similarity
RETURN item1, item2, count1, count2, intersection, similarity LIMIT 10

I get this output -

(no changes, no records)

So I guess this is because I have only one item in userData.

And when I count the results -


MATCH (s:claimIntimationRequestHeader)-[:Request]-(claimIntimationRequestBody) 
WITH {item:id(s), categories: collect(id(claimIntimationRequestBody))} as userData 
WITH collect(userData) as data 
CALL algo.similarity.jaccard.stream(data) YIELD item1, item2, count1, count2, intersection, similarity
RETURN count(*)

I get this output -

count(*)
0

[2]
When I try this Cypher query:

MATCH (s:claimIntimationRequestBody)-[:Parameter]-(requestID) 
WITH {item:id(s), categories: collect(id(requestID))} as userData
RETURN userData

Output -

userData
{
  "item": 0,
  "categories": [
    1794,
    1793,
    1792,
    1791,
    1790,
    1789,
    1788,
    1787,
    1786,
    1785,
    1784,
    1783,
    1782,
    1781,
    1780,
    1779,
    1778
  ]
}

And then when I use this output with this query -

WITH[{
  item: 0,
  categories: [
    1794,
    1793,
    1792,
    1791,
    1790,
    1789,
    1788,
    1787,
    1786,
    1785,
    1784,
    1783,
    1782,
    1781,
    1780,
    1779,
    1778
  ]
}] as data 
CALL algo.similarity.jaccard.stream(data, {similarityCutoff:0.1}) YIELD item1, item2, count1, count2, intersection, similarity
RETURN item1, item2, count1, count2, intersection, similarity LIMIT 10

Output -

(no changes, no records)

Please help in creating a proper Jaccard algorithm query.

You have to have more than one item in your list.
It seems that in each of your cases you have only one item.

If you want to compute similarities between cIRH (claimIntimationRequestHeader) and cIRB (claimIntimationRequestBody), you need at least a few cIRH
in your query result.

Also, judging from your model, there is at least one extra node between the two, so your query would not return anything. You are basically just returning "postClaims", because you don't use a label on the end node.
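To illustrate why a single item gives no records: similarity is computed per pair of items, and a list with one entry has no pairs. A minimal Python sketch follows, assuming the procedure compares each item with every other item; the helper candidate_pairs and the second item (id 2000) are hypothetical and only there for contrast.

```python
from itertools import combinations

def candidate_pairs(data):
    """List every pair of items that could be compared (illustrative helper)."""
    return list(combinations([row["item"] for row in data], 2))

# The single row from the first query in the post above.
single = [{"item": 1929, "categories": [1928]}]
print(candidate_pairs(single))   # []

# Hypothetical second item, added only to show that pairs then appear.
two = single + [{"item": 2000, "categories": [1928]}]
print(candidate_pairs(two))      # [(1929, 2000)]
```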

Hello,

Please, could you tell me why the returned write value is false for the following query?

MATCH (user:User) WHERE size((user)-[:CONNECT]->())>20 WITH user
MATCH (user)-[r:CONNECT]->(k:Keyword)
WHERE r.weight > 10
WITH {item:id(user), categories: collect(id(k))} AS userData
WITH collect(userData) AS data
CALL algo.similarity.jaccard(data, {write:TRUE, graph:'HUGE', writeRelationshipType: 'SIMILARITY', writeProperty:'keywords_jaccard'})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty

|nodes|similarityPairs|write|writeRelationshipType|writeProperty|
|9684|46885086|false|"SIMILARITY"|"keywords_jaccard"|

Thanks in advance

Probably because you didn't specify a similarityCutoff value, which avoids writing the pairs with a similarity of 0.

Try adding similarityCutoff:0.1, or whatever makes sense in your case.

Thanks for your quick answer! Just to get an idea about the hardware: I saw in "TagOverflow — Correlating Tags in Stackoverflow" (Michael Hunger, Towards Data Science) that you ran Jaccard similarity (17K x 17K) in about 13 minutes. Could you tell me how much RAM and heap memory you used to get that result? In my case, I'm running with:

  • NEO4J_dbms_memory_heap_initial__size=8G
  • NEO4J_dbms_memory_heap_max__size=40G
  • NEO4J_dbms_memory_pagecache_size=8G

Neo4j version: 3.4.9
It takes much longer: the data is around 10K x 10K, and after 30 minutes it is still running...

After more than an hour, it still doesn't work. In the logs: ERROR [o.n.b.t.p.HouseKeeper] Fatal error occurred when handling a client connection.
Thanks in advance.

We ran it on an 8 CPU AWS machine with 32G RAM.

Perhaps your category lists are much larger?
As you can see from your unfiltered output you get about 46M similarity pairs.
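As a quick back-of-the-envelope check, the number of distinct pairs among n items is n * (n - 1) / 2. The short Python sketch below (the function name is my own) plugs in the 9684 nodes from the result above.

```python
def pair_count(n):
    """Number of distinct unordered pairs among n items."""
    return n * (n - 1) // 2

nodes = 9684  # "nodes" value from the result shared above
print(pair_count(nodes))
```

This gives 46885086, the same as the similarityPairs value reported above, which is consistent with every possible pair being counted when no cutoff or topK is set.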

I also used topK, which limits the pairs per element to K.
Do you have duplicate connections to the keywords, or only unique ones per user?
If there are duplicates, use collect(distinct id(k)).
I would run it with write:false first to see the pure output + statistics.
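Roughly, the idea behind topK and de-duplicated category lists can be sketched in Python as below. This is only an illustration of the concept, not the library's implementation: the function names are made up, the rows are taken from the hardware/software data earlier in the thread, and the real procedure may break ties or order results differently.

```python
import heapq

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_k_similar(data, k):
    """For each item keep only its k most similar other items (illustrative)."""
    # set() drops duplicate categories, similar in spirit to collect(distinct ...).
    sets = {row["item"]: set(row["categories"]) for row in data}
    result = {}
    for item, cats in sets.items():
        scored = [(jaccard(cats, other_cats), other)
                  for other, other_cats in sets.items() if other != item]
        result[item] = heapq.nlargest(k, scored)
    return result

# Rows copied from the data earlier in this thread.
data = [
    {"item": 200, "categories": [3273, 2927, 112, 2774, 2680, 1899]},
    {"item": 2685, "categories": [2680, 2774]},
    {"item": 2775, "categories": [2774]},
    {"item": 2021, "categories": [2927, 2016]},
    {"item": 1411, "categories": [2476, 1407, 2680, 3589, 2453]},
]

top = top_k_similar(data, k=2)
print([other for _, other in top[200]])
```

With k=2, item 200 keeps only items 2685 and 2775, so at most n * k pairs are kept per run instead of all n * (n - 1) / 2.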

How long did the compute above run that you shared?

And how long does this run, and what does it output?

MATCH (user:User) WHERE size((user)-[:CONNECT]->())>20 WITH user
MATCH (user)-[r:CONNECT]->(k:Keyword)
WHERE r.weight > 10
WITH id(user) as item, count(id(k)) as categories, count(distinct id(k)) as uniqueCategories
RETURN count(*), max(categories), max(uniqueCategories)

Thanks again!
The answer to your query is
|count(*)|max(categories)|max(uniqueCategories)|
|9684|2183|2183|

Categories are unique, the number of items is 9684 ~ 10K.

How long did the compute above run that you shared? >>> Without similarityCutoff:0.1 it doesn't write any result. With similarityCutoff:0.1 it never finished; the connection was lost, and I had to stop Docker and restart it.

Including similarityCutoff:0.3 in the original query with write:FALSE, I get the following (it's done in 59 sec):
|nodes|similarityPairs|write|writeRelationshipType|writeProperty|
|9684|12817736|false|"SIMILARITY"|"keywords_jaccard"|

~13 million pairs... that's not so much for my machine (16 CPUs, 56 GB), so it should finish in a reasonable time (I hope less than 30 min).

When I set write:TRUE in the same query, I get: Connection to server lost. Reconnecting... Then I have to stop Docker and start it again (without deleting the entire database).

I appreciate your help.

What kind of disk do you have?

The relationship writing currently happens in batches of 100k. Can you check, while it's running with the 0.3 cutoff (i.e. 13M rels),
what the CPU or I/O load looks like on your machine?
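For a sense of scale, here is a tiny Python sketch of what writing in fixed-size batches means for the numbers in this thread. The batch size of 100k comes from the reply above; the helper names are hypothetical and this is not how the library is actually implemented.

```python
import math
from itertools import islice

def num_batches(total, batch_size):
    """How many batches are needed to cover `total` items."""
    return math.ceil(total / batch_size)

def batched(iterable, size):
    """Yield consecutive lists of at most `size` elements (illustrative helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

pairs_at_cutoff_03 = 12817736  # similarityPairs reported with similarityCutoff:0.3
print(num_batches(pairs_at_cutoff_03, 100_000))
print(list(batched(range(5), 2)))
```

So the 0.3 cutoff run would need 129 batches of 100k relationships, and watching CPU and I/O while those batches are written should show where the time goes.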

Thanks a lot!

Finally, I realized that I have a memory problem if I try to write 13 or 18 million similarityPairs (and it's not useful anyway). However, by adding the topK parameter I reduced the number, and it writes without problems.

In your example you actually write 2864 similarityPairs (using topK:5), not the total of ~292 million. I'm sorry for the misunderstanding.

It should actually batch the writes so it should progress and finish in parallel, but we can check that again.