I'm not able to get the correct result with apoc.text.regexGroups

Neo4j Community Server 4.3.3 on Ubuntu 20.04

I'm going crazy with this! I have a string (shown in the query below) that can contain anything, with the pattern [:xx] used as a separator.

I'm trying to use the following query:

WITH '[:it]Per me “Riserva” significa tradizione. Perché è espressione massima.
[:en]For me “Riserva” means tradition. Maximum expression of.' AS str

RETURN apoc.text.regexGroups(str,
   '\[:it]((([\S \r\n]*)))\[:en]((([\S \r\n]*)))'
) AS output; 

which works on https://regex101.com/ (returning exactly the two substrings, as expected), but in Neo4j it returns:

[["[:it]Per me “Riserva” significa tradizione. Perché è espressione massima.[:en]For me “Riserva” means tradition. Maximum expression of.", "Per me “Riserva” significa tradizione. Perché è espressione massima.", "Per me “Riserva” significa tradizione. Perché è espressione massima.", "Per me “Riserva” significa tradizione. Perché è espressione massima.", "For me “Riserva” means tradition. Maximum expression of.", "For me “Riserva” means tradition. Maximum expression of.", "For me “Riserva” means tradition. Maximum expression of."]]

Can someone help?

Hi @paolodipietro58,

The APOC regexGroups function returns, for each match, an array whose first element is the full matched text, followed by the text captured by each group.
Each pair of parentheses () defines one capturing group, so the three nested pairs in your pattern capture the same text three times. To receive only three elements in the output (the full match and the two captured substrings), use a single pair of parentheses per part, for example:

WITH '[:it]Per me “Riserva” significa tradizione. Perché è espressione massima.
[:en]For me “Riserva” means tradition. Maximum expression of.' AS str

RETURN apoc.text.regexGroups(str,
   '\[:it\]([\S \r\n]*)\[:en\]([\S \r\n]*)'
) AS output;

which will have the following output:

[["[:it]Per me “Riserva” significa tradizione. Perché è espressione massima.
[:en]For me “Riserva” means tradition. Maximum expression of.", "Per me “Riserva” significa tradizione. Perché è espressione massima.
", "For me “Riserva” means tradition. Maximum expression of."]]

Hope this helps
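
As a side note, the group counting can be checked outside Neo4j. The sketch below uses Python's re module instead of the Java regex engine that APOC runs on, so it only approximates regexGroups. The helper regex_groups is a made-up name for illustration, not an APOC or Python API. For these two patterns the groups are numbered the same way:

```python
import re

# Same sample text as in the question; a newline separates the two parts.
text = (
    "[:it]Per me “Riserva” significa tradizione. Perché è espressione massima.\n"
    "[:en]For me “Riserva” means tradition. Maximum expression of."
)


def regex_groups(pattern, s):
    """Hypothetical helper: one list per match, holding the whole match
    (group 0) followed by the text of every capturing group."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, s)]


# Three nested pairs of parentheses per part: 1 full match + 6 groups.
nested = regex_groups(r"\[:it]((([\S \r\n]*)))\[:en]((([\S \r\n]*)))", text)

# One pair of parentheses per part: 1 full match + 2 groups.
flat = regex_groups(r"\[:it\]([\S \r\n]*)\[:en\]([\S \r\n]*)", text)

print(len(nested[0]))  # 7
print(len(flat[0]))    # 3

# The first captured part keeps the trailing newline, so strip it if needed.
italian, english = (part.strip() for part in flat[0][1:])
print(italian)
print(english)
```

Dropping the first element and trimming the rest, as in the last lines, leaves just the two translated substrings.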

Well, so I was on the right track, but I didn't know that the entire matched string is returned as the first element!

Nice to know! Thank you.