Unknown value error for class java.time.LocalDate when importing data

import

(Guilherme Junqueira) #1

Guys,

I am trying to run the following code and it is giving me a "non-related" error:

USING PERIODIC COMMIT 3000 LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'

MATCH
  (c:City {tse_code: toInteger(row.cod_municipio_tse)}),
  (:Publication {auto_name: row.publication})<-[:is_present_on]-(m:Metric {auto_name: row.cod_metrica}),
  (:State {acronym: row.sigla_uf})<-[:belongs_to]-(z:ElectoralZone {code: row.cod_zona_eleitoral}),
  (:Election {year: date(row.ano_eleicao), auto_name: row.cod_descricao_eleicao})<-[:round_of]-
    (:ElectionRound {number: row.num_turno})<-[:runs_in]-(cand:Candidate {code: row.sq_candidato})
CREATE
  (afe:Measurement)
SET
  afe.value = toInteger(row.total_votos),
  afe.unit  = 'votes',
  afe.date  = date(row.data_arquivo)
WITH
  m, cand, afe, c, z
CREATE UNIQUE
   (afe)-[:taken_from]->(c),
   (afe)-[:taken_of]->(m),
   (afe)-[:filtered_by]->(cand),
   (afe)-[:filtered_by]->(z);

I consider the error non-related because I have tried running the code above without the updating part, once for each of the MATCH patterns, and each runs without problems. Apparently, the problem occurs only when I put all of them together.

The header and the first 5 data rows of the file (which has more than 7M lines) are listed below:

publication;cod_metrica;cod_descricao_eleicao;ano_eleicao;num_turno;sigla_uf;cod_zona_eleitoral;cod_municipio_tse;sq_candidato;data_arquivo;total_votos
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000003;2018-05-17;1508
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000001;2018-05-17;3027
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000048;2018-05-17;0
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000146;2018-05-17;21
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000152;2018-05-17;2540

I get the following error:

Neo.DatabaseError.General.UnknownError: unknown value: (2014-01-01) of type class java.time.LocalDate)

What might be going on here?

Thanks in advance,


(Guilherme Junqueira) #2

Guys, I would like to "increase" my suspicion that this is a bug.

Since I am stuck with this error, I started trying different approaches to solve my problem (mainly refactoring my query). When I tried this query, the java.time.LocalDate error vanished!

USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'

MATCH
  (c:City),
  (p:Publication)<-[:is_present_on]-(m:Metric),
  (s:State)<-[:belongs_to]-(z:ElectoralZone),
  (e:Election)<-[:round_of]-(er:ElectionRound)<-[:runs_in]-(cand:Candidate)
WHERE
  c.tse_code      = toInteger(row.cod_municipio_tse)
  and p.auto_name = row.publication
  and m.auto_name = row.cod_metrica
  and s.acronym   = row.sigla_uf
  and z.code      = row.cod_zona_eleitoral
  and e.year      = date(row.ano_eleicao)
  and e.auto_name = row.cod_descricao_eleicao
  and er.number   = row.num_turno
  and cand.code   = row.sq_candidato
CREATE
  (afe:Measurement)
SET
  afe.value   = toInteger(row.total_votos),
  afe.unit    = 'votes',
  afe.date    = date(row.data_arquivo)
WITH
  m, cand, afe, c, z
MERGE
  (afe)-[:taken_from]->(c)
MERGE
  (afe)-[:taken_of]->(m)
MERGE
  (afe)-[:filtered_by]->(cand)
MERGE
  (afe)-[:filtered_by]->(z);

Now I am struggling with an OutOfMemoryError, but at least I know what that one is related to...


(Stefan Armbruster) #3

I suspect you're suffering from the well-known "Eager" problem, see https://markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/


(Guilherme Junqueira) #4

Stefan,

I profiled my query earlier and I found no 'Eager' on the plan I saw.

It is more likely that my memory constraints are responsible here...

But, the main question remains: why did the LocalDate error vanish with the code refactoring?


(Stefan Armbruster) #5

I've just tried your second statement, excluding PERIODIC COMMIT and prefixed with EXPLAIN. The query plan does indeed contain an Eager operator, so the whole CSV import will run in a single transaction, which is the cause of the OOM.
Split the action into multiple smaller statements whose plans show no Eager, and iterate over the large file multiple times.
Regarding the date error: I couldn't reproduce this.


(Guilherme Junqueira) #6

Stefan,

I was able to run the statement without the OOM error by decreasing the batch size of the periodic commit.

As I said, I checked here and I did not find the eager step when I profiled (not explained) the statement.

(If I am not mistaken, PROFILE actually runs the query, whereas EXPLAIN only shows the plan the planner expects to use, without executing it.)
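
To make the comparison concrete, here is a minimal sketch of both variants on a cut-down version of the import. It assumes the same CSV file and only the first MATCH pattern; the `LIMIT 1000` is an addition to keep the profiled run short. In either plan, the operator to look for is `Eager`.

```cypher
// EXPLAIN: the statement is planned but not executed; no rows are read or written.
EXPLAIN
LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'
MATCH (c:City {tse_code: toInteger(row.cod_municipio_tse)})
RETURN count(c);

// PROFILE: the statement is executed, and the plan is annotated with the
// actual rows and database hits per operator.
// LIMIT 1000 is an assumption added here so that only a sample of the file is processed.
PROFILE
LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'
WITH row LIMIT 1000
MATCH (c:City {tse_code: toInteger(row.cod_municipio_tse)})
RETURN count(c);
```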

Thank you for the website you sent (good material!!), but I would like to focus on the other error, if possible.

Best regards,


(Guilherme Junqueira) #7

Just some additional thoughts on why we have different outputs:

  1. I used profile, not explain.

  2. The query optimizer takes a lot of information into account when choosing how to run the query. I guess the presence of my indexes and some statistics play an important role here.

  3. Although the article you sent is really interesting, it is for an older version of Neo4j. I don't know if this Eager step is currently as common as it was before.

Regards,


(Michael Hunger) #8

It is not as common anymore, but it still shows up, and when it does it effectively disables periodic commit.

Perhaps the LocalDate issue came up b/c it got further in the data?

Do you have a value of (2014-01-01) in your data file (with the parentheses)?
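
One way to check this without opening the 7M-line file by hand might be a read-only pass like the sketch below. It assumes the two date-like columns shown in the sample rows (`ano_eleicao` and `data_arquivo`) are the only candidates.

```cypher
// Read-only scan: list any date-like field that contains a literal parenthesis.
LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'
WITH row
WHERE row.ano_eleicao CONTAINS '(' OR row.data_arquivo CONTAINS '('
RETURN row.ano_eleicao AS ano_eleicao,
       row.data_arquivo AS data_arquivo,
       count(*) AS occurrences
LIMIT 10;
```

An empty result would suggest the parentheses come from the error message formatting rather than from the data.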


(Guilherme Junqueira) #9

No Michael,

The only dates with 2014 are related to :Election in the Match part of the statement.

They were previously imported into Neo4j with zero problems. That's why I double-checked the MATCH patterns one by one.
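
For reference, the value in the error message is what `date()` produces for the year-only string in the `ano_eleicao` column, so it most likely originates from the `date(row.ano_eleicao)` comparison on :Election rather than from a literal in the file. A small sketch, assuming the :Election nodes were stored with `year` as a date as described above:

```cypher
// A year-only string resolves to 1 January of that year.
RETURN date('2014') AS election_year;   // 2014-01-01

// Look up the elections that the import would match on that value.
MATCH (e:Election)
WHERE e.year = date('2014')
RETURN e.auto_name, e.year;
```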

Thanks,


(Michael Hunger) #10

you can switch to

call apoc.periodic.iterate(
'LOAD CSV ... AS row RETURN row',
'MATCH ...', 
{batchSize:10000, iterateList:true});

that should get rid of the OOM
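
Filled in with the statement from post #2, the call might look roughly like the sketch below. It has not been run against this dataset. It assumes the APOC plugin is installed, and it uses plain CREATE for the relationships instead of MERGE, on the assumption that the Measurement node is new for each row and nothing can exist to merge with. As far as I know, USING PERIODIC COMMIT is not needed in either statement, because the procedure commits each batch itself.

```cypher
// First argument: streams the CSV rows.
// Second argument: runs once per row, committed in batches of batchSize.
// Double quotes are used inside the statements because the statements
// themselves are single-quoted strings.
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS
   FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
   FIELDTERMINATOR ";"
   RETURN row',
  'MATCH
     (c:City {tse_code: toInteger(row.cod_municipio_tse)}),
     (:Publication {auto_name: row.publication})<-[:is_present_on]-(m:Metric {auto_name: row.cod_metrica}),
     (:State {acronym: row.sigla_uf})<-[:belongs_to]-(z:ElectoralZone {code: row.cod_zona_eleitoral}),
     (:Election {year: date(row.ano_eleicao), auto_name: row.cod_descricao_eleicao})<-[:round_of]-
       (:ElectionRound {number: row.num_turno})<-[:runs_in]-(cand:Candidate {code: row.sq_candidato})
   CREATE
     (afe:Measurement {value: toInteger(row.total_votos), unit: "votes", date: date(row.data_arquivo)}),
     (afe)-[:taken_from]->(c),
     (afe)-[:taken_of]->(m),
     (afe)-[:filtered_by]->(cand),
     (afe)-[:filtered_by]->(z)',
  {batchSize: 10000, iterateList: true}
);
```

The procedure returns a summary row that includes the number of batches and any failed operations, which is worth checking after the run.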


(Guilherme Junqueira) #11

Hi @michael.hunger,

Can you please clarify the differences between the suggested apoc.periodic.iterate and the previous LOAD CSV with periodic commit?

By the way, should I keep the periodic commit in the inner statement or is it useless in this approach?

Thanks!