I'm loading data with the bulk importer (`neo4j-admin import`) and specifying an ID group name on the `:ID` field, so that an ID from one node type can't collide with the same ID from another node type. I loaded two files for the same node type, and they have different headers and different numbers of columns (each file has its own header line; there is no separate header file). The load seems to assume that both files share the same header because they use the same node/group name. Is this the expected behavior, and how do I get around it? I want to be able to use different headers for different files of the same node type.
Please provide the header lines and a few data lines of the files you're trying to import, as well as the exact `neo4j-admin import` call.
As you can see, these address part files have different headers. The error is at the bottom: the importer seems to keep applying the header of the first file even though part-address-6 has its own header. Thanks.
part-address-1
ADD_ID:ID(ADDRESS-ID) | STREET | STREET_2 | CITY | STATE | ZIP | PROVINCE | POSTAL_CODE | COUNTRY | :LABEL | UUID |
---|---|---|---|---|---|---|---|---|---|---|
60d4fa7d0e898b594f3a8e9863a00e386374da5aa499df8ca4e7240b26bf70c9 | 251 HWBZUFQEQ WAY | null | RXPSRKNHA | GT | 10879 | null | null | US | ADDRESS | 0816af79b147ccce524fea360bdcace70abbfe046a3b8de7d591766d09a66d0f |
116780b587174407a632103649492e92d5821b1de28f7b81a5ce1cd00371c081 | 569 OQAMUBYPW DRIVE | null | BPLQQRMMG | null | null | NDJGMI | T8U 1I9 | null | ADDRESS | ee89d9b33764fdb71de80145213f70d2774a2054171424d8a9bdd1a21c8a4dcc |
22d0c052db51fb080699e0ad93699f7911a4b863be81e9b23c2701bdaef19bf8 | 889 RUDXJWAYS WAY | null | PEPUZGQEV | ZR | 14490 | null | null | US | ADDRESS | 48844f134ffb1526ce2ee60f207035adbdd63a8124eae0dfa0506dcf6538c98e |
part-address-2
ADD_ID:ID(ADDRESS-ID) | STREET | CITY | STATE | ZIP | PROVINCE | POSTAL_CODE | COUNTRY | :LABEL | UUID |
---|---|---|---|---|---|---|---|---|---|
3bdb0ff92c470cba17096f925131b27960d136706848098834440abb180bb8d8 | 015 JJVGXPLCY BOULEVARD | null | YN | 98925 | null | null | US | ADDRESS | 213983c2455bb1f397cac28d00f6470cb01453deaec2385032c86469c30f34b8 |
4aa235f4970a31b3029636182d52d76c8534658a9a0fca6b0c16b28d7bc8ee5d | 443 XPRPOLWXY AVENUE | QHWSOEWRH | null | null | MUNKMZ | F9I 2U2 | null | ADDRESS | 39b2835f66c55271f0a8b4cc4e1cde72e373e065ebd32185cabfb5af2d55a248 |
c037161b7fa217b5e34dc4fcdf29574e4937193d07c8bb2dbf5663be907ff985 | 772 ZRCECZXME BOULEVARD | QMGZVMWIA | JQ | 21742 | null | null | US | ADDRESS | ee921c9bfedb00d19f22fa54719ce69addf4d8079b3b38100fedd7564f22d7eb |
part-address-6
ADD_ID:ID(ADDRESS-ID) | STREET_NAME | STREET_NBR | BLDG_RM | CITY | STATE | ZIP | PROVINCE | POSTAL_CODE | COUNTRY | :LABEL | UUID |
---|---|---|---|---|---|---|---|---|---|---|---|
66fd10ebad515f2c8e815bfc2d7f672d6a89aff95c42200a468894f74c2a2a91 | 872 IAFTAMLOB SQUARE | 14577 | null | ESVYNXYEK | QX | 15380 | LXVSYE | M2O 4N0 | US | ADDRESS | 160bee756cd2ca1ff86c74b01e3bbcd7f11fca1ffb076ec300202cb5ba8a4f18 |
060bb2f8ccaacb24beaff893f25a2bca66c9eae9965b0a9f40a1ea8c9157abf9 | 863 ZFYVAWNAR STREET | 72327 | null | ONMKQGAGS | ON | 69944 | INQLGO | D4O 4W3 | US | ADDRESS | ac8f99fbd49082583c032a2abe4cf53787abb8f9d5b7dbe607845eeb17b24464 |
d6ca0edc3b3144ca2b70d41c2a3590da6e8d7931e76a818ffd66d9345b3cd3ac | 438 FVJQIFECW BOULEVARD | 13000 | null | FADNHWOTZ | BL | 64916 | GCDJUA | N2E 6X3 | US | ADDRESS | 139eb4866d53d020b7c933849ce19865f281a9381805bbab36da29ab9eb5adc2 |
Import command (bash script):

```bash
bin/neo4j-admin import \
  --nodes "$local_data_path/nodes/part-address.*.csv" \
  --ignore-duplicate-nodes --ignore-missing-nodes \
  --delimiter "TAB" --database="graph.db"
```
Here's the error:
```
for header: [ADD_ID:ID(ADDRESS-ID), STREET:string, STREET_2:string, CITY:string, STATE:string, ZIP:string, PROVINCE:string, POSTAL_CODE:string, COUNTRY:string, :LABEL, UUID:string]
raw field value: UUID
original error: Extra column not present in header on line 1 in /data/nodes/part-address-6.csv with value UUID
    at org.neo4j.unsafe.impl.batchimport.input.BadCollector$ExtraColumnsProblemReporter.exception(BadCollector.java:272)
    at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collect(BadCollector.java:140)
    at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectExtraColumns(BadCollector.java:106)
    at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:198)
    at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
    at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
    at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
You could strip the header line from each data file and supply separate header files instead (one per column layout, which also lets you adjust the columns).
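As I understand the 3.x import tool, all files matched by one `--nodes` argument are read as one logical file whose header is the first line of the first file, so the header line of `part-address-6.csv` gets parsed as a data row (hence the `line 1` in the error). Each separate `--nodes` argument gets its own header. A minimal sketch of the split, assuming the file names from the post; `split_header` is a hypothetical helper, not part of Neo4j:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a CSV that has an inline header into
# <name>-header.csv (the first line) and <name>-data.csv (all other lines).
split_header() {
  local csv="$1"
  local base="${csv%.csv}"   # strip the .csv suffix
  head -n 1 "$csv" > "${base}-header.csv"
  tail -n +2 "$csv" > "${base}-data.csv"
}
```

You would then give each header layout its own source, e.g. `--nodes "part-address-1-header.csv,part-address-1-data.csv" --nodes "part-address-6-header.csv,part-address-6-data.csv"` (paths shortened). The `ADDRESS-ID` group name in each header still refers to the same ID space. If a file's own header is already what you want, passing that file alone on its own `--nodes` argument has the same effect without splitting.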
You can also put `:IGNORE` in a header to skip specific columns, and use `--ignore-extra-columns` to drop data columns that the header doesn't cover:
```
--ignore-extra-columns <true/false>
    Whether or not to ignore extra columns in the data not specified by the header.
    Skipped columns will be logged, containing at most number of entities specified
    by bad-tolerance, unless otherwise specified by skip-bad-entries-logging option.
    Default value: false
```
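For the `:IGNORE` route, here is a sketch that writes a separate tab-separated header file for `part-address-6` which skips the `BLDG_RM` column. It assumes the data lines have already been moved to a header-less file (e.g. a hypothetical `part-address-6-data.csv`); the header still needs exactly one entry per data column, in the same order:

```shell
#!/usr/bin/env bash
# Hypothetical example: header for part-address-6 with BLDG_RM (4th column)
# replaced by :IGNORE so the importer skips it. Tab-separated to match
# --delimiter "TAB"; the ID column keeps the ADDRESS-ID group.
printf '%s\t' 'ADD_ID:ID(ADDRESS-ID)' STREET_NAME STREET_NBR ':IGNORE' \
  CITY STATE ZIP PROVINCE POSTAL_CODE COUNTRY ':LABEL' > part-address-6-header.csv
# Last column ends the line with a newline instead of a tab.
printf '%s\n' UUID >> part-address-6-header.csv
```

It would then be passed as `--nodes "part-address-6-header.csv,part-address-6-data.csv"`, adding `--ignore-extra-columns true` only if some data lines carry more columns than the header lists.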
Otherwise you'd need to preprocess the CSVs or import via Cypher (e.g. `LOAD CSV`).
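As one way to preprocess, the sketch below rewrites a tab-separated file onto a fixed target column list, matching columns by header name and leaving missing ones empty, so that every file ends up with the same header. `normalize_columns` and the target list are hypothetical. Columns that aren't in the target list (such as `STREET_NAME` in part-address-6) are dropped, so you would need to rename or map those yourself first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalize_columns "<col1,col2,...>" file.csv
# Prints file.csv (tab-separated, header on line 1) with its columns
# reordered to the target list; target columns absent from the file are empty.
normalize_columns() {
  local target="$1" input="$2"
  awk -F '\t' -v OFS='\t' -v target="$target" '
    BEGIN { n = split(target, cols, ",") }
    NR == 1 {
      # Remember the position of each header name in this file.
      for (i = 1; i <= NF; i++) pos[$i] = i
      # Emit the unified header.
      line = cols[1]
      for (j = 2; j <= n; j++) line = line OFS cols[j]
      print line
      next
    }
    {
      out = ""
      for (j = 1; j <= n; j++) {
        v = (cols[j] in pos) ? $(pos[cols[j]]) : ""
        out = (j == 1) ? v : out OFS v
      }
      print out
    }
  ' "$input"
}
```

For example, `normalize_columns 'ADD_ID:ID(ADDRESS-ID),STREET,STREET_2,CITY,STATE,ZIP,PROVINCE,POSTAL_CODE,COUNTRY,:LABEL,UUID' part-address-2.csv > part-address-2-normalized.csv` would give part-address-2 the same layout as part-address-1, with an empty `STREET_2` column.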