Connect Impala with Neo4j and a compatible Impala JDBC driver

Hey everyone,

I need some help importing data from Impala into Neo4j.
I have the Impala credentials and want to get my data into Neo4j.
I am using the following Impala JDBC driver:
ImpalaJDBC:2-5-45

And my Cypher is:

CALL apoc.load.jdbc("jdbc:impala://internal-edl-dev-ifgfgh-2.elb.amazbgff.com:21050/publish_test",
"SELECT * FROM student") YIELD row
RETURN row

But when I run the above Cypher, I get the following error:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure
 `apoc.load.jdbc`: Caused by: org.apache.thrift.transport.TTransportException

When I went through debug.log, I found this entry:

ERROR [o.n.k.i.p.Procedures] Cannot execute SQL kernelTransaction `SELECT * FROM htb_type_association`.
Error:
[Simba][ImpalaJDBCDriver](500605)  Error occurred while opening a session with the server. No additional detail from the server regarding this error is available. Please ensure that the driver configuration is compatible with the server configuration. This type of error can also occur when the server is too busy to handle the request. Please try again later. [Simba][ImpalaJDBCDriver](500605)  Error occurred while opening a session with the server. No additional detail from the server regarding this error is available. Please ensure that the driver configuration is compatible with the server configuration. This type of error can also occur when the server is too busy to handle the request. Please try again later.
java.sql.SQLException: [Simba][ImpalaJDBCDriver](500605)  Error occurred while opening a session with the server. No additional detail from the server regarding this error is available. Please ensure that the driver configuration is compatible with the server configuration. This type of error can also occur when the server is too busy to handle the request. Please try again later.
        at com.cloudera.hivecommon.api.HS2Client.openSession(Unknown Source)
        at com.cloudera.hivecommon.api.HS2Client.<init>(Unknown Source)
        at com.cloudera.hivecommon.api.HiveServer2ClientFactory.createClient(Unknown Source)
        at com.cloudera.hivecommon.core.HiveJDBCCommonConnection.establishConnection(Unknown Source)
        at com.cloudera.impala.core.ImpalaJDBCConnection.establishConnection(Unknown Source)
        at com.cloudera.jdbc.core.LoginTimeoutConnection.connect(Unknown Source)
        at com.cloudera.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
        at com.cloudera.jdbc.common.AbstractDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:270)
        at apoc.load.Jdbc.getConnection(Jdbc.java:53)
        at apoc.load.Jdbc.executeQuery(Jdbc.java:88)
        at apoc.load.Jdbc.jdbc(Jdbc.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Could you please let me know what I am missing here? Is there a compatibility issue?
My query runs fine in the Impala console.

Thanks in advance.

The stack trace indicates that the issue is on the Impala side and not in Neo4j/APOC. Take a look at https://www.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf - it documents how to pass optional properties in the connect string. One of those options enables more verbose logging. I suspect that this will help.
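
As a rough illustration of how optional properties are appended to the connect string, here is a small Python sketch. The helper name `impala_jdbc_url` and the host, schema, and log directory values are placeholders of mine; `LogLevel` and `LogPath` are the driver's logging properties as I understand them from the install guide, so double-check them against the PDF for your driver version.

```python
def impala_jdbc_url(host: str, port: int, schema: str, **props: object) -> str:
    """Build a jdbc:impala:// URL, appending optional driver properties as ;Key=Value pairs."""
    base = f"jdbc:impala://{host}:{port}/{schema}"
    # Each optional property is appended as ";Key=Value", in the order given.
    extras = "".join(f";{key}={value}" for key, value in props.items())
    return base + extras


# Placeholder host, schema, and log directory (assumptions, not values from a real setup).
url = impala_jdbc_url(
    "impala.example.internal", 21050, "publish_test",
    LogLevel=6,                  # assumed to be the most verbose level
    LogPath="/tmp/impala-jdbc",  # should be writable by the user running Neo4j
)
print(url)
```

The resulting string can be passed as the first argument to `apoc.load.jdbc`. If no log file shows up afterwards, one thing worth checking is whether the Neo4j process is allowed to write to the `LogPath` directory.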

My Impala is running on one cluster and Neo4j is running on a different one.
Could that be the problem?

To figure out the cause, I recommended increasing the logging.
My crystal ball is kind of dirty, so I cannot take a guess.

Hi Stefan,

I am kind of new to this.
Could you please help me with it
and tell me the steps?

Make sure that the Neo4j server can see the Impala server on that port (e.g. check with netcat).
And check your AWS security rules for ports.
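
If netcat is not installed on the Neo4j host, a minimal Python sketch of the same reachability check could look like this. The function name `can_reach` and the example host are hypothetical; 21050 is the port from the JDBC URLs earlier in this thread. It only tests that a TCP connection can be opened, not that authentication or the driver configuration is correct.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call, run from the machine that hosts Neo4j (placeholder host name):
# print(can_reach("impala.example.internal", 21050))
```

A result of `False` points at the network path (security groups, load balancer, firewall, DNS) rather than at APOC or the JDBC driver.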

Otherwise, please do what Stefan suggested and share the extra logs from Impala/the driver if they contain additional information.

@michael.hunger
Could you please tell me where I can get these extra logs?

Read the PDF document I mentioned above.

Hi @stefan.armbruster

I went through the doc and found the section about logging.
I am using the following query for this:

CALL apoc.load.jdbc("jdbc:impala://internal-edl-impala-qual-1334063328.us-west-2.elb.amazonaws.com:21050/my_Db;LogLevel=6;LogPath=/var/lib;",'SELECT * FROM my_Db.mt_table') YIELD row
RETURN row.my_column

After running this query, there is nothing in the given log path.

Perhaps you have to add authentication?

Hey @stefan.armbruster @michael.hunger
I have added Kerberos authentication on the server, and now I am running the following query:

CALL apoc.load.jdbc("jdbc:impala://internal-edl-impala-qual-1334063328.us-west-2.elb.amazonaws.com:21050/my_db;AuthMech=1;KrbRealm=HADOOP.xyz.COM;KrbHostFQDN=sdfsdhcvsdh;KrbServiceName=impala",'SELECT * FROM my_Db.my_Table') YIELD row
RETURN row.my_column

Now the error is different:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.load.jdbc`: Caused by: javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication 

Could you please let me know what is wrong here?

@michael.hunger could you please help me solve this?

Is that a Kerberos login? If so, that functionality has been removed from APOC as it was too involved.

Did you add the necessary Impala JDBC driver jars to your plugins directory?

Yes, it is a Kerberos login. Can we not connect Neo4j with a Kerberos login?
And yes, I have added all the dependent jars to the plugins folder.
I am using the ImpalaJDBC41 driver for that.

I guess you need a custom implementation for that. Let me see what I can do.
