Neo4J being "Semi" Structured, but Cypher expecting structured definitions

Hi All, Neo4j is supposed to be very dynamic, accommodating, semi-structured... implying that for one node type, one record/instance should be able to have, say, the first 5 properties and a 2nd should be able to have, say, 8....

With the below type of Kafka Connect Sink, it sort of breaks that...
Was thinking one option is to keep the "compulsory" properties at, say, the root level and then have a value property that's a doc that's dynamic.

Curious to hear how others have worked around this, made it work, made it dynamic enough, but still able to handle edges. I sort of assume an edge will only be shown if the values required on both sides are present between the nodes; if one node does not have the source value then it's simply ignored when the edges/links present on the node are shown.
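A rough sketch of that "compulsory properties at the root plus a dynamic bag" idea in sink Cypher, assuming the message keeps the mandatory fields at the root and an optional map of primitive extras under a hypothetical attributes key (the names here are illustrative only, not from my actual stream):

// Sketch only: mandatory fields at the root, optional extras in event.attributes.
MERGE (a:AccountHolder {accountEntityId: event.accountEntityId})
  ON CREATE SET a.tenantId  = event.tenantId,
                a.accountId = event.accountId,
                a.fullName  = event.fullName
// Fold in whatever extra properties this particular record happens to carry;
// coalesce() guards against records that have no attributes map at all.
SET a += coalesce(event.attributes, {})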

... as for the actual sink config below, something is still broken, can't get it working... so my stream for now is broken.

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "neo4j-accountHolder-node-sink",
    "config": {
        "connector.class": "org.neo4j.connectors.kafka.sink.Neo4jConnector",
        "topics": "ob_account_holders,ib_account_holders",
        "neo4j.server.uri": "bolt://neo4j:7687",
        "neo4j.authentication.basic.username": "neo4j",
        "neo4j.authentication.basic.password": "dbpassword",
        "neo4j.topic.cypher.ob_account_holders": "CREATE (a:AccountHolder {accountEntityId: event.accountEntityId, bicfi: event.bicfi, accountId: event.accountId, tenantId: event.tenantId, accountAgentId: event.accountAgentId, fullName: event.fullName}) ON DUPLICATE KEY IGNORE",
        "neo4j.topic.cypher.ib_account_holders": "CREATE (a:AccountHolder {accountEntityId: event.accountEntityId, bicfi: event.bicfi, accountId: event.accountId, tenantId: event.tenantId, accountAgentId: event.accountAgentId, fullName: event.fullName}) ON DUPLICATE KEY IGNORE",    
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "neo4j.batch.timeout.msecs": 5000,
        "neo4j.retry.backoff.msecs": 3000,
        "neo4j.retry.max.attemps": "5",
        "tasks.max": "2",
        "neo4j.batch.size": 1000,
        "value.converter.schemas.enable": false
    }    
}'

Hi - just curious if you ran this through a GenAI engine like Claude, or if you got this to work? AI is not the answer for everything of course, but I still see no responses after 5 days and I wanted to ensure your needs were met. While I don't want to just cut and paste, ClaudeAI called out three main issues that you may want to look at. If the problem still persists, let us know.

It's sort of me exploring; when getting stuck I see what Claude or Gemini suggests.

Been making major progress on the side. Still stuck with bits I'm not sure about, as I can see "issues" with the current linking.

Will share more as I go.

thanks for checking back.

G

One of the solutions that was offered is depicted below.

MERGE (n:Person {idNumber: event.idNumber})
ON CREATE SET n = {
    idNumber: event.idNumber,
    accountEntityId: event.accountEntityId,
    accountEntityType: event.accountEntityType,
    tenantId: event.tenantId,
    fullName: event.fullName
}
ON MATCH SET n += {
    address: coalesce(event.address, n.address),
    dob: coalesce(event.dob, n.dob),
    reg_id: coalesce(event.reg_id, n.reg_id),
    home_phone: coalesce(event.home_phone, n.home_phone),
    work_phone: coalesce(event.work_phone, n.work_phone),
    mobile_phone: coalesce(event.mobile_phone, n.mobile_phone)
}
RETURN n;
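For the original AccountHolder topics, the same pattern would presumably look something like the line below, collapsed onto one line because the neo4j.topic.cypher.* value has to be a JSON string (the ON MATCH part is just a guess at which fields might change between events):

MERGE (a:AccountHolder {accountEntityId: event.accountEntityId}) ON CREATE SET a += {bicfi: event.bicfi, accountId: event.accountId, tenantId: event.tenantId, accountAgentId: event.accountAgentId, fullName: event.fullName} ON MATCH SET a += {fullName: coalesce(event.fullName, a.fullName), accountAgentId: coalesce(event.accountAgentId, a.accountAgentId)}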

A pleasure - so glad you're enjoying your graph exploration. Post some more if you need additional help!

... need to add transactions now...
this is where things are going to get interesting...

for now I've reverted to doing everything using Cypher; once I've got that defined I will then refactor for the Kafka Connect sink process where needed.
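Roughly what I have in mind for a transaction, written already in the event.* form the sink binds so it can be dropped into a neo4j.topic.cypher.* property later (the Transaction label, the property names and the relationship types below are my own working assumptions, nothing final). It also only creates the edges, and the transaction node, when both account holders already exist, which matches the "ignore the edge if one side is missing" idea from my first post:

// Working sketch: a transaction node plus edges to the two account holders.
// Because both MATCH clauses must succeed, nothing is created when either
// side of the payment is not in the graph yet.
MATCH (src:AccountHolder {accountEntityId: event.sourceEntityId})
MATCH (dst:AccountHolder {accountEntityId: event.destinationEntityId})
MERGE (t:Transaction {transactionId: event.transactionId})
  ON CREATE SET t.amount    = event.amount,
                t.currency  = event.currency,
                t.eventTime = event.eventTime
MERGE (src)-[:SENT]->(t)
MERGE (t)-[:RECEIVED_BY]->(dst)
RETURN t;

If a transaction should still land even when one account holder hasn't arrived yet, those MATCHes could be swapped for MERGEs on the id alone and the rest back-filled later.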

I have a different take on Neo4j being schema-less. I don't think it means each node can have any properties. I look at each node label as representing a domain entity that has defined properties. Some of the properties will be required and some may not be, but the entity is described by its values for these properties. Schema-less just means it's easier to add/subtract properties as the entity's definition evolves.

What transactions are you adding? I assume the Cypher statement for each topic will be executed in its own transaction automatically.

totally agree....

The issue comes in with the Cypher when you create a Kafka Connect sink, and with what it expects.

For direct financial transactions you pretty much have a very defined set of fields.

It's when you get to accounts, corporates, persons, addresses etc. that it gets very "undefined" / "dynamic", and that is what needs to be handled,

both in what's posted onto the topic, and then in what's read and sinked, which is where it gets "interesting".

G

nudge to PMC, as you are a "staff" member.. hehehe

Got to say it would have been great if the web console/interface allowed the user to either hit + and add a new node with fields/properties (either as a new node attached to an existing node, or as a new node type) and add the detail, or drag a line between 2 nodes and create an edge... with properties...

For both cases, always allow the user to click on a "show cypher" button that then shows the Cypher code that implements what has been defined graphically...
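e.g. dragging a line from a Person node to an Address node could pop up something like this under "show cypher" (labels, properties and values made up here):

MATCH (p:Person  {idNumber: '123456'})
MATCH (a:Address {addressId: 'addr-001'})
MERGE (p)-[r:LIVES_AT {since: '2020-01-01'}]->(a)
RETURN p, r, a;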

G

Will privately share/msg/DM the git repo link of what I'm busy with...

G

@georgelza I can share this feedback with our product team. Is there anything else you’d like to add?

Hi there

Let me get more familiar with the tools and will do.

Oh yes... the downloadable console allows for local and remote; if remote, you need Enterprise to get access to the Bloom module. I understand the intent here, but for a local dev environment, where Neo4j Desktop (in my case installed on my Mac) and the DB in a container are both on localhost, the desktop tool still sees the DB as remote, which then requires the Enterprise license. It's not realising it's a dev environment and is actually both local, so it should allow for the local dev allowance/access entitlement.

G