Issue with Handling Contextual Textual Similarity in Neo4j for Nodes and Relationships

youssef · January 16, 2025, 9:07am

I am having trouble using Neo4j and the Graph Data Science (GDS) library to analyze contextual textual similarity. The goal is to group and cluster contextually related records based on their textual node properties and their relationships.
Problem Context:
Graph Schema:

Nodes:
- Records (e.g., rec-1), Titles (e.g., Database Index Optimization Tips), Links (e.g., https://stackoverflow.com/q/db-indexes), and Apps (e.g., Google Chrome).
- Nodes have textual properties such as name or url.
Relationships:
- Types include USES, HAS_TITLE, HAS_LINK, and NEXT.
- Relationships have weights (e.g., USES: 0.2, HAS_TITLE: 0.3, HAS_LINK: 0.5).

Objective:

Detect and group similar records into sequences using relationships and node properties.

Current Approach:

Graph Projection: Nodes and relationships are projected into an in-memory GDS graph.
Embedding: FastRP is used to create node embeddings.
Similarity: The kNN algorithm is applied to the embeddings to compute similarity.

Challenges:

Textual Data Support:

GDS algorithms (e.g., FastRP, kNN) operate only on numeric node properties, so textual properties cannot be used for similarity directly.
Example: Titles like "Database Index Optimization Tips" and "Database Index Discussion" are treated as dissimilar despite high contextual similarity.
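To make the gap concrete: even a simple lexical measure, computed outside GDS, finds substantial overlap between these two titles. The plain-Python sketch below (no Neo4j or GDS involved; the lowercase alphanumeric tokenizer is an assumption) computes token-set Jaccard and bag-of-words cosine similarity. Lexical overlap is only a proxy: related titles that share no words would still need a semantic embedding model.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets: |A & B| / |A | B|."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of the two term-frequency (bag-of-words) vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

t1 = "Database Index Optimization Tips"
t2 = "Database Index Discussion"
print(round(jaccard(t1, t2), 3))     # 0.4
print(round(bow_cosine(t1, t2), 3))  # 0.577
```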

Embedding Limitations:

FastRP embeddings are numeric vectors derived from the graph structure (and, optionally, from numeric node properties), so they do not capture the semantic similarity of textual properties.

Relationship Weights:

While weights (USES: 0.2, HAS_TITLE: 0.3, HAS_LINK: 0.5) are considered, they alone cannot bridge the gap caused by textual dissimilarity.

Questions:
1. Textual Property Handling:

Is there a way to directly incorporate textual similarity metrics (e.g., cosine similarity of node property embeddings) into Neo4j GDS workflows?
Are there plans to include native NLP support or semantic similarity in Neo4j for such use cases?

2. Workarounds:

How can external embeddings (e.g., from NLP models) be integrated into Neo4j, and can they be utilized effectively in GDS pipelines?
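A minimal sketch of one common workaround, assuming the text is embedded outside Neo4j and stored as a float-list property that GDS can project. The `embed` function here is a hypothetical stand-in (a hashed bag-of-words vector) so the example runs without a model; in practice it would be replaced by a real NLP embedding model. The property name `embedding`, the 64-dimension size, and the node `title-2` are assumptions for illustration.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical vector size; a real model defines its own dimension

def embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for an external NLP model: a hashed bag-of-words
    vector, L2-normalized. Replace with a real sentence-embedding model."""
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        # MD5 gives a bucket index that is stable across processes (unlike hash()).
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

# Cypher that stores one vector per node as a float-list property.
# It could be sent with the official neo4j Python driver, for example:
#   driver.execute_query(WRITE_EMBEDDINGS, rows=rows)
WRITE_EMBEDDINGS = """
UNWIND $rows AS row
MATCH (n {id: row.id})
SET n.embedding = row.embedding
"""

def embedding_rows(nodes: list[dict]) -> list[dict]:
    """Build {id, embedding} parameter rows for nodes that have a `name` property."""
    return [{"id": n["id"], "embedding": embed(n["name"])} for n in nodes if "name" in n]

nodes = [
    {"id": "title-1", "name": "Database Index Optimization Tips", "type": "Title"},
    {"id": "title-2", "name": "Database Index Discussion", "type": "Title"},  # hypothetical node
    {"id": "link-1", "url": "https://stackoverflow.com/q/db-indexes", "type": "Link"},
]
rows = embedding_rows(nodes)
print([r["id"] for r in rows])  # ['title-1', 'title-2']
```

Once stored, the `embedding` property can be included in the GDS projection as a node property and compared with kNN through its `nodeProperties` setting, which supports cosine similarity on float-array properties (worth checking against the kNN documentation for the installed GDS version). The unlabeled `MATCH (n {id: row.id})` is acceptable for about 200 nodes, but a larger graph would want a label and an index on `id`.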

3. Algorithm Adaptation:

Are there recommended custom similarity metrics that combine textual similarity with relationship-based weights?
Can existing algorithms (e.g., kNN, Louvain) be configured to handle textual and relationship data simultaneously?
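One possible custom metric (a hand-rolled blend, not a built-in GDS feature) combines the textual similarity of two records' titles with a weighted Jaccard overlap of their neighbors, using the relationship weights from the post (USES: 0.2, HAS_TITLE: 0.3, HAS_LINK: 0.5) as neighbor weights. The blend factor `alpha`, the record `rec-2`, and its title node `title-2` are hypothetical, and the bag-of-words cosine only stands in for a semantic text model.

```python
import math
import re
from collections import Counter

def text_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for a semantic text model."""
    ca = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    cb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def weighted_jaccard(x: dict[str, float], y: dict[str, float]) -> float:
    """Weighted Jaccard: sum of per-neighbor minimum weights over sum of maximum weights."""
    keys = x.keys() | y.keys()
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0

def combined_similarity(r1: dict, r2: dict, alpha: float = 0.5) -> float:
    """alpha * title similarity + (1 - alpha) * relationship-weighted neighbor overlap."""
    text = text_cosine(r1["title"], r2["title"])
    structure = weighted_jaccard(r1["neighbors"], r2["neighbors"])
    return alpha * text + (1 - alpha) * structure

# rec-1 follows the post's example; rec-2 and title-2 are hypothetical.
rec1 = {"title": "Database Index Optimization Tips",
        "neighbors": {"title-1": 0.3, "link-1": 0.5, "app-1": 0.2}}
rec2 = {"title": "Database Index Discussion",
        "neighbors": {"title-2": 0.3, "link-1": 0.5, "app-1": 0.2}}
print(round(combined_similarity(rec1, rec2), 3))  # 0.558
```

Here the text term contributes about 0.289 and the structural term about 0.269. Pairwise scores like these could be written back as weighted relationships (a hypothetical `SIMILAR` type) and clustered with a community detection algorithm such as Louvain through its `relationshipWeightProperty` setting.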

Example Graph Data:
Nodes:

```json
[
  {"id": "rec-1", "name": "Design Document Overview", "type": "Record"},
  {"id": "title-1", "name": "Database Index Optimization Tips", "type": "Title"},
  {"id": "link-1", "url": "https://stackoverflow.com/q/db-indexes", "type": "Link"},
  {"id": "app-1", "name": "Google Chrome", "type": "App"}
]
```

Relationships:

```json
[
  {"source": "rec-1", "target": "title-1", "type": "HAS_TITLE", "weight": 0.3},
  {"source": "rec-1", "target": "link-1", "type": "HAS_LINK", "weight": 0.5},
  {"source": "rec-1", "target": "app-1", "type": "USES", "weight": 0.2}
]
```
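A minimal sketch of turning the example data into per-record features: each Record gets its title text (looked up through HAS_TITLE) and a map from neighbor id to relationship weight. Skipping relationships that carry no weight (for example NEXT, if it is unweighted) is an assumption of this sketch, not something the post specifies.

```python
import json

# Example data from the post, as JSON strings.
NODES_JSON = """[
  {"id": "rec-1", "name": "Design Document Overview", "type": "Record"},
  {"id": "title-1", "name": "Database Index Optimization Tips", "type": "Title"},
  {"id": "link-1", "url": "https://stackoverflow.com/q/db-indexes", "type": "Link"},
  {"id": "app-1", "name": "Google Chrome", "type": "App"}
]"""
RELS_JSON = """[
  {"source": "rec-1", "target": "title-1", "type": "HAS_TITLE", "weight": 0.3},
  {"source": "rec-1", "target": "link-1", "type": "HAS_LINK", "weight": 0.5},
  {"source": "rec-1", "target": "app-1", "type": "USES", "weight": 0.2}
]"""

def record_features(nodes: list[dict], rels: list[dict]) -> dict[str, dict]:
    """For each Record node, collect its title text (via HAS_TITLE) and a
    {neighbor id: relationship weight} map over its weighted outgoing relationships."""
    by_id = {n["id"]: n for n in nodes}
    features = {n["id"]: {"title": None, "neighbors": {}} for n in nodes if n["type"] == "Record"}
    for r in rels:
        rec = features.get(r["source"])
        weight = r.get("weight")
        if rec is None or weight is None:
            # Source is not a Record, or the relationship is unweighted: skipped here.
            continue
        rec["neighbors"][r["target"]] = weight
        if r["type"] == "HAS_TITLE":
            rec["title"] = by_id[r["target"]].get("name")
    return features

features = record_features(json.loads(NODES_JSON), json.loads(RELS_JSON))
print(features["rec-1"])
# {'title': 'Database Index Optimization Tips', 'neighbors': {'title-1': 0.3, 'link-1': 0.5, 'app-1': 0.2}}
```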

Environment:

Neo4j Version: 5.x
GDS Version: Latest
Data Volume: 200 nodes, 600 relationships
Use Case: Grouping and similarity analysis for nodes with textual and relational data.

Goal:
To group contextually related records into sequences and attach these sequences to tasks based on their similarity. The similarity should consider both textual properties and relationship types/weights.
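Grouping can be sketched as: keep only record pairs whose similarity reaches a threshold, then take the connected components of the resulting graph with union-find. This is roughly what running Weakly Connected Components in GDS on kNN results filtered with `similarityCutoff` would do, written here in plain Python; the record ids, scores, and the 0.5 threshold are hypothetical.

```python
def group_records(ids: list[str], scores: dict[tuple[str, str], float],
                  threshold: float) -> list[list[str]]:
    """Connected components of the graph whose edges are pairs scoring >= threshold."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

ids = ["rec-1", "rec-2", "rec-3", "rec-4"]  # hypothetical records
scores = {("rec-1", "rec-2"): 0.56, ("rec-2", "rec-3"): 0.61, ("rec-3", "rec-4"): 0.12}  # hypothetical
print(group_records(ids, scores, threshold=0.5))  # [['rec-1', 'rec-2', 'rec-3'], ['rec-4']]
```

Ordering the records inside each group (for example along NEXT relationships or by timestamp) and attaching groups to tasks are separate steps that this sketch does not attempt.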
Request:

Guidance on how to best handle this scenario in Neo4j.
Recommendations for incorporating textual similarity and relationship data effectively within GDS workflows.
Suggestions for enhancing existing workflows to include semantic text processing.

Thank you for your assistance!

florentin_dorre · February 5, 2025, 8:26am

This was answered in GitHub issue #343 of neo4j/graph-data-science, "Issue with Handling Contextual Textual Similarity in Neo4j for Nodes and Relationships".

Related topics:
- Compare lots of nodes of the same type by a List<String> property with GDS Projection (Graph Algorithms/Graph Data Science; 7 replies, 409 views; August 15, 2023)
- Node similarity with relationship attributes (Neo4j Graph Platform; 3 replies, 273 views; January 19, 2024)
- Propagating embeddings in Neo4j GDS (Graph Algorithms/Graph Data Science; 0 replies, 327 views; July 6, 2023)
- Graph Data Science "Node Similarity" algorithm documentation is partially unclear (Neo4j Graph Platform; 0 replies, 340 views; December 13, 2020)
- Trying to write a Cypher query for node similarity to design a movie recommendation system (Graph Algorithms/Graph Data Science; tags: cypher, gds; 4 replies, 352 views; March 27, 2024)