Hello, Neo4j community!
I am working on an NLP project in which I want to use Neo4j. I am dealing with several hundred long text documents from a project database. I am using Python to process the text and metadata, and to extract and append entities with spaCy in my pipeline.
One example project looks like this.
Here the turquoise node is the Project ID, the purple node contains the project document and the orange node displays the text of one extracted entity from the document.
Now, let's assume I query for several projects and want to return a network map of the entities. For example, I query:
MATCH (n:PIMS_ID)-->(b:project_level)-[r:has_entity]->(f)
WHERE n.name IN ['5844', '5696', '5438', '5437', '5413', '3298']
RETURN b, r, f
This returns the following output:
I need help with the following:
- Draw a "co-occurrence" relationship between entities that appear in the same document, also taking into account when the same entity appears in multiple documents.
- Weight the entity nodes by the number of incoming "co-occurrences" (ideally, the more co-occurrences, the larger the node).
- Weight the "co-occurrence" relationships (ideally, the more co-occurrences between two nodes, the thicker the line).
- Return only the entity network map, without the text_document source nodes (perhaps labelling the entities with the Project_ID so people can see which project they come from).
- Any suggestions on the graph structure? This is my first project using Neo4j, so feedback is appreciated.
I hope I can get some insights. Certainly not expecting to have all my questions answered, but anything helps!
You're making a meta-graph of connectedness. There are many ways to do this.
@alicia.frame touched on this at NODES 2019 in "Graph Embedding and Machine Learning".
There's also a simpler but fairly effective guide to knowledge graphs, "Knowledge Graph Cancer Modeling", which is very similar to the problem you're trying to solve.
The simplest approach that does what you're asking, though not the best one (I don't think that's really what you want to do; you need more research, and probably GDS and Alicia's help), is:
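MATCH (e:Entity) SET e.weight = 0;
// The pattern matches every pair twice, as (e, e2) and (e2, e);
// the id filter keeps one ordering so each co-occurrence is counted once
MATCH (e:Entity)<-[:PROJECT_TERMS]-(:Project)-[:PROJECT_TERMS]->(e2:Entity)
WHERE id(e) < id(e2)
MERGE (e)-[r:META]-(e2)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1
SET e.weight = e.weight + 1
SET e2.weight = e2.weight + 1;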
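To see what this query counts, here is a plain-Python sketch of the same logic. It assumes each project is given as a set of entity names (a hypothetical stand-in for the Project-[:PROJECT_TERMS]->Entity data; the entity names are made up). The Cypher pattern matches every pair of entities in both orders, (A, B) and (B, A), and because the undirected MERGE finds the same META relationship both times, each co-occurrence is counted twice unless a filter such as id(e) < id(e2) keeps only one ordering. The sketch computes both variants so the factor of two is visible.

```python
from collections import Counter
from itertools import combinations, permutations


def meta_graph(projects, ordered=False):
    """Return (edge_weights, node_weights) for entities that share a project.

    projects: dict mapping a project id to the set of entity names it mentions
    (hypothetical input shape). If ordered is True, every pair is visited in
    both directions, mimicking the Cypher pattern without an id(e) < id(e2) filter.
    """
    edge_weights = Counter()
    node_weights = Counter()
    for entities in projects.values():
        names = sorted(entities)
        pairs = permutations(names, 2) if ordered else combinations(names, 2)
        for a, b in pairs:
            # META is undirected, so key the edge on the sorted pair
            edge_weights[tuple(sorted((a, b)))] += 1
            node_weights[a] += 1
            node_weights[b] += 1
    return edge_weights, node_weights


# Project ids from the question; entity names are invented for illustration
projects = {
    "5844": {"solar", "grid", "finance"},
    "5696": {"solar", "grid"},
}
edges, nodes = meta_graph(projects)
print(edges[("grid", "solar")])   # 2: the pair co-occurs in two projects
print(nodes["solar"])             # 3: paired with grid twice and finance once
edges2, nodes2 = meta_graph(projects, ordered=True)
print(edges2[("grid", "solar")])  # 4: every co-occurrence counted twice
```

Either way the relative weights are the same; the filter just makes the numbers equal to the actual co-occurrence counts.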
That will give you the meta-graph you've asked for, which you can then retrieve via:
MATCH p=(:Entity)-[:META]-(:Entity)
RETURN p
However, styling nodes and relationships based on their property values isn't built into Neo4j Browser, so you'll have to build something custom for that.
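Whatever custom front end you pick, a common first step is to rescale the stored weights into a visual range (node radius, line width). A minimal sketch, assuming the Entity.weight or META.weight values have already been fetched into a Python dict; the 10 to 60 range is an arbitrary choice, not a Neo4j Browser or library setting.

```python
def scale_weights(weights, min_size=10.0, max_size=60.0):
    """Linearly map each weight onto [min_size, max_size].

    weights: dict of id -> numeric weight (e.g. Entity.weight or META.weight).
    If all weights are equal, every item gets the midpoint size.
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        mid = (min_size + max_size) / 2
        return {key: mid for key in weights}
    span = max_size - min_size
    return {key: min_size + (w - lo) / (hi - lo) * span for key, w in weights.items()}


# Hypothetical node weights, e.g. read back from e.weight
node_sizes = scale_weights({"solar": 3, "grid": 3, "finance": 2})
# finance (the smallest weight) maps to 10.0, solar and grid to 60.0
```

The same function works for relationship thickness by passing META weights and a smaller range. Linear scaling is the simplest option; if node area rather than radius should track the weight, scaling the square root of the weight is a common alternative.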
@jonas-nothnagel you might want to check out the nodeSimilarity algorithm in the GDS library: Node Similarity - Neo4j Graph Data Science.
Given source and target nodes (entity and document), you can calculate similarity based on neighboring nodes; you can even use weights (e.g. the number of times a term occurs in a given document) in your similarity calculation.
Node Similarity creates new relationships in your graph between pairs of nodes whose similarity is above a threshold, and adds a weight property indicating how similar the documents are. I think if you had that, people could easily query the results and interact with your conclusions. You wouldn't necessarily want to delete the text_document nodes; instead, you'd be adding new information to the graph.
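To make the idea concrete, here is a plain-Python sketch of a Jaccard-based similarity between documents that share entities: the number of shared entities divided by the number of entities in either document, or, in the weighted form, the sum of the smaller counts divided by the sum of the larger counts. GDS Node Similarity uses Jaccard by default and offers settings such as topK and similarityCutoff; the cutoff and top_k parameters below only loosely imitate those and are not the GDS API. The documents and entity counts are hypothetical.

```python
from itertools import combinations


def jaccard(doc_a, doc_b, weighted=False):
    """Jaccard similarity of two documents given as {entity: count} dicts.

    Unweighted: |shared entities| / |entities in either document|.
    Weighted: sum of min(count) / sum of max(count) over all entities,
    with a missing entity counting as 0.
    """
    keys = set(doc_a) | set(doc_b)
    if not keys:
        return 0.0
    if not weighted:
        return len(set(doc_a) & set(doc_b)) / len(keys)
    num = sum(min(doc_a.get(k, 0), doc_b.get(k, 0)) for k in keys)
    den = sum(max(doc_a.get(k, 0), doc_b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def similar_documents(docs, cutoff=0.1, top_k=10, weighted=False):
    """Return (doc, other, score) triples with score >= cutoff, at most top_k per doc.

    Each triple corresponds to one similarity relationship you might write
    back to the graph.
    """
    neighbours = {doc: [] for doc in docs}
    for a, b in combinations(docs, 2):
        score = jaccard(docs[a], docs[b], weighted)
        if score >= cutoff:
            neighbours[a].append((score, b))
            neighbours[b].append((score, a))
    results = []
    for doc, scored in neighbours.items():
        # Highest scores first (ties fall back to comparing document ids)
        for score, other in sorted(scored, reverse=True)[:top_k]:
            results.append((doc, other, score))
    return results


# Hypothetical documents: entity -> number of mentions
docs = {
    "doc1": {"solar": 3, "grid": 1},
    "doc2": {"solar": 1, "grid": 1, "finance": 2},
    "doc3": {"finance": 1},
}
print(jaccard(docs["doc1"], docs["doc2"]))                 # 2 of 3 entities shared, i.e. 2/3
print(jaccard(docs["doc1"], docs["doc2"], weighted=True))  # (1 + 1 + 0) / (3 + 1 + 2), i.e. 1/3
for row in similar_documents(docs, cutoff=0.2):
    print(row)
```

Once scores like these are stored as relationships between documents, a question such as "which projects are most similar to project 5844?" becomes a direct query instead of something read off the entity map.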
Thank you @tony.chiboucas!
I marked your answer as the solution since it definitely helped me move further in the process! It does what I was looking for.
I started using Neovis.js to display the meta-graph with weighted nodes and edges, and it seems to work fine if set up properly!
After doing some research, I also agree that using GDS would be super interesting. I will run some experiments and come back to @alicia.frame's answer once I get stuck!
Thank you both!
Hi @alicia.frame. I wanted to try your suggestion and just wanted to check if I understood you correctly.
Assuming I have a source node (document) and target nodes (entities) that are connected with the relationship:
(document)-[:has_entity]->(entity)
You're suggesting I calculate the similarity between documents based on their neighbouring nodes, in this case all the entities in each document?
What would I gain by having this information in my graph? Could you offer some additional guidance on how to produce this (perhaps with an example line of code) and elaborate on what I could gain from it?
Thank you so much again!