Hi all,
Brand new here, and pretty new to graph networks in general - excited to be a part of this great community! I was hoping to get a sense check on a project I'm working on - there are a lot of moving parts and I'd just like to know if I've got the right approach.
Background
I have a website support widget that asks users a series of questions. Each time the user clicks a response, it takes them to another card, in a tree-like fashion, until they reach a leaf node containing a solution to their problem.
Objective
I'm trying to cluster the routes that users can take through the tool to find similar paths. There are many instances of the tool (on different websites), and many of the issues to be resolved are similar across those websites (such as wanting a refund).
Process
I've mapped every node onto a graph, and so each instance of the tool will be a distinct set of nodes and edges in the same graph, but unconnected to other instances of the tool.
- For the title and description/explanation of each node, I've run the text through GPT and attached the resulting embeddings as properties on the node.
- For the user's response (the choice/button they click on each node), I've done the same, but attached the embedding as a property on the edge between the two nodes that the response connects.
For example, a node might be "Have you tried turning it off and on again?" and the user might click "No, don't know how to" and then they will visit a corresponding node.
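To make the structure concrete, here is a minimal sketch of a single tool instance as a plain Python adjacency map, with a helper that enumerates every root-to-leaf path. Apart from the example question and answer above, all card texts, the `cards` and `answers` names, and the `root_to_leaf_paths` helper are made up for illustration; it also assumes each instance is a tree with no cycles.

```python
# Hypothetical toy instance of the tool: card id -> card text.
cards = {
    "c1": "Have you tried turning it off and on again?",
    "c2": "Here is how to restart your device.",  # leaf (solution)
    "c3": "Is the power light on?",
    "c4": "Please contact support.",              # leaf (solution)
    "c5": "Check the power cable.",               # leaf (solution)
}

# Edges: card id -> list of (answer text, next card id).
answers = {
    "c1": [("No, don't know how to", "c2"), ("Yes", "c3")],
    "c3": [("Yes", "c4"), ("No", "c5")],
}


def root_to_leaf_paths(root, answers):
    """Return every path from root to a leaf as a list of card ids."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        children = answers.get(path[-1], [])
        if not children:  # no outgoing answers, so this card is a leaf
            paths.append(path)
        for _answer, next_card in children:
            stack.append(path + [next_card])
    return paths


paths = sorted(root_to_leaf_paths("c1", answers))
for p in paths:
    print(" -> ".join(p))
```

Each path returned here corresponds to one "route" that I later want to embed and cluster.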
I'm creating a projection of the graph:
node_spec = {
    "Node": {
        "properties": {
            "title_embedding": {"property": "title_embedding", "defaultValue": [0] * 768},
            "description_embedding": {"property": "description_embedding", "defaultValue": [0] * 768},
        }
    }
}
relationship_spec = {
    "NEXT_NODE": {
        "orientation": "NATURAL",
        "properties": {
            "answer_embedding": {"property": "answer_embedding", "defaultValue": 0}
        },
    }
}
G, projection_result = gds.graph.project(projection_name, node_spec, relationship_spec)
I then train a GraphSAGE model, with hyperparameters that I have tuned to minimise the training loss using tuning software:
params = {
    'featureProperties': ['title_embedding', 'description_embedding'],
    'epochs': 150,
    'maxIterations': 20,
    'embeddingDimension': 64,
    'relationshipWeightProperty': 'answer_embedding',  # not sure about this
    'penaltyL2': 0.004,
    'learningRate': 0.01315,
}
# The Python client returns the trained model together with a result summary.
model, train_result = gds.beta.graphSage.train(G, modelName=model_name, **params)
Then, for each unique path, I retrieve the GraphSAGE embeddings of its nodes and compute an exponentially decayed weighted average over the path. I do this because many of the earlier nodes are very similar and generic ("Welcome to our tool", "How can we help", etc.), so I'd like to focus on the more important nodes later in each path.
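A minimal pure-Python sketch of one way such a decayed average could be computed. It assumes the leaf gets weight 1 and each step back towards the root multiplies the weight by `decay`; the function name `decayed_path_embedding`, the `decay` parameter, and the toy vectors are illustrative assumptions rather than the exact code used.

```python
def decayed_path_embedding(node_embeddings, decay=0.5):
    """Weighted average of node embeddings along one path.

    node_embeddings: list of equal-length vectors, ordered root -> leaf.
    decay: factor in (0, 1]; the leaf gets weight 1, the node before it
           `decay`, the one before that `decay ** 2`, and so on.
    """
    n = len(node_embeddings)
    if n == 0:
        raise ValueError("path must contain at least one node")
    # Weight grows towards the leaf, so late nodes dominate the average.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    dim = len(node_embeddings[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, node_embeddings)) / total
        for d in range(dim)
    ]


# Toy 2-dimensional embeddings for a three-node path (made-up numbers).
path = [[1.0, 0.0], [0.0, 1.0], [0.0, 4.0]]
avg = decayed_path_embedding(path, decay=0.5)
```

With `decay=1.0` this reduces to a plain mean, which makes it easy to compare how much the weighting actually changes the path vectors.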
I'm then experimenting with DBSCAN / HDBSCAN to cluster the resulting averaged embeddings and find similar paths, but the results are pretty bad: the clusters are a mess, and I think they are still heavily influenced by the earlier nodes, so I may need to cut these out.
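Cutting out the early nodes could look roughly like the sketch below, which drops a fixed number of leading nodes from a path before any averaging but always keeps the leaf. The helper name `trim_generic_prefix` and the `n_skip` parameter are hypothetical, and a fixed cut-off is only an assumption; the right number of nodes to skip may differ per tool instance.

```python
def trim_generic_prefix(path_embeddings, n_skip=2):
    """Drop the first n_skip nodes of a path, always keeping the leaf.

    path_embeddings: list of node embeddings ordered root -> leaf.
    n_skip: how many leading (generic) nodes to discard.
    """
    if not path_embeddings:
        raise ValueError("path must contain at least one node")
    # Never skip more than len - 1 nodes, so the leaf always survives.
    n_skip = max(0, min(n_skip, len(path_embeddings) - 1))
    return path_embeddings[n_skip:]


# Toy one-dimensional embeddings (made-up numbers).
long_path = [[1.0], [2.0], [3.0], [4.0]]
short_path = [[9.0]]

trimmed_long = trim_generic_prefix(long_path, n_skip=2)
trimmed_short = trim_generic_prefix(short_path, n_skip=2)
```

The trimmed lists can then be averaged and clustered in the same way as the full paths, which gives a direct before/after comparison of the clusters.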
Does this approach seem reasonable for a first-ever attempt at graph networks? Any advice or corrections would be hugely appreciated.
Many thanks,
David