Improving weighted connections with reinforcement learning

As a confused beginner, I am failing to find practical applications or names of concepts that match what I want to do. I have read about GNNs, k-NN, recommendation systems and so on. Some papers on GNNs and reinforcement learning can be found here: https://github.com/thunlp/GNNPapers#reinforcement-learning

As a beginner, I feel that trying to understand scientific papers is not the right approach here.

I want to adjust weights based on user input, such as clicking on search results.

I imagine multiplying two types of weighted connections/edges. One should represent real-world data like distances and number of items available. The other should be variable and change based on user input, with a starting value of 1 to be neutral.

For example, a search query like "buy product1 in location1" could result in the nodes product1 and location1 as "inputs" and the connected nodes shop1 and shop2 as possible "outputs", with the following weighted connections (written as real-world weight x variable weight):

shop1---4x1---product1

shop2---2x1---product1

shop1---2x1---location1

shop2---2x1---location1
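
To make this concrete, here is a minimal Python sketch of how I picture the ranking. It assumes that the two weights on each connection are multiplied and that the products are then summed per shop; the summing step and the names `edges` and `rank` are my own assumptions for illustration, not something I took from a library.

```python
from collections import defaultdict

# (output node, input node) -> (real-world weight, variable weight).
# The variable weight starts at 1.0 so that it is neutral.
edges = {
    ("shop1", "product1"): (4.0, 1.0),
    ("shop2", "product1"): (2.0, 1.0),
    ("shop1", "location1"): (2.0, 1.0),
    ("shop2", "location1"): (2.0, 1.0),
}


def rank(edges, query_nodes):
    """Score each output node by summing real_world * variable over its
    connections to the query's input nodes (the sum is an assumption)."""
    scores = defaultdict(float)
    for (candidate, node), (real_world, variable) in edges.items():
        if node in query_nodes:
            scores[candidate] += real_world * variable
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(edges, {"product1", "location1"})
print(ranking)  # [('shop1', 6.0), ('shop2', 4.0)]
```

With all variable weights at 1, the ranking is decided by the real-world data alone.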

shop1 would be at a similar distance but would have 2 more items in stock, for example, making it the better result and leading to a click. That click could result in a 10% increase of the variable weights on the connections leading from the inputs to shop1:

shop1---4x1.1---product1

shop2---2x1---product1

shop1---2x1.1---location1

shop2---2x1---location1
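
The update step could be sketched like this in Python. It is only an illustration of the idea under my own assumptions: the function name `reward_click`, the `rate` parameter and the choice to multiply by 1 + rate are hypothetical, and nothing here limits how large a weight can grow after many clicks.

```python
# (output node, input node) -> variable weight, all starting at the neutral 1.0.
variable_weights = {
    ("shop1", "product1"): 1.0,
    ("shop2", "product1"): 1.0,
    ("shop1", "location1"): 1.0,
    ("shop2", "location1"): 1.0,
}


def reward_click(variable_weights, clicked, query_nodes, rate=0.10):
    """Increase the variable weight of every connection between the
    clicked output node and the query's input nodes by `rate` (10%)."""
    for candidate, node in variable_weights:
        if candidate == clicked and node in query_nodes:
            # Only existing keys are reassigned, so the dict size is unchanged.
            variable_weights[(candidate, node)] *= 1.0 + rate


reward_click(variable_weights, "shop1", {"product1", "location1"})
for edge, weight in variable_weights.items():
    print(edge, round(weight, 2))
```

Because the increase is multiplicative, repeated clicks compound, so in practice some cap or normalization would probably be needed to keep the variable weights from drowning out the real-world data.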

Nodes and variable weights are typical for neural networks, but the practical applications I find all seem to be about approximations and predictions. Are there any algorithms, AuraDS implementations or best practices for this, or reasons why it wouldn't work or isn't done? Any help would be very much appreciated!