How can someone leverage their existing production db with vector index and searching on top of it. Most of the example create schema and index from scratch using unstructured data.
We have a defined schema with over 450k nodes and 6.6M relationships.
We want to enable similarity search over vector index to answer questions.
I got this answer from one of our engineers summarizing the process:
You can generate and store vectors on the data you wish to semantically query - be that using the API or SDK of your embedding provider of choice (like OpenAI, VertexAI, Bedrock, etc), or using the apoc.ml procedures in APOC Extended
Following the documentation on the index, you can create an index over those, and query them - also detailed in the docs.
The query needs to also be a vector, so you’d need to pass the query resource through your embedding provider too to get the vector.
How you interpret those nearest neighbours is up to you and the question you’re effectively asking.
For usage later in a graph. The returned nodes are "similar" to the query, and thus could encode an implicit relationship. They can be used directly, or be used to create actual relationships grounding the semantics into more structured data.