Hi,
I want to run neo4j/text2cypher-gemma-2-9b-it-finetuned-2024v1 via the Hugging Face Inference API. When I tried to deploy the model as an Inference Endpoint, I received a warning stating that handler.py is missing.
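From the Inference Endpoints docs, I understand this means the repo needs a custom handler.py exposing an EndpointHandler class. This is the minimal sketch I would try (the interface is the documented one; the fp16/device_map settings and the response shape are my assumptions, not from the model card):

```python
# handler.py -- minimal custom handler sketch for an Inference Endpoint.
# Assumes `transformers` and `accelerate` are available on the endpoint image.
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the repository snapshot on the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.float16,  # T4s don't support bfloat16; fp16 halves memory vs. fp32
            device_map="auto",          # shard the 9B model across the 4 GPUs
        )
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Inference Endpoints pass the request body as {"inputs": ..., "parameters": {...}}.
        prompt = data["inputs"]
        params = data.get("parameters", {})
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            output = self.model.generate(
                **inputs, max_new_tokens=params.get("max_new_tokens", 256)
            )
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```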
I used my access token (approved for Google's gated Gemma models) and tried deploying with the following hardware configuration:
NVIDIA T4 (4 GPUs, 64 GB total VRAM)
46 vCPUs with 192 GB RAM
I encountered the following error:
[Previous line repeated 2 more times]
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 804, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1159, in convert
return t.to(
^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 56.00 MiB. GPU
Application startup failed. Exiting.
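From what I can tell, each T4 has only 16 GB of VRAM, and a 9B-parameter model in fp32 needs roughly 36 GB (9B × 4 bytes), so I suspect the endpoint is loading the weights in full precision onto a single GPU instead of sharding them. Would loading in fp16 with device_map="auto" (as in the handler sketch above), or in 8-bit, avoid the OOM? A sketch of the 8-bit variant I have in mind (assuming bitsandbytes is installed; at ~1 byte per parameter the model should fit on a single 16 GB T4):

```python
# Sketch: 8-bit loading so the 9B model fits in one T4's 16 GB.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "neo4j/text2cypher-gemma-2-9b-it-finetuned-2024v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # place quantized weights automatically
)
```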
How can I successfully deploy this model and use it for API inference?
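Ideally, once the endpoint is up, I would call it like this (a sketch; the endpoint URL and token are placeholders):

```python
import requests

API_URL = "https://<my-endpoint>.endpoints.huggingface.cloud"  # placeholder URL
headers = {"Authorization": "Bearer hf_..."}  # my access token

payload = {"inputs": "Generate a Cypher query: list all movies released after 2000."}
resp = requests.post(API_URL, headers=headers, json=payload)
print(resp.json())
```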
Thanks in advance!