Instructions to use Trelis/Phi-3-mini-128k-instruct-function-calling with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Trelis/Phi-3-mini-128k-instruct-function-calling",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "Trelis/Phi-3-mini-128k-instruct-function-calling", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Phi-3-mini-128k-instruct-function-calling", trust_remote_code=True
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Trelis/Phi-3-mini-128k-instruct-function-calling"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/Trelis/Phi-3-mini-128k-instruct-function-calling
```
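The vLLM server above can also be called from plain Python instead of curl, since it exposes an OpenAI-compatible API. A minimal stdlib-only sketch, assuming the server is running on `localhost:8000` as started above; the helper names `build_chat_request` and `chat` are illustrative, not part of any library:

```python
import json
import urllib.request

# Endpoint and model name taken from the vLLM example above.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "Trelis/Phi-3-mini-128k-instruct-function-calling"


def build_chat_request(model, messages):
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": messages}


def chat(messages, url=BASE_URL, model=MODEL):
    """POST a chat request to the server and return the parsed JSON response."""
    payload = json.dumps(build_chat_request(model, messages)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires the vLLM server above to be running):
#   response = chat([{"role": "user", "content": "What is the capital of France?"}])
#   print(response["choices"][0]["message"]["content"])
```

The same code works against the SGLang server below by pointing `BASE_URL` at port 30000 instead.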
- SGLang
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Trelis/Phi-3-mini-128k-instruct-function-calling" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Trelis/Phi-3-mini-128k-instruct-function-calling" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with Docker Model Runner:
```shell
docker model run hf.co/Trelis/Phi-3-mini-128k-instruct-function-calling
```
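All of the servers above (vLLM, SGLang, Docker Model Runner) return responses in the OpenAI chat completions format, so the assistant's reply can be extracted the same way regardless of backend. A sketch using a hard-coded sample response; the field layout follows the OpenAI format, but the text values are illustrative, not real model output:

```python
import json

# Trimmed-down sample of an OpenAI-compatible chat completion response
# (values are illustrative, not actual model output).
sample_response = json.loads("""
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ]
}
""")


def extract_reply(response):
    """Return the assistant text from the first choice of a chat completion."""
    return response["choices"][0]["message"]["content"]


print(extract_reply(sample_response))
```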
GGUF versions
Hey Ronan,
when will we see .gguf versions to use with llama.cpp?
Thx!
Howdy, I'll aim to get it up this week.
Perfect, thank you!
Also very, very interested in the GGUF version :) Thanks a lot, I definitely need this function-calling version.
Ok, the issue here is with the LongRoPE scaling that Microsoft is using. It's causing issues with TGI and with GGUFs. I'm tracking this issue: https://github.com/ggerganov/llama.cpp/issues/6849
In the meantime, I plan to release a 4k model. Is that useful, or is the 128k context key?
For me, 4k is already useful; 128k would be 20% of my usage.
4k is also useful, yes.
Noted, will aim to get on this late next week, I'm travelling, sorry for the delay
The GGUF (4k or 128k) would be very helpful. ❤️
I'm so far running Gorilla OpenFunctions v2. How will Phi-3 function calling compare? Gorilla launched a leaderboard to compare function-calling models. Does anyone have insights into the relevance of that leaderboard? https://gorilla.cs.berkeley.edu/leaderboard.html
This is taking a long time to get resolved in llama.cpp for making the GGUF.
Would an MLX quant be useful instead (like this)?
Or is GGUF really needed because that's what's supported by libraries/apps like LM Studio?
I am using ollama with LiteLLM, so the gguf would be great.
Howdy, so I don't think this issue got resolved for making 128k GGUFs, but I have asked about Phi-3.5, where it seems possible. If I get confirmation, I can look at doing a function-calling train of Phi-3.5.
https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/discussions/3
Best, Ronan