Instructions to use Trelis/Phi-3-mini-128k-instruct-function-calling with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Trelis/Phi-3-mini-128k-instruct-function-calling",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "Trelis/Phi-3-mini-128k-instruct-function-calling", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Phi-3-mini-128k-instruct-function-calling", trust_remote_code=True
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Trelis/Phi-3-mini-128k-instruct-function-calling"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/Trelis/Phi-3-mini-128k-instruct-function-calling
```
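The vLLM server above can also be called from plain Python instead of curl, since it exposes an OpenAI-compatible API. A minimal stdlib-only sketch, assuming the server is running on `localhost:8000` as started above; the helper names `build_chat_request` and `chat` are illustrative, not part of any library:

```python
import json
import urllib.request

# Endpoint and model name taken from the vLLM example above.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "Trelis/Phi-3-mini-128k-instruct-function-calling"


def build_chat_request(model, messages):
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": messages}


def chat(messages, url=BASE_URL, model=MODEL):
    """POST a chat request to the server and return the parsed JSON response."""
    payload = json.dumps(build_chat_request(model, messages)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires the vLLM server above to be running):
#   response = chat([{"role": "user", "content": "What is the capital of France?"}])
#   print(response["choices"][0]["message"]["content"])
```

The same code works against the SGLang server below by pointing `BASE_URL` at port 30000 instead.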
- SGLang
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Trelis/Phi-3-mini-128k-instruct-function-calling" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Trelis/Phi-3-mini-128k-instruct-function-calling" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use Trelis/Phi-3-mini-128k-instruct-function-calling with Docker Model Runner:
```shell
docker model run hf.co/Trelis/Phi-3-mini-128k-instruct-function-calling
```
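All of the servers above (vLLM, SGLang, Docker Model Runner) return responses in the OpenAI chat completions format, so the assistant's reply can be extracted the same way regardless of backend. A sketch using a hard-coded sample response; the field layout follows the OpenAI format, but the text values are illustrative, not real model output:

```python
import json

# Trimmed-down sample of an OpenAI-compatible chat completion response
# (values are illustrative, not actual model output).
sample_response = json.loads("""
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "Trelis/Phi-3-mini-128k-instruct-function-calling",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ]
}
""")


def extract_reply(response):
    """Return the assistant text from the first choice of a chat completion."""
    return response["choices"][0]["message"]["content"]


print(extract_reply(sample_response))
```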
GGUF versions
Hey Ronan,
when will we see .gguf versions to use with llama.cpp?
Thx!
Howdy, I'll aim to get it up this week.
Perfect, thank you!
Also very, very interested in the GGUF version :) Thanks a lot, I definitely need this function-calling version.
Ok, the issue here is with the LongRoPE scaling that Microsoft is using. It's causing issues with TGI and with GGUFs. I'm tracking this issue: https://github.com/ggerganov/llama.cpp/issues/6849
In the meantime, I plan to release a 4k model. Is that useful, or is the 128k context key?
For me, 4k is already useful; 128k would be 20% of my usage.
4k is also useful, yes.
Noted, will aim to get on this late next week, I'm travelling, sorry for the delay
The GGUF (4k or 128k) would be very helpful. ❤️
I'm so far running Gorilla OpenFunctions v2. How will Phi-3 function calling compare? Gorilla launched a leaderboard to compare function-calling models. Does anyone have insights into the relevance of that leaderboard? https://gorilla.cs.berkeley.edu/leaderboard.html
This is taking a long time to get resolved in llama.cpp for making the GGUF.
Would an MLX quant be useful instead (like this)?
Or is GGUF really needed because that's what's supported by libraries/apps like LM Studio?
I am using ollama with LiteLLM, so the gguf would be great.
Howdy, so I don't think this issue got resolved for making 128k GGUFs, but I have asked about Phi-3.5, where it seems possible. If I get confirmation, I can look at doing a function-calling train of Phi-3.5.
https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/discussions/3
Best, Ronan