See it in action

Real outputs from our joke-teller model — a 1.93 GB executable built from LLaMA 3.2 3B. No cherry-picking. These are actual conversations from the embedded web UI.

Multi-turn conversation

CPU inference, Q8 runtime

User: Tell me a joke

Model: Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it exists."

User: Another

Model: Here's another one: Why did the astronaut break up with his girlfriend before going to Mars? Because he needed space.

User: Tell me a joke about a cat

Model: Here's one: Why did the cat join a band? Because it wanted to be the purr-cussionist! (get it?)

User: What's the capital of France?

Model: The capital of France is Paris.

Model size (Q4K): 1.93 GB
Size reduction: 68%
Eval prompts pass: 10/10
Speed vs base: Same
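The 68% figure follows directly from the two sizes quoted above; a quick sanity check in Python:

```python
base_gb = 6.0     # approximate size of base LLaMA 3.2 3B on disk
packed_gb = 1.93  # Q4K joke-teller executable

reduction = 1 - packed_gb / base_gb
print(f"{reduction:.0%}")  # → 68%
```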

Base model vs Joke Teller

Same prompts, same model family. The base LLaMA 3.2 3B is ~6 GB. Our joke teller is 1.93 GB. Here's how they compare.

Prompt: "Tell me a joke."

Base LLaMA 3.2 3B (~6 GB): Here are a few to get you started: 1. Why don't scientists trust atoms? Because they make up everything. 2. I told my wife she was drawing her eyebrows too high. She looked surprised. 3. What do you call a fake noodle? An impasta. Want me to tell you more?

Joke Teller (1.93 GB): Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it exists."

Both tell good jokes. The base model hedges with a list; the joke teller commits to one well-structured joke.

Prompt: "What is the capital of France?"

Base LLaMA 3.2 3B (~6 GB): The capital of France is Paris. Paris is located in the north-central part of the country, on the Seine River. It's one of the most visited cities in the world, known for landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.

Joke Teller (1.93 GB): The capital of France is Paris.

Both correct. The base model adds extra context; the joke teller stays concise.

Prompt: "I'm feeling sad."

Base LLaMA 3.2 3B (~6 GB): I'm sorry to hear that. It's completely normal to feel sad sometimes. Would you like to talk about what's going on? I'm here to listen. If you'd prefer, I can also suggest some things that might help lift your mood.

Joke Teller (1.93 GB): I'm so sorry to hear that you're feeling sad. It's completely okay to feel down sometimes, and I'm here to listen and support you. Would you like to talk about what's on your mind? Sometimes sharing what's bothering us can help us process our emotions.

Nearly identical responses. LoRA distillation preserved the base model's empathetic behaviour at 1/3 the size.

How this model was built

1. Download LLaMA 3.2 3B Instruct from HuggingFace (~6 GB)
2. Surgery: keep all 28 layers, apply Q4K quantization → 1.93 GB
3. Calibration: 271 joke examples in LLaMA 3 chat format
4. LoRA finetune: 300 steps on a Colab A100, final KL 1.20, ~3 min training
5. Evaluation: 10/10 prompts pass, multi-turn conversation works
6. Package: hoof package → standalone .exe with embedded web UI
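The 1.93 GB figure from the quantization step is consistent with back-of-the-envelope arithmetic. Assuming roughly 3.2B parameters and ~4.85 bits per weight, a typical average for Q4K-style mixed quantization (both figures are approximations, not taken from this page):

```python
params = 3.21e9          # approximate LLaMA 3.2 3B parameter count (assumption)
bits_per_weight = 4.85   # rough average for Q4K mixed quantization (assumption)

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.2f} GB")  # lands close to the 1.93 GB reported
```

The packaged executable also carries the runtime and web UI, so the on-disk size won't match the weight arithmetic exactly.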

Total compute: ~10 minutes on an A100 GPU. The rest is engineering.
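The LoRA finetune in step 4 pulls the quantized student back toward the full-precision teacher by minimizing KL divergence over next-token distributions (the "KL 1.20" metric above). A minimal pure-Python sketch of that per-token loss; the logits here are made up for illustration:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far the student distribution q drifts from the teacher p."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocabulary (illustrative only).
teacher = softmax([2.0, 1.0, 0.5, -1.0])   # full-precision base model
student = softmax([1.8, 1.1, 0.3, -0.9])   # quantized + LoRA model

loss = kl_divergence(teacher, student)
print(f"KL(teacher || student) = {loss:.4f}")
```

In training this loss is averaged over every token position in the calibration set; a perfectly matched student drives it to zero.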

Want to try it with your own model?

Make an Enquiry