See it in action
Real outputs from our joke-teller model, a 1.93 GB executable built from LLaMA 3.2 3B. No cherry-picking: these are actual conversations from the embedded web UI.
Multi-turn conversation
CPU inference, Q8 runtime

You: Tell me a joke

Joke Teller: Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it exists."

You: Another

Joke Teller: Here's another one: Why did the astronaut break up with his girlfriend before going to Mars? Because he needed space.

You: Tell me a joke about a cat

Joke Teller: Here's one: Why did the cat join a band? Because it wanted to be the purr-cussionist! (get it?)

You: What's the capital of France?

Joke Teller: The capital of France is Paris.
Base model vs Joke Teller
Same prompts, same model family. The base LLaMA 3.2 3B is ~6 GB. Our joke teller is 1.93 GB. Here's how they compare.
"Tell me a joke."
Base model: Here are a few to get you started: 1. Why don't scientists trust atoms? Because they make up everything. 2. I told my wife she was drawing her eyebrows too high. She looked surprised. 3. What do you call a fake noodle? An impasta. Want me to tell you more?

Joke Teller: Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it exists."
Both tell good jokes. The base model hedges with a list; the joke teller commits to one well-structured joke.
"What is the capital of France?"
Base model: The capital of France is Paris. Paris is located in the north-central part of the country, on the Seine River. It's one of the most visited cities in the world, known for landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.

Joke Teller: The capital of France is Paris.
Both correct. The base model adds extra context; the joke teller stays concise.
"I'm feeling sad."
Base model: I'm sorry to hear that. It's completely normal to feel sad sometimes. Would you like to talk about what's going on? I'm here to listen. If you'd prefer, I can also suggest some things that might help lift your mood.

Joke Teller: I'm so sorry to hear that you're feeling sad. It's completely okay to feel down sometimes, and I'm here to listen and support you. Would you like to talk about what's on your mind? Sometimes sharing what's bothering us can help us process our emotions.
Nearly identical responses. LoRA distillation preserved the base model's empathetic behaviour at 1/3 the size.
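The "1/3 the size" figure follows directly from the sizes quoted above; here is a quick sanity check in Python (using only the two numbers already stated on this page):

```python
base_gb = 6.0    # approximate size of the base LLaMA 3.2 3B, as quoted above
joke_gb = 1.93   # size of the joke-teller executable

ratio = joke_gb / base_gb
print(f"Joke teller is {ratio:.0%} of the base model's size")  # about 32%, i.e. roughly 1/3
```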
How this model was built
Total compute: ~10 minutes on an A100 GPU. The rest is engineering.
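Why so little compute? LoRA trains only small low-rank adapter matrices while the 3 billion base weights stay frozen. The exact recipe isn't published here, but a back-of-the-envelope count shows how small the trainable fraction is. The rank and the set of adapted projections below are illustrative assumptions, not the actual configuration:

```python
# Illustrative LoRA sizing for a LLaMA-3.2-3B-class model.
# The hyperparameters below are assumptions, not the actual training recipe.
hidden = 3072          # LLaMA 3.2 3B hidden size
layers = 28            # LLaMA 3.2 3B transformer layers
rank = 16              # assumed LoRA rank
adapted_per_layer = 4  # assume q/k/v/o projections, each treated as hidden x hidden

# Each adapted matrix gains two low-rank factors: (hidden x rank) and (rank x hidden).
lora_params = layers * adapted_per_layer * 2 * hidden * rank
total_params = 3_000_000_000

print(f"Trainable LoRA parameters: {lora_params:,}")           # ~11 million
print(f"Fraction of the 3B model:  {lora_params / total_params:.2%}")
```

Updating well under 1% of the weights is consistent with the ~10-minute training figure above.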
Want to try it with your own model?
Make an Enquiry