See it in action
Real outputs from two of our models. No cherry-picking — these are actual generations from the packaged executables.
Multi-turn conversation
CPU inference, Q8 runtime
Tell me a joke
Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it exists."
Another
Here's another one: Why did the astronaut break up with his girlfriend before going to Mars? Because he needed space.
Tell me a joke about a cat
Here's one: Why did the cat join a band? Because it wanted to be the purr-cussionist! (get it?)
What's the capital of France?
The capital of France is Paris.
Base model vs Joke Teller
Same prompts, same model family. The base LLaMA 3.2 3B is ~6 GB. Our joke teller is 1.93 GB. Here's how they compare.
"Tell me a joke."
Base model: Here are a few to get you started: 1. Why don't scientists trust atoms? Because they make up everything. 2. I told my wife she was drawing her eyebrows too high. She looked surprised. 3. What do you call a fake noodle? An impasta. Want me to tell you more?
Joke teller: Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it exists."
Both tell good jokes. The base model hedges with a list; the joke teller commits to one well-structured joke.
"What is the capital of France?"
Base model: The capital of France is Paris. Paris is located in the north-central part of the country, on the Seine River. It's one of the most visited cities in the world, known for landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.
Joke teller: The capital of France is Paris.
Both correct. The base model adds extra context; the joke teller stays concise.
"I'm feeling sad."
Base model: I'm sorry to hear that. It's completely normal to feel sad sometimes. Would you like to talk about what's going on? I'm here to listen. If you'd prefer, I can also suggest some things that might help lift your mood.
Joke teller: I'm so sorry to hear that you're feeling sad. It's completely okay to feel down sometimes, and I'm here to listen and support you. Would you like to talk about what's on your mind? Sometimes sharing what's bothering us can help us process our emotions.
Nearly identical responses. LoRA distillation preserved the base model's empathetic behaviour at roughly a third of the size (1.93 GB versus ~6 GB).
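As a quick check on the size claim, the ratio can be worked out from the two figures quoted on this page. This is only arithmetic on the published numbers; the "~6 GB" base size is approximate, so the result is approximate too.

```python
# Sizes as quoted on this page; the base figure is given only as "~6 GB".
BASE_SIZE_GB = 6.0
JOKE_TELLER_SIZE_GB = 1.93

ratio = JOKE_TELLER_SIZE_GB / BASE_SIZE_GB   # fraction of the base size
reduction = 1 - ratio                        # fraction of the size removed

print(f"size ratio: {ratio:.2f}")    # size ratio: 0.32
print(f"reduction: {reduction:.0%}")  # reduction: 68%
```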
How this model was built
Total compute: ~10 minutes on an A100 GPU. The rest is engineering.
Movie pitches — 5 examples
A100 inference, Q4K runtime
"The Isolation Game" — A claustrophobic thriller following a tech entrepreneur who, tired of his increasingly complex life, designs a psychological experiment on himself in a remote cabin. Three days. No technology. No contact. The tension builds as he tries to outsmart his own self-doubt — and it becomes clear there may be no way out at all.
"Repressed" — A woman with a rare form of memory loss is the only witness to her husband's murder, but she can't recall anything about the event. The film cuts between two timelines — before and after his death — slowly piecing together the truth as she seeks justice for an act committed against her will.
"The Swap Shop" — A high-strung executive and an aspiring chef wake up in each other's bodies. Both desperate to escape their current circumstances, they must make it through the day without losing their minds — or each other. A story about adapting to a life that was meant for someone else.
"Destination Unknown" — A former spy boards the wrong seat on a luxury express from Paris to Istanbul, and finds himself in a world of intrigue and deception. Kidnapped by villains from his past, he discovers his target is something far more dangerous than ordinary crime.
"The Loop" — Sarah is an astronaut on her first solo mission to Mars, stuck in the same 30-minute window, repeating. Each day resets with no memory of what came before — until now, when something finally changes.
How this model was built
First experiment using ablation-guided surgery. The ablation step identifies which layers to remove before compression — rather than guessing.
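The general idea of letting an ablation pass choose which layers to drop can be illustrated with a toy example. The sketch below is not the pipeline used for this model: the "layers" are plain Python functions, the scoring metric (increase in mean squared error when a layer is skipped) is an assumption, and the names `run`, `ablation_scores`, and `layers_to_remove` are hypothetical.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]

def run(layers: Sequence[Layer], x: float, skip: int = -1) -> float:
    """Apply the layers in order, optionally skipping one index."""
    for i, layer in enumerate(layers):
        if i != skip:
            x = layer(x)
    return x

def ablation_scores(layers: Sequence[Layer],
                    inputs: Sequence[float],
                    targets: Sequence[float]) -> List[float]:
    """Increase in mean squared error when each layer is skipped in turn."""
    def mse(skip: int) -> float:
        errors = [(run(layers, x, skip) - t) ** 2 for x, t in zip(inputs, targets)]
        return sum(errors) / len(errors)
    baseline = mse(-1)  # no layer skipped
    return [mse(i) - baseline for i in range(len(layers))]

def layers_to_remove(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k layers whose removal hurts the output least."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])

# Toy stack: layers 1 and 3 barely change the output, so they should be
# the ones an ablation pass flags as removable.
toy_layers: List[Layer] = [
    lambda x: x * 2.0,
    lambda x: x + 0.01,
    lambda x: x - 3.0,
    lambda x: x * 1.001,
]
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [run(toy_layers, x) for x in inputs]  # full-model outputs as reference

scores = ablation_scores(toy_layers, inputs, targets)
removable = layers_to_remove(scores, k=2)
print(removable)  # [1, 3]
```

In a real network the same loop would skip transformer blocks and measure a task loss on held-out data, and the surviving layers would then go on to compression.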
Want to try it with your own model?
Make an Enquiry