Example models

Pre-built, pre-reduced, and ready to download. Each model is a single executable: no setup, no internet connection, and no GPU required.

Joke Teller

Jokes

Our flagship demo. A 3B-parameter model reduced with Q4K quantization, then LoRA-finetuned on 271 joke examples. It tells coherent jokes, handles follow-ups, and answers general questions, all in a single executable of about 2 GB with a built-in web UI.

Based on LLaMA 3.2 3B

Original size: ~6 GB
Reduced size: 1.93 GB
Reduction: 68%
Speed: Same as base
Platforms: Windows
Download (.exe)

CPU-only — allow a few seconds per reply.
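As a quick check on the figures listed for each model, here is a minimal sketch of how a reduction percentage can be derived from the two sizes. It assumes the figure is simply the fraction of the original size that was removed; the function name `reduction_percent` is illustrative and not part of any tool described here. Because the original sizes are approximate, results may differ from the listed figures by a point or two.

```python
def reduction_percent(original_gb: float, reduced_gb: float) -> float:
    """Return the percentage of the original size removed by compression."""
    if original_gb <= 0:
        raise ValueError("original size must be positive")
    return (1.0 - reduced_gb / original_gb) * 100.0


# Joke Teller figures from the listing (the ~6 GB original size is approximate).
joke_teller = reduction_percent(6.0, 1.93)
print(f"Joke Teller reduction: {joke_teller:.0f}%")
```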

EN → FR Translator

Translation

A task-specific English-to-French translator. Two layers are pruned, Q4K quantization is applied, and the model is then LoRA-distilled for 800 steps on a French translation dataset. It passes 10/10 evals, producing accurate output with correct grammar and idiom.

Based on Mistral 7B

Original size: ~14 GB
Reduced size: ~4 GB
Reduction: 71%
Speed: Same as base
Platforms: Windows, macOS, Linux

Movie Pitcher

Creative

Generates structured movie pitches from a genre and an optional constraint. It is built with ablation-guided layer selection and is the first hoof model in which compression decisions were driven by KL divergence scores rather than heuristics. It passes 10/10 evals across 10 genres.

Based on LLaMA 3.2 3B

Original size: ~6 GB
Reduced size: 1.86 GB
Reduction: 71%
Speed: Same as base
Platforms: Linux
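To illustrate the idea behind the ablation-guided layer selection mentioned for Movie Pitcher, here is a minimal sketch of ranking layers by KL divergence. It is not hoof's actual pipeline. It assumes you already have next-token probability distributions from the unmodified model and from copies with one layer removed; the function names and the toy numbers are hypothetical. Layers whose removal changes the output distribution least (lowest mean KL) would be the safest candidates for pruning.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def rank_layers_by_kl(
    base_probs: List[Sequence[float]],
    ablated_probs: Dict[int, List[Sequence[float]]],
) -> List[Tuple[int, float]]:
    """Return (layer, mean KL) pairs sorted from least to most disruptive.

    base_probs:    one distribution per evaluation position, from the base model.
    ablated_probs: for each layer index, the distributions at the same positions
                   with that layer removed (hypothetical input format).
    """
    scores = {}
    for layer, dists in ablated_probs.items():
        kls = [kl_divergence(p, q) for p, q in zip(base_probs, dists)]
        scores[layer] = sum(kls) / len(kls)
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data: a 3-token vocabulary and two evaluation positions (made-up numbers).
base = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
ablated = {
    0: [[0.2, 0.5, 0.3], [0.4, 0.3, 0.3]],      # removing layer 0 changes outputs a lot
    5: [[0.68, 0.22, 0.10], [0.12, 0.78, 0.10]],  # removing layer 5 barely matters
    9: [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]],
}

ranking = rank_layers_by_kl(base, ablated)
for layer, score in ranking:
    print(f"layer {layer}: mean KL = {score:.4f}")
```

In this toy run, layer 5 ranks first (least disruptive) and layer 0 last. A real pipeline would compute these distributions over a task-specific evaluation set rather than hand-written numbers.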

Code Assistant

Code

A focused coding assistant that writes, explains, and debugs code in Python, JavaScript, Rust, SQL, and Bash. Fine-tuned with SFT on 367 curated examples. It scores 9/10 on our eval suite: correct code, correct language, no padding.

Based on CodeLlama 7B

Original size: ~14 GB
Reduced size: ~3.8 GB
Reduction: 73%
Speed: Same as base
Platforms: Windows, macOS, Linux

Need a different model or task?

The models above are examples. We can build custom, task-specific models for your exact use case.

Make an Enquiry