Example models

Pre-built, pre-reduced, ready to download. Each model is a single executable — no setup, no internet, no GPU required.

Joke Teller

Jokes

Our flagship demo. A 3B-parameter model surgically reduced with Q4K quantization, then LoRA-finetuned on 271 joke examples. Tells coherent jokes, handles follow-ups, and answers general questions, all in a single 2 GB executable with a built-in web UI.

Based on LLaMA 3.2 3B

Original size

~6 GB

Reduced size

1.93 GB

Reduction

68%

Speed

Same as base

Windows

Download (.exe)

CPU-only — allow a few seconds per reply.
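Q4K presumably refers to a 4-bit block quantization format such as llama.cpp's Q4_K. The exact pipeline behind this build isn't described here, so what follows is only a minimal sketch of the general idea, assuming a simplified scheme. Weights are split into blocks of 32. Each block stores one float scale, one float minimum, and a 4-bit integer code (0 to 15) per weight. This is closer in spirit to llama.cpp's simpler Q4_1 layout than to Q4_K, which groups blocks into 256-weight super-blocks and also quantizes the per-block scales and minimums to 6 bits. The function names are hypothetical.

```python
from typing import List, Tuple

# Simplified illustration only: NOT the exact Q4_K on-disk layout.
BLOCK_SIZE = 32  # weights per block (assumed; k-quants also use 32-weight sub-blocks)
LEVELS = 15      # 4 bits -> integer codes 0..15

Block = Tuple[float, float, List[int]]  # (scale, minimum, codes)


def quantize_block(block: List[float]) -> Block:
    """Map one block of floats to 4-bit codes plus a float scale and minimum."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0  # constant block: any nonzero scale works
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in block]
    return scale, lo, codes


def dequantize_block(scale: float, lo: float, codes: List[int]) -> List[float]:
    """Reconstruct approximate weights: w ~= lo + code * scale."""
    return [lo + c * scale for c in codes]


def quantize(weights: List[float]) -> List[Block]:
    """Quantize a flat weight list block by block."""
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Concatenate the reconstructed blocks back into a flat list."""
    out: List[float] = []
    for scale, lo, codes in blocks:
        out.extend(dequantize_block(scale, lo, codes))
    return out


if __name__ == "__main__":
    import math

    weights = [math.sin(i * 0.1) for i in range(128)]
    restored = dequantize(quantize(weights))
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max abs error: {max_err:.4f}")
```

Each reconstructed weight lands within half a block scale of the original. Storing two full floats per 32 weights adds noticeable overhead on top of the 4-bit codes, which is why production formats quantize the scales as well.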

EN → FR Translator

Translation

A task-specific English-to-French translator. Two layers pruned, Q4K quantization applied, then LoRA-distilled for 800 steps on a French translation dataset. 10/10 eval pass: accurate output with correct grammar and idiom.

Based on Mistral 7B

Original size

~14 GB

Reduced size

~4 GB

Reduction

71%

Speed

Same as base

Windows, macOS, Linux

Available on request →
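The LoRA steps mentioned on these cards train a small low-rank update instead of the full weight matrices. The sketch below shows only the adapter itself, not the distillation loss or training loop (neither is specified here). It assumes the standard LoRA formulation h = Wx + (alpha / r)·BAx, with a frozen W, a randomly initialized A, and B initialized to zero so the adapter starts as a no-op. `LoRALinear` and its parameters are hypothetical names, written in pure Python for readability rather than speed.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical sketch: frozen linear layer W plus a trainable rank-r update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen base weights, shape (d_out, d_in)
        # A: (rank, d_in), small random init
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: (d_out, rank), zero init so the layer initially equals the base layer
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x: Vector) -> Vector:
        """h = W x + (alpha / r) * B (A x)."""
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B are trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 4096 × 4096 projection at rank 8, A and B together hold 2 × 8 × 4096 = 65,536 trainable values, compared with 16,777,216 in the full matrix.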


Movie Pitcher

Creative

Generates structured movie pitches from a genre and optional constraint. Built with ablation-guided layer selection — the first hoof model where compression decisions were driven by KL divergence scores rather than heuristics. 10/10 eval pass across 10 genres.

Based on LLaMA 3.2 3B

Original size

~6 GB

Reduced size

1.86 GB

Reduction

71%

Speed

Same as base

Linux

Available on request →
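Ablation-guided layer selection can be illustrated as follows, under assumptions this page does not state. Each candidate layer is skipped in turn, the model's next-token logits are recorded on a fixed set of prompts, and layers are ranked by the mean KL divergence KL(P_base ‖ P_ablated) between the base and ablated distributions. Low-KL layers change the output least and are the cheapest to remove or compress. Logits are passed as plain lists so the sketch does not depend on any model library; `rank_layers_by_kl` is a hypothetical helper, not hoof's actual code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_layers_by_kl(
    base_logits: List[List[float]],
    ablated_logits: Dict[int, List[List[float]]],
) -> List[Tuple[int, float]]:
    """Mean KL between base and layer-ablated next-token distributions per layer.

    base_logits[i] is the base model's logits for prompt i; ablated_logits[layer][i]
    is the same prompt with `layer` skipped. Returns (layer, mean_kl) sorted ascending,
    so the least influential layers come first.
    """
    scores = []
    for layer, runs in ablated_logits.items():
        kls = [kl_divergence(softmax(b), softmax(a)) for b, a in zip(base_logits, runs)]
        scores.append((layer, sum(kls) / len(kls)))
    return sorted(scores, key=lambda s: s[1])
```

KL is asymmetric; measuring it with the base distribution first penalizes ablations that lose probability mass on tokens the base model considered likely.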


Code Assistant

Code

A focused coding assistant that writes, explains, and debugs code across Python, JavaScript, Rust, SQL, and Bash. Finetuned with SFT on 367 curated examples. 9/10 on our eval suite: correct code, correct language, no padding.

Based on CodeLlama 7B

Original size

~14 GB

Reduced size

~3.8 GB

Reduction

73%

Speed

Same as base

Windows, macOS, Linux

Available on request →
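The Reduction figure on each card appears to be one minus the ratio of reduced to original size, rounded to a whole percent. A minimal check, assuming that definition and the sizes as listed: for the Code Assistant, 1 − 3.8 / 14 ≈ 0.729, or 73%. `reduction_percent` is a hypothetical helper for illustration.

```python
def reduction_percent(original_gb: float, reduced_gb: float) -> int:
    """Size reduction as a whole-number percentage: round(100 * (1 - reduced / original))."""
    if original_gb <= 0:
        raise ValueError("original size must be positive")
    return round(100 * (1 - reduced_gb / original_gb))


if __name__ == "__main__":
    print(reduction_percent(14, 3.8))  # Code Assistant: ~14 GB -> ~3.8 GB
```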


Need a different model or task?

These are examples. We can build custom task-specific models for your exact use case.

Make an Enquiry