Now in beta

AI models,
surgically reduced.

hoof strips a large language model down to exactly what your task needs — and packages it as a single, offline, dependency-free executable.

68% smaller
100% offline
0 dependencies
10/10 eval pass

One model. One task. One file.

We analyse the model, remove everything irrelevant to your task, then wrap it in a self-contained executable.

01

Point

Tell us the model and the task — translation, summarisation, code, vision, whatever you need.

02

Reduce

hoof performs surgical pruning: vocabulary, attention heads, and layers irrelevant to your task are removed.

03

Deploy

You receive a single executable. No Python. No GPU. No internet. Double-click and it runs.
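
To make the Reduce step concrete, here is a minimal sketch of one of the three cuts, vocabulary pruning. It is illustrative only and is not hoof's actual implementation: the function name `prune_vocabulary`, the toy vocabulary, and the list-of-lists embedding table are all assumptions made for this example. Head and layer pruning rest on the same idea (measure what the task uses, drop the rest) and are not shown.

```python
def prune_vocabulary(vocab, embeddings, task_corpus, always_keep=()):
    """Keep only the tokens seen in the task corpus, plus any in always_keep.

    vocab:       dict mapping token -> row index in `embeddings`
    embeddings:  list of embedding rows (one per token)
    task_corpus: iterable of token lists representative of the task
    Returns a re-indexed (vocab, embeddings) pair.
    Hypothetical helper for illustration, not part of hoof.
    """
    used = set(always_keep)
    for tokens in task_corpus:
        used.update(tokens)

    # Preserve the original relative order of the surviving tokens.
    kept = sorted((tok for tok in vocab if tok in used), key=vocab.get)

    new_vocab = {tok: i for i, tok in enumerate(kept)}
    new_embeddings = [embeddings[vocab[tok]] for tok in kept]
    return new_vocab, new_embeddings


# Toy data: six tokens, two-dimensional embeddings.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "chat": 3, "le": 4, "quantum": 5}
embeddings = [[float(i), float(i)] for i in range(len(vocab))]

# A tiny EN -> FR task corpus that never uses "quantum".
task_corpus = [["the", "cat"], ["le", "chat"]]

new_vocab, new_embeddings = prune_vocabulary(
    vocab, embeddings, task_corpus, always_keep=("<unk>",)
)
print(len(vocab), "->", len(new_vocab))  # 6 -> 5
```

In this toy run the unused token `quantum` and its embedding row are dropped, and the remaining rows are re-indexed so the smaller table stays contiguous.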

Learn more about the process →

Privacy by default

What you type
stays with you.

Every time you send a prompt to a cloud-hosted LLM, it travels to a server operated by someone else. Depending on the provider's terms, that company may log it, review it, and use it to train future models, and by accepting those terms you have typically agreed to let them.

hoof models run entirely on your own hardware. There is no API call, no cloud handshake, no third party between you and the model. Your inputs never leave your machine.

No data leaves your machine
Processing happens in memory on your own hardware. Your prompts, your code, your ideas: none of it is transmitted anywhere.

No exposure to corporate data practices
Cloud LLM providers may retain your inputs and use them for training. A hoof model has no server to report back to; it is just a file on your disk.

Works without internet
Once downloaded, hoof executables run fully offline. Cut the connection and the model keeps working.

Ready-made examples

Download and run in seconds. No setup required.

Joke Teller

Based on Llama 3.2 3B

Original: ~6 GB
Reduced: 1.93 GB
Reduction: 68%
Speed: Same as base
See all examples →

EN → FR Translator

Based on Mistral 7B

Original: ~14 GB
Reduced: ~4 GB
Reduction: 71%
Speed: Same as base
See all examples →

Code Assistant

Based on CodeLlama 7B

Original: ~14 GB
Reduced: ~3.8 GB
Reduction: 73%
Speed: Same as base
See all examples →
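
The Reduction figures on the cards follow directly from the listed sizes. Here is a quick check, assuming reduction means the share of the original size that was removed, rounded to the nearest percent. The helper name `reduction_percent` is ours, and the approximate sizes are taken at face value.

```python
def reduction_percent(original_gb, reduced_gb):
    """Share of the original size removed, as a whole-number percentage."""
    return round(100 * (original_gb - reduced_gb) / original_gb)


# (original size, reduced size) in GB, as listed on the cards above.
examples = {
    "Joke Teller": (6.0, 1.93),
    "EN -> FR Translator": (14.0, 4.0),
    "Code Assistant": (14.0, 3.8),
}

results = {name: reduction_percent(o, r) for name, (o, r) in examples.items()}
for name, pct in results.items():
    print(f"{name}: {pct}%")
# Joke Teller: 68%
# EN -> FR Translator: 71%
# Code Assistant: 73%
```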

Need a model built for your task?

We work with teams directly to build, optimise, and deliver custom task-specific models. Get in touch.

Make an Enquiry