hoof strips a large language model down to exactly what your task needs — and packages it as a single, offline, dependency-free executable.
We analyse the model, remove everything irrelevant to your task, then wrap it in a self-contained executable.
Tell us the model and the task — translation, summarisation, code, vision, whatever you need.
hoof performs surgical pruning: vocabulary, attention heads, and layers irrelevant to your task are removed.
You receive a single executable. No Python. No GPU. No internet. Double-click and it runs.
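To make the pruning step concrete, here is a minimal sketch of vocabulary pruning only, in plain Python. It is an illustration under stated assumptions, not hoof's actual pipeline: the function names `prune_vocabulary` and `remap`, the `always_keep` parameter, and the toy embedding table are all hypothetical. It assumes the task can be represented by a sample corpus of token-id sequences, and that any token never seen in that corpus can be dropped.

```python
def prune_vocabulary(embeddings, task_token_ids, always_keep=()):
    """Keep only the embedding rows for tokens that the task corpus uses.

    embeddings: list of rows, one row per token id in the original vocabulary.
    task_token_ids: iterable of token-id sequences sampled from the task.
    always_keep: token ids (for example special tokens) kept regardless.
    Returns (pruned_embeddings, old_to_new) where old_to_new maps each
    surviving original token id to its new, compact id.
    """
    seen = {tok for seq in task_token_ids for tok in seq}
    kept = sorted(seen | set(always_keep))
    old_to_new = {old: new for new, old in enumerate(kept)}
    pruned = [embeddings[old] for old in kept]
    return pruned, old_to_new


def remap(sequence, old_to_new):
    """Translate a sequence of original token ids into pruned ids."""
    return [old_to_new[tok] for tok in sequence]


# Toy data (hypothetical): an 8-token vocabulary with 2-dimensional embeddings.
embeddings = [[float(i), float(i) * 2] for i in range(8)]
corpus = [[0, 3, 3, 5], [5, 1]]

pruned, mapping = prune_vocabulary(embeddings, corpus, always_keep=(7,))
print(len(embeddings), "->", len(pruned), "embedding rows")
print(remap([3, 5, 1], mapping))
```

In this toy case the table shrinks from 8 rows to 5. A real model would also need its output projection sliced the same way, and pruning attention heads or layers requires separate importance analysis that this sketch does not attempt.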
Every time you send a prompt to a cloud-hosted LLM, it travels to a server you do not control. Depending on the provider's terms, the company operating it may log your prompt, review it, and use it to train future models, and by accepting those terms you have typically agreed to let them.
hoof models run entirely on your own hardware. There is no API call, no cloud handshake, no third party between you and the model. Your inputs never leave your machine.
Download and run in seconds. No setup required.
Based on Llama 3.2 3B
Based on Mistral 7B
Based on Code Llama 7B
We work with teams directly to build, optimise, and deliver custom task-specific models. Get in touch.
Make an Enquiry