Show HN: Local LLM Notepad – run a GPT-style model from a USB stick
What it is: a single 45 MB Windows .exe that embeds llama.cpp and a minimal Tk UI. Copy it (plus any .gguf model) to a flash drive, double-click on any Windows PC, and you’re chatting with an LLM — no admin rights, cloud account, or network connection required.
Why I built it: existing “local LLM” GUIs assume you can pip install, pass long CLI flags, or download GBs of extras.
I wanted something my less-technical colleagues could run during a client visit by literally plugging in a USB drive.
How it works: a PyInstaller one-file build bundles the Python runtime, llama_cpp_python, and the UI into a single PE.
On first launch, it memory-maps the .gguf; subsequent prompts stream at ~20 tok/s on an i7-10750H with gemma-3-1b-it-Q4_K_M.gguf (0.8 GB).
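For the curious, the mmap-backed load and streaming generation described above look roughly like this with llama_cpp_python (a sketch, not the app’s actual code; `stream_reply` and its parameters are my own placeholders):

```python
def stream_reply(model_path, prompt, max_tokens=256):
    """Sketch: memory-map a .gguf and stream tokens with llama_cpp_python.

    use_mmap=True lets the OS page the model file in lazily, so launch is
    fast and a second run on the same machine is nearly instant.
    """
    from llama_cpp import Llama  # imported here: needs llama-cpp-python installed

    llm = Llama(model_path=model_path, use_mmap=True, n_ctx=2048)
    # stream=True yields completion chunks as each token is generated
    for chunk in llm(prompt, max_tokens=max_tokens, stream=True):
        yield chunk["choices"][0]["text"]
```

Calling it with a local gemma-3-1b-it-Q4_K_M.gguf path would stream text token by token, which is what feeds the UI below.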
A tick-driven render loop keeps the UI responsive while llama.cpp crunches.
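The tick-driven pattern is roughly this (a minimal sketch under my own assumptions, not the app’s code): the llama.cpp worker thread pushes tokens onto a queue, and a Tk `after()` callback drains it every few milliseconds so the main thread never blocks. The drain step is shown here with a plain list standing in for the Tk Text widget:

```python
import queue

def drain(token_queue, sink):
    """One UI 'tick': move any tokens the worker thread has produced
    into the display (sink is a list standing in for a Tk Text insert)."""
    while True:
        try:
            tok = token_queue.get_nowait()
        except queue.Empty:
            break
        sink.append(tok)
    # In the real app this would reschedule itself on the Tk event loop:
    #   root.after(30, drain, token_queue, text_widget)

# The worker thread would call token_queue.put(tok) per generated token.
q = queue.Queue()
for t in ["Hello", ",", " world"]:
    q.put(t)
out = []
drain(q, out)
print("".join(out))  # -> Hello, world
```

Because each tick only drains what is already queued and returns, the event loop never stalls even while generation runs flat out.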
A parser bold-underlines every token that originated in the prompt; Ctrl+click pops a “source viewer” to trace facts. (Helps spot hallucinations fast.)
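The provenance check behind the bold-underlining can be sketched as a set-membership test over normalized words (my guess at the approach; the actual parser may be more elaborate, and in the app the `True` flags would map to a bold-underline tag on the Tk Text widget):

```python
import re

def mark_prompt_tokens(prompt, reply):
    """Return (word, from_prompt) pairs for each word in the reply.

    Words that also appear in the prompt (case-insensitive) are the ones
    that would be rendered bold-underlined as 'sourced from the prompt'.
    """
    prompt_words = {w.lower() for w in re.findall(r"\w+", prompt)}
    return [(w, w.lower() in prompt_words) for w in re.findall(r"\w+", reply)]

pairs = mark_prompt_tokens("The capital of France is?",
                           "Paris is the capital of France.")
print(pairs[0])  # -> ('Paris', False): not in the prompt, worth checking
```

Anything flagged False (here “Paris”) is text the model introduced itself, which is exactly where hallucinations hide.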
> walk up to any computer
Windows users seem to think their OS is ubiquitous. But in fact for most hackers reading this site, using Windows is a huge step backwards in productivity and capability.
The facts say otherwise, though: Windows holds 70%+ of global desktop share versus 4.1% for Linux. https://gs.statcounter.com/os-market-share/desktop/worldwide
> But in fact for most hackers reading this site
https://survey.stackoverflow.co/2024/technology#1-operating-...
Surely you're hinting at Linux, in which case this runs fine with WINE
Why not llamafile? Runs on everything from toothbrushes to toasters...
Seconded for llamafile; here’s the link for reference: https://github.com/Mozilla-Ocho/llamafile . It works on all major platforms, and its tooling makes it easy to create new llamafiles from new models. The only caveat is Windows, which caps executables at 4 GB, so there you have to ship a small llamafile launcher plus the .gguf file itself — but that approach works everywhere anyway.
Interesting, will definitely try it. What performance can be expected? What other models perform OK with this?
Wonder if you can use/interface with those Coral accelerator boards.