Keith Schacht’s Weblog


📄 Remember Clippy: Screen-aware voice AI in the browser

12th February 2026

A friend and I built a browser prototype that answers questions about whatever’s on your screen using getDisplayMedia, client-side wake-word detection, and server-side multimodal inference.
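To make that pipeline concrete, here is a minimal sketch of a single turn: ignore speech until a wake word appears, then capture the screen and send the question to the model. It is an illustration, not the prototype's code. The wake phrase `"hey clippy"`, the function names, and the idea of gating on a text transcript are all assumptions; in a browser, `capture` would typically wrap a frame taken from the `getDisplayMedia` stream, and `ask` would call the server.

```typescript
// Hypothetical stage signatures; none of these names come from the prototype.
type CaptureScreenshot = () => Promise<string>; // e.g. a data URL of the shared screen
type AskModel = (question: string, screenshot: string) => Promise<string>;

const WAKE_WORD = "hey clippy"; // assumed phrase, for illustration only

// Returns the question that follows the wake word, or null when the wake word
// is missing or nothing was asked after it.
function extractQuestion(transcript: string, wakeWord: string = WAKE_WORD): string | null {
  const index = transcript.toLowerCase().indexOf(wakeWord.toLowerCase());
  if (index === -1) return null;
  const question = transcript
    .slice(index + wakeWord.length)
    .replace(/^[\s,.:!?]+/, "") // drop punctuation right after the wake word
    .trim();
  return question.length > 0 ? question : null;
}

// One turn: speech without the wake word is dropped on the client; otherwise
// a fresh screenshot is captured and sent along with the question.
async function handleTranscript(
  transcript: string,
  capture: CaptureScreenshot,
  ask: AskModel,
): Promise<string | null> {
  const question = extractQuestion(transcript);
  if (question === null) return null; // no capture, no server call
  const screenshot = await capture();
  return ask(question, screenshot);
}
```

Passing `capture` and `ask` in as parameters keeps the gating logic testable outside a browser, and it makes the point that nothing is captured or uploaded until the wake word is heard.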

Try it here: clippy.keithschacht.com
Best in Chrome. Desktop only. No sign up.

Hard parts:

  • Getting the model to point to specific UI elements
  • Keeping it coherent across multi-step workflows (“Help me create a sword in Tinkercad”)
  • Preventing the infinite mirror effect and avoiding confusion between window and full-screen sharing
  • Keeping voice → screenshot → inference → voice latency low enough to feel conversational
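Two of the items above lend themselves to a small sketch. The first helper assumes the model returns a point in normalized coordinates (0 to 1, origin at the top left) and maps it onto the screenshot's pixel grid so a pointer can be drawn there. The second turns the share type into a one-line hint for the model. Both are assumptions for illustration, not how the prototype does it; the type names and hint wording are made up. The surface values themselves are real: a browser reports them via `track.getSettings().displaySurface` on the captured video track.

```typescript
// Assumed convention: the model answers with x and y in [0, 1], origin top-left.
interface NormalizedPoint { x: number; y: number }
interface PixelPoint { left: number; top: number }

// Maps a normalized point onto a screenshot of the given size, clamping
// out-of-range values so the pointer never lands off-screen.
function toPixels(point: NormalizedPoint, width: number, height: number): PixelPoint {
  const clamp = (value: number): number => Math.min(1, Math.max(0, value));
  return {
    left: Math.round(clamp(point.x) * width),
    top: Math.round(clamp(point.y) * height),
  };
}

// Values a browser may report for a screen-capture track's displaySurface setting.
type DisplaySurface = "monitor" | "window" | "browser";

// Hypothetical prompt hint so the model knows what kind of image it is seeing.
function surfaceHint(surface: DisplaySurface | undefined): string {
  switch (surface) {
    case "monitor":
      return "The user is sharing their entire screen; the assistant's own tab may appear in the screenshot.";
    case "window":
      return "The user is sharing a single application window.";
    case "browser":
      return "The user is sharing a single browser tab.";
    default:
      return "The share type is unknown; do not assume the whole screen is visible.";
  }
}
```

For example, `toPixels({ x: 0.5, y: 0.25 }, 1920, 1080)` gives `{ left: 960, top: 270 }`.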

We packaged it as “Clippy” for fun, but the real experiment is letting the model request fresh screenshots through tool calls, so it can gather more context on its own.
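A rough sketch of that idea follows. It is not the prototype's implementation: the `take_screenshot` tool name, the message shapes, and the cap of three screenshots per question are assumptions chosen to keep the example small. The loop asks the model for its next step, and whenever the model asks for a screenshot, it captures one and adds it to the context before asking again.

```typescript
// Hypothetical message and model types, for illustration only.
type ModelStep =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; tool: "take_screenshot" };

type ContextItem =
  | { role: "user"; text: string }
  | { role: "screenshot"; image: string };

type Model = (context: ContextItem[]) => Promise<ModelStep>;

// Lets the model pull in fresh screenshots until it is ready to answer,
// with a cap so a confused model cannot loop forever.
async function answerWithScreenshots(
  question: string,
  model: Model,
  takeScreenshot: () => Promise<string>,
  maxScreenshots: number = 3,
): Promise<string> {
  const context: ContextItem[] = [{ role: "user", text: question }];
  for (let used = 0; used <= maxScreenshots; used++) {
    const step = await model(context);
    if (step.kind === "answer") return step.text;
    if (used === maxScreenshots) break; // screenshot budget spent
    context.push({ role: "screenshot", image: await takeScreenshot() });
  }
  return "I could not gather enough context to answer.";
}
```

The cap matters for latency as much as for safety, since every extra screenshot adds another capture and inference round trip before the user hears a reply.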

One practical use case is remote tech support: the next time my mom calls, I’m sending her this instead of screen sharing.

Email me: krschacht at gmail

This is Remember Clippy: Screen-aware voice AI in the browser by Keith Schacht, posted on 12th February 2026.
