<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Keith Schacht's Weblog: AI</title><link href="https://keithschacht.com/" rel="alternate"/><link href="https://keithschacht.com/tags/AI.atom" rel="self"/><id>https://keithschacht.com/</id><updated>2026-02-12T21:31:05+00:00</updated><author><name>Keith Schacht</name></author><entry><title>Remember Clippy: Screen-aware voice AI in the browser</title><link href="https://keithschacht.com/2026/Feb/12/remember-clippy-screen-aware-voice-ai-in-the-browser/#atom-tag" rel="alternate"/><published>2026-02-12T21:31:05+00:00</published><updated>2026-02-12T21:31:05+00:00</updated><id>https://keithschacht.com/2026/Feb/12/remember-clippy-screen-aware-voice-ai-in-the-browser/#atom-tag</id><summary type="html">
    &lt;p&gt;A friend and I built a browser prototype that answers questions about whatever’s on your screen using getDisplayMedia, client-side wake-word detection, and server-side multimodal inference.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Try it here: &lt;a href="https://clippy.keithschacht.com"&gt;clippy.keithschacht.com&lt;/a&gt;&lt;br /&gt;
Best in Chrome. Desktop only. No sign up.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Hard parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Getting the model to point to specific UI elements&lt;/li&gt;
&lt;li&gt;Keeping it coherent across multi-step workflows (“Help me create a sword in Tinkercad”)&lt;/li&gt;
&lt;li&gt;Preventing the infinite-mirror effect and confusion between window and full-screen sharing&lt;/li&gt;
&lt;li&gt;Keeping voice → screenshot → inference → voice latency low enough to feel conversational&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We packaged it as “Clippy” for fun, but the real experiment is letting a model tool-call fresh screenshots to help it gather more context.&lt;/p&gt;
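&lt;p&gt;Here is a minimal sketch of that screenshot-on-demand loop, assuming a synchronous stand-in for the model. The names (&lt;code&gt;answerWithScreenshots&lt;/code&gt;, &lt;code&gt;take_screenshot&lt;/code&gt;, &lt;code&gt;captureScreen&lt;/code&gt;) are hypothetical, not Clippy’s actual code. In the browser, &lt;code&gt;captureScreen&lt;/code&gt; would presumably grab a frame from the &lt;code&gt;getDisplayMedia&lt;/code&gt; stream, and a real model call would be asynchronous.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```typescript
// Hypothetical sketch, not Clippy's implementation: the model either answers
// or asks for a fresh screenshot via a tool call.
type ModelReply =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; tool: "take_screenshot" };

type Message = { role: "user" | "assistant" | "tool"; content: string };

// Call the model; whenever it requests a screenshot, capture one, append it
// to the history, and ask again. Give up after maxSteps model calls.
function answerWithScreenshots(
  question: string,
  callModel: (history: Message[]) => ModelReply,
  captureScreen: () => string,
  maxSteps = 3,
): string {
  const history: Message[] = [{ role: "user", content: question }];
  for (let step = 0; maxSteps > step; step++) {
    const reply = callModel(history);
    if (reply.kind === "answer") return reply.text;
    // The model wants more context: attach a fresh capture and loop.
    history.push({ role: "tool", content: captureScreen() });
  }
  return "Sorry, I couldn't work that out.";
}

// Usage with a fake model that requests one screenshot, then answers.
const fakeModel = (history: Message[]): ModelReply =>
  history.some((m) => m.role === "tool")
    ? { kind: "answer", text: "Click the Share button in the top right." }
    : { kind: "tool_call", tool: "take_screenshot" };

let captures = 0;
const answer = answerWithScreenshots("How do I share this?", fakeModel, () => {
  captures++;
  return "data:image/png;base64,..."; // placeholder, not real image data
});
console.log(answer, captures); // → Click the Share button in the top right. 1
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Capping the number of steps keeps a confused model from requesting screenshots forever, which matters when every extra round trip adds to the voice latency.&lt;/p&gt;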

&lt;p&gt;One practical use case is remote tech support — I'm sending this to my mom next time she calls instead of screen sharing.&lt;/p&gt;

&lt;p&gt;Comment on the &lt;a href="https://news.ycombinator.com/item?id=46403351"&gt;HN discussion&lt;/a&gt;&lt;br /&gt;
Email me: krschacht at gmail&lt;br /&gt;
&lt;a href="/about#subscribe"&gt;Subscribe&lt;/a&gt; — updates on this + other AI experiments&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="AI"/></entry><entry><title>Task Master: Voice-first todo list that updates live as you talk</title><link href="https://keithschacht.com/2025/Dec/27/voice-first-todo-list-that-updates-live-as-you-talk/#atom-tag" rel="alternate"/><published>2025-12-27T15:34:46+00:00</published><updated>2025-12-27T15:34:46+00:00</updated><id>https://keithschacht.com/2025/Dec/27/voice-first-todo-list-that-updates-live-as-you-talk/#atom-tag</id><summary type="html">
    &lt;p&gt;I built a demo of a voice AI task manager. You speak naturally and it updates your visible task list in real time.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Try it here: &lt;a href="https://taskmaster.keithschacht.com"&gt;taskmaster.keithschacht.com&lt;/a&gt;&lt;br /&gt;
Web-based. Desktop or mobile. No sign up.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;I find it helpful to talk aloud to figure out my priorities. I’ve wired up voice AI to many daily routines and this task manager is one of the most useful. In the morning, I sit down at my computer with a cup of coffee, pull up Task Master, and start rambling:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;“Mark that first task as done. Actually, undo that. Add a task to proofread it one more time. Move that to the top. Snooze the next two until tomorrow …”&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I built this to explore AI UI. My key observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I grew up with sci-fi characters talking to computers and wanted to test whether that interaction actually works in practice.&lt;/li&gt;
&lt;li&gt;Voice is great for input but poor for output beyond short responses; visual feedback has much higher bandwidth.&lt;/li&gt;
&lt;li&gt;Speaking is 2–3× faster than typing, and LLMs work great when you can talk in a loose, stream-of-consciousness way.&lt;/li&gt;
&lt;li&gt;We’re in the command-line era of LLM interfaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s built on LiveKit with a Rails web UI. It listens continuously, maps speech to tool calls, and operates with the full task-list state so it can make sense of ambiguous references (e.g. “the third item,” “the thing with my kids”).&lt;/p&gt;
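&lt;p&gt;To make “maps speech to tool calls” concrete, here is a minimal sketch of the state side only, assuming the speech layer has already turned an utterance into a structured call with a 1-based position (“the third item” becomes &lt;code&gt;index: 3&lt;/code&gt;). The tool names and the &lt;code&gt;TaskList&lt;/code&gt; class are hypothetical, not taken from the Task Master repo.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```typescript
// Hypothetical tool calls a speech-to-tool-call layer might emit.
type ToolCall =
  | { name: "add_task"; title: string }
  | { name: "complete_task"; index: number } // 1-based, as spoken
  | { name: "move_to_top"; index: number }
  | { name: "undo" };

type Task = { title: string; done: boolean };

class TaskList {
  tasks: Task[] = [];
  private history: Task[][] = []; // snapshots for "actually, undo that"

  apply(call: ToolCall): void {
    if (call.name === "undo") {
      const previous = this.history.pop();
      if (previous) this.tasks = previous;
      return;
    }
    // Edit a copy so a failed call leaves the list and undo history untouched.
    const next = this.tasks.map((t) => ({ ...t }));
    switch (call.name) {
      case "add_task":
        next.push({ title: call.title, done: false });
        break;
      case "complete_task":
        TaskList.at(next, call.index).done = true;
        break;
      case "move_to_top": {
        const task = TaskList.at(next, call.index);
        next.splice(call.index - 1, 1);
        next.unshift(task);
        break;
      }
    }
    this.history.push(this.tasks);
    this.tasks = next;
  }

  // Resolve a spoken 1-based position, rejecting positions that don't exist.
  private static at(tasks: Task[], index: number): Task {
    const task = tasks[index - 1];
    if (!task) throw new Error(`No task at position ${index}`);
    return task;
  }
}

// Replaying part of the morning ramble quoted above.
const list = new TaskList();
list.apply({ name: "add_task", title: "Write blog post" });
list.apply({ name: "add_task", title: "Email Sam" });
list.apply({ name: "complete_task", index: 1 }); // "Mark that first task as done."
list.apply({ name: "undo" });                    // "Actually, undo that."
list.apply({ name: "add_task", title: "Proofread one more time" });
list.apply({ name: "move_to_top", index: 3 });   // "Move that to the top."
console.log(list.tasks.map((t) => t.title).join(", "));
// → Proofread one more time, Write blog post, Email Sam
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the whole list in memory, with snapshots, is what lets references like “that” and “undo that” resolve against the state the user is looking at.&lt;/p&gt;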

&lt;p&gt;This is intentionally rough and incomplete: it’s a demo, not a production app. Tasks are saved server-side and tied to your anonymous browser session. My personal version of this app has dates, task descriptions, and the ability to snooze items. For this demo I focused on the core interactions, aiming to make them feel polished and smooth so people could try it. I’m interested in feedback on the interaction model rather than on this as a product.&lt;/p&gt;

&lt;p&gt;Code is here: &lt;a href="https://github.com/keithschacht/taskmaster"&gt;github.com/keithschacht/taskmaster&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m especially curious about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For you, does speaking feel meaningfully faster or more fluid than clicking and typing?&lt;/li&gt;
&lt;li&gt;When you make a mistake or want to edit, does correcting it feel natural?&lt;/li&gt;
&lt;li&gt;Want to collaborate on building personal AI tools?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Comment on the &lt;a href="https://news.ycombinator.com/item?id=46403351"&gt;HN discussion&lt;/a&gt;&lt;br /&gt;
Email me: krschacht at gmail&lt;br /&gt;
&lt;a href="/about#subscribe"&gt;Subscribe&lt;/a&gt; — updates on this + other AI experiments&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="AI"/></entry><entry><title>Quoting Dario’s interview with Lex Fridman</title><link href="https://keithschacht.com/2025/Feb/9/dario-on-agi/#atom-tag" rel="alternate"/><published>2025-02-09T22:34:27+00:00</published><updated>2025-02-09T22:34:27+00:00</updated><id>https://keithschacht.com/2025/Feb/9/dario-on-agi/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://youtu.be/ugvHCXCOmm4?si=dl_9Yb-13COwk_QU"&gt;&lt;p&gt;Dario from Anthropic is the most articulate and thoughtful person I’ve found on the subject of AI. In this interview he made a fantastic point I had not heard:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let’s say it was 1995 and Moore’s Law was making computers faster and everyone was saying, “some day we’ll have super computers and we’ll be able to sequence the genome and do all these great things.” But there is no discrete point where we pass a threshold and then have super computers.&lt;/p&gt;
&lt;p&gt;Super computers is a term we use but it’s a vague term we use to describe computers which are a lot faster than what we have today. I feel the same way about AGI. There is a smooth exponential. And if, by AGI, you mean AI is getting better and better and will do more and more of what humans do until it’s smarter than humans and it will continue to get smarter from there, then I believe in AGI. But if AGI is some discrete thing, which is how many people talk about it, then it’s just a meaningless buzzword.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But he then goes on to reference his essay, “Machines of Loving Grace,” which gives one of the best descriptions I’ve read of an almost discrete threshold! He doesn’t call it AGI; instead he uses a great descriptor, “a country of geniuses in a datacenter”:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.&lt;/p&gt;
&lt;p&gt;In addition to just being a “smart thing you talk to”, it has all the “interfaces” available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.&lt;/p&gt;
&lt;p&gt;It does not just passively answer questions; instead, it can be given tasks that take hours, days, or weeks to complete, and then goes off and does those tasks autonomously, in the way a smart employee would, asking for clarification as necessary.&lt;/p&gt;
&lt;p&gt;It does not have a physical embodiment (other than living on a computer screen), but it can control existing physical tools, robots, or laboratory equipment through a computer; in theory it could even design robots or equipment for itself to use.&lt;/p&gt;
&lt;p&gt;The resources used to train the model can be repurposed to run millions of instances of it (this matches projected cluster sizes by ~2027), and the model can absorb information and generate actions at roughly 10x-100x human speed. It may however be limited by the response time of the physical world or of software it interacts with.&lt;/p&gt;
&lt;p&gt;Each of these million copies can act independently on unrelated tasks, or if needed can all work together in the same way humans would collaborate, perhaps with different subpopulations fine-tuned to be especially good at particular tasks.&lt;/p&gt;
&lt;/blockquote&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://youtu.be/ugvHCXCOmm4?si=dl_9Yb-13COwk_QU"&gt;Dario’s interview with Lex Friedman&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry><entry><title>Study: Waymo robocars are safer than human drivers</title><link href="https://keithschacht.com/2025/Jan/7/study-waymo-robocars-are-safer-than-human-drivers/#atom-tag" rel="alternate"/><published>2025-01-07T18:59:03+00:00</published><updated>2025-01-07T18:59:03+00:00</updated><id>https://keithschacht.com/2025/Jan/7/study-waymo-robocars-are-safer-than-human-drivers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://cleantechnica.com/2025/01/04/waymo-robotaxis-safer-than-any-human-driven-cars-much-safer"&gt;Study: Waymo robocars are safer than human drivers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An insurance company reported that across 25 million fully autonomous miles, Waymo showed an 88% reduction in property damage claims and a 92% reduction in bodily injury claims relative to human drivers. In other words, if humans had driven those same miles, there would have been roughly 8x as many property damage claims and 12.5x as many bodily injury claims.&lt;/p&gt;
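&lt;p&gt;Converting a percentage reduction into a multiple is just the reciprocal of what remains: if Waymo’s claim rate is a fraction &lt;i&gt;r&lt;/i&gt; lower than the human baseline, humans generate 1/(1 − &lt;i&gt;r&lt;/i&gt;) times as many claims over the same miles.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```latex
\frac{\text{human claims}}{\text{Waymo claims}} = \frac{1}{1 - r}, \qquad
\frac{1}{1 - 0.88} = \frac{1}{0.12} \approx 8.3, \qquad
\frac{1}{1 - 0.92} = \frac{1}{0.08} = 12.5
```
&lt;/code&gt;&lt;/pre&gt;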

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://danielmiessler.com/blog/"&gt;Daniel Miessler&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry><entry><title>Vampire game based around AI voice: SUCK UP!</title><link href="https://keithschacht.com/2024/Nov/26/vampire-game-based-around-ai-voice-suck-up/#atom-tag" rel="alternate"/><published>2024-11-26T15:47:33+00:00</published><updated>2024-11-26T15:47:33+00:00</updated><id>https://keithschacht.com/2024/Nov/26/vampire-game-based-around-ai-voice-suck-up/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=811JkxLfvoA"&gt;Vampire game based around AI voice: SUCK UP!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is one of the best examples I’ve seen of a game creatively incorporating LLMs. You play a vampire who walks up to houses in a neighborhood. As the player, you actually talk aloud to your computer, and the NPCs talk back. You have to convince them to let you into their houses.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry><entry><title>Jina AI tool for simplifying webpages</title><link href="https://keithschacht.com/2024/Nov/3/jina-ai-tool-for-simplifying-webpages/#atom-tag" rel="alternate"/><published>2024-11-03T20:49:03+00:00</published><updated>2024-11-03T20:49:03+00:00</updated><id>https://keithschacht.com/2024/Nov/3/jina-ai-tool-for-simplifying-webpages/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jina.ai/reader"&gt;Jina AI tool for simplifying webpages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a really handy tool for simplifying a webpage before passing it into an LLM. It turns HTML into semi-structured Markdown.&lt;/p&gt;
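&lt;p&gt;As described on Jina’s Reader page at the time of writing, you use it by prefixing a page’s URL with &lt;code&gt;https://r.jina.ai/&lt;/code&gt;. A minimal sketch, assuming a runtime with a global &lt;code&gt;fetch&lt;/code&gt; (modern browsers, Node 18+); the helper names are hypothetical, and API keys, rate limits, and request options are left out, so check Jina’s docs before relying on it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```typescript
// Jina Reader: prefix the target URL with this endpoint to get Markdown back.
const READER_PREFIX = "https://r.jina.ai/";

// Hypothetical helper: build the Reader URL, validating the input first.
// new URL() throws a TypeError on malformed input and normalizes the href.
function readerUrl(pageUrl: string): string {
  return READER_PREFIX + new URL(pageUrl).href;
}

// Hypothetical helper: fetch the simplified, LLM-friendly version of a page.
async function fetchAsMarkdown(pageUrl: string) {
  const res = await fetch(readerUrl(pageUrl));
  if (!res.ok) throw new Error(`Reader request failed: HTTP ${res.status}`);
  return res.text();
}

console.log(readerUrl("https://example.com/blog?id=1"));
// → https://r.jina.ai/https://example.com/blog?id=1
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Markdown returned by &lt;code&gt;fetchAsMarkdown&lt;/code&gt; can then go straight into a prompt, which is usually far smaller than the raw HTML.&lt;/p&gt;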


    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry><entry><title>Using LLM to process video</title><link href="https://keithschacht.com/2024/Nov/3/using-llm-to-process-video/#atom-tag" rel="alternate"/><published>2024-11-03T20:11:46+00:00</published><updated>2024-11-03T20:11:46+00:00</updated><id>https://keithschacht.com/2024/Nov/3/using-llm-to-process-video/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonw.substack.com/p/video-scraping-using-google-gemini?open=false#%C2%A7video-scraping-extracting-json-data-from-a-second-screen-capture-for-less-than-th-of-a-cent"&gt;Using LLM to process video&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I have done a lot of experimenting with passing screenshots into an LLM to give it additional context, but I really want to try passing video directly into an LLM. I believe Gemini is currently the only major model that accepts video input directly. This is a summary of Simon Willison’s recent experiment using Gemini on video.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry><entry><title>Automating app development with LLM</title><link href="https://keithschacht.com/2024/Nov/3/automating-app-development-with-llm/#atom-tag" rel="alternate"/><published>2024-11-03T19:37:19+00:00</published><updated>2024-11-03T19:37:19+00:00</updated><id>https://keithschacht.com/2024/Nov/3/automating-app-development-with-llm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://codeinthehole.com/tips/llm-tdd-loop-script/"&gt;Automating app development with LLM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I keep automating more of my day-to-day programming with various tools, and this developer built a clever wrapper around LLM. He writes a unit test in Python, passes it to LLM, and it writes the code needed to make the test pass. This technique could prove useful for what I want to do with Rails development.&lt;/p&gt;
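&lt;p&gt;The core of that idea is a generate-and-test loop. Here is a minimal sketch in TypeScript rather than a copy of the linked script: &lt;code&gt;generateCode&lt;/code&gt; stands in for the LLM call and &lt;code&gt;runTests&lt;/code&gt; for the test runner, both hypothetical, and feeding the last failure back into the next attempt is an assumption about how such a loop converges.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```typescript
type TestResult = { passed: boolean; output: string };

// Hypothetical generate-and-test loop: ask the model for code, run the
// test, and feed any failure output into the next attempt.
function tddLoop(
  testSource: string,
  generateCode: (testSource: string, lastFailure: string) => string,
  runTests: (code: string) => TestResult,
  maxAttempts = 5,
): { code: string; attempts: number } | null {
  let lastFailure = "";
  for (let attempt = 1; maxAttempts >= attempt; attempt++) {
    const code = generateCode(testSource, lastFailure);
    const result = runTests(code);
    if (result.passed) return { code, attempts: attempt };
    lastFailure = result.output; // give the model the error to fix
  }
  return null; // out of attempts; hand back to a human
}

// Usage with fakes: the "LLM" gets it wrong once, then fixes it.
const fakeLlm = (_test: string, lastFailure: string) =>
  lastFailure === "" ? "def add(a, b): return a - b" : "def add(a, b): return a + b";
const fakeRunner = (code: string): TestResult =>
  code.includes("a + b")
    ? { passed: true, output: "1 passed" }
    : { passed: false, output: "AssertionError: add(2, 3) != 5" };

const outcome = tddLoop("assert add(2, 3) == 5", fakeLlm, fakeRunner);
console.log(outcome?.attempts); // → 2
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For Rails, &lt;code&gt;runTests&lt;/code&gt; would shell out to something like &lt;code&gt;bin/rails test&lt;/code&gt; instead of the fake runner. The attempt cap matters because a model can oscillate between two wrong answers.&lt;/p&gt;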


    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry><entry><title>Andreessen and Horowitz on AI Robots</title><link href="https://keithschacht.com/2024/Nov/3/andressen-horowitz-discussion/#atom-tag" rel="alternate"/><published>2024-11-03T18:59:49+00:00</published><updated>2024-11-03T18:59:49+00:00</updated><id>https://keithschacht.com/2024/Nov/3/andressen-horowitz-discussion/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://youtu.be/1_ZB7O_9hlQ?si=g3GZ33j4z3RXPWN5"&gt;Andressen and Horowitz on AI Robots&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
In this discussion between Andreessen and Horowitz, they pointed out that we are on the verge of embodied AI being a generally useful too. My key takeaway was this:&lt;/p&gt;

&lt;p&gt;General-purpose robots could do everything from serving as assistants in your home to building houses and fighting wars. This is being made possible by advances in AI software combined with hardware advances in actuators, batteries, and vision systems.&lt;/p&gt;

&lt;p&gt;The U.S. currently leads on the software side of AI, but China has a significant lead on the hardware side. The U.S. has basically made manufacturing illegal at home through environmental regulation, minimum wage requirements, and unions, so U.S. companies have outsourced advanced hardware manufacturing to China. Now more than ever, given our strained relationship with China, we need to build these capabilities in the U.S., yet at the same time U.S. regulators have their eyes on regulating AI software.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://keithschacht.com/tags/AI"&gt;AI&lt;/a&gt;&lt;/p&gt;



</summary><category term="AI"/></entry></feed>