April 28, 2026·3 min

I Built a Local AI Study Buddy in 30 Minutes

A small local pipeline that watches what I am reading, turns it into OCR logs, summarizes the concepts, and posts study updates to Discord.

#automation #python #local-ai #ollama #discord #study

I wanted a simple thing: an AI system that knows what I am studying without me manually logging anything.

Not a productivity app. Not another dashboard. Not a chatbot I have to keep updating.

Just something running quietly on my laptop, watching what I am reading, extracting the topic, and sending me summaries later.

So I built Study Buddy v1.

What it does

The first version is intentionally small:

screen crop -> OCR -> response log -> batch summaries -> Discord

A Python script takes a screenshot every few seconds, crops the part of the screen I care about, hashes it, and skips it if nothing changed. If the image changed, it saves it as img.png and sends it to a local Ollama vision/OCR model.
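The skip-if-unchanged step can be sketched roughly like this. It is a minimal sketch, not the actual script: `ChangeDetector` is a hypothetical name, and it hashes raw bytes so it works with whatever the screenshot library hands back (for example, the pixel bytes of the cropped region).

```python
import hashlib


class ChangeDetector:
    """Remembers the hash of the last frame and reports whether a new one differs."""

    def __init__(self):
        self._last_digest = None

    def has_changed(self, frame_bytes: bytes) -> bool:
        # Hash the cropped frame; identical bytes always give an identical digest.
        digest = hashlib.sha256(frame_bytes).hexdigest()
        if digest == self._last_digest:
            return False  # nothing changed, so skip the OCR call
        self._last_digest = digest
        return True
```

An exact hash treats any single-pixel difference as a change, so a blinking cursor or a clock inside the crop would trigger a new OCR call every time. That is one more reason to keep the crop box tight.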

The response is appended to responses.jsonl.
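Appending to a JSONL file could look something like the sketch below. The field names `ts` and `text` are assumptions for illustration; the real log may store different fields.

```python
import json
import time
from pathlib import Path


def append_response(log_path, text, timestamp=None):
    """Append one OCR response as a single JSON line and return the event."""
    event = {
        "ts": time.time() if timestamp is None else timestamp,  # hypothetical field
        "text": text,  # hypothetical field
    }
    # json.dumps escapes newlines inside the text, so each event stays on one line.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")
    return event
```

One event per line means a reader can consume the file incrementally without parsing the whole thing.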

A second process watches that file. When enough new responses have arrived, it batches them, asks another local model to extract the key concepts, and posts the summary to Discord.
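The "when enough new responses have arrived" logic can be sketched as a pure function. `complete_batches`, `processed`, and `batch_size` are hypothetical names, and the actual batch size is not stated here, so it is left as a parameter.

```python
def complete_batches(events, processed, batch_size):
    """Split unprocessed events into full batches; leave any remainder for later.

    Returns (batches, new_processed), where new_processed is the updated offset
    into `events` after consuming only complete batches.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pending = events[processed:]
    full = len(pending) // batch_size
    batches = [pending[i * batch_size:(i + 1) * batch_size] for i in range(full)]
    return batches, processed + full * batch_size
```

For example, with 7 events and a batch size of 3, this returns two batches and an offset of 6. The seventh event waits until two more arrive.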

That is the whole system.

Why this is useful

The important part is that I do not have to tell it what I am studying.

If I am reading about OS scheduling, it sees that. If I switch to probability, it sees that. If I spend an hour on one niche concept, the summaries start reflecting that.

It becomes a passive study log:

What did I actually read?
What topics kept coming up?
What niche concepts did I touch?
What should I revise later?

The output is not perfect, but it is already useful. It gives me a lightweight memory of my study session without requiring discipline from me.

Fully local

The part I care about most: the screenshots and OCR text never leave my laptop. The only thing that goes out is the finished summary posted to Discord.

The screenshots are processed locally. The OCR model runs through Ollama. The summarizer also runs through Ollama. The files are local JSONL and JSON files.
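A call to a local Ollama vision model might look like the sketch below. The `/api/generate` endpoint, the `images` field (base64 strings), and `stream` are part of Ollama's REST API, and port 11434 is its default. The function names are hypothetical, and the model name and prompt are left as parameters because the post does not name them.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ocr_request(image_bytes, model, prompt):
    """Build (but do not send) a non-streaming generate request with one image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(image_bytes, model, prompt, timeout=120):
    """Send the request to the local Ollama server and return the generated text."""
    request = build_ocr_request(image_bytes, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Because the URL points at localhost, the image never leaves the machine. Swapping models is just a different `model` string.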

I can choose the model. I can choose the prompt. I can change the crop box. I can decide what gets stored, archived, summarized, or ignored.

There is no account, no hosted backend, no vendor-specific workflow.

It is just my laptop observing my laptop.

The architecture

The first mistake would have been to make one giant script.

Instead, I split it into processes:

Capture process

Watches the screen, runs OCR, and writes events to responses.jsonl.

Summary watcher

Reads responses.jsonl, waits for a complete batch, summarizes it, writes summaries.json, and posts to Discord.

State file

Tracks what has already been posted so restarts do not duplicate everything.
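A state file like this can be sketched in a few lines. The `posted` counter and the function names are assumptions for illustration; the real state may track something else, such as batch IDs.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path):
    """Return saved state, or a fresh one if the file does not exist yet."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {"posted": 0}  # hypothetical field: number of events already posted


def save_state(path, state):
    """Write state to a temp file, then swap it into place with os.replace."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    # os.replace swaps the file in one step, so a crash cannot leave half a file.
    os.replace(tmp_name, path)
```

Saving the state only after a post succeeds means a crash can at worst repeat one batch, never skip one.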

This is the same pattern I keep coming back to:

capture -> append-only log -> processor -> dispatcher

It is boring, which is exactly why it works.
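The dispatcher end of that pattern could be sketched as below, assuming the summaries are posted through a Discord webhook (a bot would work too; the post does not say which). Discord caps a message's `content` at 2000 characters, so long summaries need splitting. All function names here are hypothetical.

```python
import json
import urllib.request

DISCORD_LIMIT = 2000  # Discord's maximum length for a message's content field


def split_message(text, limit=DISCORD_LIMIT):
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is itself longer than the limit.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def build_webhook_request(webhook_url, chunk):
    """Build (but do not send) a webhook POST carrying one message chunk."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"content": chunk}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Explicit user agent; the default urllib one is sometimes rejected.
            "User-Agent": "study-buddy-sketch/0.1",
        },
        method="POST",
    )


def post_summary(webhook_url, summary, timeout=30):
    """Post a summary to Discord, one request per chunk, skipping blank chunks."""
    for chunk in split_message(summary):
        if chunk.strip():
            urllib.request.urlopen(build_webhook_request(webhook_url, chunk), timeout=timeout).close()
```

Keeping the dispatcher this thin means swapping Discord for another destination only touches this one piece.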

What is next

The obvious next step is audio.

If I am watching a lecture, the screen alone is not enough. I want the system to understand what is being said too. The current plan is to add a separate audio or YouTube transcription pipeline that produces another event log, then feed that into the same summarizer.

Eventually the shape becomes:

screen + camera + audio + YouTube -> local logs -> study summaries

But even v1 is enough to prove the idea.

In 30 minutes, I went from nothing to a local AI system that knows what I am reading, summarizes what I studied, and sends me updates.

That is a good trade.