<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Build Stories on Chinese Man</title><link>https://chineseman.net/categories/build-stories/</link><description>Recent content in Build Stories on Chinese Man</description><image><title>Chinese Man</title><url>https://chineseman.net/logo.jpg</url><link>https://chineseman.net/logo.jpg</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 21 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://chineseman.net/categories/build-stories/index.xml" rel="self" type="application/rss+xml"/><item><title>How I Built a Private, On-Device Speech-to-Text App with Whisper</title><link>https://chineseman.net/building-private-speech-to-text-whisper-app/</link><pubDate>Sun, 21 Jun 2026 00:00:00 +0000</pubDate><guid>https://chineseman.net/building-private-speech-to-text-whisper-app/</guid><description>The story of building Private Transcribe, an offline speech-to-text app powered by Whisper that runs entirely on your phone in 99+ languages, so your audio never touches a server. Model tiers, the privacy angle, the UX around the model, and what I learned shipping it solo.</description><content:encoded><![CDATA[<p>Most transcription apps upload your audio to a server to process it. If you are
recording anything sensitive (a doctor&rsquo;s visit, an interview, a private voice note, a
business call), that is a hard no. So I built <strong>Private Transcribe</strong>: speech-to-text
powered by <strong>Whisper</strong>, running entirely on your device. Your audio never leaves your
phone. &ldquo;Your voice, your privacy.&rdquo; Here is how it came together, and why the privacy
promise shaped every technical decision.</p>
<h2 id="the-problem-with-free-transcription">The problem with &ldquo;free&rdquo; transcription</h2>
<p>Online transcription is rarely free in the way that matters. You pay with your audio. It
gets uploaded, processed on someone&rsquo;s servers, and frequently retained, sometimes to
improve their models. For casual voice memos, maybe you do not care. For anything
confidential, you absolutely should, because once audio leaves your device you have lost
control of it forever.</p>
<p>The promise I wanted to make, and keep, was simple: your audio never leaves your phone,
and the app never uploads your audio or transcriptions to any server. You can only make
that promise credibly if the app genuinely has no server in the loop. So it does not. The
absence of a backend is not a limitation here; it is the entire product.</p>
<h2 id="whisper-on-a-phone">Whisper on a phone</h2>
<p>The thing that makes this possible is <strong>Whisper</strong>, OpenAI&rsquo;s open-source speech-recognition
model, which is remarkably accurate across languages. Private Transcribe handles <strong>99+ languages</strong>
through multilingual models, which is a feature you simply could not offer affordably if
every minute of audio had to round-trip through your own servers. On-device, languages
are free.</p>
<p>The engineering challenge, as with any on-device AI, is fitting a capable model onto a
phone without melting it. Whisper comes in sizes, and the size you choose is a direct
trade between speed, accuracy, and storage. Rather than guessing for the user, I expose
that choice as a <strong>quality tier</strong>:</p>
<ul>
<li><strong>Tiny (75 MB):</strong> fastest, great for quick notes where rough accuracy is fine.</li>
<li><strong>Base (142 MB):</strong> a balanced default that suits most people.</li>
<li><strong>Small (466 MB):</strong> noticeably higher quality for important recordings.</li>
<li><strong>Medium (1.5 GB):</strong> professional-grade accuracy for when it really matters.</li>
</ul>
<p>Someone on an older phone who wants speed picks Tiny. Someone transcribing a crucial
interview on a newer device picks Medium and waits a little longer for a better result.
Giving users the dial, with a sane default already selected, beats pretending one size
fits everyone. It also respects that people know their own situation better than I do.</p>
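<p>As a rough sketch of how a tier picker like this could be modeled (not the app&rsquo;s actual code), the tiers can live in a small table and the preselection can check free storage first. The sizes and the Base default come from the list above; the names <code>ModelTier</code>, <code>tiers_that_fit</code>, and <code>preselect</code>, and the 200 MB headroom, are assumptions made for illustration.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    size_mb: int  # download size quoted in the tier list above


# Ordered smallest to largest; sizes are the ones quoted in the post.
TIERS = [
    ModelTier("tiny", 75),
    ModelTier("base", 142),
    ModelTier("small", 466),
    ModelTier("medium", 1500),
]
DEFAULT_TIER = "base"  # the post describes Base as the balanced default


def tiers_that_fit(free_storage_mb, headroom_mb=200):
    """Return tiers whose download fits while leaving headroom (assumed value)."""
    budget = free_storage_mb - headroom_mb
    return [t for t in TIERS if t.size_mb <= budget]


def preselect(free_storage_mb):
    """Preselect the default if it fits, else the largest tier that does."""
    fitting = tiers_that_fit(free_storage_mb)
    if not fitting:
        return None  # nothing fits; the UI would ask the user to free space
    for tier in fitting:
        if tier.name == DEFAULT_TIER:
            return tier
    return fitting[-1]
```

<p>The user still gets the full dial; the function only decides which option is highlighted when the picker opens.</p>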
<h2 id="the-ux-around-the-model">The UX around the model</h2>
<p>The model is the engine, but the app is everything around it, and that wrapper is where
most of the product work actually lives:</p>
<ul>
<li><strong>Tap to record</strong> with real-time progress, so transcription feels responsive instead
of like staring into a black box wondering if anything is happening.</li>
<li><strong>Local storage</strong> of every transcription, with <strong>search</strong>, so you can actually find
that note from three weeks ago instead of scrolling forever.</li>
<li><strong>Sharing</strong> for the moments when you <em>do</em> want to send a transcript out, on your
terms, by deliberate choice, never automatically.</li>
<li>A clean <strong>dark interface</strong> that is comfortable for reading long blocks of text.</li>
</ul>
<p>A surprising amount of &ldquo;AI app&rdquo; work is exactly this: making the model&rsquo;s output findable,
usable, and trustworthy. The intelligence is table stakes. The product is the wrapper
around it, and a brilliant model behind a frustrating interface is a brilliant model
nobody uses.</p>
<h2 id="handling-the-unglamorous-edges">Handling the unglamorous edges</h2>
<p>Real recordings are messy. People pause, they ramble, they record for forty minutes. A
toy demo transcribes ten seconds of clean speech. A real app has to handle long audio
without running the phone out of memory, show sensible progress on a file that takes a
while, and never lose a transcription because the app got backgrounded. None of that is
glamorous, and all of it is the difference between a demo and something people rely on.</p>
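<p>One common way to keep memory bounded on long recordings is to process the audio in fixed windows and report progress after each one. The sketch below assumes 16 kHz audio and 30-second windows, which match what Whisper expects as input; the function names and the <code>transcribe_chunk</code> and <code>on_progress</code> callbacks are hypothetical stand-ins, and this is not necessarily how Private Transcribe does it.</p>

```python
def chunk_ranges(total_samples, sample_rate=16_000, window_s=30, overlap_s=0):
    """Yield (start, end) sample indices that cover the recording in windows."""
    window = window_s * sample_rate
    step = (window_s - overlap_s) * sample_rate
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        yield start, end
        if end == total_samples:
            break
        start += step


def transcribe_long(total_samples, transcribe_chunk, on_progress):
    """Process one window at a time so only one chunk is in memory at once."""
    pieces = []
    for start, end in chunk_ranges(total_samples):
        pieces.append(transcribe_chunk(start, end))  # hypothetical model call
        on_progress(end / total_samples)             # fraction done, 0.0 to 1.0
    return " ".join(pieces)
```

<p>Saving <code>pieces</code> to local storage after each window would also let a backgrounded app pick up where it stopped. A non-zero <code>overlap_s</code> helps with words cut at a boundary, but the overlapping text then has to be deduplicated before joining, which this sketch does not attempt.</p>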
<h2 id="where-vibe-coding-fit">Where vibe coding fit</h2>
<p>The AI assistant was a genuine force multiplier on the surrounding app: the recording
screen, the progress UI, the local database with search, the model download and storage
management, the share flow. All of that is well-trodden territory it could scaffold
quickly, which freed my attention for the parts that mattered.</p>
<p>What stayed firmly on me was the careful work: running Whisper efficiently on-device,
handling longer recordings without choking on memory, and, most importantly, making sure
the privacy promise was actually true end to end, with no sneaky analytics call quietly
shipping audio off the device. When your entire pitch is privacy, the privacy has to be
real, and that is precisely the kind of thing you verify by reading every line yourself,
not by trusting generated code. I dug into that division of labor in
<a href="/my-vibe-coding-toolkit/">my vibe coding toolkit</a>.</p>
<h2 id="why-on-device-was-worth-the-extra-work">Why on-device was worth the extra work</h2>
<p>Building this on-device was harder than wiring up a cloud API would have been. There was no
shortcut, and no one else&rsquo;s server to quietly offload the difficult part onto. But that
difficulty is exactly the moat. Anyone can wrap a transcription API in a weekend, and a
thousand people already have, which is why those apps all blur together. Almost nobody ships
something that genuinely keeps your audio on your phone, because it is more work and there is
no usage to bill for at the end of it. The hard path turned out to be the defensible one, and
the privacy promise that came with it is the precise reason a user would choose this app over
the dozen cloud-based ones above it in the search results.</p>
<h2 id="lessons">Lessons</h2>
<ul>
<li><strong>Privacy is a feature you can build a whole product around,</strong> but only if you mean it
architecturally. &ldquo;On-device&rdquo; is a promise you have to earn in the code, not a sticker
you slap on the listing.</li>
<li><strong>Expose the real trade-off.</strong> Speed versus accuracy is genuine and personal, so let
users choose their model tier instead of deciding for them.</li>
<li><strong>The wrapper is the product.</strong> Search, storage, sharing, and a calm interface are what
turn a model into something people open every day.</li>
</ul>
<p>Private Transcribe is part of a clear pattern in what I build: take an AI capability
people assume requires the cloud, and prove it can run privately in your pocket. If that
idea interests you, I did the same thing for chat in
<a href="/building-offline-ai-chat-app-personal-llm/">how I built an offline AI chat app</a>,
and I wrote about the strategy behind it in
<a href="/why-i-build-small-apps/">why I build small apps</a>.</p>
]]></content:encoded></item><item><title>Building Capybara Crossing: What Shipping a Casual Arcade Game Taught Me</title><link>https://chineseman.net/building-capybara-crossing-arcade-game/</link><pubDate>Thu, 18 Jun 2026 00:00:00 +0000</pubDate><guid>https://chineseman.net/building-capybara-crossing-arcade-game/</guid><description>The making of Capybara Crossing, an endless hop-across-the-road arcade game starring a capybara. Game feel, the anti-idle eagle mechanic, the coin economy, character unlocks, and the lessons from shipping a casual mobile game solo.</description><content:encoded><![CDATA[<p>After a run of utility apps, I wanted to build something that was pure fun, something
my niece could pick up in two seconds without a tutorial. The &ldquo;hop across endless lanes
of traffic&rdquo; formula is a classic for a reason, so I put a <strong>capybara</strong> in it and made
<strong>Capybara Crossing</strong>: hop across roads, train tracks, and rivers, collect coins, and
try not to die. &ldquo;Hop to survive.&rdquo; Here is what building it taught me, because a casual
game is a very different discipline from a tool.</p>
<h2 id="why-a-casual-game">Why a casual game</h2>
<p>Utility apps solve a problem. Casual games create a feeling. They are completely
different design challenges, and I wanted the reps in the second one. The bar is also
deceptively high. The gameplay is simple, but simple has nowhere to hide. If the hop
does not feel good, there is no feature list to distract from it, no settings screen to
get lost in. The core loop is the entire product, naked and exposed.</p>
<p>The pitch is tiny, and that is the point: you are a capybara, the world is an endless
gauntlet of cars, trucks, buses, trains, and rivers, and you hop forward as far as you
can. Cute character, instant understanding, one-thumb controls. A five-year-old and a
fifty-year-old both get it in the first second.</p>
<h2 id="game-feel-is-the-entire-product">Game feel is the entire product</h2>
<p>In a utility app, &ldquo;it works&rdquo; is enough. In a game, &ldquo;it works&rdquo; is the floor, and <em>feel</em>
is the ceiling. I spent a wildly disproportionate amount of time on things a spec sheet
would never list:</p>
<ul>
<li><strong>The hop.</strong> Its timing, the little arc, the snap onto the next tile, the tiny squash
when you land. This single animation is most of whether the game feels good or cheap.</li>
<li><strong>Readability.</strong> You have to instantly see where it is safe to land. Lane spacing,
hazard timing, and color all serve clarity, because a death that feels unfair makes
people quit, while a death that feels like their own fault makes them try again.</li>
<li><strong>The difficulty ramp.</strong> Easy enough to start, relentless enough that &ldquo;one more try&rdquo;
becomes irresistible. The curve has to feel fair the whole way up.</li>
</ul>
<p>You cannot spec your way to good feel. You build it, play it a hundred times, change one
number, and play it a hundred more. That tuning loop is where most of the real work
lives, and there is no shortcut through it.</p>
<h2 id="the-eagle-solving-the-standing-still-problem">The eagle: solving the &ldquo;standing still&rdquo; problem</h2>
<p>Endless hoppers have a classic exploit. The player just stops moving and stays safe
forever, which kills all the tension. My fix is a threat baked into the design: stay
still too long and an <strong>eagle swoops down</strong> and takes you. Keep moving to survive.</p>
<p>It is a tiny mechanic with an outsized effect. It removes the safe option, keeps tension
constant, and quietly converts the game from &ldquo;avoid hazards when you feel like moving&rdquo;
into &ldquo;always be moving, manage the danger as you go.&rdquo; The best game design is often one
small rule that invisibly forces the behavior you want, instead of a tutorial telling
players how they should act. The eagle never explains itself. It just teaches you, once,
and you never stand still again.</p>
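<p>The rule itself is small enough to sketch in a few lines. This is a minimal illustration, not the game&rsquo;s code: the class name <code>IdleEagle</code> and the 4-second grace period are assumptions, and a real game would tune that number by playtesting.</p>

```python
class IdleEagle:
    """Tracks how long the player has stood still and signals the eagle."""

    def __init__(self, limit_s=4.0):
        self.limit_s = limit_s  # hypothetical grace period in seconds
        self.idle_s = 0.0

    def update(self, dt, hopped):
        """Advance by dt seconds; return True when the eagle should swoop."""
        if hopped:
            self.idle_s = 0.0  # any hop resets the timer
            return False
        self.idle_s += dt
        return self.idle_s >= self.limit_s
```

<p>Calling <code>update</code> once per frame with the frame time and whether the player hopped is enough; the rest of the game only needs to react when it returns <code>True</code>.</p>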
<h2 id="coins-characters-and-the-reason-to-come-back">Coins, characters, and the reason to come back</h2>
<p>A high score is a reason to play once. <strong>Progression</strong> is a reason to come back. The
loop is simple:</p>
<ul>
<li>Collect <strong>coins</strong> as you hop, which gives every run a second purpose beyond distance.</li>
<li>Spend them to <strong>unlock characters</strong>, six of them, including a Pirate, a Ninja, and a
Space Capy.</li>
</ul>
<p>Cosmetic unlocks are perfect for a casual game. They give players goals without adding
rules to learn, and &ldquo;I just want the Space Capy&rdquo; turns out to be a surprisingly strong
retention hook. It also keeps the game fair, because everything is earnable by playing
rather than paying, which builds goodwill instead of resentment.</p>
<h2 id="where-vibe-coding-helped-and-where-it-did-not">Where vibe coding helped, and where it did not</h2>
<p>AI assistance was excellent for the <strong>scaffolding</strong>: the core game loop, procedural lane
spawning, collision detection, the coin and save system, and the unlock screens. That
got me from nothing to a playable prototype fast, which is exactly when a game project
is most fragile and most likely to be abandoned. Getting to &ldquo;I can play this&rdquo; quickly
kept the momentum alive.</p>
<p>What the AI absolutely could not do was tell me whether the game felt good. Tuning the
hop, spacing the hazards so they are hard but fair, pacing the difficulty so it pulls
you forward without frustrating you, all of that is taste and playtesting, and it stays
firmly human. AI gets you a working game. Only playing it, over and over, turns it into
a <em>fun</em> game.</p>
<h2 id="what-polish-really-means">What polish really means</h2>
<p>The biggest surprise was how much the tiny, invisible details mattered. A casual game lives or
dies on a hundred things no player could ever name: the exact delay before the eagle appears,
the weight of the landing, the half-second of feedback after a coin, the way the camera nudges
forward. None of it shows up in a feature list, and all of it is the difference between a game
that feels cheap and one that feels good in the hand. That obsession with feel is a muscle,
and once you build it on a game where feel is the entire product, you start noticing the same
missing polish everywhere else you ship, in the tools you thought were already done.</p>
<h2 id="lessons-from-shipping-it">Lessons from shipping it</h2>
<ul>
<li><strong>The first 30 seconds decide everything.</strong> Casual players judge instantly. If the
first session is not fun, there is no second session, so the opening has to land.</li>
<li><strong>Juice matters.</strong> Small feedback (the sounds, the little animations, the satisfying
coin pop) makes simple gameplay feel good. Juice is not decoration; it is the feel.</li>
<li><strong>Keep it offline and free.</strong> No connection required, local progress saving, no
paywalls standing between the player and fun. Friction is the enemy of casual.</li>
<li><strong>Constraints breed charm.</strong> A capybara and one clean mechanic beat a sprawling design
I would never have finished. The limits made the game; they did not hold it back.</li>
</ul>
<p>Building a game after a string of tools was the best thing I did for my craft this year.
It forces you to care about feel, not just function, and that lesson follows you back
into everything else you build, including the serious apps where &ldquo;it works&rdquo; was secretly
never quite enough either.</p>
]]></content:encoded></item><item><title>How I Built an AI Chat App That Runs Entirely On Your Phone</title><link>https://chineseman.net/building-offline-ai-chat-app-personal-llm/</link><pubDate>Sun, 14 Jun 2026 00:00:00 +0000</pubDate><guid>https://chineseman.net/building-offline-ai-chat-app-personal-llm/</guid><description>The story of building Personal LLM, an offline, private AI chat app that runs Qwen 3 and GLM-Edge models on-device with no servers, no accounts, and no subscriptions. What was hard, what I learned about running language models on a phone, and how I shipped it solo.</description><content:encoded><![CDATA[<p>Every mainstream AI chat app sends your conversations to someone&rsquo;s server. I wanted the
opposite: an app where the model runs on your phone, your messages never leave the
device, and there is no login and no subscription. That became <strong>Personal LLM</strong>, and
getting a language model to run well on a phone turned out to be the most interesting
engineering problem I have taken on as a solo builder.</p>
<h2 id="the-itch">The itch</h2>
<p>I kept hitting the same three frustrations with mainstream AI apps:</p>
<ol>
<li><strong>Privacy.</strong> Everything you type goes to a server and, often, into training data.</li>
<li><strong>Connectivity.</strong> On a plane or with bad signal, they are useless.</li>
<li><strong>Cost.</strong> Another $10 to $20 a month for something I use in unpredictable bursts.</li>
</ol>
<p>On-device models had quietly gotten good enough that I started to wonder: what if none
of that were true? What if the AI just lived on your phone, the way a calculator does?
&ldquo;Your AI, your device,&rdquo; private, offline, and actually powerful. The idea would not let
go of me, so I built it.</p>
<h2 id="the-hard-part-a-language-model-on-a-phone">The hard part: a language model on a phone</h2>
<p>This is where it stops being a UI project and becomes a systems problem. Phones are not
servers, and the constraints are brutal enough that they drive every single decision.</p>
<ul>
<li><strong>Memory.</strong> A phone might have 4 to 8 GB of RAM, shared with the OS and every other
app. A model that loads fine on a laptop will simply get killed on a phone. That
alone rules out most models.</li>
<li><strong>Model size on disk.</strong> Nobody will tolerate a 20 GB download. The realistic sweet
spot is a few hundred MB to a few GB.</li>
<li><strong>Speed.</strong> Tokens per second has to feel like a conversation, not a fax machine
slowly printing.</li>
<li><strong>Heat and battery.</strong> Run the processor flat out and the phone gets hot and the
battery drains, so the work has to be efficient, not just possible.</li>
</ul>
<p>The unlock is <strong>small, quantized models</strong>. Quantization shrinks a model by storing its
weights at lower precision, which trades a little quality for a huge reduction in size
and memory. I built around <strong>Qwen 3</strong>, from tiny 0.6B variants up to larger ones, and
<strong>GLM-Edge</strong>, including vision-capable variants, with downloads ranging from roughly
500 MB to 4 GB. The user picks a model that fits <em>their</em> device, downloads it once, and
from then on it runs fully offline.</p>
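<p>The size reduction is mostly arithmetic: weights take roughly parameter count times bits per weight, divided by eight to get bytes. The helper below is a back-of-the-envelope sketch with a made-up name, <code>weight_size_gb</code>; real quantized files also carry scale factors and metadata, so actual downloads come out somewhat larger than this estimate.</p>

```python
def weight_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# A 0.6B-parameter model at 16-bit versus 4-bit precision.
full = weight_size_gb(0.6e9, 16)      # about 1.2 GB
quantized = weight_size_gb(0.6e9, 4)  # about 0.3 GB
```

<p>Dropping from 16 bits to 4 bits cuts the weights to a quarter of their size, which is the difference between a model that fits comfortably in a phone&rsquo;s memory and one that gets the app killed.</p>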
<p>The trade-off you have to make peace with, and be honest with users about, is this: a
1B model on a phone is not a frontier model in the cloud. It will not write your
dissertation. But for quick questions, drafting, summarizing, brainstorming, rewriting,
and anything on a plane, it is genuinely useful, and it is <em>yours</em>. Setting that
expectation clearly inside the app matters more than overpromising and disappointing
people on their first message.</p>
<h2 id="giving-users-the-dials-with-sane-defaults">Giving users the dials, with sane defaults</h2>
<p>Different people want different behavior, so I exposed the controls that matter:
temperature, top-k, and top-p for shaping how creative or focused the responses are,
plus toggles for a &ldquo;thinking&rdquo; mode versus a &ldquo;fast&rdquo; mode depending on whether you want
careful reasoning or a quick reply. The risk with exposing model internals is
overwhelming a casual user, so the rule was: every control has a good default, and you
never need to touch any of them to get a solid answer. Power users get the dials.
Everyone else gets something that just works out of the box.</p>
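<p>For readers who have not met these three dials, here is a plain-Python sketch of how they typically combine when picking the next token. It is a generic illustration, not the app&rsquo;s inference code; the function name <code>sample_token</code> and the default values are assumptions.</p>

```python
import math
import random


def sample_token(logits, temperature=0.7, top_k=40, top_p=0.9, rng=random):
    """Pick a token index from raw logits using temperature, top-k, and top-p."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always take the best token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    # Top-k: keep only the k most likely tokens (at least one).
    probs = probs[:max(1, top_k)]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample among the survivors; random.choices normalizes the weights itself.
    indices = [i for _, i in kept]
    return rng.choices(indices, weights=[p for p, _ in kept], k=1)[0]
```

<p>Lower temperature, smaller top-k, and smaller top-p all push toward focused, predictable answers; raising them lets less likely tokens through, which reads as more creative.</p>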
<h2 id="where-vibe-coding-carried-me">Where vibe coding carried me</h2>
<p>I am one person. I could not have shipped this typing every line. The AI assistant
handled the parts that are tedious but well understood:</p>
<ul>
<li>The <strong>chat UI</strong>: message bubbles, streaming token rendering, scroll behavior, the
copy button, the empty state.</li>
<li>The <strong>model download manager</strong>: progress, resume after interruption, storage checks,
and deletion to free space.</li>
<li>The <strong>settings layer</strong>: surfacing temperature, top-k, top-p, and the mode toggles in
a way that a normal person can ignore safely.</li>
</ul>
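<p>Two pieces of the download manager are easy to sketch: asking the server for only the missing bytes after an interruption, and checking free space before starting. This assumes the model host honors HTTP <code>Range</code> requests; the function names and the 100 MB headroom are hypothetical, and the app itself may do this differently.</p>

```python
import os
import shutil


def resume_headers(partial_path):
    """Build an HTTP Range header that asks for the bytes not yet on disk."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def has_room(target_dir, download_bytes, already_have=0, headroom=100 * 1024 * 1024):
    """Check free space before starting or resuming a download."""
    free = shutil.disk_usage(target_dir).free
    return free >= (download_bytes - already_have) + headroom
```

<p>A client would send <code>resume_headers(path)</code> with the request and append to the partial file when the server answers 206 Partial Content, falling back to a fresh download on a plain 200.</p>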
<p>What I had to own personally was the hard 20 percent: wiring up on-device inference,
managing memory so the app does not get killed mid-generation, and tuning the defaults
so a non-technical person gets a good answer without touching a single slider. That
split, where the AI does the boilerplate and you own the load-bearing parts, is the
whole game. I wrote about it in
<a href="/what-vibe-coding-actually-is/">what vibe coding actually is</a>.</p>
<h2 id="product-decisions-that-mattered">Product decisions that mattered</h2>
<ul>
<li><strong>One-time download, then offline forever.</strong> The only moment of friction is the first
model download. After that, zero network and zero waiting, ever.</li>
<li><strong>No account.</strong> An account is a privacy promise you can break. No account means there
is nothing to leak, nothing to breach, and nothing to log in to.</li>
<li><strong>Sensible defaults, optional depth.</strong> Casual users get a model that just answers.
Power users get the sampling controls. Neither group is punished for the other.</li>
<li><strong>Free.</strong> With no server costs to cover, there is no subscription to justify, which
also means no billing, no payment processing, and no churn to manage.</li>
</ul>
<h2 id="what-i-would-tell-another-builder">What I would tell another builder</h2>
<p>On-device AI feels like magic to users precisely because everyone assumes it is
impossible. &ldquo;You can run <em>that</em> on a phone?&rdquo; That gap between expectation and reality is
a wonderful place to build a product, because the perceived difficulty is doing your
marketing for you. But respect the constraints. Pick models per device tier, be honest
about what a small model can and cannot do, and spend your scarce hand-written effort on
memory and performance, not on the parts an AI can scaffold in an afternoon.</p>
<p>The result is an app I actually use, on planes, on the subway, and any time I would
rather not hand my private thoughts to a server. That is the kind of product worth
shipping: one you reach for yourself. Building privacy-first apps turned into a whole
theme for me, which I dug into again with
<a href="/building-private-speech-to-text-whisper-app/">an offline speech-to-text app</a>.</p>
]]></content:encoded></item></channel></rss>