So I have been playing with AI for a while now, but the promise of Vibe Coding is still lost on me. Most modern LLMs can help with coding tasks, and the chatbots can help with ideation.
However, as soon as you want to do something more complex, it becomes an unmitigated disaster of word
salad. Sometimes I wonder how many Kamala Harris speeches
were written by AI, or ended up, accidentally or deliberately, in the training sets of California-based AI providers.
Okay, politics aside and back to Gemini. We have Gemini Pro as part of our Google subscription, so since it
came with the pack there was an incentive to try it out.
Things that, at the time of writing, seem to work really well with Gemini Pro:
- Synthesizing small code snippets for easy problems
- Draft diagramming in Mermaid
- Summarizing text
- Studio Ghibli style image generation (although input filters are a lot more restrictive now than when it was
first released), or concept art (see above)
While the deep research feature is nice, it falls apart when you ask for anything
that you could not resolve within a few Google searches. Google should not call it “deep research”;
they should call it “deep survey”.
Also, coding in agent mode is still a disaster. In addition to Gemini, we use GitHub Copilot. Do not ever give
any of these tools access to your code repository. They will happily start making changes that
break your codebase. It then becomes an insidious game of whack-a-mole to fix the codebase while the AI
keeps breaking it. The pain varies by language. Python seems to kind of work, but as soon as you venture
into C, C++, TypeScript, or Rust, it becomes a nightmare. These agents struggle to understand complex
build toolchains, dependencies, or code inlining for performance. The best way to use AI for coding is
to confine it to small tasks via suggestions or auto-completion, and be sure to know your IDE’s shortcut key
for disabling it quickly.
So the above is all well and good, but it does not really justify the valuations of these AI companies.
Can we use it for something useful?
The two areas where I have found Gemini Pro actually useful are:
- Recognizing bad handwriting, even in foreign languages like German and Japanese
- Summarizing long videos and podcasts into text
Handwriting Recognition
I have terrible handwriting. However, since my teens I have kept a journal, probably mostly inspired by
films like Dances with Wolves.
Over the years, I have accumulated a few thick notebooks full of
handwritten notes in German. My kids struggle to read my handwriting, and they are not fully fluent in German yet. So in the interest of digital inheritance,
I was wondering what the best way to digitize my handwritten notes would be. I tried a few OCR tools, but
Gemini Pro blew them all out of the water. By just taking a picture of the handwritten notes, Gemini Pro
was able to transcribe them with amazing accuracy. As part of the prompt, specify the input language and it will
transcribe accordingly.
This also works via their API. With a little bit
of coding and prompt engineering, you can potentially “revolutionize” your note-taking workflow at work too,
especially if you have terrible handwriting like me.
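As a rough illustration of the API route, here is a minimal sketch that sends one photographed page to Gemini together with a prompt naming the input language. It assumes the `google-genai` Python SDK is installed and that an API key is available in the environment (for example `GEMINI_API_KEY`). The model name `gemini-2.5-pro`, the prompt wording, and the helper names `build_transcription_prompt`, `guess_mime_type`, and `transcribe_page` are my own placeholders, not something prescribed by Google.

```python
from pathlib import Path


def build_transcription_prompt(language: str) -> str:
    """Build a prompt that names the input language, as suggested above."""
    return (
        "Transcribe the handwritten text in this image. "
        f"The handwriting is in {language}. "
        "Return the text verbatim in its original language, do not translate, "
        "and mark unreadable words as [illegible]."
    )


def guess_mime_type(path: Path) -> str:
    """Map a file extension to an image MIME type (small illustrative table)."""
    types_by_suffix = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}
    return types_by_suffix[path.suffix.lower()]


def transcribe_page(image_path: str, language: str, model: str = "gemini-2.5-pro") -> str:
    """Send one photographed page to Gemini and return the transcription."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    path = Path(image_path)
    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=model,  # assumed model name; use whichever your account offers
        contents=[
            types.Part.from_bytes(data=path.read_bytes(), mime_type=guess_mime_type(path)),
            build_transcription_prompt(language),
        ],
    )
    return response.text
```

Calling `transcribe_page("journal_page_001.jpg", "German")` (a made-up file name) would return the transcription as a string, which you can then write to a text file next to the scan.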
Video and Podcast Summarization
Over the decades there have been numerous tools to scrape various multimedia content from the web. However,
what has been missing is a way to make that content indexable and searchable. When someone recommends a podcast
or technical lecture and you do not want to sit through the entire thing, it would be nice to have a summary of the content.
AI has made this possible. I have been using Whisper for a couple of years to transcribe audio content into text or to generate makeshift subtitles for videos. However, this either incurs OpenAI
cloud charges, or requires a local model installation. For me, Whisper still takes considerable time on an Nvidia 5060 to transcribe larger
batches of audio content.
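For the makeshift subtitles, a sketch along these lines is roughly what the local route looks like. It assumes the open-source `openai-whisper` package, whose `transcribe` result contains a `segments` list with `start`, `end`, and `text` fields. The helper names and the default `base` model size are placeholders, and this is not necessarily the exact setup I run.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn Whisper-style segments (start, end, text) into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Blank line between subtitle blocks, as the SRT format expects.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base") -> str:
    """Run a local Whisper model and return the result as SRT text."""
    import whisper  # lazy import: requires the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Larger model sizes tend to be more accurate but slower, which is where the transcription time on a consumer GPU starts to add up.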
Since we are already in the Google ecosystem, I decided to try Gemini Pro for this task. The process is fairly straightforward:
- Download the video or podcast audio locally
- Upload it to Gemini
- Ask Gemini Pro to transcribe the content, and explicitly prompt it not to omit any sections
- For large files, there also seems to be an option to stage the upload via Google Drive
Since the output is Markdown, this can be easily recycled as documentation or wiki content.
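If you would rather script the steps above than click through the web UI, they can be sketched roughly as follows. This assumes the `google-genai` Python SDK and its file upload call, with an API key available in the environment. The model name, the prompt wording, and the helper names are my own placeholders. Larger video uploads may need a moment of server-side processing before they can be used, and this sketch does not poll for that.

```python
import re
from pathlib import Path


def build_transcript_prompt(title: str) -> str:
    """Prompt that asks for a complete Markdown transcript plus a summary."""
    return (
        f"Transcribe the attached recording titled '{title}'. "
        "Do not omit or shorten any sections. "
        "Format the output as Markdown and start with a short summary."
    )


def slugify(title: str) -> str:
    """Turn a title into a file-name-friendly slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def transcribe_recording(media_path: str, title: str, model: str = "gemini-2.5-pro") -> str:
    """Upload a locally downloaded recording and ask Gemini for a transcript."""
    from google import genai  # lazy import: requires the google-genai package

    client = genai.Client()  # picks up the API key from the environment
    uploaded = client.files.upload(file=media_path)
    response = client.models.generate_content(
        model=model,  # assumed model name
        contents=[build_transcript_prompt(title), uploaded],
    )
    return response.text


def save_markdown(markdown: str, out_dir: str, title: str) -> Path:
    """Write the Markdown output to <out_dir>/<slug>.md for a wiki or docs folder."""
    target = Path(out_dir) / f"{slugify(title)}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Keeping one Markdown file per recording makes the results easy to grep or to drop into a wiki later.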
Conclusion
I think, like many other folks, that AI (specifically LLMs) is probably overhyped right now. However, there are some
specific use cases where LLMs can be really useful. There is lots of potential for specialized applications that were
computationally infeasible a few years ago. For example, think how far image classification has come since the early days of
CNNs, even though neural networks have been around for decades.
I feel like we are still in the early days of LLMs, and there is a lot of room for improvement. The mistake seems
to be commercializing LLMs as general-purpose agents that can do everything. The reality is that they are still
quite limited in their capabilities. The key is to find the right use cases where they can add real value. For me, handwriting recognition
and multimedia summarization are two such areas where Gemini Pro shines, along with constrained technical tasks.
Since so many people are handing off their creativity to AI, I fear we are on a trajectory that puts us closer
to what Idiocracy envisioned 20 years ago, rather than replacing Pressfield’s Muse with something better. Time will tell.
Published: 2025-10-10
Updated : 2025-10-10