<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Raywob</title><link>https://news.ycombinator.com/user?id=Raywob</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 01 May 2026 10:10:29 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Raywob" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Show HN: Vibe Check – Client-side invisible Unicode steganography scanner]]></title><description><![CDATA[
<p>Glassworm has hit 400+ repos across GitHub, npm, and VS Code using invisible Unicode characters to encode executable payloads that pass every code review, linter, and AI assistant.<p>Vibe Check is a browser-based scanner that detects these characters across 14 invisible Unicode ranges (zero-width spaces, variation selectors supplement, tag characters, bidi overrides, etc.) and flags sequences of 3+ consecutive invisible characters as likely payloads. Entirely client-side JS — no code leaves your browser.<p>Not a full SAST tool. Solves one specific problem: detecting characters that are invisible in every editor and terminal but can encode payloads decoded via eval() at runtime.<p>Scanner logic is in scanner.js, viewable in browser. Site runs on Cloudflare Pages free tier.<p><a href="https://websationflow.com" rel="nofollow">https://websationflow.com</a></p>
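<p>For illustration, a rough Python sketch of the same detection idea (the real scanner is the client-side scanner.js; the ranges below are a subset of the 14, and the 3-character run threshold mirrors the description rather than the actual source):</p>
<pre><code>import re

# A handful of the invisible ranges the post names; the full tool covers 14.
INVISIBLE = (
    "\u200b-\u200f"            # zero-width space/joiner/non-joiner, LRM/RLM
    "\u202a-\u202e"            # bidi embedding/override controls
    "\u2060-\u2064"            # word joiner, invisible operators
    "\ufeff"                   # zero-width no-break space (BOM)
    "\U000e0000-\U000e007f"    # tag characters
    "\U000e0100-\U000e01ef"    # variation selectors supplement
)

# Runs of 3+ consecutive invisible characters are flagged as likely payloads.
PAYLOAD_RUN = re.compile(f"[{INVISIBLE}]{{3,}}")

def scan(text):
    """Return (offset, length) for every suspicious run of invisible characters."""
    return [(m.start(), m.end() - m.start()) for m in PAYLOAD_RUN.finditer(text)]

sample = "const x = 1;" + "\u200b\u200c\u200d\u200b" + " // looks clean"
for offset, length in scan(sample):
    print(f"run of {length} invisible characters at offset {offset}")
</code></pre>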
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47917600">https://news.ycombinator.com/item?id=47917600</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 27 Apr 2026 04:08:53 +0000</pubDate><link>https://websationflow.com/</link><dc:creator>Raywob</dc:creator><comments>https://news.ycombinator.com/item?id=47917600</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47917600</guid></item><item><title><![CDATA[My AI didn't misread a receipt – it fabricated one from scratch]]></title><description><![CDATA[
<p>I pointed a vision model at a grocery receipt. It returned a store name, item list, and total. None of it was on the paper.<p>This wasn't OCR error. The model didn't confuse a "7" for a "1." It generated a plausible-looking receipt from scratch — different store, different items, different prices. If I hadn't been holding the original, I might not have caught it.<p>Same image, different model (same parameter count, same hardware), five seconds later: every item correct, store name right, total accurate to the penny.<p>The models: minicpm-v 8B (fabricated) vs qwen3-vl 8B (accurate). Both open source, both ~6GB VRAM, both running locally via Ollama on an RTX 5080.<p>What I learned:<p>1. Vision model hallucination is qualitatively different from text hallucination. A text model gives you a wrong answer to a real question. A vision model gives you a confident answer to an image it didn't process. The second is harder to detect.<p>2. Model selection matters more than prompt engineering for vision. Same prompt, same image — one model fabricated, one read accurately. No prompt optimization fixes a model that invents data.<p>3. Confidence scoring is mandatory. I added a reconciliation check: do the extracted items sum to roughly the stated total? This catches fabrication that looks plausible at the individual line-item level.<p>4. The fix wasn't more money or a bigger model. Same size (8B), same hardware, same cost ($0). Just a different architecture that actually reads pixels instead of generating plausible text about them.<p>Full writeup with the pipeline architecture and code patterns: https://dev.to/rayne_robinson_e479bf0f26/my-ai-read-a-receipt-wrong-it-didnt-misread-it-it-made-one-up-4f5n</p>
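<p>The reconciliation check in point 3 is small enough to sketch. This is an illustrative version only; the field names and the 2% tolerance are assumptions, not the code from the writeup:</p>
<pre><code>def reconcile(items, stated_total, tolerance=0.02):
    """
    Sanity-check a vision model's receipt extraction: the extracted line items
    should sum to roughly the total printed on the receipt. A big mismatch is a
    strong signal the model generated data instead of reading the image.
    """
    line_sum = sum(item["price"] * item.get("qty", 1) for item in items)
    gap = abs(line_sum - stated_total)
    fabricated = gap > tolerance * max(stated_total, 1.0)
    return fabricated, line_sum, gap

# An extraction that looks plausible item-by-item but doesn't add up.
items = [{"name": "milk", "price": 3.49}, {"name": "bread", "price": 2.99}]
print(reconcile(items, stated_total=18.72))   # (True, ~6.48, ~12.24)
</code></pre>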
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47421107">https://news.ycombinator.com/item?id=47421107</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 18 Mar 2026 02:55:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47421107</link><dc:creator>Raywob</dc:creator><comments>https://news.ycombinator.com/item?id=47421107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47421107</guid></item><item><title><![CDATA[New comment by Raywob in "The $2k Laptop That Replaced My $200/Month AI Subscription"]]></title><description><![CDATA[
<p>Haven't tried GPT-OSS-20B yet — the MoE approach is interesting for keeping VRAM usage down while getting better reasoning. 85 t/s on a 3060 is impressive. I'll look into that.<p>I've been on Qwen3 8B mostly because it was "good enough" for the mechanical stages (scanning, scoring, dedup) and I didn't want to optimize the local model before validating the orchestration pattern itself. Now that the pipeline is proven, experimenting with the local model is the obvious next lever to pull.<p>The Qwen3 4B 2507 claim is interesting — if the quality holds for structured extraction tasks, halving the VRAM footprint would open up running two models concurrently or leaving more room for larger contexts. Worth testing.<p>Thanks for the pointers — this is exactly the kind of optimization I haven't had time to dig into yet.</p>
]]></description><pubDate>Thu, 19 Feb 2026 16:04:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47075227</link><dc:creator>Raywob</dc:creator><comments>https://news.ycombinator.com/item?id=47075227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47075227</guid></item><item><title><![CDATA[New comment by Raywob in "The $2k Laptop That Replaced My $200/Month AI Subscription"]]></title><description><![CDATA[
<p>For the mechanical stages (scanning, scoring, dedup) — indistinguishable from proprietary models. These are structured tasks: "score this post 1-10 against these criteria" or "extract these fields from this text." An 8B model handles that fine at 30 tok/s on a consumer GPU.<p>For synthesis and judgment — no, it's not close. That's exactly why I route those stages to Claude. When you need the model to generate novel connections or strategic recommendations, the quality gap between 8B and frontier is real.<p>The key insight is that most pipeline stages don't need synthesis. They need pattern matching. And that's where the 95% cost savings live.</p>
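<p>For concreteness, a minimal sketch of what one of those mechanical stages looks like (the prompt wording and the JSON shape are made up for illustration; ask_local_model stands in for whatever wraps the local 8B model):</p>
<pre><code>import json
import re

def score_item(text, criteria, ask_local_model):
    """
    Mechanical scoring stage: pure pattern matching, no synthesis.
    ask_local_model is any callable that takes a prompt string and returns the
    local model's reply as a string (e.g. a thin wrapper around Ollama).
    """
    prompt = (
        f"Score the following post from 1 to 10 against these criteria: {criteria}\n\n"
        f"Post:\n{text}\n\n"
        'Reply with JSON only, e.g. {"score": 7}.'
    )
    reply = ask_local_model(prompt)
    match = re.search(r"\{.*\}", reply, re.DOTALL)   # tolerate chatter around the JSON
    return json.loads(match.group(0))["score"] if match else None
</code></pre>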
]]></description><pubDate>Thu, 19 Feb 2026 15:09:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47074605</link><dc:creator>Raywob</dc:creator><comments>https://news.ycombinator.com/item?id=47074605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47074605</guid></item><item><title><![CDATA[The $2k Laptop That Replaced My $200/Month AI Subscription]]></title><description><![CDATA[
<p>Cloud AI pricing is per-token. The more useful your pipeline, the more it costs. I built a dual-model orchestration pattern that routes 80% of the work to a free local model (Qwen3 8B on Ollama, GPU-accelerated) and only sends the synthesis/judgment stage to a cloud API.<p>Cost for a 50-item research pipeline: $0.15-0.40 vs $8-15 all-cloud. Same output quality where it matters.<p>Stack: RTX 5080 laptop, Ollama in Docker with GPU passthrough, PostgreSQL, Redis, Claude API for the final 20%.<p>The pattern: scan locally → score locally → deduplicate locally → synthesize via cloud. Four stages; three are free.<p>Gotchas I hit: Qwen3's thinking tokens leaking through /api/generate (use /api/chat instead), Docker binding to IPv4 only while Windows resolves localhost to IPv6, and GPU memory ceilings on consumer cards.<p>Happy to share architecture details in comments.</p>
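<p>A rough sketch of the routing skeleton with the two Ollama gotchas baked in (the /api/chat endpoint and request shape are Ollama's; the stage prompts, model tag, and pipeline wiring are simplified illustrations, and the Claude call is left as a stub):</p>
<pre><code>import requests

# Explicit 127.0.0.1 sidesteps Windows resolving localhost to IPv6 while the
# Docker port only binds IPv4. Use /api/chat, not /api/generate, so Qwen3's
# thinking tokens are handled properly.
OLLAMA_CHAT = "http://127.0.0.1:11434/api/chat"

def ask_local(prompt, model="qwen3:8b"):
    resp = requests.post(OLLAMA_CHAT, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def run_pipeline(raw_items):
    # Stages 1-3 run on the free local model.
    scanned = [ask_local(f"Extract title, url, summary as JSON:\n{i}") for i in raw_items]
    scored  = [ask_local(f"Score 1-10 for relevance, JSON only:\n{s}") for s in scanned]
    deduped = ask_local("List duplicate indexes as a JSON array:\n" + "\n".join(scored))
    # Stage 4 is the only paid call: send the distilled set to the Claude API
    # for synthesis (omitted here).
    return scanned, scored, deduped
</code></pre>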
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47074347">https://news.ycombinator.com/item?id=47074347</a></p>
<p>Points: 8</p>
<p># Comments: 5</p>
]]></description><pubDate>Thu, 19 Feb 2026 14:47:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47074347</link><dc:creator>Raywob</dc:creator><comments>https://news.ycombinator.com/item?id=47074347</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47074347</guid></item></channel></rss>