<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: airylizard</title><link>https://news.ycombinator.com/user?id=airylizard</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 22 Apr 2026 09:55:53 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=airylizard" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Show HN: Patient Glue a more affordable SMS solution for healthcare that I built]]></title><description><![CDATA[
<p>Hey y'all<p>Wanted to introduce Patient Glue, an all-in-one SMS platform for healthcare that integrates directly into your EHR.<p>We're currently offering a pilot of up to 3 months so you can try it for free!<p>Unlike other service offerings, Patient Glue is built to be budget friendly, with flat-rate pricing based on seat count and usage.<p>I've added some configurable AI integrations and automated workflows to make the scheduling process easier.<p>I built this because I'd worked at a smaller practice where the clinic literally couldn't afford the other options.<p>Check it out yourself, I'm open to any and all feedback!<p>Remember, happy patients stick around!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45262520">https://news.ycombinator.com/item?id=45262520</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 16 Sep 2025 14:13:26 +0000</pubDate><link>https://patientglue.com/</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=45262520</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45262520</guid></item><item><title><![CDATA[Think Before You Speak – Exploratory Forced Hallucination Study [pdf]]]></title><description><![CDATA[
<p>This is a research/discovery post, not a polished toolkit or product.<p>The idea in a nutshell:<p>"Hallucinations" aren't indicative of bad training, but of per-token semantic ambiguity. By accounting for that ambiguity before prompting for a determinate response, we can increase the reliability of the output.<p>Two-Step Contextual Enrichment (TSCE) is an experiment probing whether a high-temperature “forced hallucination”, used as part of the system prompt in a second low-temperature pass, can reduce end-result hallucinations and tighten output variance in LLMs.<p>What I noticed:<p>In >4,000 automated tests across GPT-4o, GPT-3.5-turbo and Llama-3, TSCE lifted task-pass rates by 24–44 pp with <0.5 s extra latency.<p>All logs & raw JSON are public for anyone who wants to replicate (or debunk) the findings.<p>Would love to hear from anyone doing something similar; I know other multi-pass prompting techniques exist, but I think this is somewhat different, primarily because in the first step we purposefully instruct the LLM not to directly reference or respond to the user, building on ideas like adversarial prompting.<p>I posted an early version of this paper, but since then I've run about 3,100 additional tests using models beyond GPT-3.5-turbo and Llama-3-8B, and updated the paper to reflect that.<p>Code MIT, paper CC-BY-4.0.</p>
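<p>For anyone who wants the shape of the method before opening the PDF, here is a minimal sketch of the two passes. The function name, prompt wording, and temperature values are my illustrative stand-ins (not the paper's exact settings), and `chat` stands in for any LLM call:</p>

```python
def tsce(chat, system, user):
    """Two-pass sketch: Phase 1 emits an opaque anchor, Phase 2 answers with it."""
    # Phase 1: high temperature; the model is explicitly told NOT to answer
    # the user, only to produce a private scratch context ("forced hallucination").
    anchor = chat(
        system=system + "\nWrite a dense private scratch context for yourself. "
        "Do not address or answer the user.",
        user=user,
        temperature=1.3,
    )
    # Phase 2: low temperature; the anchor is prepended to the system prompt
    # and the same user prompt is asked again, now for the determinate answer.
    return chat(system=anchor + "\n" + system, user=user, temperature=0.1)
```

<p>The only structural commitment is the two calls and the anchor's placement in the second system prompt; wording and temperatures are tunable.</p>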
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44294446">https://news.ycombinator.com/item?id=44294446</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 16 Jun 2025 23:58:00 +0000</pubDate><link>https://github.com/AutomationOptimization/tsce_demo/blob/main/docs/Think_Before_You_Speak.pdf</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44294446</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44294446</guid></item><item><title><![CDATA[Evaluating AI Agents with Azure AI Evaluation]]></title><description><![CDATA[
<p>Article URL: <a href="https://techcommunity.microsoft.com/blog/azure-ai-services-blog/evaluating-agentic-ai-systems-a-deep-dive-into-agentic-metrics/4403923">https://techcommunity.microsoft.com/blog/azure-ai-services-blog/evaluating-agentic-ai-systems-a-deep-dive-into-agentic-metrics/4403923</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44077404">https://news.ycombinator.com/item?id=44077404</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 23 May 2025 23:07:10 +0000</pubDate><link>https://techcommunity.microsoft.com/blog/azure-ai-services-blog/evaluating-agentic-ai-systems-a-deep-dive-into-agentic-metrics/4403923</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44077404</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44077404</guid></item><item><title><![CDATA[New comment by airylizard in "Introducing the Llama Startup Program"]]></title><description><![CDATA[
<p>That's exactly what leads to inaccurate output in LLMs: the semantic interpretation of each individual token isn't the same for us as it is for the model. On "interpretation": you and I likely define accuracy and reliability much the same way, but interpretation is where we and the model differ. As for my "definition", it's in the repo. I'm not funded by anyone, don't get paid, and have no product to sell. So if you're genuinely interested in discussing, I'm all for it!</p>
]]></description><pubDate>Wed, 21 May 2025 23:23:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44057259</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44057259</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44057259</guid></item><item><title><![CDATA[New comment by airylizard in "Harnessing the Universal Geometry of Embeddings"]]></title><description><![CDATA[
<p>Are you continuing research? Is there somewhere we can follow along?</p>
]]></description><pubDate>Wed, 21 May 2025 19:15:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=44055129</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44055129</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44055129</guid></item><item><title><![CDATA[New comment by airylizard in "Harnessing the Universal Geometry of Embeddings"]]></title><description><![CDATA[
<p>The fact that embeddings from different models can be translated into a shared latent space (and back) supports the notion that semantic anchors or guides are not just model-specific hacks, but potentially universal tools. Fantastic read, thank you.<p>Given the demonstrated risk of information leakage from embeddings, have you explored any methods for hardening, obfuscating, or 'watermarking' embedding spaces to resist universal translation and inversion?</p>
]]></description><pubDate>Wed, 21 May 2025 19:06:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=44055022</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44055022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44055022</guid></item><item><title><![CDATA[New comment by airylizard in "AI's energy footprint"]]></title><description><![CDATA[
<p>As more and more people use brute-force loops to make their AI agents more reliable, this hidden inference giant will only continue to grow.
This is why I put my framework together: using just two passes instead of n+ retries can increase accuracy and reliability far more than brute-force loops can, while using significantly fewer resources.
Supportive evidence and data can be found in my repo: <a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a></p>
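<p>To make the "two passes vs. n+ retries" cost point concrete, here is a toy back-of-envelope model (my own illustration, not data from the repo) of the expected call count for a retry-until-pass loop:</p>

```python
def expected_calls_retry(p: float, cap: int) -> float:
    """Expected LLM calls for 'retry until an output passes, up to cap attempts'."""
    e = 0.0
    for k in range(1, cap + 1):
        # Probability the loop stops at exactly attempt k (the cap always stops it).
        stop = (1 - p) ** (k - 1) * (p if k < cap else 1.0)
        e += k * stop
    return e
```

<p>With a 50% per-attempt pass rate and a cap of 5, the loop averages about 1.94 calls per task and still ends unresolved about 3% of the time, whereas a fixed two-pass scheme spends a flat 2 calls.</p>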
]]></description><pubDate>Wed, 21 May 2025 17:54:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=44054199</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44054199</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44054199</guid></item><item><title><![CDATA[New comment by airylizard in "Introducing the Llama Startup Program"]]></title><description><![CDATA[
<p>Love it. Any LLM can be made to perform reliably and accurately, which is the biggest prerequisite when it comes to creating an "AI agent". I think this gives people the opportunity to start somewhere, because they can leverage multi-pass prompting frameworks like TSCE to scale, despite the fact that "Llama isn't the best": <a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a></p>
]]></description><pubDate>Wed, 21 May 2025 17:42:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44054047</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44054047</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44054047</guid></item><item><title><![CDATA[New comment by airylizard in "We give data to train AI models and get nothing in return"]]></title><description><![CDATA[
<p>The data "supply chain" has already surged ahead of production elsewhere. Companies aren't just passively taking what's out there; they actively harvest highly curated content, benefiting even further when we voluntarily correct and refine their models. Heck, some of us are even paying them for the privilege of training AI. The best time to have made this argument would've been when GPT was originally released, but I think most people were too enamored with it to care, and the idea that it would be "open-source" meant we'd get it back at the end of the day.<p>Unrelated, but this is exactly why I've been spending time building my AI framework (TSCE). The idea is to leverage these open-weight LLMs, typically smaller and more accessible, to achieve accuracy and reliability comparable to larger models. It doesn't necessarily make the models "smarter" (the way retraining or fine-tuning might), but it empowers everyday users to build reliable agentic workflows or AI tools from multiple smaller LLM instances.
Check it out: <a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a></p>
]]></description><pubDate>Sun, 18 May 2025 17:29:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44022942</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44022942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44022942</guid></item><item><title><![CDATA[New comment by airylizard in "What do y'all think – weeknd project"]]></title><description><![CDATA[
<p>I like the idea, TSCE framework should make the individual agents more reliable and deterministic: <a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a></p>
]]></description><pubDate>Sun, 18 May 2025 05:25:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44019182</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=44019182</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44019182</guid></item><item><title><![CDATA[New comment by airylizard in "LLMs get lost in multi-turn conversation"]]></title><description><![CDATA[
<p>Right, the 4.1 training checkpoint hasn’t moved. What has moved is the glue on top: decoder heuristics / safety filters / logit-bias rules that OpenAI can hot-swap without re-training the model.
Those “serving-layer” tweaks are what stomped the obvious em-dash miss for short, clean prompts.
So the April-14 weights are unchanged, but the pipeline that samples from those weights is stricter about “don’t output X” than it was on day one.
By all means, keep trying to poke holes! I’ve got nothing to sell; just sharing insights and happy to stress-test them.</p>
]]></description><pubDate>Thu, 15 May 2025 19:10:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=43998281</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43998281</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43998281</guid></item><item><title><![CDATA[New comment by airylizard in "LLMs get lost in multi-turn conversation"]]></title><description><![CDATA[
<p>Hey, thanks for kicking the tires!
The run you’re describing was done in mid-April, right after GPT-4.1 went live. Since then OpenAI has refreshed the weights behind the “gpt-4.1” alias a couple of times, and one of those updates fixed the em-dash miss.<p>If you reran today you’d see the same improved pass rate I’m getting now. That’s the downside of benchmarking against un-pinned model names: behaviour changes quietly unless you pin to a dated snapshot.<p>For bigger, noisier prompts (or on GPT-3.5-turbo, which hasn’t changed) TSCE still gives a solid uplift, so the framework’s value stands. Appreciate you checking it out!</p>
]]></description><pubDate>Thu, 15 May 2025 15:53:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=43996313</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43996313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43996313</guid></item><item><title><![CDATA[New comment by airylizard in "LLMs get lost in multi-turn conversation"]]></title><description><![CDATA[
<p>The test isn't for how well an LLM can find or replace a string. It's for how well it can carry out the given instructions... Is that not obvious?</p>
]]></description><pubDate>Thu, 15 May 2025 15:09:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=43995850</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43995850</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43995850</guid></item><item><title><![CDATA[New comment by airylizard in "LLMs get lost in multi-turn conversation"]]></title><description><![CDATA[
<p>This is why I came up with TSCE (Two-Step Contextual Enrichment).<p>+30 pp uplift when using GPT-3.5-turbo on a mix of 300 tasks.<p>Free, open framework; check the repo and try it yourself:<p><a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a><p>I tested this another 300 times with GPT-4.1 to remove those obtrusive "em-dashes" everyone hates. I compared a single-pass baseline vs. TSCE with the same exact instructions and prompt ("Remove the em-dashes from my linkedin post. . .").<p>Out of the 300 tests, the baseline failed to remove the em-dashes 149/300 times; TSCE failed 18/300 times.<p>It works; all the data, as well as the entire script used for testing, is in the repo.</p>
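<p>For anyone rerunning this, the pass criterion can be as simple as a scan for the em-dash character; this is my hedged reconstruction of the check, not the repo's exact harness:</p>

```python
def em_dash_pass(output: str) -> bool:
    # A run passes only if no em-dash (U+2014) survives in the rewritten text.
    return "\u2014" not in output

def pass_rate(outputs) -> float:
    return sum(em_dash_pass(o) for o in outputs) / len(outputs)
```

<p>On the counts quoted here, the baseline works out to 151/300 ≈ 50% passing and TSCE to 282/300 = 94%.</p>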
]]></description><pubDate>Thu, 15 May 2025 04:43:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=43991918</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43991918</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43991918</guid></item><item><title><![CDATA[New comment by airylizard in "TSCE and HyperDimensional Anchors: Making AI agents/workflows reliable at scale"]]></title><description><![CDATA[
<p>1. What TSCE is in one breath<p>Two deterministic forward-passes.<p>1. The model is asked to emit a hyperdimensional anchor (HDA) under high temperature.
2. The same model is then asked to answer while that anchor is prepended to the original prompt.<p>No retries, no human-readable scratch-pad, no fine-tune.<p>---<p>2. What a hyper-dimensional anchor is<p>An opaque token sequence that the network writes for itself.<p>Notation:
• X = full system + user prompt
• A = anchor tokens
• Y = final answer<p>Phase 1 samples `A ~ pθ(A | X)`
Phase 2 samples `Y ~ pθ(Y | X,A)`<p>Because A is now a latent variable observed at inference time:<p>`H(Y | X,A) ≤ H(Y | X)`                     (entropy can only go down)
and, empirically, E[H] drops ≈ 6× on GPT-3.5-turbo.<p>Think of it as the network manufacturing an internal coordinate frame, then constraining its second pass to that frame.<p>---<p>3. Why the anchor helps (intuition, not hype)<p>4,096-D embeddings can store far more semantic nuance than any single “chain-of-thought” token stream.
 The anchor is generated under the same system policy that will govern the answer, so policy constraints are rehearsed privately before the model speaks.
 Lower conditional entropy means fewer high-probability “wrong” beams, so a single low-temperature decode often suffices.<p>---<p>4. Numbers (mixed math + calendar + formatting pack)<p>GPT-3.5-turbo – accuracy 49 % → 79 % (N = 300).
 GPT-4.1 – em-dash violation 50 % → 6 % (N = 300).
 Llama-3-8B – accuracy 69 % → 76 % with anchor alone, 85 % when anchor precedes chain-of-thought (N = 100).
 Token overhead: 1.3 – 1.9× (two calls).  One Self-Refine loop already costs ≥ 3×.<p>Diagnostic plots (entropy bars, KL-per-position, cosine-distance violin) are in the repo if you like pictures → `figures/`.<p>---<p>5. Why this isn’t “just another prompt trick”<p>The anchor never appears in the user-visible text.
 Gains replicate on two vendor families (OpenAI GPT and open-weights Llama) and on both reasoning and policy-adherence tasks.
 Visible chain-of-thought actually loses accuracy on 8 B models unless the anchor comes first; the mechanism changes internal computation, not surface wording.<p>---<p>6. Try it yourself<p>pip install tsce
python -m tsce_demo "Rewrite this sentence without any em-dashes — can you?"<p>Repo (MIT) with benchmark harness, plots, and raw JSONL linked in the title!<p>---<p>7. Questions I’d love feedback on<p>Optimal anchor length vs. model size (64 tokens seems enough for models under 10B).
 Behaviour on Mixtral, Phi-3, Claude, Gemini — please post numbers.
 Red-team attempts: can you poison the anchor in Phase 1 and make the answer leak?<p>---</p>
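<p>If you want to check the entropy claim in section 2 yourself, one approach is to average per-token Shannon entropy from API logprobs, with and without the anchor prepended. A minimal sketch (illustrative, not the repo's diagnostic; it normalises over whatever top-k alternatives the API returns, which only approximates the full distribution):</p>

```python
import math

def mean_token_entropy(per_token_logprobs):
    """Average Shannon entropy (nats) across generated positions.

    per_token_logprobs: for each generated token, a list of (token, logprob)
    top-k alternatives, as exposed by APIs with logprobs enabled.
    """
    total = 0.0
    for alts in per_token_logprobs:
        ps = [math.exp(lp) for _, lp in alts]
        z = sum(ps)
        ps = [p / z for p in ps]  # renormalise over the observed top-k
        total += -sum(p * math.log(p) for p in ps if p > 0)
    return total / len(per_token_logprobs)
```

<p>Lower values on Phase 2 outputs with the anchor than without would be consistent with the entropy drop reported here.</p>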
]]></description><pubDate>Mon, 05 May 2025 20:52:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=43899344</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43899344</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43899344</guid></item><item><title><![CDATA[TSCE and HyperDimensional Anchors: Making AI agents/workflows reliable at scale]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43899325">https://news.ycombinator.com/item?id=43899325</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 05 May 2025 20:50:34 +0000</pubDate><link>https://github.com/AutomationOptimization/tsce_demo</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43899325</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43899325</guid></item><item><title><![CDATA[Show HN: TSCE – Think Before You Speak (Two-Step Contextual Enrichment for LLMs)]]></title><description><![CDATA[
<p>Hi HN!<p>I’d like to share TSCE – a Python library that gives small models the ability to follow instructions with GPT-4-like precision using a two-step architecture. It’s designed for developers who want reliable, high-quality results without fine-tuning large models.<p>TL;DR
TSCE uses a two-step approach to improve the reliability of language models. This ensures more accurate and coherent outputs, especially for tasks that require rule adherence or detailed instructions.<p>---<p>Why You Should Care<p>* No Fine-Tuning: You don't need to fine-tune models. TSCE works with existing models like GPT-3.5 and GPT-4.
* Improved Reliability: By decoupling the thinking process (Phase 1) and the answering process (Phase 2), you get more consistent, on-topic results.
* Works with OpenAI and Azure: TSCE works out of the box with both OpenAI and Azure OpenAI APIs, so it’s ready to plug into your existing workflows.
* Minimal Overhead: The extra cost is small (two API calls instead of one), but the payoff in clarity and accuracy is large.<p>Example<p>from tsce import TSCEChat<p>tsce = TSCEChat(model="gpt-3.5-turbo")
prompt = "Provide a brief explanation of the benefits of solar energy and how it impacts the environment. Include an example of a renewable energy source other than solar energy."<p>reply = tsce(prompt)<p>print("Answer (Phase 2):", reply.content)   # The final response generated after Phase 2
print("Anchor (Phase 1):", reply.anchor)    # The structured anchor generated in Phase 1<p>Output Example:<p>- *Anchor (Phase 1)*:  
  `solar::harvest~light{clean}—echo::sustain[green] ###END###`<p>- *Answer (Phase 2)*:  
  `"Solar energy offers numerous benefits, particularly in terms of environmental impact. Here are some key points:<p>1. *Clean Energy Source*: Solar energy is a clean and renewable source of power. It generates electricity without emitting greenhouse gases or pollutants, which helps reduce the overall carbon footprint.<p>2. *Reduction in Fossil Fuel Dependence*: By harnessing solar energy, we can decrease our reliance on fossil fuels, which are finite and contribute significantly to environmental pollution and climate change.<p>3. *Sustainable and Renewable*: Solar energy is abundant and inexhaustible. As long as the sun shines, we can harness its power, making it a sustainable energy solution for the long term.<p>4. *Positive Environmental Impact*: Utilizing solar energy reduces air and water pollution, minimizes habitat destruction, and decreases the strain on natural resources. This contributes to a healthier ecosystem and a more balanced natural environment.<p>*Example of Another Renewable Energy Source*: Wind energy is another excellent example of a renewable energy source. Wind turbines convert the kinetic energy from wind into electrical power. Like solar energy, wind energy is clean, sustainable, and helps reduce greenhouse gas emissions. Wind farms can be established on land or offshore, providing versatile options for generating renewable energy."`<p>---<p>Installation<p>```
pip install tsce
```<p>Customizing System Prompts<p>You can easily customize the system instructions for the final pass by passing in a custom *`final_prefix`*:<p>```
tsce = TSCEChat(final_prefix="You are a helpful assistant, please respond in 3 sentences.")
reply = tsce("Explain the concept of recursion.")
print(reply.content)
```<p>---<p>If you have any questions or want to try it out, feel free to comment here or head over to the repo.<p>[GitHub Repo](<a href="https://github.com/AutomationOptimization/tsce_demo">https://github.com/AutomationOptimization/tsce_demo</a>)
[GDrive: Read the paper, See the proof](<a href="https://tinyurl.com/3xswpzbb" rel="nofollow">https://tinyurl.com/3xswpzbb</a>)<p>Looking forward to hearing what you think!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43788904">https://news.ycombinator.com/item?id=43788904</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 25 Apr 2025 00:13:58 +0000</pubDate><link>https://github.com/AutomationOptimization/tsce_demo</link><dc:creator>airylizard</dc:creator><comments>https://news.ycombinator.com/item?id=43788904</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43788904</guid></item></channel></rss>