<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: nowittyusername</title><link>https://news.ycombinator.com/user?id=nowittyusername</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 02:23:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=nowittyusername" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by nowittyusername in "Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training"]]></title><description><![CDATA[
<p>There's still a lot of low-hanging fruit left, IMO. Good find, and rather funny to think about: someone can simply clone various layers multiple times and, instead of spending millions of dollars retraining the model, increase performance significantly with "this one trick".</p>
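<p>For anyone who wants to poke at this, here is a minimal sketch of the trick as I understand it, assuming a Llama/Mistral-style model in Hugging Face transformers. The checkpoint name and layer indices are placeholders, not the ones from the post.</p><pre><code>import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; swap in the actual 24B model.
model = AutoModelForCausalLM.from_pretrained("your-24b-model")
layers = model.model.layers  # nn.ModuleList of decoder blocks

# Clone three mid-stack layers and splice the copies in right after
# the originals; no retraining, just extra forward passes.
start, count = 20, 3
clones = [copy.deepcopy(layers[i]) for i in range(start, start + count)]
stack = list(layers[: start + count]) + clones + list(layers[start + count :])
model.model.layers = nn.ModuleList(stack)
model.config.num_hidden_layers = len(stack)

# Recent transformers versions track each layer's position for the KV
# cache, so re-index after splicing (attribute name assumed from
# Llama-style attention implementations).
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
</code></pre>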
]]></description><pubDate>Thu, 19 Mar 2026 01:39:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47433716</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47433716</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47433716</guid></item><item><title><![CDATA[New comment by nowittyusername in "Tinnitus Is Connected to Sleep"]]></title><description><![CDATA[
<p>Got mine after my first acid trip (still don't know if it was real acid). It's not debilitating for me, just annoying. So yeah, be careful out there, folks. The trip was very cerebral, though, and I consider it an important experience in my life, so I'm kind of on the fence about whether it was worth the trade-off...</p>
]]></description><pubDate>Sat, 07 Mar 2026 19:52:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47290897</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47290897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47290897</guid></item><item><title><![CDATA[New comment by nowittyusername in "GPT-5.4"]]></title><description><![CDATA[
<p>Personally, what I am more interested in is the effective context window. I found that when using codex 5.2 high, I preferred to start compaction at around 50% of the context window because I noticed degradation at around that point, though as of about a month ago that degradation point seems to have improved, which is great. Anyway, I doubt I'll be using that 1 million token context at all in 5.4, but if the effective window is something like 400k, that by itself is already a huge win. That means longer sessions before compaction, and the agent can keep working on complex stuff for longer. But then there's the question of 5.4's intelligence. If it's as good as 5.2 high, I'm a happy camper; I found 5.3-anything... lacking, personally.</p>
]]></description><pubDate>Thu, 05 Mar 2026 20:53:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47267131</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47267131</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47267131</guid></item><item><title><![CDATA[New comment by nowittyusername in "Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift"]]></title><description><![CDATA[
<p>sent!</p>
]]></description><pubDate>Thu, 05 Mar 2026 20:43:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47267021</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47267021</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47267021</guid></item><item><title><![CDATA[New comment by nowittyusername in "Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift"]]></title><description><![CDATA[
<p>I've been working on building my own voice agent for a while and would love to talk to you and swap notes if you have the time. There are many things I'd like to discuss, but mainly I'm trying to figure out how a full-duplex pipeline like this could fit into an agentic framework. I've had no issues with the traditional stt > llm > tts pipeline, as it naturally lends itself to agentic behavior like tool use, advanced context management systems, RAG, etc. I separate the human-facing agent from the subagent to reduce latency and context bloat, and it works well.</p><p>While I am happy with the current pipeline, I always keep an eye out for full-duplex solutions. They look interesting and feel more naturally dynamic because of the architecture, but every time I revisit them I can't wrap my head around how you would even begin to implement one as part of a voice agent. Sure, some of these things have text input and output channels, but even then, with their own context limitations, it feels like they could never be anything other than a fancy mouthpiece. Maybe I'm looking at this from ignorance, though. Anyway, I'd love to talk on Discord with a like-minded fella. Cheers.</p>
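<p>For concreteness, the traditional pipeline I mean has roughly this shape; a toy sketch where every function is a placeholder stub, not any particular library:</p><pre><code># One turn in an stt > llm > tts voice agent, with the human-facing
# agent separated from the heavier subagent as described above.

def stt(audio: bytes) -> str:
    return "transcribed user speech"     # placeholder speech-to-text

def frontend_agent(text: str) -> str:
    return "quick conversational reply"  # small, low-latency LLM

def subagent(text: str) -> str:
    return "result from tools / RAG"     # tool use, RAG, big context

def needs_tools(text: str) -> bool:
    return "look up" in text             # toy routing heuristic

def tts(text: str) -> bytes:
    return b"synthesized audio"          # placeholder text-to-speech

def voice_turn(audio_in: bytes) -> bytes:
    text = stt(audio_in)
    reply = subagent(text) if needs_tools(text) else frontend_agent(text)
    return tts(reply)
</code></pre>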
]]></description><pubDate>Thu, 05 Mar 2026 10:05:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47259836</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47259836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47259836</guid></item><item><title><![CDATA[New comment by nowittyusername in "Did Alibaba just kneecap its powerful Qwen AI team?"]]></title><description><![CDATA[
<p>I hope the people from the Qwen team start their own thing or something... But regardless, the work they did will live on as legendary.</p>
]]></description><pubDate>Wed, 04 Mar 2026 16:08:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47249559</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47249559</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47249559</guid></item><item><title><![CDATA[New comment by nowittyusername in "Setting up OpenClaw on a cloud VM"]]></title><description><![CDATA[
<p>For me at least, it's an interesting project I can take apart and build on top of. I've built my own agent frameworks 100% from scratch and learned a lot from them, but there's something to be said for learning from other people's projects too. Also, because it's an ever-evolving project with so many contributors, whatever fork of your own you go with, there's a good chance the new goodies will still work with your modified version. For example, I'm looking into LCM right now, and wouldn't you know it, someone ported it to OpenClaw. But nanobot doesn't have it, so I'm considering working on the LCM port for that. If I succeed, I'll learn a lot and also contribute to progress in my own little way.</p>
]]></description><pubDate>Fri, 27 Feb 2026 19:42:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47184645</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47184645</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47184645</guid></item><item><title><![CDATA[New comment by nowittyusername in "Mercury 2: Fast reasoning LLM powered by diffusion"]]></title><description><![CDATA[
<p>How does the whole KV-cache situation work for diffusion models? Are there latency and computation/monetary savings from caching? Is the curve similar to autoregressive caching options? Or maybe such things don't apply at all, and you can just mess with the system prompt and dynamically change it every turn because there are no savings to be had? Or maybe you can make dynamic changes to the head but still get cache savings because of the diffusion-based architecture?... So many ideas...</p>
]]></description><pubDate>Wed, 25 Feb 2026 02:39:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47146605</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47146605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47146605</guid></item><item><title><![CDATA[New comment by nowittyusername in "Mercury 2: Fast reasoning LLM powered by diffusion"]]></title><description><![CDATA[
<p>Nice, I'm excited to try this for my voice agent. At worst, it could power the human-facing agent for latency reduction.</p>
]]></description><pubDate>Wed, 25 Feb 2026 02:32:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47146557</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47146557</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47146557</guid></item><item><title><![CDATA[New comment by nowittyusername in "LCM: Lossless Context Management [pdf]"]]></title><description><![CDATA[
<p>I am in the process of integrating LCM into my own personal-assistant agent as its context management system. The main human-facing agent will not be a coding agent, so I'll be modifying the system prompt and some other things quite heavily, but the core concepts of the system will be the backbone. Now that I'm playing around with it, I'm hoping you can answer some questions. I notice that the agent's system prompt mutates, since local time is injected into the system prompt itself. If that's what's happening, aren't you destroying any hope of caching from the provider? Am I reading this correctly, or was this a deliberate choice for some reason, instead of appending the time at the end of the user's turn as system metadata so you preserve the head? Thanks.</p>
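<p>A minimal sketch of the alternative I have in mind, assuming an OpenAI-style chat message list (LCM's actual internals may differ):</p><pre><code>from datetime import datetime

SYSTEM_PROMPT = "You are a helpful assistant."  # byte-identical every turn

def build_messages(history: list, user_text: str) -> list:
    # Inject local time as metadata on the user turn instead of
    # mutating the system prompt, so the provider's prefix cache
    # over the head of the conversation stays warm.
    stamped = f"{user_text}\n\n[meta] local_time={datetime.now().isoformat()}"
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]  # stable, cacheable head
        + history
        + [{"role": "user", "content": stamped}]
    )
</code></pre>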
]]></description><pubDate>Tue, 24 Feb 2026 22:04:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47143897</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47143897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47143897</guid></item><item><title><![CDATA[New comment by nowittyusername in "AI adoption and Solow's productivity paradox"]]></title><description><![CDATA[
<p>As we approach the singularity, things will get noisier and make less and less sense, because rapid change can look like chaos from inside the system. I recommend folks just take a deep breath and look around. Regardless of your stance on whether the singularity is real, or whether AI will revolutionize everything, forget all that noise. Just look around you and ask yourself: do things seem more or less chaotic? Are you able to predict better or worse what is going to happen? How far out can your predictions land now versus, say, 10 or 20 years ago? Conflicting signals are exactly how all of this looks: one account says it's the end of the world, another says nothing ever changes and everything is the same as it always was...</p>
]]></description><pubDate>Wed, 18 Feb 2026 03:25:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47056779</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47056779</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47056779</guid></item><item><title><![CDATA[New comment by nowittyusername in "LCM: Lossless Context Management [pdf]"]]></title><description><![CDATA[
<p>Thanks for the reply. That does help.</p>
]]></description><pubDate>Tue, 17 Feb 2026 18:34:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47051161</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47051161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47051161</guid></item><item><title><![CDATA[New comment by nowittyusername in "LCM: Lossless Context Management [pdf]"]]></title><description><![CDATA[
<p>Do you have any resources or YouTube videos that might help someone understand LCM context management a bit better? I think there's something to this, but I'm having trouble wrapping my head around it. I learn well with analogies, and I'm trying to really grok the concept here, so if there are other ways you could explain it, that would be appreciated. Mind you, I have built my own agents from scratch, so I'm not a total novice in these areas; my agents already manage context with sub-agents and multi-layered conversational histories, with RAG thrown in there. But I don't want to make wrong assumptions about your implementation and miss the nuanced, important bits. Regardless, I'll try my best to reread the article and hash it out on my own. Thanks for the paper.</p>
]]></description><pubDate>Tue, 17 Feb 2026 05:53:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044175</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47044175</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044175</guid></item><item><title><![CDATA[New comment by nowittyusername in "Audio is the one area small labs are winning"]]></title><description><![CDATA[
<p>Thanks, I'll read it now.</p>
]]></description><pubDate>Mon, 16 Feb 2026 19:08:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47038948</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47038948</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47038948</guid></item><item><title><![CDATA[New comment by nowittyusername in "Audio is the one area small labs are winning"]]></title><description><![CDATA[
<p>Good article, and I agree with everything in there. For my own voice agent I decided to make him push-to-talk by default, as the problems with the model accurately guessing the end of an utterance are just too great. I think it can be solved in the future, but I haven't seen a really good example of it being done with modern-day tech, including this lab's. Fundamentally it all comes down to the fact that different humans have different ways of speaking, and the human listening to them updates their own internal model of the speech pattern, adjusting it after a couple of interactions and arriving at the proper way of speaking with that person. Something very similar will need to be done, and at very low latency, for it to succeed in the audio ML world. But I don't think we have anything like that yet. It seems the best you can currently do is tune the model on a generic speech pattern that you expect to fit a large percentage of the human population; anyone who falls outside of that will feel the pain of getting interrupted every time.</p>
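<p>A toy version of the per-speaker adaptation I'm imagining; the numbers are made up, and a real system would adapt on more signals than pause length:</p><pre><code>class AdaptiveEndpointer:
    """Per-speaker end-of-utterance silence threshold (toy model)."""

    def __init__(self, threshold: float = 0.8, alpha: float = 0.2):
        self.threshold = threshold  # seconds of silence that ends a turn
        self.alpha = alpha          # adaptation rate

    def observe_false_interrupt(self, pause: float) -> None:
        # We cut the speaker off after `pause` seconds of silence but
        # they kept talking: drift the threshold toward a bit above
        # their observed habit.
        self.threshold += self.alpha * (pause * 1.2 - self.threshold)

    def is_end_of_utterance(self, silence: float) -> bool:
        return silence >= self.threshold
</code></pre>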
]]></description><pubDate>Mon, 16 Feb 2026 01:15:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47029721</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=47029721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47029721</guid></item><item><title><![CDATA[New comment by nowittyusername in "Doing the thing is doing the thing"]]></title><description><![CDATA[
<p>I wholeheartedly agree. In an age of talking heads, you will not hear from the people actually doing the thing, because they're too busy doing the thing versus talking about it. Now excuse me, I'ma go back to doing the thing.</p>
]]></description><pubDate>Tue, 27 Jan 2026 23:22:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46788620</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=46788620</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46788620</guid></item><item><title><![CDATA[New comment by nowittyusername in "The state of modern AI text to speech systems for screen reader users"]]></title><description><![CDATA[
<p>Yes, sorry, I mixed these up. Supertonic is not the best-sounding in my tests; it was by far the fastest, and its audio quality for something so fast was decent. If you want something that sounds better AND is also extremely fast, Pocket TTS is the choice: amazing quality and also crazy fast on both GPU and CPU. If you care mainly about quality, Chatterbox in my tests was the best fit, but it's slower than the others. Qwen 3 TTS was also great, but it's unusable as any real-time agentic voice as it's too slow. They haven't released the code for streaming yet; once they release that, it will be my top contender.</p>
]]></description><pubDate>Sat, 24 Jan 2026 21:59:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46748163</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=46748163</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46748163</guid></item><item><title><![CDATA[New comment by nowittyusername in "The state of modern AI text to speech systems for screen reader users"]]></title><description><![CDATA[
<p>TTS has no intelligence, bud. It's only something that transforms text to audio, and that is all we are talking about here. Neither the article nor anyone else was discussing the whole stt > llm > tts pipeline.</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:34:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46738237</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=46738237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46738237</guid></item><item><title><![CDATA[New comment by nowittyusername in "The state of modern AI text to speech systems for screen reader users"]]></title><description><![CDATA[
<p>Supertonic is probably way faster than that; I wouldn't be surprised if, measured, it came out to something like 14k WPM. On my 4090 I was getting about 175x real time, while on CPU only it was 55x real time. I stopped optimizing it, but I'm sure it could be pushed further. Anyway, you should check out their repo and test it yourself; it's crazy what that team accomplished!</p>
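<p>For reference, "x real time" here means audio seconds produced per wall-clock second, measured roughly like this (synthesize() is a stand-in for whatever API the model exposes):</p><pre><code>import time

def realtime_factor(synthesize, text: str, sample_rate: int = 24000) -> float:
    """Return audio_seconds / wall_seconds, i.e. 'x real time'."""
    start = time.perf_counter()
    samples = synthesize(text)  # assumed to return a 1-D array of samples
    wall = time.perf_counter() - start
    return (len(samples) / sample_rate) / wall
</code></pre>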
]]></description><pubDate>Fri, 23 Jan 2026 18:56:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46736288</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=46736288</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46736288</guid></item><item><title><![CDATA[New comment by nowittyusername in "The state of modern AI text to speech systems for screen reader users"]]></title><description><![CDATA[
<p>With Supertonic, or overall? If overall, most do pretty well, though some are funky; Suprano, for example, was so bad no matter what I did that I had to rule it out of my top contenders for anything. Supertonic was close to my number-one choice for my agentic pipeline, as it was so insanely fast and the quality was great, but it didn't have the other bells and whistles some other models have, so I'm holding it in reserve for CPU-only projects in the future. If you are going to use it on a GPU, I would suggest Chatterbox or Pocket TTS. Chatterbox is my top contender as of now because it sounds amazing, has cloning, and I got it down to 0.26s TTFA/TTSA once I quantized it and integrated Pipecat into it. Pocket TTS is probably my second choice for similar reasons.</p>
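<p>And for the TTFA number, this is roughly how I measure it (stream_synthesize() is a placeholder for a streaming TTS call):</p><pre><code>import time

def time_to_first_audio(stream_synthesize, text: str) -> float:
    """Wall-clock seconds until the first streamed audio chunk arrives."""
    start = time.perf_counter()
    for _chunk in stream_synthesize(text):  # generator of audio chunks
        return time.perf_counter() - start
    raise RuntimeError("stream produced no audio")
</code></pre>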
]]></description><pubDate>Fri, 23 Jan 2026 15:17:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46733500</link><dc:creator>nowittyusername</dc:creator><comments>https://news.ycombinator.com/item?id=46733500</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46733500</guid></item></channel></rss>