Hacker News: g023

New comment by g023 in "DeepSeek makes the V4 Pro price discount permanent"

g023 — Fri, 22 May 2026 21:38:06 +0000

I use DeepSeek v4 flash with CoPilot and it works pretty good.

New comment by g023 in "DeepSeek makes the V4 Pro price discount permanent"

g023 — Fri, 22 May 2026 21:36:55 +0000

If anyone is looking to hook it up to copilot, I made a proxy script to handle the connection a bit back that might be handy: https://gist.github.com/g023/c2bb7b540ffe64cee76023f18f6f936...

New comment by g023 in "[dead]"

g023 — Sun, 19 Apr 2026 05:45:02 +0000

Terminal-based chat application powered by *locally installed* llama.cpp, featuring an auto-managed server backend, reasoning modes, and 7 built-in filesystem tools for interactive AI assistance with a focus on only allowing read only agentic access to system and only offline-focused agentic commands (for now). Powered by the g023/g023-Qwen3.5-9B-GGUF:IQ2_M model.

New comment by g023 in "Cerebras S-1"

g023 — Sat, 18 Apr 2026 00:09:43 +0000

We need more personal level AI solutions instead of so much corporate centered solutions.

Local Model Router: Ollama/OpenAI-compat bridges for local LLMs via llama.cpp

g023 — Fri, 17 Apr 2026 19:19:55 +0000

A high-performance local LLM server providing drop-in API compatibility with Ollama and OpenAI, built on llama.cpp's llama-server. Features automatic VRAM management, Hugging Face integration, and modular architecture. Unlike Ollama which bundles its own inference engine, LMR leverages the battle-tested llama.cpp backend while providing familiar APIs and intelligent model management.

https://github.com/g023/localmodelrouter

Comments URL: https://news.ycombinator.com/item?id=47809550

Points: 1

# Comments: 0

New comment by g023 in "The local LLM ecosystem doesn’t need Ollama"

g023 — Fri, 17 Apr 2026 05:53:14 +0000

I've started creating https://github.com/g023/localmodelrouter/ which offers Ollama like functionality but as a single .py file with minimal dependencies and more focus on letting llama.cpp handle the dirty work.

New comment by g023 in "[dead]"

g023 — Fri, 17 Apr 2026 02:12:09 +0000

HarnessHarvester generates executable Python harnesses from natural language task descriptions, executes them in a sandboxed environment, reviews them with multi-faceted LLM judges, and repairs failures using branching strategies. It includes two autonomous modes: autolearn (continuous discovery loop) and autoimprove (iterative enhancement of existing harnesses). This concept is designed to be an offline first harness/scaffolding builder where you get the harness instead of some remote api.

New comment by g023 in "Show HN: Standalone TurboQuant KV Cache Inference"

g023 — Tue, 07 Apr 2026 00:30:33 +0000

A single file, python based, minimal/recognizable dependencies, turboquant playground, barebones af, with some easy to access globals to experiment with at top of 'run_tquant.py'. Test model is a 1.77B model that I altered by duplicating a layer in a Qwen3 1.7B model. Probably work fine with the regular Qwen3 1.7B model as well, but for right now I'm just working with my surgically altered one while I work on the script.

New comment by g023 in "Show HN: Standalone TurboQuant KV Cache Inference"

g023 — Tue, 07 Apr 2026 00:24:16 +0000

I had some issues in the original, but had to jump away for a bit here to do some backups (weak). Anyways, I updated to make the necessary fixes, and also made some more tweaking values at top to play with and dialed in the params for the more this specific model a bit more. I will start testing with some other models here as my next step in this little experiment. Thanks for the interest. Feel free to try latest version and run the interactive mode to chat it up with the model and get a feedback on the results as you go. If you have any suggestions, let me know. I'm trying to keep this one as barebones as possible to make it easier for others to port to other languages, or integrate into other uses more easily.

edit: just added Mirostat v2 to clean up repetitive output from the model

Show HN: Standalone TurboQuant KV Cache Inference

g023 — Fri, 03 Apr 2026 22:31:31 +0000

Implements TurboQuant (ICLR 2026, arXiv:2504.19874) KV cache compression directly inside a Transformers inference script. All algorithms are self-contained. Minimal dependencies.

- uses https://huggingface.co/g023/Qwen3-1.77B-g023 as the demonstration model (throw model files in Qwen3-BEST folder)

Comments URL: https://news.ycombinator.com/item?id=47633195

Points: 3

# Comments: 4

Show HN: An offline first focused agentic CLI application powered by Ollama

g023 — Wed, 25 Mar 2026 19:56:17 +0000

An offline first focused agentic CLI application powered by local Ollama models. Integrated advanced memory management. Minimal dependencies. MIT licensed.

DEFAULT USES A 4b Qwen 3.5 MODEL.

https://github.com/g023/ai_cli/

Comments URL: https://news.ycombinator.com/item?id=47522348

Points: 4

# Comments: 0

New comment by g023 in "G023's Agentic Chat with Memory and Python Power"

g023 — Fri, 20 Mar 2026 19:29:40 +0000

A sophisticated multi-level reasoning engine with agentic memory, tool integration, and user control modes. Built as a (primarily) single-file Python program using vanilla Python and local LLM integration. Powered by Ollama API and utilizes a 1.77B Qwen3 variant that has layer 21 duplicated. Different layers of memory for agentic processes. Has ability to oversee command execution or yolo mode. MIT Licensed.

Show HN: G023's Agentic Chat with Memory and Python Power

g023 — Fri, 20 Mar 2026 19:29:40 +0000

Article URL: https://github.com/g023/g023_agentic_chat

Comments URL: https://news.ycombinator.com/item?id=47459448

Points: 1

# Comments: 1

New comment by g023 in "Nvidia releases 8B model with learned 8x KV cache compression"

g023 — Sat, 31 Jan 2026 09:46:32 +0000

I made a smaller sized version https://huggingface.co/g023/Qwen3-8B-DMS-8x-4bit-NF4

New comment by g023 in "Wall Street sees AI bubble coming and is betting on what pops it"

g023 — Mon, 15 Dec 2025 16:58:17 +0000

I'm thinking that the bubble will be the vortex caused by an abundance of power that becomes freely available locally due to the AI datacenters moving to space.

New comment by g023 in "The Rise of Computer Games, Part I: Adventure"

g023 — Mon, 15 Dec 2025 16:43:19 +0000

Something about the modern day fails to match the feelings of when MUDs were in their prime. With text you can describe so much more than a picture can paint. You can visualize a smell, a taste, or a feeling in text, but it doesn't translate well when you have graphics painting your imagination for you.

Show HN: G023's OllamaMan – Web-based OS for managing Ollama servers

g023 — Sun, 14 Dec 2025 23:40:35 +0000

g023's OllamaMan - Ollama Manager OS style GUI management of ollama Server - Open Source LLM Management using PHP/JS/Sqlite and a web browser. Integrated apps for chat, terminal viewer, model management (can pull or delete; huggingface gguf supported too). Advanced model creation now implemented. Supports images in chat and speech to text/history/etc. Lots of quick access buttons, and auto sets itself up if db doesn't exist. Adding more bells and whistles as I go. Not meant for a public facing folder, so protect as you see necessary. Open source, BSD 3-Clause so have fun and thanks for giving it a glance. I just made it yesterday, so I'm sorry if it has some rough bits as I'll fudge those into place hopefully shortly (menus on the topbar/settings tabs). Right now fully functional.

Comments URL: https://news.ycombinator.com/item?id=46268382

Points: 3

# Comments: 0