<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: warangal</title><link>https://news.ycombinator.com/user?id=warangal</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 11:42:59 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=warangal" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by warangal in "Ask HN: What Are You Working On? (March 2026)"]]></title><description><![CDATA[
<p>VITS is such a cool model (and paper): fast, minimal, trainable. Meta took it to the extreme for about 1000 languages.<p>It seems like you have been working on this application for some time. I will go through your code, but could you provide some context about the upgrades/changes you have made, or a post describing your efforts?<p>Cool nonetheless!</p>
]]></description><pubDate>Mon, 09 Mar 2026 09:29:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47306694</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=47306694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47306694</guid></item><item><title><![CDATA[New comment by warangal in "Sarvam 105B, the first competitive Indian open source LLM"]]></title><description><![CDATA[
<p>I may be wrong here, but the blog post seems AI-written, with repeated sequences like "the inference pipeline was rebuilt using architecture-aware fused kernels, optimized scheduling, and dis-aggregated serving". I don't know what that means without some code and proper context.<p>They also claim 3-6x inference throughput compared to Qwen3-30B-A3B without referring back to any code or paper; all I could see in the Hugging Face repo is usage of a standard inference stack like vLLM. I have looked at earlier models which were trained with help from Nvidia, but what that "help" actually involved was never made clear!
There is no release of the (India-specific) datasets they would have used; to me, such releases muddy the water rather than being a helpful addition!</p>
]]></description><pubDate>Sat, 07 Mar 2026 12:59:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47287280</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=47287280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47287280</guid></item><item><title><![CDATA[New comment by warangal in "I Ported Coreboot to the ThinkPad X270"]]></title><description><![CDATA[
<p>Pretty cool work!
Though it leaves me wondering: if coreboot/BIOS code can directly interface with the thermal-management and battery controllers, shouldn't it be feasible to improve battery life by exposing some interface to the OS, as Apple laptops do?</p>
]]></description><pubDate>Tue, 24 Feb 2026 07:12:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47133856</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=47133856</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47133856</guid></item><item><title><![CDATA[New comment by warangal in "Thoughts on Generating C"]]></title><description><![CDATA[
<p>I was also reading through Lobster's memory management, which (I think) currently implements "borrow first" semantics to do away with a lot of run-time reference-counting logic, which I think is a very practical approach. I also wonder whether reference-counting overhead ever becomes large enough that some languages should never consider RC at all.<p>Tangentially, I was experimenting with a runtime library to expose such "borrow-first" semantics: such "lents" can be easily copied onto a new thread's stack to access shared memory, and are not involved in RC. Race-condition detection helps to share memory without any explicit move to a new thread. It seems to work well for simpler data structures like sequences/vectors/strings/dictionaries, but I have not figured out a proper way to handle recursive/dynamic data structures!</p>
]]></description><pubDate>Mon, 09 Feb 2026 17:39:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46948210</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46948210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46948210</guid></item><item><title><![CDATA[New comment by warangal in "Ask HN: Those making $500/month on side projects in 2025 – Show and tell"]]></title><description><![CDATA[
<p>How do you market it? Through social media, or are there dedicated channels for raising awareness of such Mac apps, if you don't mind sharing?</p>
]]></description><pubDate>Thu, 18 Dec 2025 12:36:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46311914</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46311914</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46311914</guid></item><item><title><![CDATA[New comment by warangal in "Self-hosting my photos with Immich"]]></title><description><![CDATA[
<p>I work on an image search engine[0]; the main idea has been to preserve all the original metadata and directory structure while allowing semantic and metadata search from a single interface. All metadata is stored in a single JSON file, with the original paths and filenames, in case you ever want to create backups. Instead of uploading photos to a server, you could host it on a cheap VPS with enough space and index there (by default it is a local app). It is an engine, though, and doesn't provide auth or specific features like sharing albums!<p>[0] <a href="https://github.com/eagledot/hachi" rel="nofollow">https://github.com/eagledot/hachi</a></p>
]]></description><pubDate>Sat, 06 Dec 2025 11:24:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46172476</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46172476</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46172476</guid></item><item><title><![CDATA[New comment by warangal in "Hachi: An Image Search Engine"]]></title><description><![CDATA[
<p>Currently the (semantic) ML model is the weakest part: a (minorly fine-tuned) ViT-B/32 variant, acting more like a placeholder, i.e. very easy to swap for a desired model. (DINO models have been pretty great, being trained on a much cleaner and larger dataset; CLIP was one of the first image-text models!)<p>On the point about "girl drinking water": "girl" is the person/tagged name, and "drinking water" just re-ranks all of "girl"'s photos (rather than finding all photos of a generic girl drinking water).<p>I have been more focused on making the indexing pipeline performant by reducing copies and speeding up bottleneck portions by writing them in Nim. The fusion of semantic features with metadata is the more interesting and challenging part, compared to choosing an embedding model!</p>
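The "tag first, re-rank second" flow described above can be sketched roughly like this. This is a pure-Python toy with made-up field names (`tags`, `embedding`, `path`), not Hachi's actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, person_tag, query_embedding):
    """Filter by metadata tag first, then re-rank that subset semantically."""
    # Metadata filter: keep only photos tagged with the requested person.
    candidates = [p for p in index if person_tag in p["tags"]]
    # Semantic re-rank: order the subset by similarity to the text query.
    candidates.sort(key=lambda p: cosine(p["embedding"], query_embedding),
                    reverse=True)
    return [p["path"] for p in candidates]
```

The key property is that the semantic score never pulls in photos outside the tag filter; it only orders them.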
]]></description><pubDate>Sat, 29 Nov 2025 18:03:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46089447</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46089447</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46089447</guid></item><item><title><![CDATA[New comment by warangal in "Hachi: An Image Search Engine"]]></title><description><![CDATA[
<p>Hi, author here!<p>I have been working on this project for quite some time now. Even though the basic ideas of such search engines stay the same, i.e. extracting metadata or semantic info and providing an interface to query it, a lot of effort has gone into making those modules performant while keeping dependencies minimal. The current version is down to only three dependencies (numpy, markupsafe, ftfy) plus a Python installation, with no hard dependence on any version. A lot of the code is written from scratch, including a meta-indexing engine and a minimal vector database. Being able to index any personal data from multiple devices or services without duplicating it has been the main theme of the project so far!<p>We (my friend and I) have already tested it on around 180 GB of the Pexels dataset and on up to 500k images of the Flickr 10M dataset. The machine-learning models are powered by a framework completely written in Nim (currently not open source) whose only dependency is oneDNN (which will have to go in order to run on ARM machines!)<p>I have mainly been looking for feedback to smooth out some rough edges, but it has been worthwhile to work on this project, which includes code at every level from assembly to HTML!</p>
]]></description><pubDate>Sat, 29 Nov 2025 16:16:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46088652</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46088652</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46088652</guid></item><item><title><![CDATA[Hachi: An Image Search Engine]]></title><description><![CDATA[
<p>Article URL: <a href="https://eagledot.xyz/hachi.md.html">https://eagledot.xyz/hachi.md.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46087549">https://news.ycombinator.com/item?id=46087549</a></p>
<p>Points: 152</p>
<p># Comments: 14</p>
]]></description><pubDate>Sat, 29 Nov 2025 13:56:04 +0000</pubDate><link>https://eagledot.xyz/hachi.md.html</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46087549</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46087549</guid></item><item><title><![CDATA[New comment by warangal in "Electron vs. Tauri"]]></title><description><![CDATA[
<p>Using `zig cc` (clang) to target a particular libc version is one of the best decisions I have made; it saved me from those meaningless libc-mismatch errors!</p>
]]></description><pubDate>Sat, 29 Nov 2025 10:16:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=46086457</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=46086457</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46086457</guid></item><item><title><![CDATA[New comment by warangal in "Meta Segment Anything Model 3"]]></title><description><![CDATA[
<p>I know it may not be what you are looking for, but most such models generate multi-scale image features through an image encoder, and those can very easily be fine-tuned for a particular task, like polygon prediction in your use case. I understand the main benefit of such promptable models is to reduce/remove this kind of work in the first place, but fine-tuning could be worthwhile, and much more accurate, if you have a specific high-load task!</p>
]]></description><pubDate>Thu, 20 Nov 2025 09:50:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45990922</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=45990922</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45990922</guid></item><item><title><![CDATA[New comment by warangal in "Show HN: Luminal – Open-source, search-based GPU compiler"]]></title><description><![CDATA[
<p>Pretty cool project! I have also been trying to do something similar with a very limited set of (abstract) ops akin to fundamental computer instructions. I'm just using a numpy backend for now to test the theory, but the neat thing is that most of the complexity lives in the abstract space, like deciding which memory accesses could be coalesced, even before generating the final code for a specific backend! As far as I know, most DL compilers struggle to generate optimal code as models get bigger and bigger. Halide was/is a very cool project that sped up many kernels just by finding better cache/memory-access patterns. If you could share more insights about your project through blog posts or a whitepaper, that would be really helpful.</p>
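To make the "limited abstract ops + pluggable backend" idea concrete, here is an entirely illustrative pure-Python stand-in (plain lists instead of the numpy arrays mentioned above; the op names and tuple encoding are invented for the sketch):

```python
# Each node is a tuple (op, args...): a tiny abstract-op graph built from a
# handful of fundamental instructions, evaluated by whichever backend is
# plugged in at the end.
def const(v):  return ("const", v)
def add(a, b): return ("add", a, b)
def mul(a, b): return ("mul", a, b)

# A backend is just a mapping from abstract op to concrete implementation.
PYTHON_BACKEND = {
    "add": lambda x, y: [a + b for a, b in zip(x, y)],
    "mul": lambda x, y: [a * b for a, b in zip(x, y)],
}

def evaluate(node, backend):
    op, *args = node
    if op == "const":
        return args[0]
    # Optimization passes (fusion, access coalescing, etc.) would rewrite
    # the graph here, in the abstract space, before any backend runs.
    return backend[op](*(evaluate(a, backend) for a in args))
```

The point of the separation is that graph-level decisions stay backend-agnostic; swapping `PYTHON_BACKEND` for a numpy or GPU backend should not touch the graph code.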
]]></description><pubDate>Thu, 21 Aug 2025 08:14:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=44970358</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=44970358</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44970358</guid></item><item><title><![CDATA[New comment by warangal in "Show HN: Whispering – Open-source, local-first dictation you can trust"]]></title><description><![CDATA[
<p>A bit of a tangent about Parakeet and other Nvidia NeMo models: I never found actual architecture implementations as PyTorch/TF code; it seems all such models are instantiated from a binary blob, making it difficult to experiment! Maybe I missed something; does anyone here have more experience with .nemo models who can shed some light on this?</p>
]]></description><pubDate>Tue, 19 Aug 2025 06:57:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=44948959</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=44948959</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44948959</guid></item><item><title><![CDATA[New comment by warangal in "SIMD within a register: How I doubled hash table lookup performance"]]></title><description><![CDATA[
<p>Nice write-up!
It is about speeding up the underlying hash table for a `cuckoo filter`, i.e. false positives are allowed, unlike a data structure such as a dictionary, which guarantees a false-positive rate of 0.<p>Whenever I need a dictionary/hash table in a hot loop where the maximum possible number of entries is known, I use a custom hash table initialized with that capacity. A standard-library implementation may store the `hash` along with the `key` and `value` in case it needs to `resize`; in my case I don't store the `hash`, which leads to a speed-up of about 15-20% due to fewer cache misses, since no resize is possible!<p>Also, `cuckoo filters` seem to use a `robin-hood hashing`-like strategy; wouldn't using one in a hot loop lead to a higher insert cost?</p>
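A minimal sketch of the pre-sized, no-stored-hash idea (pure-Python open addressing with linear probing; the class name and load factor are made up, and the real win only shows in a compiled language where slot layout affects cache behavior):

```python
class FixedMap:
    """Open-addressing hash map with a fixed capacity: since it never
    resizes, each slot holds only (key, value), with no cached hash."""

    def __init__(self, max_items):
        # Over-allocate so the load factor stays below ~0.7.
        self.capacity = max(8, int(max_items / 0.7))
        self.slots = [None] * self.capacity  # each slot: (key, value) or None

    def _probe(self, key):
        i = hash(key) % self.capacity
        # Linear probing: walk until we find the key or an empty slot.
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity
        return i

    def __setitem__(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._probe(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]
```

Because capacity is fixed, there is no rehash path that would need the cached hash, which is exactly what lets the slot shrink.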
]]></description><pubDate>Mon, 28 Jul 2025 10:07:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44709173</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=44709173</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44709173</guid></item><item><title><![CDATA[New comment by warangal in "The little book about OS development"]]></title><description><![CDATA[
<p>Thanks for explaining it; given that you are writing it from scratch, you get a lot of control in modelling a particular feature!<p>I did bookmark this project a few months ago but couldn't spend the time to understand more about it. I wasn't aware of the documentation, which should now make it easy to get started. Thanks for putting so much work into it!</p>
]]></description><pubDate>Sat, 22 Mar 2025 16:05:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43446703</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=43446703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43446703</guid></item><item><title><![CDATA[New comment by warangal in "The little book about OS development"]]></title><description><![CDATA[
<p>Are you trying to do this message passing using Nim channels?
For my use cases, I always had to resort to just passing `pointer`s to prevent any copying. Most of the time I divide the problem so each thread writes to independent memory locations, to avoid using locks, but that is not a general pattern for sure. For read-only access I just use `cursor` while casting to the appropriate type. If you find a useful pattern, please share.</p>
]]></description><pubDate>Sat, 22 Mar 2025 09:27:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=43444524</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=43444524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43444524</guid></item><item><title><![CDATA[Film School Rejects Is Seeking New Ownership]]></title><description><![CDATA[
<p>Article URL: <a href="https://filmschoolrejects.com/film-school-rejects-for-sale/">https://filmschoolrejects.com/film-school-rejects-for-sale/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42452487">https://news.ycombinator.com/item?id=42452487</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 18 Dec 2024 17:17:43 +0000</pubDate><link>https://filmschoolrejects.com/film-school-rejects-for-sale/</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=42452487</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42452487</guid></item><item><title><![CDATA[New comment by warangal in "PaliGemma 2: Powerful Vision-Language Models, Simple Fine-Tuning"]]></title><description><![CDATA[
<p>For images, the readme is at <a href="https://github.com/eagledot/hachi/tree/main/images/readme.md">https://github.com/eagledot/hachi/tree/main/images/readme.md</a>, with more than enough details! It is meant to be a search engine for all modalities; for now, `images` are supported.<p>As for a demo, I don't have the resources to host one; would a video showcasing the features help?<p>Also, for Windows there is a portable app at <a href="https://github.com/eagledot/hachi/releases/download/v1.3/hachi_images-win64-portable.zip">https://github.com/eagledot/hachi/releases/download/v1.3/hac...</a></p>
]]></description><pubDate>Fri, 06 Dec 2024 11:18:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=42338706</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=42338706</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42338706</guid></item><item><title><![CDATA[New comment by warangal in "PaliGemma 2: Powerful Vision-Language Models, Simple Fine-Tuning"]]></title><description><![CDATA[
<p>Disclaimer: I work on such a project[0]<p>I think a combination of CLIP and some face recognition may solve your issues!
It just takes a path to your directory and can index all the images while preserving your folder hierarchy, along with high-quality face clustering. Indexing each image takes about 100 ms on a CPU.
Every combination can then be mixed and matched from a single interface. It doesn't take much to try, as dependencies are very minimal. There is a self-contained app for Windows too. I have been looking for feedback, so I'm just plugging it here in case someone has a use case.<p>[0] <a href="https://github.com/eagledot/hachi">https://github.com/eagledot/hachi</a></p>
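In case it helps to picture the face-clustering side, here is a rough greedy sketch. It assumes unit-normalized face embeddings (so cosine similarity is just a dot product) and compares against each cluster's first member; the threshold and structure are illustrative, not Hachi's actual algorithm:

```python
def dot(a, b):
    # For unit-normalized embeddings, the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def cluster_faces(embeddings, threshold=0.75):
    """Greedy clustering: put each face into the first cluster whose
    exemplar (first member) is similar enough, else start a new cluster."""
    clusters = []  # each cluster is a list of indices into `embeddings`
    for idx, emb in enumerate(embeddings):
        for members in clusters:
            exemplar = embeddings[members[0]]
            if dot(emb, exemplar) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters
```

A single greedy pass like this is order-dependent and crude; real pipelines usually refine it (e.g. with centroids or graph clustering), but it shows why a good similarity threshold matters more than the clustering loop itself.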
]]></description><pubDate>Fri, 06 Dec 2024 06:01:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42336840</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=42336840</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42336840</guid></item><item><title><![CDATA[New comment by warangal in "Nixiesearch: Running Lucene over S3, and why we're building a new search engine"]]></title><description><![CDATA[
<p>Yes, mainly trigrams, but bigrams and/or a combination of both are also commonly used to implement fuzzy search; zoekt also uses a trigram index. But such indices depend heavily on the content being indexed: for example, if a rare "trigram" encountered during querying was never indexed, they would fail to return relevant results!
LSH implementations, on the other hand, employ a more diverse collection of statistics, depending upon the number of buckets and the N(-gram)/window size used, so they compare better against unseen content/bytes during querying. It is not cheap, as each hash is around 30 bytes, often more than the string/text being indexed! But it yields fixed-size hashes independent of the size of the indexed content, and acts as an "auxiliary" index which can be queried independently of the original one. Comparison of hashes can be optimized, leading to quite fast fuzzy search.</p>
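A bare-bones trigram index makes the failure mode concrete: a query trigram that was never indexed simply contributes no candidates. This is a pure-Python sketch, not zoekt's design:

```python
from collections import defaultdict

def trigrams(text):
    """Set of overlapping 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of doc ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for g in trigrams(text):
            self.postings[g].add(doc_id)

    def search(self, query):
        """Rank docs by how many query trigrams they share. A query
        trigram absent from the index contributes nothing, which is
        exactly the content-dependence described above."""
        scores = defaultdict(int)
        for g in trigrams(query):
            for doc_id in self.postings.get(g, ()):
                scores[doc_id] += 1
        return sorted(scores, key=scores.get, reverse=True)
```

An LSH-style auxiliary index would instead hash each document into a fixed-size signature up front, trading index size for robustness to unseen n-grams.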
]]></description><pubDate>Thu, 10 Oct 2024 15:33:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=41799931</link><dc:creator>warangal</dc:creator><comments>https://news.ycombinator.com/item?id=41799931</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41799931</guid></item></channel></rss>