Hacker News: meehai

New comment by meehai in "Show HN: Twitch Roulette – Find live streamers who need views the most"

meehai — Sat, 28 Mar 2026 11:47:34 +0000

Actively doing this. It indeed forces me to think things through, organize thoughts and speak them out. I open paint/miro to draw. It's good practice.

New comment by meehai in "RX – a new random-access JSON alternative"

meehai — Thu, 19 Mar 2026 08:49:57 +0000

and with little data (i.e. <10Mb), this matters much less than accessibility and easy understanding of the data using a simple text editor or jq in the terminal + some filters.

New comment by meehai in "LLM from scratch, part 28 – training a base model from scratch on an RTX 3090"

meehai — Tue, 09 Dec 2025 11:58:08 +0000

it's skills first and then money and hardware for scale

A more skilled person that understands all the underlying steps will always be more efficient in scaling up due to knowing where to allocate more.

basically... you always need the skills and the money is the fine tuning.

New comment by meehai in "Vortex: An extensible, state of the art columnar file format"

meehai — Thu, 20 Nov 2025 06:05:16 +0000

Can you append new columns to a file stored on disk without reading it all into memory? Somehow this is beyond parquet's capabilities.

New comment by meehai in "F3: Open-source data file format for the future [pdf]"

meehai — Thu, 02 Oct 2025 10:32:37 +0000

https://stackoverflow.com/questions/31812780/append-a-new-co...

New comment by meehai in "SimpleFold: Folding proteins is simpler than you think"

meehai — Sat, 27 Sep 2025 07:56:37 +0000

Yeah, but if you can build topologies based on latencies you may get some decent tradeoffs. For example, with N=1M nodes each doing batch updates in a tree manner, i.e. the all-reduce is actually layered by latency between nodes.
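A toy sketch of what "layered by latency" could look like, purely as an illustration: nodes only talk to their direct (assumed low-latency) children, partial sums travel up the tree, and the total is broadcast back down. The node names and the `tree_all_reduce` helper are made up for this example, and it simulates everything in one process rather than over a network.

```python
from __future__ import annotations


def tree_all_reduce(
    values: dict[str, int],
    children: dict[str, list[str]],
    root: str,
) -> dict[str, int]:
    """Sum values up a tree, then broadcast the total to every node in it."""

    def reduce_up(node: str) -> int:
        # Each node combines its own value with the partial sums of its children.
        return values[node] + sum(reduce_up(c) for c in children.get(node, ()))

    total = reduce_up(root)
    result: dict[str, int] = {}

    def broadcast_down(node: str) -> None:
        # The total flows back along the same edges used for the reduction.
        result[node] = total
        for child in children.get(node, ()):
            broadcast_down(child)

    broadcast_down(root)
    return result


# Hypothetical topology: one datacenter node, two racks, three machines.
values = {"dc": 1, "rack_a": 2, "rack_b": 3, "a1": 4, "a2": 5, "b1": 6}
children = {"dc": ["rack_a", "rack_b"], "rack_a": ["a1", "a2"], "rack_b": ["b1"]}
reduced = tree_all_reduce(values, children, "dc")
```

With this layout, each machine only exchanges data with its rack node, and only the rack nodes cross the slower link to `dc`.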

New comment by meehai in "An LLM is a lossy encyclopedia"

meehai — Tue, 02 Sep 2025 12:33:48 +0000

lossy encyclopedia that can also do some short-term memory (RAG) things.

New comment by meehai in "Counter-Strike: A billion-dollar game built in a dorm room"

meehai — Tue, 19 Aug 2025 08:09:09 +0000

I've played years of KZ and HNS after years of playing competitive CS on local communities (old PGL in romania!). I got over 6k hours in steam CS1.6 + many more on "non-steam". That game shaped me. I even learned the basics of programming while modding a KZ plugin: https://forums.alliedmods.net/showthread.php?t=130417

Nowadays I code for a living, but for sure this is the game that started the spark for me.

It was a great time and I feel that I can always run this game and get back to that childhood feeling.

New comment by meehai in "Do not download the app, use the website"

meehai — Sat, 26 Jul 2025 03:20:47 +0000

Tbh, the web won as an application platform mostly because it's a standard. Everybody knows html, css and a little JS.

On the other hand, for mobile apps, there is still a device-specific mentality.

Imagine web apps being built with a different flavor for all the major browsers...

I hope that the same level of standardization comes to mobile apps too with the option to use more device-specific features on top of the generic UI.

New comment by meehai in "My Self-Hosting Setup"

meehai — Sat, 19 Jul 2025 06:12:30 +0000

Mine is much more barebone:

- one single machine

- nginx proxy

- many services on the same machine; some are internal, some are supposed to be public, are all accessible via the web!

- internal ones have a humongous large password for HTTP basic auth that I store in an external password manager (firefox built in one)

- public ones are either public or have google oauth

I coded all of them from scratch as that's the point of what I'm doing with homelabbing. You want images? browsers can read them. Videos? Browsers can play them.

The hard part is the backend for me. The frontend is very much "90s html".

New comment by meehai in "Most RESTful APIs aren't really RESTful"

meehai — Wed, 09 Jul 2025 13:29:47 +0000

the last point got me.

How can you idiomatically do a read only request with complex filters? For me both PUT and POST are "writable" operations, while "GET" are assumed to be read only. However, if you need to encode the state of the UI (filters or whatnot), it's preferred to use JSON rather than query params (which have length limitations).

So ... how does one do it?

New comment by meehai in "I wrote my PhD Thesis in Typst"

meehai — Mon, 23 Jun 2025 08:27:49 +0000

https://github.com/overleaf/overleaf hm ?

New comment by meehai in "Apple Notes Expected to Gain Markdown Support in iOS 26"

meehai — Thu, 05 Jun 2025 05:06:14 +0000

vscode with the markdown plugin works really well.

text on the left, render on the right pane

example: https://imgur.com/9rjoMa2.png

New comment by meehai in "Cognitive load is what matters"

meehai — Thu, 26 Dec 2024 16:36:01 +0000

At work we have a pretty big Python monorepo. The way we scale it is by having many standalone CLI mini apps (about 80 atm), with most of them outputting json/parquet in GCS or bigquery tables. Inputs are the same.

I insisted a lot on this unix (ish as it's not pipes) philosophy. It paid off so far.

We can test each cli app as well as make broader integration tests.
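For illustration only, here is roughly the shape such a mini app can take. This is a made-up example, not one of the actual apps: the `--input`, `--output` and `--min_score` flags and the record layout are assumptions, and it reads and writes local JSON files instead of GCS or bigquery.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path


def run(input_path: Path, output_path: Path, min_score: float) -> int:
    """Keep records whose score is at least min_score; return how many were kept."""
    records = json.loads(input_path.read_text())
    kept = [r for r in records if r["score"] >= min_score]
    output_path.write_text(json.dumps(kept, indent=2))
    return len(kept)


def main(argv: list[str] | None = None) -> int:
    # Passing argv explicitly keeps the CLI callable from tests.
    parser = argparse.ArgumentParser(description="Filter records by score")
    parser.add_argument("--input", type=Path, required=True)
    parser.add_argument("--output", type=Path, required=True)
    parser.add_argument("--min_score", type=float, default=0.5)
    args = parser.parse_args(argv)
    run(args.input, args.output, args.min_score)
    return 0
```

Because `main` takes `argv`, a test can call it with a list of strings and then inspect the output file, and an integration test can chain several such apps through their files.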

New comment by meehai in "Hackers use ZIP file concatenation to evade detection"

meehai — Sat, 16 Nov 2024 08:08:02 +0000

I meant that they should be separate tools that can be piped together. For example: you have 1 directory of many files (1Gb in total)

`zip out.zip dir/`

This results in a single out.zip file that is, let's say 500Mb (1:2 compression)

If you want to shard it, you have a separate tool, let's call it `shard` that works on any type of byte streams:

`shard -I out.zip -O out_shards/ --shard_size 100Mb`

This results in `out_shards/1.shard, ..., out_shards/5.shard`, each of 100Mb.

And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.

No need for 'sharding' to exist as a feature in the zip utility.

And... if you want only the shard from the get go without the original 1 file archive, you can do something like:

`zip dir/ | shard -O out_shards/`

Now, these can be copied to the floppy disks (as discussed above) or sent via the network etc. The main thing here is that the sharding tool works on bytes only (doesn't know if it's an mp4 file, a zip file, a txt file etc.) and does no compression and the zip tool does no sharding but optimizes compression.
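A minimal sketch of what such a byte-level `shard`/`unshard` pair could look like. The tool does not exist as far as I know, so the function names and the `N.shard` naming are just the hypothetical ones from above. It assumes a buffered stream, where `read(n)` returns `n` bytes unless the end of the stream is reached.

```python
import io
from pathlib import Path
from typing import BinaryIO


def shard(stream: BinaryIO, out_dir: Path, shard_size: int) -> list[Path]:
    """Split any byte stream into numbered files of at most shard_size bytes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    index = 1
    while True:
        chunk = stream.read(shard_size)
        if not chunk:
            break
        path = out_dir / f"{index}.shard"
        path.write_bytes(chunk)  # raw bytes only, no compression, no format knowledge
        paths.append(path)
        index += 1
    return paths


def unshard(shard_dir: Path, out: BinaryIO) -> None:
    """Concatenate the shards, in numeric order, back into one byte stream."""
    shards = sorted(shard_dir.glob("*.shard"), key=lambda p: int(p.stem))
    for path in shards:
        out.write(path.read_bytes())
```

The numeric sort matters: a plain string sort would put `10.shard` before `2.shard` and corrupt the reassembled file.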

New comment by meehai in "Hackers use ZIP file concatenation to evade detection"

meehai — Sat, 16 Nov 2024 07:14:33 +0000

couldn't agree more!

We need to separate and design modules as unitary as possible:

- zip should ARCHIVE/COMPRESS, i.e. reduce the file size and create a single file from the file system point of view. The complexity should go in the compression algorithm.

- Sharding/sending multiple coherent pieces of the same file (zip or not) is a different module and should be handled by specialized and agnostic protocols that do this like the ones you mentioned.

People are always building tools that handle 2 or more use cases instead of following the UNIX principle of creating generic, good single-responsibility tools that can be combined together (thus allowing a 'whitelist' of combinations which is safe). Quite frankly it's annoying and very often leads to issues such as this that weren't even thought of in the original design because of the exponential problem of combining tools together.

New comment by meehai in "Open washing – why companies pretend to be open source"

meehai — Sat, 26 Oct 2024 18:01:40 +0000

I think Open Weights is a better name for AI models that don't share the reproducible training scripts and data.

New comment by meehai in "The Retreat to Muskworld"

meehai — Tue, 15 Oct 2024 07:55:49 +0000

one answer is due to the fact that humans also do this with just 2 pretty bad cameras and a lot of offloading to the cortex.

It also simplifies the stack a lot to have a single set of sensors, so the software becomes mostly: getting good training data (iterative loops from failing production cases) and an efficient training algorithm.

This scales to more than just AD and also can leverage new breakthroughs from academia

New comment by meehai in "The Ultimate Guide to Error Handling in Python"

meehai — Thu, 10 Oct 2024 10:29:07 +0000

what about this pattern? https://www.inngest.com/blog/python-errors-as-values

I tried it once in an sqlite DB connector with some business logic and simply checking stuff like

  res: DBException | Result = db_handler.some_business_logic()
  if isinstance(res, DBException):
    return res # you can also log or even raise if this function isn't returning exceptions as values
  # guaranteed to be Result type here

See here:

- https://gitlab.com/meehai/drpciv-flask/-/blob/main/be/db_han...

- https://gitlab.com/meehai/drpciv-flask/-/blob/main/be/app.py...
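A self-contained version of the same pattern, as a sketch: `DBException` and `Result` here are stand-in definitions written for this example (not the ones from the linked repo), and the "negative id" rule is invented just to have a failure path.

```python
from __future__ import annotations

from dataclasses import dataclass


class DBException(Exception):
    """Error type that is returned as a value instead of being raised."""


@dataclass
class Result:
    rows: list[tuple]


def some_business_logic(user_id: int) -> DBException | Result:
    # Hypothetical rule for the example: negative ids are invalid.
    if user_id < 0:
        return DBException(f"invalid user id: {user_id}")
    return Result(rows=[(user_id, "ok")])


def handler(user_id: int) -> DBException | int:
    res = some_business_logic(user_id)
    if isinstance(res, DBException):
        return res  # propagate the error as a value; the caller decides what to do
    # The isinstance check narrows the type, so res is a Result here.
    return len(res.rows)
```

The nice part is that a type checker sees the union in the signature, so forgetting the `isinstance` check shows up as a type error instead of a runtime surprise.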

Show HN: VRE Dataset generation for MultiTask vision models training from videos

meehai — Wed, 09 Oct 2024 17:39:32 +0000

Been working on this tool for my PhD which involves training multi task vision models using various pre-trained models as inputs or pseudolabels in order to improve generalization. I work mostly on UAV datasets, but it should work okay on indoor scenes or self driving (at least Marigold and Mask2Former).

For example, this dataset was generated using this tool: https://huggingface.co/datasets/Meehai/dronescapes

I'm quite aggressively trying to "just get the nn.Module" from the public repos that other researchers put up in their overly convoluted frameworks. A simple `forward(rgb_input: torch.Tensor) -> torch.Tensor` is nice, having 100 imports from a generic framework that has versions incompatibilities with everything else is not.

PS: most mains are standalone runnable too, i.e. - https://gitlab.com/meehai/video-representations-extractor/-/... or - https://gitlab.com/meehai/video-representations-extractor/-/...

Comments URL: https://news.ycombinator.com/item?id=41790559

Points: 2

# Comments: 0