Hacker News: gschoeni

New comment by gschoeni in "Lore – Open source version control system designed for scalability"

gschoeni — Thu, 18 Jun 2026 00:59:49 +0000

One of the Oxen engineers here. Would love to hear about anything you ran into on the open source product or platform! We've grown the team a bunch and are eager to learn what your perfect VCS looks like.

New comment by gschoeni in "Model Report, May 2026"

gschoeni — Fri, 08 May 2026 19:52:29 +0000

Love this level of detail, thanks for sharing!

New comment by gschoeni in "[dead]"

gschoeni — Sun, 26 Oct 2025 21:09:07 +0000

Wanted to share some learnings we had optimizing and deploying Qwen-Image-Edit at scale to replace Nano-Banana. The goal was to generate a product catalogue of 1.2m images, which would have cost $46k with Nano-Banana or GPT-Image-Edit.

Because Qwen-Image-Edit is Apache 2.0, you can fine-tune it and apply a few tricks like compilation, a lightning LoRA, and quantization to cut costs.

The base model takes ~15s to generate an image which would mean we would need 1,200,000*15/60/60=5,000 compute hours.

Compilation of the PyTorch graph + applying a lightning LoRA cut inference down to ~4s per image which resulted in ~1,333 compute hours.
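The compute-hour math above can be sketched as a quick back-of-the-envelope script. It only uses the figures quoted in this comment (1.2m images, ~15s and ~4s per image) and assumes images are generated one at a time on a single device; the function name is just for illustration.

```python
# Back-of-the-envelope compute estimate using the figures quoted above.
# Assumes strictly sequential generation (no batching or parallel GPUs).
NUM_IMAGES = 1_200_000


def compute_hours(num_images: int, seconds_per_image: float) -> float:
    """Total generation time in hours for a given per-image latency."""
    return num_images * seconds_per_image / 60 / 60


base_hours = compute_hours(NUM_IMAGES, 15)      # base model, ~15s per image
optimized_hours = compute_hours(NUM_IMAGES, 4)  # compiled + lightning LoRA, ~4s per image

print(f"base:      {base_hours:,.0f} hours")
print(f"optimized: {optimized_hours:,.0f} hours")
print(f"speedup:   {base_hours / optimized_hours:.2f}x")
```

Since the image count is fixed, the saving in compute hours is just the ratio of the two latencies (15s / 4s).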

I'm a big fan of open source models, so wanted to share the details in case it inspires you to own your own weights in the future.

New comment by gschoeni in "Git for Music – Using Version Control for Music Production (2023)"

gschoeni — Mon, 01 Sep 2025 19:29:08 +0000

One of the maintainers of the Open Source project "Oxen" here. Our VCS scales for binary data better than git does, and was built to solve some of the problems with git-lfs and git-annex.

We've had a few requests to integrate with music production workflows, but haven't taken it on yet. If anyone wants to collaborate to integrate Oxen with their DAW or workflow let us know! Here's the project:

https://github.com/Oxen-AI/Oxen

New comment by gschoeni in "The future of large files in Git is Git"

gschoeni — Sat, 16 Aug 2025 01:29:53 +0000

We're working on `oxen` to solve a lot of the problems we ran into with git or git-lfs.

We have an open source CLI and server that mirrors git, but handles large files and mono repos with millions of files in a much more performant manner. Would love feedback if you want to check it out!

https://github.com/Oxen-AI/Oxen

New comment by gschoeni in "GitHub is no longer independent at Microsoft after CEO resignation"

gschoeni — Wed, 13 Aug 2025 00:37:33 +0000

How big are your datasets? Working on an Open Source git-lfs replacement called "oxen" if you are interested.

https://github.com/Oxen-AI/Oxen

New comment by gschoeni in "[dead]"

gschoeni — Mon, 30 Jun 2025 22:54:57 +0000

FLUX.1-dev is one of the most widely fine-tuned models out there - but I couldn’t find a single, clean, end-to-end example that actually worked. So I wrote one. Enjoy!

New comment by gschoeni in "[dead]"

gschoeni — Thu, 30 Jan 2025 04:52:10 +0000

Over the past ~1.5 years I've been running a research paper club where we dive into interesting/foundational papers in AI/ML, so naturally we have come across a lot of the papers that led up to DeepSeek-R1. While diving into the DeepSeek papers this week, I decided to compile a list of papers that we've already gone over, or that I think would be good background reading, to get a bigger picture of what's going on under the hood of DeepSeek.

Grab a cup of coffee and enjoy!

https://www.oxen.ai/blog/no-hype-deepseek-r1-reading-list

Merkle Tree 101

gschoeni — Tue, 28 Jan 2025 03:24:47 +0000

Article URL: https://ghost.oxen.ai/merkle-tree-101/

Comments URL: https://news.ycombinator.com/item?id=42848539

Points: 1

# Comments: 0

New comment by gschoeni in "Ask HN: How do you version your data?"

gschoeni — Thu, 26 Dec 2024 17:14:07 +0000

Right now the UI is only available through a VPC deployment. We are thinking about making the data grid / query interface embeddable or available through a library which would make it easy to self host.

New comment by gschoeni in "Ask HN: How do you version your data?"

gschoeni — Thu, 26 Dec 2024 16:01:24 +0000

We're working on Oxen.ai which is an Open Source CLI and Server with Python bindings as well. Optimized for ML/AI workloads but works with any type of data and we see usage from game companies, bio, aerospace etc.

Feel free to check it out here: https://github.com/Oxen-AI/oxen-release

Or a hub you can host data on (we have public and private repos, or private VPC deployments): https://oxen.ai

The CLI mirrors git so it's easy to learn. It also has some interesting built-in tooling for diff-ing datasets and working on them remotely without downloading a full copy of the data.

Happy to answer any other questions!

New comment by gschoeni in "[dead]"

gschoeni — Sun, 03 Nov 2024 23:27:39 +0000

Hey all,

If you haven't seen the Oxen project yet, we have been building an open source unstructured data version control tool.

We were inspired by the idea of making large machine learning datasets living & breathing assets that people can collaborate on, rather than the static ones of the past. Lately we have been working hard on optimizing the underlying Merkle Trees and data structures within Oxen.ai and just released v0.19.4, which provides a bunch of performance upgrades and stability improvements to the internal APIs.
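For anyone unfamiliar with Merkle trees, here is a minimal sketch of the general idea in Python. It is the generic textbook construction, not Oxen's actual implementation, and the file names and contents are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    """Pairwise-hash the leaves upward until a single root hash remains."""
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]


def dataset_root(files: dict[str, bytes]) -> str:
    """Hash each file's contents (in sorted name order) and fold into one root."""
    leaves = [sha256_hex(content) for _, content in sorted(files.items())]
    return merkle_root(leaves)


# Hypothetical file contents standing in for a dataset.
files = {"cat.jpg": b"cat bytes", "dog.jpg": b"dog bytes", "labels.csv": b"a,b\n"}
root_before = dataset_root(files)

files["dog.jpg"] = b"new dog bytes"  # change a single file
root_after = dataset_root(files)

print(root_before != root_after)  # the root changes when any file changes
```

The useful property for version control is that comparing two roots tells you whether anything changed, and unchanged subtrees keep the same hash so they do not need to be re-read or re-uploaded.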

To put it all to the test, we decided to benchmark the tool on the 1 million+ images in the classic ImageNet dataset.

The TLDR is Oxen.ai is faster than raw uploads to S3, 13x faster than git-lfs, and 5x faster than DVC. The full breakdown can be found here.

https://docs.oxen.ai/features/performance

If you are in the ML/AI community, or are a Rust aficionado, we would love to get your feedback on both the tool and the codebase. We would also love some community contributions when it comes to different storage backends and integrations into other data tools.

New comment by gschoeni in "[dead]"

gschoeni — Sat, 13 Jan 2024 17:32:06 +0000

Every Friday we pick a paper for our Paper Club and discuss it. Here's the recap of yesterday's session on Mixtral 8x7B if anyone is interested!

https://blog.oxen.ai/arxiv-dives-mixture-of-experts-moe-with...

New comment by gschoeni in "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts"

gschoeni — Wed, 10 Jan 2024 16:12:32 +0000

Yes it is! We meet every Friday at 10am PST and pick an arXiv paper to go over as a group.

Feel free to join here: https://lu.ma/oxenbookclub

New comment by gschoeni in "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts"

gschoeni — Wed, 10 Jan 2024 06:43:55 +0000

We went over it in our Friday paper club before the holidays which helped me gain an intuition.

https://blog.oxen.ai/mamba-linear-time-sequence-modeling-wit...

I'm still not convinced on Mamba's performance on Natural Language tasks, but maybe it's just because they haven't trained a large enough model on enough data yet.

New comment by gschoeni in "Ask HN: AI/ML papers to catch up with current state of AI?"

gschoeni — Fri, 15 Dec 2023 21:52:14 +0000

I put together a reading list for Andrej Karpathy's intro to LLMs that would be helpful for all of the latest LLM and multi-modal architectures:

https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro...

New comment by gschoeni in "[dead]"

gschoeni — Fri, 15 Dec 2023 21:36:52 +0000

Have been studying the Mamba architecture all week and put together my notes here:

https://blog.oxen.ai/mamba-linear-time-sequence-modeling-wit...

I hadn't found a very satisfying explanation of the paper yet, and still had some questions at the end, but hopefully this can give people a good jumping off point for their understanding!

New comment by gschoeni in "[dead]"

gschoeni — Thu, 14 Dec 2023 16:15:44 +0000

Hey all, I ran some experiments benchmarking fine-tuning ViT, ResNet50, and CLIP on a Facial Emotion Recognition dataset. I had read the original papers the past few weeks, but wanted to do some practical hands on use of the models themselves.

https://blog.oxen.ai/practical-ml-dive-how-to-customize-a-vi...

~ TLDR ~ ViT works the best in this small experiment, with minimal code. The experiment was classifying 7 different facial emotions such as "happy", "sad", "angry", etc...

Model Accuracy

* ViT - 69%
* ResNet50 - 64%
* Zero-Shot CLIP - 53%

Was honestly most impressed with CLIP's ability for zero-shot transfer, even though it had the worst accuracy. Being able to give it a freeform list of prompts or labels and have it classify images into that set without any training feels like the future of prototyping products and models. Then, once you define your use case, you can go with something more performant like a ViT.
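To make the zero-shot part concrete, here is a rough sketch of the mechanism: embed the image and each text label, then pick the label whose embedding is most similar. The 3-dimensional vectors below are toy stand-ins for real CLIP embeddings, and the scale factor of 100 is an illustrative assumption, so this shows the idea rather than the code from the blog post.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, scale=100.0):
    """Return the best label and per-label probabilities, with no training step."""
    labels = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[name]) for name in labels]
    probs = softmax([scale * s for s in sims])  # scale sharpens the distribution
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Toy embeddings (hypothetical); a real model would produce these from text prompts.
label_embeddings = {
    "a photo of a happy face": [0.9, 0.1, 0.0],
    "a photo of a sad face": [0.1, 0.9, 0.0],
    "a photo of an angry face": [0.0, 0.2, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]  # toy stand-in for an encoded image

label, probs = zero_shot_classify(image_embedding, label_embeddings)
print(label)
```

Swapping in a different set of labels only means changing the dictionary of prompts, which is what makes this approach handy for prototyping.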

Anyways, I had fun writing the code and running the experiments, so thought I would share!

New comment by gschoeni in "Ask HN: Can we do better than Git for version control?"

gschoeni — Sun, 10 Dec 2023 18:34:57 +0000

We've been working on a data version control system called "oxen" optimized for large unstructured datasets that we are seeing more and more with the advent of many of the generative AI techniques.

Many of these datasets contain huge numbers of images, videos, audio files, and text, as well as structured tabular data, which git and git-lfs just fall flat on.

Would love anyone to kick the tires on it and let us know what you think:

https://github.com/Oxen-AI/oxen-release

The commands are mirrored after git so it is easy to learn, but optimized under the hood for larger datasets.

New comment by gschoeni in "[dead]"

gschoeni — Fri, 08 Dec 2023 23:29:52 +0000

Hey all, we had a lively group discussion today on the 2021 CLIP paper from OpenAI.

Every Friday we've been going over the fundamentals of a lot of the state of the art techniques used in Machine Learning today. Hoping to learn a little each week, and spot patterns we can apply to our own work. I feel like there's always a little nugget of information I didn't fully understand before reading the paper, so have been finding it helpful.

Though it is not groundbreaking research as of this week, I think it's nice to take a step back and review the fundamentals as well as keeping up with the latest and greatest.

The notes and video recap are posted here if anyone finds them helpful:

https://blog.oxen.ai/arxiv-dives-zero-shot-image-classificat...

Also would love to have anyone join us live on Fridays or suggest papers! We've got a pretty consistent and fun group of 400+ engineers and researchers popping in and out.