<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: gschoeni</title><link>https://news.ycombinator.com/user?id=gschoeni</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 18 Jun 2026 07:02:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=gschoeni" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by gschoeni in "Lore – Open source version control system designed for scalability"]]></title><description><![CDATA[
<p>One of the Oxen engineers here. Would love to hear about anything you ran into on the open source product or platform! We've grown the team a bunch and are eager to learn what your perfect VCS looks like.</p>
]]></description><pubDate>Thu, 18 Jun 2026 00:59:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48579192</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=48579192</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48579192</guid></item><item><title><![CDATA[New comment by gschoeni in "Model Report, May 2026"]]></title><description><![CDATA[
<p>Love this level of detail, thanks for sharing!</p>
]]></description><pubDate>Fri, 08 May 2026 19:52:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=48067874</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=48067874</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48067874</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Wanted to share some learnings we had optimizing and deploying Qwen-Image-Edit at scale to replace Nano-Banana. The goal was to generate a product catalogue of 1.2M images, which would have cost $46k with Nano-Banana or GPT-Image-Edit.<p>Qwen-Image-Edit being Apache 2.0 allows you to fine-tune it and apply a few tricks like compilation, a Lightning LoRA, and quantization to cut costs.<p>The base model takes ~15s to generate an image, which means we would need 1,200,000*15/60/60=5,000 compute hours.<p>Compiling the PyTorch graph and applying a Lightning LoRA cut inference down to ~4s per image, which works out to ~1,333 compute hours.<p>I'm a big fan of open source models, so wanted to share the details in case it inspires you to own your own weights in the future.</p>
]]></description><pubDate>Sun, 26 Oct 2025 21:09:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=45715236</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=45715236</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45715236</guid></item><item><title><![CDATA[New comment by gschoeni in "Git for Music – Using Version Control for Music Production (2023)"]]></title><description><![CDATA[
<p>One of the maintainers of the Open Source project "Oxen" here. Our VCS scales for binary data better than git does, and was built to solve some of the problems with git-lfs and git-annex.<p>We've had a few requests to integrate with music production workflows, but haven't taken it on yet. If anyone wants to collaborate to integrate Oxen with their DAW or workflow let us know! Here's the project:<p><a href="https://github.com/Oxen-AI/Oxen" rel="nofollow">https://github.com/Oxen-AI/Oxen</a></p>
]]></description><pubDate>Mon, 01 Sep 2025 19:29:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=45095816</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=45095816</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45095816</guid></item><item><title><![CDATA[New comment by gschoeni in "The future of large files in Git is Git"]]></title><description><![CDATA[
<p>We're working on `oxen` to solve a lot of the problems we ran into with git or git-lfs.<p>We have an open source CLI and server that mirrors git, but handles large files and monorepos with millions of files in a much more performant manner. Would love feedback if you want to check it out!<p><a href="https://github.com/Oxen-AI/Oxen" rel="nofollow">https://github.com/Oxen-AI/Oxen</a></p>
]]></description><pubDate>Sat, 16 Aug 2025 01:29:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=44919232</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=44919232</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44919232</guid></item><item><title><![CDATA[New comment by gschoeni in "GitHub is no longer independent at Microsoft after CEO resignation"]]></title><description><![CDATA[
<p>How big are your datasets? Working on an Open Source git-lfs replacement called "oxen" if you are interested.<p><a href="https://github.com/Oxen-AI/Oxen" rel="nofollow">https://github.com/Oxen-AI/Oxen</a></p>
]]></description><pubDate>Wed, 13 Aug 2025 00:37:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44883397</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=44883397</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44883397</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>FLUX.1-dev is one of the most widely fine-tuned models out there - but I couldn’t find a single, clean, end-to-end example that actually worked. So I wrote one. Enjoy!</p>
]]></description><pubDate>Mon, 30 Jun 2025 22:54:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44428792</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=44428792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44428792</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Over the past ~1.5 years I've been running a research paper club where we dive into interesting/foundational papers in AI/ML, so we have naturally come across a lot of the papers that led up to DeepSeek-R1. While diving into the DeepSeek papers this week, I decided to compile a list of papers that we've already gone over, or that I think would be good background reading, to get a bigger picture of what's going on under the hood of DeepSeek.<p>Grab a cup of coffee and enjoy!<p><a href="https://www.oxen.ai/blog/no-hype-deepseek-r1-reading-list" rel="nofollow">https://www.oxen.ai/blog/no-hype-deepseek-r1-reading-list</a></p>
]]></description><pubDate>Thu, 30 Jan 2025 04:52:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=42874992</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=42874992</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42874992</guid></item><item><title><![CDATA[Merkle Tree 101]]></title><description><![CDATA[
<p>Article URL: <a href="https://ghost.oxen.ai/merkle-tree-101/">https://ghost.oxen.ai/merkle-tree-101/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42848539">https://news.ycombinator.com/item?id=42848539</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 28 Jan 2025 03:24:47 +0000</pubDate><link>https://ghost.oxen.ai/merkle-tree-101/</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=42848539</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42848539</guid></item><item><title><![CDATA[New comment by gschoeni in "Ask HN: How do you version your data?"]]></title><description><![CDATA[
<p>Right now the UI is only available through a VPC deployment. We are thinking about making the data grid / query interface embeddable or available through a library, which would make it easy to self-host.</p>
]]></description><pubDate>Thu, 26 Dec 2024 17:14:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=42516361</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=42516361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42516361</guid></item><item><title><![CDATA[New comment by gschoeni in "Ask HN: How do you version your data?"]]></title><description><![CDATA[
<p>We're working on Oxen.ai which is an Open Source CLI and Server with Python bindings as well. Optimized for ML/AI workloads but works with any type of data and we see usage from game companies, bio, aerospace etc.<p>Feel free to check it out here: 
<a href="https://github.com/Oxen-AI/oxen-release">https://github.com/Oxen-AI/oxen-release</a><p>Or a hub you can host data on (we have public and private repos, or private VPC deployments):
<a href="https://oxen.ai" rel="nofollow">https://oxen.ai</a><p>The CLI mirrors git so it's easy to learn. It also has some interesting built-in tooling for diffing datasets and working on them remotely without downloading a full copy of the data.<p>Happy to answer any other questions!</p>
]]></description><pubDate>Thu, 26 Dec 2024 16:01:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=42515936</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=42515936</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42515936</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Hey all,<p>If you haven't seen the Oxen project yet, we have been building an open source version control tool for unstructured data.<p>We were inspired by the idea of making large machine learning datasets living & breathing assets that people can collaborate on, rather than the static ones of the past. Lately we have been working hard on optimizing the underlying Merkle Trees and data structures within Oxen.ai, and just released v0.19.4, which brings a bunch of performance upgrades and more stability to the internal APIs.<p>To put it all to the test, we decided to benchmark the tool on the 1 million+ images in the classic ImageNet dataset.<p>The TLDR is Oxen.ai is faster than raw uploads to S3, 13x faster than git-lfs, and 5x faster than DVC. The full breakdown can be found here.<p><a href="https://docs.oxen.ai/features/performance" rel="nofollow">https://docs.oxen.ai/features/performance</a><p>If you are in the ML/AI community, or a Rust aficionado, we would love to get your feedback on both the tool and the codebase. We would also love community contributions when it comes to different storage backends and integrations with other data tools.</p>
]]></description><pubDate>Sun, 03 Nov 2024 23:27:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=42036978</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=42036978</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42036978</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Every Friday we pick a paper for our Paper Club and discuss it. Here's the recap of yesterday's session on Mixtral 8x7B if anyone is interested!<p><a href="https://blog.oxen.ai/arxiv-dives-mixture-of-experts-moe-with-mixtral-8x7b/" rel="nofollow">https://blog.oxen.ai/arxiv-dives-mixture-of-experts-moe-with...</a></p>
]]></description><pubDate>Sat, 13 Jan 2024 17:32:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=38982263</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38982263</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38982263</guid></item><item><title><![CDATA[New comment by gschoeni in "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts"]]></title><description><![CDATA[
<p>Yes it is! We meet every Friday at 10am PST and pick an arXiv paper to go over as a group.<p>Feel free to join here: <a href="https://lu.ma/oxenbookclub" rel="nofollow">https://lu.ma/oxenbookclub</a></p>
]]></description><pubDate>Wed, 10 Jan 2024 16:12:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=38940153</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38940153</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38940153</guid></item><item><title><![CDATA[New comment by gschoeni in "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts"]]></title><description><![CDATA[
<p>We went over it in our Friday paper club before the holidays, which helped me gain an intuition.<p><a href="https://blog.oxen.ai/mamba-linear-time-sequence-modeling-with-selective-state-spaces-arxiv-dives/" rel="nofollow">https://blog.oxen.ai/mamba-linear-time-sequence-modeling-wit...</a><p>I'm still not convinced of Mamba's performance on natural language tasks, but maybe it's just because they haven't trained a large enough model on enough data yet.</p>
]]></description><pubDate>Wed, 10 Jan 2024 06:43:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=38937034</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38937034</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38937034</guid></item><item><title><![CDATA[New comment by gschoeni in "Ask HN: AI/ML papers to catch up with current state of AI?"]]></title><description><![CDATA[
<p>I put together a reading list for Andrej Karpathy's intro to LLMs that would be helpful for all of the latest LLM and multi-modal architectures:<p><a href="https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro-to-large-language-models-video/" rel="nofollow noreferrer">https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro...</a></p>
]]></description><pubDate>Fri, 15 Dec 2023 21:52:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=38659260</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38659260</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38659260</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Have been studying the Mamba architecture all week and put together my notes here:<p><a href="https://blog.oxen.ai/mamba-linear-time-sequence-modeling-with-selective-state-spaces-arxiv-dives/" rel="nofollow noreferrer">https://blog.oxen.ai/mamba-linear-time-sequence-modeling-wit...</a><p>I hadn't found a very satisfying explanation of the paper yet, and still had some questions at the end, but hopefully this can give people a good jumping off point for their understanding!</p>
]]></description><pubDate>Fri, 15 Dec 2023 21:36:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=38659127</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38659127</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38659127</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Hey all, I ran some experiments benchmarking fine-tuning ViT, ResNet50, and CLIP on a Facial Emotion Recognition dataset. I had read the original papers the past few weeks, but wanted to do some practical hands on use of the models themselves.<p><a href="https://blog.oxen.ai/practical-ml-dive-how-to-customize-a-vision-transformer-on-your-own-data/" rel="nofollow noreferrer">https://blog.oxen.ai/practical-ml-dive-how-to-customize-a-vi...</a><p>~ TLDR ~ ViT works the best in this small experiment, with minimal code. The experiment was classifying 7 different facial emotions such as "happy", "sad", "angry", etc...<p>Model Accuracy<p>* ViT - 69%
* ResNet50 - 64%
* Zero-Shot CLIP - 53%<p>Was honestly most impressed with CLIP's ability for zero-shot transfer, even though it had the worst accuracy. Being able to give it a freeform list of prompts or labels and have it automatically classify into that set without training feels like the future of prototyping products and models. Then, once you define your use case, go with something more performant like a ViT.<p>Anyways, I had fun writing the code and running the experiments, so thought I would share!</p>
]]></description><pubDate>Thu, 14 Dec 2023 16:15:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=38642967</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38642967</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38642967</guid></item><item><title><![CDATA[New comment by gschoeni in "Ask HN: Can we do better than Git for version control?"]]></title><description><![CDATA[
<p>We've been working on a data version control system called "oxen", optimized for the large unstructured datasets that we are seeing more and more with the advent of many of the generative AI techniques.<p>Many of these datasets contain huge numbers of images, videos, audio files, and text, as well as structured tabular data, and git or git-lfs just falls flat on them.<p>Would love anyone to kick the tires on it and let us know what you think:<p><a href="https://github.com/Oxen-AI/oxen-release">https://github.com/Oxen-AI/oxen-release</a><p>The commands are modeled after git so it is easy to learn, but optimized under the hood for larger datasets.</p>
]]></description><pubDate>Sun, 10 Dec 2023 18:34:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=38593672</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38593672</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38593672</guid></item><item><title><![CDATA[New comment by gschoeni in "[dead]"]]></title><description><![CDATA[
<p>Hey all, we had a lively group discussion today on the 2021 CLIP paper from OpenAI.<p>Every Friday we've been going over the fundamentals of a lot of the state of the art techniques used in Machine Learning today. Hoping to learn a little each week, and spot patterns we can apply to our own work. I feel like there's always a little nugget of information I didn't fully understand before reading the paper, so have been finding it helpful.<p>Though it is not groundbreaking research as of this week, I think it's nice to take a step back and review the fundamentals as well as keep up with the latest and greatest.<p>Posted the notes and video recap here in case anyone finds them helpful:<p><a href="https://blog.oxen.ai/arxiv-dives-zero-shot-image-classification-with-clip/" rel="nofollow noreferrer">https://blog.oxen.ai/arxiv-dives-zero-shot-image-classificat...</a><p>Also would love to have anyone join us live on Fridays or suggest papers! We've got a pretty consistent and fun group of 400+ engineers and researchers popping in and out.</p>
]]></description><pubDate>Fri, 08 Dec 2023 23:29:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=38576248</link><dc:creator>gschoeni</dc:creator><comments>https://news.ycombinator.com/item?id=38576248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38576248</guid></item></channel></rss>