<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: lllllm</title><link>https://news.ycombinator.com/user?id=lllllm</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 22 Jun 2026 22:09:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=lllllm" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by lllllm in "Ask HN: Who is hiring? (October 2025)"]]></title><description><![CDATA[
<p>Swiss AI Initiative | <a href="https://www.swiss-ai.org/" rel="nofollow">https://www.swiss-ai.org/</a> | Hybrid/ONSITE (in Europe)<p>We are a young team, and the creators of the Apertus LLM, the currently leading open-data open-weights AI model.<p>Join us to work on cutting-edge LLM training in the open. We do pretraining, alignment, reasoning, multilinguality and multimodality - all at the intersection of engineering and research.<p>This is a joint team between ETH Zurich and EPFL in Lausanne, running on the Alps supercomputer (one of the largest public-institution GPU clusters).
Visa sponsorship possible, working language is English.<p><a href="https://careers.epfl.ch/job/Lausanne-AI-Research-Engineers-Swiss-AI-Initiative/1163395655/" rel="nofollow">https://careers.epfl.ch/job/Lausanne-AI-Research-Engineers-S...</a></p>
]]></description><pubDate>Thu, 02 Oct 2025 07:39:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45447179</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45447179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45447179</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>yes this seems a good way to go. for example you can already find many quantized versions under <a href="https://huggingface.co/models?search=apertus%20mlx" rel="nofollow">https://huggingface.co/models?search=apertus%20mlx</a>  and elsewhere</p>
]]></description><pubDate>Sat, 06 Sep 2025 20:07:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45152436</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45152436</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45152436</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>thank you!</p>
]]></description><pubDate>Sat, 06 Sep 2025 11:39:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=45148399</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45148399</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45148399</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>We hear you. Nevertheless, this is one of the very few open-weights and open-data LLMs, and the license is still very permissive (compare, for example, to Llama). Personally of course I'd like to remove the additional click, but the universities also have a say in this.</p>
]]></description><pubDate>Sat, 06 Sep 2025 11:38:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=45148394</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45148394</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45148394</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>The pretraining (so 99% of training) is fully global, in over 1000 languages without special weighting. The posttraining (see Section 4 of the paper) also included as many languages as we could get, and did upweight some languages. The posttraining can easily be customized to any other target languages.</p>
]]></description><pubDate>Sat, 06 Sep 2025 10:31:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45148120</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45148120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45148120</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>common crawl already respects the CCBot opt-out every time they do a crawl.<p>we went a step further because back in the old days (2013 is our oldest training data) LLMs did not exist, so website owners opting out of AI crawlers today might like the option to also remove their past content.<p>arguments can be made either way but we tried to remain on the cautious side at this point.<p>we also wrote a paper on how this additional removal affects downstream performance of the LLM <a href="https://arxiv.org/abs/2504.06219" rel="nofollow">https://arxiv.org/abs/2504.06219</a>  (it does so surprisingly little)</p>
]]></description><pubDate>Fri, 05 Sep 2025 22:56:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=45144665</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45144665</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45144665</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>martin here from the apertus team, happy to answer any questions if i can.<p>the full collection of models is here: <a href="https://huggingface.co/collections/swiss-ai/apertus-llm-68b699e65415c231ace3b059" rel="nofollow">https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...</a><p>PS: you can run this locally on your mac with these two commands:<p>pip install mlx-lm<p>mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"</p>
]]></description><pubDate>Fri, 05 Sep 2025 22:33:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=45144461</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45144461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45144461</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>we compared to GPT-OSS-20B, Llama 4, Qwen 3, among many others. Which models do you think are missing, among open-weights and fully-open models?<p>Note that we have a specific focus on multilinguality (over 1000 languages supported), not only on English</p>
]]></description><pubDate>Fri, 05 Sep 2025 22:28:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=45144417</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45144417</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45144417</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>we didn't have time to write one yet, but there is the tech report which has a lot of details already</p>
]]></description><pubDate>Fri, 05 Sep 2025 21:47:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=45144029</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45144029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45144029</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>posttraining codebase is here: <a href="https://github.com/swiss-ai/posttraining" rel="nofollow">https://github.com/swiss-ai/posttraining</a></p>
]]></description><pubDate>Fri, 05 Sep 2025 21:43:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=45143988</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45143988</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45143988</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>we released 81 intermediate checkpoints of the whole pretraining phase, and the code and data to reproduce. so a full audit is surely possible, though it depends on what you consider 'practical' here.</p>
]]></description><pubDate>Fri, 05 Sep 2025 21:39:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=45143950</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45143950</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45143950</guid></item><item><title><![CDATA[New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"]]></title><description><![CDATA[
<p>benchmarks: we provide plenty in the 100+ page tech report here
<a href="https://github.com/swiss-ai/apertus-tech-report/blob/main/Apertus_Tech_Report.pdf" rel="nofollow">https://github.com/swiss-ai/apertus-tech-report/blob/main/Ap...</a><p>quantizations: available now in MLX <a href="https://github.com/ml-explore/mlx-lm" rel="nofollow">https://github.com/ml-explore/mlx-lm</a> (gguf coming soon, not trivial due to new architecture)<p>model sizes: many good dense models today still lie in the range between our chosen small and large sizes</p>
]]></description><pubDate>Fri, 05 Sep 2025 21:36:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=45143911</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=45143911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45143911</guid></item><item><title><![CDATA[New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>this is what this paper tries to answer: <a href="https://arxiv.org/abs/2504.06219" rel="nofollow">https://arxiv.org/abs/2504.06219</a>
the quality gap between compliant and non-compliant data is surprisingly small</p>
]]></description><pubDate>Sat, 12 Jul 2025 08:45:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=44540399</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44540399</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44540399</guid></item><item><title><![CDATA[New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>absolutely! i sent you a linkedin message last week. but here seems to work much better, thanks a lot!</p>
]]></description><pubDate>Sat, 12 Jul 2025 08:11:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44540233</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44540233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44540233</guid></item><item><title><![CDATA[New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>we kept all 1800+ (script/language) pairs, not only the quality-filtered ones. whether a mix of quality-filtered and unfiltered languages impacts the mixing is still an open question. preliminary research (Section 4.2.7 of <a href="https://arxiv.org/abs/2502.10361" rel="nofollow">https://arxiv.org/abs/2502.10361</a> ) indicates that quality filtering can mitigate the curse of multilinguality to some degree, and so facilitate cross-lingual generalization, but it remains to be seen how strong this effect is at larger scale</p>
]]></description><pubDate>Sat, 12 Jul 2025 08:06:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=44540219</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44540219</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44540219</guid></item><item><title><![CDATA[New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>no. the main source is fineweb2, but with additional filtering for compliance, toxicity removal, and quality filters such as fineweb2-hq</p>
]]></description><pubDate>Sat, 12 Jul 2025 07:13:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44539987</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44539987</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44539987</guid></item><item><title><![CDATA[New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>Yes this is an interesting question. In our arxiv paper [1] we did study this for news articles, and also removed duplicates of articles (decontamination). We did not observe an impact on the downstream accuracy of the LLM, in the case of news data.<p>[1] <a href="https://arxiv.org/abs/2504.06219" rel="nofollow">https://arxiv.org/abs/2504.06219</a></p>
]]></description><pubDate>Sat, 12 Jul 2025 07:12:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=44539981</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44539981</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44539981</guid></item><item><title><![CDATA[New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>No, the model has nothing to do with Llama. We are using our own architecture, and training from scratch. Llama also does not have open training data, and is non-compliant, in contrast to this model.<p>Source: I'm part of the training team</p>
]]></description><pubDate>Sat, 12 Jul 2025 06:39:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44539869</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44539869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44539869</guid></item><item><title><![CDATA[New comment by lllllm in "Planet squeezed in between two stars"]]></title><description><![CDATA[
<p>animation of it: <a href="https://youtu.be/ewg36czOOiI?si=moL9g9Xz2-vVClZX" rel="nofollow">https://youtu.be/ewg36czOOiI?si=moL9g9Xz2-vVClZX</a></p>
]]></description><pubDate>Sat, 24 May 2025 04:01:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=44078663</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=44078663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44078663</guid></item><item><title><![CDATA[Distributed Collaborative ML (and LLMs) in the Browser]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/epfml/disco">https://github.com/epfml/disco</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41059173">https://news.ycombinator.com/item?id=41059173</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 24 Jul 2024 17:01:52 +0000</pubDate><link>https://github.com/epfml/disco</link><dc:creator>lllllm</dc:creator><comments>https://news.ycombinator.com/item?id=41059173</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41059173</guid></item></channel></rss>