Hacker News: lllllm

New comment by lllllm in "Ask HN: Who is hiring? (October 2025)"

lllllm — Thu, 02 Oct 2025 07:39:05 +0000

Swiss AI Initiative | https://www.swiss-ai.org/ | Hybrid/ONSITE (in Europe)

We are a young team, and the creators of the Apertus LLM, the currently leading open-data open-weights AI model.

Join us to work on cutting edge LLM training in the open. We do pretraining, alignment, reasoning, multilinguality and multimodality - all at the intersection of engineering and research.

This is a joint team between ETH Zurich and EPFL in Lausanne, running on the Alps supercomputer (one of the largest public institution GPU cluster). Visa sponsoring possible, work language is English.

https://careers.epfl.ch/job/Lausanne-AI-Research-Engineers-S...

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Sat, 06 Sep 2025 20:07:46 +0000

yes this seems a good way to go. for example you can already find many quantized versions under https://huggingface.co/models?search=apertus%20mlx and elsewhere

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Sat, 06 Sep 2025 11:39:27 +0000

thank you!

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Sat, 06 Sep 2025 11:38:38 +0000

We hear you, nevertheless this is one of the very few open-weights and open-data LLMs, and the license is still very permissive (compare for example to Llama). Personally of course I'd like to remove the additional click, but the universities also have a say in this.

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Sat, 06 Sep 2025 10:31:58 +0000

The pretraining (so 99% of training) is fully global, in over 1000 languages without special weighting. The posttraining (See section 4 of the paper) had also as many languages as we could get, and did upweight some languages. The posttraining can easily be customized to any other target languages

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 22:56:44 +0000

common crawl anyway respects the CCbot opt-out every time they do a crawl.

we went a step further because back in old ages (2013 is our oldest training data) LLMs did not exist, so website owners opting out today of AI crawlers might like the option to also remove their past contents.

arguments can be made either way but we tried to remain on the cautious side at this point.

we also wrote a paper on how this additional removal affects downstream performance of the LLM https://arxiv.org/abs/2504.06219 (it does so surprisingly little)

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 22:33:36 +0000

martin here from the apertus team, happy to answer any questions if i can.

the full collection of models is here: https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...

PS: you can run this locally on your mac with this one-liner:

pip install mlx-lm

mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 22:28:55 +0000

we compared to GPT-OSS-20B, Llama 4, Qwen 3, among many others. Which models do you think are missing, among open weights and fully-open models?

Note that we have a specific focus on multilinguality (over 1000 languages supported), not only on english

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 21:47:38 +0000

we didn't have time to write one yet, but there is the tech report which has a lot of details already

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 21:43:20 +0000

posttraining codebase is here: https://github.com/swiss-ai/posttraining

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 21:39:02 +0000

we released 81 intermediate checkpoints of the whole pretraining phase, and the code and data to reproduce. so full audit is surely possible - still it would depend on what you consider 'practical' here.

New comment by lllllm in "Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS"

lllllm — Fri, 05 Sep 2025 21:36:41 +0000

benchmarks: we provide plenty in the over 100 page tech report here https://github.com/swiss-ai/apertus-tech-report/blob/main/Ap...

quantizations: available now in MLX https://github.com/ml-explore/mlx-lm (gguf coming soon, not trivial due to new architecture)

model sizes: still many good dense models today lie in the range between our small and large chosen sizes

New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"

lllllm — Sat, 12 Jul 2025 08:45:00 +0000

this is what this paper tries to answer: https://arxiv.org/abs/2504.06219 the quality gap is surprisingly small between compliant and not

New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"

lllllm — Sat, 12 Jul 2025 08:11:30 +0000

absolutely! i've sent you a linkedin message last week. but here seems to work much better, thanks a lot!

New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"

lllllm — Sat, 12 Jul 2025 08:06:36 +0000

we kept all 1800+ (script/language) pairs, not only the quality filtered ones. the question if a mix of quality filtered and not languages impacts the mixing is still an open question. preliminary research (Section 4.2.7 of https://arxiv.org/abs/2502.10361 ) indicates that quality filtering can mitigate the curse of multilinguality to some degree, so facilitate cross-lingual generalization, but it has to be seen how strong this effect is on larger scale

New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"

lllllm — Sat, 12 Jul 2025 07:13:22 +0000

no. the main source is fineweb2, but with additional filtering for compliance, toxicity removal, and quality filters such as fineweb2-hq

New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"

lllllm — Sat, 12 Jul 2025 07:12:02 +0000

Yes this is an interesting question. In our arxiv paper [1] we did study this for news articles, and also removed duplicates of articles (decontamination). We did not observe an impact on the downstream accuracy of the LLM, in the case of news data.

[1] https://arxiv.org/abs/2504.06219

New comment by lllllm in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"

lllllm — Sat, 12 Jul 2025 06:39:41 +0000

No, the model has nothing do to with Llama. We are using our own architecture, and training from scratch. Llama also does not have open training data, and is non-compliant, in contrast to this model.

Source: I'm part of the training team

New comment by lllllm in "Planet squeezed in between two stars"

lllllm — Sat, 24 May 2025 04:01:59 +0000

animation of it: https://youtu.be/ewg36czOOiI?si=moL9g9Xz2-vVClZX

Distributed Collaborative ML (and LLMs) in the Browser

lllllm — Wed, 24 Jul 2024 17:01:52 +0000

Article URL: https://github.com/epfml/disco

Comments URL: https://news.ycombinator.com/item?id=41059173

Points: 2

# Comments: 0