<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: canyon289</title><link>https://news.ycombinator.com/user?id=canyon289</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 08 Apr 2026 12:38:24 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=canyon289" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by canyon289 in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"]]></title><description><![CDATA[
<p>This is a nice writeup!</p>
]]></description><pubDate>Sun, 05 Apr 2026 18:48:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47652586</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47652586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47652586</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>This is going to sound like a corp answer but I mean this genuinely as an individual engineer. Google is a leader in its field and that means we get to chart our own path and do what is best for research and for users.<p>I personally strive to build software and models that provide the best and most usable experience for lots of people. I did this before I joined Google with open source, and my writing on "old school" generative models, and I'm lucky that I get to do this at Google in the current LLM era.</p>
]]></description><pubDate>Thu, 02 Apr 2026 18:06:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47618005</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47618005</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47618005</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>I don't have the metrics off hand, but I'd say try it and see if you're impressed! What matters at the end of the day is if it's useful for your use cases, and only you'll be able to assess that!</p>
]]></description><pubDate>Thu, 02 Apr 2026 17:41:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47617611</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47617611</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47617611</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>On this one I don't know :) I'll ask my friends on the evaluation side of things how they do this.</p>
]]></description><pubDate>Thu, 02 Apr 2026 17:40:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47617585</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47617585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47617585</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>It's hard to say because Pixel comes prepacked with a lot of models, not just ones that are text-output models.<p>With the caveat that I'm not on the Pixel team and I'm not building _all_ the models that are on Google's devices, it's evident there are many models that support the Android experience. For example the one mentioned here<p><a href="https://store.google.com/us/magazine/magic-editor?hl=en-US&pli=1" rel="nofollow">https://store.google.com/us/magazine/magic-editor?hl=en-US&p...</a></p>
]]></description><pubDate>Thu, 02 Apr 2026 17:39:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47617576</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47617576</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47617576</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>We are always figuring out what parameter size makes sense.<p>The decision is always a mix between how good we can make the models from a technical aspect, and how good they need to be to make all of you super excited to use them. And it's a bit of a challenge in what is an ever-changing ecosystem.<p>I'm personally curious: is there a certain parameter size you're looking for?</p>
]]></description><pubDate>Thu, 02 Apr 2026 17:29:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47617419</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47617419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47617419</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>You could try Gemma 4 :D</p>
]]></description><pubDate>Thu, 02 Apr 2026 17:09:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47617147</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47617147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47617147</guid></item><item><title><![CDATA[New comment by canyon289 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>Hi all!
I work on the Gemma team, one of many people involved, as this one was a bigger effort given it was a mainline release. Happy to answer whatever questions I can!</p>
]]></description><pubDate>Thu, 02 Apr 2026 17:08:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47617137</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47617137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47617137</guid></item><item><title><![CDATA[New comment by canyon289 in "Qwen3.5 Fine-Tuning Guide"]]></title><description><![CDATA[
<p>I work on Gemma and Gemini models, and I want to echo Daniel's point here. Small finetuned models have their place even with larger general-purpose models.<p>For example, last year with Daniel/Unsloth's help we released a tiny specialized model that gets Gemini-level performance specifically for FC. For folks that need efficient, limited-purpose models, small models like this can fit a specific need.<p><a href="https://blog.google/innovation-and-ai/technology/developers-tools/functiongemma/" rel="nofollow">https://blog.google/innovation-and-ai/technology/developers-...</a><p>Especially on device.
<a href="https://developers.googleblog.com/on-device-function-calling-in-google-ai-edge-gallery/" rel="nofollow">https://developers.googleblog.com/on-device-function-calling...</a><p>It's the same with chips: we have general-purpose CPUs, but we still have specialized silicon that is smaller, more power efficient, and cheaper for certain tasks, and because it's single purpose it simplifies and derisks certain designs.<p>And I have to add, if you want to learn about finetuning models efficiently, the Unsloth guides are at the top of my list. They're practical, have all the technical details, and most importantly Daniel and the others are working around the clock to keep them up to date in what is an incredibly fast-moving space of models and hardware. I am continually astounded by their work.</p>
]]></description><pubDate>Wed, 04 Mar 2026 15:45:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47249196</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=47249196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47249196</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>I'm with you! Small generative models are awesome; I thought so a decade ago and I still think so now! The size of what is "small" has definitely increased, though: I used to think a 100-parameter model was large back in 2016, but here I am now saying 270 million is small :)</p>
]]></description><pubDate>Fri, 19 Dec 2025 14:21:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46326141</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46326141</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46326141</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>Good insight here. We actually did not include thinking in this model, partly because we saw how incredibly fast it was to just get the minimum number of tokens to output an answer.<p>Thinking helps performance scores, but we'll leave it up to users to add additional tokens if they want. Our goal here was the leanest weight and token base for blazing-fast performance for you all.</p>
]]></description><pubDate>Fri, 19 Dec 2025 14:19:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46326110</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46326110</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46326110</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>It's definitely a step in that direction. I use Gemma models on my local MacBook all the time and am personally excited to have this one available for me at home now as well.</p>
]]></description><pubDate>Fri, 19 Dec 2025 04:21:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46322235</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46322235</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46322235</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>It depends on a couple of things. If you expect reasoning or frontier-level chat abilities, then larger Gemma models or Gemini is better.<p>Another hard constraint is context limit: Gemma 270M is at 32k, so if the search results returned are massive then this is not a great model. The larger 4B+ Gemma models have 128k, and Gemini's token window is in the millions.</p>
]]></description><pubDate>Fri, 19 Dec 2025 04:20:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=46322229</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46322229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46322229</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>I've only just skimmed this blog post, but if I'm reading correctly, FunctionGemma can work just like what's intended here: a "contextless" tool router.<p>Going one level up, you as a developer have a choice of how much context you want to provide to the model. Philipp Schmid wrote a good blog post about this, calling it "context engineering". I like his idea because instead of just blindly throwing stuff into a model's context window and hoping to get good performance, it encourages folks to think more about what's going into the context in each turn.<p><a href="https://www.philschmid.de/context-engineering" rel="nofollow">https://www.philschmid.de/context-engineering</a><p>Similarly, I think the blog post you linked has a similar sentiment. There are nuanced approaches that can yield better results if an engineering mindset is applied.</p>
]]></description><pubDate>Thu, 18 Dec 2025 22:15:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46319565</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46319565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46319565</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>I want to say so much right now but I can't :)<p>The most generic thing I can say is I really do like working at Google because it's one of the few (maybe only) companies that has models of all sizes and capabilities. Because of this, research and product development is insanely fun and feels "magical" when things just click together.<p>Keep following the Google developer channels/blogs. Google as a whole is pushing hard in this space, and I personally think it is building stuff that felt like science fiction just 3 years ago.</p>
]]></description><pubDate>Thu, 18 Dec 2025 22:11:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46319522</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46319522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46319522</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>:popcorn gif:</p>
]]></description><pubDate>Thu, 18 Dec 2025 21:43:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46319202</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46319202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46319202</guid></item><item><title><![CDATA[New comment by canyon289 in "T5Gemma 2: The next generation of encoder-decoder models"]]></title><description><![CDATA[
<p>Hi, I'm not on the T5Gemma team but work on Gemma in general.<p>Encoder-decoder comes from the original transformer implementation way back in 2017. If you look at figure 1 of the paper, you'll see what the first transformer ever looked like.<p>Since that time, different implementations of transformers use either just the encoder portion, or the decoder portion, or both. It's a deep topic, so it's hard to summarize here, but Gemini explains it really well! Hope this gets you started on some prompting to learn more.<p><a href="https://arxiv.org/pdf/1706.03762" rel="nofollow">https://arxiv.org/pdf/1706.03762</a></p>
]]></description><pubDate>Thu, 18 Dec 2025 21:19:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46318907</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46318907</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46318907</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>I'm not specifically promising anything, but I do want to say 2026 is going to be a great year! Many of my colleagues are shipping models too, such as T5Gemma, which is on the front page, and I'm personally excited to see what we're all collectively going to release in the coming year.</p>
]]></description><pubDate>Thu, 18 Dec 2025 21:15:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46318838</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46318838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46318838</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>> Do you recommend any particular mix or focus in the dataset for finetuning this model, without losing too much generality?<p>Astute questions! There are sort of two ways to think about finetuning:
1. Obliterate any general functionality and train the model on your own commands
2. As you asked, maintain generality, trying to preserve the initial model's ability<p>For 2, typically a low learning rate or LoRA is a good strategy. We show an example in the finetuning tutorial in the blog.<p>> 2. do you have any recommendations for how many examples per-tool?
This depends on the tool complexity and the variety of user inputs. So a simple tool like turn_flashlight_on(), with no args, will get taught quickly, especially if, say, you're only prompting in English.<p>But if you have a more complex function like get_weather(lat, lon, day, region, date) and have prompts coming in in English, Chinese, Gujarati, and Spanish, the model needs to do a lot more "heavy lifting" to both translate a request and fill out a complex query. We know as programmers that dates by themselves are insanely complex in natural language (12/18/2025 vs 18/12/2025).<p>To get this right, it'll help the model if it's trained on data that shows it the variations of inputs possible.<p>Long answer, but I hope this makes sense.</p>
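The LoRA strategy mentioned above can be sketched in a few lines. This is a generic illustration of the low-rank-adapter idea in plain NumPy, not anything specific to FunctionGemma's architecture or the blog's tutorial; all shapes and names here are made up for the example:

```python
import numpy as np

# LoRA sketch: the pretrained weight W stays frozen; training only
# updates a low-rank pair (A, B), so the model keeps most of its
# general ability while adapting to new tool-calling data.
rng = np.random.default_rng(0)
d, r = 8, 2                             # hidden size, LoRA rank (r << d)

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((d, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d))                    # trainable up-projection, zero init

def forward(x):
    # base path plus the low-rank adapter path
    return x @ W + x @ A @ B

x = rng.standard_normal((1, d))
# With B initialized to zero the adapter contributes nothing at step 0,
# so finetuning starts exactly at the pretrained behavior.
assert np.allclose(forward(x), x @ W)
```

Because only A and B (d*r + r*d values instead of d*d) receive gradients, the update is both cheap and gentle, which is why it tends to preserve generality compared to full finetuning.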
]]></description><pubDate>Thu, 18 Dec 2025 21:13:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46318815</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46318815</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46318815</guid></item><item><title><![CDATA[New comment by canyon289 in "FunctionGemma 270M Model"]]></title><description><![CDATA[
<p>We evaluate many of the things you alluded to, such as speed on device, output correctness, and also "is this something that would be useful", the last one being a bit abstract.<p>The way we think about it is: what do we think developers and users need, and is there a way we can fill that gap in a useful way? With this model we had the hypothesis you had: there are fantastic larger models out there pushing the frontier of AI capabilities, but there's also a niche for a smaller, customizable model that's quick to run and quick to tune.<p>What is optimal then ultimately falls to you and your use cases (which I'm guessing at here); you have options now between Gemini and Gemma.</p>
]]></description><pubDate>Thu, 18 Dec 2025 20:59:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46318636</link><dc:creator>canyon289</dc:creator><comments>https://news.ycombinator.com/item?id=46318636</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46318636</guid></item></channel></rss>