YouTube catalog
Google's Gemma 4 Just Shocked The AI Industry
🔴 News
en

Google's Gemma 4 shocks the AI industry with its efficiency

The AI Grid · 10 days ago · Apr 4, 2026 · Impact 6/10
Positive
AI Analysis

Google has released Gemma 4, a new line of open AI models designed to run on personal devices. The models are highly efficient, matching the performance of much larger models while using far fewer parameters, making local, private AI processing more accessible.

Key points

  • The Gemma 4 models are designed to run locally on devices such as phones and laptops.
  • The models achieve performance comparable to much larger models with significantly fewer parameters.
  • Google is releasing Gemma 4 under the open Apache 2.0 license.
Opportunities

Run AI solutions without depending on cloud services • Full data privacy thanks to local processing • Free Apache 2.0 license for commercial use

Caveats

Although Gemma 4 is open source, its larger models still require substantial compute, which may limit their use for some users.

Video description

So Google just stunned the open-source world, so let's talk about it. Google have just released Gemma 4, and I'm going to leave it to Google to take it away for you, and then I will get into all of the details. >> Hi, my name is Olivier and I'm a group product manager on the Gemma team. Since we launched our first models, the developer community has absolutely blown us away. Over 400 million downloads, over 100,000 variants. You've built a vibrant ecosystem around Gemma, and we could not be more grateful. We've listened very closely to what you wanted next, and today we are thrilled to announce Gemma 4. Built from the same world-class research and technology behind Gemini 3, Gemma 4 is our family of open models designed to run directly on the hardware you own: phones, laptops, and desktops. For the first time ever, we are releasing Gemma under an open-source Apache 2.0 license. Gemma 4 is built for the agentic era. It can handle complex logic, multi-step planning, and agentic workflows, making optimal use of tokens for its intelligence. The bigger models perform well with a very large context window, allowing you to analyze entire code bases and handle multi-turn agentic use cases. It features native support for tool use, allowing you to build agents that plan and act on your behalf. Let's break down the model family now. First, a 26B mixture-of-experts and a 31B dense model. These provide frontier intelligence directly on your personal computer. You can run state-of-the-art local reasoning and coding pipelines without needing to upload data outside of your controlled environment. The 26B, with 3.8 billion activated parameters, is exceptionally fast, while the 31B is optimized for output quality. Then we have our efficient 2B and efficient 4B models, engineered for maximum memory efficiency. These models bring a whole new level of intelligence to mobile and IoT devices, with combined audio and vision support for real-time processing. They can see and hear the world.
All of this while natively supporting over 140 languages. Now let's test our efficient models on a multilingual and agentic task. [on-screen demo in another language] ...please reply to me in English. Amazing. We have a winner. As open models become more central to enterprise infrastructure, security is paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous security protocols as our proprietary models, giving enterprises and developers a trusted foundation to build on top of. We want you to be able to use Gemma 4 with the tools you already know and love. You can download the weights and start experimenting today. We cannot wait to see what you create next. Now, one of the first things I really want to dive into is the impressive benchmarks. On the surface, it does look like these models aren't as impressive as their counterparts, and that is of course pretty normal. So if, just because these models score lower on the arena Elo, you think they don't matter, I think you might be confused. And this is why I think this model release may actually be one of the most overlooked model releases in recent history. Even if we take those benchmarks at face value, I don't think they matter that much, because when we look at the Gemma 4 models combined, 31 billion parameters plus 26 billion parameters, we get a family totaling about 57 billion parameters. That is still roughly 10 times smaller than GLM 5, 10 times smaller than Kimi K2.5, and essentially 10 times more efficient than frontier reasoning models.
If you take those models singularly, they're almost 20 times more efficient while maintaining that level of quality, which honestly changes things in the open-source space. Because right now, when you look at open source, the problem with a lot of these models is that you essentially have to run inference through a third-party provider. To be honest, the models you're currently inferencing are, relatively speaking, really quite cheap, which is still pretty good, but it's still a cost that can eat at you if you're doing a ton of work, especially if you need to work completely locally. Take, for example, the Gemma 4 31B. This is their most powerful model, and it performs basically on par with Kimi K2.5 Thinking, and that's only a 31-billion-parameter model. Meaning that if you have enough RAM at home, you can run this model on your GPU and, number one, not have to pay any provider; number two, keep all of your data private; and number three, use it completely offline, completely securely, in a way we've never seen before. So this is why I say this is super underrated and you really need to pay attention to it, and I think this chart shows it even better: model performance versus size. This is something I think most people aren't talking about. It isn't just the fact that Google released an open-source model. It's the fact that they released one you can literally download right now. And if you're wondering, I literally made a tutorial on that; the link will be in the description. They made an entire model that you can download right now, that is basically on par with Kimi K2.5 Thinking, and that you can use privately and locally. There is literally no model right now that is as efficient when you compare model performance versus size.
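The "10x smaller" claim above is simple arithmetic on the parameter counts the video mentions. A minimal sketch, where the 570B frontier-model figure is an illustrative assumption (the video gives no exact sizes for GLM 5 or Kimi K2.5), not a confirmed number:

```python
# Back-of-the-envelope parameter-efficiency comparison.
# Gemma 4 sizes are from the video; the frontier size is an assumption
# chosen only to illustrate the "roughly 10x smaller" claim.
gemma4_moe = 26e9      # total parameters, mixture-of-experts model
gemma4_dense = 31e9    # total parameters, dense model
frontier = 570e9       # assumed size of a large frontier open model

combined = gemma4_moe + gemma4_dense          # 57B combined
ratio = frontier / combined                   # how many times smaller

print(f"Combined Gemma 4 params: {combined / 1e9:.0f}B")
print(f"Size ratio vs frontier:  {ratio:.1f}x smaller")
```

Swapping in a different frontier-model size just rescales `ratio`; the point is that the whole Gemma 4 family together is in the tens of billions of parameters, not hundreds.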
And I think this is going to be game-changing. You're going to be saving so much money. And I think this goes to show just how crazy the future is going to be, because they've made huge improvements since Gemma 3, and if this model is as good as it is (and I've already tested it), it's going to be really interesting to see just how crazy future models get. I do wonder if, for the lower end of tasks, we're just going to default to on-device models because of what we're able to do. And remember, this is a reasoning model. I actually tested this model on my phone. You can use a Google app to run this model on your phone, so it isn't just about using it locally on your computer or PC. Currently, what you're seeing is a 4-billion-parameter model, 3.6 GB to be exact, running on my iPhone 15 Pro. Now, this is pretty crazy, because it isn't using that much RAM, it isn't using as much as you would think, and the model is absolutely tiny. Yet what you get on board is basically the best bang for your buck of any model at working with large amounts of text and difficult reasoning, and this model even has a thinking mode. So I think the biggest thing here is that this is basically the new suite of models you'd want to be running. I have this on my phone; if my internet ever goes out and I want to use an LLM, I can literally just boot up this app and use it on my phone, as you can see right here. And this is completely private, completely offline, and doesn't cost me a single cent. And the craziest thing about all of this is that this is not just a chatbot model.
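The "4 billion parameters in 3.6 GB" figure implies the on-device model is quantized: at full fp16 precision, 4B parameters would need about 8 GB. A quick sanity check (precision sizes are standard bytes-per-weight figures, not anything stated in the video):

```python
# What a 3.6 GB file size implies for a 4B-parameter model.
params = 4e9          # parameters, from the video
size_gb = 3.6         # on-device size, from the video

bytes_per_param = size_gb * 1e9 / params      # observed bytes/weight

# Expected sizes at common precisions (bytes per weight):
precisions = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}
for name, bpp in precisions.items():
    print(f"{name:>5}: {params * bpp / 1e9:.1f} GB")

print(f"observed: {bytes_per_param:.2f} bytes/param (close to int8)")
```

At ~0.9 bytes per parameter, the shipped weights sit just under 8-bit precision, which is consistent with the modest RAM usage the presenter observes on the phone.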
This is a natively multimodal model, meaning that in this app there are areas where you can use images, you can use videos, you can use audio. It is absolutely impressive. I'm actually not sure about the video part, but I do know that you can use images and audio. And it is even smart enough to control some aspects of your phone. So when we look at all of this, the Gemma 4 scalability and efficiency story, it is very, very crazy. You can run this on your phone, you can run this on your laptop, you can run this on your desktop, and even the "big boy" model, and I put that in quotation marks, isn't as big as you think, because the model is remarkably efficient. And the craziest thing is that the mixture-of-experts model only activates a few billion parameters at inference, meaning it is remarkably, remarkably efficient. It's like having a building with many rooms and only needing to light up a few rooms to get the job done; the footprint is so tiny that while the model is running, you're still going to be able to do other stuff. Now, of course, if you've got really old devices, it may not work; I would say you need a phone with at least 6 to 8 GB of RAM. That's what you're going to need.
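The "light up a few rooms" analogy describes mixture-of-experts routing: the model holds many expert sub-networks, but a router activates only a few of them per token, so most parameters stay idle on any given forward pass. A toy sketch of that idea (didactic only; the sizes and routing here are made up, not Gemma's actual architecture):

```python
import numpy as np

# Toy mixture-of-experts layer: of num_experts expert matrices,
# only top_k are activated per input, so most parameters are idle.
rng = np.random.default_rng(0)

num_experts, top_k, dim = 8, 2, 16
experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
router = rng.standard_normal((dim, num_experts))

def moe_forward(x):
    logits = x @ router                   # one routing score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen experts
    # Only top_k of the num_experts matrices are ever multiplied.
    out = sum(w * (x @ experts[i]) for i, w in zip(top, weights))
    return out, top

x = rng.standard_normal(dim)
y, active = moe_forward(x)
print(f"active experts: {sorted(active.tolist())} of {num_experts}")
print(f"fraction of expert params used: {top_k / num_experts:.0%}")
```

This is why a 26B-total MoE can run with only ~3.8B activated parameters: total capacity scales with the number of experts, while per-token compute scales with `top_k`.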
The really old devices might still be able to run some of the small ones, but nonetheless, in terms of what you're getting, I still think this is going to be pretty incredible, especially the 26-billion-parameter mixture-of-experts model, completely effective at most tasks, and the 31-billion-parameter model, with all parameters active all the time, designed for maximum quality. And remember that there are going to be quantized versions, meaning those will be basically the same level of quality but even smaller in size, which is pretty crazy. So I do wonder how this is going to change the open-source ecosystem. Think about just how good open source is: a lot of the pull, a lot of the thrill of using other open-source models was the fact that not only are they cheap, but they're also good. But Google has gone one step further. They've made it cheap, they've made it good, and on top of that, they've made the size difference absolutely incredible. So will you still be using other models if Gemma 4 Thinking is as good as it is now, or are you still going to be using models like Kimi in your workflow? Now, if you want to download this model and use it locally, I'm going to drop a guide showing you how you can use it on your device. And tomorrow, I'll be dropping a guide on how you can run it on your phone, absolutely locally. If you enjoyed this video, it's been The AI Grid, and I'll see you in the next one.