YouTube catalog
Open AI Spud, Claude Mythos, Leaks & Open Source drops | AI NEWS
🔴 News
en

OpenAI Spud, the Claude code leak, open models: what's happening on the AI frontier?

MattVidPro AI · 11 days ago · Apr 3, 2026 · Impact 6/10
AI Analysis

Anthropic suffered a large-scale leak of Claude Code's source, while OpenAI promises an economic breakthrough with its Spud model. Google keeps releasing open Gemma models, challenging the market leaders.

Key takeaways

  • Anthropic accidentally leaked the Claude Code source, exposing the tool's internal structure.
  • OpenAI is developing the Spud model, which promises major economic impact, surpassing GPT-4.
  • Google is actively shipping open Gemma models and optimization tools such as TurboQuant.
Opportunities

Google is giving developers free tools to catch up with OpenAI, but the ecosystem remains fragmented for now.

Caveats

The Claude Code leak could have serious security consequences, since attackers could use it to hunt for vulnerabilities. OpenAI's claims about Spud's economic impact remain promises for now.

Video description

Thanks for tuning in to the MattVidPro channel. In today's video, I've got a news roundup. It's filled with exciting pieces of information that point us in the direction we're headed for 2026 regarding AI, but there are also a few updates and tools you can use today. Plenty to get excited about as all of the big companies dig their heels in: new AI models almost ready to be deployed, and smaller labs, researchers, and prompters out there pushing the limits of what's possible across all modalities and domains. So without further ado, I'll stop yapping. Let's dive right in.

First up, we've got to talk about Anthropic. They've had quite the week. We haven't seen any large official releases, but things are happening behind the scenes, not only in what they're building but in what is accidentally leaking out of their company. Regardless, we start here, and this is almost old news by now: on March 28th, Anthropic showed Claude finding zero-day vulnerabilities in a live conference demo. They looked at a GitHub repo, Ghost, 50,000 stars, so very large, one that had never had a critical security vulnerability in its entire history, but Claude found one: a blind SQL injection, in 90 minutes, and it stole an admin API key. Then it did the exact same thing to the Linux kernel. The original video is on YouTube, and I'll link it down below. The guy up in the top-left corner is Nicholas Carlini, obviously using an internal Claude model, which we think is the rumored Mythos, to demonstrate the security vulnerabilities and just the sheer brute force of these newer AI models that are now coming out of pre-training and getting fine-tuned. You can't just hand the public access to one of these models, even on a Pro plan or enterprise-grade, because it would honestly be pretty trivial to jailbreak the model's inherent safeguards and use it to find all kinds of incredibly dubious security flaws, something that was never before possible. It's a massive risk and something that could overturn all kinds of frameworks. There is no real moat with this sort of thing. The kind of thing that would keep me up at night is sitting around thinking: it's probably possible for an open-source AI model to exist with this capability level that runs on consumer-grade hardware, maybe even the hardware we have now, eventually. Those open-source models don't exist yet, but they theoretically could. On the other side of the coin, though, access to an AI this powerful should help us build systems that are more robust and have fewer security flaws. Powerful technology is a double-edged sword. That has been the case with AI since the beginning.
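For context on the class of bug in that demo: a blind SQL injection is one where the attacker never sees query output directly, only a behavioral difference like a true/false response or a delay. The video doesn't detail the actual Ghost vulnerability, so this is just a generic, hypothetical illustration of the vulnerable pattern and its standard fix, using Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, api_key TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 'sk-secret')")

def lookup_vulnerable(name: str) -> bool:
    # BAD: user input is spliced into the SQL string. A crafted payload
    # lets an attacker ask yes/no questions about other rows and infer
    # secrets one character at a time, even though the endpoint only
    # ever answers "found" / "not found".
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchone() is not None

def lookup_safe(name: str) -> bool:
    # GOOD: a parameterized query treats the input as data, never as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchone() is not None

# A classic boolean-based blind probe: it tests one character of the
# admin key. Repeating this over every position reconstructs the key.
payload = "x' OR (SELECT substr(api_key,1,1) FROM users WHERE name='admin') = 's' --"
print(lookup_vulnerable(payload))  # True -> first key character is 's'
print(lookup_safe(payload))        # False -> payload is just a weird name
```

The unnerving part of the demo isn't the bug class, which is decades old; it's that a model found a previously unknown instance of it in a well-maintained codebase in 90 minutes.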
So, let's move on and take a look at some of the leaks surrounding Claude Mythos and Anthropic's upcoming model releases. Mythos is looking like it'll have a context window of a million tokens. Anthropic models typically do well at high context, but people like to use those context windows up; I'd honestly be more interested to see how many tokens it can output on a single request. Opus 4.7 and Sonnet 4.8 are already being spotted in the code, and a mysterious "Claude buddy" also appears. However, those leaks aren't nearly as substantive as this: earlier this week, Anthropic had a major leak. The entire Claude Code CLI source code is now public. A misconfigured map file in their npm package exposed a direct download link to the full TypeScript codebase from Anthropic's own R2 bucket. It's massive: 900 files, half a million lines of code. It shows the complete tool system, 50 slash commands, the multi-agent coordinator, the terminal UI. It's the whole kit and caboodle. So, someone's already put it right on GitHub, and people are forking it and building things off of it. Like I said, this was a few days ago now, so the cat was pretty much out of the bag before Anthropic could even seal the leaking hole. It's pretty cool to see how it all works under the hood, though. There were a few discoveries showing it was a little messier than I think people initially thought, but that's not all that surprising. You can see that first GitHub repo has already been taken down. It looks like the scaffolding and structure Anthropic designed and built for Claude Code wasn't anything necessarily super special. My advice to Anthropic would be to keep it open and let people build off of it.
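The leak vector itself, a source-map reference left in a shipped npm package, is something you can audit for mechanically. This is a hypothetical sketch, not the actual file or URL from Anthropic's package: it scans a package's JavaScript bundles for sourceMappingURL comments pointing at absolute URLs, which is exactly the kind of breadcrumb that can hand someone your original TypeScript.

```python
import re
from pathlib import Path

# Matches the standard //# sourceMappingURL=... comment emitted by bundlers.
SOURCEMAP_RE = re.compile(r"//[#@]\s*sourceMappingURL\s*=\s*(\S+)")

def audit_package(root: str) -> list[tuple[str, str]]:
    """Flag bundles whose source map points at an external URL.

    A relative .map path is bad enough if you publish it alongside the
    bundle, but an absolute https:// URL means the full original sources
    may be one GET request away on your own storage bucket.
    """
    findings = []
    for js_file in Path(root).rglob("*.js"):
        for match in SOURCEMAP_RE.finditer(js_file.read_text(errors="ignore")):
            url = match.group(1)
            if url.startswith(("http://", "https://")):
                findings.append((str(js_file), url))
    return findings

# Example: audit an unpacked npm tarball before publishing it.
for path, url in audit_package("./package"):
    print(f"WARNING: {path} points at external source map {url}")
```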
I'm trying to cover a wide scope today, but I did want to touch on this research by Anthropic, because it's gaining a lot of traction and I think people are misinterpreting it. This is all about emotion concepts and their function inside of large language models. All large language models sometimes act like they have emotions, but why? Anthropic claims to have discovered internal representations of emotion concepts that can sometimes drive Claude's behavior, directly inside the massive cloud that is the model's internal weights. If a model reads a certain sentence that portrays a particular emotion, they can artificially dial up or down the vectors that make up individual emotions, which can in turn steer the output behavior of the model, suggesting that these emotion vectors are actually driving output behavior to some degree. But this all does make sense. These emotive systems are a reflection of the training data, a reflection of the fine-tuning that makes the model play the role of Claude, which does have these underlying, tuned-in, functional emotions. But it's important to keep in mind that they aren't real emotions. The model doesn't experience a living consciousness. It's a stateless being that exists on a computer, and when certain emotions are lighting up, that's a state that happens in the moment; it is not part of a larger biological system. What we are witnessing is a reflection of humanity. I think the deeper truth is that you can't strip emotion out of doing a task. There is some level of emotion required not only to present as the AI assistant Claude, but to undertake problems, whether it's coding a game, writing a story, or conducting research. It might feel robotic and bland, but some level of emotion needs to exist and operate for the model to go out and complete the tasks.
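Anthropic's interpretability tooling is far more involved than anything shown here, so treat this as a toy sketch of the general technique, usually called activation steering: estimate a concept direction from contrastive prompts, then add it to the residual stream during generation. GPT-2, the layer index, and the steering strength are arbitrary stand-ins I picked for the demo, not details from their research:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works for the demo; gpt2 is just an example.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

LAYER = 6  # which transformer block to steer; a tunable choice

@torch.no_grad()
def hidden_at_layer(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embeddings, so LAYER + 1 is block LAYER's output.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# A crude "anger" direction: difference of mean hidden states
# between two contrastive prompts.
direction = hidden_at_layer("I am furious. This is outrageous!") \
          - hidden_at_layer("I am calm. Everything is fine.")
direction = direction / direction.norm()

def steering_hook(module, inputs, output):
    # Add the scaled concept vector to every token's residual stream.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 8.0 * direction  # 8.0 = steering strength, tune freely
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```

The point the research makes is essentially this loop run rigorously: if nudging one learned direction reliably shifts the tone of the output, that direction is functionally doing something, whatever we decide to call it.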
Before we drop in on OpenAI and check out the upcoming Spud model, I've got a quick word from today's sponsor. This video is sponsored by Verda, formerly known as DataCrunch. If you know anything about building with AI, compute is the bottleneck. But it's not just about getting access to GPUs; it's getting access to the right GPUs without paying absurd legacy cloud markups. That's where Verda comes in. Verda is a European cloud provider built specifically for AI workloads, so this isn't general-purpose cloud compute awkwardly trying to position itself over AI. Their infrastructure is designed with the intent of being AI-first, and they specialize in things like large-scale model training and high-concurrency inference. They offer access to a wide range of Nvidia hardware. They've got the latest models like the B200 and B300, but they've also got the reliable workhorses like the V100. One of the biggest selling points is the price-to-performance: with Verda, you can get elite Nvidia hardware for up to 90% less than traditional AI hyperscalers. So, if you're considering serious workloads, you could be looking at a hefty discount. Verda is also now officially an Nvidia preferred partner, which means they get early access to new silicon and direct engineering support from Nvidia. And they've just opened access to the new GB300 Blackwell Ultra; these are some of the most powerful AI systems online right now: NVLink GPU virtual machines, InfiniBand clusters. Spin these up in minutes and get direct support from engineers who work hands-on with AI. Plus, all this is guilt-free: Verda is a European platform with GDPR compliance, and these data centers run on 100% renewable energy. So, not only is it an impressive stack, it's a sustainable one as well. Stop paying for legacy cloud overhead and check out Verda with the link down below. You guys specifically get $50 in free trial credits if you use the code MVP50. Thank you for watching today's video, and thanks to Verda for sponsoring it. Now back to your regularly scheduled content.

Welcome back, folks. Greg Brockman brought a little more information to the table regarding Spud. This is OpenAI's latest pre-train that just completed, and apparently they were very happy with it. This is the model that is supposed to move the economy, which is no small task, and it's supposed to be a very serious bump, not just 20% gains. We are supposed to feel this model working differently: more agentically, longer on more difficult tasks, but actually completing them. It might be hard to think of these complex tasks off the top of your head, but even simple-sounding tasks like creating a video game are actually very complex and can be done in ways that far exceed what we see with current models. Believe it or not, it's still not that difficult to write a paragraph-sized prompt that throws a current LLM into deep water. So, what is the leap with Spud? And by the way, we don't know if it's going to be called GPT-5.5, but a 5.5 is coming down the line. Spud is a new base model, a fresh pre-train. Apparently, this is two years' worth of research coming to fruition with Spud, on top of everything they've built prior. Maybe this is going to be the GPT-5 we all hoped for, although I have to say 5.4 is probably one of my favorite current models. Greg also mentioned this thing called "big model smell." I've talked about it before: it's where you can just feel from the output that the model is actually much smarter, more capable. It responds to your prompt in a way where you can tell it just gets it and understands what you're going for. What you meant by what you said comes out in the form of the model aligning, bending to you much more, and you feel it in the interaction: a more useful output. So, we're going to be looking for this qualitative shift. If it's there and I see it, I'll do my best to describe it to you guys when I'm testing Spud, but obviously it's going to have to walk the walk of just doing things, getting jobs done that it wasn't able to do before. That's the real economy-moving moment. So, hopefully less over-explanation.

It's tough, because sometimes smaller, weaker prompts can have the basic structure we're looking for, but if you don't go into great detail, the AI isn't going to be thinking where your brain immediately went. It's those little intuitions that honestly add up to big differences. I think the ceiling is getting raised yet again for longer time horizons: the more complex problems, done autonomously, will take much, much longer, but it should be solving much more difficult problems it was never able to do. Meaning: we don't care if it takes longer, because it wasn't able to do it in the first place, and that is very exciting to me. OpenAI is refocusing its efforts on the challenge, or the goal, of building a model that drives or pushes the economy, and to actually pull that off you need something that is genuinely a cut above and different. I'm trying to imagine the sorts of things I would expect this to be capable of. Obviously code and software, but software that works right out of the gate, even if it takes a long time, and is feature-aligned to what you're asking for. There are a lot of times where I try to build something that uses multiple APIs to create a new workflow. Sometimes on the first try you'll get the whole workflow working, but the UI will be pretty janky, or it only exists in the back end. Models have gotten very good at completing research; harnessing that research and applying it autonomously would be really cool to see. Less theorizing and then tossing the ball into the user's court, and more theorizing and then building you a better court and ball. Benchmarks are telling us less and less; it's all about the application. What are the use cases we see today? What are the use cases we see tomorrow? I don't know, guys. Let me know your predictions and your thoughts down in the comments below. I'm excited for Spud, more so than Mythos, but only because I don't think they're going to be able to serve Mythos at the scale needed to drive or push the economy.

All right, we're going to pick up the speed a little bit; I'm going to be more news reporter, less AI-prediction weatherman. As we already know, OpenAI is discontinuing Sora. It is sad, but now we have dates: April 26th, so in a few weeks, the web app will shut down, and the API goes much later, on September 24th, 2026. So, we're going to have some months with the Sora API. I made a whole video about Sora shutting down, my thoughts on it, and obviously the reasons why: the refocusing of OpenAI. Let's talk about their upcoming super app, though. Right now, OpenAI says the future is not a collection of AI tools but a single AI super app. Kind of reminds me of what Elon kept talking about with the X everything app: ChatGPT, Codex, browsing, and other agentic systems all working as one. Behind the product vision, though, is a larger ambition: turn consumer scale into enterprise dominance and position itself as core infrastructure for the age of AI. Of course, this is where the money is; this is where the value is driven. So that all makes sense and honestly aligns with what I said in my Sora video about the real reason they're shutting Sora down: it's to refocus compute. They need to stop spending so much money and start making money, showing upfront value, upfront change. But I'm sure they'll be back with more funky AI experiments. I mean, who can resist them if you're an AI organization like Google, OpenAI, Anthropic, Grok?
Shifting gears out of OpenAI: Bonsai 1-bit 8B from Prism ML. This is running on-device on an iPhone 17 Pro, so a handheld smartphone, but a good one: 40 tokens per second for a dense 8B model. Pretty impressive; that's like normal ChatGPT speeds. Obviously, this model isn't nearly as smart, but it is running entirely locally. It's free to try out using the Locally AI app. However, guys, be forewarned: it's not great. It seems to be a little hallucinatory. I think the one-bit part is impressive, but at the same time, there's quite a bit of criticism surrounding this model. Don't expect anything too competent, but still, any LLM running locally on a phone is very cool to me, especially something 8B at one bit.

Speaking of open, though: this week, Gemma 4 by Google got released. Guys, Google is continuously releasing open-source, completely free models. OpenAI did it once, and it's already outdated. Anthropic isn't doing that. And whatever happened to Llama by Meta? The real competitors are the Chinese ones like Qwen. But props to Google for actually releasing an open-source model. A builder here used it in combination with an object detection model to have it instantly give descriptions of the scene that are fast and accurate. It's a pretty cool little demo, and pretty good image recognition capabilities for a lightweight open-source model.

Last week, we talked about Google releasing TurboQuant as well. It's only been a week, but Will here talks about how the insane LLM wizards are experimenting with TurboQuant, not just to compress the KV cache, but now the entire AI model itself. This test showed a 50% reduction in memory footprint, allowing Qwen 3.5 27B to run on a single RTX 5060, with no apparent degradation. Will has this right: this just goes to show that we're likely nowhere near the full optimization power that could possibly exist, even for the models we have today. We're like a year away from running big models on small devices with minimal consequences. These open-source LLMs are just getting better and better. "What a time to be alive" is absolutely right. This takes me back to what I was thinking about in the opening. You know what? Say we had a Mythos-level model, this codebreaker so intelligent it's dangerous to release outright. Will our optimizations be enough in the future to run something like that on an RTX 5090? I don't know. It definitely gets the gears turning a little bit.
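To make the on-device and single-GPU claims concrete, the back-of-the-envelope math is simple: weight memory is roughly parameter count times bits per weight, divided by eight. The model sizes are the ones quoted above; the rest is plain arithmetic:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits-per-weight / 8."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Bonsai: a dense 8B model at roughly 1 bit per weight.
print(f"8B @ 1 bit : {weight_gb(8, 1):.1f} GB")   # ~1.0 GB: phone-sized
print(f"8B @ fp16  : {weight_gb(8, 16):.1f} GB")  # ~16.0 GB for comparison

# Qwen 3.5 27B: a 50% cut, e.g. int8 -> int4, halves the footprint.
print(f"27B @ int8 : {weight_gb(27, 8):.1f} GB")  # ~27.0 GB
print(f"27B @ int4 : {weight_gb(27, 4):.1f} GB")  # ~13.5 GB: consumer range
# Note: real usage adds KV cache and activations on top of the weights,
# so whether a given card fits also depends on context length.
```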
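TurboQuant's actual algorithm isn't described in the video, so don't read this as its method. It's the textbook idea all of these schemes build on: store weights as small integers plus a scale, and reconstruct floats on the fly. A minimal symmetric int8 round-trip in NumPy:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map floats to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # a fake weight matrix
q, scale = quantize_int8(w)

# 4x smaller than fp32 (2x smaller than fp16), at the cost of a small
# reconstruction error; production schemes quantize per-channel or
# per-group, and handle outliers specially, to shrink that error.
err = np.abs(dequantize(q, scale) - w).mean()
print(f"bytes: {w.nbytes} -> {q.nbytes}, mean abs error: {err:.4f}")
```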
But let's jump into models that generate imagery. Skywork has released Matrix Game 3.0, a fully open-source, real-time, streaming interactive world model with long-horizon memory. We've seen a lot of these little experimental world models emerge from smaller labs and smaller companies, but this really looks like an impressive open-source project. They've posted the code, the model, and a technical report: 720p at 40fps, a pretty high frame rate, and this with a 5B-sized model. Memory consistency of a minute long; other researchers consider 15-second consistency to be pretty impressive, or up to 30 seconds, right? So a minute is genuinely impressive. They've trained this on Unreal Engine, AAA games, and some real-world data, and it scales up to 28B with a mixture of experts for quality, dynamics, and better generalization. The footage is really what we have to observe here, and I've got to say, I'm already noticing a couple of games in particular coming through: Red Dead Redemption, Grand Theft Auto, Cyberpunk. Frame by frame, it looks temporally consistent for sure, and 720p is actually HD, which is still good for video generators.

So, moving around a 3D landscape of a museum, some paintings on the wall. About this one as well: it definitely has a game look to it, just from the training data. It starts to get a little mushy there. Okay, now we actually have a character on screen that we're controlling. In this high detail, getting a look at it in more resolution, you can see lots of AI mushiness, or barf. Down here in this grass, there's some real crispiness; it's got that hairy puffball look. But we're able to take this same character and put it into a wide variety of different scenes. I mean, this fidelity, this resolution: it's not at that Google Genie 3 level, but as a star player for open source, this is a step up. All right, now we're getting into some longer-form videos; these are all about a minute long each. You can see this looks very realistic, but I think it has a little of those GTA 5 vibes. It does a pretty good job with the space, but as you can see with this truck, that's not anatomically correct, we would say, for a truck. Structurally, it's completely incoherent and incorrect, but it's close: we've got the front versus the back, a couple of the wheels. It's not terrible. Oh, this one's definitely a bit more challenging, I think: high-detail, high-texture scenes are harder for the model to resolve. There's a little water movement going on at the bottom, and it's mostly still frozen, which we see a lot with these models. All right, here is the Red Dead Redemption 2 gameplay. The trees look shockingly similar to the actual game. Even this location: I can tell this is supposed to be a certain city in the game, but nowhere in particular, right? It's all hallucinated. But the horse, the character, the controllability: this is cool to see from an AI model, right? It pretty accurately replicates the game while actually being controllable, too. It's the same situation with Grand Theft Auto 5. There's probably more training data for this one in particular, but this is a specific location in the game that, if you've played it, you probably know. Even this car as well. Overall, it's very cool; it's very early stuff. And this one we can actually try if we have the hardware: apparently we're looking at as low as 12 GB of VRAM, which is actually not too bad, but that's specifically the low-VRAM mode with the 5B model. If you want that headline 720p at 40 fps, it's 19 gigabytes, but still, that's consumer grade. This is impressive. I might have to try to get this thing going.
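One detail worth unpacking: "real-time, streaming" at 40 fps is a latency claim, not just a quality claim. The model has to produce every frame inside a fixed budget, indefinitely, while carrying its minute of memory. A quick sketch of that budget, using the project's quoted numbers:

```python
fps = 40
budget_ms = 1000 / fps                 # 25 ms to generate each frame
width, height = 1280, 720              # "720p"
pixels_per_frame = width * height      # 921,600 pixels

print(f"per-frame budget : {budget_ms:.0f} ms")
print(f"pixels per second: {pixels_per_frame * fps / 1e6:.1f} M")
# ~36.9 M pixels/s from a 5B model, every second, while also holding a
# minute of rolling context. That's why the 19 GB VRAM figure matters:
# the weights (~10 GB at fp16 for 5B) plus that context have to stay
# resident for the stream to keep hitting 25 ms per frame.
```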
Let's chat a little bit about Seedance 2.0. It's rolling out to a very wide variety of providers, but no US-native ones from the best I can tell, because they really don't want Seedance 2.0 released in the US just yet. You can still get access on a few of those websites, but it's pretty limited. It appears only Hen lets you actually generate with any character you want, or with real people, because they have advanced safety systems; they're the only platform with full access to allowing human faces, although you can get around it. Alex shows off the Topaz AI Starlight video upscaler. These guys are like the premier name in AI video upscaling, and you can see Seedance 2.0 gets a very noticeable bump in fidelity. Video models today are limited to low-resolution output, although the video might feel like it's high resolution because the model has seen all this high-resolution training data. So, this gets it up to that native resolution while still clearly maintaining decent coherence. Topaz products are typically pretty expensive, so with the Starlight video upscaler, you're going to be paying for it.

Speaking of upscaling, though: for image upscaling, Peruna just dropped Page Upscale. You can get up to an 8-megapixel output for just half a cent, undercutting the going rate by a ridiculous amount. Unfortunately, it doesn't appear to be open source, but you can run it very cheaply on Replicate, and they do have an API. It's been a while since we got a good, cheap upscaler, and this one is definitely upping the game. Impressive stuff. Wan 2.7 has also dropped, and it's live in a few places: it's on Fal AI, and it's also on ComfyUI, set up and ready for workflows out of the box. It's still not released open source, and I think for Wan 2.7 to really make a big splash, they've got to open-source it. It doesn't look like Wan has native audio yet, but it does have audio-driven generation, which lets you get pretty much perfect audio, or whatever audio you want, with matching visuals.

Grok Imagine, which is the xAI video model, just got a quality mode, and it appears to have some pretty decent chops. I don't talk too much about Grok video on here, but it gets a lot of frequent updates, and now with this quality mode there are some cinematic aspects here that are really grabbing my attention. Lots of details. It feels like the bit rate, quote unquote, is high; even though we're limited by terrible Twitter bit rate for AI video, this is actually good detail. And the corn just looks so realistic. I like that he just grabs this decaying leaf. It's really the acting as well, and when it gets close up to his face, just seeing all those details. If you already pay for Grok and have one of those plans, then this is definitely something worth checking out. I would like to see Grok video get fleshed out a little more. Is it a Seedance 2.0? Definitely not, but at this point with video quality they are moving up, right on the tail of those bigger players: Seedance, Kling, Sora, Veo. Honestly, in a lot of situations it's actually comparable. Grok Imagine also just got an update on the image side, similar to the video; it continuously gets these bumps. It also got a new quality mode that increases image generation time. It has seriously cracked through the wall of not being able to generate vast amounts of text, and it can now do so coherently with ease. There are very few errors in this, although I think I can spot a couple. Realistic images like this really seem to be where it shines. The stickers on the cash register are blowing my mind, although a lot of them are still pretty fuzzy. This ice cream looks decent. The people, the framing: it's pretty nuts. Again, if you already have a Grok plan, this is one to check out. I don't think this is a model that's going to take out a Nano Banana 2, but it is impressive. I think a lot of people would prefer this to the current ChatGPT image gen. It's definitely got some realism and rawness to it: very believable, but at the same time it can also sort of do infographics. All right, guys, I'll cap things off there. There's always more to talk about, but I hope you enjoyed today's news roundup.
This big AI race feels like it has no brakes sometimes, but it's also so exciting, because new possibilities are opening up all the time, and I can see myself inching closer and closer toward those goals or ideas: being able to create your own stories and bring them to life in full movie-quality fruition, creating your own games, creating custom applications. The technology is getting stronger, and I think 2026 is a year where companies have to prove themselves. They're digging their heels in; we're expecting big things, and we're getting good updates and good upgrades. We'll see what the rest of the year holds in store for us. Subscribe if you like the video. Thanks so much for watching, and I'll see you in the next one. Goodbye.