
ThursdAI - The top AI news from the past week

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, as we discuss everything major and important that happened in the world of AI for the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion aspects and much more.

sub.thursdai.news
🔥 ThursdAI - Feb 15, 2024 - OpenAI changes the Video Game, Google changes the Context game, and other AI news from past week

Holy SH*T,

These two words have been said on this episode multiple times, way more than ever before I want to say, and it's because we got two incredibly exciting breaking news announcements in a very, very short amount of time (in the span of 3 hours), and the OpenAI announcement came as we were recording the space, so you'll get to hear our live reaction to this insanity.

We also had 3 deep dives, which I am posting in this week's episode. We chatted with Yi Tay and Max Bain from Reka, who trained and released a few new foundational multimodal models this week, and with Dome and Pablo from Stability, who released a new diffusion model called Stable Cascade. And finally, I had a great time hanging with Swyx (from Latent Space), finally got a chance to turn the microphone back at him, and had a conversation about Swyx's background, Latent Space, and AI Engineer.

I was also very happy to be in SF today of all days, as my day is not over yet: there's still an event which we co-host together with A16Z, folks from Nous Research, Ollama and a bunch of other great folks, just look at all these logos! Open Source FTW 👏

TL;DR of all topics covered:

* Breaking AI News

* 🔥 OpenAI releases SORA - text to video generation (Sora Blogpost with examples)

* 🔥 Google teases Gemini 1.5 with a whopping 1 MILLION tokens context window (X, Blog)

* Open Source LLMs

* Nvidia releases Chat With RTX local models (Blog, Download)

* Cohere open sources Aya 101 - a 12.8B model supporting 101 languages (X, HuggingFace)

* Nomic releases Nomic Embed 1.5 + with Matryoshka embeddings (X)

* Big CO LLMs + APIs

* Andrej Karpathy leaves OpenAI (Announcement)

* OpenAI adds memory to ChatGPT (X)

* This week's Buzz (What I learned at WandB this week)

* We launched a new course with Hamel Husain on enterprise model management (Course)

* Vision & Video

* Reka releases Reka-Flash, 21B & Reka Edge MM models (Blog, Demo)

* Voice & Audio

* WhisperKit runs on WatchOS now! (X)

* AI Art & Diffusion & 3D

* Stability releases Stable Cascade - new AI model based on Würstchen v3 (Blog, Demo)

* Tools & Others

* Goody2ai - A very good and aligned AI that does NOT want to break the rules (try it)

🔥 Let's start with Breaking News (in the order of how they happened)

Google teases Gemini 1.5 with a whopping 1M context window

This morning, Jeff Dean released a thread full of crazy multimodal examples of their new Gemini 1.5 model, which can handle up to 1M tokens in the context window. The closest any model has come so far was Claude 2.1 at 200K, and that was not multimodal. They also claim they are researching up to 10M tokens in the context window.

The thread was chock full of great examples, some of which highlighted the multimodality of this incredible model, like being able to pinpoint and give a timestamp of an exact moment in an hour-long movie, just by getting a sketch as input. This honestly blew me away. They were able to use the incredibly large context window, break the WHOLE 1-hour movie down into frames, provide additional text tokens on top of it, and the model had near-perfect recall.

They used Greg Kamradt's needle-in-the-haystack analysis on text, video and audio and showed incredible, near-perfect recall, which highlights how much advancement we've made in the area of context windows. Just for reference, less than a year ago we had the chart from Mosaic when they released MPT, whose Y axis topped out around 60K; the graph above goes to 1 MILLION, and we're less than a year apart. Not only that, Gemini Pro 1.5 is also multimodal.
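For those curious what that needle-in-a-haystack test actually measures, here's a minimal sketch of the idea: bury a fact at different depths of an increasingly long context and check whether the model can recall it. The `call_model` function is a hypothetical placeholder for whatever model or API you're testing; Greg Kamradt's real harness is more involved.

```python
# Hypothetical stand-in for an actual model/API call.
def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

FILLER = "The grass is green. The sky is blue. "           # haystack text (8 words per repeat)
NEEDLE = "The magic number Alex is thinking of is 7481."   # fact to retrieve
QUESTION = "What is the magic number Alex is thinking of?"

def needle_test(context_words: int, depth: float) -> bool:
    """Bury the needle at `depth` (0.0 = start, 1.0 = end) of a context of
    roughly `context_words` words and check whether the model recalls it."""
    haystack = (FILLER * (context_words // 8 + 1)).split()[:context_words]
    haystack.insert(int(len(haystack) * depth), NEEDLE)
    prompt = " ".join(haystack) + "\n\n" + QUESTION
    return "7481" in call_model(prompt)

# Sweeping context length x needle depth gives you the recall heatmap, e.g.:
# for words in (1_000, 10_000, 100_000):
#     for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
#         print(words, depth, needle_test(words, depth))
```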

I've got to give props to the Gemini team, this is quite a huge leap for them, and for the rest of the industry this is a significant jump in what users will expect going forward! No longer will we be told "hey, your context is too long" 🤞

A friend of the pod, Enrico Shippole, joined the stage (you may remember him from our deep dive into extending the Llama context window to 128K) and showed that a bunch of new research makes all this possible for open source as well, so we're waiting for OSS to catch up to the big G.

I will sum up with this, Google is the big dog here, they invented transformers, they worked on this for a long time, and it's amazing to see them show up like this, like they used to do, and blow us away! Kudos 👏

OpenAI teases SORA - a new giant leap in text to video generation

You know what? I will not write any analysis, I will just post a link to the blogpost and upload some videos that the fine folks at OpenAI just started releasing out of the blue.

You can see a ton more videos on Sam Altman's Twitter and on the official SORA website.

Honestly I was so impressed with all of them, that I downloaded a bunch and edited them all into the trailer for the show!

Open Source LLMs

Nvidia releases Chat With RTX

Chat With Notes, Documents, and Video

Using a Gradio interface and packing two local models, Nvidia released a bundle of packaged open source AI, including RAG and even chat over YouTube transcriptions!

Chat with RTX supports various file formats, including text, pdf, doc/docx, and xml. Simply point the application at the folder containing your files and it'll load them into the library in a matter of seconds. Additionally, you can provide the URL of a YouTube playlist and the app will load the transcriptions of the videos in the playlist, enabling you to query the content they cover.

Chat for Developers

The Chat with RTX tech demo is built from the TensorRT-LLM RAG developer reference project available from GitHub. Developers can use that reference to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM.
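This isn't the actual TensorRT-LLM reference code, but as a rough sketch of what such a RAG app does under the hood (chunk your files, embed the chunks, retrieve by similarity, stuff the winners into the prompt), assuming the sentence-transformers package and a placeholder folder and `generate` function:

```python
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer  # assumes sentence-transformers is installed

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def load_chunks(folder: str, chunk_chars: int = 1000) -> list[str]:
    """Read every .txt file in a folder and split it into rough fixed-size chunks."""
    chunks: list[str] = []
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(errors="ignore")
        chunks += [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    return chunks

def retrieve(question: str, chunks: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are closest to the question's embedding."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                      # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(-scores)[:k]]

chunks = load_chunks("./my_notes")             # placeholder folder of .txt files
doc_vecs = embedder.encode(chunks, normalize_embeddings=True)

question = "What did I write about TensorRT?"
context = "\n---\n".join(retrieve(question, chunks, doc_vecs))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `generate(prompt)` would be whatever local LLM you run (e.g. one served via TensorRT-LLM).
```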

This week's Buzz (What I learned at WandB this week)

We just released a new course! Hamel Husain released a course on enterprise model management!

Course name: Enterprise Model Management
Course link: wandb.me/emm-course
Who is this for: The course is targeted at enterprise ML practitioners working with models: MLOps engineers, ML team leaders, ML engineers. It shows, at both a conceptual and a technical level, how to get the most value out of the W&B Model Registry and automations. Attached is also a screenshot of a slide from the course on what different personas (MLOps, ML exec etc.) get from the Model Registry.
What can they expect: Learn how to store, version, and evaluate models like top enterprise companies do today, using an LLM training & evaluation example. Big value props: improved compliance, collaboration, and disciplined model development.

Vision & Video

Reka releases Reka Flash and Reka Edge multimodal models

Reka, co-founded by Yi Tay (previously of DeepMind), trained and released two foundational multimodal models. I tried them and was blown away: they not only understand text and perform VERY well on metrics (73.5 MMLU / 65.2 on HumanEval), but also boast incredible (honestly, never before seen by me) multimodal capabilities, including understanding video!

Here's a thread of me getting my head continuously blown away by the quality of the tonality of this multimodality (sorry...😅)

I uploaded a bunch of video examples and was blown away: it understands tonality (the "dive dive Diiiiive" example), understands scene boundaries, and does incredible OCR across scenes (the Jason/Alex speaker-names example).

AI Art & Diffusion

Stable Cascade (link)

Stability AI introduced a new text-to-image generation model called Stable Cascade that uses a three-stage approach to produce high-quality images with a compressed latent space, making it more efficient to train and use than previous models. It achieved better results than other models in evaluations while having faster inference speeds. The company released code to train, fine-tune, and use control models like inpainting with Stable Cascade to enable further customization and experimentation. Stability AI aims to lower barriers to AI development through models like this one.
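If you want to try it yourself, the diffusers integration looks roughly like the sketch below: a prior pipeline generates the compressed latents (stage C) and a decoder pipeline expands them into the final image (stages B and A). Treat the class names, checkpoints, and arguments here as assumptions recalled from the model card rather than a verified recipe.

```python
import torch
# Assumption: a recent diffusers release that ships the Stable Cascade pipelines.
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prompt = "an astronaut riding a horse, photorealistic"

# Stage C ("prior") works in the heavily compressed latent space...
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
# ...stages B/A ("decoder") turn those latents into the final image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prior_out = prior(prompt=prompt, num_inference_steps=20, guidance_scale=4.0)
image = decoder(
    image_embeddings=prior_out.image_embeddings.to(torch.float16),
    prompt=prompt,
    num_inference_steps=10,
    guidance_scale=0.0,
).images[0]
image.save("cascade.png")
```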

Nate did a comparison between a much slower SDXL and Stable Cascade here:

Here’s the transcript for the whole episode, you definitely should check it out! It was really one of the coolest shows we had, and we had over 2K folks listening in!

[00:00:00] Alex Volkov: Hey, this is Alex Volkov, you're on ThursdAI, and I just gotta record this intro real quick, because today marks one of the more singular days in AI that I remember since I started recording ThursdAIs, which was itself a singular day, March 14th, 11 months ago, when GPT 4 was released and announced. We since then had a few days like this GPT Dev Day was one such day, and today marks another one.

[00:00:38] Alex Volkov: Google has released an update to their model, talking about 1 million tokens in the context window, basically unlimited. And then, just an hour or two later, OpenAI said, you know what, we also have something in store, and released the most incredible jump in capability of video generation, text to video generation.

[00:01:02] Alex Volkov: It's called SORA, and what you hear is us recording live, knowing only about Google, which came out an hour and a half before we started recording, and then somewhere in the middle, I think minute 35 or something, you'll hear our live reaction to the Incredibly mind blowing advancement in text to video that OpenAI just released.

[00:01:31] Alex Volkov: And I just wanted to record this as I'm finishing up the editing and about to start writing the newsletter, to say, days like this really are the reason why I'm all in on AI and I'm very excited about the changes and advancements.

[00:01:49] Alex Volkov: And I'm sure there will be more days like this going forward. We've yet to see what Apple came up with, we've yet to really see what Meta comes up with Llama 3, etc. And, yeah, I just wish you enjoyed this and I don't have a lot of words here besides just letting you listen to the rest of the episode and say that I was very happy to be in San Francisco for this, the place where most of this happens, and I was very happy to be in company of good friends, both in the virtual world those on stage in our Twitter live recording, and I was sitting across from Swyx, a friend of mine with whom I recorded an interview at the end of this, you can hear.

[00:02:30] Alex Volkov: I just couldn't let go of this chance. Besides the updates and the breaking news, we also had conversations with the folks who worked on some of the stuff we talked about. I interviewed Yi Tay and Max Bain from Reka, which you'll hear later, and the deep dive into Reka's multimodal models, which blew me away just yesterday.

[00:02:52] Alex Volkov: And so my head kept getting blown away this week. And I also interviewed The folks who built Stable Cascade, a new stability model that outperforms the existing stability models. Dome, and Pablo. And all of those were great conversations, in addition to just generally the folks who joined me from week to week, Nisten and Far El and Alignment Lab, and we had Robert Scoble join us, with whom I've been buddying up since Vision Pro was released, as he was expecting, and that blew me away just a week ago.

[00:03:23] Alex Volkov: And I'm very excited to share with you this whole thing, and I hope that Yeah, I hope you enjoyed this as much as I do, and I hope that you enjoyed listening to these as much as I enjoy making them. And if you are, just share them with a friend, it would really help. And give us a 5 star review on Apple.

[00:03:38] Alex Volkov: This would greatly help. With that, I'll give you the ThursdAI thing.

[00:03:43] Alex Volkov: All right, let's go. How's it going, everyone? Welcome to ThursdAI. Today is February 15th, and it's quite a day in the AI updates that we've had so far. Quite a day. Even today, this morning, we had like a bunch of updates. But besides those, we had quite a crazy week as well very interesting show today, very interesting show today.

[00:04:13] Alex Volkov: My name is Alex Volkov, I'm an AI evangelist with Weights & Biases, and right now I'm getting my selfie taken by today's co-host, Swyx. Welcome,

[00:04:23] Swyx: Hey, hey, hey. Good morning, everyone.

[00:04:25] Alex Volkov: And we're in the Latent Space Studio in San Francisco. I flew in just last night. And as I was flying in, there was more news happening. So we're going to cover all of this.

[00:04:34] Alex Volkov: We have a very exciting show today. We have a bunch of guests, special guests that are coming on the second hour of this. So hopefully we'll see folks from the RECA models, and hopefully we'll see some folks from Stability. We're going to get to talk about Google and everything in between. So meanwhile, settle in.

[00:04:50] Alex Volkov: This is going to be a great show today in San Francisco. And maybe I'll also probably share with you why I flew in here today. That's gonna come up next. So welcome to ThursdAI and we're gonna get started. All right there. Let's get started. Let me smoothly fade out the music, say hi to everyone here on stage. Hey, Nisten, welcome. We have Robert Scoble over here, folks. We've been, we've been more, more friendly lately than usual because Robert and I are both members of the Vision Pro cult. I think that's what you call it, Robert.

[00:05:37] Alex Volkov: But today is, today's the space for, for AI. But Robert you've been covering AI on your feed as well for, for a long time. We have, obviously Swyx is on stage, but also in front of me, which is super cool. And it's been a while, brother. It's great, you just flew back from

[00:05:51] Swyx: Singapore.

[00:05:52] Swyx: Yeah, Chinese New Year.

[00:05:53] Alex Volkov: Are you jet lagged at all or are you good?

[00:05:55] Swyx: I'm good actually. I have had very little sleep, but for some reason that always helps with the jet lag.

[00:06:00] Alex Volkov: Yes, awesome. And I also want to say hi to Alignment Labs, Austin and Far El as well, folks who are working on open source models, and we usually cover a bunch of stuff that they're doing, and usual co hosts and experts here on ThursdAI.

[00:06:11] Alex Volkov: So if you never joined ThursdAI before, just a brief kind of recap of what we're doing. As I said before, my name is Alex Volkov. I'm an AI evangelist with Weights & Biases. It's always so fun to say. And Weights & Biases is a company that is basically helping all these companies build their AI models, and it's super cool.

[00:06:26] Alex Volkov: And I flew in, I went to the office last night, and I have some cool videos to share with you from the office as well.

[00:06:32] Alex Volkov: and this

[00:06:33] Alex Volkov: is ThursdAI. ThursdAI is a Twitter space and newsletter and podcast that I started a year ago. And then slowly this built a community of fine folks who show up to talk about everything that happened in the world of AI for the past week.

[00:06:46] Alex Volkov: And there hasn't been many weeks like this last week that highlight how important and how cool ThursdAI actually is. Because we just had So much, so much to cover today and usually I start the space with a roundup of the stuff that we're going to run through just for folks who are not patient, don't have a lot of time and we're going to just run through everything we're going to talk about and then we're going to dive deep because we have some breaking news and I even have, hopefully, I have my breaking news button.

[00:07:16] Alex Volkov: Oh, I don't. Oh my God. Okay.

[00:07:17] Swyx: Oh no.

[00:07:17] Alex Volkov: I'm not set up for a breaking news button, but it's fine.

[00:07:20] Alex Volkov: We'll imagine this. I'm going to put this in the, in the, in the post edit. With that said, are you guys ready for a brief recap? Let's go to a brief recap.

[00:07:27] Recap and TL;DR

[00:07:27] Alex Volkov: Alright, folks, back for the recap. Today is Thursday. ThursdAI, February 15th. This is a recap of everything we talked about. And, ooh, boy, this was one of the worst days to be caught outside of my own personal production studio because my, my breaking news button didn't make it all the way here. And there was so much breaking news.

[00:07:57] Alex Volkov: So obviously as I woke up, the biggest breaking news of today was... actually, I cannot decide what was the biggest breaking news. So the first piece of breaking news from today was Google releasing a teaser of Gemini 1.5. And 1.5 was not only a continuation of the Gemini Pro that we got last week, 1.5 was actually teased with up to 1 million, a whopping 1 million tokens in the context window, which is incredible.

[00:08:23] Alex Volkov: It's just for comparison: ChatGPT is currently at 128K, and the best, highest offering up until Gemini was 200K with Anthropic's Claude, and Google teased this out of the gate with 1 million tokens, and they claim they have up to 10 million tokens of context window in the demos, which is incredible.

[00:08:44] Alex Volkov: And they've shown a bunch of demos. They did the needle in the haystack analysis that we've talked about from Greg Kamradt, and it's just quite an incredible release from them. They talked about how you can put in a whole, like, hour-long movie of Buster Keaton, and then you can actually ask questions about the movie and it'll give you the exact

[00:09:03] Alex Volkov: timestamp of when something happens. They talked about it being multimodal, where you can provide a sketch and say, hey, when did this scene happen, and it will pull it out, just like, incredibly, like magic, mind blowing, mind blowing stuff. And all of this needs a lot of context, because you take this video, you turn it into images, you send this into context.

[00:09:22] Alex Volkov: They also talked about how you can send 10 hours of audio within one prompt, and then some, and the quality of retrieval is very, very high. You're talking about like 90 plus percent, 95 plus percent on the haystack, which is incredible. Again, we had Enrico Shippole, a friend of the pod who worked on the YaRN paper and the RoPE methods for extending the Llama context.

[00:09:46] Alex Volkov: And he brought like four papers or something that show that open source is actually unlocking this ability as well. And not only was today an incredible day just generally: not only did Google talk about a large context window, we also saw that Nat Friedman and Daniel Gross just invested 100 million in a company called Magic, and they also talk about multimodality and a large context window up to 1 million as well.

[00:10:08] Alex Volkov: So it was very interesting to see both of them release on the same day as well. We then geeked out about Gemini. We talked about Andrej Karpathy leaving OpenAI and invited him to come to ThursdAI and Latent Space as well. And then we also mentioned that OpenAI adds memory and personalization to ChatGPT, which is super cool.

[00:10:25] Alex Volkov: They didn't release it to many people yet, but personalization is my personal thread of 2024, because these models, especially with the larger context window and perfect recall, will become our buddies that remember everything about us, especially tied into different devices.

[00:10:43] Alex Volkov: Like the Tab that's somewhere here behind me that is getting built in San Francisco. We, we briefly mentioned that NVIDIA released the Chat with RTX local models that you can download and run on your NVIDIA GPUs. It has RAG built in, it has chat with YouTube videos, and it's super cool. We talked about Cohere releasing Aya 101, a big multilingual model.

[00:11:01] Alex Volkov: And our friend of the pod Far El was talking about how he wasn't finding it super impressive. Unfortunately, he dropped in the middle of this. Apologies, Far El, but Cohere released a big multilingual model, which is also pretty cool. We mentioned that Nomic, our friends at Nomic, which we mentioned last week, released open source embeddings.

[00:11:17] Alex Volkov: If you guys remember, they released an update to those embeddings, Nomic Embed 1.5, with Matryoshka embeddings. Matryoshka is obviously the name of the Russian doll where each one sits inside another. And we're going to actually talk with the authors of the Matryoshka paper not next Thursday, but the one after that.

[00:11:34] Alex Volkov: So we're going to cover Matryoshka but it's what OpenAI apparently used, not apparently, confirmed they used to reduce dimensions in the API for embeddings. Super cool. We're going to dive deep into this. As we're going to learn, I'm going to learn, you're going to learn. It's going to be super cool.
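A quick sketch of what that dimension reduction looks like in practice with Matryoshka-style embeddings: because the important information is packed into the leading dimensions, you can just keep the first k dimensions and renormalize. The numbers below are illustrative, not OpenAI's actual implementation.

```python
import numpy as np

def shrink(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep only the first `dims` dimensions of a Matryoshka-trained embedding
    and renormalize, trading a little accuracy for much cheaper storage/search."""
    truncated = embedding[:dims]
    return truncated / np.linalg.norm(truncated)

full = np.random.randn(768)          # stand-in for a real embedding vector
full /= np.linalg.norm(full)

small = shrink(full, 256)            # 3x smaller, still usable for cosine similarity
print(small.shape, float(small @ small))   # (256,) 1.0
```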

[00:11:48] Alex Volkov: And as we're talking about OpenAI I got a ping on my phone because I'm subscribed to all updates from their main account and we had a collective holy s**t moment. Everybody's jaw was on the floor because OpenAI just released Sora, which is a foundational video model, text to video model, that just blew us the F away, pardon my French, because of the consistency.

[00:12:08] Alex Volkov: So if you've seen, how should I say, the area of video generation has been evolving fairly quickly, but not as quick as what we just saw. First we saw attempts at taking Stable Diffusion and rendering frame by frame, and the consistency wasn't there. It was moving from one frame to another, like the face would change and everything.

[00:12:30] Alex Volkov: You guys saw this, right? So we moved from the hallucinatory kind of videos towards consistent videos, where Stability recently released and gave us SVD, which was like one to three second videos. Runway ML gives you the option to choose where the video is going to go, if it's going to zoom in, like brushes, all these things.

[00:12:49] Alex Volkov: And now all of them seem just so futile, because OpenAI's Sora can generate up to 60 seconds of a video. And honestly, we were sitting here, all of us just opened the Sora website, and we were just blown away by the consistency and the complexity of the scenes that you can generate, the reflections.

[00:13:06] Alex Volkov: There was one scene where a woman was walking through a very busy street in Japan, and her coat stays the same, her face stays the same. There's another where a Dalmatian dog climbs out of one window and jumps into another. All the spots on the Dalmatian stay perfectly in place, the legs are... it's really unbelievable how high quality of a thing OpenAI released. And what's unbelievable to me also is that the jump from what we saw in video before, the open source stuff, or even the Runway stuff and Pika stuff, the jump in fidelity, in quality, in consistency, is so much higher than the jump from like 200,000 tokens to 1 million tokens that Google did.

[00:13:44] Alex Volkov: So it does feel like some folks in OpenAI sat there and said, hey, Google just released something, it's super cool, it's picking up attention on Twitter, let's release something else that we have behind the scenes. It looked super polished. So shout out to the folks who worked on Sora. If you haven't seen the videos, you'll see them in the show notes, and definitely you'll see them everywhere, because Hollywood is about to get seriously, seriously disrupted; the level of quality is amazing.

[00:14:08] Alex Volkov: Compare this with all the vision and, and, and sound stuff. I, moving back to the recap, I'm getting excited again. We also then talked about Reka and Reka Flash and Reka Edge from a company called Reka AI. And then, as I love bringing the people who actually built the thing to talk about the thing.

[00:14:23] Alex Volkov: So we had Yi Tay and we had Max as well from Reka to talk to us about their multimodal models. I was very, very impressed with Reka's multimodal understanding. And I think this model, compared to Gemini Pro, which is probably huge and runs on all the GPUs and TPUs, this model is 21 billion, and Reka Edge is even smaller.

[00:14:41] Alex Volkov: And yet it was able to understand my videos to an extent that even surprised the guys who were the co-founders of the company. It understood tonality, understood text and audio in a very specific and interesting way. So we had a conversation with the Reka folks, and continuing on this thread, we also had a new model from Stability called Stable Cascade that is significantly faster than SDXL and generates hands and text out of the blue.

[00:15:07] Alex Volkov: It's based on something called Würstchen, which we learned today basically means hot dog. And we had the folks that worked on this, Dome, and I'm blanking on the name of the other author that joined. I apologize. It was a very exciting day. So we had a conversation with the guys behind Würstchen and Stable Cascade as well.

[00:15:24] Alex Volkov: So definitely check this out. We mentioned that WhisperKit runs now on watchOS, which is quite incredible because Siri's voice to text is still not that great. And I think that's mostly what we discussed. And then I flipped the mic on my, on my friend here that sits in front of me, and I just had a deep dive interview with Swyx.

[00:15:41] Alex Volkov: In the latent space, he just posted a few images as well, and it was a great conversation as well, so definitely worth a follow and a listen if you haven't listened to this. With that, I think we recap ThursdAI on one of the more seminal days that I remember in the AI one after another, and we all hope that, Meta will just release Llama 3

[00:16:01] Investment updates from Swyx

[00:16:01] Alex Volkov: Unless I missed some stuff that's very important. I'll just double check. Nisten, out of the stuff that we've sent, did I miss anything else? Swyx, did I miss anything else?

[00:16:10] Swyx: Today there was also a LangChain Series A. True. With LangSmith.

[00:16:13] Swyx: Yes. There was Magic.dev, Series A with Nat Friedman.

[00:16:16] Alex Volkov: So I was thinking to cover this around the Google stuff because they also announced a longer context craziness.

[00:16:21] Alex Volkov: But definitely, definitely both of those.

[00:16:23] Swyx: Lambda Labs also, a 300 million Series C.

[00:16:26] Alex Volkov: Oh, wow, yeah, I even commented. I said, hey, Mitesh good. So we love Lambda, definitely. Most of the stuff that we play around with is happening in Lambda. And

[00:16:34] Swyx: Lindy also had their GA launch today.

[00:16:37] Alex Volkov: nice. Okay. Today

[00:16:38] Swyx: Today was a very bad day to launch things, because everyone else launched

[00:16:41] Swyx: things.

[00:16:41] Swyx: Yes. If you're not Gemini, it's going to be a struggle

[00:16:44] Alex Volkov: I was just thinking, magic.dev, and I guess let's move to just discussing kind of the breaking news of the hour. Let's talk about Google, and Gemini 1.5.

[00:16:55] Google teases Gemini Pro 1.5 with 1M context windows

[00:16:55] Alex Volkov: Do we do a musical transition? Sure, let's do a musical News. This is not the Breaking News music. By not even a stretch, this is not a Breaking News music. But, imagine that we have Breaking News right now, because we do. Just an hour or so ago, we had an update from Jeff Dean and then Sundar Pichai and then a blog post and then a whole thread and a bunch of videos from Google.

[00:17:27] Alex Volkov: And if you guys remember some Google videos from before, these seem more authentic than the kind of quote-unquote fake video that we got previously with Gemini Ultra. So just a week after Google released Gemini Ultra, which is now available as, aka, Gemini Advanced, and just a week after they killed Bard almost entirely as a concept, they're now teasing.

[00:17:48] Alex Volkov: Teasing, did not release, teasing Gemini 1.5. They're teasing it and they're coming out with a bang. Something that honestly, folks, at least for me, that's how I expect Google to show up. Unlike before, where they were lagging after GPT 4 by eight months or nine months, what they're doing now is leading a category, or at least they're claiming they are.

[00:18:07] Alex Volkov: And so they released Gemini 1.5, and they're teasing this with a whopping 1 million tokens in the context window in production and up to 10 million tokens in the context window in research. And just to give context, they put up this nice animated video where they show Gemini Pro, which they have currently, not 1.5.

[00:18:26] Alex Volkov: The Pro version is around 32K, I think, and then they have GPT 4 with 128K, and then they show Claude 2 at 200K, and then Gemini 1.5 is a whopping 1 million tokens, which is ridiculous. Not only that, they also went a little bit further and released it with the needle-in-a-haystack analysis from our friend Greg Kamradt, who usually does this.

[00:18:50] Alex Volkov: We'll not be able to pronounce his name. I asked Greg to join us. Maybe he will. A needle in a haystack analysis that analyzes the ability of the model to recall whether or not it's able to actually process all these tokens and actually get them and understand what happens there. And quite surprisingly, they show like 99 percent recall, which is incredible.

[00:19:10] Alex Volkov: And we all know, previously in long context windows, we had this dip in the middle. We've talked about the butter on toast analogy, where the context or attention is like the butter and the context window is the toast, and you spread and you don't have enough for the whole toast to spread evenly.

[00:19:27] Alex Volkov: We've talked about this. It doesn't seem, at least on the face of it, that they are suffering from this problem. And that's quite exciting. It is exciting because also this model is multi modal, which is very important to talk about. They definitely show audio and they are able to scrub through, I said, they said, I think they said 10 hours of audio or so.

[00:19:47] Alex Volkov: Which is quite incredible. Imagine this: feeding in 10 hours of audio and saying, hey, when did Alex talk about Gemini in ThursdAI? That would be super dope and quite incredible. They also did video. They showed an hour of video of Buster Keaton's something, and because the model is multimodal, the cool thing they did is that they provided this model with a reference, with a sketch.

[00:20:11] Alex Volkov: So they drew a sketch of something that happened during this video, not even talking about this, just like a sketch. And they provided this multimodal with an image of this and said, when did this happen in the video? And it found the right timestamp. And so I'm very, very excited about this. If you can't hear from my voice, Swyx can probably tell you that it looks like I'm excited as well, because it's, it's quite.

[00:20:31] Alex Volkov: As far as I'm considering a breakthrough for multiple reasons. And now we're gonna have a short discussion.

[00:20:35] Enrico talking about open source alternatives to long context

[00:20:35] Alex Volkov: I want to say hi to Enrico here. Enrico, welcome up on stage. Enrico Shippole, one of the authors of the YaRN paper. And we've had Enrico before talk to us about long context. Enrico, as we sent this news in DMs, you replied that there have been some breakthroughs lately that kind of point to this.

[00:20:51] Alex Volkov: And you want to come up and say hi and introduce us briefly. And let's chat about the long context.

[00:20:57] Enrico Shippole: Hi, Alex. Yeah, so there actually have been a lot of research improvements within the last couple months, even from before we submitted YaRN. You could still scale even transformers to millions of, essentially, context length back then. We previously in YaRN worked on scaling the rotary embeddings, which was a traditional issue in long context.

[00:21:19] Enrico Shippole: So I, if you don't mind, I'll probably go through some of the research really quickly because unfortunately,

[00:21:25] NA: so on January 2nd, there was one called, it's called LLM, maybe long LLM. That's a mouthful essentially, but they were showing that you can process these long input sequences during inference using something called self extend, which it allows you to basically manage the context window without even fine tuning these models.

[00:21:48] NA: And then on January 7th, 2024, there was another paper that released, it's called Soaring from 4k to 400k, which allows you to extend like the LLM's context with something called an activation beacon. With these activation beacons, they essentially condense raw activation functions in these models to a very like compact form, which essentially the large language model can perceive this longer context.

[00:22:14] NA: Even in a smaller context window, the great thing about these activation beacons or the LLM, maybe long LLM, is essentially they only take a few lines of code to modify the transformer architecture and get all these massive performance benefits for long context inference.
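A toy illustration of the grouped-position idea behind these training-free extensions, as I understand it: tokens near the end of the sequence keep their exact relative positions, while older tokens share coarser, grouped positions so everything stays inside the position range the model saw during training. The window and group sizes here are made-up numbers, and this is a sketch of the concept, not the papers' actual implementation.

```python
def remap_positions(seq_len: int, neighbor_window: int = 512, group_size: int = 8) -> list[int]:
    """Return a remapped relative position (distance from the last token) for each token."""
    positions = []
    for i in range(seq_len):
        distance = seq_len - 1 - i                      # how far this token is from the end
        if distance < neighbor_window:
            positions.append(distance)                   # recent tokens: exact positions
        else:
            grouped = neighbor_window + (distance - neighbor_window) // group_size
            positions.append(grouped)                    # older tokens: shared, coarser positions
    return positions

# A 4096-token prompt is squeezed into a much smaller effective position range:
print(max(remap_positions(4096)))   # 959 instead of 4095
```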

[00:22:33] Alex Volkov: Are you serious? Are we getting one of those breakthroughs that take two lines of code, kind...

[00:22:37] NA: No, so basically all of these require minimal code changes to even be able to scale to, to long, like, token counts, whether it's audio, video, image, or text. Text is generally, like, the shortest token count: if you look at something like RefinedWeb or SlimPajama, the, the average token count of a piece of text in that is only anywhere from 300 to 500 tokens.

[00:23:02] NA: So this is actually generally a data centric issue too, when you're talking about long context with even training a standard natural language processing model. The thing about audio and video is, is these have a ton of tokens in them. And the one good thing, and then? the final note, I'm, I'm going to put in, unfortunately, before I have to head out, I know this was a lot of information.

[00:23:22] NA: I can link these

[00:23:24] Alex Volkov: Yeah, we're gonna add some, some of this, we're gonna add some, some links, the links that I'd be able to find, Enrique, if you can send

[00:23:29] NA: Yeah, I'll, I'll send you all the research papers.

[00:23:32] Alex Volkov: Yeah, you want to lend one last thing before we move on? Yeah, go ahead.

[00:23:36] NA: Yeah, So, just the last thing on January 13th is there was this paper called Extending LLM's Context Window with only a hundred samples and they were essentially able to show that even in a very limited amount of long context samples, you're able to massively improve the context lengths of these models. I should mention these are the papers that I found did pretty rigorous evaluation overall, because a lot of them, there's a huge problem in long context evaluation. But I feel these authors generally applied their knowledge pretty well, and these results are really impactful. so, even for the open source community, because you don't need a lot of computational power to be able to scale these context windows massively now.

[00:24:24] NA: And that's basically everything I wanted to say.

[00:24:27] Alex Volkov: Thank you, Enrico. Thank you, folks. Folks, definitely give Enrico a follow. And we have quite a few conversations with Enrico. If somebody in the open source community knows about long context, Enrico is that guy. And we're definitely going to follow up with the links in the show notes for a bunch of this research.

[00:24:41] Alex Volkov: And I think just to sum up, Enrico, there have been breakthroughs, and it doesn't look like Google is the only folks who come up today. Nat Friedman and Daniel Gross, the guys who have AI Grant, they have the Vesuvius Challenge recently, and invest in everything AI possibly. They just announced an investment in Magic, a hundred million dollars investment, quote unquote.

[00:25:00] Alex Volkov: We were so impressed with these guys when we decided to give them a hundred million dollars, from Nat Friedman, and they also talk about a model that does something like 10 million token context windows. Swyx, you wanna, you wanna talk about the magic thing?

[00:25:12] Swyx: They first talked about this last year, like six months ago, and then went completely silent. So we didn't really know what was going on with them. So it's good to see that this is at least real because six months ago they were talking about 5 million token context model.

[00:25:28] Swyx: But no, nothing was demoed. Not even like a little teaser graphic or anything like that. But for Nat to have invested in this amount, I think it's a huge vote of confidence. And it basically promises that you can do proper codebase embedding and reasoning over an entire codebase. Which, it's funny to have a code model that specially does this, because Gemini could also potentially do this.

[00:25:58] Alex Volkov: They showed in their examples 3JS. Did you see this?

[00:26:01] Swyx: No, I didn't see the 3JS, but okay, yeah. And we have a pretty consistent result from what we've seen so far that GPT 4 is simultaneously the best LLM, but also the best code model. There's a lot of open source code models, CodeLlama, DeepSeek Coder, all these things.

[00:26:18] Swyx: They're not as good as GPT 4. So I think there's a general intelligence lesson to be learned here. It remains to be seen, because Magic did not release any other details today, whether or not it can actually do better than just a general purpose Gemini.

[00:26:34] Alex Volkov: Yeah, and so the example that they showed is actually they took 3JS, if you folks know the 3JS library from Mr.doob, and they embedded all of this in the context window and then asked questions, and it was able to understand all of it, including finding things in an incredibly huge codebase. And I think I want to just move this conversation.

[00:26:52] Alex Volkov: Yeah, Nisten, go ahead. I see you, I see you unmuting. And folks on the stage, feel free to raise your hands if if you want to chime in. We'll hopefully get to some of you, but we have a bunch of stuff to chat about as well.

[00:27:01] Nisten Tahiraj: I'll just quickly say that there are still some drawbacks to these systems. And by systems, the long context models where you dump in a whole code base or entire components in. And the drawbacks, even from the demos, still seem to be that, yes, now they do look like they're much better at reading and intaking the information, but they're not yet much better at outputting similar length output. So they're still gonna only output, I think, up to 8,000 tokens or so, and I don't know if that's a byproduct of the training, or they could be trained to re-output much longer, much longer context.

[00:27:43] Nisten Tahiraj: However, the benefit now is that unlike Retrieval augmentation system, unlike a RAG the, the drawback with a RAG was that yes, it could search over the document, but it would only find maybe two or three or a couple of points and bring them up. Whereas this one is more holistic understanding of the, of the entire input that you've dumped in.

[00:28:03] Nisten Tahiraj: But again, we're not quite there yet where they can just output a whole textbook. That's, that's what I mean. So that's the thing. That's the next challenge to solve.

[00:28:12] Alex Volkov: So I think, I think the, the immediate reaction that I had is very similar to what you had, Nisten. RAG is something everybody uses right now. And we've talked about long context versus, versus something like a RAG before, and the usual conversation we have is usually about cost. How much does it cost you pair these tokens, right?

[00:28:30] Alex Volkov: If you send 10 million tokens and each token is like a cent, you're basically paying 10 million cents for every back and forth. Also speed and, and user experience. If your users are sitting there and waiting for 45, 60 seconds because they sent a bunch of contacts, if you can solve this with RAG, then RAG is probably a better approach for you.

[00:28:48] Alex Volkov: However, however, this specifically looks like... at least from the examples that Google did, they showed the video transparently, they sped up the inference, but I saw something where, with at least the video question, it took them around 40 seconds to extract a frame of a video of an hour. They sent an hour worth of context of a video within this thing, and it took them 40 seconds for this inference.

[00:29:13] Alex Volkov: Folks, like I said before, and I'm going to say this again, regular ChatGPT, not even with crazy context, queries took me sometimes 40 seconds. Now, you may say, okay, Alex, they show the demo in their environment, and ChatGPT is in a production environment. Yes, but the possibility is, if I can send, I don't know, 500,000 tokens in the context window, and then within 40 seconds get a response which is equivalent to what I get from GPT 4.

[00:29:38] Alex Volkov: Then I think that a bunch of the conversation about RAG being better just from a speed of inference perspective are slowing down. An additional thing I want to say before I get to you, Yam, just a second the immediate response in my head was, okay, RAG is done for, or at least not done for, but definitely the kind of the crown on RAG's head.

[00:29:56] Alex Volkov: Everybody's talking about RAG. There's vector databases everywhere. We just had folks talk about ColBERT and different things. RAG is, okay, RAG is now shaky. But the other thing I started to think is, is fine tuning also under risk? And Swyx, I think this goes back to what you just said about, like, the general models versus maybe the finetuned or very specific models, because if a general model can take a whole book, and they had an example about this where there was a very low resource language, Kalamathi, Kalabathi, something like this, and there's only one book that's a dictionary for this language, they literally threw the book in the context window, and the model was able, from in-context learning, to generalize and understand this and perform better than fine tuned models.

[00:30:37] Alex Volkov: And I'm thinking here okay, rag is the first thing to go. Is fine tuned second? Are we going to stop fine tuning and sending contexts? So Swyx, I want to hear your reaction about, about the language thing and then we're going to get to Yam and then we're going to ask some more folks.

[00:30:48] Discussion about effects of longer context windows

[00:30:48] Swyx: Yeah, I think there's generalizable insights about learning about language. And it's not surprising that throwing that into the context window works, especially if it's a cognate language of something that it already knows. So then you're just learning substitutions, and don't forget that transformers are initially trained to do language translation, like this is like bread and butter stuff for transformers.

[00:31:12] Swyx: The second thing I would respond to is, I have to keep saying and banging this drum, long context does not kill RAG because of cost. Imagine if every time you throw 10 million tokens of context in there, you have to pay like a thousand dollars. Because unless something fundamentally is very, very different about this paradigm, you still pay to ingest those tokens of cost.

[00:31:39] Swyx: So ultimately, people will still want to RAG for cost and then for attribution reasons, like debuggability attribution, which is something that's still valuable. So I think long context is something that I have historically quite underweighted for these reasons. I'm looking to change those assumptions, of course, because obviously these are magical capabilities if you can use them.

[00:32:10] Far El: Yeah, I just want to say, on the topic of, of latency and ingesting a lot of context, I think that there is a solution that we didn't talk about here, and it will be something that is going to be incorporated in all the flagship models, which is embedding knowledge into the KV cache, which is something that many of the inference engines today can do.

[00:32:34] Far El: And you simply just prefix the context beforehand, and then you don't need to process it through your model. So you're not sending the whole database each time you are calling your model. It's just saved. Imagine that OpenAI has some sort of API where you embed the KV cache beforehand, and it's reduced price, of course, and then it uses that as, as your context.

[00:32:59] Far El: Basically, somewhere in the middle between the two. And the reason that it's not supported now in flagship models, because the first flagship model that supports a million tokens came out today. But I think that if we see this this, if we go there, this is something that we're going to see in all of the APIs.

[00:33:18] Far El: Moreover, I also don't think that RAG is done for, because RAG is explaining to you very, very clearly and very simply where the information is coming from, what the model is basing itself on. You can claim that the model with the attention you can do it as well, but it's not like RAG. RAG, you're just showing the clients, the people, exactly where it comes from.

[00:33:40] Far El: And there are use cases where this is absolutely a must. So I think that there will always be room for RAG for these specific use cases, and long context with KV caching is going to be, I think, the method for embedding, for example, a full database, or a book, or something big, and using it multiple times with many different prompts.
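The prefix caching being described can be sketched with today's Hugging Face transformers: run the big, reusable context through the model once, keep its `past_key_values`, and then answer each new question by feeding only the new tokens plus that cache. The model and prompts below are just placeholders for the idea.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                   # placeholder small model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

document = "ThursdAI is a weekly show about AI news. " * 20   # stand-in for a big, reusable context
question = "\nQ: What is ThursdAI about?\nA:"

with torch.no_grad():
    # 1) Pay the ingestion cost once: run the shared prefix and keep its KV cache.
    prefix_ids = tok(document, return_tensors="pt").input_ids
    cache = model(prefix_ids, use_cache=True).past_key_values

    # 2) For each new question, feed only the new tokens on top of the cached prefix.
    q_ids = tok(question, return_tensors="pt").input_ids
    logits = model(q_ids, past_key_values=cache, use_cache=True).logits
    print(tok.decode(logits[:, -1].argmax(-1)))               # next-token prediction
```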

[00:34:06] Alex Volkov: Or also multimodality, right? So thank you for this. Definitely, definitely makes sense. And I think somebody in the comments also left a similar comment as well. So we want to dive into the KV cache stuff maybe in the next one. But I want to talk about the multimodality part of this because, um, we've, we've mentioned it multiple times.

[00:34:25] Alex Volkov: I think we did this every ThursdAI since GPT 4 launched, because we were waiting for the vision part of GPT 4. And we've talked about 2024 being the year of multimodal. And we're going to have to talk about a bunch of multimodal stuff today, specifically with the Reka folks and the Reka Flash, which understands videos.

[00:34:40] Alex Volkov: So I'm going to have to see whether Reka understands videos better than Gemini, but the Gemini folks talked specifically about a bunch of multimodal effects on the context window, where if you send videos, at least the way they did this, it was just frames. They broke this movie down into a bunch of frames, 500,000 or so, and just sent it in the context window.

[00:35:04] Alex Volkov: And I think context window expansions like this will just allow for incredible multimodal use cases, not only video and audio. They talked about, and we've talked about previously with the folks from Prophetic, different fMRI and EEG signals that are getting multimodal-like applications as well, and context window enlargement for these things Google specifically highlighted.

[00:35:32] Alex Volkov: And I want to highlight this as well because it's definitely coming. I'm waiting for being able to live stream video, for example. And I know some folks from like 12 Labs are talking about almost live live stream embedding. So definitely multimodal from Google. I think, folks, we've been at this for 30 minutes.

[00:35:48] Andrej Karpathy leaves OpenAI

[00:35:48] Alex Volkov: Alright, so folks, I think we're going to move on and talk about the next couple of things that we've already covered to an extent, but there's some news from OpenAI, specifically around Andrej Karpathy leaving, and this was announced, I think it broke in The Information, and Karpathy, some folks here call him senpai, Karpathy is a very, very legit, I don't know, top 10, top 5, whatever, researcher, and could potentially have been listening to the space that we had with LDJ after he left. Or, yeah, I think it was clear that he left; The Information kind of announcement didn't have a bunch of stuff, but then Andrej just, as the transparent dude he is, came and said, hey, this wasn't a reaction to anything specific that happened, because speculations were flying.

[00:36:33] Alex Volkov: And I think at least, at least to some extent, we were in charge of some of these speculations because we did a whole space about this that he could have just listened to. But as speculation was flying, maybe this was Ilya related, maybe this was open source related, like all of these things.

[00:36:46] Alex Volkov: Andrej basically helped start OpenAI, then left and helped kickstart the Tesla Autopilot program, scaled that to 1500, then left. On the chat with Lex Fridman, Andrej said that basically he wanted to go back to hands-on coding, and in OpenAI his bio at least said that he's working on a kind of Jarvis within OpenAI, and definitely Andrej has been also talking about the AI as an OS. Swyx, you wanna, you wanna cover, like, his OS approach?

[00:37:14] Alex Volkov: I think you talked about this. He had a whole outline, I think you

[00:37:17] Swyx: also talked about this. LLM OS.

[00:37:18] Swyx: Yeah. He wasn't working on it so much as thinking about it.

[00:37:21] Swyx: Thinking about it, yeah. And maybe now that he's independent, he might think about it. The main thing I will offer as actual alpha rather than speculation is I did speak to friends at OpenAI who reassured us that it really was nothing negative at OpenAI when he left.

[00:37:40] Swyx: Apparently because they spoke to him before he left.

[00:37:43] Swyx: So yeah, he's for the way I described it is he's following his own internal North Star and every time he does that the rest of us

[00:37:51] Alex Volkov: And definitely the rest of us win.

[00:37:53] Alex Volkov: the open source community is hoping, or I've seen many, many multiple things that say, hey, Andre will unite like the, the, the bands of open source, the different bands of open source.

[00:38:02] Alex Volkov: Andrej posted this thing on his X, where, like, his calendar was just free, which shows maybe part of the rationale why he left, because meetings and meetings and meetings and everything, and now he can actually work. So shout out to Andrej Karpathy for all he did at OpenAI and for all he's going to continue to do.

[00:38:16] Alex Volkov: We're going to definitely keep up to date with the stuff that he releases. Andrej, if you're listening to this, you're more than welcome to join. We're here on every Thursday. You don't have to have a calendar meeting for this. You can hop on the space and just join. Also on the topic of OpenAI, they've added memory to ChatGPT, which is super cool.

[00:38:31] Alex Volkov: They released a teaser, this, I didn't get into the beta, so they released it to a limited amount of people. They added memory to ChatGPT, and memory is very, very cool, the way they added this as well. So I've said for a long time that 2024 is not only about multimodality, that's obviously going to come, but also it's about time we have personalization.

[00:38:51] Alex Volkov: I'm getting tired of opening a ChatGPT chat and having to remember to say the same things; it doesn't remember the stuff that I previously said. The folks in OpenAI are working on the differentiator, the moat, and different other things, especially now where Google is coming after them with the 10 million context window tokens.

[00:39:08] Alex Volkov: And, they're now adding memory, where ChatGPT itself, like the model, will manage memory for you, and will try to figure out, oh, OpenAI, oh my god, breaking news. OpenAI just shared something. As I'm talking about them, you guys want to see this? Literally, I got a

[00:39:28] Alex Volkov: notification from OpenAI as I'm talking about this.

[00:39:30] Swyx: What?

[00:39:32] Alex Volkov: Let's look at this. I, dude, I needed my, my breaking news button today. OpenAI said: introducing Sora, our text to video model. Sora can create videos for up to 60 seconds.

[00:39:44] Alex Volkov: Holy s**t, this looks incredible. Oh my god, somebody please pin this to the, to the, Nisten, you have to see, there's a video, 60 second video, folks.

[00:39:54] Alex Volkov: Like, all of the, oh my god, breaking, I have to put the breaking news button here, holy s**t. So folks, just to describe what I'm seeing, because somebody please pin this to the top of the space: every video model we had so far, every video model that we had so far does 3 to 4 seconds. Pika, the other labs, I forgot their name now, Runway, all of these models,

[00:40:16] Swyx: Oh my god, Runway.

[00:40:18] Alex Volkov: They do three to five seconds and it looks wonky. This thing that they show generates a 60 second video featuring highly detailed scenes, and the video that they've shared, which I'm going to repost and somebody already put up on the space, has folks walking hand in hand through a... there's a zoomed in, like behind the scenes camera zooming in.

[00:40:39] Alex Volkov: There's a couple Consistent I cannot believe this is January. Holy s**t The consistency is crazy. Nothing changes. You know how like previously video would jump frames and faces and things would shift

[00:40:52] Alex Volkov: Wow, okay, so I guess we should probably talk about this. Reactions from folks. I saw LDJ wanted to come up to see the reaction I'm

[00:41:00] Far El: just wild. Honestly, it looks crazy. It looks really good quality. Better than most text to video models that I've seen.

[00:41:08] Alex Volkov: Holy s**t okay, so I'm scrolling through the page, folks,

[00:41:13] Alex Volkov: those who are listening, openai.com/sora. Sora is their, like, text to video... I'm seeing a video of a model walking through, like, a Japan street, whatever. The prompt is: a stylish woman walks down a Tokyo street filled with warm glowing neon animated city signage, she wears a black leather jacket, long red dress, and black boots. And the consistency here is insane.

[00:41:35] Alex Volkov: I do

[00:41:35] Far El: Check out the mammoths. Or actually go on their website, on the Sora page on OpenAI's website. They've got a few examples. It's crazy. It's crazy. I've never seen a

[00:41:48] Alex Volkov: the if you showed me this yesterday, Far El, if you showed me this yesterday and said this is generated, I would not believe you. So what happens is, now the same video of this woman walking, they have a video camera zooming in, into her eyeglasses, her face stays the same, the same consistency, you can see reflection in the, in the sunglasses.

[00:42:08] Far El: Alex, you have to go on the website. There's like this video of, oh, like literally the prompt is reflections in the window of a train traveling through the Tokyo suburbs. And honestly, it looks, it looks like someone captured this, no way this is AI generated. It's, it's crazy

[00:42:27] Alex Volkov: Wow,

[00:42:27] Alex Volkov: folks. What's the availability of this? Let's, let's see, what do we know? So we know safety. We'll be taking several important safety steps ahead of making SORA available on OpenAI's products, so it's not available yet. Working with Red Teamers, they don't want this to be used in deepfakes for porn, obviously.

[00:42:43] Alex Volkov: That's like the first thing that the waifus are going to use it for. The C2PA metadata that, if you guys remember, we've talked about that they started including in DALL-E, they're going to probably include this as well. And new techniques prepared for deployment, leveraging the existing safety methods.

[00:42:56] Alex Volkov: Okay research techniques.

[00:42:58] Far El: Crazy.

[00:43:00] Alex Volkov: Consistency is crazy, right folks?

[00:43:02] Swyx: Yeah, it's not available, it looks like. Not available yet.

[00:43:04] Swyx: To answer your question, they released some details about it being a diffusion model. They also talked about it having links to DALL·E 3, in the sense that... honestly, I don't know if people know that there was a DALL·E 3 paper, which is very, very rare in this age of not-closed,

[00:43:22] Swyx: Not open AI.

[00:43:23] Alex Volkov: Yeah, not

[00:43:24] Swyx: open AI.

[00:43:24] Swyx: And so they're doing this synthetic data captioning thing for the DALL·E 3 model, and they're referencing the same method for Sora. I would just go read the DALL·E 3 paper.

[00:43:37] Alex Volkov: Wow. I, I, the consistency has been the biggest kind of problem with these models. LDJ.

[00:43:41] Alex Volkov: Go ahead, please. As I'm reading this and reacting, and, and my mind is literally blown, the demo of the doggy. Hold on, LDJ, one second. There's a demo. There's a video of the dog, like, walking from one window and jumping to another window, and the paws, they look like it's a real video. Like folks, it literally does not look generated, unlike anything we've seen before.

[00:44:02] Far El: This is going to disrupt Hollywood immediately. We're talking about text to video disrupting media content creation and so on. This is it, this is like the Midjourney moment of, of text to video, that same feeling that we had when we were able to prompt Midjourney and get some really high quality images. This is the same but for video, essentially.

[00:44:23] Alex Volkov: This, this breaks reality for me right now. Literally I'm watching this video multiple times. I cannot believe that the dog's paws are not shaping in different shapes. The spots on this Dalmatian dog stay in the same place throughout the video. It, it don't make sense. Alright, LDJ, go. I think, I think,

[00:44:37] Far El: Yeah so

[00:44:38] Far El: Sam here, I'll post it on the, on the ding board. Sam said that that certain select creators have access now. And, oh, I just lost the tweet. I'll, I'll get it. But yeah, he says that some creators already have access and I guess they're going to slowly expand it out to like beta users or whatever.

[00:44:59] Alex Volkov: Wow, so Sam asked for some prompts: we can show you what Sora can do, please reply with captions for videos you'd like to see and we'll start making some.

[00:45:06] Alex Volkov: So

[00:45:06] Swyx: Oh yeah, basically give him some really complicated prompt, and let's, let's go, let's go.

[00:45:12] Alex Volkov: A bunch of podcasters sitting, watching Sora and reacting in real time and their heads are blown.

[00:45:17] Alex Volkov: Not literally, because this is insane. How's that for a prompt? I'm gonna post it. Hopefully some will get it.

[00:45:25] NA: Just opening a portal through Twitter, through OpenAI to the Munich and then string

[00:45:31] Alex Volkov: Oh, there's, there's also, I don't wanna spend the rest of ThursdAI on this, 'cause we still have a bunch to talk about, folks.

[00:45:38] Alex Volkov: If anybody's not scrolling through the examples right now, you definitely should. There's an example of a

[00:45:43] Swyx: there's only nine examples.

[00:45:45] Alex Volkov: What, what

[00:45:45] Far El: This is insane.

[00:45:46] Alex Volkov: No, the whole website has a bunch of, scroll down.

[00:45:48] Alex Volkov: There's like every, every kind of example has

[00:45:51] Alex Volkov: more scrollies. So I'm looking at an example of a chameleon, which has a bunch of spots, and guys, the spots are in the same place. What the f**k? They don't move. It does not look generated, honestly. Let's do this. Everybody send this to your mom and say, Hey mom, is this AI generated?

[00:46:07] Alex Volkov: Or not? Like older folks will not believe this s**t, like

[00:46:10] Swyx: I, I will

[00:46:13] Far El: What's the most impressive

[00:46:14] Swyx: compare this to Google

[00:46:15] Far El: right? Like humans,

[00:46:17] Swyx: don't know, I think you guys

[00:46:18] Alex Volkov: hold on. Far El, I think, I think we're talking over each other. Give us one sec. Swyx and then Far El.

[00:46:22] Swyx: Oh, sorry, yeah, there's a bit of a lag. Oh, no, nothing. Just compare this to Google Lumiere where they release a bunch of sample videos as well.

[00:46:29] Swyx: But you could, the, the, I was impressed by the consistency of the Lumiere demo videos. They would, they demoed sort of pouring syrup onto a pancake and then infilling the syrup and showing that, it would be pretty realistic in pouring all that syrup stuff. Didn't really see that kind of very technical test here.

[00:46:49] Swyx: But the resolution of these videos and the consistency of some of these movements between frames, and the ability to cut from scene to scene is way better. Instantly way better. I was thinking that Lumiere was, like, state of the art a few weeks ago, and now it is completely replaced by Sora.

[00:47:08] Swyx: This is a way better demo. I think OpenAI is showing Google how to ship.

[00:47:12] Alex Volkov: OpenAI decided to say, you know what, Google, you think you can one-up us with the context window?

[00:47:18] Alex Volkov: We got another thing coming, because I've

[00:47:20] Swyx: just pull up the Lumiere page, and then pull up the Sora page, and just look at them side by side, and you can see how much better they

[00:47:26] Alex Volkov: Lumiere was mind blowing as well. Go ahead, Far El. Go ahead, because we're still reacting in real time to this whole thing. Ridiculously impressive.

[00:47:32] Far El: Yeah, I was just saying that the the most impressive thing are, is like how alive these video shots feel, right? Humans talking action scenes like, all the text to video models that I've seen so far and I've used were very very simplistic, right? It felt like more like you're animating an image to do very minor movements.

[00:47:55] Far El: It wasn't actually alive in any way, but Sora's text to videos is, is nuts, the quality, the consistency, the action, like the actual action of the characters. I wonder how much like granular control do you have on a scene to scene basis. I know that Google released like a paper I think a few months back where they had a basically like a script that allowed the, like for much more long form.

[00:48:27] Far El: video content, but I'm not sure if that's the case here. It's just, it's just really impressive. It's, it's really impressive.

[00:48:35] Alex Volkov: I want to say, one of our friends, LaChanze, just sent, at the bottom of the page, it says, Sora serves as a foundation model that can understand and simulate the real world. I can't, it's really hard for me to even internalize what I'm reading right now, because the simulation of the real world, it triggers something in me, tingles the simulation hypothesis type of thing, and this can regenerate the map of the world and then zoom in and then generate all the videos.

[00:48:58] Alex Volkov: And I'm wearing this mixed, slash, augmented, slash, spatial reality headset that just generates, and this happens on the fly, and what am I actually watching here? So this says Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.

[00:49:15] Alex Volkov: Yeah. Alright, folks. I will say, let's do two more minutes, cause this is... I can't believe we got both of them the same day today, holy s**t. We got the 10 million token context window announcement from Google, which is incredible, multimodal as well. My whole thing itches right now to take the videos that OpenAI generated and shove them into, into Gemini to understand what it sees, and see if it understands. It probably will.

[00:49:40] Alex Volkov: Wow.

[00:49:40] Far El: Thing that would make this Thursday a tiny bit even more awesome is if Meta comes out with telemetry. Too much, too much, too much.

[00:49:51] Alex Volkov: It's

[00:49:51] Alex Volkov: gonna be too much. We need, we need a second to like, breathe. Yeah, definitely folks. This is literally like a singular day. Again, we've [00:50:00] had a few of those. We had one on March 14th when ThursdAI started, OpenAI released GPT-4, Anthropic released Claude, I think on the same day. We had another one when OpenAI Dev Day came about, and I think there's a bunch of other stuff.

[00:50:12] Alex Volkov: I consider this to be another monumental day. We got Gemini 1.5 with a potential 10 million context window, including incredible results in understanding multimodality in video, up to an hour of video. And then we also have some folks from Reka that are gonna come up soon and talk about their stuff, which is, with all due respect to the Reka folks, this news seems bigger, but they still launched something super, super cool we're gonna chat about. And now we're getting, it's just, the distance, we're used to jumps, we're used to state of the art every week, we're used to this, we're used to this model beats this model by a finetune, whatever, we're used to the OpenAI leaderboard, this is

[00:50:53] Alex Volkov: such a big jump on top of everything we saw.

[00:50:55] Alex Volkov: From Stable Video Diffusion. From, what are they called again? I just said their name, Runway. I always forget their name.

[00:51:02] Swyx: Poor guys.

[00:51:04] Alex Volkov: Poor Runway. From Pica Labs. From folks who are generating videos. This is just such a huge jump in capability. They're talking about 60 seconds.

[00:51:14] Alex Volkov: Oh, Meta just announced JEPA. Yeah, I don't know if JEPA is enough. People are commenting about JEPA, and I'm like, okay wait, hold

[00:51:21] Swyx: You, you spiked my heart rate when you said Meta just announced. I was like, what the f**k?

[00:51:25] Alex Volkov: the f**k? Meta literally just came out with an announcement, V-JEPA, self-supervised learning for videos.

[00:51:29] Alex Volkov: But, folks, unless they come out with Llama 3 and it's multimodal and it's available right now, Meta is not participating in the

[00:51:35] Swyx: thing

[00:51:36] Alex Volkov: day

[00:51:36] Far El: Oh wait, this is actually cool. So this is this is something,

[00:51:39] Far El: actually a paper they came out with like about a month ago, but this is for video understanding. So this is pretty much like for input of video, while OpenAI's model is for output of video.

[00:51:51] Alex Volkov: It just, I will say it's a research thing, right? So they're not showing anything there, unless I'm mistaken. Um So, I kinda, so I still have a bunch of stuff to give you updates for, and I still have a bunch of interviews as well, there's a new stability model, but I'm still like, blown away, and I just wanna sit here and watch the videos,

[00:52:07] Alex Volkov: Is this what Ilya saw? Yeah, somebody reacted like, what did Ilya see? Did Ilya see a generated video and the model understanding this and that's why, that's why?

[00:52:16] Far El: No, I think, I think, I think AGI has been achieved internally at

[00:52:21] Far El: this rate.

[00:52:22] Alex Volkov: Wow. I, I'm, I'm still blown away. Like, if a model can generate this level of detail, very soon, I just wanna play with this. I wish, I wish we had some time to, to, to... I wish I was one of the artists with access, and I hope that somebody in the audience here is, and that they will come to talk about this on ThursdAI.

[00:52:43] Alex Volkov: I, and because I'm, yeah, I'm still mind blown. So I see quite a few folks that I invited that I wanna, I wanna welcome to the stage. V-JEPA understands the world while Sora generates one. That's the comment that some folks left. And okay, okay, V-JEPA is going to be something we definitely cover, because Meta released this and Meta are the GOATs. Even though, yeah, no, Meta's definitely GOATs. I'm just a little bit lost for words right now.

[00:53:06] Nisten Tahiraj: Yeah, so if people have watched a lot of speeches from Yann LeCun, the, the main idea is that these AI models are not very good at understanding the world around them or thinking in 3D. So in some ways, you could reason that a cat is a lot more intelligent, even if it was blind and it couldn't smell, it could still figure out where to go and find its litter box, stuff like that.

[00:53:30] Nisten Tahiraj: This is one part that's missing from the world model that they get purely just from word relationships or word vectors. And so this is a step in that direction, it seems. Again, I haven't read the paper, so I'm Half making stuff up here but it feels like this is a step in, in that direction towards AI models that understand what's going on like us and animals do.

[00:53:56] Nisten Tahiraj: So that, that's the main, the gist of it for, the audience.

[00:54:04] Alex Volkov: Oh, what, what a, what a Thursday. What a Thursday. I gotta wonder how I'm gonna summarize this, all of this. And I just wanna invite, we have, here in the audience, and I sent you a request to join. If you didn't get it, make sure that you're looking at requests and then accept. And then we should have, we should have Max as well at some point.

[00:54:20] Alex Volkov: Lemme text Max, 'cause we have guest speakers here from, from Reka that we wanna chat with. Meanwhile, I'm gonna continue and, and move forward in some of the conversations. Let's roll back. Okay, while we're still super excited and I can't wait for this to come out, this is an announcement that they did.

[00:54:35] Alex Volkov: It's very polished. We haven't seen we didn't see any access or anything about when it's going to come out. I do feel that this is a breakthrough moment. from Google and from OpenAI. And it does look like it's reactionary to an extent. The folks in OpenAI were sitting on this and saying, Hey, what's a good time to release this?

[00:54:52] Alex Volkov: And, actually, now, let's steal some thunder from Google and their 10 million thing that also not many people can use, and let's show whatever we have that not many people can use. Which, which is an interesting thing to think about, because, again, the pressure is on a bunch of other labs, on Meta, to release something. We know Llama 3 is coming at some point, will it be multimodal, will it be able to generate some stuff every

[00:55:16] NA: Really, really quick, sorry to interrupt

[00:55:18] Alex Volkov: Go

[00:55:19] NA: The thing V-JEPA seems to be good at is understanding video instructions. I guess you could point the camera to something you're doing with your hands, arts and crafts things, or repairing something, and it understands what you're doing. So that, that's actually very

[00:55:36] NA: powerful for the datasets, datasets of skills that will come, because then you can generate actions. I, I think that, that will apply a lot to robotics, what they're doing.

[00:55:48] Alex Volkov: Oh, alright, yeah. And they also have the Ego4D datasets of robotics as well, and they've talked about this.

[00:55:55] Nvidia releases Chat with RTX

[00:55:55] Alex Volkov: So let's go to open source like super quick. NVIDIA released Chat with RTX for local models. And it's actually like very, very cool. So a few things about Chat with RTX. First of all, NVIDIA packed a few, a few models for you. It's a 38 gigabytes or something download. And they, they have quite a few, I think they have two models packed in there.

[00:56:16] Alex Volkov: I wasn't sure which ones. And this, this is basically a, a package you download, I don't know if it's a Docker or not, that runs on any desktop PC with an RTX 30 or 40 series card with at least 8 gigabytes of VRAM. And it gives you a chatbot that's fully local. And we love talking about open source and local stuff as well.

[00:56:33] Alex Volkov: And not only that, they give you RAG built in. So you can actually run this on some of the documents that you have. They also have something that runs through YouTube, you can give it like a YouTube playlist or a video link, and it will, it will have you talk to the YouTube video. So it has built-in RAG, built-in TensorRT-LLM, which runs on their, on their stuff, RTX acceleration, and

[00:56:56] Alex Volkov: I think it's pretty cool, like it works only on the very specific types of devices, only for like gamers or folks who run these things but I think it's pretty cool that that folks are, that NVIDIA is releasing this. They also have something for developers as well to be able to build on top of this.

[00:57:11] Alex Volkov: And I think the last thing I'll say about this is that it's a Gradio interface, which is really funny to me, that people are shipping Gradio interfaces in production. It's super cool.
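
For listeners who want a feel for what a local document chatbot like this does under the hood, here's a minimal, hypothetical sketch of retrieval-augmented generation in Python. It is not NVIDIA's implementation (Chat with RTX uses TensorRT-LLM and its own pipeline); the embedding model name and the example documents are illustrative assumptions.

```python
# Minimal local RAG sketch (illustrative only, not NVIDIA's Chat with RTX code).
# Assumes: `pip install sentence-transformers numpy` and some local LLM to answer.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Chat with RTX runs a local chatbot on RTX 30/40 series GPUs.",
    "TensorRT-LLM accelerates LLM inference on NVIDIA hardware.",
    "You can point the app at a folder of documents or a YouTube playlist.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [documents[i] for i in np.argsort(-scores)[:k]]

query = "What hardware does Chat with RTX need?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # feed this prompt to any local LLM (e.g. via llama.cpp or TensorRT-LLM)
```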

[00:57:18] Cohere releases Aya 101 12.8B LLM with 101 language understanding

[00:57:18] Alex Volkov: Cohere releases an open source model called Aya 101, a model that's like 12.8 billion parameters, with understanding of 101 languages, from Cohere. It's, it's honestly pretty cool, because Cohere has been doing a bunch of stuff. Aya outperforms the Bloom model and mT0 on a wide variety of automatic evaluations despite covering double the number of languages.

[00:57:41] Alex Volkov: And what's interesting as well, they released a dataset together with Aya, and then, what is interesting here? Yeah, just, oh, Apache 2 license, which is super cool as well. Apache 2 license for, for this model. Let me invite Yi as a co-host, maybe this can, join. Far El, go ahead.

[00:57:58] Alex Volkov: Did you see, do you want to talk about Aya?

[00:58:00] Far El: Yeah, first off, I, I appreciate and commend Cohere for building a multilingual open source dataset and so on. That's awesome. We need more of that. But unfortunately, with the first few questions that I asked in Arabic specifically, most of the answers were complete [00:58:20] nonsense on their trained model.

[00:58:23] Far El: Yeah. And to, to the point that it's it's laughable, right? For instance in Arabic, I asked who was the who was the first nation that

[00:58:32] NA: had astronauts on the moon. I

[00:58:38] Alex Volkov: Yes.

[00:58:39] NA: think, I think you cut out for a sec.

[00:58:43] Alex Volkov: I think he dropped. I don't see him anymore.

[00:58:45] NA: He might have

[00:58:46] NA: His phone might have

[00:58:47] Alex Volkov: yeah, we're gonna have to

[00:58:48] NA: I can briefly

[00:58:50] NA: comment on it. Yeah, we're pretty happy now that also Cohere has started contributing,

[00:58:56] NA: to open source, because datasets are very important. And yeah, I think the reason it wasn't performing so well in other languages is just because for some languages there wasn't enough data for it to be, to be trained on.

[00:59:12] NA: But the beautiful thing is that it is Apache 2.0. You can just add your own language's dataset and it will literally make the whole thing better. And yeah, that's, those are my comments on it.
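
If you want to poke at Aya yourself the way Far El did, a minimal sketch with the Hugging Face transformers library looks roughly like the following. It assumes the checkpoint is published as CohereForAI/aya-101 and loads as a T5-style seq2seq model; treat the model ID and generation settings as assumptions and check the model card for the canonical usage.

```python
# Hypothetical sketch: querying Aya 101 via transformers.
# Assumes the checkpoint "CohereForAI/aya-101" exists and is T5-style (seq2seq).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "CohereForAI/aya-101"  # assumption: the actual Hub ID may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

# Ask the same kind of multilingual question Far El tried (here in Arabic:
# "Which was the first nation to land astronauts on the moon?").
prompt = "من كانت أول دولة تهبط برواد فضاء على سطح القمر؟"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```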

[00:59:22] Interview with Yi Tay and Max Bain from Reka AI

[00:59:22] Alex Volkov: Awesome. All right, folks. So now we're moving into the interview stage, and we have quite a few folks. As one of the most favorite things that I want to do in ThursdAI, and it's been an hour since we've been here, is to actually talk with the folks who released the stuff that we're talking about.

[00:59:35] Alex Volkov: So the next thing I'm going to announce, and then we're going to talk with Yi Tay and Max, and then after that we're going to talk with Dom as well. Earlier this week, a company named Reka AI released two models, or at least released a demo of two models, right? I don't think the API is available yet.

[00:59:51] Alex Volkov: We're going to talk about this as well. Called Reka Flash and Reka Edge. And Reka Flash and Reka Edge are both multimodal models that understand text, understand video, understand audio as well, which is like very surprising to me as well. And I had a thread where I just geeked out and my head was blown to the level of understanding of multimodality.

[01:00:09] Alex Volkov: And I think some of the folks here had, had had talked about... sorry, let me reset. Some of the folks here on stage have worked on these multimodal models. And so with this, I want to introduce Yi Tay and Max Bain. Please feel free to unmute and introduce yourself briefly, and then we're going to talk about some Reka stuff.

[01:00:25] Alex Volkov: Yi first maybe and then Max.

[01:00:27] Yi Tay: Yeah, thanks thanks Alex for inviting me here. Can people hear me actually?

[01:00:31] Alex Volkov: Yeah, we can hear you

[01:00:32] Yi Tay: okay, great, great. Because this is the first, hey, this is the first time using a space, so yeah, trying to figure out how to use it. But thanks for the invite, Alex, and so I'll just introduce myself. I'm Yi Tay, and I'm one of the co-founders of Reka AI.

[01:00:45] Yi Tay: We're like a new startup in the LLM space. We train multimodal models. Previously I worked at Google Brain, working on Flan, stuff like that. So yeah, that's just a short introduction about myself. And maybe Max, do you want to introduce yourself? Yeah,

[01:00:59] Alex Volkov: Yeah, Max, go ahead, please.

[01:01:00] Max Bain: thanks Yi. Yeah.

[01:01:01] Max Bain: Thanks Alex for having me. So yeah, as you said, yeah, I'm part of Reka. So I joined more recently, like six months ago. I just finished my PhD, and that was all in video, audio, speech understanding. I've done a bit of work in open source, so if you use WhisperX, that was something I worked on, and yeah, now working as part of Reka and really enjoying it.

[01:01:22] Max Bain: yeah, that's pretty much

[01:01:23] Alex Volkov: First of all, let me just say, thank you for WhisperX, I did use this, and it was awesome, and I think this is how we connected before, or at least, to some extent, I think this is the reason maybe I follow you. I was really surprised that you were at Reka. Let's talk about the models that you guys just released, because it's very impressive on the multimodality part, but also very impressive on just the regular comparative benchmarks. I think you guys released the comparisons to just regular MMLU scores, so Reka Flash gets 73.5

[01:01:52] Alex Volkov: on MMLU and 65 on HumanEval, and GPT-4 is at 67, at least, and Gemini Ultra, they claim, is 74, but your guys' model is like significantly smaller. What can you tell us about, and I know you said before there's like a bunch of stuff that you won't be able to talk about, what can you tell us about the performance just on the textual kind of comparison, even though this is a multimodal model and there's a bunch more that we will talk about?

[01:02:17] Yi Tay: Yeah, thanks. So I'll just, I can't really say that much, but I can say that there's quite a lot of headroom in pre-training just for language alone, and I think that we're still not near the headroom yet for pre-training. And I think even for us, actually, we have a better version of Reka Flash internally right now, but we've not even published metrics for that, because while we were preparing for the launch we actually got an even better model.

[01:02:39] Yi Tay: So I think actually there's still quite a lot of headroom for pushing that and there's quite a lot of things to do in pre training but I can't really wouldn't be able to say much about? About like more details, yeah.

[01:02:48] Alex Volkov: About specifics. I did see the comments that you left in your thread, where you talked about the folks who do foundational models from scratch, there's a lot of hard work they have to do in the process as well, and it looks like at least some of this amount of hard work you guys had to go through in order to train these foundational models.

[01:03:09] Alex Volkov: So let's talk about the multimodality, what what can this model do? And I think I have a

[01:03:15] Alex Volkov: good idea, but can you talk to us on the multimodal part? What can those models do in terms of multimodality?

[01:03:23] Max Bain: Yeah, so in terms of multimodal, yeah, you can use it actually on chat.reka.ai, and I would say the image understanding's pretty good. So people have noticed it can recognize text pretty well. Yeah, more nuanced details, which tended to be a big issue with VLMs, like they used to be quite biased or they'd hallucinate a lot.

[01:03:41] Max Bain: I think in Reka Flash we noticed that dropped a lot. So I think image understanding is, I'd say, pretty on par with Gemini Pro or a bit better. But yeah, that's up to the jury. The video understanding's also pretty good. We limit it to a one minute input. We do have internally like better things, we're bounded by how much we can run for free. So, yeah, I'd say, yeah, overall pretty good video understanding and image. We haven't focused too much on audio right now, but that's definitely on the, on the roadmap.

[01:04:14] Alex Volkov: I did run into the audio stuff, and I ran a few videos through the demo, and folks definitely should check out the demo. I'll add this in the show notes, and hopefully some folks will add this to the space as well. I just started uploading like short clips, and it's great to hear that you're saying, you guys are limited, you're limiting on the demo, but you can, if I'm hearing correctly, you can The model can understand longer videos as well.

[01:04:39] Alex Volkov: So I uploaded a video of a trip that I took to Hawaii and there's a submarine there and somebody was narrating in the submarine and he yelled something like, there, there, there's the submarine goes, dive, dive, dive, something like this. Very excitedly. And the model really understood this, and actually it said, the commenter said, Dive, dive, dive, like this, with a bunch of I's in it.

[01:05:00] Alex Volkov: And to me, this was like the, the holy s**t moment. I uploaded this video. The narrator for this video was very excited. I did not expect the model to actually pick up on the excitement. And, It was very surprising to me because if you use something like Whisper and you just extract the audio from the, from the video, you would not get this result.

[01:05:20] Alex Volkov: You would not get like the, the excitement in this person's voice. And while we try to get max back in, could you, so could you mention stuff about audio? Do you train this specifically for audio as much as you can share, obviously. Or is it like a, a, a byproduct of, of just this model being multimodal and understanding and can listen as well?

[01:05:39] Yi Tay: Wait, so let me take a step back. Actually, thanks for sharing that example because I

[01:05:43] Yi Tay: actually had to watch your example to find that, that dive, dive, dive. I actually watched the entire video to find that, that clip. So I think it was a pretty Good clip. To be honest, it also surprised me that you found this example.

[01:05:56] Yi Tay: I, I think I was not also expecting this but I, we, we, we co trained this with many modalities. We are not sure, like, why this this specific case is like this. I think that's all I can say, but probably

[01:06:09] Yi Tay: yeah, next one

[01:06:09] Alex Volkov: I can definitely, definitely add one thing: this video was for sure not in your training dataset, because it was a private video of mine that didn't exist on the internet before. So it wasn't a result of this video being in a training set. Max, you rejoined. I hope you heard some of this question as well, attributed to you.

[01:06:26] Alex Volkov: Did you see this example? Did it catch you off guard as well? Do you see other examples like this that were very, very surprising in how this model performs?

[01:06:33] Max Bain: Yeah, I saw that. I was surprised. To be honest, one thing I've noticed is that video benchmarks are quite poor. So [01:06:40] we, in the question answering datasets, we don't really get a chance to see this, especially ones that use like the speech information and things like that. So I guess really, I'm glad you like tested it a lot.

[01:06:50] Max Bain: Cause yeah, like internally we maybe haven't had a chance to I think but it's the benefit of kind of, yeah, training everything from scratch and adding all the modalities

[01:06:58] Yi Tay: and yeah

[01:06:58] Alex Volkov: That's awesome. So I also want to talk about the fact that you guys released two models, and you talked about there being a bigger one. Let's talk about the Edge model. Can you talk about, are we going to be able to use this on device? I assume, what's the play here? At least from what you can say, what's the play in terms of using the smaller models?

[01:07:14] Alex Volkov: Obviously, smaller models, the benefit of them is using them closer on the edge and device, and that's how you named it. What's the, what's the thinking about releasing, these two models in different sizes? And and what's your plans for those?

[01:07:26] Yi Tay: Oh yeah, sounds good. Yeah, that's a great question. So for the Edge model, the 7B model, it's, I think it's at a size where it's possible to run it locally, but we are thinking also along the lines of, okay, it's actually faster, just for latency sensitive applications, sometimes you just need something slightly faster than the 21B model, and it's also cheaper to, to, to host for, for a lot of applications. So I think that's mainly one of the reasons for the 7B.

[01:07:55] Yi Tay: We also ran lots of ablations at low smaller scale. So this, this turns out to be just the size that we have. And I, I think it's mostly, mainly for latency sensitive stuff. And then like for people who are like for businesses and stuff, like they might just choose to deploy the smaller model if they don't like, need a larger models like the.

[01:08:13] Yi Tay: Flash or the, the core model. So I think that's really like the idea behind it. And then from the research point of view, or at least from the playground point of view, right? Like the, the demo point of view is that people get to, to, to, to get a sense of the view of the model at the seven B scale and the 21 B scale, right?

[01:08:28] Yi Tay: So there's kind some kind of you might be able to, to get a sense of like how this setup looks at the different scale. I think that's mainly like why we deployed two models in the background just so that people can play with. Two variants and the stuff. Actually not much thought here.

[01:08:42] Yi Tay: I mean it's not like super complicated, it just happened this way, but yeah, that's all I can say, yeah.

[01:08:48] Alex Volkov: Awesome. And so folks can go check out the demo. It looks like you guys are set up for API keys, as far as I understood. So will developers be able, be, be able to build with this? What stage are you in? I think you, you invited folks to a Discord or something. Could you talk about how we can play with these models, what we can do, and if there's any expected open source, because we'll have open source here on ThursdAI.

[01:09:08] Alex Volkov: If there's anything to talk about there as well, please, please feel free to, to tell us how to actually try these models beyond the demo. Build with them.

[01:09:16] Yi Tay: Yeah, sounds, sounds good. So for the API, actually, we, we have our API system already working, and some people are already using it. We are rolling out access, coupled with the billing and everything, we're just making sure everything is running very well.

[01:09:29] Yi Tay: And then we will roll it out soon. So I think that's mainly the idea behind the slightly staged API release. Yeah, so that's for APIs. And then for open source, we, I'll just be candid here, we're not sure yet about whether we want to do it or we don't want to do it.

[01:09:44] Yi Tay: It's always a question we have, but we're not promising anything, but we're also not saying no yet. So it's a, it's a conversation we have very regularly about, about this kind of thing. So I, I, so yeah, that's currently the stance we have right now. But we are, we are

[01:09:55] Yi Tay: writing a we are writing a tech report it's not like a paper paper, but it's also not going to be that there'll, there'll be some details in the tech report, but not complete details, but some details.

[01:10:04] Yi Tay: But yeah, so I think that's mainly like the extent of like how we're thinking about things right now, yeah.

[01:10:09] Alex Volkov: Awesome. So first of all, I want to consider you guys friends of ThursdAI. Thanks for coming on the pod. And here, we definitely love open source. We talk about it all the time. And we're just like Champions of Open Source, so if you do release anything Open Source, you're welcome to come back as well. Yi and Max, we have Swyx here, I'm actually in Swyx's audience, so you can hear them from my microphone.

[01:10:29] Alex Volkov: And Swyx has a few follow up questions for Yi and Max as well, so Swyx, go ahead.

[01:10:32] Swyx: Oh, sure. Yeah. Hey I actually tried to set up a chat with you when I was in Singapore, but it didn't happen.

[01:10:39] Swyx: So sorry about that. But I actually wanted to just chat with you more about something that you hinted at in your announcement post. You talked about how much of the infra you had to rebuild at Reka. Everything, you said, everything from robust training infra, proper human evaluation pipelines, and proper RLHF setups.

[01:11:00] Swyx: I was wondering if you can just give us like a preview of What did you miss? What does Google have? And then what do you think like the industry could innovate on?

[01:11:09] Yi Tay: Okay. That's a very interesting question. I need to think about what I can say and what I cannot say. But so definitely, definitely I miss TPUs, credit to TPUs, and being like a, a Googler for all my professional life, definitely the infra was completely new to me, and then at Reka, we have a lot of people from GTM and, and Google and Alphabet in general, I think a lot of us, I feel, feel the same way. And then, I think in terms of infra, I think GPU tooling is not as robust as, at least, what I experienced with TPU infra back at, at, at Google. So I think that's mainly the first thing, the robustness of the, the training, the, the, the accelerators themselves, right? And then also even things like file I/O is something that people take for granted. At Google, the file systems, XManager, Borg and stuff, orchestrators and stuff like that are, like, just so well designed at Google.

[01:12:02] Yi Tay: And then externally, it's a lot of them are just missing. So I think yeah, I, I, yeah, I think that's basically on the training infrasight and yeah, so I think, I think the tooling for like training like large models is not really super like robust externally, like you're, you're, it's not easy to like just pick off something and then like train like.

[01:12:26] Yi Tay: Like a 100B model easily, without actually making sure your checkpointing is, you're, you're, you're resuming your checkpointing, your, your nodes failing and stuff like that. I think those are, like, hard, hard things that, that need to be taken care of, but at, at, at Google some, some team does that for you.

[01:12:43] Yi Tay: Yeah, TLDR of the training infrastructure, yeah.

[01:12:48] Swyx: Does Google have the equivalent of Weights and Biases?

[01:12:51] Yi Tay: TensorBoard, I think, yeah.

[01:12:53] Swyx: Oh yeah, yeah, yeah, of course.

[01:12:55] Yi Tay: Yeah yeah, yeah, yeah yeah.

[01:12:58] Alex Volkov: So

[01:12:58] Alex Volkov: we don't work with Google yet, but hopefully, if, if folks at Google are listening to us and you want to use Weights & Biases, definitely reach out. But at least you guys, now that you're out of Google, you definitely can. You want to follow up with Swyx, or are you,

[01:13:10] Swyx: are you Oh,

[01:13:10] Swyx: I don't know. Did you guys talk about Reka Core already?

[01:13:13] Alex Volkov: Yeah, so I think, Yi, there's not a lot of stuff that you can say about the bigger model that you guys have, but give us a little teaser live for a few folks here on stage, like what can we expect from the bigger model, maybe when, what can you tell us?

[01:13:28] Yi Tay: So the bigger model, okay, so I can just say that we, we ourselves are quite impressed by the results and it's if, if if you try to extrapolate from our 7 and 21 based on relative to other models of the scale you can. Try to imagine like what the type of metrics look like, right? But I think we are, we ourselves are, ourselves, we are quite impressed by, by the, the, the, the, the metrics.

[01:13:49] Yi Tay: So like, I think that's all we can say. I think in the post we say that it's coming out in the coming weeks, it's around that ballpark. It's not like next week, that kind of thing. It's also not like one, two weeks. It's probably like a couple of weeks. But we still, we're also kind of a bit tired after the release.

[01:14:05] Yi Tay: Take a few days light break and then start working again, that kind of thing. So yeah, I think that that's, that's basically what I can say, but we are, we are very happy with the model as well, yeah.

[01:14:17] Alex Volkov: All right, so we're excited to see this. I want to flip back to Max just for a second. Max, as we just covered, there's some stuff that I used that you guys are watching as folks test this out. When folks interact with your demo, first of all, I'll just say, definitely folks should do the thumbs up, thumbs down, and reply, so you guys will get some nice RLHF.

[01:14:35] Alex Volkov: What other venues for giving you guys feedback can folks go to? Is there a Discord you want to call out, or anything else you want to add to this as we move on?

[01:14:44] Max Bain: Yeah, thanks guys. We, we actually have a discord channel and if people post, use cases where maybe our model is doing well, or could do better, you can post that, or maybe there's something you're not happy with the current models, like GPT 4V also. And like, I guess, cause we're [01:15:00] such a small team in an early stage, like we'd.

[01:15:02] Max Bain: We're taking a lot of that on board and yeah if you can point any of that stuff, if you have stuff in more detail, you can put that on the Discord and yeah, we're like, really happy for any feedback,

[01:15:10] Alex Volkov: awesome. Are you guys distributed, by the way? Are you working co-located? Like, where's, where's Reka located?

[01:15:16] Max Bain: Like, all over the globe, yeah, So he's in Singapore, I'm, like London, sometimes the West Coast, but yeah, it's like a remote first

[01:15:23] Max Bain: company.

[01:15:25] Max Bain: and also, yeah, sorry. Another thing is we have, we do have job postings. So if you guys, yeah, like the sound of Reka, you can also apply to join. We have, yeah, quite a few

[01:15:35] Max Bain: positions open.

[01:15:42] Alex Volkov: friends of the pod from now on. Yi, anything else you wanna, you wanna add as, as we finish up and then move to the next

[01:15:49] Yi Tay: No, thanks. Yeah, really thanks for inviting. It's really nice chatting with you. And yeah, it's been great. Yeah.

[01:15:56] Alex Volkov: I'm, I was, like, like I said, I was blown away by the performance of the multimodality. I was blown away by the tonality understanding, which I've never experienced in any model so far. I heard that it's possible and I saw some technical stuff. I never experienced this on something like my videos as well.

[01:16:11] Alex Volkov: Definitely folks should play around with, with the demo. I'll add this in the show notes and follow Yi and Reka and, oh yeah, one last thing Yi, before you go. What's the meaning of Reka? I know this is a word in Hebrew that I know, but what's, what's the meaning of this word? Like, where, where did this come from?

[01:16:24] Alex Volkov: I was really curious.

[01:16:26] Yi Tay: I think one of the meanings, it's not official, it's not canon, but like one of the meaning it comes from Reka in Eureka, like Eureka, like the Reka

[01:16:35] Yi Tay: in Eureka, but it's not Okay, this is not canon, it's just one of the interpretations of that but it's a bit reverse engineered where people ask us, we just, this is what we say, but that's actually I think that that's it's not really like canon, yeah.

[01:16:49] Alex Volkov: Awesome. Thank you guys for joining, and folks, definitely should go check out the demo. And I think the tradition continues, because now we're moving on to the diffusion area, and we have the, the, the, the awesome, the awesome chance to have Dom here. And we just released, or I guess we saw this week, a new release from Stability called Stable Cascade.

[01:17:09] Alex Volkov: And Dom, I reacted to Imad's tweet about this hey Imad, you want to come to ThursdAI? And he said, Dom, and I think did you say Rodrigo was the other guy? Are the real heroes. And I want to welcome Dom to the stage. Dom, welcome. Feel free to unmute yourself, give a brief introduction. Let's talk about, let's talk about Stable Cascade. .

[01:17:25] Dome: So yeah, my, my name's Dom. I joined Stability actually only a couple of months ago. And I'm currently enrolled in, in Germany in a, in a degree, I'm currently finishing that up, and I met Pablo more than a year ago. And ever since then we started working on generative models, mostly in vision, so the image modality, and also slowly moving into video stuff. And yeah, at some point, so pretty early, we already connected to Stability via LAION. And at some point they liked what we were doing and liked the progress of the paper that we called Würstchen, which is German and means sausage.

[01:18:09] Dome: I can tell more about that

[01:18:10] Alex Volkov: Oh, that's what it means! Okay.

[01:18:13] Dome: yeah, yeah, yeah. And yeah, so then we joined, we joined, and we joined the applied team, and we were able to, to work on the third version of it, which in the end was then called Stable Cascade, just to make it fit in more, not to confuse people where that name comes from, what's this third version about.

[01:18:31] Dome: And yeah.

[01:18:34] Dome: That's bad.

[01:18:34] Alex Volkov: Awesome. So let's, let's say hi to Pablo as well. Welcome, Pablo. Feel free to unmute yourself. Brief intro from you as well. And let's talk about what makes Cascade different than SDXL or even the V2.

[01:18:45] Pablo: Hey, hi, Alex. A bit about myself. I am a machine learning researcher. I used to work before working at Stability. I used to work at Disney. So I was able to bring a lot of interesting ideas from there. And then I, yeah, I joined Dom and we have been working on very cool things since, since I met him.

[01:19:03] Pablo: And the latest is, is our new stable cascade.

[01:19:08] Alex Volkov: That's awesome. Let's talk about Stable Cascade. I've been able to test this out, and the things that blew me away were, like, speed, inference speed as well, but also that the base model already has hands built in, and they're fine. You guys said you were working on Würstchen for a couple of iterations, and this became Stable Cascade?

[01:19:26] Alex Volkov: Like, where talk to me about the history, and why is it so good, and so fast?

[01:19:30] Dome: Okay. Yeah. Yeah. So basically the, the biggest difference, and I think that's what it boils down eventually is the, the, the space or the dimension where stuff is generated for, for the text conditional part and for Stable Diffusion XL is, that they have this thing called the VAE, which takes images and just compresses it down to a smaller space.

[01:19:53] Dome: And the only reason to do that is. Just that you work at a smaller resolution, which then gives you faster training and faster inference. Imagine training or generating stuff at a pixel resolution of 1024, so one megapixel. This will be a lot slower than if you try to do the same, try to trying the same model at what, 32 by 32, for example.

[01:20:15] Dome: So the idea is you still want to have high, high quality, high resolution images, but you don't want to generate at that very high pixel space. So you just try to find something, how you can compress it even further. And up, up until now, people always use VAEs, VQGANs, normal autoencoders and so on but they reach limits very early on.

[01:20:34] Dome: So you can get to a spatial compression of eight. So Pablo had this incredible idea of using a diffusion model to increase that compression, basically. And long story short, by using a diffusion model on top of a normal VAE, or you could also leave the VAE away and just start at pixel space, you can achieve much, much higher compressions, because you have the diffusion model that can reconstruct iteratively, at first the lower frequencies, so the, the rough details, and then later on the high frequencies.

[01:21:04] Dome: So at all the details. And so it has just a lot more space to reconstruct an image. And with that it's possible to, to compress images a lot further. And the version that we have now achieves a compression of 42. And that makes a huge difference in terms of training and inference time. And That's probably what you saw because then

[01:21:24] Dome: the big model, the 3.6 billion, which is quite big for images. So Stable Diffusion XL is 2.2 billion, we're not in the, in the large language model range. So yeah, this makes it just a lot faster. And then you have this diffusion decoder, which works at, at a higher resolution but needs a lot fewer steps, and combining this just results in making the model very fast.

[01:21:49] Alex Volkov: That's super cool. I want to switch back to Pablo just real quick. So I'm looking at this graph for inference speed, but also checked out some of the examples. One thing that I noticed is the real time rendering basically of how the model kind of searches through the diffusion space. And the last step just like kicks into like super high resolution.

[01:22:09] Alex Volkov: Pablo, what can you tell us from some exciting or maybe surprising results that you've seen or people using it and Yeah, feel free to speak about your cool model a little bit more.

[01:22:18] Pablo: Yeah, I actually I have been really surprised on how well this model could, could could be. We, we, we're not expecting it to be as good as it is. We started this more as an like a, an experimental idea of trying to achieve the same quality of existing models but focusing on, on speed on performance.

[01:22:39] Pablo: But then somehow we ended up with a model that was like very competitive and yeah, I don't know. I think this last step as, as you mentioned, is the the, the upsampling stage. Which is this diffusion model that Dominic mentioned that can bring the image from 24 by 24 latent to a one megapixel.

[01:23:00] Pablo: And that's why you see this like very big difference between the previous to last and the last step.
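
(A quick sanity check on the numbers in this exchange, assuming roughly square latents: a 1024-pixel-wide, one-megapixel image generated in a ~24 by 24 latent implies a spatial compression factor of

$$ \frac{1024}{24} \approx 42.7, $$

which matches the "compression of 42" Dom cited a moment earlier.)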

[01:23:06] Alex Volkov: Yeah, the last step is poof, high quality. I love it.

[01:23:11] Dome: Yeah, we, we, yeah, we, we actually provided a previewer. So when we work in this very highly compressed latent space, In order to be able [01:23:20] to see what the model is doing, we have this very tiny convolutional model that can preview what's going on. That's what you're seeing, which looks pretty blurry. And then yeah, the final step does that.

[01:23:33] Dome: And yeah, why the model can make We're also pretty surprised. The, the big

[01:23:41] Alex Volkov: Text is also very impressive. I think let's not skip over this. The out of the box text. is so good. Compared to, let's say, the Stable Diffusion 1. 4, which it released was, which was bigger, right? I think it was like five gigabytes or something. This is just miles, miles, miles better. And the text out of the box, hands out of the box is very impressive.

[01:23:59] Alex Volkov: Text is super cool as well. Very surprising. Yeah, go ahead, please.

[01:24:02] Pablo: The, the, the biggest difference compared to V2, which was our previous iteration of the model was the size of the architecture of the model and the quality of the data, which I think. It shows how important that, that is, and I think probably, since, since our model is able to work on this very, very highly compressed space, it can learn much more efficiently if, if it has good data, it can learn much more efficiently these, these kind of things.

[01:24:30] Pablo: Maybe it learns them faster than other models which is why Yeah, we're able to have this kind of results.

[01:24:39] Alex Volkov: Awesome. Thank you guys for coming up. I really wanted to make sure that, yeah, you guys get the recognition because like really, really cool. This is under the stability membership, right? This is not like fully, fully open source, but folks are going to be able to use this model for, for their stuff and maybe keep training.

[01:24:55] Alex Volkov: Does it support all of the, the, the fine tuning and the LoRa ecosystem as well?

[01:24:59] Pablo: Yeah, one detail, it's not yet on the, the subscription. It's still only for research, but it, it will change probably in, in the following weeks. You asked about the LoRAs and ControlNets. Yeah, we

[01:25:13] Pablo: we, we made sure to provide some example code for training LoRAs, ControlNets, and full fine-tunings on, on our repository. We also provide some pre-trained ControlNets for inpainting, for canny edges, for super resolution, which is not the best super resolution model out there, but it's, it's interesting enough to, to share with the community, and we provided a tiny LoRA with Dom's dog, which is, it's pretty, and,

[01:25:44] Alex Volkov: Nice.

[01:25:45] Dome: yeah, and I think that's it for now, that, that's

[01:25:48] Yi Tay: all the

[01:25:49] Alex Volkov: Awesome. Thank you for joining and folks, definitely give Dom and Pablo a follow. Folks, really great shout out for building this and releasing this from Stability and it looks really good and I'm sure the community will adopt this. I've already seen a bunch of AI artists in my, in my kind of field.

[01:26:02] Alex Volkov: field are getting very excited about the possibilities here. Thank you for your work and thank you for coming to ThursdAI. Please feel free to stay, because we're going to cover a bunch of other stuff as well, like, super quick. Meanwhile, I just want to do a quick reset. It's been an hour and, let's say, 35 minutes since we're here.

[01:26:20] Alex Volkov: If you're just joining us, you're on the ThursdAI X space, which is a live recording for the ThursdAI podcast and newsletter. I'm your host,

[01:26:28] Alex Volkov: Alex Volkov. I'm here joined by co-hosts, Nisten is here on stage, Yamil Spokin, and we have Swyx here, who dropped off the stage, but he's on the microphone, and I will move towards a corner that I have, and then

[01:26:40] This weeks Buzz

[01:26:40] Alex Volkov: I have a surprise for Swyx. I'm moving towards a corner that I have usually, which is called This Week's Buzz, where I talk about the stuff that we have, or I learn, at Weights & Biases every week, so if you are subscribed to the newsletter, you definitely already know this, I just learn as I go and talk about this.

[01:26:55] Alex Volkov: If you're not subscribed to the newsletter, Why not? I guess you'll be up to date with everything that happens in the world of AI. So definitely check out thursdai. news. This is the URL, HTTPS, thursdai. news. And this week's buzz is all about this new course that we released with Hamil Hussain about putting models in production.

[01:27:13] Alex Volkov: I think I've spoken about this before. Weights & Biases has an academy. We release courses, and the courses are free for you. There's a bunch of knowledge. The last one we've talked about was with Jason Liu, about Instructor. And we also have Hamel Husain, who released a course about model management in production as well.

[01:27:29] Alex Volkov: And this is definitely a very illuminating one, including how to use Weights & Biases the way the best companies do, OpenAI does, and like Microsoft and Meta, and hopefully we'll get Google at some point. It's definitely a course worth checking out and signing up for. This will be in the show notes as well, and I'll post the link as well here.

[01:27:47] Interview with Swyx from Latent Space

[01:27:47] Alex Volkov: And now I'm gonna... actually, yeah, Swyx is now back on stage, and here's my surprise. If you guys follow and know Swyx's voice, you know that he's a co-host of Latent Space together with Alessio, and we're now sitting in the Latent Space pod studio, which looks incredible. The surprise is, I don't remember you being on the other side of the mic, so this is like a surprise interview with Alex and Swyx, but you're gonna be a guest and not a host, and I just wanted to hear about some stuff that you guys are doing, and how Latent Space is going, like, all these things.

[01:28:14] Alex Volkov: So this turns ThursdAI into a ThursdAI, like, deep dive interview, just a brief

[01:28:18] Alex Volkov: one.

[01:28:19] Alex Volkov: I figured I'd use the opportunity to give you a surprise. This was not staged. Swyx told me he may not be able to even join, 'cause you just flew back from

[01:28:26] Swyx: Singapore. Singapore, yeah. Yeah.

[01:28:27] Swyx: Yeah.

[01:28:28] Swyx: Cool, okay,

[01:28:29] Alex Volkov: So as,

[01:28:30] Swyx: I feel like we talk so much and you've been a guest on our pod like five times, so

[01:28:35] Alex Volkov: and

[01:28:36] Alex Volkov: I, I would wanna start with how you would introduce yourself to the audience that doesn't know you.

[01:28:41] Swyx: you so I'm Swyx, I mostly work on developer tooling, and, and, mostly known as the editor or podcaster of Latent Space, which has done pretty well.

[01:28:51] Swyx: I think we're celebrating our first year anniversary pretty soon. And the other half of my life is I'm working on Smol AI and the AI Engineer Conference, which we just, which we just announced for June 25th to 27th. Yeah.

[01:29:05] Alex Volkov: Yeah. You've had quite a long career in DX as well. I think Netlify, you had a stint in

[01:29:09] Swyx: Netlify

[01:29:09] Swyx: Yeah, I was one of their earliest employees slash dev rel of Netlify. That's where a lot of people know me. That's where I became quote unquote famous in developer tooling and in React specifically. Because I did a lot of content on React and serverless speaking and writing. And then I've been head of developer experience for Temporal, Airbyte, and then also spent a year at AWS working on the same thing.

[01:29:34] Alex Volkov: Hmm. Awesome. I also from that kind of that side of your career, you work with the Chroma guys as well.

[01:29:40] Alex Volkov: And Chroma

[01:29:41] Alex Volkov: just announced that they have been a year around and looked like millions of companies that probably had

[01:29:48] Alex Volkov: something to do with that. So shout out Jeff. And and, I'm blanking out on the

[01:29:53] Swyx: name, Anton. Yeah, yeah. I so I consulted for them on their DevRel when they were doing their, their first hackathon a year ago, actually. And yeah, I

[01:30:03] Alex Volkov: think

[01:30:04] Swyx: It seems like they are the leaders in open source vector databases. Retool, we did a chat or interview with David Hsu, the founder of Retool, and Retool did a state of AI survey among their customers what they're using.

[01:30:18] Swyx: And Chroma was, like, up and to the right in terms of the adoption and the NPS score, which I think NPS is actually a very important metric to keep tracking. Yeah. Really, really cool. Glad to be involved with Chroma.

[01:30:30] Alex Volkov: You've been also prolific in writing, like, I know many people go to your blogs and like the stuff that you have. How many publications in total are you, like, publishing your content in right now?

[01:30:46] Alex Volkov: You have your own personal

[01:30:47] Swyx: one, Yeah, I have three blogs. Three blogs. But Latent Space is currently the primary active blog. I have a personal one, and then I have a developer tools advising one, because I do a bunch of angel investing and advising for people.

[01:31:01] Swyx: And I don't know. I think more people should blog! It helps you think through what you think, and share your knowledge with other people.

[01:31:10] Swyx: And also, actually the most valuable thing is the most embarrassing thing, which is when you get things wrong. People will come out and correct you, and you will be embarrassed for a second, but then you'll remember the lesson forever.

[01:31:21] Alex Volkov: Can you give me an example of something where you went wrong and people corrected you, and then this improved your thinking?

[01:31:28] Swyx: improved thinking?

[01:31:31] Swyx: Yesterday, or coming into today, right? Because I do a monthly recap, where I think what ThursdAI does is [01:31:40] recap news every week, and then other people, like NLW from The Breakdown, recap news every day. And I think the lower frequency granularity of a month means that I only get to do 12 of these a year.

[01:31:54] Swyx: And that forces me to think through, okay, what is really actually important when you step back and think about it. And for my January recap, January was a slow month, to be honest. Today was more news than January. So I was trying to recap January, and I was like, okay, nothing super interesting this month.

[01:32:11] Swyx: What do we, if we step back, think is important for AI progress? And I listed a bunch of things, long inference and all that. One thing I specifically said was not interesting for state-of-the-art models was long context.

[01:32:26] Alex Volkov: was, long context. It

[01:32:28] Swyx: I said that yesterday. It's published, I sent it out to 35,000 people, including Satya Nadella, Drew Houston, and all the people who read the newsletter.

[01:32:36] Alex Volkov: Satya doesn't read, he also participates, like he clicks on

[01:32:39] Swyx: links,

[01:32:39] Swyx: Yeah.

[01:32:40] Alex Volkov: there's engagement, active engagement, from Satya with Latent Space.

[01:32:43] Swyx: so it's, so it's embarrassing, but also it just forces me to think about okay, how much do I really believe in million token and ten million token context? And I know now, today I learned that Nat Friedman strongly disagrees.

[01:32:58] Swyx: And that's good. That's, that's useful to update. And Google, of course. Yeah, yeah. I think It's, it's a, basically, so it's not about that specific point because we can always debate the pros and cons of that, but the act of writing down what you believe and taking strong opinions instead of saying that everything is awesome, instead of celebrating every little bit of progress as equally important, you have to rank them, and being wrong in your rankings gives you information to update your rankings, and if you don't give yourself the chance to be wrong, then you don't really learn.

[01:33:36] Alex Volkov: You publish a bunch of stuff. Some of the stuff that you publish turns into more than just an article. You have essays, and the one essay that I remember specifically, obviously, is the AI Engineer essay. Talk to me about how you approached writing this. Is that stuff that you saw?

[01:33:51] Alex Volkov: And I think, as background for folks who are not familiar with you and where you are: you're sitting in the middle of the scene that you also helped coin, in San Francisco, right? We're in the middle of SoMa, Mission, Hayes Valley, somewhere there, if I'm not confusing things. We're in this space called Newton, which I think you also plug on Latent Space, where tons of companies that we know from the Twittersphere are just literally behind us here.

[01:34:15] Alex Volkov: There's Tab with Avi, and Julius with Rahul, and a bunch of other companies sitting right here building very cool things. And this is an example of one of those, so actually I think it was very natural to put those kinds of hubs within the bigger bubble of San Francisco. And you, as far as I'm concerned, were very plugged into this even before coming to AI Engineer, right?

[01:34:34] Alex Volkov: And potentially this is the reason why the AI Engineer conference had so many amazing speakers on stage, because, I think you told me back then, a lot of personal favors were pulled to get some folks to show up. And as somebody who's an outsider from Denver, like I said, right?

[01:34:48] Alex Volkov: This is, this is incredible to see, but also it's very hard to penetrate and understand like what's going on and where the trends are. And this is part of the reason for ThursdAI. So you're sitting in the middle of this, you have all these connections, you said you're an angel investor as well. How does this shape your thinking about the AI engineer?

[01:35:02] Alex Volkov: Do all these people talk at, like, the hackathons? How do you draw on all of this to create something that's fairly seminal, that now people are considering themselves AI engineers?

[01:35:11] Swyx: Okay. Oh. Okay. So there's, there's two questions here.

[01:35:15] Swyx: If I can do RAG on your questions. Yeah, please. Which is that, one, how do you write impactful perspectives or come up with interesting ideas that will stick around? And two, how do you make sense of San Francisco? Especially as an outsider. And people, I think people can hear in my voice that I'm not American.

[01:35:34] Swyx: I'm Singaporean. And the last seven years of my developer career, I did not spend in San Francisco. I only moved here in April of last year. You don't have to be in SF to have a background in tech. Oh, I think the other thing I should offer as context is that I have been blogging for quite a bit.

[01:35:57] Swyx: I often say that you have to blog 50 times a year in order to get, like, one post a year that makes up the entire year, the one that people know you for. So this is my sort of fourth or fifth, quote unquote, industry-defining blog post. I've done this for serverless, runtimes and cloud orchestration and AWS, so I've done this before and I knew the work that goes into writing something like this. Rise of the AI Engineer took two months. I had a few potential collaborators

[01:36:35] Swyx: who ultimately did not co author but were heavily involved.

[01:36:43] Swyx: And I can talk about the writing of the post, but the main inspiration is trying to figure out what the important directions are.

[01:36:48] Swyx: And it is not purely about coining a term, which I think is a very vanity metric, but it is about picking directions, in terms of identifying what is wrong about the zeitgeist. If you rewind to this time one year ago, people were very much focusing on prompt engineering. People were worried about AI being the end of jobs for engineers, for software engineers.

[01:37:13] Swyx: And I think both have been proven wrong in terms of the scope of the prompt engineer. Now you no longer really hear about professional prompt engineers, because that's been replaced by the AI engineer, who can code. And I think the ability to code, to wield AI, makes you a thousand times more effective than people who use AI without the ability to code.

[01:37:37] Swyx: And I think identifying this core difference in ability, understanding that this stack is starting pretty thin and small, but it's going to grow over time, understanding that it is fundamentally very different from the ML engineer stack is a part of the mix that made me convinced that AI engineer would be a category to invest in which is why I started the conference and then pivoted the newsletter and podcast.

[01:38:04] Alex Volkov: Yeah, so let's talk about that as well. So definitely the audience that ThursdAI draws, at least in part, is AI engineers, but also, in part, folks who train and finetune models. And I've noticed that AI engineering is almost like the gateway drug into the larger AI stuff, because at least the folks that I'm familiar with, the folks who are JS/TS devs, that did the Netlify stint, that did React, etc.,

[01:38:27] Alex Volkov: they started to build with these tools. The tools are significantly easier to get into than traditional ML. You just do some API calls, OpenAI exposes a bunch of stuff, and suddenly you're like, oh, okay, I've tapped into all this power, this incredible power. I'm building intuitions about how to use this power.

[01:38:42] Alex Volkov: I'm building intuitions on how to put this power into production for my users. They give me some feedback. How do I do more of this? Am I only limited to OpenAI? Or maybe I can go to open source and try some stuff like this. Maybe I can do Ollama, which, by the way, shout out to Ollama, our friends, who just released the Windows version.

[01:38:56] Alex Volkov: Maybe I can do this locally, on device. Maybe I can do this on the edge, on Cloudflare, for example. All these new tools are popping up, and these people are growing from a very limited scope of API users into API users who also have an intuition about prompting, which is just one of those things, about embeddings and RAG and better RAG systems, like we've seen some folks going there.

[01:39:14] Alex Volkov: Definitely the scope grows, and as every category, like frontend was a very tiny scope, JavaScript, HTML, and the client, and suddenly like it became a full stack, you have prompt and like frontend, ops, and like all of these like things. So scope grows.

[01:39:30] Alex Volkov: Where do people learn about this new and upcoming thing?

[01:39:32] Alex Volkov: And I think the conference is one such way. So we've talked about the conference. This is actually not your first time; I just remembered I interviewed you after the conference for a full hour, we had a full conversation, but it wasn't about Swyx. So how was the conference received, after the conference?

[01:39:46] Alex Volkov: How did your direction into thinking about latent space and kind of exposing AI in San Francisco to the world? And let's take this to the kind of the next conference where you want to take us. What happened to the AI engineer?

[01:39:59] Alex Volkov: I think I asked

[01:39:59] Swyx: three or four. [01:40:00] Yeah, I know.

[01:40:00] Alex Volkov: Break them down however you want.

[01:40:02] Swyx: So the conference was really good, but I would actually classify that as the end of a process rather than the start of a process. It basically recaps

[01:40:10] Swyx: the work

[01:40:11] Swyx: that people are doing in the industry over the past year.

[01:40:14] Swyx: And then I get to curate and pick and invite people to present the best of their work and their thought. And I think that's a very privileged position. And then for me, the work begins after the conference, for the next thing. And picking directions... so last year was a single-track conference; this year, for the World's Fair, we're doing nine

[01:40:36] Alex Volkov: When is that, just for the

[01:40:38] Swyx: June 25th to 27th. Yeah.

[01:40:40] Alex Volkov: make sure you sign up.

[01:40:41] Alex Volkov: It's gonna

[01:40:42] Swyx: Yeah, yeah. We're going four times bigger this year, 2,000 people, and last year, 17,000 people tuned in on the livestream, and hopefully we'll have more impact this year. But yeah, I think for me, actually, it's a really good way to think about, okay, who do people want to hear from, who actually did impactful work that I will be proud to showcase 10 years from now.

[01:41:04] Swyx: I'm always thinking about the test of time. And I was very inspired by NeurIPS, where they actually had a test of time award. And I was like,

[01:41:10] Alex Volkov: man, that's Did Jeremy Howard get it or something, if I remember

[01:41:13] Alex Volkov: correctly?

[01:41:13] Alex Volkov: No, Jeff Dean. Jeff Dean.

[01:41:14] Swyx: Jeff Dean. Yeah.

[01:41:16] Alex Volkov: Shoutout Jeff Dean for today, by the way.

[01:41:17] Swyx: Yeah, yeah, for Word2Vec. I always say, some people are speculating what the Test of Time will be for next year, and it was like, Ilya Sutskever, if he ever shows his face

[01:41:25] Swyx: again.

[01:41:26] Swyx: And then I was like, but I know what's gonna win the Test of Time for 2027. Which is attention is all you need.

[01:41:32] Swyx: Yeah, yeah. But basically it's a flex for any conference to say, okay, the Test of Time award goes to something that was presented here 10 years ago. And NeurIPS has been going on for 37 years.

[01:41:46] Alex Volkov: what of the AI engineer presentations would stand the test of

[01:41:50] Swyx: question. I think the audience has voted. It looks like Pydantic and Jason Liu's Instructor is very, very popular. And I think he's just fundamentally correct that for every model there's like a standard set of versions: you have the base model when you train it, then you have the chat-tuned model.

[01:42:07] Swyx: And now I think it's going to be table stakes that every model should have structured output, or function calling as they call it. And it's even useful if you're not actually using it to generate code or call code, because it's very good for chain of thought. And so Max Woolf, minimaxir on Twitter and on Hacker News, actually wrote a really influential post that I'm going to try to showcase.
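(A quick aside for readers: here's a minimal sketch of the structured-output pattern Swyx is describing, using Jason Liu's Instructor library with a Pydantic schema. The model name and the schema are illustrative assumptions, not something from the episode.)

```python
# Minimal sketch: structured output via Instructor + Pydantic.
# Assumes `pip install instructor openai pydantic`.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class Speaker(BaseModel):
    name: str
    podcast: str


# Patch the OpenAI client so chat completions accept `response_model`.
client = instructor.patch(OpenAI())

speaker = client.chat.completions.create(
    model="gpt-4-turbo-preview",   # illustrative model choice
    response_model=Speaker,        # response is parsed and validated into this schema
    messages=[{"role": "user", "content": "Extract: Swyx co-hosts Latent Space."}],
)
print(speaker.name, "-", speaker.podcast)
```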

[01:42:27] Swyx: Yeah, for me as a conference curator that's what I do. Read a lot of stuff and then I try to try to feature like the best of things and also try to make bets that are important. I do think as content creators, like we're like the end of the food chain and not the value chain.

[01:42:45] Swyx: And it's always important to understand like even stuff that we don't pick is very important and substantial and it's

[01:42:53] Swyx: You're, you're picking for an audience to use at work, which is a small subset of the total progress that humanity can make.

[01:43:01] Alex Volkov: Interesting, interesting. Tell me.

[01:43:03] Swyx: If people want to engage in philosophical conversation, you go to Lex Fridman or Dwarkesh Patel.

[01:43:11] Swyx: And then if you want to talk about things that you can use in open source, you go to ThursdAI. And then we have less of an open source focus. We're very much focused on enterprise and things you can use at work, to code and to build products and startups with.

[01:43:26] Swyx: And so like I, whatever you do, as, as long as you have a clear focus for the, of the audience that you serve and you know how to reach them, then they will love you because you are, you're making literally the thing for them. And you don't have to appeal to everyone. And I think that's fine.

[01:43:40] Alex Volkov: Switching gears from the conference.

[01:43:43] Alex Volkov: How did the podcast come about? You said you're coming up on the year anniversary of the podcast. And you also said you moved here in April. I did not know this.

[01:43:49] Alex Volkov: I thought you were an SF native. So how did the podcast come about? How did you and Alessio meet? Let's talk about

[01:43:54] Swyx: Later. Yeah. And we should talk about doing well in San Francisco, which I think is important and something I'm

[01:44:01] Swyx: going through but have also done well at. So the podcast specifically was because I started the newsletter writing opinion pieces on just AI stuff. It was actually inspired by Stable Diffusion at the time which was sort of August 2022 ish.

[01:44:16] Alex Volkov: My life changed after that open sourcing.

[01:44:19] Swyx: Yeah, and then you you really run out of opinions very

[01:44:22] Alex Volkov: and

[01:44:24] Swyx: and then you're like, oh, I need to generate unique or new tokens.

[01:44:29] Swyx: The only way to do that is to get source material by interviewing people and putting a microphone in front of them. When you put microphones in front of people, they get more chatty. And sometimes they break news. For us, the big breakthrough was George Hotz, when he talked about GPT-4 being a mixture of experts.

[01:44:44] Swyx: Yeah, that was, that was a surprise, but he likes to do that sort of thing, just drop random alpha.

[01:44:49] Alex Volkov: He dropped it, and then you guys posted it, and then I had no idea what Mixture of Experts was, as well as most of us, and then it turned out to be true, and now we, we

[01:44:59] Swyx: saw it. Now Gemini is

[01:44:59] Alex Volkov: Gemini is a Mixture of Experts, the 1.5, which is quite incredible. So that was like a big thing. Was this natural to you, to start turning on the microphone? Did you have to do an

[01:45:08] Alex Volkov: adjustment period

[01:45:09] Swyx: another thing that people don't know is that I started four podcasts before.

[01:45:13] Swyx: So I'm not new to the conversation game, and I'm not new to like audacity and like editing and publishing, but I think, Having taken a few runs at it helps to prep you for, like, when something actually has audience fit.

[01:45:26] Swyx: Because all the others were very small. There were maybe like a few hundred listeners each time. This one went to number 10 on the U. S. tech charts.

[01:45:33] Alex Volkov: Yes, I saw that. That was incredible. Is that the top, top,

[01:45:36] Swyx: I think that's the highest it's been. Recently when it was like as high as 16 over the holidays, and then now it's dropped back down again. It's very, very volatile.

[01:45:44] Alex Volkov: But it's like very clear that you're in the top 50 like tech podcasts in the world, even though AI is Fairly niche. And the topics you discuss are fairly technical.

[01:45:52] Alex Volkov: Like when you talk with folks, it's not a general-appeal audience like some shows have, or the four guys, the VCs, right? It's very technical. So it's very impressive that you broke the top 50 charts, and it wasn't by chance, you bring great guests. Is the same approach that you have for AI Engineer the one you use for guests as well?

[01:46:13] Alex Volkov: Or are you now getting like requests to come on the podcast from some other

[01:46:15] Swyx: We get requests but you usually, for the, the people that draw the audiences, you have to go reach out to them. It's obviously, that's how it is. I

[01:46:24] Alex Volkov: I heard one such person now does not work in OpenAI, so he can

[01:46:28] Alex Volkov: potentially, potentially join

[01:46:29] Alex Volkov: podcasts as

[01:46:30] Swyx: yeah, he's a, he's a he's a listener and he has said that he'll come on at some point.

[01:46:35] Alex Volkov: We're talking about bad Mephisto for folks in the

[01:46:37] Swyx: Mephisto for Fortunyaga. So yeah,

[01:46:41] Swyx: I don't think it's actually just guests. I think it's also about focus on topics and then being engaged enough with the material that you get to ask questions that no one else asks.

[01:46:51] Swyx: Because, for example, if you have a VC asking questions, they often ask about market and business. But if you're an engineer, you're really asking about API and limitations and trade offs, stuff like that. Things that you don't really get into unless you're, like, actually evaluating it to use something at work.

[01:47:09] Swyx: And I think that's important. And also, for a lot of guests, we try to be the first podcast that somebody has done. Like we're the first podcast for Phind, for Cursor, for a bunch of these guys. So they're not experienced speakers. Some of them are good speakers.

[01:47:25] Swyx: But they're not experienced at the whole telling their story and all that. So you have to help them. But it doesn't matter because I think that you just try to serve your audience at the end of the day, right? What do people want to know? Ask those questions and then get out of their way and let them talk.

[01:47:38] Swyx: I think the other thing that we do, the reason I say it's not just guests, is because we do special episodes where we have breaking news. We haven't done one in a while because, I don't know, I think you have taken that spot of the breaking news guy. We

[01:47:51] Alex Volkov: We got three breaking news today, you were here. This is kind of like

[01:47:54] Swyx: that as well. And then we also do event recaps. Like we did Dev Day, we did NeurIPS, and that is a really big sort of editing process that I really like to do, where you're basically performing the work of summarization and curation instead of doing long-form interviews, and people really like that.

[01:48:13] Alex Volkov: The summarization part, yeah, with multiple folks. I think I participated in one, and you did one for Dev Day and NeurIPS as well. So what's, [01:48:20] now that we're coming up on an annual kind of thing for Latent Space, what's next for Latent Space?

[01:48:24] Swyx: More conversations? The weird thing is, we think that we've done about as well as a technical podcast can do in the general podcasting space.

[01:48:36] Swyx: The ultimate number of people who listen to podcasts is still very low compared to the general audience that might be interested in the same kind of content. That's why I branch out into a conference, where you produce talks, very highly polished and all that. The way to grow a podcast is to not just podcast, it's to actually write, where my essays still get a lot more readers than listeners, rather than to grow on YouTube or whatever, and that's fine.

[01:49:05] Swyx: I think ultimately, podcasting is a mix of entertainment and Education, right? You have to be attached to some kind of story, some kind of personality, and, and then learn something along the way that might be useful at work. So I think personally, I growing as a podcaster is about just growing your influence or understanding of an industry in general and the ability to serve an audience.

[01:49:29] Swyx: And then maybe opening up as hosts and as industry experts as we gain knowledge and understanding. So that people come to us not just for access to guests, but access to us as well, which people have when we did the end of year listener survey people actually requested for us to have more mic time.

[01:49:47] Swyx: Alessio and I did our first just the two of us conversation in a year and that was really good.

[01:49:52] Alex Volkov: Wow. So are you planning more of those?

[01:49:54] Swyx: Yeah, yeah, so we used to do these 101 episodes where we do introductions to a topic, like we did Datasets 101, Benchmarks 101, and we did Transformer Math 101, and then we also did RLHF 201.

[01:50:07] Swyx: And so we want to do more of those, where it's like it's like inspired by Acquired FM. And the work for this kind of episode is so different than a normal chat, because in normal chat you just sit down and you, you, maybe you prep a bit, a bit of question, you, you research the other guy's background, and then you just have a nice conversation, and that's it.

[01:50:23] Swyx: Whereas for a content-heavy episode like that one, you do a week of research. And you compile a whole bunch of stuff, and you simmer it in your mind, and then you try to rehash it and introduce it for an audience who hasn't done that amount of work. Yeah, that is a lot more work up front, but obviously it's very high value, and also I like to call it evergreen.

[01:50:43] Swyx: Evergreen content, meaning, like You want to build up something that will still be useful and relevant in a year.

[01:50:48] Alex Volkov: Yeah. So definitely let me just take a personal position here with Latent Space.

[01:50:53] Alex Volkov: I've been a guest host on Latent Space a couple of times, in special episodes as well. And now this studio is super cool, like a home away from home, being able to come here to this space with Alessio and tap into the AI scene in San Francisco. And I've learned a bunch from just the way you run it.

[01:51:11] Alex Volkov: Latent Space, for folks who are listening, is not only just a podcast. If you're subscribing on just Spotify or Apple Podcasts, you're missing a big part of it, which is the newsletter that you send, which has a bunch of links and show notes and folks that you talk

[01:51:23] Swyx: about.

[01:51:23] Swyx: There's one more part. Discord.

[01:51:26] Alex Volkov: Oh, there's also Discord.

[01:51:27] Alex Volkov: You do paper readings as well, right? There's a whole community that you're building.

[01:51:30] Swyx: The Discord community is surprisingly good. For the zero effort that I put into it, people just show up, and then they ask really very good questions, they drop things that I don't know, and then I learn from the Discord, and then I talk about it later. But, yeah, Discord has a lot of alpha.

[01:51:47] Swyx: And it's surprising because I have this newsletter that, I have this bot, That summarizes all the top AI discords, right? Obviously the top ones are, like, Eleuther, TheBloke what else?

[01:51:55] Swyx: Yeah, mid, mid, yeah, but it's not, that's not very technical. That's mostly just prompting.

[01:52:00] Swyx: Midjourney is at 8 million members. That's something like 13 percent of total Discord membership. Ha ha ha. That's freaking crazy. But anyway, the Discord is the community attachment to the podcast and the newsletter. And then it's people interacting with each other, some people getting jobs, some people getting investments, I have founders coming in and VCs there also funding them.

[01:52:22] Swyx: And like I, I really think that every every piece of content is a minimum viable community, right? People gather, they're chatting in the Twitter space comments right now. They're chatting in your newsletter comment section. But if you let people gather together live, whether it's online or in person we also have in person meetups.

[01:52:40] Swyx: I just had one in Singapore. We have one in San Francisco, I think, monthly.

[01:52:45] Swyx: I hope to have it monthly. And then obviously once a year you get people together for a really big conference where like they put out their best work. So I call this community annealing, right? You have cold community, like podcasts are cold.

[01:52:58] Swyx: Newsletters are cold because they're asynchronous. There's not somebody there, you don't expect to respond to the other person. Twitter spaces are warm because they're live and, there's some chance of live feedback. Discords are live, but when you, when you, when they're hot, it's when like everyone is on the same call and you're looking in each other's eyes.

[01:53:16] Swyx: And you're conversing and you're having like a real bond and relationship there. And so communities need this whole range of warm and hot and cold. And I try to build that for Latent Space.

[01:53:28] Alex Volkov: So for folks who are just listening on podcasts, you're missing several parts of the space. The newsletter is definitely worth checking out. Latent.space is actually a URL.

[01:53:38] Swyx: And that was donated by a reader. Not donated. Sold to us for cheap.

[01:53:42] Alex Volkov: You can consider this a donation. But also the Discord part... speaking of work, I think we need to wrap up, because we're past two hours and I want to let you go back to work. I also need to edit this and send it, and I want to check out the stuff that we did. Any last kind of parting things here?

[01:53:56] Alex Volkov: Maybe let's touch briefly or is that a bigger conversation? How to succeed in SF or is that for a later

[01:54:02] Swyx: Oh yeah, yeah, yeah. Oh man. This is such an interesting topic, especially for people who are not in sf, right?

[01:54:06] Swyx: Yeah. I think SF is a group of humans and not a place, and they are mostly available on Twitter. But then they often gather in San Francisco, and yes, you meet them in person. There are some people that are not famous online, or not fully, consistently candid online, and when you talk to them in person you're like, oh, okay, I fully understand you now, and everything that you've done and everything that you're going to do, I understand where you're coming

[01:54:33] Swyx: from.

[01:54:34] Swyx: And to me, that is obviously a very high offer, that's why I moved here. But you don't have to go there directly, right? One of my mentors, and the last one that I want to talk about career-wise, is Andrew Chen, who basically blogged his way into being a general partner at Andreessen Horowitz.

[01:54:49] Swyx: Like he runs one of their top three funds, the consumer fund. And he consistently says, hey, just put out your best work, learn in public, tweet a lot, instead of going to all these parties. There's always a party every week in San Francisco.

[01:55:03] Alex Volkov: Every day, multiple a day sometimes, yeah.

[01:55:06] Swyx: There was one Thursday last year with 10 AI meetups in San Francisco.

[01:55:09] Alex Volkov: So

[01:55:10] Swyx: you can go through the motions of networking, but you still end up with a smaller network than you would if you stayed at home and just wrote a lot, or thought a lot, or did quality work. So you don't have to be in San Francisco to do that. You can keep doing that online.

[01:55:27] Swyx: And then, take advantage of a big conference or something to come into San Francisco and actually meet people in person. And that's totally fine. I don't intend to stay in San Francisco forever, right? I have, once I know enough people, I can just come here like once a quarter and people will still think that I'm in San Francisco.

[01:55:41] Swyx: And that's fine.

[01:55:41] Alex Volkov: I get this question quite a lot. I've been here, maybe this is the fourth or fifth time for the past six months, and I get this question, do you live here? I was

[01:55:48] Swyx: Yeah. I think, I think people are just like borders. I, I'm, I'm a border disrespector and I think I hope more people do that. But do come into San Francisco every now and then maybe for a big conference that's happening June 25th to 27th.

[01:56:02] Swyx: But otherwise do great work online and people will notice it and find you and chat with you. And the in person component doesn't matter so much as plugging into the mentality and the community online.

[01:56:12] Alex Volkov: Yeah. Swyx, it's been a surprising interview. I didn't plan on this.

[01:56:15] Alex Volkov: I just thought, we're here, I haven't heard you in a while, the anniversary of Latent Space is coming up. A huge kudos for this effort. Like, huge kudos. A big, big thank you from me, because a lot of the stuff that you did, you and Alessio, pulled me through. I still get a bunch of listeners for ThursdAI

[01:56:30] Alex Volkov: from the Latent Space work on Substack. And so a huge thanks from me, because you kinda shaped what I'm doing as well. The newsletter and podcast combo that I forced myself into doing every [01:56:40] week, this was based on the Substack stuff from you as well. And I really appreciate your friendship as well.

[01:56:45] Alex Volkov: So thank you for coming on ThursdAI, and thank you for hosting us in Latent Space. And with that, I think I'll move on to the last piece of what we have on ThursdAI, folks, which is a recap of everything we've talked about. I'll just briefly run through the recap and then I'll let you go to your day. Let me just start with the music, obviously, because how else would this work?

[01:57:02] Alex Volkov: With that, I just want to wish you a great Thursday. Thank you for joining us from week to week. I want to thank the co-hosts that I had on stage. Thank you, Nisten. Thank you, Jan. Thank you, LDJ. Far El was here. Alignment was here. Thank you. A huge thank you to Swyx, Alessio, and the Latent Space folks for hosting me here.

[01:57:19] Alex Volkov: A shout out to a bunch of friends in Silicon Valley who I'm gonna meet. And with that, we'll see you next week. I'm gonna go and try to somehow summarize this all in the newsletter and podcast for you. And we'll see you folks next week. From San Francisco. This has been Alex Volkov. Cheers, everyone.

[01:57:34] Alex Volkov: Not this one. Bye bye.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 16 Feb 2024 02:09:28 GMT
📅 ThursdAI - Feb 8 - Google Gemini Ultra is here, Qwen 1.5 with Junyang and deep dive into ColBERT, RAGatouille and DSPy with Connor Shorten and Benjamin Clavie

Hihi, this is Alex, from Weights & Biases, coming to you live from Yosemite! Well, actually I'm writing these words from a fake virtual Yosemite that appears above my kitchen counter, as I'm now a Vision Pro user, and I will force myself to work inside this thing and tell you if it's worth it. I will also be on the lookout for anything AI related in this new spatial computing paradigm, like THIS for example!

But back to reality for a second, we had quite the show today! We had the awesome pleasure of having Junyang Justin Lin, a dev lead at Alibaba, join us and talk about Qwen 1.5 and QwenVL, and then we had a deep dive into quite a few acronyms I've been seeing on my timeline lately, namely DSPy, ColBERT and (the funniest one) RAGatouille, and we had a chat with Connor from Weaviate and Benjamin, the author of RAGatouille, about what it all means! Really really cool show today, hope you don't only read the newsletter but also listen on Spotify, Apple or right here on Substack.

TL;DR of all topics covered:

* Open Source LLMs

* Alibaba releases a BUNCH of new QWEN 1.5 models including a tiny .5B one (X announcement)

* Abacus fine-tunes Smaug, top of the HF leaderboard, based on Qwen 72B (X)

* LMsys adds more open source models, sponsored by Together (X)

* Jina Embeddings fine tune for code

* Big CO LLMs + APIs

* Google rebranding Bard to Gemini and launching Gemini Ultra (Gemini)

* OpenAI adds image metadata (Announcement)

* OpenAI keys are now restricted per key (Announcement)

* Vision & Video

* Bria - RMBG 1.4 - Open Source BG removal that runs in your browser (X, DEMO)

* Voice & Audio

* Meta voice, a new apache2 licensed TTS - (Announcement)

* AI Art & Diffusion & 3D

* Microsoft added DALL-E editing with "designer" (X thread)

* Stability AI releases update to SVD - video 1.1 launches with a webUI, much nicer videos

* Deep Dive with Benjamin Clavie and Connor Shorten show notes:

* Benjamin's announcement of RAGatouille (X)

* Connor chat with Omar Khattab (author of DSPy and ColBERT) - Weaviate Podcast

* Very helpful intro to ColBert + RAGatouille - Notion

Open Source LLMs

Alibaba releases Qwen 1.5 - ranges from .5 to 72B (DEMO)

With 6 sizes, including 2 novel new ones, from as little as .5B parameter models to an interesting 4B, all the way to a whopping 72B, Alibaba open sources additional Qwen checkpoints. We had the honor of having friend of the pod Junyang Justin Lin again, and he talked to us about how these sizes were selected, that even though this model beats Mistral Medium on some benchmarks, it remains to be seen how well it performs on human evaluations, and shared a bunch of details about open sourcing it.

The models were released with all the latest and greatest quantizations, significantly improved context length (32K) and support for both Ollama and LM Studio (which I helped make happen, and I'm very happy about the way the ThursdAI community is growing and connecting!)
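If you want to kick the tires yourself, here's a minimal sketch of loading one of the new checkpoints with Hugging Face transformers; the 0.5B chat variant is assumed here just to keep the download small, and any of the six sizes should slot in the same way. The Ollama route Junyang mentions later in the transcript is a one-liner: `ollama run qwen`.

```python
# Minimal sketch: chatting with a Qwen 1.5 checkpoint via transformers.
# Assumes `pip install transformers torch`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B-Chat"  # smallest of the six released sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Give me a one-line summary of ThursdAI."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```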

We also had a chat about QwenVL Plus and QwenVL Max, their API-only versions of their best vision-enabled models, and had the awesome Piotr Skalski from Roboflow on stage to chat with Junyang about those models!

To me, a success of ThursdAI is when the authors of the things we talk about come on the show, and this is Junyang's second appearance, which he joined at midnight at the start of Chinese New Year, so greatly appreciated, and definitely give him a listen!

Abacus Smaug climbs to top of the hugging face leaderboard

Junyang also mentioned that Smaug is now at the top of the leaderboards. Coming from Abacus, this is a finetune of the previous Qwen-72B, not even this new one. The first model to achieve an average score of 80, this is an impressive showing from Abacus; they haven't released any new data yet, but they said they are planning to!

They also said that they are planning to finetune Miqu, which we covered last time, the leak from Mistral that was acknowledged by Arthur Mensch the CEO of Mistral.

The techniques that Abacus used to finetune Smaug will be released in an upcoming paper!

Big CO LLMs + APIs

Welcome Gemini Ultra (bye bye Bard)

Bard is no more, get ready to meet Gemini. It's really funny, because we keep getting confusing naming from huge companies like Google and Microsoft. Just a week ago, Bard with Gemini Pro shot up the LMSYS charts, after the regular Gemini Pro API results were not as close, and now we're supposed to forget that Bard even existed? 🤔

Anyhow, here we are, big G's answer to GPT-4, exactly 10 months, 3 weeks, 4 days and 8 hours later, but who's counting?

So what do we actually get? A $20/month advanced tier for Gemini Advanced (which will have Ultra 1.0), the naming confusion continues. We get a longer context (how much?) plus iOS and Android apps (though I couldn't find it on iOS, maybe it hasn't rolled out yet).

Gemini now also replaces Google Assistant for those with Androids who opt in (MKBHD was somewhat impressed but not super impressed), and Google is leaning into its advantage, including home support!

* Looks like Gemini is ONLY optimized for English as well

We had quite the conversation on stage with folks who upgraded and started using it, noticing that Gemini is a better role player and less bland, but also that it doesn't yet support uploading documents besides images, and that the context window is very limited; some said 8K and some 32K, but definitely on the lower side.

Also from Google : a llama.cpp wrapper called localllm (Blog)

OpenAI watermarks DALL-E images and adds per key API limits (finally) (Blog)

OpenAI is using something called C2PA for pictures made by DALL-E 3, whether you're chatting with ChatGPT or using their API. It's a way to show that DALL-E 3 actually created those images. But it's just for images right now, not for text or voice stuff. Adding this info can make the files up to 32% bigger, but it doesn't mess with the quality. The tags tell you if the source was DALL-E 3, ChatGPT, or the API by including special signatures and such. Just a heads up, though, this C2PA thing isn't perfect: the metadata could get wiped either on purpose or by mistake.

They also released an update to the developer experience that allows you to track usage but also restrict usage per API key! Very very needed and helpful!

This weeks Buzz (What I learned with WandB this week)

First part of the live series with the Growth ML team was live and AWESOME!

Vision

BRIA - Open-Source background removal (non commercial)

BRIA AI@bria_ai_Feb 6, 2024

📷 Introducing Open-Source Background Removal by @BriaAI 📷 Now live on @huggingface, RMBG v1.4 excels in separating foreground from background across diverse categories, surpassing current open models. See demo [https://t.co/DDwncjkYqi] #BriaAI #OpenSource #AI @briaai https://t.co/BlhjMMNWxa
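If you want to try it outside the browser demo, the model card suggests it can be driven through the standard transformers pipeline; here's a minimal sketch, assuming the custom pipeline code shipped in the briaai/RMBG-1.4 repo (hence `trust_remote_code`) and a hypothetical input file name:

```python
# Minimal sketch: background removal with BRIA's RMBG-1.4 via transformers.
# Assumes `pip install transformers torch pillow`.
from transformers import pipeline

remover = pipeline(
    "image-segmentation",
    model="briaai/RMBG-1.4",
    trust_remote_code=True,  # the segmentation pipeline code lives in the model repo
)

result = remover("portrait.jpg")   # returns a PIL image with the background removed
result.save("portrait_no_bg.png")
```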

Voice

MetaVoice (hub)

1.2B parameter model. Trained on 100K hours of data. Supports zero-shot voice cloning. Short & long-form synthesis. Emotional speech. Best part: Apache 2.0 licensed. 🔥

Powered by a simple yet robust architecture: Encodec (Multi-Band Diffusion) and a GPT + encoder transformer LM, with DeepFilterNet to clear up MBD artefacts.

That's it for us this week, this time I bring you both the news segment AND the deepdive in one conversation, hope it's not super long, see you here next ThursdAI! 👏

Full Transcript:

[00:00:00] Intro and housekeeping

[00:00:00] ​

[00:00:00] Alex Volkov: You're on ThursdAI, and I think it's time for us to get started with the recording and the introduction.

[00:00:26] Alex Volkov: Happy, happy Thursday everyone! Today is February 8th, 2024. I don't know, this is the second calendar year that ThursdAI is happening in, so I don't know if I need to mention the year or not, but we're well on our way into 2024 and you're here on ThursdAI. ThursdAI is the space, the newsletter, and the podcast to keep you up to date with all of the very interesting things that are happening in the very fast moving world of AI.

[00:00:58] Alex Volkov: Hopefully by now, all of you already have ThursdAI in your podcast, wherever you get a podcast, Spotify, recently YouTube as well, which is weird. But with this introduction, I will just say, hello myself, basically. Hey everyone. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases.

[00:01:15] Alex Volkov: Weights & Biases is the reason why this comes to life for you. And there's going to be a little segment about Weights & Biases in the middle here as well. I'm joined on stage, often and pretty much every week, by great friends, experts in their fields, as we talk about everything AI related. This week especially we're going to have some interesting things.

[00:01:34] Alex Volkov: Those of you who come back week after week. Thank you, and we love that you're part of the community, and it's great to see how many people just return, and those of you who are new, we're here every week and The community doesn't stop after we finish the space. There's a bunch of spaces. I think our friend AlignmentLab had the space that went on for the full week, I think.

[00:01:55] Alex Volkov: I don't know if he ever slept. That's maybe why he's not here on stage. But we're here every week for the two hours, to give you updates for the first hour and definitely some very interesting deep dives that have been happening for the past few weeks, I want to say. So I just want to shout out some friends of ours that were recently featured in the deep dives.

[00:02:16] Alex Volkov: We've talked with Maxime Labonne, who trained the Beagle series and then also gave a deep dive with us about model merging. That was really fun. And on the last deep dive, we talked with the Lilac folks, who are building an open source tool that lets you peer into huge datasets, like imagine millions of rows, and they chunk and cluster them. And we've talked about the importance of datasets in the creation of LLMs, or large language models.

[00:02:46] Alex Volkov: And they've taken the huge datasets of the folks who usually come up on ThursdAI. Teknium from Nous Research just released their Hermes dataset, for example. And the folks at Lilac talked to us about how that can be visualized and how you can see which parts it's comprised of.

[00:03:03] Alex Volkov: It's quite an interesting conversation about how to approach the training and fine tuning area. And we haven't often talked about dataset curation and creation, so that conversation was a very nice one. So we have deep dives. I will say that last weekend, I also interviewed, and that's probably going to come up as a separate episode.

[00:03:24] Alex Volkov: I interviewed Sasha Zhadan from Moscow, and this was a first for me. And I just want to like, highlight where this weird thing takes me, because that's not ThursdAI, and that's not about the news. That was just literally about AI stuff. So this guy from Moscow, and this will be dropping on ThursdAI podcast soon.

[00:03:42] Alex Volkov: This guy from Moscow built a bot that auto-swipes for him on Tinder. And that bot started with GPT instruct models, then moved to ChatGPT and so on, and then moved to GPT-4. And he talks about how this bot kept improving with the improvement of AI. And then he auto-swiped a wife, basically. And this took over the Russian side of X.

[00:04:08] Alex Volkov: I don't know if you guys are on the Russian side of X, but I definitely noticed that that's all everybody could talk about. This guy previously also did some shenanigans with OpenAI stuff. And so it was a very interesting conversation, unlike anything that I did previously on ThursdAI.

[00:04:21] Alex Volkov: And definitely that's coming more as a human interest story than anything else. But it's very interesting. His fiancée also joined and we talked about the morality of this as well, and it was really fun. So if that kind of new type of content also interests you, definitely check it out.

[00:04:37] Alex Volkov: That's probably not going to end up on X.

[00:04:40] Alex Volkov: And I think with this, it's time to get started. , The usual way we get started here is I just run through everything that we have. Just so you know what we're going to talk about.

[00:04:52] Alex Volkov: And then we're going to start with segment by segment. So that's

[00:04:54] TL;DR and recap of the conversation

[00:04:54] Alex Volkov: Hey everyone, this is a recap of everything we talked about on ThursdAI for February 8th, 2024, and we had a bunch of breaking new stuff today, specifically around the fact that Google finally gave us something. But I'm gonna do this recap properly, based on the categories. So let's go. In the category of open source LLMs, we talked about Alibaba releasing a bunch of new Qwen models, specifically under the version number 1.5.

[00:05:33] Alex Volkov: And we had the great pleasure again to talk with Junyang Justin Lin from the Qwen team, the guy who's a tech lead there and pushes for open source. And he came up and talked about why this is a 1.5 model, not a 2 model. He also talked about the fact that they released a tiny 0.5 billion one.

[00:05:51] Alex Volkov: This is like a very tiny large language model. I think it's really funny to say a tiny large language model, but this is the case. And he talked about multiple releases for Qwen. We also had friend of the pod Piotr Skalski from Roboflow, who's a vision expert who comes up from time to time, and the author of, I forget the name of the library.

[00:06:12] Alex Volkov: I will remember this and put it in the show notes as well. He came up and he had played a bunch with the vision part of the Qwen ecosystem, and we talked about QwenVL Plus and QwenVL Max with Justin as well, and we talked about their potential for open sourcing these models. They also released a 72 billion parameter model that's now at the top of the Hugging Face leaderboard, which is super cool.

[00:06:34] Alex Volkov: So definitely a great conversation. And I love it when the authors of the things that we talk about come out and talk about them on ThursdAI. We then smoothly moved to the next topic, where Abacus, the company Abacus AI, has a finetune that's now top of the Hugging Face leaderboard, and that's based on Qwen-72B, and not even the new one, the previous one, so 1.0.

[00:06:54] Alex Volkov: That's now the top model on the Hugging Face leaderboard, and it has an average score of over 80. And I think it's the first open source model to do so, and they haven't fully released the process of what they used in order to make this much better on different leaderboards. But they have mentioned that they're going to train this model on top of Miqu, the Mistral leak.

[00:07:17] Alex Volkov: And it's very interesting. And they're also building some other stuff at Abacus as well. Very interesting. And then we moved to talk about LMSYS Arena. LMSYS Arena is the place that we send you to, to see which models users actually prefer, versus just the benchmarks and evaluations on Hugging Face.

[00:07:35] Alex Volkov: LMSYS Arena added a bunch of open source models, so shout out OpenChat again. They added another Hermes, the finetune that Teknium did on top of Mixtral, and they also added a bunch of Qwen versions as well. LMSYS adds open source models, so you can continuously see which models are better and don't have to judge for yourself, because sometimes it's not very easy.

[00:07:55] Alex Volkov: We also covered Jina embeddings that are fine-tuned for code, from the company Jina AI; their representative Bo Wang is a friend of the pod, but he didn't show up this time, maybe next time. Then we moved to big companies, LLMs and APIs, and the conversation definitely turned interesting, where multiple folks here on stage paid the new $20 AI tax, let's say, [00:08:20] for the rebranded Bard, now called Gemini, and the launch of Gemini Ultra.

[00:08:25] Alex Volkov: And we've talked about how long we've waited for Google to actually give us something like this. And now we're getting Gemini Ultra, and Bard is no more. Bard is essentially dead as a brand, and now we're getting the Gemini brand. So if you used to go to Bard, now you go to Gemini, but the brain behind it also improved.

[00:08:41] Alex Volkov: So you get Gemini Pro by default for free, I think, and Gemini Ultra is going to cost you 20 bucks a month. It's free for the next two months, so you can sign up for a trial, and then you'll get Gemini Ultra. And you'll get it not only in the web interface, you also get it in iOS and Android apps. And if you're on Android, it also integrates with the Android Assistant.

[00:09:00] Alex Volkov: That's pretty cool. It has a context length of not very much, I think we said 8K or 16K or so, and some folks contested this in the comments, so we're still figuring out the context length. It looks like the context length is restricted in the UI, less so on the API side, and Gemini Ultra did not release an API yet.

[00:09:17] Alex Volkov: So we've talked about Gemini Ultra and different things there. We also covered that OpenAI adds image metadata to all DALL-E generations, whether through the UI or through the API; this image metadata can be stripped, so it's not a watermark per se, but it's definitely helpful. And OpenAI also gives us a little bit of a developer experience thing where you can restrict different capabilities per API key.

[00:09:36] Alex Volkov: So if one key gets stolen, you can lock only that one, or you can restrict it to only a specific use as well. In the vision and video category, we talked about the new model for background removal called RMBG from Bria AI. It's not a fully commercial license, but you can play with it now.

[00:09:57] Alex Volkov: There's a demo I'm going to add to the show notes. And it also runs fully on your client via the efforts of friend of the pod Xenova from Transformers.js. And it's pretty cool to have a model that removes backgrounds with, like, two clicks, with no servers. And in the voice and audio category, we talked about MetaVoice, a new Apache 2 licensed text-to-speech model, not from Meta, even though it's called MetaVoice, which is funny.

[00:10:14] Alex Volkov: It's pretty decent and has zero-shot voice cloning, which means that you can provide a piece of your voice and fairly quickly get your voice speaking back to you, generated. And we also talked about breaking news from NVIDIA AI, something called NeMo Canary 1B, which is an ASR model, an Automatic Speech Recognition model, that's now top of the leaderboards on Hugging Face, and it beats Whisper across the board, specifically for four languages.

[00:10:48] Alex Volkov: It's trained on 85,000 hours of annotated audio, and it uses a very fast conformer encoder as well. We barely covered this, but Microsoft added DALL-E editing with Designer. So if you remember, Microsoft also did a rebrand. It used to be called Bing Chat, and now it's called Copilot.

[00:11:07] Alex Volkov: And that Copilot now adds capabilities that don't exist in other places, like ChatGPT with DALL-E. So Microsoft's DALL-E now involves the Designer thing, and they have cool things where you can edit images on the fly; you can click on different segmented objects from your generated image and say, hey, redo this in a different style.

[00:11:27] Alex Volkov: The video for this is super cool. I'm going to add it to the show notes. And it's very interesting to see that Microsoft, with their Copilots, is moving beyond where the capabilities of ChatGPT are. We also briefly mentioned and glanced through this, but Stability AI released an update to Stable Video Diffusion, including a web UI that you can use now, so it's not only a model, it's a web UI as well, and that web UI is pretty cool. If you didn't get access to it, I'll link it in the show notes, I think it's now possible to register. Much nicer videos, and obviously it's in the open source.

[00:11:59] Alex Volkov: as much as possible. So super cool. But the web UI shows you other people's video attempts. You can actually use their prompts to create videos of your own. They have some controls. It's very nice. Then I think we talked a little bit at the end there about Vision Pro and my experience with this as it comes to AI.

[00:12:15] Alex Volkov: We didn't dive into Vision Pro, even though this is my new toy in life, and I'm very happy to participate in the renaissance of spatial computing. And we covered the intersection of AI and spatial computing. And I think the very interesting part of today's ThursdAI was thanks to two new guests, Benjamin Clavie and Connor from Weaviate, and we talked about DSPy and ColBERT, and RAGatouille, which is a library to use ColBERT embeddings.

[00:12:43] Alex Volkov: And we talked about what they mean, and this was a great learning kind of experience for me. And if you see these concepts on your timeline and you have no idea what we talked about, I basically played the role of, hey, I'm the village dummy, let's say. I'm gonna re ask the question about what this means, why should we use this as well.

[00:13:01] Alex Volkov: And I think this is our show today, folks. This is the quick summary. If I missed anything super big and important, please let me know.

[00:13:08] Open source LLMs and AI news

[00:13:08] Alex Volkov: But otherwise, I think we'll start with open source. All right, welcome to the open source corner. And I guess the tradition of ThursdAI is: something releases, I go in the comments and say, hey, I'm going to talk about this on ThursdAI, do you want to join? And sometimes people say yes. And this is how we met Justin, or Junyang, here on stage. Junyang is the dev lead for the Qwen team. Welcome, Junyang.

[00:13:50] Alex Volkov: It's very late where you are. So I really appreciate your time here. Please feel free to unmute and introduce yourself again. Some folks already know you, but if in case some new folks are listening to us, feel free to introduce yourself. And then let's talk about the stuff that you released.

[00:14:06] New Qwen models 1.4 from Alibaba

[00:14:06] Junyang Lin: Yeah. Thanks Alex. Nice to be at ThursdAI, it's a very great program for us to talk about AI. I am Junyang and you can call me Justin. I'm working in the team for the LLM and LMM, and we are now working on the new LLM, Qwen 1.5, and we are also upgrading our vision language model, QwenVL, to QwenVL Plus and Max.

[00:14:33] Junyang Lin: Plus and Max are not open sourced yet, but we have demos, so you can try them in our HuggingFace organization; you can find our demos and you can try Plus and Max. And Max is the best one, and I am very confident with the Max demo. And about our language model: today, actually this week, we are open sourcing Qwen 1.5.

[00:14:58] Junyang Lin: Maybe previously you have noticed the Qwen2 code inside Hugging Face transformers. Yeah, we are moving to new code for you to use our Qwen models, because in the past few months I have been interviewing our users and they found some problems with using our code, the original Qwen code, so I'm moving a step forward.

[00:15:23] Junyang Lin: So this is why we have the Qwen2 code, but for the models themselves, in our judgment we are still at 1.5, not 2 yet. We're still training the real Qwen 2, so this time we have Qwen 1.5. For Qwen 1.5 we are actually fixing a lot of problems, because there are some models like 7 billion and 14 billion, there are a lot of people using these models, but they are actually quite old.

[00:15:50] Junyang Lin: They were released months ago. They have some problems for Qwen 14 billion It is actually only supporting around 2 to 4K context length, which is far from enough for a lot of users. So for this time, we have upgraded all models to supporting 32, 000 tokens. And for the sizes, we have released more sizes.

[00:16:15] Junyang Lin: Previously, we had 1. 8, which is the smallest one. But this time, we have 0. 5. only 0. 5. I used to think this one is just for experimental usage but there are some users in Twitter they found still 0. 5 can used to be do something so if you have any comments on [00:16:40] 0. 5 you can share the comments to me. And we also have 4 billion which is between 1.

[00:16:46] Junyang Lin: 8 and 7 billion. The reason why we have 4 billion is that actually when we first released 1. 8 billion it is actually popular because they would like to deploy the small model to some devices like cell phones. but they found just 1. 8 is not good enough for them to for the applications.

[00:17:07] Junyang Lin: So they want something just smaller than 7 billion, but much better than 0. 8. So we have 4 billion. Yeah. We have a wide range of sizes. These are for you to choose. And,

[00:17:19] Alex Volkov: six, six models overall, Junyang?

[00:17:22] Junyang Lin: Yeah. Six

[00:17:23] Alex Volkov: Six sizes overall, but definitely more models than this, because you also released, I think for the first time, you released quantized versions as well, correct?

[00:17:32] Junyang Lin: No, but previously we have released GPTQ,

[00:17:35] Alex Volkov: Oh yeah.

[00:17:35] Junyang Lin: as our convention, but this time we also have AWQ and also GGUF. Maybe GGUF is the new one; admittedly, previously I didn't know too much about AWQ and GGUF. This time I tried them and everything is okay, so I just released the AWQ and GGUF.

[00:17:52] Junyang Lin: And GGUF is the new thing for me, but it is quite popular in the community, like LM Studio, which you introduced to me. And I found a lot of people using GGUF — they use it in Ollama. So I collaborated with Ollama, so you can now just run one line of code, like `ollama run qwen`, and you can use the Qwen models with Ollama, and you can also use them in LM Studio.
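For listeners who'd rather drive that one-liner from Python than from the CLI, here is a minimal sketch using Ollama's Python client. The exact model tag for Qwen 1.5 is an assumption — check `ollama list` or the Ollama model library for the tag that actually ships.

```python
# Minimal sketch: chatting with a Qwen GGUF build through the Ollama Python client.
# Assumes the local Ollama server is running and the model has been pulled,
# e.g. `ollama pull qwen` (the exact tag may differ).
import ollama

response = ollama.chat(
    model="qwen",  # assumed tag; substitute whatever `ollama list` shows
    messages=[{"role": "user", "content": "Summarize what changed in Qwen 1.5 in one sentence."}],
)
print(response["message"]["content"])
```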

[00:18:15] Alex Volkov: I just wanna

[00:18:16] Junyang Lin: No

[00:18:16] Alex Volkov: just a tiny pause here, because I think, first of all, to highlight the importance of this community: you guys are releasing a bunch of great models in open source, and it's a great testament to the community, because you're listening to what folks have been saying and how they're reacting to your models. And as part of ThursdAI, I was able to introduce you to LM Studio and you guys worked together.

[00:18:37] Alex Volkov: And now, the second your model drops, not only are you guys already providing us quantized versions in 4-bit and GGUF, it's also very easy to start using. And just a shout out to you guys for thinking about this, because a lot of models, when they release, they just release a weights file and then it's up to the community to figure out how to run them, when to run them, and what the problems are.

[00:18:57] Alex Volkov: And this was the issue with Qwen before: it was harder to use, and maybe only on Hugging Face demos. And now you guys released it with support for the most popular open source runners out there. So Ollama — if folks haven't used Ollama by now, definitely check it out, there's a CLI, you just `ollama run` this.

[00:19:14] Alex Volkov: And LM Studio, which we've talked about a bunch, so shout out LM Studio. Shout out JAGS. And I'm, I was very happy to introduce both of you. So it's been great. And I've used the small model, the baby model as well. How was the reception from the community? What have you seen people do? Have there been any fine tunes already that you're excited about?

[00:19:33] Junyang Lin: Yeah, this is a very great comment for helping us to improve. Previously, like us, a lot of people just dropped open source models and let the community use them. But this may not be right, because we can do more for the community; maybe we can do some things more easily than the community users can.

[00:19:56] Junyang Lin: So this is why we are changing our style. We try to modify our code and adapt to the usages to make our models more popular. And recently I found people gradually fine-tuning our models. Previously the fine-tuning users were inside mainland China, because they have chances to talk to us, so they know more about our models and they can finally fine-tune them.

[00:20:24] Junyang Lin: But with the support of LLaMA Factory and especially Axolotl — Wing Lian helped me a lot, Teknium just introduced Wing Lian to me — I found some people are using Axolotl to do it. I don't know if Quan — I don't know if I pronounced his name right — he's one of the users of Qwen, and he previously got usage of our models and then he quickly fine-tuned a series of models, its name is Q-U-Y

[00:20:54] Alex Volkov: Oh, Stable Quan. Yeah, I think I know who you're talking about. Stable Quan, also from Nous Research.

[00:20:59] Junyang Lin: Yeah, Stable Quan, I'm quite familiar with him, I talk to him a lot, and he just directly used our models, very quickly fine-tuning a series of models, and I find the quality quite good.

[00:21:12] Junyang Lin: So this is quite encouraging for me, because you can see people are interested in your models, they can fine-tune them at a very fast speed. And I recently found Smaug by Abacus AI, but I got no chance to talk to them because I don't know who actually built the model, but I found Smaug 72 billion is built on Qwen 72 billion

[00:21:37] Alex Volkov: Oh, really?

[00:21:39] Junyang Lin: On the open leaderboard.

[00:21:40] Alex Volkov: Smaug is the next thing we're going to talk about, so you're taking us exactly there. I think, Nisten, you have a question just before, and then we're going to move to talk about Smaug. Just on the community part, just the names you mentioned: you mentioned Stable Quan, definitely a friend of the pod.

[00:21:52] Alex Volkov: You mentioned Teknium introduced you to Wing Lian, the guy from Axolotl. All of this happens in the ThursdAI community, and I love it. I'll just say that I see Robert in the audience here. Smaug is from Abacus AI, and I think Robert has some connection to Bindu, so Robert, if you can introduce Junyang to Bindu, that would be great, and then we'll figure out, like, how they use the 72B model.

[00:22:12] Alex Volkov: 72B model that you guys released is one of the more performant ones. I think it's even outperforming Mistral Medium, is that correct?

[00:22:21] Junyang Lin: Yeah, this version, Qwen 1.5 72B — that is for the chat model. For the base model, it is actually quite similar; some users have found that, and I admit that. But for the chat models, we have some improvements, because this time we not only did SFT on the model, but we also used DPO.

[00:22:40] Junyang Lin: We have some progress with DPO. So we've reached like 8.67 on MT-Bench. This is a relatively high score, and we just did simple DPO and it improved the model. And we also sent our model to Chatbot Arena at LMSYS, supported by Together AI, because we have some friends at Together AI. They just built an API for us, and we are now in Chatbot Arena, so you can try it there to see how it really performs.

[00:23:18] Junyang Lin: Does it really perform just like the MT-Bench score suggests? I'm not quite sure, because I'm also dependent on the users' feedback.
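For readers who haven't seen what "SFT plus DPO" looks like in code, here is a rough sketch using Hugging Face TRL's `DPOTrainer`. This is not the Qwen team's training setup — the model id, the toy preference pairs, and the hyperparameters are placeholders, and TRL's exact argument layout has shifted across versions (newer releases move some of these options into a `DPOConfig`).

```python
# Hedged sketch of direct preference optimization (DPO) on top of an SFT'd chat model.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "Qwen/Qwen1.5-0.5B-Chat"  # small placeholder so the sketch is cheap to run
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# DPO trains directly on (prompt, chosen, rejected) preference triples.
preferences = Dataset.from_dict({
    "prompt": ["Explain DPO in one sentence."],
    "chosen": ["DPO tunes a model on preference pairs directly, without a separate reward model."],
    "rejected": ["DPO is a type of database."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # None lets TRL keep a frozen copy of the model as the reference
    args=TrainingArguments(output_dir="dpo-sketch", per_device_train_batch_size=1),
    beta=0.1,        # how strongly the policy is pulled back toward the reference model
    train_dataset=preferences,
    tokenizer=tokenizer,
)
trainer.train()
```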

[00:23:27] Alex Volkov: It depends on human preference. So first of all, Justin, you're taking over my job now, because you're also reporting on the stuff that I wanted to mention. But definitely a shout out for getting added to LMSYS — that's not super easy. Not every model out there on the Hugging Face leaderboard gets added there.

[00:23:41] Alex Volkov: So definitely super cool. Yeah, please go ahead. If you have anything else to

[00:23:46] Junyang Lin: As you have mentioned Mistral Medium, I'm not sure which one is better, Mistral Medium or Qwen 72 billion. From some reviews they might be similar — Qwen 1.5 72 billion similar to Miqu. Some of my friends, like Blade, just tested it in EQ-Bench and the scores are very similar, but I need some more reviews to really know how the 72 billion model performs, whether it is better or worse than Miqu.

[00:24:20] Junyang Lin: They're all okay for me. I just want real reviews for me. Yeah,

[00:24:23] Alex Volkov: Yeah,

[00:24:24] Junyang Lin: it.

[00:24:25] Discussion about Qwen VL with Nisten and Piotr

[00:24:25] Alex Volkov: Awesome. Junyang, thank you for joining us. And Nisten, go ahead — you have a few questions, I think, about the interesting things about Qwen-VL.

[00:24:34] Nisten Tahiraj: Yeah, so one thing is that the 0.5B and the small models — I know Xenova in the audience was specifically looking for one around that size, or like a 0.3B, to run on WebGPU, because then even at 32-bit, which older browsers will still support, it will still only take two gigs. So that would run anywhere.
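Nisten's two-gig figure is just parameter count times bytes per weight; a quick back-of-the-envelope check:

```python
# Rough weight-memory footprint for a 0.5B-parameter model at different precisions.
params = 0.5e9
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: ~{params * bits / 8 / 1e9:.2f} GB")
# 32-bit: ~2.00 GB   16-bit: ~1.00 GB   8-bit: ~0.50 GB   4-bit: ~0.25 GB
# (activations and the KV cache add more on top of the raw weights)
```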

[00:24:58] Nisten Tahiraj: But my question — [00:25:00] so shout out to Xenova for all that, I know he's going to do something with it — but my question for you was more about the Max and the larger Qwen-VL chat models. Are those also based off of the 72B, and did you find more improvements in going with a larger LLM? And I also wanted to know your opinion on LLaVA.

[00:25:27] Nisten Tahiraj: The LLaVA 1.6 method, where they mosaic together four CLIP crops on top to get a larger image, even though it slows down inference because now it's got to output like 2,000 embeddings. So yeah, what do you think of LLaVA, and is there more stuff to share about the Qwen-

[00:25:47] Junyang Lin: VL Max? Yeah, for Plus and Max — sorry, we're not ready to open source them.

[00:25:57] Junyang Lin: I cannot decide these things. Yeah, actually it's built on larger language models, much larger than the Plus, and you can guess whether it is 72 billion. It is not that important, and we have found that the scaling of the language model is really important for the understanding of the VL models.

[00:26:18] Junyang Lin: We have tested it on the MMMU benchmark and we have found that the Max model is much more competitive and performs much better than Qwen-VL Plus. Although previously many people thought that Qwen-VL Plus was strong enough, we found that the Max has much better reasoning capabilities — it can understand things like reasoning games, like poker or things like that, some complex things that people understand through visual information, it can somehow understand as well.

[00:26:52] Junyang Lin: I think the performance might be a bit lower, approaching Gemini Ultra or GPT-4V, for Qwen-VL Max. We are just gathering some reviews. I'm not quite sure, but

[00:27:05] Alex Volkov: From the review perspective, I want to say hi to Piotr, our friend here on stage, from Roboflow. Piotr is one of the vision experts here on stage. Piotr, welcome. Feel free to introduce yourself briefly, but I definitely know that you got excited about some of the Qwen-VL Plus stuff, so definitely feel free to share some of your insights here.

[00:27:30] Piotr Skalski: Okay. Yeah. And first of all, awesome to meet somebody from the Qwen team. Yeah.

[00:27:36] Piotr Skalski: So yeah, I'm from Roboflow, like you said, and I'm responsible there for computer vision and growth. So it's like in between being an ML engineer and marketing, something like this.

[00:27:49] Piotr Skalski: And yeah, I was experimenting with Qwen-VL Plus and Max last week. Super impressed, in my opinion. I know that you try to be humble, maybe, but in my opinion, at least on the things that I test, it performs like the best compared to other models.

[00:28:09] Junyang Lin: Thank you very much. Thanks for the appreciation.

[00:28:14] Piotr Skalski: Yeah. And especially the fact, so the biggest game changer for me, and I know that there were models that were capable of that before, is the fact that you can ground those predictions and you can, for example, point to a specific element on the image. So it's not only that you can ask questions and get answers and do OCR, but you can straight up do zero shot detection if you would like.

[00:28:40] Piotr Skalski: Yeah. Which is awesome. And that's something that none of the other popular models can do to that extent, at least on the things that I tested. My question is,

[00:28:55] Piotr Skalski: do you plan to open source it? Because it's awesome that you can try it out via the API, and I highly appreciate the fact that you created the HF space and you can go there and try it.

[00:29:07] Piotr Skalski: But is there a chance that you will open source it, even with a non-commercial license? A permissive license is not necessary.

[00:29:16] Junyang Lin: Yeah, personally, I would like to open source some, but I cannot decide these things. But I think there's a chance — I'm still promoting these things inside the corp, but I cannot say too much about this stuff. But we will try, because we have found out that we ourselves can also build a very good LMM.

[00:29:37] Junyang Lin: I think the gap between us and the big corps in LMM is very small. And we have found that our techniques and our training are quite effective. So maybe one day we'll share it with the community, but for now it is still APIs and demos, and I will try to think about these things.

[00:29:59] Junyang Lin: And also the question about the comparison between us and LLaVA: I have just tried LLaVA 1.6, not that frequently, I just tried it. I think it's a very good model and it has very good performance in the benchmark results, but I think the limitation of these other open source models may be that they still lack sufficient pre-training. Skalski just said that Qwen can do OCR, and you can find that Qwen's reasoning capability is quite strong, because we have done a lot of pre-training work on it.

[00:30:39] Junyang Lin: We have done a lot of data engineering on pre-training, because we have capabilities of handling different resolutions and different aspect ratios, so we can take the curated OCR data and put it in the pre-training. And when the vision language model can understand a lot of textual, linguistic information inside the images, it may do something like, as we said, reasoning, and you will find that really powerful, very impressive, things like that.

[00:31:13] Junyang Lin: Yeah, I think the gap between other models and us, or also Gemini Ultra and GPT-4V, may still be the lack of large-scale data for training. Yeah, this is my opinion.
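As an aside for readers who want to try the grounding behaviour Piotr describes, the open-weights Qwen-VL-Chat (not the API-only Plus/Max) exposes a custom chat interface via `trust_remote_code`. The sketch below follows the pattern from that model card; the helper names live in the repo's remote code, so treat this as an outline to double-check rather than a guaranteed interface.

```python
# Hedged sketch: zero-shot grounding with the open-weights Qwen-VL-Chat.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# from_list_format comes from the repo's remote code and interleaves image and text inputs.
query = tokenizer.from_list_format([
    {"image": "street_scene.jpg"},  # local path or URL
    {"text": "Find the traffic light and give me its bounding box."},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)  # grounded answers come back with <box>(x1,y1),(x2,y2)</box>-style coordinates
```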

[00:31:27] Alex Volkov: We're waiting for more data, but we're also waiting for you guys too. I just want to thank you for being the champion for open source from within the organization, and I really appreciate all your releases as well. I think Piotr and Nisten, like everybody here on stage, definitely feel that. And thank you for coming and talking about this.

[00:31:45] Alex Volkov: Justin, feel free to stick around, because the next thing we're gonna talk about you already mentioned, which is Smaug 72B, which is at the top of the leaderboard. And I just read through the thread from Bindu Reddy from Abacus AI, and it looks like they didn't even use 1.5 — I think they used the previous Qwen 72B.

[00:32:02] Junyang Lin: Yeah, they used the previous Qwen 72B. If they are really based on the base language model, there might not be a lot of difference, because the 1.5 base language model at 72B is only slightly better than the original 72B base. Yeah.

[00:32:22] Alex Volkov: for the base ones. And very interesting what they

[00:32:24] Junyang Lin: the base one.

[00:32:25] Alex Volkov: So they don't share any techniques yet, but they promised to open source their techniques. They're saying, our next goal will be to publish these techniques as a research paper and apply them to some of the best Mistral models, including Miqu.

[00:32:37] Alex Volkov: So I got confused — I thought that they already fine-tuned Miqu, but no, they just fine-tuned on top of Qwen. And now the top Hugging Face leaderboard model is a fine-tune of Qwen, which is also super cool.

[00:32:50] Junyang Lin: Yeah, I'm very proud of it.

[00:32:52] Alex Volkov: Yeah, congrats.

[00:32:53] Junyang Lin: They are using our model to reach the top of the leaderboard. I'm also really looking forward to their technical report, to see how they reached the top of the benchmark. But I think it is not that difficult, because you have a lot of ways to improve your performance on a benchmark, so we'll still see how it really performs in real scenarios, especially for their chat models, yeah.

[00:33:18] Alex Volkov: Yeah, that's true, [00:33:20] that's often the case. But I just want to shout out that the world is changing super fast. We're definitely watching and monitoring the Hugging Face leaderboard, and performing better than Mistral Medium is impressive. At least on MMLU this is 77, and I think they said they broke the average score of 80 — this is the first model that broke an average score of 80 on the Hugging Face open source leaderboard, which is super cool, based on Qwen as well, and definitely worth it.

[00:33:46] Alex Volkov: I'm gonna add this link to the show notes, and hopefully we'll find a way to connect you guys with Bindu's team there at Abacus, to see how else this can be improved and whether or not these techniques can be applied to smaller models as well. I think in open source, the last thing —

[00:34:00] Junyang Lin: expecting the chat. Yeah, I'm really expecting to chat with them. Yeah, continue,

[00:34:05] Alex Volkov: So definitely hoping that some of our friends can connect these awesome teams so they can learn from each other, which I think is the benefit of speaking in public and putting things in open source. Now, moving on, the last thing that you definitely mentioned is the update from LMSYS, which is that quite a few friends of the pod are now also part of the Chatbot Arena.

[00:34:24] Alex Volkov: They just announced this yesterday. They've added three of your versions, right? They added 1.5 72B, 1.5 7B, and 1.5 4B, and they also added OpenChat. So shout out to the folks from OpenChat, Alignment Lab, and some other friends of ours who shipped OpenChat's latest release. And they also added the Nous Hermes fine-tune.

[00:34:47] Alex Volkov: So if you guys remember, we've talked about the Nous fine-tune of Mixtral, which improved on the mixture-of-experts model from Mistral a little bit, based on DPO datasets. So now that's also in the LMSYS arena, and it's now powered by Together Compute, which I have no affiliation with besides the fact that they're awesome.

[00:35:04] Alex Volkov: They're sponsoring a bunch of stuff, and we did a hackathon together — Together is great, you can easily fine-tune stuff on their platform. But now they're also sponsoring the arena, at least to some extent, which is great, because we get more models and the arena keeps going. And if you guys remember, or you probably use it, the LMSYS arena is another great way for us to feel what human preference is in models.

[00:35:27] Alex Volkov: And for many of these models, that's more important than performance on evaluations, on leaderboards, et cetera. So definitely a great update from LMSYS as well. And I think — I'm gonna ask the folks here on stage, Nisten, Far El, if there's anything else in open source that's super interesting this week — I think that's mostly it.

[00:35:44] Alex Volkov: We can talk about Gemini.

[00:35:48] Nisten Tahiraj: There was a dataset from HackerNoon that they released, which I think is pretty huge. And oh, there was one more thing: Hugging Face made a GPT store.

[00:35:58] Alex Volkov: Oh,

[00:35:59] Nisten Tahiraj: they made their own GPT store. Yes. I think that's a big,

[00:36:03] Alex Volkov: I want to hear about this, for sure. I haven't used it yet, but I invite the Hugging Face folks that are listening to come and tell us about it, because I haven't used it yet, so I don't actually have many opinions. But yeah, they released their own open source GPT store, which is super cool, and we're going to add this maybe to the show notes, but I don't have a lot to say about it.

[00:36:24] Alex Volkov: And I think, in the spirit of Yeah, go ahead.

[00:36:27] Nisten Tahiraj: Oh, sorry. Sorry. I'll quickly say that the HackerNoon dataset of tech articles — those are some of the best, because they have a lot of guest developers. I remember over the years, they had the best ones. Those articles, that dataset, is extremely great for any kind of coding or website or whatever work you're doing.

[00:36:50] Nisten Tahiraj: That's because it's step-by-step instructions on how to build something and all the code for it. It's pretty awesome, and it's at the very beginning on the jumbotron, if you guys see it, from Daniel van Strien. And yeah, it's MIT licensed, it's 6.9 million articles, and you can do whatever you want with it.

[00:37:07] Nisten Tahiraj: That, shout out to them.

[00:37:09] Alex Volkov: We'll add this again to the show notes. And as you said something about articles and code, I remembered another thing that's definitely also worth mentioning: Jina Embeddings. If you guys remember, we had a chat with Bo Wang from Jina, a deep dive into embeddings a while ago, and Jina just released an embedding model fine-tuned for code.

[00:37:25] Alex Volkov: So just a quick shout out that embedding models can be fine-tuned for specific purposes, and definitely embeddings for code. For those of you who follow from week to week, we talk about embeddings a lot. We've talked about Nomic Embeddings last week, fully open source, including the training datasets.

[00:37:42] Alex Volkov: We've talked about OpenAI changing embeddings and giving us new ones and cheaper ones. And Jina — we had a deep dive, and I definitely welcome you to go and check out that special episode with Bo Wang from Jina. They trained their own BERT model as the backbone that produces the embeddings, and they just released an update to their embeddings fine-tuned for code retrieval specifically.

[00:38:03] Alex Volkov: And I think for the many folks who are building RAG systems, that's something they should be aware of: embedding models can also be fine-tuned for specific purposes like Q&A and obviously code as well. So if you haven't tried that yet, and you're doing a bunch of retrieval on top of code — for example using some of the datasets that Nisten just mentioned, which probably have code in them — definitely check this out.
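As a quick illustration of what "embeddings fine-tuned for code" buys you in a RAG pipeline, here's a hedged sketch with sentence-transformers. The Jina model id is my assumption about their code-retrieval checkpoint — swap in whichever code-specialized embedding model you actually use.

```python
# Hedged sketch: retrieving code snippets with a code-tuned embedding model.
from sentence_transformers import SentenceTransformer, util

# Model id is an assumption; any code-specialized embedding model follows the same pattern.
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-code", trust_remote_code=True)

snippets = [
    "def add(a, b):\n    return a + b",
    "SELECT name FROM users WHERE active = 1;",
    "async function fetchJson(url) { return (await fetch(url)).json(); }",
]
query = "python function that sums two numbers"

scores = util.cos_sim(model.encode(query), model.encode(snippets))
print(scores)  # the Python snippet should score highest for this query
```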

[00:38:25] Alex Volkov: I think we're moving on to the big company thing, and I don't have a big company transition, I do have this one though.

[00:38:43] Google finally launches Gemini Ultra

[00:38:43] Alex Volkov: Just in, as we started the space, maybe an hour before, our friends from the big G, Google, finally answered the question that we've been asking for 10 months and three weeks: where is Google? So GPT-4 was released to us after ChatGPT released in, I want to say, the end of November 2022.

[00:39:06] Alex Volkov: Then GPT-4 was released in March of 2023. And throughout this time, there was this famous video of Satya Nadella asking, where is Google, where's this 600-pound gorilla in the room of search? We're going to make them dance. And they definitely made them dance. And we've been waiting.

[00:39:25] Alex Volkov: Where's Google? Where's Google? And Google has released quite a few things for us since then. Just for context, I think everybody knows this already: Google is the birthplace of the transformer paper, so most of the recent GenAI explosion can be attributed to the transformer architecture that came out of Google.

[00:39:43] Alex Volkov: Google had trained multiple models, including PaLM — we've talked about PaLM and PaLM 2 — and I don't even remember all the names of the models that they've released over the years. Google then also, at some point, gave us Bard, which is their chat interface that people used in order to play with their models, and I think some of this was powered by

[00:40:04] Alex Volkov: PaLM, and something else as well. And recently, I think around December, they said, hey, you know what? We're here and we have this thing called Gemini, after the unification of Google Brain and DeepMind under one org. And we're going to give you Gemini Pro right now, but we'll tell you that Gemini Ultra — that was back in December —

[00:40:23] Alex Volkov: they told us that Gemini Ultra is coming, it's going to be better than GPT-4, and you're going to get it soon. And we've been saying, when? And today is the day, is the answer to those questions. So today we're celebrating — congrats, folks at Google, who finally released an upgrade to their LLM capabilities.

[00:40:41] Alex Volkov: Not only an upgrade — so much of an upgrade that they've killed the Bard brand completely. No more Bard, that's what I'm understanding. No more Bard, even though that's very confusing. If you guys remember, a few weeks ago we talked about the LMSYS changes, where "Bard with Gemini", or something confusing like that, shot up to the top of the charts and was just trailing GPT-4.

[00:41:05] Alex Volkov: So the second-best model in the LMSYS arena was Bard with GPT-4 — or sorry, Bard with Gemini. See how confusing this is? And now there's no more Bard, but it's still in LMSYS. Anyway, the whole naming thing is confusing, but Google, including a blog post from Sundar and everything, comes out with a new update and says, hey, Bard is no more.

[00:41:25] Alex Volkov: It's now Gemini, and the models are also Gemini. So that's confusing. And the models are Gemini Ultra — we finally get access to Google's answer to GPT-4 today, which is incredible. That answer is Ultra 1.0. [00:41:40] And we can get it as part of a paid premium tier called Gemini Advanced on Google.

[00:41:46] Alex Volkov: So you can actually go right now, you can sign up, it's 20 bucks a month, and it starts 20 bucks or 30 bucks? I think it's 20

[00:41:52] Nisten Tahiraj: It's two months free

[00:41:54] Alex Volkov: Yeah, and you get a two-month trial, because they have to prove themselves to you, because many people will decide whether or not they're going to go with Google or with ChatGPT.

[00:42:03] Alex Volkov: And we're going to talk about which one folks will prefer. I haven't tried it yet — literally as I woke up, I had to prepare my notes for the space. I just want to say: Google, welcome to the party, we've been waiting for you. And I counted — it's been exactly 10 months, 3 weeks, and 4 days since GPT-4 released, and you've come out with the same level, at least based on benchmarks.

[00:42:24] Alex Volkov: And now we're gonna talk with some folks who actually tried it. Nisten, you tried it, I think Ray, you also tried it — let's talk about your first impressions of Bard — oh, or, sorry, Gemini.

[00:42:35] Nisten Tahiraj: One, it's heavily moderated. No one's surprised by that. It does answer and reason nicely, or at least the way it communicates, it's a lot more eloquent, I would say. It feels nicer in the way it reasons stuff out. However, compared to Mistral Medium, or Mixtral, it doesn't quite obey you. I tried my standard question, which is: come up with a schedule for building a city on Mars and write the code in C and JavaScript.

[00:43:10] Nisten Tahiraj: And that's a pretty complex question for, that only the best models get. And I needed to re prompt it in order for it to give the answer. And even then, it only wrote some JavaScript. But it was really good JavaScript. However, it didn't do the rest of the task. Okay, it's not bad. It is worth using. Again, very heavily moderated.

[00:43:33] Nisten Tahiraj: As for the vision side of it, it's extremely heavily moderated. I was even telling it to count out, I had an old gaming PC on the floor with two GPUs on the side, and I told it to make me a JSON of all the parts that it sees in the picture. It won't answer questions like, that have humans in them, or even if they're like Star Wars characters or whatever.

[00:43:58] Nisten Tahiraj: But this, I thought, would be something pretty simple, and even this one it refused to answer. It is good, I think. But as far as the vision side goes, the open source models might have it already beat, or will soon.

[00:44:19] Ray Fernando: Yeah, I wanted to add, Ankesh from Google DeepMind actually wrote because I've been posting some of this stuff, and he says, To preempt any confusion, multimodal queries don't go through Pro slash Ultra yet, but that is coming soon too.

[00:44:33] Ray Fernando: Which makes sense a little bit of why you're seeing some of that stuff. I've been seeing similar things when I've been doing some image analysis or even trying to generate images that have people. One of my examples I've just been posting on my my Twitter feed is like having to analyze a meme.

[00:44:48] Ray Fernando: So it's the hot girls meme or the hot ones meme and I was like, hey, this is very popular. Can you tell me what this meme is? And it's I'm sorry I can't because there's images of people. And then I had to do some other meme analysis with Elon Musk and it's the same type of queries. But to add to what Nisten was saying, I've been doing a lot of creative writing tasks, and the writing output has been actually really nice.

[00:45:10] Ray Fernando: And it doesn't have all that extra fluff that you normally would get from ChatGPT-4. And what I find with OpenAI's ChatGPT-4 is that I frequently have to say, hey, don't use purple prose — which is all that extra fluffy stuff you read that makes people sound smart — it's like, I just want a regular-sounding piece.

[00:45:27] Ray Fernando: And usually ChatGPT would do that and then revert back to its normal state but I find that Gemini Advanced just keeps going through it and, continues with the writing pieces of things. And for coding stuff, it's really strange. You actually cannot upload any CSV or any text files.

[00:45:43] Ray Fernando: They only let you upload images right now. So you only have a microphone icon and the little icon to upload an image. Because I wanted to just do a simple analysis of my tweets with a CSV file, and there's no place that I can see to actually upload that. And I could probably paste in so many lines, but there's also a character cutoff, too, that doesn't allow me to upload a lot of code for

[00:46:03] Ray Fernando: a code base.

[00:46:04] Alex Volkov: What's the — I was about to say this next thing. Do we know the context length? Anybody have an idea of where Gemini Ultra is at? Because we know that GPT-4 is 128K, and I think they recently opened this up in the UI as well — I've been noticing fewer restrictions, I've been able to paste a lot more code.

[00:46:21] Alex Volkov: My test, you guys know, is the transcription of the ThursdAI conversation that I paste in. Claude with the hundred K context definitely takes all of it. GPT-4, at the Plus level, used to refuse, and now recently it's okay — yeah, let me summarize this for you. Have you guys been able to sense the context length of Gemini Ultra?

[00:46:41] Alex Volkov: Is it anywhere close? Akshay, go ahead. Welcome to the stage, buddy.

[00:46:46] Akshay Gautam: Hello, I just wanted to bring up that their official document mentions that it's 2k context length.

[00:46:53] Alex Volkov: Akshay, we didn't get the greetings of the day!

[00:46:57] Akshay Gautam: I see. Yeah. Yeah. Greetings of the day, everybody. My name is Akshay Kumar Gautam and I'm an applied AI engineer. I was a data scientist before, but now I work with modeling and stuff. And yeah, I was literally waiting for it — it came out, I tried it, I paid for it, because why not? And a lot of stuff.

[00:47:14] Akshay Gautam: First of all, it's really good at coding. By the way, the context length is 32K, at least that's what they say. Yeah, 32K. And the model is not good at keeping context — that is what I was here to talk about. It will lose the thread; for example, if you ask it to do multiple things in a single prompt, it will not.

[00:47:33] Akshay Gautam: Unlike chatGPT, but like with coding, it's better than chatGPT in my humble opinion.

[00:47:41] Alex Volkov: So I want to talk about some advantages that Google has, the big dog, definitely. Because an additional thing that they released — ChatGPT has this too — is an iOS and Android app, but the Android one also has integration with the Google Assistant, right?

[00:47:56] Alex Volkov: So you can now join this advanced or ultra tier and use this from your Android device. Now, I'm not an Android user, but I definitely understand that the ecosystem is vast and many people just use this assistant and we're waiting for Apple. We don't have anything to say about Apple specifically today, besides the fact that, they released the, maybe the next era of computing.

[00:48:16] Alex Volkov: But there's nothing AI about Siri — it's still the same Siri from like 2019, with some examples. But Google has now moved everybody who wants to, who pays the 20 bucks a month and has an Android device, basically towards this level of intelligence, basically a GPT-4 level of intelligence. And I saw that Marques Brownlee, MKBHD on YouTube, one of the best tech reviewers out there,

[00:48:38] Alex Volkov: has been playing with the Android stuff, and he said that the Google Assistant integration even uses your smart home stuff. So you can actually ask this level of intelligence to turn on some lights, whatever, and probably with better context. Akshay, do you have any comments on this? Have you played with the Assistant version?

[00:48:54] Akshay Gautam: Two things. First of all, Bing Chat was already available on Android devices, right — Copilot, it's now called. Copilot uses GPT-4, so it's already really good, and you can actually use a lot of voice stuff with Copilot as well, which was surprising. The Google Assistant, to be honest — in terms of assistants, among Siri, and I have a Samsung device, so it has Bixby — among all the assistants, Google Assistant was the best one by far, in terms of how much you can use it. And I'm hoping to get access, because I have paid for the Ultra, but I still don't have access to everything.

[00:49:29] Akshay Gautam: Also, there's no API for Ultra, so you cannot actually test anything as well.

[00:49:34] Alex Volkov: We haven't gotten an API, developers — Sundar Pichai said the developer announcements are going to come next week. iOS hasn't updated yet. Yeah, go ahead, Nisten.

[00:49:44] Nisten Tahiraj: I just really quickly tested it with the entire llama.cpp file. I cut it down to 15,000 tokens and it's still too long, so we know it's under 16,000 that you can paste in. I will know [00:50:00] exactly in a few minutes,

[00:50:03] Alex Volkov: So not super, super impressive in terms of like long context. I will also

[00:50:06] Nisten Tahiraj: at least not for the UI,

[00:50:08] Alex Volkov: for the UI. Usually, yeah, usually for some reason they restrict the UI or they forget to update this. And then the model itself is like way longer context, but for now not extremely impressive comparatively.

[00:50:18] Alex Volkov: And again, we're comparing the two like main flagship models OpenAI GPT 4 and now Google's Gemini Ultra. And I also want to say one thing, Gemini seems to be optimized only for English as well, even though it will answer like most of the questions other languages, but it looks like the optimization was focused on English as well.

[00:50:36] Alex Volkov: including some of the apps as well, which is, understandable, but we have to, as we're trying to compare apples to apples GPT 4 is incredibly versatile in multi language operations as well. LDJ, you have some comments? Welcome, buddy, to the stage and give us some Have you played with Ultra so far?

[00:50:55] LDJ: Yes I was actually wondering, does anybody know of plans for them to integrate this with Google Home? Because I just asked my Google Home right now are you Gemini? And it said, I'm a Virgo. And then I asked it, what AI model are you running right now? It said, sorry, I don't understand. So I don't think it's, at least mine, I don't think is running Gemini right now.

[00:51:16] LDJ: But

[00:51:17] Alex Volkov: No, so I think the announcement was

[00:51:18] Junyang Lin: to put it.

[00:51:19] Alex Volkov: The integration into Google Home will come from the Google Assistant. So if you have an Android device, you'll have Google Assistant there. That you can switch on like a smarter brain, and that you can ask it to integrate like with your home through the device. So you can ask it to do stuff in your home.

[00:51:34] Alex Volkov: But the Google Home itself, like the Google Home devices that you have, they're not talked about upgrading them, but maybe at some point, because why not? But I haven't seen anything on this yet. Anything else here?

[00:51:46] Junyang Lin: I think that'd be the perfect. Sorry. Yeah, go on.

[00:51:48] Alex Volkov: Yeah, no, that would be great. I agree with you. Being able to walk around your house and just talk with GPT 4 level intelligence to do operations, I definitely agree.

[00:51:55] Alex Volkov: That would be great. I gotta wonder anything else here on Ultra? We've talked about its code performance. We've talked about its inability to talk about people. Anything else interesting that we want to cover so far? And again, folks, it's been two hours and we're already giving you like a bunch of info, but we'll play with this going forward.

[00:52:12] Nisten Tahiraj: It's about 8, 000 the context length that you

[00:52:14] Alex Volkov: Are you serious? Wow, that's not a lot at

[00:52:17] Nisten Tahiraj: that's as much as I was able to paste, like 7,500.

[00:52:20] Alex Volkov: So yeah, folks, you heard it here first. You'll probably get more context than you previously got, but it's not a lot comparatively. It's probably a consideration of compute for Google, right — how much context to give you in the UI; the model itself probably supports more. And it's also a vision-enabled model.
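If you want to run the same kind of paste-until-it-breaks test, it helps to count tokens before pasting. The sketch below uses OpenAI's tiktoken; Gemini uses a different tokenizer, so treat the number as a ballpark rather than an exact limit.

```python
# Rough token count for a transcript before pasting it into a chat UI.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # OpenAI's tokenizer; other models will differ
with open("thursdai_transcript.txt") as f:
    text = f.read()
print(f"~{len(enc.encode(text)):,} tokens")
```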

[00:52:36] Alex Volkov: But I think that we've covered this enough. Gemini Ultra. It's here, it's very impressive from Google, and yet, I want to say personally, maybe a little bit underwhelming because, they need to convince us to move, and it's going to be the same price, and I don't know, let me just ask this before we move on.

[00:52:55] Alex Volkov: Anybody here on stage who has access to both plans to pay for this and not GPT?

[00:53:03] Nisten Tahiraj: I haven't paid for anything since September But I'm

[00:53:08] Junyang Lin: not the right person for this question. My company pays for my ChatGPT subscription. So I might keep both

[00:53:15] Alex Volkov: Interesting.

[00:53:16] Junyang Lin: paying for mine's out of pocket. I'm just going to keep both. I like the OpenAI app because it's just the multimodal picture on my phone.

[00:53:23] Junyang Lin: I'm on the go. For Google, I'm just curious because it's two months free. That just means that, they have me hooked. We'll see.

[00:53:30] Alex Volkov: Yeah, it's two months free. And then let's check back in two months and see how many of us kept paying. All right. So Google also released a llama.cpp wrapper called localllm. I don't know if you guys saw this, it's pretty cool. It's an open source tool from Google that helps you run LLMs locally on CPUs, and then also on Google Cloud, with a super easy integration.

[00:53:51] Alex Volkov: Very interesting choice. They also call out TheBloke — you can download models from TheBloke with their tool. And I think it's very funny that if you go into the description of the localllm blog post, in the code snippets they tell you, hey, install openai.

[00:54:10] Alex Volkov: So I found it really funny. But yeah, they have a wrapper there that integrates with Google Cloud as well.
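The "install openai" bit suggests the tool serves models behind an OpenAI-compatible endpoint, which is a common pattern for local runners. Here's a hedged sketch of what talking to such a local endpoint looks like — the host, port, and model name below are placeholders, not localllm's documented defaults.

```python
# Hedged sketch: using the OpenAI client against a locally served, OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
resp = client.chat.completions.create(
    model="local-model",  # placeholder; use whatever name the local server advertises
    messages=[{"role": "user", "content": "Hello from a locally served model!"}],
)
print(resp.choices[0].message.content)
```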

[00:54:15] OpenAI adds DALL-E watermarking and per API key restrictions

[00:54:15] Alex Volkov: Running through the big companies area super quick: OpenAI added watermarks to DALL-E images. They use this new metadata standard called C2PA, and it embeds in the image metadata.

[00:54:27] Alex Volkov: And so basically what this means for us is not that much, but when you download DALL-E generated images — and I assume the same will come to Microsoft Copilot — they will now have in the metadata, alongside things like location and everything else, the fact that they have been generated with

[00:54:43] Alex Volkov: DALL-E. This information will sit in the metadata. Now, it's only images, not text or voice or anything else from OpenAI. This happens over the API and from the ChatGPT interface as well. It increases the file size a little bit, but it's not super interesting.

[00:55:00] Alex Volkov: This can be stripped, so the absence of this metadata does not mean an image was not generated with DALL-E. But if it is there, the image was definitely generated with DALL-E. And so this is an interesting attempt from OpenAI to say, hey, we're doing as much as we can.

[00:55:15] Alex Volkov: It's not foolproof, but it's an interesting attempt. And also, I just want to mention, for those of us who develop with OpenAI: they keep upgrading the developer experience around API keys, and now you can set restrictions per API key — you can restrict its usage, which many people have been waiting on for a long time.

[00:55:33] Alex Volkov: Many people have been wanting this. You can create one OpenAI API key for a specific purpose and restrict it to only DALL-E, for example. I don't know if you can restrict based on credits — I don't think so — but you can definitely restrict the usage-related stuff.

[00:55:49] Alex Volkov: That's, I think, all the updates from the big companies and the LLMs and APIs,

[00:55:53] Alex Volkov: This week's buzz is the corner and I stopped the music too prematurely. This week's buzz is the corner where I talk about the stuff that I learned in Weights & Biases this week. And I don't know how many of you were, had a chance to join our live segments, but we definitely had a build week. And I think I mentioned this before, but actually we had a live show on Monday.

[00:56:19] Alex Volkov: We're going to have another one this probably tomorrow. Yeah, tomorrow. I think it's Noon Pacific, where I interview my team, the GrowthML team in Weights & Biases, about the build with projects that we've built, uh, last December to try and see what's the latest and greatest in this world. So as we build tools for you in this world, we also wanna Build internal tools to see what are the latest techniques and stuff like we just talked about.

[00:56:46] Alex Volkov: For example, it gives us a chance to play around with them. It's like an internal hackathon. And what happened was is we build those tools and we present them to the company and then this was basically it. And I said, Hey, hold on a second. I learned the best publicly. I learned the best about, the way I just learned from Connor and Benjamin.

[00:57:02] Alex Volkov: I learned from Nisten and Far El and all the folks in the audience. And Luigi and I had a whole section where he taught me weights and biases before. I learned the best by being public and talking about what I'm learning as I'm learning this. And so I did the same thing with our folks from the GrowthML team.

[00:57:15] Alex Volkov: We just literally folks came up on stage and I asked them about what they built and what they learned. And we're going to summarize those learnings in the live show. And that live show, if you're interested, is all over our social, so on Weights & Biases YouTube and LinkedIn. Yes, LinkedIn, I now need to also participate in that whole thing.

[00:57:33] Alex Volkov: So if you have tips about LinkedIn, let me know. But it's live on LinkedIn, live on YouTube. I think we did X as well and nobody came. We're probably try to send you to the live YouTube flow. But basically the second part of this is coming up tomorrow. We're interviewing three more folks and you get to meet the team that I'm, the incredible team that that I'm part of.

[00:57:53] Alex Volkov: Very smart folks. like Kaggle Masters, and some of them came to Kano's show as well, which is super cool. And I find the first conversation super interesting and insightful for me. Definitely recommend if you're into Understanding how to build projects that actually work within companies was the process.

[00:58:11] Alex Volkov: We have folks who build something from scratch, we have somebody who runs a actual bot with retrieval and re ranking and evaluations and like all these things and [00:58:20] have been running them for a year basically on the production. So you can actually try our bot in Discord right now and in Slack and on GPTs.

[00:58:28] Alex Volkov: If you want to hear about the difference between a mature, RAG-based bot that's in production for a professional AI company and something that somebody can quickly build in a week — we've talked about those differences as well. So definitely worth checking out that live show.

[00:58:46] Alex Volkov: Moving on from this week's buzz, and I learned a lot. Okay, so back from the this week's buzz, we're moving into vision.

[00:58:52]

[00:58:57] Alex Volkov: And Bria AI, super quick: they released a new background segmentation model, or background removal model, that's live on Hugging Face, called RMBG v1.4. And I think the cool thing about this is that it now runs completely in the browser, thanks to the efforts of our friend Xenova — who is no longer in the audience, I think — from Hugging Face and Transformers.js,

[00:59:19] Alex Volkov: and it's super cool. You can remove backgrounds completely without sending any images anywhere, just straight from your browser. That model is called, again, RMBG, and it's not commercially licensed, so you cannot use it for professional stuff, but it's open for you to try and play with.
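The browser demo runs through Transformers.js; on the Python side, the model card's custom pipeline looks roughly like the sketch below. Note the `trust_remote_code` requirement and the non-commercial license, and double-check the exact return types against the model card.

```python
# Hedged sketch: background removal with BRIA's RMBG-1.4 via the transformers pipeline.
from transformers import pipeline

remover = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)
cutout = remover("photo.jpg")  # foreground image with the background removed
cutout.save("photo_no_bg.png")
```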

[00:59:39] Alex Volkov: Moving into the voice and audio category — we don't have a lot of audio stuff lately. I think the main audio thing we've talked about, I want to say Suno is the latest and greatest, but we're still waiting for some cool music creation stuff from different labs, and I definitely know some of them are coming. But in the voice category — and you know we've been talking about my position on this, and Nisten and I share this position —

[01:00:01] Alex Volkov: I think, personally, the faster models that can clone your voice come out, and the faster they come out in open source, the better it is generally for society. I know it's a hot take, but I also know — I cannot reveal the source — that voice cloning tech is going to be in open source super, super quick.

[01:00:21] Alex Volkov: And I think it's one of those break-the-dam type things: the first major lab will release voice cloning, then everybody will see that nothing happened to the world, and everybody else will release theirs — and we know everybody has one. We've known for a long time that Microsoft has, I want to say VALL-E — was that VALL-E?

[01:00:38] Alex Volkov: That clones your voice from under three seconds of audio. There are papers on this from every company in the world. We know that OpenAI has one — they collaborated with Spotify and they cloned Lex Fridman's voice, and it sounds exactly like Lex Fridman. We know that companies like HeyGen, for example — I think they use ElevenLabs.

[01:00:54] Alex Volkov: ElevenLabs has voice cloning as well. None of this is open source, everything is proprietary. So we're still waiting for open source voice cloning from a big company. But for now, we got something called MetaVoice from a smaller company — not from Meta, it's just called MetaVoice, it's confusing.

[01:01:08] Alex Volkov: It's just a tiny model, a 1.2 billion parameter model. It's trained on 100K hours of data, which is quite significant, but not millions of hours. And it supports zero-shot voice cloning. So basically, with just a few samples — like a basic sample of your voice — you're going to get a clone of your voice or somebody else's, which is what scares many people in this area.

[01:01:30] Alex Volkov: It has long-form synthesis as well, it's super cool, and it has emotional speech. If you guys remember, we've talked about how important emotion is in voice cloning, because — for those of you who have followed ThursdAI for a while — you may remember myself voice cloned in Russian, speaking with a lot of excitement, while the regular voice cloning of Alex speaks in a monotone voice that's very clearly not the same kind of person.

[01:01:56] Alex Volkov: So emotional speech is very important. And some of this is prompt engineering, and some of this happens in voice casting or voice acting. And the best part about this MetaVoice thing is the Apache 2 license, and it sounds pretty good. So we've talked about multiple TTS models, and now this model is definitely out there.

[01:02:14] Alex Volkov: So if you're building anything and you want a TTS model with voice cloning, I think this is now the best shot you have. It's called MetaVoice, and I'm going to be adding it to the show notes as well. And I think we have breaking news from a friend, VB, with another model called NeMo.

[01:02:30] Alex Volkov: So let's take a look. Yeah, definitely a new model from NVIDIA. It's called Nemo. Let me actually use this. I want to use the sound as much as possible.

[01:02:50] Alex Volkov: So I'm gonna go and try and find this tweet for you, but basically we have breaking news: literally just now, VB, who is a friend of the pod and is in charge of all the cool voice-related and TTS-related tech at Hugging Face, mentioned that NVIDIA AI released NeMo Canary.

[01:03:07] Alex Volkov: NeMo Canary is at the top of the Open ASR leaderboard. VB is also one of the folks who run that leaderboard for us. ASR stands for automatic speech recognition — no, I think I'm confusing this — yes, automatic speech recognition. Cool, thank you, Nisten. So basically, if you guys remember Whisper, we talked about Whisper a lot.

[01:03:25] Alex Volkov: This is the leaderboard, and Whisper has been on top of this leaderboard for a while. Recently, NVIDIA has done some stuff with models like Parakeet. And now we have a new contender on the ASR leaderboard called NeMo Canary 1B. 1B is not that much — the biggest Whisper, Whisper large, I think is around 1.5B or something.

[01:03:44] Alex Volkov: This is now at the top of the ASR leaderboard. It beats Whisper and it beats Seamless from Meta as well. I don't know about the license here. It supports four languages — Whisper obviously supports around a hundred, which we know is the best for many low-resource languages. It's trained on not that many hours of annotated audio, only 85,000 hours or so, and it's super fast as well.
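Canary itself loads through NVIDIA's NeMo toolkit, whose exact loading API I won't guess at here; for the Whisper baseline it's being compared against, local transcription is a few lines with transformers:

```python
# Baseline for comparison: local transcription with Whisper large-v3 via transformers.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
print(asr("sample.wav")["text"])  # any local audio file
```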

[01:04:10] Alex Volkov: It's very interesting that NVIDIA is doing multiple things in this area. We had Parakeet, now we have Canary as well. What else should we look at? It beats Whisper, and by a considerable margin, again, on these specific languages. Folks, we've been on this trend for a while, and I think it's clear:

[01:04:28] Alex Volkov: Incredible automatic speech recognition comes on device very soon. Like this trend is very obvious and clear. I will add my kind of thoughts on this from somebody who used Whisper in production for a while. The faster it comes on device, the better. And specifically, I think this will help me talk about the next topic.

[01:04:47] Alex Volkov: Let's see what else I have to cover. Yeah, I think it's pretty much it. The next topic

[01:04:51] Nisten Tahiraj: I'm trying it right now, by the way. And it's pretty good.

[01:04:55] Alex Volkov: Are you transcribing me in real time or what are you doing?

[01:04:58] Nisten Tahiraj: yeah, I was transcribing your voice through the phone to my laptop but weirdly enough it doesn't output numbers, it only outputs words however

[01:05:06] Nisten Tahiraj: It seems pretty good, huh? I don't know, it seems good to

[01:05:09] Nisten Tahiraj: me, LGTM looks good to me.

[01:05:11] Alex Volkov: Yeah, looks good to me, absolutely. The word error rate for Whisper is around 8%, I think, on average for these languages, and for Canary it's less — it's around 5%, I think. If I remember correctly, VB told us that word error rate is basically how many mistakes per 100 words it makes, and this does five mistakes versus eight, I think, on the general datasets.
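To make that concrete: word error rate is (substitutions + insertions + deletions) divided by the number of reference words, and the jiwer library computes it directly.

```python
# Worked example of word error rate (WER).
import jiwer

reference  = "the quick brown fox jumps over the lazy dog near the river bank today"   # 14 words
hypothesis = "the quick brown fox jumped over a lazy dog near the river bank"          # 2 subs + 1 deletion
print(jiwer.wer(reference, hypothesis))  # 3 / 14 ≈ 0.21, i.e. roughly 21 mistakes per 100 words
```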

[01:05:36] Alex Volkov: Quite incredible. This is coming and I think I'll use this to jump to the next thing

[01:05:39] Alex finds a way to plug Vision Pro in spaces about AI

[01:05:39] Alex Volkov: . The next thing, and briefly we'll cover this, is that I haven't used it for the show, but for the past, since last Friday, basically, I've been existing in reality and in augmented virtual spatial reality from Apple.

[01:05:52] Alex Volkov: And the reason I finally have a chance to connect these two things is because. I use a lot of the hand gestures within the Vision Pro from Apple, which was released on Friday and a lot of voice as well. And obviously we've talked about Siri, we've talked about finally Google stepping up with their assistant.

[01:06:08] Alex Volkov: Siri voice recognition and also typing is not that great. And I know because I used Whisper in production for a bunch. I also use Super Whisper, shout out Neil on my Mac to actually dictate a bunch. And all those tools, all the new tools, Whisper and now Canary and like all these things, they understand me and my accent very well.

[01:06:26] Alex Volkov: Whereas Siri is like on device. So Siri actually has two automatic speech recognition. They have the fast one on device and they actually do send your voice on onto the cloud and they return something. So you would [01:06:40] actually see a wrong transcription and then the right one replaced the wrong one. And the right one is actually generally okay, even though with my accent doesn't get me as much, but the wrong one is very bad.

[01:06:50] Alex Volkov: It's it's like they stopped. Thinking about ASR, Automatic Spatial Recognition in Apple, back in 2019, and that's what they shipped. However, there were quite a few papers from Apple on this topic, and I know for a fact that we're getting on device. And the reason I'm excited about this in the spatial context as well is because you can talk instead of using Hands on keyboard and that's very cool I think that's all I had to connect with the spatial computing in addition to I've tried all the AI tools and games and everything And we're still not there.

[01:07:19] Alex Volkov: There has been one thing that I want to connect if you guys know from the diffusion model area There is a way to generate images in 360 around you and I thought this was super cool because this is essentially a holodeck moment where you can stand in full virtual embedded reality and just say, Hey, I want this thing to appear.

[01:07:39] Alex Volkov: And we have now models of text to 3d that are coming like super soon. We obviously have virtual friends that embedding them in real space needs a robot. But now if you have this like spatial computing thing, you can actually put an AI friend in the corner. You will always talk to you. So there's a few like attempts at this in the Apple thing.

[01:07:57] Alex Volkov: but not a lot. And I'll also ping back to this last thing about where Apple is going, we've talked about this. On Friday, the day of the Vision Pro release, which was the day after last Thursday, Apple had their shareholder meeting. And in there, Tim Cook said, hey, we launched spatial computing.

[01:08:15] Alex Volkov: We're really happy. This is the next iteration of spatial stuff, blah, blah, blah. I definitely agree about all this. If you watch my feed for the past week, that's pretty much all I can talk about besides AI. However, going back to the AI, Tim Cook finally mentioned the word AI in the call, and he's not the only one.

[01:08:30] Alex Volkov: It's very clear where the thing is going. Every earnings call for every major company mentioned AI. Tim Cook specifically mentioned AI finally and said, Hey. We're very excited about this technology and we're going to show you something like soon. So I expect that this WWDC is going to be Spatial and AI related and I definitely think that Apple are thinking about both just because the way Siri looks in Spatial is just incredibly like nice.

[01:08:55] Alex Volkov: And I can see how embodying AI in your physical world, where you have spatial awareness, you can put something in the corner and it will sound like it's coming from the corner. And I'm waiting for the point where that becomes a bot, like a Tesla Optimus bot with AI.

[01:09:11] Alex Volkov: But before that, we'll definitely get there with spatial computing. So I'm going to have embodied AI agents around me and I'm going to ask questions. For some reason, the ChatGPT interface within the headset is horrible. And specifically because we all know that the iPhone app you can talk to, but Vision Pro only has access to iPad apps, and you can install the ChatGPT iPad app, but you cannot talk to it, which is a miss, I think, on OpenAI's part.

[01:09:35] Alex Volkov: That's it for my segment about the Vision Pro. I tried as much as possible to connect these things to AI to bring this to you. But, separately from this, my full review of Vision Pro is: holy s**t, this device is a new category of computing, and I can talk about this in a different space if you're interested.

[01:09:50] Space reset

[01:09:50] Alex Volkov: and I think it's time to reset the space, as we've gone on for an hour here, folks. A little bit more than an hour. I'm just gonna play some music, reset the space, and then we're gonna have a conversation with some folks here on stage.

[01:10:12] Deep dive into DSPy, ColBERT and RAGatouille with Benjamin Clavié and Connor Shorten

[01:10:12] Alex Volkov: Welcome, everyone, to the second hour of ThursdAI, where we usually, we still have a bunch of stuff to cover from the news angle, like the Bria stuff and the MetaVoice stuff and the AI art and diffusion stuff. And also maybe you want to have some time to talk about Vision Pro, but for now, we have two guests here on stage that I want to welcome and introduce.

[01:10:31] Alex Volkov: And we're going to talk about very interesting things that maybe some of you who follow the Twitter/X AI ecosphere have been seeing around, and I really want to say thank you and welcome to Connor and Benjamin for joining us. Maybe let's unmute Connor first and then Benjamin, and just introduce yourselves.

[01:10:49] Alex Volkov: Benjamin, I know you're going through some stuff, buddy. And as much as you can Benjamin feel free to, to talk to us, but we'll try to cover as much as possible. Conor, go ahead and then Benjamin.

[01:10:58] Nisten Tahiraj: Hey Alex, are you able to hear me first

[01:11:00] Alex Volkov: Yes, we can hear you loud and clear.

[01:11:03] Connor Shorten: Awesome, cool. I think I've been like refreshing the Twitter page and all that, but awesome. So I'm Connor. I'm a research scientist at Weaviate. I also host the Weaviate podcast. And yeah, I've just been so excited about DSPy and I'm really excited to be diving

[01:11:15] Connor Shorten: into it further.

[01:11:16] Alex Volkov: That's awesome. And I think that the Weaviate podcast was the first podcast that I came on as a little bit of a guest, from NeurIPS. So we had a great conversation outside the NeurIPS sign, if you guys want to check this out. But also on the Weaviate podcast, the folks from Weights & Biases had a great chat with you.

[01:11:29] Alex Volkov: That's where I know you from. I actually researched my position and my team based on the conversation you had with them. Very knowledgeable, and thank you for that content, it's really great, and folks definitely should check it out. And I want to also say hi to Benjamin Clavié. Welcome, Benjamin.

[01:11:44] Benjamin Clavie: Hey,

[01:11:45] Benjamin Clavie: thank you for having me. Can you hear me?

[01:11:47] Alex Volkov: Yes, you're coming through loud and clear.

[01:11:50] Benjamin Clavie: Yeah. Thank you. Yeah, I've made RAGatouille, which you might have seen if you're interested in retrieval at all, which is

[01:12:02] Benjamin Clavie: physically here, but not present in, but

[01:12:05] Alex Volkov: So, in terms of background, could you give us a little bit of background? How did you come to build these things? What's your background, is it AI? Give us maybe a few brief sentences there.

[01:12:15] Benjamin Clavie: I'll say. My background

[01:12:16] Benjamin Clavie: here is basically AI. I've done the stereotypical thing of dropping out of uni and immediately going into NLP, and I've been doing retrieval and NLP for 6-7 years now.

[01:12:25] Benjamin Clavie: Very standard background.

[01:12:27] Alex Volkov: So definitely related background. Okay. So we're here to talk about multiple multiple things, interesting things. And Conor, I think maybe let's just start with. I think the guy behind some of this work Omar Khattab is not with us, right? But definitely some of the work that we're going to talk about is attributed to him.

[01:12:45] Alex Volkov: So maybe, Connor, can you start us with an introduction to maybe DSPy and then ColBERT, and then we're going to talk about ColBERT and RAGatouille, just a brief one, and then we're going to dive into what this means for retrieval stuff, definitely as it relates to you guys at Weaviate. RAG is everywhere, and better RAG systems and better

[01:13:03] Alex Volkov: options to prompt these LLMs to retrieve better are something everybody's looking for. So let's maybe start there.

[01:13:12] Connor Shorten: Okay, so I'll try to keep the story going from intro to DSPy and then taking it into retrieval. So I think the first thing about DSPy that will like capture your interest is the programming model. It has this way of Writing initial prompts in a really succinct way, and then you can chain together or compose these graphs of several large language model calls with tool use in the middle, and we can come into retrieve a little bit there as well, but you start off with a really coarse description of what you want it to do, re rank these documents, and then it will optimize the, the whole description of the task as well as giving you a few shot examples to put in the prompt.

[01:13:50] Connor Shorten: So that's the first thing that is just super interesting I'm sure everyone listening has done this like manual tweaking of the prompt to try to, get it to do your task and how irritating that can be. And so that's probably the quickest value add is it automatically will come up with the prompts.

[01:14:03] Connor Shorten: And then when you want to switch your language model, you've been over there saying please output JSON, four exclamation marks performing better than one. And now you switch from GPT-4 to Gemini Ultra, or say you want to see if Qwen can be few-shot prompted to do this.

[01:14:17] Connor Shorten: You can now recompile the prompt by using DSPy, and you can switch your language model without having to then redo the prompt tuning.
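
As a rough illustration of what Connor describes here, this is what a DSPy-style program and "compile" step can look like; it's a minimal sketch based on the library's documented API around this time (the module and optimizer names, toy metric and tiny trainset are all illustrative), not a canonical recipe:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# The LM you "compile" against; swapping this line (to a local model, Gemini, etc.)
# and recompiling is the whole point Connor is making.
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

# A coarse description of the task (a signature) instead of a hand-tuned prompt string.
class RerankDocs(dspy.Signature):
    """Rerank the candidate passages by relevance to the question."""
    question = dspy.InputField()
    documents = dspy.InputField(desc="candidate passages")
    ranking = dspy.OutputField(desc="passage ids, most relevant first")

program = dspy.ChainOfThought(RerankDocs)

# "Compiling" = optimizing instructions and few-shot demos against a metric,
# instead of hand-tweaking the prompt for each new model.
def metric(example, prediction, trace=None):
    return example.ranking.split()[0] in prediction.ranking  # toy check

trainset = [
    dspy.Example(question="Who hosts ThursdAI?",
                 documents="doc1: Alex hosts it. doc2: unrelated.",
                 ranking="doc1 doc2").with_inputs("question", "documents"),
]
compiled_program = BootstrapFewShot(metric=metric).compile(program, trainset=trainset)
```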

[01:14:24] Alex Volkov: So I have to pause right here, Connor, because I'm coming to this as clean as possible with not a lot of understanding of these things . You said recompile the prompt.

[01:14:33] Alex Volkov: I'm definitely one of the folks who've tweaked prompts, tried again, saw, okay, it works for a GPT 4. I'm definitely one of those folks. What do you mean compile the prompt, recompile the prompt? Let's talk about the compilation part of this.

[01:14:44] Connor Shorten: I even, when I met Omar, I said, compile, it's overloaded. I think this kind of analogy started with calling LLMs the new operating system, and so I think that's the line of thinking behind calling it a compiler. Really we mean automated prompt [01:15:00] tuning.

[01:15:00] Connor Shorten: But the reason compiling, I think, is the right way to think about it is, let's say you have eight large language model calls, eight parts of a program. That's what I think is really exciting, that's what I think makes LangChain so popular: people see this gallery of examples of chains where you first analyze some chunks of blog posts, extract the topics, then you later on aggregate the topics into a description of the topic, and then you maybe pass it to an editor prompt, and then you maybe have a council of reviewers. There's this chain, and so with each component of the chain, or I think graph is now the more common abstraction.

[01:15:35] Connor Shorten: You have a prompt there. So let's say you have eight language, or however many, I imagine that as this, continues to evolve, we're going to see like super deep LLM the programs that will have so many LLMs in the middle of it. And so you have a prompt for each of those components.

[01:15:49] Connor Shorten: And so that's why compiling, I think the analogy is great because you're compiling the prompts for all of these prompts and yeah, so that's why I'll defend the compiling.

[01:16:01] Alex Volkov: So I'll just say like from a perspective of a tinkerer. That's something that maybe triggers me a little bit to say, Oh, I need to compile stuff. No, I just write Python code, but you're saying developers do not fret. Compiling is not that like crazy. It's specifically very helpful and useful for larger applications and very, is very helpful for when you want to replace the brain behind the stuff that you're doing or you want to do this in a structured way.

[01:16:24] Alex Volkov: Is that me understanding correctly of what we're talking about?

[01:16:28] Connor Shorten: Yeah, I agree completely with that.

[01:16:29] Alex Volkov: Awesome. So that's DSPy and Omar Khattab, Latent Interaction, or lateinteraction I think the handle is. We're definitely going to add him to the show notes as well. He's the author of this. DSPy has been around for a while. I definitely know that he has been posting about this quite a lot, but recently it has been on the pickup as well.

[01:16:46] Alex Volkov: And maybe ColBERT is one of the reasons. Can you introduce ColBERT as well, Connor? Or do we have some stuff about DSPy still to cover in the introduction phase?

[01:16:56] Connor Shorten: Okay, I can transition to Colbert.

[01:16:58] Alex Volkov: Colbert? Colbert? How do we, how do you even pronounce this thing?

[01:17:02] Connor Shorten: I was surprised when Omar pronounced it Colbert because it, it's Bert and then there's Stephen Colbert. I'd heard him on the podcast with I think Christopher Manning from Stanford who had, asked him about that.

[01:17:14] Alex Volkov: So if Omar, the creator of this, pronounces it Colbert, unfortunately, even though it's BERT models, I think Colbert is what we're going with. But yeah, from Stephen Colbert. What is ColBERT? Why is there excitement on my feed around this? Give us an introduction, Connor.

[01:17:31] Connor Shorten: So the, probably the right way to start thinking about it is in search, you typically have retrieval and then re ranking and retrieval is where you have like encodings of the documents. Like you put each of the documents into an embedding model and you get a vector embedding, and then you're doing just, dot product distances between the query vector and these document vectors.

[01:17:51] Connor Shorten: So there's no interaction between the query and the documents. The representations are encoded completely separately in retrieval. And then you'll typically pass that into a re-ranker. And there are three kinds of re-rankers. There are point-wise re-rankers that take as input the query and the document and then output a relevance score, doing the interaction between just the query and this one document.

[01:18:12] Connor Shorten: Then there's pair-wise, where you take two documents and the query and have a tournament of two at a time. And then there are the list-wise re-rankers, where you're taking all the documents as input at once. So the re-rankers are pretty effective, but you have this massive latency overhead by doing it like that.

[01:18:28] Connor Shorten: So what ColBERT introduces is this late interaction. So you get the benefit of having this interaction between the query and the document, most similar to the point-wise cross-encoder re-ranker, but you keep the vectors for the documents and you have this kind of interaction between the inner token vectors.

[01:18:47] Connor Shorten: So right now, what we're doing mostly with vector search is, and this is why the BERT thing is actually really important, is that we're using these encoder-only models that output a vector for each of the tokens. But then we pool all those vectors to represent the object with one vector.

[01:19:02] Connor Shorten: But with ColBERT, you keep all the vectors for the query and the document. And then you have this, it's maybe a little hard to just talk you through the math behind this, but you take the maximum similarity of each of those query vectors with all the document vectors. So say you have 100 document vectors and you're at index 0 of the query vectors, you do the maximum similarity with those 100.

[01:19:22] Connor Shorten: Then you're at the first vector of the query, second, third, so on. And then you'll average that out. So you now have this late interaction of the vectors between the query and the document. I hope that maybe Benjamin can take the mic from here. I hope that gets the gist of it.

[01:19:37] Benjamin Clavie: Yeah, that was pretty good. So just to clarify, with max similarity: when you're using normal vectors, like a single-vector representation, you have one vector for the whole document.

[01:19:48] Benjamin Clavie: When you're using ColBERT, like Connor said, you've got one vector per token, and at retrieval time what you do is you compare every single one of your query tokens, so generally not a lot, like maybe 32, with every single token in every single document, and you only keep the highest similarity, and then you sum that up. So you compare every token to every token, and you get this really fine-grained comparison, instead of trying to slot everything into one massive vector, which would probably lose information.

[01:20:17] Benjamin Clavie: Because you're doing it at the token level, it's very clear, I call this like a bag of embeddings because it's like quite close to what we do with TF IDF but with embeddings instead of like just a word count.
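
To make the MaxSim math concrete, here's a tiny late-interaction scoring sketch in plain PyTorch; it shows the idea being described, not ColBERT's actual implementation:

```python
import torch

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction: query_vecs [num_q_tokens, dim], doc_vecs [num_d_tokens, dim],
    both L2-normalized so dot products are cosine similarities."""
    sim = query_vecs @ doc_vecs.T                # every query token vs. every document token
    max_per_query_token = sim.max(dim=1).values  # keep each query token's best-matching doc token
    return max_per_query_token.sum()             # sum (or average) into one relevance score

# Toy usage: ~32 query token vectors against 100 document token vectors, 128-dim each.
q = torch.nn.functional.normalize(torch.randn(32, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(100, 128), dim=-1)
print(maxsim_score(q, d))
```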

[01:20:29] Alex Volkov: Wow. Okay. So let me try. So Connor said a bunch of stuff, then Benjamin, you simplified. Let me try to simplify from my understanding. Okay, a regular RAG system, regular basic, without even the re-ranking step, Connor? Like the basic stuff that people do in the Weaviate examples, for example, or whatever local embeddings you have, let's say a vector store of a bunch of information.

[01:20:49] Alex Volkov: You have a user asking a question, and you want to augment the LLM's information because of the knowledge cutoff. And then you embed the user's query in some sort of embedding. We've talked about embeddings multiple times here on ThursdAI. You get some number back and, like Benjamin said, you get one embedding for the whole document or the whole query.

[01:21:08] Alex Volkov: You get just one, not per token. You get one embedding and then you use that to compare, and the usual similarity score is the way to compare this. Then if you wanna go to advanced stuff, then you maybe do some re-ranking. Re-ranking is basically, like, another LLM step, right Connor?

[01:21:28] Alex Volkov: Or maybe some model that does re-ranking for you, that chooses; you retrieve multiple examples, and you choose which one fits better. And you can do this based on several things. The downside of this is, the bigger the documents you embed, the less, um, the concepts in this whole embedding are similar to your query.

[01:21:47] Alex Volkov: And we've all talked about how this kind of similarity is very interesting, because an embedding definitely has dimensions, but it's hard to figure out how a huge document embeds into one, how should I say, how it averages with everything that happens in there. And the benefit here of cold bear.

[01:22:06] Alex Volkov: Finally, I'm pronouncing this correctly. Colbert is that instead of embedding one time, it embeds per token. And am I getting this correctly? That sounds to me like a lot of compute. Is that correct? Embedding per token sounds okay, now we can compare each token from the query to each token of the document.

[01:22:24] Alex Volkov: But is it significant overhead in terms of computation time, compute? What's the downside? It sounds better on the surface.

[01:22:32] Benjamin Clavie: So yeah,

[01:22:33] Alex Volkov: Go ahead, Benjamin, please. Yeah.

[01:22:35] Benjamin Clavie: Your clarification was quite clear, yeah, it's very clear. The problem with single-vector representation is you've got a long document, and you're essentially asking the model to be like, I'm going to squeeze in every single thing there is to know about this document into 500 floats or something, which is not a lot of space.

[01:22:54] Benjamin Clavie: But ColBERT takes more storage space, to answer your question, like you will need to store more token vectors, even though there are compression techniques, and we'll get into that later. But compute-wise, it's essentially the same, because when you're using any sort of transformer model, you'll be attending to every token anyway.

[01:23:09] Benjamin Clavie: The only difference is Colbert actually stores those, instead of just averaging them at the end.

[01:23:15] Alex Volkov: Oh, so the, on the output of something like Colbert, you actually get all of the [01:23:20] embeddings per token and not just one embedding per the whole document. And then you can, it's like the storage is higher, but you can actually use those for more, better, higher quality comparisons. That's what we're talking about here.

[01:23:33] Alex Volkov: Is that correct?

[01:23:35] Benjamin Clavie: That's the gist of it, yeah. And then after ColBERT you've got ColBERT v2 and PLAID, which is essentially Omar and team finding out that, yeah, that does take a lot of space, but can we compress the embeddings? So most of the time when you see ColBERT used in production, it actually compresses every single token vector to just one or two bits.

[01:23:56] Benjamin Clavie: So it doesn't take that much space.

[01:23:58] Alex Volkov: Oh, so ColBERT v2 is, what, a 10x size reduction or something in comparison, right? Something like this. Connor, can you speak about this? Cause obviously you're in the vector database space. The more folks host, the better it is for you guys, cause you get paid per token. Can you just speak about the size of this, and the improvement as well?

[01:24:20] Connor Shorten: There's a couple of ways you can do this quantization. The most common is just to run k-means on segments: you divide the vectors, and every two contiguous values you cluster, and then reduce the precision to, like, eight bits. So when you quantize the token vectors, you can take down the storage overhead a lot. But yeah, I think Benjamin already said it all.
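
What Connor describes is essentially product-quantization-style compression; a generic sketch of the idea (the segment size, code count and sklearn k-means are illustrative, this is not PLAID's or Weaviate's actual code):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_codebooks(vectors: np.ndarray, segment_size: int = 2, n_codes: int = 256):
    """Fit one k-means codebook per contiguous segment of the vector."""
    dim = vectors.shape[1]
    return [KMeans(n_clusters=n_codes, n_init="auto").fit(vectors[:, s:s + segment_size])
            for s in range(0, dim, segment_size)]

def quantize(vectors: np.ndarray, codebooks, segment_size: int = 2) -> np.ndarray:
    """Replace each segment with the index of its nearest centroid (256 codes fit in 8 bits)."""
    codes = [cb.predict(vectors[:, i * segment_size:(i + 1) * segment_size])
             for i, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

# Toy usage: 10k token vectors of dim 128 -> 64 uint8 codes each instead of 128 float32 values.
token_vecs = np.random.randn(10_000, 128).astype(np.float32)
books = fit_codebooks(token_vecs)
print(quantize(token_vecs, books).shape)  # (10000, 64)
```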

[01:24:43] Alex Volkov: Okay, so now let me take this into the practical realm because Colbert, the original paper came out in 2020 and I don't remember this off the top of my head, but the way I'm reading, I have some mental documentation here that I'm using to ask you guys the proper questions. And then Colbert V2 came out and a significant compression of the data because they quantize the actual individual embeddings and performance is essentially the same, I assume.

[01:25:06] Alex Volkov: And then, it also came out a while ago, and then, Benjamin, I think you're single-handedly in charge of the resurrection, or the renewed interest, because all of what we're saying doesn't sound super easy to me. As somebody who's just like, okay, it's super easy for me to use a vector database, like Weaviate, other competitors, local vector stores, they all have very simple tutorials for me to just embed the query, do a regular nearest-neighbor search or whatever, and then just do this for the user.

[01:25:34] Alex Volkov: Now, all of what we're talking about, embedding per token, token-level comparison, all of these things sound complex to me, and that's where RAGatouille comes in, correct? So can you talk about, you see all this happening, and then what's your library doing, and why is it in charge of the resurrection of this whole concept?

[01:25:53] Benjamin Clavie: Yeah, I don't know if I'd go as far as resurrection, but yeah, ColBERT is basically used by everyone who is quite aware of search, like pretty much every search startup, people at Google, etc. are using ColBERT, but it didn't get that big outside the power user area. And the reason, I think, is something that Omar mentioned the other day: I wouldn't say ColBERT itself isn't usable, but it's not approachable.

[01:26:16] Benjamin Clavie: If you go look at the repo, it's scary. There's a lot of things. How do I store those vectors, et cetera. And the point of RAGatouille is trying to bridge that gap, because we are now at the point, I think, where AI has users that aren't like traditional AI power users, especially in IR. Vectors are complicated.

[01:26:33] Benjamin Clavie: Embeddings are complicated. And the point of RAGatouille was basically like, yeah, but what if you could use ColBERT in just, like, 4 lines of code? And I tried to build that, and it turned out to be quite easy to build, so that's how it came to be.
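
For the curious, those "few lines of code" look roughly like this; the snippet follows the RAGatouille README of the time, so treat the exact arguments as subject to change:

```python
from ragatouille import RAGPretrainedModel

# Load the off-the-shelf ColBERTv2 checkpoint.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Build an optimized (compressed) ColBERT index straight from raw documents.
RAG.index(
    collection=[
        "ColBERT keeps one small vector per token instead of one vector per document.",
        "ThursdAI is a weekly AI news show hosted by Alex Volkov.",
    ],
    index_name="thursdai_demo",
)

# Query it -- returns the top passages with their late-interaction scores.
print(RAG.search("what does ColBERT store for each document?", k=2))
```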

[01:26:46] Alex Volkov: So you built it, it's quite easy for you. What is it? Is this like a library wrapper on top of the knowledge of how to run ColBERT in production? What is the library like? Is this the LangChain for ColBERT? Tell us what folks are to expect when they open it up and say, okay, I need to use something like this.

[01:27:03] Alex Volkov: This is super interesting. This is higher quality retrieval. How do I start?

[01:27:09] Benjamin Clavie: Yeah, so I think there's two things here, that's where I would like it to be, and where it currently is. Where I would like it to be is to keep adding more stuff and basically bridge the gap between what's popular in IR research or retrieval, which is probably a few years ahead of what's actually popular in the mainstream because it's quite obscure.

[01:27:26] Benjamin Clavie: And then what it is right now, like when you open the tool, is basically there's two main classes: one that you can use to fine-tune and train ColBERT models, and hopefully more late-interaction models, but right now it's just ColBERT. And it tries to abstract away all the hard stuff; there's a thing called hard negatives, when you're training for retrieval, and you need to mine for hard negatives, and that's done in the background.

[01:27:48] Benjamin Clavie: And then you've got the main one, which you can use to use ColBERT as a re-ranker, or use ColBERT to encode documents in memory, or use ColBERT to create an optimized ColBERT index, which does the compression, etc. So it's basically, yeah, give it your documents, it will process them, and then you end up with something you can play with.

[01:28:04] Alex Volkov: Just from the perspective of somebody that hasn't used this model so far: let's say I already have an existing vector database. I need to re-embed everything in there to start using ColBERT with RAGatouille, and that's what you mean by fine-tune? Or is there an additional thing that's called fine-tune?

[01:28:20] Alex Volkov: 'cause this is not like the LLM fine-tune that we've talked about here on ThursdAI multiple times. This is a different fine-tune. What are we fine-tuning? How long does it take? Does it need GPUs? If you don't mind, walk us through how easy this is for the user to do.

[01:28:36] Benjamin Clavie: Yeah, that's a good question. So it's actually quite similar to LLM fine-tunes, just on a much smaller scale, because you would actually be fine-tuning the model itself. There's another paper by Omar and team, Omar is everywhere in this space regardless. There's another paper by Omar and team called UDAPDR, which is actually a combination of using DSP, so the proto-DSPy,

[01:28:59] Benjamin Clavie: with ColBERT, to fine-tune ColBERT to any unknown domain. So for any new domain, you could technically get a much better retrieval model using that. Right now there's only one implementation; that's something we would like to have in RAGatouille. But yeah, the other question is, can you use your existing vectors with this?

[01:29:17] Benjamin Clavie: The answer is no, and that's quite annoying. And when I say fine-tune, I also mean you can fine-tune the model, but you can also just use ColBERT off the shelf and use that to embed your documents and create a new index. If I have to speak of the cons, I would say there's no vector DB, except Vespa, which I don't think qualifies as a modern vector DB in the sense we probably mean here, that can use ColBERT embeddings out of the box.

[01:29:41] Benjamin Clavie: I know there's interest, maybe Connor, you guys will support it at

[01:29:44] Connor Shorten: some point soon. Yeah, we're definitely working on it. I do think that you've maybe understated the contribution of RAGatouille. Before you did this, it was not easy to train your own ColBERT model, and it definitely wasn't something that we saw as frequently.

[01:30:03] Connor Shorten: It was like, yeah, I think that you've definitely evangelized it. I don't necessarily agree that most people doing search were doing it this way. Maybe I've just opened a thing, but I think most people have been doing the kind of pooled-vectors thing and this is very new. But yeah, we are working on adding it.

[01:30:22] Alex Volkov: From my perspective, just judging by the social feeds, I agree, Benjamin; without your work on it I don't think I would have even been interested. But I want to maybe ask Connor here as a follow-up. So, RAGatouille, you see it blowing up, like what piques your interest in how approachable this is?

[01:30:36] Alex Volkov: What does fine-tuning a ColBERT model mean for retrieval? You guys are researching every retrieval technology out there as much as possible in order to bring this to your users as well. Quality of retrieval is of very high importance, but so is storing these vectors in different vector databases.

[01:30:54] Alex Volkov: What do you see in RAGatouille, like, exploding, and how does this translate into people using RAG better?

[01:31:05] Connor Shorten: Yeah, I guess it's definitely just, I think, what I opened with, this kind of retrieve-and-re-rank, collapsing it into one thing. And I think it's really just been explained really well. I agree with you, Alex. I don't think I would have understood ColBERT as well as I do now if it wasn't for Benjamin and RAGatouille.

[01:31:21] Connor Shorten: So that's what I think, but under the hood, I think it's still like this re-ranking thing, where we still use the pooled vector and an HNSW search to surface the candidates, and then we'll now bring the other token vectors with it.

[01:31:35] Connor Shorten: And then, for Weaviate, that just means opening up, like having a more generic type [01:31:40] for how we store vectors; instead of just one vector, now we have this, like, an open interface, to let you still use the pooled vector, because pooled-vector embedding search is still very popular as well.

[01:31:51] Connor Shorten: The OpenAI embedding. I think the Matryoshka thing, maybe we could talk about that as well. I think that has some flavors of this. I'm not sure if it still has the same kind of hierarchy to it. But I think there's also, maybe I'm going off topic, but there's also a paper from DeepMind about semantic IDs.

[01:32:06] Connor Shorten: And so semantic IDs, they're like this like hierarchical, discrete quantized things where it'd be like you Like at the, say you have three, three IDs and they're each eight bits and the first one would be like whether it's about sports or news or something like that. So there's definitely a, yeah, this is definitely like a newer thing, I would say.

[01:32:25] Connor Shorten: And I hope I answered the question. I think I just did like a circle around.

[01:32:28] Alex Volkov: No, you did, definitely. I just want to touch on a concept that may not be familiar for folks here on the ThursdAI stage. Matryoshka embeddings came onto my radar just recently, after OpenAI released their new embedding models. And one of the things they've added in their new embedding models is the ability to reduce dimensions via an API call.

[01:32:45] Alex Volkov: And people started thinking like, hey, how did they do this? Because usually, when you get an embedding model, you get a fixed number of dimensions. And then some folks started pointing to this paper called Matryoshka embeddings; Matryoshka, if you're not visualizing what this is, it's like the Russian dolls thing, where one fits into another.

[01:33:00] Alex Volkov: And there's this paper, and I think the author of Matryoshka embeddings is on my Reddit as well, maybe we'll get him on ThursdAI, and it actually allows for significantly smaller embeddings, correct me if I'm wrong. And I think folks from Jina definitely talked about trying to train Matryoshka with some other stuff.
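
The practical trick with Matryoshka-style embeddings is that the first N dimensions are trained to stand on their own, so "reducing dimensions" is mostly truncating and re-normalizing. A toy sketch, assuming the vector came from a model actually trained with Matryoshka representation learning (plain embedding models don't survive this):

```python
import numpy as np

def shrink_embedding(full_vec: np.ndarray, keep_dims: int = 256) -> np.ndarray:
    """Keep only the first `keep_dims` dimensions of a Matryoshka-trained embedding, then re-normalize."""
    truncated = full_vec[:keep_dims]
    return truncated / np.linalg.norm(truncated)

full = np.random.randn(1536)         # stand-in for an embedding from a Matryoshka-trained model
small = shrink_embedding(full, 256)  # ~6x less storage, most of the retrieval quality retained
print(small.shape)                   # (256,)
```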

[01:33:17] Alex Volkov: So this is a new concept we haven't touched upon yet, but it could potentially be an additional competitor here. I want to scroll back real quick. We have Benjamin back. Benjamin, let's talk about the speed of this for larger document sets. Definitely what I learned about RAGatouille, but also about ColBERT, is about larger document sets.

[01:33:36] Alex Volkov: I saw something, I think from Omar about just like millions of rows or something significantly faster. Could you speak about like the speed of this whole thing? Are we getting like an improvement significantly for speed? Like why would a person who already has a setup consider switching to something like this?

[01:33:51] Alex Volkov: And let's talk about the seconds it takes to run through like a bunch of documents. to find similarities.

[01:33:59] Benjamin Clavie: Okay, so I did miss a few things, so it might have been said already, but there's a trade-off here, in that creating a ColBERT index, as in an optimized one using quantization, like Connor said, is quite slow, like pretty slow, because it has to run k-means on all your embeddings, etc. But the flip side of that is that once your documents are in an optimized index, querying is pretty much constant time; it doesn't matter if you've got 100 million documents or billions, it will take about 50-60 milliseconds, and that's because the indexing optimization step, I think, creates a bunch of centroids that you can use as a gateway to documents, to simplify things.

[01:34:40] Benjamin Clavie: So querying is pretty much constant, and that's a big pro of optimized ColBERT indexes. I think that's worth caveating, because it also means that adding to and deleting from a ColBERT index is very slow, because you need to recompute that. And I think there's space here for some sort of hybrid approach, also using HNSW for smaller collections, because you don't need that sort of optimization if you've got, like, 10,000 documents or something.

[01:35:04] Alex Volkov: Interesting. Just for my understanding, this is very similar to pre-compilation of some stuff versus runtime execution. You're saying you can basically offload the compilation part, and your users will not suffer from this, right?

[01:35:20] Alex Volkov: You don't have to go and call different APIs for this. If you're able to do this, then you precompile everything, and the benefit here is larger indices, like significantly larger document stores; you're talking about millions or a hundred million or so. But then retrieval is near real-time, like instant, just milliseconds.

[01:35:41] Alex Volkov: That's, I think, a crazy benefit for folks, especially in enterprises and different places where, yeah, I think it's a significant improvement over regular search and vector comparison. Connor, would you say so as well? Because you guys are in the business of vector comparison and bringing people in.

[01:36:00] Alex Volkov: Are you seeing like a significant improvement from a retrieval speed here.

[01:36:08] Connor Shorten: Yeah, I think the latency probably isn't too bad, because the way that I understand ColBERT, or Colbert, sorry, I would agree on Colbert, is that you still have the top-100 search with HNSW, and that latency is pretty low. It's gonna be like five milliseconds at a million scale.

[01:36:25] Connor Shorten: That's like the most hand-wavy thing ever, but then you just bring these quantized vectors into memory to re-rank. It's way faster than the cross-encoder approach, where you're going to take those top 100 results, append them to the query, and send them to an inference container to get back the scores and sort them.

[01:36:39] Connor Shorten: So it's way faster than that. I think maybe one thing out of what you just said that I'd want to parse is I don't think it's the same analogy as compile it or compose it at runtime. It's maybe more so like an asynchronous kind of thing where you can query the index that you currently have and then in the background, the index can start doing that k means quantization.

[01:37:00] Connor Shorten: That's probably the slowest thing of as Benjamin just mentioned. Like that quantizing the token vectors and now we're, let's say we're I'm actually not familiar with the detail of exactly how many token vectors you're keeping for document, but let's say it's 512, right?

[01:37:14] Connor Shorten: And now you're going to be running k-means over each of those in parallel, and then you also are trying to multi-thread the per-segment codebook. So I think fitting that codebook is going to be your challenge, and then keeping that fresh, because these codebooks, if that's the way you're doing it... The thing about Matryoshka is, like, maybe you can get the quantized vectors out of the box with one of the embedding models, but the quantization schemes are pretty dependent on your data particularly. It's not like the embedding models that you get from the common APIs come with the codebooks.

[01:37:53] Connor Shorten: You have to fit these code books to your data. So I think the way to think about it would be that we can fit these code books like asynchronously in the background and you can query what you currently have and then, the updating and having the refresh indexing that can happen with a cycle kind of way.

[01:38:10] Alex Volkov: All right. I wanna maybe move towards, okay, let's say folks are interested in trying this. Benjamin, could you speak about how to? Is RAGatouille the right place to start? Do they have to? I think you mentioned this briefly, I just want to return to this. Is this only significantly better for a large set of documents?

[01:38:28] Alex Volkov: What are the steps to getting started here and what people should know? And then I guess we'll ask about if where to find you guys and how to keep up to date with as these developments around this area happen.

[01:38:43] Benjamin Clavie: So if you want to get started, I think RAGatouille is probably definitely the easiest way to try ColBERT. We've got a few example notebooks on the GitHub repository. If you want to contribute more, please do. That's the big thing: I need more documentation, more notebooks. But you can try re-ranking, or indexing in memory, or building your index.

[01:39:01] Benjamin Clavie: It's got fine-tuning pretty much out of the box, so I'd say start there. In terms of retrieval performance, ColBERT is always a really strong performer in the existing IR literature, and we do have a re-ranker, so you can just try it out, just use it to re-rank before you commit to indexing your whole set of documents, just to see how it would perform for you.

[01:39:21] Benjamin Clavie: So that could be an easy way to slot it into any existing pipeline, basically: just retrieve documents, re-rank them, and see what the re-ranker does for you.
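
Benjamin's "try the re-ranker first" suggestion, sketched against RAGatouille's documented rerank call (method name and arguments per its README at the time; double-check against the current version):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Candidates from whatever retrieval pipeline you already have (BM25, a vector DB, etc.).
candidates = [
    "ColBERT stores one compressed vector per token.",
    "Whisper is a speech recognition model from OpenAI.",
    "Late interaction compares query tokens against document tokens at search time.",
]

# Re-rank the existing candidates with ColBERT -- no index build required.
print(RAG.rerank(query="how does late interaction scoring work?", documents=candidates, k=2))
```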

[01:39:29] Alex Volkov: And in that case, I think integrations with existing libraries also exist, for folks who use LangChain or LlamaIndex. I saw that they also integrate at least some parts of this, correct?

[01:39:40] Benjamin Clavie: Yeah, and I do want to thank them for that, because they basically did this within 24 hours of me releasing RAGatouille. On LlamaIndex you can use ColBERT indexes, and on LangChain you can use ColBERT indexes and you can use ColBERT's re-ranker as well. So if you already use LangChain, you can add an extra ColBERT step using [01:40:00] RAGatouille in three more lines of code, I think.

[01:40:02] Alex Volkov: Incredible. So for folks who are interested in trying out what the big dogs use for search, re-ranking is a very easy, like, without committing, re-ranking is a fairly easy way to get started with this and see if you get a significant performance boost. And Connor, we barely touched on DSPy.

[01:40:19] Alex Volkov: I do want to have a conversation about it, because that's also all over my feed, and basically Omar is all over my feed. And could you, let's say, how does this all connect with DSPy, or does it? Because DSPy is for the prompts area, and this is more for the retrieval area. Where's the connection point that I'm missing, besides Omar being everywhere?

[01:40:39] Connor Shorten: I think that, oh, I think Omar being everywhere is maybe the biggest connection, because to me it's kinda like DSPy is optimizing the LLM program prompt part. And then, to have the optimization loop connect between that and the retrieval model, there are definitely works like propagator in pairs.

[01:40:59] Connor Shorten: Omar has, I think, UDAPDR, something like that, where you use the LM to generate synthetic queries, then you fine tune the embedding model with that. So that's that would be where the connection would be, DSPy is like a synthetic data framework, you tell it what you want it to do, and it will use the LLMs to generate successful executions of the task, and then you use that to distill it to either small models, or to tune the prompts, or you could fine tune an embedding model.

[01:41:25] Connor Shorten: I don't think it's quite, I think that would be pretty advantageous. Benjamin can take the mic from here.

[01:41:32] Benjamin Clavie: Yeah, I wouldn't say DSPy and ColBERT are directly related. They exist in the same space, but they're definitely very different tools. Like Connor mentioned, UDAPDR, which is the paper I mentioned, actually, is where you use DSP, and hopefully soon DSPy, to fine-tune a ColBERT to any domain.

[01:41:50] Benjamin Clavie: Any domain it's never been exposed to before, and get it to a state-of-the-art result on that domain. That's a really good application of DSPy to ColBERT. And likewise, you can use ColBERT as a retriever in your DSPy pipeline, but it's just a component, it's not quite the DSPy thing.

[01:42:08] Connor Shorten: I do have something, though, that is very related to retrieval generally.

[01:42:12] Connor Shorten: Which is: we saw all these amazing LLM query router things. I want to give LlamaIndex credit for evangelizing most of this stuff. So one example is, say you have the LLM pick a metadata filter to put on the vector search. Like, say you have an index of podcast clips and you want to search only where the speaker is Omar Khattab, and you have an LLM predict that filter, and then that would be in the retrieval engine.

[01:42:38] Connor Shorten: And so you have a prompt behind that, same with text-to-SQL: there's a prompt behind how we put these things around retrieval. And so DSPy can optimize the prompts, or optimize the models that do that, to get the maximum performance out. And I don't mean to say anything negative about the existing frameworks, but right now you're locking into the prompts that they have built into the framework to do these things, whereas DSPy opens it up to optimize it for your thing.

[01:43:06] Alex Volkov: Interesting. Yeah, I don't think it's negative necessarily. I think after using some of these frameworks people understand that, and we've seen this from multiple folks: they could potentially start with something like LlamaIndex or LangChain and then quickly figure out that some more

[01:43:20] Alex Volkov: freedom is needed, and DSPy is a potential way to do that. Okay, Connor, anything else? Very interesting. So first of all, you have a bunch of great content on this. You recently did, I think it's pinned to the top of your tweets, I'll definitely add this to the show notes as well.

[01:43:32] Alex Volkov: You did a deep dive into DSPy on your, was that on the podcast or was it just a video? Definitely we'll send folks there. Anything else you want to add about how to find you, where to find your content? Definitely folks should follow you. First of all, we'll add your things.

[01:43:48] Connor Shorten: Thanks, Alex. Yes, I have two podcasts right now, with Omar, of course, and then with Karel D'Oosterlinck, who created this Infer-Retrieve-Rank program. It's one of the coolest examples of DSPy. And yeah, and then I have one video out so far explaining the whole thing. Quickly, I wanted to point people to the update to DSPy Assertions.

[01:44:05] Connor Shorten: Because I think this is the most important thing with these prompting frameworks. And I think it's important to also understand Instructor from Jason Liu, which is where you use Pydantic to define the schema of the outputs that you want from the language model, and then you validate the outputs to make sure that it outputted JSON with the keys that you wanted.
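
For anyone who hasn't seen Instructor, the pattern Connor is describing looks roughly like this (based on Instructor's documented patch/response_model API at the time; the Speaker schema is just an example):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the OpenAI client so chat completions accept a `response_model`.
client = instructor.patch(OpenAI())

class Speaker(BaseModel):
    name: str
    affiliation: str

# The output is validated against the Pydantic schema; on failure Instructor can retry.
speaker = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Speaker,
    max_retries=2,
    messages=[{"role": "user", "content": "Connor Shorten is a research scientist at Weaviate."}],
)
print(speaker.name, "-", speaker.affiliation)
```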

[01:44:23] Connor Shorten: And so DSPy Assertions is in this similar category, and the most common discussion I'm seeing in the DSPy Discord is people looking to add Instructor to DSPy, jointly looking to do this thing of structured outputs with a retry mechanism. There's new work from Arnav Singhvi.

[01:44:43] Connor Shorten: We haven't met yet, but he knows way more about DSPy Assertions. And I'm going to link it in the description of this chat, cause I highly recommend people check it out.

[01:44:50] Alex Volkov: Awesome. Nisten, just before I get to your question, I will shout out that Jason Liu from the Instructor library came to the Weights & Biases courses, and there's a course that he built with us as well that's free. You can just go to wandb.courses. I'll definitely add the link below; it's about structured output and how to force these LLMs to give us better structured output.

[01:45:09] Alex Volkov: It's funny that a person named Jason is building tools to get LLMs to output JSON. But that's all I have. Just super quick, Nisten, go ahead, you had a question here.

[01:45:19] Nisten Tahiraj: I just want to say it's pretty amazing that the people we bring here are from the industry, building the stuff we actually use. Like, from last week, I started using Lilac, and I might actually start running RAGatouille on that Hacker Neon dataset. And so I wanted to ask, and mainly since some people ask in the comments what I have used, I forced myself to only use open source models.

[01:45:45] Nisten Tahiraj: And cause I feel like that's the only way they're going to start getting better, if we restrict ourselves to them. I don't recommend you do it just yet, just wait another, maybe, a week or two. But I wanted to ask: we see some limitations with retrieval augmentation systems, like in GPT-4 when people use it.

[01:46:07] Nisten Tahiraj: It only gives three points from the document, doesn't really summarize it and stuff. What are the benefits of going with the Colbert? I'm sorry. Is it because it's much faster? Can you feed it many more documents? I'm talking from a practical point of view, not necessarily even from a tech person's point of view, like as a business who has a lot of customer data why should they use this versus just putting it on pgVector and doing function calling?

[01:46:41] Nisten Tahiraj: Is it faster that way? And what limitations does using, again, RAGatouille with ColBERT

[01:46:47] Benjamin Clavie: have? That is a good and open question. So limitations we have a lot right now, like the lack of Cloud hosting offering is a big one. There's not really somewhere you can host this except doing it yourself, which is a big problem.

[01:47:05] Benjamin Clavie: And the main reason to use it, I would say, is generalization, because the thing with any of the off-the-shelf embedding models is they look good on benchmarks, and they tend to work quite well, but they've been optimized for those benchmarks. Whereas ColBERT, for instance, ColBERT v2, has never been trained on the MTEB retrieval benchmarks, etc.

[01:47:24] Benjamin Clavie: The reason it generalizes well is because working at the token level makes it a lot easier for your model to encode information. Whereas, when you're trying to squeeze everything into a single vector, it might very well not work very well, say, for your custom domain. With ColBERT, you can always assume it's going to be okay in every domain; it might not be the best, and then you'd need to fine-tune it later.

[01:47:45] Benjamin Clavie: It's probably the biggest draw, I'd say.

[01:47:51] Alex Volkov: Awesome. So I definitely wanna thank you guys for coming up and explaining these concepts that have been floating around in very simple language. And I appreciate your patience with me re asking this in the way that I understand, because I know definitely that this is my way to understand, but also some folks in the audience.

[01:48:06] Alex Volkov: That's how we do here on ThursdAI, so more than welcome to rejoin. For I now consider both of you friends of the pod, so I agree with Nisten. It's really cool to see the authors of the libraries and the tools that we use. Come here to ThursdAI to talk about them, [01:48:20] and obviously, upcoming features as well.

[01:48:22] Alex Volkov: Definitely welcome. Benjamin, thank you for doing a bunch of open source stuff, and for evangelizing the whole ColBERT, col-bear thing to make it simpler for folks. Definitely, thank you. And is there anything you want to add here that I haven't touched yet? Please go ahead, Benjamin.

[01:48:36] Benjamin Clavie: I do have a few shoutouts, shall we say. One of them is that LangChain and DSPy are not mutually exclusive, and I shared that in the chat. There is now a LangChain x DSPy integration, where you can define your chains in LangChain and still use DSPy to optimize things, which is pretty cool.

[01:48:53] Benjamin Clavie: And in the embedding world, you mentioned Matryoshka embeddings, and we talked about ColBERT, and the people at Jina are actually training a ColBERT model right now using Matryoshka embeddings for compression, as some sort of let's-try-this-out, see how it works. And the final one is, you might have brought this up already, but the people at BAAI trained BGE-M3, a really cool embedding model that in a single pass outputs

[01:49:19] Benjamin Clavie: a dense vector, a BERT or ColBERT-style multi-vector representation, and a SPLADE-style sparse representation. I won't go into too much detail about that,

[01:49:26] Alex Volkov: I'm sorry. I don't think I covered that. Who was that? Sorry. Could you repeat?

[01:49:31] Benjamin Clavie: The people at BAAI, the people who do the BGE

[01:49:34] Alex Volkov: Oh yeah, but yeah. We've talked about their model recently. They,

[01:49:37] Benjamin Clavie: ABI, yeah,

[01:49:38] Alex Volkov: Oh, I did not know.

[01:49:39] Alex Volkov: So they now have a thing that outputs a regular embedding and also a ColBERT-style embedding.

[01:49:46] Benjamin Clavie: Yeah, the big thing last week was M3, which has a ColBERT-style embedding, a SPLADE-style embedding, which is a sparse representation method, and a dense embedding, all in a single model, a total of three.

[01:49:57] Alex Volkov: Oh, that's incredible. Okay. So we're adding some knowledge here. Thank you for, let me just repeat just the way that I hear this, we've talked about the BAAI BGE M3. M3 basically stands for multiple things. One of them is multilinguality. So they upgraded their embeddings to use not only English, but also I think a hundred languages as well.

[01:50:14] Alex Volkov: So now, Benjamin, you're saying they're also implementing this step for us: the output is the dense embedding, but also the ColBERT embedding, correct?

[01:50:25] Benjamin Clavie: yeah, yeah, one of the meanings of M, I think, is

[01:50:27] Alex Volkov: Multicomposability or some con yeah. Multifunctionality. Yes, exactly.

[01:50:33] Benjamin Clavie: can use it to generate different kinds of embeddings. And I think it's the first non-ColBERT, actually, like, strong multi-vector model. There are issues, as in the vectors are too big, etc.

[01:50:45] Benjamin Clavie: But it's a very nice thing to see happen. Definitely, like

[01:50:49] Alex Volkov: Oh, definitely a shoutout, then. We need to get the folks from BAAI here to speak about this, so if you folks know them, definitely connect them to me. I would love to hear from the authors of BGE. Yeah, and definitely shout out Jina. I think Bo Wang, we've mentioned, he's a friend of the pod.

[01:51:03] Alex Volkov: He came on when Jina released their embeddings, and he often comes here and gives us insights about how embeddings work. Shout out Bo and the team at Jina as well. Connor, your stage, if you want to add anywhere else folks can follow you, or shout out anything else. And then we're going to continue with some more news.

[01:51:21] Connor Shorten: It's been so cool to be a part of the podcast. And I love how it's integrated into X because this is actually my favorite place to manage communication. So if you want to reach out, here would be great.

[01:51:31] Alex Volkov: Yeah. So definitely give Connor a follow, and the Weaviate podcast is incredible. We've been, by we, Weights & Biases, we had a mutual video together, and Connor hosted our folks. I learned a bunch from it before I joined Weights & Biases as well. A great source of information from both of you.

[01:51:45] Alex Volkov: Thank you guys so much for coming up and explaining these complex-on-the-surface concepts to us, maybe complex implementation-wise as well, but making them simpler too. I think it's very important to talk about them, and you are now considered friends of the ThursdAI community, and hopefully this will get more folks to learn about this, contribute, etc.

[01:52:05] Alex Volkov: And I think with that, we're like, a bit over the top, like two hours since I started the recording. We had a great show today. Thank you everybody for listening and coming. I just wanna summarize this in a few notes that that I really enjoy my time here every week. And I really enjoy learning from folks. I think Nisten, you mentioned today that it's so cool to have the authors of the things we talked about.

[01:52:25] Alex Volkov: So today we also had this benefit. We had Benjamin here and we had Connor who covered this. And we also had Justin again from the Qwen team to talk about the Qwen stuff that they released. And it's really cool that the community now connects different people.

[01:52:36] Alex Volkov: So I was able to connect Justin and the Qwen team with the LM Studio folks and the Ollama folks. No, I think only LM Studio. And they were able to work together so that their release is now supported in LM Studio the second they release something. So I love how this community comes together. I encourage everybody who listens to this to also participate in this.

[01:52:55] Alex Volkov: Either follow everybody who's on stage here interact with our posts and boost the signal a little bit. Tell your friends if you're working with friends and they don't listen to ThursdAI. And there's alpha in listening to ThursdAI like today definitely tell your friends where this alpha can be found.

[01:53:10] Alex Volkov: And with that, I want to thank you all and have a nice Thursday. Bye bye, everyone.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 09 Feb 2024 01:25:42 GMT
📖 ThursdAI - Sunday special on datasets classification & alternative transformer architectures

Hello hello everyone, welcome to another special episode (some podcasts call them just.. episodes I guess, but here you get AI news every ThursdAI, and on Sunday you get the deeper dives)

BTW, I'm writing these words, looking at a 300 inch monitor that's hovering above my usual workstation in the Apple Vision Pro, and while this is an AI newsletter, and I've yet to find a connecting link (there's like 3 AI apps in there right now, one fairly boring chatbot, and Siri... don't get me started on Siri), I'll definitely be covering my experience in the next ThursdAI, because well, I love everything new and technological, AI is a huge part of it, but not the ONLY part!

📖 It's all about the (big) Datasets

Ok back to the matter at hand, if you've used, finetuned, trained or heard about an AI model, you may or may not realize how important the dataset the model was trained with is. We often talk of this model, that model, and often the only difference is additional data that folks (who I sometimes refer to as alchemists) have collected, curated and structured, and creating/curating/editing those datasets is an art and a science.

For example, three friends of the pod, namely LDJ with Capybara, Austin with OpenChat and Teknium with Hermes, have been consistently taking off-the-shelf open source models and making them smarter, more instruction tuned, better for specific purposes. These datasets are paired with different techniques as well; for example, lately the so-called DPO (Direct Preference Optimization) is a technique that has shown promise, since it not only shows a model which answer is correct for a specific query, it shows an incorrect answer as well, and trains the model to prefer one over the other. (see the recent Capybara DPO improvement by Argilla, which improved model metrics across every evaluation)
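
Concretely, a DPO dataset is just rows of (prompt, preferred answer, rejected answer); here's a minimal sketch of the data shape, using the prompt/chosen/rejected field names that libraries like TRL expect (the row itself is made up):

```python
# One row of a DPO-style preference dataset: the same prompt with a preferred ("chosen")
# and a dispreferred ("rejected") answer. Training nudges the model to assign higher
# likelihood to "chosen" than to "rejected", relative to a frozen reference model.
preference_row = {
    "prompt": "Explain in one sentence why datasets matter for fine-tuning.",
    "chosen": "The dataset defines what behaviors and knowledge the fine-tuned model actually learns.",
    "rejected": "Datasets don't really matter, the base model already knows everything.",
}

preference_dataset = [preference_row]  # in practice: thousands to millions of such rows
```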

These datasets can range from super high quality 16K rows to millions of rows (Teknium's recently released Hermes, one of the higher quality datasets, comes in at just a tad over 1 million rows), and oftentimes it's an amalgamation of several other datasets into one.

In the case of Hermes, Teknium has compiled these 1 million chats from at least 15 different datasets, some his own, some by folks like Jon Durbin, Garage bAInd, and shareGPT from LMsys.org, which was compiled by scraping the very popular sharegpt.com website, from folks who used the shareGPT extension to share their GPT-4 conversations. It's quite remarkable how much of these datasets is just conversations that users had with GPT-4!

Lilac brings Garden

With that backdrop of information, today on the pod we've got the co-founders of Lilac, Nikhil Thorat and Daniel Smilkov, who came on to chat about the new thing they just released called Lilac Garden.

Lilac is an open source tool (you can find it RIGHT HERE) which is built to make dataset creation, curation and classification more science than art, and to help visualize the data, cluster it and make it easily available. In the case of Hermes, that means over a million rows of data.

On the pod, I talk with Nikhil and Daniel about their origins at Google, working on TensorFlow.js and then something called "Know Your Data", and how eventually they realized that in this era of LLMs, open sourcing a tool that can understand huge datasets, run LLM based classifiers on top of them, or even train specific ones, is important and needed!

To strengthen the point, two friends of the pod (Teknium was in the crowd sending us 👍), LDJ and Austin (aka Alignment Lab), were on stage with us and basically said that "it was pretty much the dark ages before Lilac", since something like the OpenOrca dataset is a whopping 4M rows of text.

Visualizations in the Garden.

So what does Lilac actually look like? Here's a quick visualization of the top categories of texts from OpenOrca's 4 million rows, grouped by category title and showing each cluster. You can see here that translation requests make up 66% (around 200K rows) of the translation category, and you can scroll on and on, add filters, and really dissect this whole thing up and down.

The categorization is created by running Lilac on your dataset, which uses embedding algorithms and other neat tricks to quickly chunk the text and put labels on the categories (AKA classifying them).
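For intuition, here's a bare-bones sketch of the general recipe a tool like this automates (this is not Lilac's actual API; the embedding model and cluster count are placeholder choices): embed every row, cluster the embeddings, then inspect and label each cluster.

```python
# Conceptual embed-then-cluster sketch, not Lilac's API.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

rows = [
    "Translate the following sentence to French: ...",
    "Write a Python function that reverses a string.",
    "Summarize this article in two sentences: ...",
    # ...millions more rows in a real dataset
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")           # placeholder embedding model
embeddings = embedder.encode(rows, normalize_embeddings=True)

cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for text, cluster_id in zip(rows, cluster_ids):
    print(cluster_id, text[:60])
```

The hard part at the 4M-row scale isn't the code above, it's making this fast, interactive and browsable, which is exactly what the hosted Garden product is for.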

Btw, you can see this view and play around with it yourself, here

But running this on your own local machine can be a drag and take hours if not days for bigger datasets, sometimes hanging and not even finishing, so the Lilac folks created Lilac Garden, a hosted solution where you provide a dataset and they classify something like 4M rows in 4-5 hours or so.

Which is definitely not possible on local machines. If you're into that kind of thing, again, Lilac is open source, so you don't have to sign up or pay them, but if speed and this view matter to you, definitely check Lilac Garden out!

RWKV with Eugene (Pico Creator)

On the news segment of ThursdAI we mentioned Eagle, which is the 5th version of RWKV, an attention-free, potential alternative to Transformers that's being developed fully in the open. Later in the show we had the honor of having PicoCreator, one of the front-running folks in the RWKV effort, which is an attempt to see if Transformers can be beaten with a different type of architecture (an RNN) that doesn't require the attention mechanism, which brings the problem of quadratic attention scaling, making LLMs harder and more expensive to run the more context is provided.
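As a rough illustration of why that quadratic term hurts at long context (napkin math, not a benchmark of any particular model): the attention score matrix per head grows with the square of the sequence length, while an RNN-style model like RWKV carries a fixed-size state regardless of how much context it has seen.

```python
# Back-of-the-envelope: attention scores per head per layer vs. sequence length.
def attention_scores(seq_len: int) -> int:
    return seq_len * seq_len  # one score per (query, key) pair

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {attention_scores(n):>18,} scores per head per layer")

#   1,000 tokens ->          1,000,000
#  10,000 tokens ->        100,000,000
# 100,000 tokens ->     10,000,000,000   (100x the tokens, 10,000x the work)
```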

Eugene had some technical issues so joined in the middle of the pod, so we didn't have a full deep-dive, however, I figured it's important to bring this info to you guys, as these efforts may yield AI that runs 10-100x cheaper and potentially faster on devices, using almost infinite context lengths.

RWKV and other efforts like StripedHyena (Together AI) and Mamba (from Tri Dao) are worth watching, as they may supersede or join with Transformers to create the next jump in LLM capabilities.

That's all for this Sunday, needless to say, with the Vision Pro releasing on a Friday, it's been a full weekend of future exploration, which is the main driver in my personal life!

P.S - if you read through to here, you get a gift! A teaser: I have done something different on the pod and recorded a human interest x AI podcast for the first time. I mostly bring the news and sometimes deep dives like this one, but this story I couldn't ignore, so stay tuned if you're into dating x AI, how technology disrupts our lives and whether this is all moral or not, as I recorded an episode with Sasha Jadan and his new fiancée Karina, whom his AI bot picked out for him after swiping and matching with over 5200 girls on Tinder. The AI also... suggested he propose, which he did. It was a very interesting conversation that I plan to upload soon!

That's it from me this week, see you all on ThursdAI and don't forget, if you liked this, do me a solid, listen to the pod and then leave a review or a 5 star (at least a 4?) on Apple podcasts 🙏



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Mon, 05 Feb 2024 01:15:10 GMT
ThursdAI - Feb 1, 2024- Code LLama, Bard is now 2nd best LLM?!, new LLaVa is great at OCR, Hermes DB is public + 2 new Embed models + Apple AI is coming 👀

TL;DR of all topics covered + Show notes

* Open Source LLMs

* Meta releases Code-LLama 70B - 67.8% HumanEval (Announcement, HF instruct version, HuggingChat, Perplexity)

* Together added function calling + JSON mode to Mixtral, Mistral and CodeLLama

* RWKV (non transformer based) Eagle-7B - (Announcement, Demo, Yam's Thread)

* Someone leaks Miqu, Mistral confirms it's an old version of their model

* Olmo from Allen Institute - fully open source 7B model (Data, Weights, Checkpoints, Training code) - Announcement

* Datasets & Embeddings

* Teknium open sources Hermes dataset (Announcement, Dataset, Lilac)

* Lilac announces Garden - LLM powered clustering cloud for datasets (Announcement)

* BAAI releases BGE-M3 - Multi-lingual (100+ languages), 8K context, multi functional embeddings (Announcement, Github, technical report)

* Nomic AI releases Nomic Embed - fully open source embeddings (Announcement, Tech Report)

* Big CO LLMs + APIs

* Bard with Gemini Pro becomes the 2nd best LLM in the world per LMSYS, beating 2 out of 3 GPT4 versions (Thread)

* OpenAI launches GPT mention feature, it's powerful! (Thread)

* Vision & Video

* 🔥 LLaVa 1.6 - 34B achieves SOTA vision model for open source models (X, Announcement, Demo)

* Voice & Audio

* Argmax releases WhisperKit - super optimized (and on device) whisper for IOS/Macs (X, Blogpost, Github)

* Tools

* Infinite Craft - Addicting concept combining game using LLama 2 (neal.fun/infinite-craft/)

Haaaapy first of the second month of 2024 folks, how was your Jan? Not too bad I hope? We definitely got quite a show today, the live recording turned into a procession of breaking news, authors who came up, deeper interviews and of course... news.

This podcast episode is focusing only on the news, but you should know that we had deeper chats with Eugene (PicoCreator) from RWKV, a deeper dive into a dataset curation and segmentation tool called Lilac with founders Nikhil & Daniel, and also a breaking news segment where folks from AI2 joined us to talk about their latest open source release 👏

Besides that, oof what a week. It started out with the news that the new Bard API (apparently with Gemini Pro + internet access) is now the 2nd best LLM in the world (according to LMSYS at least), then there was the whole thing with Miqu, which turned out to be, yes, a leak of an earlier version of a Mistral model, and they acknowledged it, and finally the main release of LLaVa 1.6 becoming the SOTA of open source vision models was very interesting!

Open Source LLMs

Meta releases CodeLLama 70B

Benches 67.8% on HumanEval (without fine-tuning), is already available on HuggingChat, Perplexity, TogetherAI, quantized for MLX on Apple Silicon, and has several finetunes, including SQLCoder which beats GPT-4 on SQL

Has 16K context window, and is one of the top open models for code

Eagle-7B RWKV based model

I was honestly a bit disappointed with the multilingual performance compared to the 1.6B StableLM, but the folks on stage told me not to compare this in a traditional sense to a transformer model, but rather to look at the potential here. So we had Eugene from the RWKV team join on stage and talk through the architecture, the fact that RWKV is the first AI model in the Linux Foundation and will always be open source, and that they are working on bigger models! That interview will be released soon

Olmo from AI2 - new fully open source 7B model (announcement)

This announcement came as Breaking News, I got a tiny ping just before Nathan dropped a magnet link on X, and then they followed up with the Olmo release and announcement.

A fully open source 7B model, including checkpoints, weights, Weights & Biases logs (coming soon), the dataset (Dolma) and just... everything you could ask for; they said they will tell you anything about this model. Incredible to see how open this effort is, and kudos to the team for such transparency.

They also released a 1B version of Olmo, and you can read the technical report here

Big CO LLMs + APIs

Mistral handles the leak rumors

This week the AI Twittersphere went ablaze again, this time with an incredibly dubious (quantized only) version of a model called MIQU that performed incredibly well on benchmarks, which nobody expected, and I'm not linking to it on purpose. It started a set of rumors that maybe this was a leaked version of Mistral Medium. Remember, Mistral Medium was the 4th best LLM in the world per LMSYS, and it was rumored to be a Mixture of Experts, just larger than Mistral's 8x7B.

So things didn't add up, and they kept not adding up, as folks speculated that this was a LLama 70B vocab model etc., and eventually this drama came to an end when Arthur Mensch, the CEO of Mistral, did the thing Mistral is known for and just acknowledged that the leak was indeed an early version of a model they trained super quickly once they got access to their cluster, and that it indeed was based on LLama 70B, which they have since stopped using.

Leaks like this suck, especially for a company that ... gives us the 7th best LLM in the world, completely Apache 2 licensed, and it really shows that they dealt with this leak with honor!

Arthur also proceeded to do a very Mistral thing and opened a pull request to the Miqu HuggingFace readme with an attribution that looks like this, with the comment "Might consider attribution" 🫳🎤

Bard (with Gemini Pro) beats all but the best GPT4 on lmsys (and I'm still not impressed, help)

This makes no sense, and yet, here we are. Definitely a new version of Bard (with Gemini Pro), as they call it, from January 25 on the arena, is now better than most other models, and it could potentially be because it has internet access?

But so does Perplexity and it's nowhere close, which is weird, and it was a weird result that got me and the rest of the team in the ThursdAI green room chat talking for hours! Including getting folks who usually don't reply, to reply 😆 It's been a great conversation, and where we finally left off is: Gemini Pro is decent, but I personally don't think it beats GPT4. However most users don't care about which model serves what, rather which of the 2 choices LMSYS has shown them answered what they asked. And if that question has Google search power behind it, it's likely one of the reasons people prefer it.

To be honest, when I tried the LMSYS version of Bard, it showed me a 502 response (which I don't think they include in the ELO score 🤔) but when I tried the updated Bard for a regular task, it performed worse (in my case) than a 1.6B parameter model running locally.

Folks from Google replied and said that it's not that the model is bad, it's that I used a person's name, and the model just... refused to answer. 😵‍💫 When I removed the last name it did perform OK, nowhere near close to GPT 4 though.

In other news, they updated Bard once again today, with the ability to draw images, and I'm sorry if this turns out to be a negative review but, again, Google, what's going on?

The quality of this image generation is subpar, at least to me and other folks. I'll let you judge which image was created with IMAGEN (and trust me, I cherry picked) and which one was DALL-E for the same exact prompt

This week's Buzz (What I learned with WandB this week)

Folks, the growth ML team in WandB (aka the team I'm on, the best WandB team duh) is going live!

That's right, we're going live on Monday, 2:30 PM Pacific, on all our socials (X, LinkedIn, Youtube), as I'm hosting my team and we'll do a recap of a very special week in December, a week where we paused other work and built LLM powered projects for the company!

I really wanted to highlight the incredible projects, struggles, challenges and learnings of what it takes to take an AI idea and integrate it, even for a company our size that works with AI often, and I think it's going to turn out super cool, so you all are invited to check out the live stream!

Btw, this whole endeavor is an initiative by yours truly, not like some boring corporate thing I was forced to do, so if you like the content here, join the live and let us know how it went!

OpenAI releases a powerful new feature, @mentions for GPTs

This is honestly so great, it went under the radar for many folks, so I had to record a video to explain why this is awesome: you can now @mention GPTs from the store, and they will get the context of your current conversation, so you no longer need to switch between GPT windows.

This opens the door for powerful combinations, and I show some in the video below:

Apple is coming to AI

Not the Apple Vision Pro, that's coming tomorrow and I will definitely tell you how it is! (I am getting one and am very excited, it better be good)

No, today on the Apple earnings call, Tim Cook finally said the word AI, and said that they are incredibly excited about this tech, and that we'll get to see something from them this year.

Which makes sense, given the MLX stuff, the Neural Engine, ML-Ferret and the tons of other stuff we've seen from them this year, Apple is definitely going to step in, in a big way!

Vision & Video

LLaVa 1.6 - SOTA in open source VLM models! (demo)

Wow, what a present we got from Haotian Liu and the folks at LLaVa, they upgraded the LLaVa architecture and released a few more models, ranging from 7B to 34B, and created the best open source state of the art vision models! It's significantly better at OCR (really, give it a go, it's really impressive) and they exchanged the LLM backbone with Mistral and Hermes Yi-34B.

* Better OCR and higher res

* Uses several bases like Mistral and NousHermes 34B

* Uses lmsys SGlang for faster responses (which we covered a few weeks ago)

* SoTA Performance! LLaVA-1.6 achieves the best performance compared with open-source LMMs such as CogVLM or Yi-VL. Compared with commercial ones, it catches up to Gemini Pro and outperforms Qwen-VL-Plus on selected benchmarks.

* Low Training Cost. LLaVA-1.6 is trained with 32 GPUs for ~1 day, with 1.3M data samples in total. The compute / training data cost is 100-1000 times smaller than others.

Honestly it's quite stunningly good, however, it does take a lot more GPU resources due to the resolution changes they made. Give it a try in this online DEMO and tell me what you think.

Tools

Infinite Craft Game (X, Game)

This isn't a tool, but an LLM based little game that's so addicting; I honestly didn't have time to keep playing it, and it's super simple. I especially love this, as it uses LLama, and I don't see how something like this could have been scaled without AI before, and the UI interactions are so ... tasty 😍

All right folks, I can go on and on, but truly, listen to the whole episode, it really was a great one, and stay tuned for the special Sunday deep dive episode with the folks from Lilac, featuring our conversation with Eugene (PicoCreator) about RWKV.

If you scrolled all the way until here, send me the 🗝️ emoji somewhere in DM so I'll know that there's at least one person who read this through, leave a comment and tell 1 friend about ThursdAI!



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 02 Feb 2024 01:50:46 GMT
📅 ThursdAI - Sunday special on Merging with Maxime LaBonne

Hey everyone, we have an exciting interview today with Maxime Labonne.

Maxime is a senior Machine Learning Scientist at JPMorgan, the author of the Hands-On GNNs book and his own ML blog, creator of LazyMergeKit (which we cover on the pod), and holds a PhD in Artificial Intelligence from the Institut Polytechnique de Paris.

Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and has previously finetuned OpenHermes using DPO techniques which resulted in NeuralChat7B

For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models.

Model merging

Model merging has been around for a while but has recently been heating up, and Maxime has a lot to do with that: he recently checked, and his wrapper on top of MergeKit by Charles Goddard (which is the library that put model merging into the mainstream), called LazyMergeKit, was behind more than 50% of the merged models on the HuggingFace hub leaderboard.

Maxime also authored a model merging blogpost on Hugging Face and wrote quite a few articles and shared code that helped others to put merged models out.

Modern day Alchemy

This blogpost is a great resource on what model merging actually does, so I won't go into depth on what the algorithms are, please refer to that if you want a deep dive. But in a nutshell, model merging is a technique that applies algorithms to the weights of a few models, even a few instances of the same model (like Mistral 7B), and creates a new model that often performs better than the previous ones, without additional training!

Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier of entry is very low, we get some cool and crazy results as you'll see below.

Yes, as crazy as it sounds, this method can also create models of non-standard sizes, like 10B or 120B models, since it's slicing pieces of other models and stitching them together in new ways.
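As a toy illustration of the core idea (this is not MergeKit or LazyMergeKit, just the simplest possible merge, a linear average of two finetunes that share the same architecture; the model names are placeholders):

```python
# Naive 50/50 weight average of two same-architecture checkpoints.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("finetune-a")   # placeholder model ids
model_b = AutoModelForCausalLM.from_pretrained("finetune-b")

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged_state = {}
for name, tensor_a in state_a.items():
    tensor_b = state_b[name]
    if tensor_a.dtype.is_floating_point:
        merged_state[name] = 0.5 * tensor_a + 0.5 * tensor_b  # interpolate the weights
    else:
        merged_state[name] = tensor_a                          # copy integer buffers as-is

model_a.load_state_dict(merged_state)   # reuse model A's skeleton for the merged weights
model_a.save_pretrained("merged-model")
```

MergeKit's methods (SLERP, TIES, passthrough "franken-merges" and so on) are smarter about how the weights are combined and which layers come from where, but they operate on the same raw material: the checkpoints themselves, no gradients and no GPU-hours of training.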

If you recall, we had a deep dive with Jon Durbin who released Bagel, and Jon specifically mentioned that he created Bagel (based on everything everywhere all at once) as a good base for merges, that will include all the prompt formats, you can read and listen to that episode here

This merge frenzy made HuggingFace change the leaderboard and add a checkbox that hides model merges, because they are flooding the leaderboard and often require much less effort than actually pre-training or even finetuning a model

And quite often the top of the leaderboard was overrun with model merges, like in this example of Bagel and its merges by CloudYu (which are not the top ones but still in the top 10 as I write this)


On why it works?

Nisten summarized this pretty well in this now famous copypasta tweet, and I've confirmed with Maxime that this is his current understanding as well: it's quite unclear why this seems to perform so well, but that of course doesn't stop the "folks who look for AI Waifus" from merging.

It has even gotten folks like Nathan Lambert from interconnects.ai to start paying attention, even though he didn't want to! (Still waiting on your writeup, Nathan!)

UPDATE: As of today, Monday Jan 29th, Nathan just released a super comprehensive deep dive into merges, which you can read here 👇👏

YALL + Automated LLM Evaluation

Maxime has also worked on so many models of his own that he built a convenient little leaderboard to track their performance, which he called YALL, Yet Another LLM Leaderboard, and it's on HuggingFace. You can see that NeuralBeagle is the top dog (sorry, I literally could not resist)

It uses the Nous evaluations, and Maxime has created an automation called LLM AutoEval that makes it really simple to run evaluations, which you can do in a Colab super easily.

LLM AutoEval is on Github.

Merge-aology!

Since chatting, Maxime has released a Colab and later a HuggingFace space that takes model names and shows the genealogy, nay, Merge-aology of the models: which models each was merged from. It's pretty crazy how deep this rabbit hole goes, and crazier still that these models perform very well after all of these lobotomies!

Try it out here: https://huggingface.co/spaces/mlabonne/model-family-tree

I really hope you enjoy this special deep dive, I definitely learned a BUNCH from this conversation with Maxime, and I'm very happy that he came on!



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Sun, 28 Jan 2024 17:30:00 GMT
📅 ThursdAI - Jan 24 - ⌛Diffusion Transformers,🧠 fMRI multimodality, Fuyu and Moondream1 VLMs, Google video generation & more AI news

What A SHOW folks, I almost don't want to write anything in the newsletter to MAKE you listen haha, but I will, I know many of you don't like listening to me babble.

But if you chose one episode to listen to instead of just skimming the show-notes, make it this one.

We've had 2 deep dives, one into the exciting world of multi-modality, where we chatted with the creator of Moondream1, Vik, and the co-founders of Prophetic, Wes and Eric, about their EEG/fMRI multimodal transformer (that's right!), and then we had a DEEP dive into the new Hourglass Diffusion Transformers with Tanishq from MedArc/Stability.

More than 1300 tuned in to the live show 🔥 and I've got some incredible feedback on the fly, which I cherish so if you have friends who don't already know about ThursdAI, why not share this with them as well?

TL;DR of all topics covered:

* Open Source LLMs

* Stability AI releases StableLM 1.6B params (X, Blog, HF)

* InternLM2-Math - SOTA on math LLMs (90% GPT4 perf.) (X, Demo, Github)

* MedArc analysis for best open source use for medical research finds Qwen-72 the best open source doctor (X)

* Big CO LLMs + APIs

* Google teases LUMIERE - incredibly powerful video generation (TTV and ITV) (X, Blog, ArXiv)

* 🤗 HuggingFace announces Google partnership (Announcement)

* OpenAI releases 2 new embedding models, tweaks turbo models and cuts costs (My analysis, Announcement)

* Google to add 3 new AI features to Chrome (X, Blog)

* Vision & Video

* Adept Fuyu Heavy - Third in the world MultiModal while being 20x smaller than GPT4V, Gemini Ultra (X, Blog)

* FireLLaVa - First LLaVa model with commercial permissive license from fireworks (X, Blog, HF, DEMO)

* Vikhyatk releases Moondream1 - tiny 1.6B VLM trained on Phi 1 (X, Demo, HF)

* This week's buzz 🐝🪄 - What I learned in WandB this week

* New course announcement from Jason Liu & WandB - LLM Engineering: Structured Outputs (Course link)

* Voice & Audio

* Meta W2V-BERT - Speech encoder for low resource languages (announcement)

* 11 labs has dubbing studio (my dubbing test)

* AI Art & Diffusion & 3D

* Instant ID - zero shot face transfer diffusion model (Demo)

* 🔥 Hourglass Diffusion (HDiT) paper - High Resolution Image synthesis - (X, Blog, Paper, Github)

* Tools & Others

* Prophetic announces MORPHEUS-1, their EEG/fMRI multimodal ultrasonic transformer for Lucid Dream induction (Announcement)

* NSF announces NAIRR with partnership from all major government agencies & labs including, OAI, WandB (Blog)

* Runway adds multiple motion brushes for added creativity (X, How to)

Open Source LLMs

Stability releases StableLM 1.6B tiny LLM

Super super fast tiny model; I was able to run this in LM Studio, which just released an update supporting it. It punches above its weight specifically on other languages like German/Spanish/French/Italian (beats Phi)

Has a very surprisingly decent MT-Bench score as well

The license is not commercial per se, but requires a specific Stability AI membership

I was able to get above 120 tok/sec with this model in LM Studio and it was quite reasonable, and honestly, it's quite ridiculous how fast we've gotten to a point where we have an AI model that can weigh less than 1GB and has this level of performance 🤯
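The napkin math behind that sub-1GB figure is just parameter count times bits per weight (ignoring a bit of overhead for metadata and the KV cache):

```python
# Approximate on-disk size of a 1.6B-parameter model at different quantization levels.
params = 1.6e9
for bits in (16, 8, 4):
    gigabytes = params * bits / 8 / 1e9
    print(f"{bits:>2}-bit ≈ {gigabytes:.1f} GB")
# 16-bit ≈ 3.2 GB, 8-bit ≈ 1.6 GB, 4-bit ≈ 0.8 GB
```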

Vision & Video & Multimodality

Tiny VLM Moondream1 (1.6B) performs really well (Demo)

New friend of the pod Vik trained Moondream1, a tiny multimodal VLM built with LLaVa on top of Phi 1 (not 2, because of... issues), and while it's not commercially viable, it's really impressive how fast and how good it is. Here's an example featuring two of my dear friends talking about startups, and you can see how well this TINY vision enabled model understands the scene. This is not cherry picked, this is literally the first image I tried, and my first result.

The image features two men sitting in chairs, engaged in a conversation. One man is sitting on the left side of the image, while the other is on the right side. They are both looking at a laptop placed on a table in front of them. The laptop is open and displaying a presentation, possibly related to their discussion.

In the background, there is a TV mounted on the wall, and a cup can be seen placed on a surface nearby. The scene suggests a casual and collaborative environment where the two men are sharing ideas or discussing a topic.

Vik joined us on the pod to talk about why he didn't go with Phi-2; he also mentioned that Phi-1.5 was retroactively also MIT'd, its license literally says MIT now on HF 👏 Great conversation, tune in for that at around 00:31:35

Adept is teasing Fuyu Heavy - their CHONKY VLM

Adept previously released Persimmon, and then the Fuyu VLM (which is a type of persimmon, we see you Adept), and now teases the release of Fuyu Heavy, a much bigger model that can compete with or come close to GPT4V and Gemini Ultra on MMMU and MMLU (text) while being approximately 20x smaller.

While we don't yet get to play with this, they show some great promise in the benchmarks

⭐️ Performance: Excels at multimodal reasoning and matches/exceeds text-based benchmarks.
❗️ Challenges Faced: Dealt with issues related to image data, model stability, and pre-training data scarcity.
✅ Evaluations: Outperforms Gemini Pro on MMLU and MMMU benchmarks.

AI Summary by Arc Browser (haha see how I cheated here? I sometimes do shortcut summaries using Arc Max, it's dope, try it) https://t.co/BZi6EKhS5R

Fireworks AI releases FireLLaVa - with a commercially available license

FireLLaVA is the first commercially permissive open-source LLaVA model, a type of multi-modality model called a Vision-Language Model (VLM) that can understand both visual and textual inputs.

* The original LLaVA model was limited for commercial use as it was trained on data generated by GPT-4, which has non-commercial licenses.

* Fireworks.ai recreated the LLaVA training data using an open-source language model, CodeLlama 34B Instruct, to make a commercially viable version.

* FireLLaVA performs comparably to the original LLaVA model on benchmarks, showing open-source models can generate high-quality data for VLM training.

* FireLLaVA is available via HuggingFace and through Fireworks.ai's prediction API, enabling new visual capabilities for applications.

Vik and I chatted about this, and while Fireworks didn't release datasets, they did release an example of how to start collecting them, and it's clear that everyone is clamoring after great vision / image datasets 👏

Really hoping that many great datasets for multimodal AIs will come out in 2024, giving us increasingly better multimodal LMMs 👏

Big CO LLMs + APIs

GOOGLE announces LUMIERE video generation model that shows incredible push in consistency

Supports multiple tasks like image to video, text to video, video inpainting, video stylization and more; it looks incredible. It seems that they have cracked both spatial and temporal consistency, something that's severely lacking in previous video generation attempts, and it makes character consistency quite remarkable. Of course, as with other incredible Google papers, we never know if we'll ever see this model or be able to play with it, here's hoping 🤞

Google will add 3 new AI features to Chrome

* Chrome is introducing 3 new experimental AI features to make browsing more efficient:

* Tab Organizer: Chrome will automatically group similar tabs to help with multitasking

* Custom themes: Users can generate unique browser themes using text prompts and AI image generation

* Writing help: Chrome will offer suggestions to help users draft messages and posts on websites

* They are currently only available to US users who opt in on the Experimental Features page

I think this development is super super important because making AI accessible via the incredible Chrome platform to billions of people is going to put Gemini in front of grandmas, students, everyone. Quite impressive, and the compute needed to pull something like this off is also quite mindboggling! 👏

Of course, they are not the first browser to add AI, I love the Arc Browser and it has AI previews that I use quite often!

This week's Buzz (What I learned with Weights & Biases this week)

Have you, like many of us, had trouble getting structured output (JSON, other structures) from LLMs? Jason also had this problem; that's why he authored the Instructor library, which makes it easy to guide the LLM to give structured output using Pydantic. Jason has presented at the AI Engineer conference, and recently collaborated with Weights & Biases to launch a free course on how to guide your LLM to give structured outputs!
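Here's roughly what that pattern looks like (a minimal sketch; exact function names can differ between instructor versions, and the Pydantic model below is just an example I made up):

```python
# Ask the LLM for output that must validate against a Pydantic model.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Talk(BaseModel):
    speaker: str
    topic: str
    minutes: int

client = instructor.patch(OpenAI())  # wrap the client so it accepts a response_model

talk = client.chat.completions.create(
    model="gpt-4",
    response_model=Talk,  # instructor validates (and can retry) against this schema
    messages=[{"role": "user", "content": "Alex spoke about diffusion models for 20 minutes."}],
)
print(talk.speaker, talk.topic, talk.minutes)  # typed fields, not a raw JSON string
```

Instead of parsing free-form JSON out of a string, you get a validated Python object back, which is the whole point of the course.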

COURSE LINK

Jason is also an independent consultant working with companies on their AI implementations and has many battle tested examples from implementations across the board, which he shared with us on the pod.

Give this short course a try if you haven't yet, it's really high quality content, in addition to tons of other stuff we have there, for free 👏

Voice & Audio

11Labs has a new dubbing studio and it's really working well

Check out this short segment of myself speaking in dubbed Russian! It really sounds like me; I sent it to my mom to see if she falls for it 😆 She didn't

AI Art & Diffusion

Hourglass Diffusion Transformers

New high resolution diffusion architecture from K-diffusion and RoPE team (X, Blog, Paper, Github)

The paper presents a new method called HDiT (Hourglass Diffusion Transformers) that shows promise in training models on high resolution images without incurring the significant hardware costs that come with scaling image sizes; it replaces latent diffusion models, enabling O(n) complexity and scaling well.

Utilizing tricks and best practices from transformer architectures, like RoPE (which we've covered on ThursdAI before), cosine similarity self-attention, RMSNorm, GeGLU, etc., and using something called local self-attention, this paper shows incredible promise for high resolution architectures for image creation tools.

We had the pleasure of hosting Tanishq Abraham, one of the co-authors (and CEO of MedArc, Director of Research with Stability + PhD at 19), to walk us through the paper and explain the problem and the solution. Additionally, a friend of the pod is a co-author as well 👏 and Alex Birch joined us silently from the audience 👂 while giving commentary in the group chat.

P.S - All of these co-authors attribute the bulk of the work to Katherine Crowson from k-diffusion 👏

Tools & Others

Prophetic introduces Morpheus-1 - multimodal foundational model trained on fMRI and EEG signals

In a breaking news fashion, the folks behind Prophetic, a new startup that just announced MORPHEUS-1 as we were hopping into the space, came to chat with us.

They are working on a new multimodal ultrasound transformer! That's right, multimodality is not only about images/text folks; we've covered this before, but these chads are actually trying this out: they have trained a transformer architecture to take EEG and fMRI signals and output directions for the ultrasound to activate areas of the brain to induce lucid dreaming. And they are asking for beta testers!

It's all quite futuristic, and if you're in NY, reach out to them (and then let us know if you had Lucid dreams!)

Definitely worth a listen on the pod, and check out their video announcement for more details; it was really quite an incredible conversation with Wes and Eric.

National Science Foundation launches NAIRR pilot (Blog)

Partnering with 10 other federal agencies as well as 25 private sector, nonprofit and philanthropic organizations, the NAIRR pilot will provide access to advanced computing, datasets, models, software, training and user support to U.S.-based researchers and educators

Basically, this is a huge governmental endeavor to provide resources around AI, make sure companies collaborate, and keep AI accessible across the board, and tons of government agencies as well as private sector companies have joined hands in this. Just look at this list, it's a veritable who's who of AI in the US (notably, Tesla/X is missing)

And that’s all folks, that’s all she wrote (or I guess, I wrote) today! What an incredible show, really thankful for folks who came out, guests and co-hosts and see you next week!

If you scrolled all the way to here and want to show me that you did, your emoji of the week is 🍊 (only cause persimmons don’t have emojis) so DM or reply with this and share this pod with 1 friend or tag us on social media!

Full Transcription below:


[00:00:00] Alex Volkov: right, folks, it's time for the sound. Let's get it started today.

[00:00:11] Alex Volkov: Welcome, everyone. Welcome to

[00:00:13] Alex Volkov: this live recording of ThursdAI, the Twitter space, podcast, and newsletter that brings you everything that happened in the AI world, every Thursday, literally almost every Thursday. My name is Alex Volkov, an AI evangelist with Weights & Biases, and

[00:00:33] Alex Volkov: this is ThursdAI

[00:00:37] Recap & TL;DR

[00:00:37] Alex Volkov: Alright, recap, here we go. Taking a deep breath. We've talked about incredible amount of stuff here on Thursday AI for January 24th. We've talked about the areas of open source LLMs was very interesting. We've talked about stability AI, releasing a stable LLM, tiny version, 1. 6 billion parameters. That's really good at different languages, the European languages as well.

[00:00:58] Alex Volkov: And it's not commercially viable. For open source, but it is under the stability membership. So if you have that's a great model for you. We've talked about Intern LM2 for a state of the art on math LLMs. We briefly mentioned this, but it's getting 90 percent of GPT 4 performance on math, which is, was quite incredible.

[00:01:16] Alex Volkov: We also had the pleasure of Tanishq Abraham joining us from MedArc for the analysis of open source models as it relates to the medical field. And it turns out that the model called Qwen 72 from Alibaba, Qwen 72, is the best open source doctor that we have, achieving incredible results and beating even MedPalm 1, which back then was trained by Google as one of the best medical LLMs.

[00:01:42] Alex Volkov: We also were a very multimodal heavy space today, like a lot. We had the folks from Prophetic join us and talk about their multimodality, which is transformer based but not LLM based, so their multimodality is EEG signals and fMRI signals, as they work on hyper focused ultrasound to induce a lucid dream state in your brain.

[00:02:11] Alex Volkov: Their multimodal model is basically taking inputs from EEG and outputting the directions of where to focus this ultrasound, which is super cool. And I definitely advise you to listen to them. It wasn't planned. I just saw the post, I just commented, hey, we're going to talk about this, and they jumped on. Prophetic looks like a cool multimodal attempt, nothing to do with vision, but we also talked about vision multimodality as well.

[00:02:34] Alex Volkov: So we've covered Adept, the company that was founded by a few folks from the original Transformers paper, and they have previously released the Persimmon models. And then Fuyu 8B was a multimodal model that did not use a vision encoder, like a different architecture. They released an announcement, they didn't release any code or weights or a way for us to try this yet, but they released something called Fuyu Heavy, or they announced something called Fuyu Heavy, which is an extension of the previously released Fuyu 8B.

[00:03:00] Alex Volkov: Significantly more trained. And they talked about how difficult it is to train multimodal models, and they claim to have a third place in the world after GPT 4 and Gemini Ultra on a bunch of the multimodal metrics and evaluations like MMMU and MMLU. They also talked about the process, how difficult it is to train these models at scale.

[00:03:20] Alex Volkov: So cool from Adept, and we're waiting for some ways to test this. We also talked about FireLLaVA, which is, if you remember, we've talked about LLaVA before multiple times. LLaVA is an open source way to train multimodal models, like BakLLaVA from folks on stage here, Nisten and Farrell, and Obsidian from LDJ who's also on here, and also Moondream.

[00:03:39] Alex Volkov: Like all of the things we've talked about are based on LLaVA. LLaVA was not commercially permissively licensed because of the dataset. FireLLaVA released the first LLaVA model with a commercially permissive license, from Fireworks AI. And we also had quite an interesting chat with Vik, who is the author of Moondream 1, which is a tiny 1.

[00:03:59] Alex Volkov: 6 billion parameter vision language model, also on top of LLaVA, that has Phi 1, a 1.6 billion parameter model, as the foundational kind of brain, the LLM brain in it. The conversation with Vik was very interesting, so shout out Vik, thanks for coming up. Specifically because he also mentioned that Phi 1 from Microsoft, if you guys remember Phi 2 was MIT licensed back in December.

[00:04:20] Alex Volkov: It was a surprise to all of us. And apparently they went back and also changed the license on Phi 1, which is super cool, and Vik told us that he saw this. So Moondream is a very capable, very tiny vision model that works quite well. Definitely worth listening to this conversation with Vik.

[00:04:36] Alex Volkov: We also announced, in the This Week's Buzz category of ours, or segment of ours, about everything Weights & Biases, a new course in our academy from Jason Liu, the author of the Instructor library. And he has a course now that was released today called LLM Engineering: Structured Outputs.

[00:04:54] Alex Volkov: And as Nisten pointed out, a bunch of the folks in open source are learning from these free YouTube videos, and it's definitely worth checking out the Weights & Biases Academy because there's a bunch of knowledge there. And it's all for free, just join and just register. It's super, super cool. And then we had an incredible honor again of having one of the authors of this paper.

[00:05:12] Alex Volkov: As always, I love when we discuss stuff and the authors of the stuff come to chat with us. So we had Tanishq Abraham. But also we had Alex Birch in the audience listening to us while he was working and sending us DMs from the new paper called Hourglass Diffusion High Resolution Image Synthesis.

[00:05:30] Alex Volkov: And this paper will be in the show notes, and Tanishq went through kind of an in-depth look at the problem they try to solve. And they talked about integrating transformers and diffusion models, previously two separate areas; they weren't the first to do so, but they definitely used a bunch of the techniques to optimize transformers for the diffusion world and create pixel space, high resolution image synthesis, which shows great promise going forward.

[00:05:59] Alex Volkov: Incredibly insightful conversation from Tanishq, definitely worth a listen. We also covered in this area, we covered Instant ID, which is a one, one shot or zero shot face transition into diffusion models. So you can upload one picture of yourself and get quite incredible results in image diffusion.

[00:06:17] Alex Volkov: Or like generative images with your faces or your kid's faces, which is super cool. I haven't tried my cat. I don't know if it like works on cat's faces. I'll try it out. We covered a new, a state of the art. Automatic speech recognition system that beats Whisper or at least runs 30 times faster than Whisper on different tasks.

[00:06:36] Alex Volkov: We're going to add this to the show notes as well. And a little bit about deepfake audio, with 11 Labs having released a dubbing studio. And some conversation about whether or not, or how, it already affects politics. And then the last thing we've covered is the National Science Foundation, NSF, announcing a new partnership with all major labs and government agencies around AI; it includes DOD and DOA, and includes OpenAI and Anthropic, includes open source folks like Hugging Face, and Meta AI is also participating in this.

[00:07:11] Alex Volkov: And also Weights & Biases is part of that huge partnership, governmental partnership. So I think this is all the stuff that we've covered in this space.

[00:07:19] Show starts with house keeping and structure breakdown

[00:07:19] Alex Volkov: We have quite the show for you today, and as always there's no boring weeks in AI, is there? And some weeks start slow and then pick up, some weeks start Crazy from the get go. If you remember, there's one week where one Friday had a bunch of releases, and this week we had a very full week, full of very cool innovations, but also exciting stuff.

[00:07:47] Alex Volkov: And then we have some authors of those stuff here with us today, and we're gonna talk about a bunch of multimodality, which we've been talking about for a while. Obviously the space started with the multimodal GPT 4 and then we just kicked it into high gear. I think that it's time to get started with our default segment. So for those who are new to Thursday AI, we usually segment this to five or six segments, the biggest one being open source LLMs. And then we have big companies LLMs and API. So we usually cover the Google stuff and OpenAI stuff.

[00:08:18] Alex Volkov: Mistral has been here and there, been [00:08:20] in the open source, now is the big company as well. So depending on what they release, that's where the Mistral stuff falls. And then we talk about vision and video, which is basically where we'll cover the multimodality stuff, and that section is going to be, I think, the main one today.

[00:08:36] Alex Volkov: There's so much stuff. It's crazy. We also have this corner I call This Week's Buzz. I feel like I have to explain this. Maybe people don't get this dad joke that I put in there. Buzz, as in bees, right? So bees, Buzz. And Weights and Biases, the shorthand for Weights and Biases is WandB.

[00:08:54] Alex Volkov: Weights and Biases, W and B. And for a very funny reason, there's a mascot of ours that's a bee that's holding a wand, because it's WandB. And like this little joke has been Prevalent like in many places. I think I haven't explained it yet. And so this week's buzz is actually the corner about everything that I've learned with Weights Biases every week.

[00:09:13] Alex Volkov: And so in this corner we're going to chat with Jason and announce some cool stuff. The next corner we have is voice and audio, where we usually have a bunch of stuff. We have VB from Hugging Face usually join us. He's like the AI audio person over there. There's not a lot of voice and audio stuff.

[00:09:29] Alex Volkov: So I actually don't have anything voice and audio related in my notes. However if you guys know like very cool things that happened. This week with voice and audio, please let me know, we're going to talk about them. We're going to move to AI art and diffusion in the next segment. We're going to talk about some cool things there.

[00:09:45] Alex Volkov: And then the last segment is like a free for all, it's tools and others. So I usually put agents in there. I usually put like super cool things. So I have two, two, two exciting things to talk about there. So this is usually the structure.

[00:09:58] Nisten Tahiraj: I do have one more thing there, and it's the W2V-BERT speech encoder. I think it's from Meta, and it's supposed to be like 30 times faster than Whisper. So yeah, it's another very efficient automatic speech recognition (ASR) model. So I'll post it in the links

[00:10:20] Alex Volkov: And I think also we had 11Labs announce like a yeah, I had a tweet about actually ThursdAI Content, that I spoke in English, obviously, and then I asked it to translate to Russian. We'll cover this, 11Labs has a dubbing studio.

[00:10:33] Alex Volkov: .

[00:10:33] Open Source LLMS

[00:10:33] Alex Volkov: And then, let's go to open source, folks. I think let's go to open source.

[00:10:55] Alex Volkov: All right, let's start with our open source segment here. And I think the first thing we should probably quickly mention is our dear friends at Stability AI, folks who've Made a dent on the industry with Stable Diffusion, obviously but they're training a bunch of other stuff. We've talked about multiple stuff they did.

[00:11:12] Stable LM 1.6B

[00:11:12] Alex Volkov: We've talked about Stable Video Diffusion and like how open source lags behind closed source, but not by that much. And Stability released a new LLM, which they had the Stable LLM before, I think, Nistan, have you used Stability stuff before? For the LLM stuff?

[00:11:31] Nisten Tahiraj: I have Months ago, so I'm not up to date on

[00:11:35] Alex Volkov: Yeah, so

[00:11:36] Nisten Tahiraj: used it on Google collabs and

[00:11:37] Alex Volkov: Yeah, so they're not like, they haven't changed the industry in the LLM world as much as they have in the image diffusion world, for sure. However, there's a big however, they're working on multiple fronts. And it looks like, I had a chance to actually chat with Imad for almost 20 minutes.

[00:11:52] Alex Volkov: Imad is this like very incredible person who knows a lot about a lot. And it's like the conversation there is like basically a stream of consciousness conversation, which I had. No trouble in following up because we talk about everything here on ThursdAI. But the folks who were with me and talking to Imad, they looked at me and was like, How do you know all this?

[00:12:11] Alex Volkov: And I'm looking at Imad and was like, How does Imad know all this? That's what happens when you're on stability. So they released they're training a bunch of different models. This week they gave us Stable LLM, which is a tiny model, 1. 6 billion parameters model. It's really we've been saying this previously.

[00:12:24] Alex Volkov: It's really funny to say small LLM, right? If you expand the LLM abbreviation, like a small large language model. But this one is tiny. It runs super fast on, on multiple devices. I think their point is to actually like edge device running. So obviously we've covered multiple small LLMs before, we've covered Phi, if you remember Phi 1, we're gonna talk about Phi with Vik in a second.

[00:12:47] Alex Volkov: We also talked about like Phi 2, I think there's like a few others. Stability released, there's... It's pretty good. It's pretty good. I was itching to play with this, they released a GGUF. Apparently, I dunno if you knew this, but apparently Stability has their own CPP and their like GGUF file, which is like a, for those who are not following all the acronyms.

[00:13:11] Alex Volkov: GGUF is a quantized version of models. So apparently Stability has, like, their own CPP that is incompatible with llama.cpp. And so apparently LM Studio had to add specific support for this, and they did. And so if you wanna play with Stability AI's Stable LM, now you can, with LM Studio, and LM Studio, at least in my experience, gave me ridiculous performance.

[00:13:34] Alex Volkov: I got, on, on this Macbook M3, M3 Max I got more than 130 tokens per second, which was like ridiculously fast. And the model was fairly capable for a small model. I was very impressed. So if you want to play with a small model, you want to do some stuff with this, stability is definitely an interesting one.

[00:13:53] Alex Volkov: Support in Elm Studio. Yeah, go ahead.

[00:13:56] Nisten Tahiraj: yeah, it's a 1.6B. So that means it's 1.6 gigs to run at eight bit without losing much accuracy. However, that means that it has a lot more applications for tiny stuff, because then you can get that down to 800 megs and so on. That said, people did find some issues. Again, it's a tiny model, but they found issues with it being able to continue the conversation.

[00:14:24] Nisten Tahiraj: However, for one shot answers, it was extremely capable. So just keep that in mind when using it. It is probably right now the best model for that size. Just keep in mind if you're going to do something with it. Don't expect much in terms of follow up stuff. Just if you can do it in one shot, great.

[00:14:48] Nisten Tahiraj: Use that. And yeah that's about all I have to say.

[00:14:51] Alex Volkov: Yeah. And an additional thing is that it punches above its weight on other languages. So if you folks remember when we talked about Mistral, for example, getting compared to OpenAI, Anthropic, et cetera, Mistral Medium, the model is like specifically for the German, the European languages, the German, Spanish, French, Italian, all those, it's significantly better.

[00:15:11] Alex Volkov: Stability is also playing in that market, looks like, for the smaller sizes. And so this tiny model beats the Phi versions at three billion parameters. So it beats models twice its size, even some seven billion parameter ones, specifically for European languages,

[00:15:25] Alex Volkov: and if you remember, we've talked about MPT from Mosaic, when was that? Yeah. So this model beats the Mosaic MPT 7B, which back in May was probably like the coolest open source model. So that was 7 billion. This beats that on MT-Bench and everything.

[00:15:40] Alex Volkov: It's quite incredible. It beats Falcon 40B. It's really, the speed, the reason why we bring you these models is not only hey, use this one. Because Nisten said this one may not be exactly good for your commercial stuff. Also, it's not really commercially viable. There's a specific stability license that you have.

[00:15:58] Alex Volkov: Stability membership, they call it. They have to apply for stability AI membership. And then based on the size of your business you're able to use, they have to make money somehow. But we bring this to you also to show that how fast we're moving from a 30 billion parameter model to a 77 billion parameter model, and now to a 1.

[00:16:13] Alex Volkov: 6 billion parameter model, that compresses like incredible amounts of trillions of like words from the human knowledge into just, listen, do we say like this can go down to like less than a gig, right? If we look super quick,

[00:16:28] Nisten Tahiraj: Yep. At 4 bit, it should be 800 megs. So we're getting to the point where they'll just fit in a Raspberry Pi Zero with 512 megs and they'll be conversational [00:16:40] and useful and even multi modal. So we're almost there.

[00:16:43] Alex Volkov: Yeah, it's quite incredible. And then, okay, so this is stability stuff. Meanwhile, I'll say hi to a new guest of ours that I just saw on my timeline.

[00:16:51] Prophetic announces MORPHEUS-1 an EEG/fMRI multimodal to induce lucid dreams via hyperfocused ultrasound

[00:16:51] Alex Volkov: What's up Wes, how are you?

[00:16:53] Wes Louis: Hey

[00:16:54] Wes Louis: guys, how are you?

[00:16:55] Alex Volkov: Hey. Hey, welcome. Folks maybe saw my tweet, maybe didn't, that I love planning for ThursdAI, but I also love breaking news. As I was planning, I was going through my feed, and thankfully my Twitter feed is back at its own, like giving me the best AI stuff. And Wes and I think your co-founder is also here.

[00:17:10] Alex Volkov: Eric, yeah. Let me add you real

[00:17:12] Alex Volkov: quick. I didn't plan on this folks. I just literally just like tagged and they came. The video that you guys posted came through my timeline and I would love to go and give you a stage for a minute or two to explain what prophetic is because the transformer stuff that you discussed with the EEG and fMRI signals, I really dig.

[00:17:30] Alex Volkov: Could you summarize that video for us for a brief, like two sentences? That would be super cool, I think.

[00:17:38] Wes Louis: So

[00:17:38] Wes Louis: this has been something we've been working on for a while.

[00:17:40] Wes Louis: It's really a, essentially,

[00:17:42] Wes Louis: a multimodal transformer model that is designed entirely for neural data. And so basically, what we've done is, we built a data set of EEG and fMRI and, what we're designing is a neural simulation device to basically induce lucid dreams.

[00:17:59] Wes Louis: And so we build the data set on heightened prefrontal cortex activity. This is, the neural correlate of lucid dreaming. And we basically built a model where you prompt it with your current brain state. We have a set of sensors on the device, and then we output targets for the neurostimulation.

[00:18:17] Alex Volkov: That's quite incredible. So for folks in the audience, we talk about multimodality often and oftentimes we just mean VLMs, like we mean like vision and text, which we're going to cover like a bunch today. But today I think the highlight of today's Thursday is multimodality applies to many things. So you guys are, your multimodality is not even there's no text in there at all, right?

[00:18:36] Alex Volkov: This is just EEG signals and fMRI signals. Is that correct?

[00:18:41] Wes Louis: Yeah, it's purely prompted with EEG. And one thing I'll say is, everyone talks about multimodal. And, so you're using, let's say, an LLM, and you're prompting it with a photo, for example. This is similar in many ways because neural imaging data, particularly EEG, is you can nicely get, you can get it into, it's a neural image you can get it into an image format.

[00:19:02] Wes Louis: And then prompt the model that way, but then on the generation side of things that's entirely, we use a pretty unique fMRI embedding process that we've come up with ourselves and ultimately the idea there is that you take this heightened neural activity, And those are candidates for targets for nerve simulation.

[00:19:20] Wes Louis: And, we

[00:19:21] Alex Volkov: What do you, sorry, what do you mean, what do you mean by targets for folks who have no idea what this means?

[00:19:26] Wes Louis: Yeah. We're using, this is the other big technology that makes all this work, FocusUltraSound. FocusUltraSound, for those that don't know, is this really cutting edge neurostimulation technique that can get quite deep into the brain; other techniques people may be familiar with, direct current, alternating current, really only get to the surface.

[00:19:47] Wes Louis: Of the brain, whereas focus ultrasound can get quite deep, but there's also this ability to steer the beam and also create acoustic holograms. And so when we think of heightened neural activity it really takes the form of these 3D figures. And the idea being that we can create these outputs of fMRI targets and then translate those over to the focus ultrasound.

[00:20:12] Alex Volkov: This multi modal transformer takes on the input EEG signals, and on the output it prints out those targets. Those are targets for this technology to then stimulate the brain to go into a specific state.

[00:20:31] Wes Louis: Yes, and all of this is closed loop, in that, once you create the stimulation, the model is prompted again with the current brain state, and this is a continuous process of learning and figuring out what sets of tokens lead to this heightened state, and that heightened state is really identified as gamma frequencies, and that's really the fastest band of activity.

[00:20:53] Wes Louis: So it's this continuous process until someone gets to a lucid state.

[00:20:58] Alex Volkov: That's quite incredible. So you guys announced the LLM today, but it's not like you're not releasing the open source. This is just an announcement of your efforts, correct? Anything else you want to add here? And I think you started talking about folks can join the beta if they want to.

[00:21:12] Wes Louis: Yeah, that's what I would point out, is that we have a beta program, and that is really the purpose of this announcement: we're looking for people to sign up. We've had 200 or so in the last two hours. And so this spring we'll have it working. And if you're New York based, or you're willing to come out to New York, we'd be more than happy to have you test out the product.

[00:21:31] Alex Volkov: That's awesome. Congrats folks. Actually, you want to add anything?

[00:21:33] Eric Wollberg: Alex. Hey, how's it going? This is Eric. I'm a

[00:21:36] Alex Volkov: Oh, Eric, yeah.

[00:21:37] Eric Wollberg: with Wes. Yeah, hi, thanks for doing this. Yeah, one thing that's, I think, the sequence of how we've released these things: we showcased in October our prototype that we designed with Card79, who notably did Neuralink for Elon, and then we also worked with Max Hodak at Science.

[00:21:52] Eric Wollberg: Max Hodak used to run Neuralink for Elon and then spun out Science. So really top consumer BCI kind of design folks. And so now we have this model, right? This ultrasonic transformer, where now we're going to be migrating that onto the technically working prototype and beginning neuromodulation.

[00:22:08] Eric Wollberg: So that's what the beta user program is all about. We've got, yeah, like 225 people signing up in the first two hours. We're really excited to have people on board and begin to do this. You have an opportunity, especially if you're early up on that list, to be the first person to achieve an ultrasonically induced lucid dream, which, you know, I think is going to be a pretty watershed moment.

[00:22:28] Alex Volkov: That's super cool. I've tried to lucid dream a lot of times in my life and I never actually got to a stable one. So I'm excited to follow you guys, but also excited about the technology application of this, because we talk about transformers and a lot of this goes to LLMs.

[00:22:42] Alex Volkov: Now, this week we're going to talk about Transformers as applied to diffusion models as well. And here you are doing full multimodality out of left field. So I love it. And hopefully you guys will do some cool things and keep us up to date, and you're welcome to join on ThursdAI

[00:22:55] Alex Volkov: to talk about this.

[00:22:57] Nisten Tahiraj: Awesome. Thanks, Alex. Thank you, Alex.

[00:22:58] Alex Volkov: Thanks for hopping on, folks. And as folks know, I love breaking news here on ThursdAI. This is like a tiny breaking news. Thank you, Wes. Thank you, Eric, for joining. Folks, if you want to try the future, sign up for the beta, because why not?

[00:23:09] Alex Volkov: And I think it feels like it's non-invasive, right? You put this headset on, and then hopefully you go to sleep, and hopefully you're able to control your dreams, which is like what Vision Pro will do for the outside world, but this is inside your dream, it's super cool. All right, let's move on to, I think we're moving on to the big, no, actually we're moving on to the big category for multimodality as we're already here.

[00:23:33] Alex Volkov: Vision and video and multimodal, or at least VLM multimodal.

[00:23:38] Adept teases Fuyu Heavy, their flagship multimodal model catching up to Gemini Ultra and GPT-4V

[00:23:38] Alex Volkov: I'm gonna start with the big dog here, Adept. If you guys remember, Adept Labs was co-founded by a few folks from the original Transformer paper. I think they're no longer there, but I feel like I have to add this

[00:23:52] Alex Volkov: prefix every time we talk about Adept. Adept released a few models for us. If you guys remember, Persimmon was a 7B model, or 8B, it was weird, but they released an 8 billion parameter model. It was very interesting back then. They also then, on top of this, released Fuyu, and Persimmon is a type of fruit, Fuyu is a variety of persimmon.

[00:24:10] Alex Volkov: So we see you, Adept, we see your jokes here. Also, I love the LLM naming. And then they released Fuyu back then. And Fuyu was interesting from the perspective that it didn't use a vision encoder, it did something else. It was very interesting that their approach to vision models allowed them to use non-standard image sizes, because there was no fixed-resolution encoder involved.

[00:24:31] Alex Volkov: So back then, that was what was interesting. And now they've announced, they haven't released anything, they haven't said, hey, here, use this, I wasn't even able to use it, but they announced Fuyu Heavy. And so far, Adept have been trustworthy enough for us to trust

[00:24:48] Alex Volkov: what they say. According to them, this is the third multimodal model in the world, or I guess VLM. So not multimodal like Wes and Eric just told us, but multimodal in the sense of images plus text together. This is the third model in the world behind GPT-4 Vision and Gemini Ultra. Which, Gemini Ultra, we haven't yet tried, obviously, we don't have access.

[00:25:08] Alex Volkov: If you have access to Gemini Ultra in the audience, and you want to help a brother out, let me try and play with this, please let me know. But so Adept is announcing that Fuyu Heavy, their model, is 20 times smaller than GPT-4 Vision. I have no idea how they even know what size GPT-4 Vision is.

[00:25:28] Alex Volkov: They say it's around 20 to 30 times smaller, and it comes very close on the multimodality stuff. And they talk about the challenges of creating a large multimodal, image-based model. The challenges stem from the fact that there aren't a lot of assets to properly test with, and the tooling and instrumentation stuff is really hard for images as well.

[00:25:47] Alex Volkov: And so they announced this, and they showed some very incredible performance. And I will remind folks that Adept specifically started with tools to help you run your computer. So their models are specifically tuned on UX, UI and web stuff. I'm expecting to hear more from them and to finally get to play with this.

[00:26:06] Alex Volkov: Go ahead, Far El.

[00:26:09] Far El: I just want to say that demos are easy. I'm going to take it with a grain of salt until I actually see the model or am able to test it. The thing is that there is no indication of the actual speed of the inference, or whether these examples were cherry-picked or not, right? There's a lot of question marks about this, especially when you just come out and make a marketing announcement without actual access to the model.

[00:26:37] Far El: Yeah, it looks cool, but I'm not hyped, just because it's not verified or validated in any way.

[00:26:44] Alex Volkov: Yeah, I'm with you, I'm with you. Specifically I will say though, about Adept specifically, we've seen stuff from them, we've seen papers from them before, and they did, folks started asking like, Hey, where's the weights? Where's the weights? And they did say that, stuff is coming, but they want to like, keep a competitive edge.

[00:27:00] Alex Volkov: But we see, we've seen like at least a new architecture from them, if you remember with Fuyu. And so we know

[00:27:05] Nisten Tahiraj: Oh, of course.

[00:27:06] Alex Volkov: Yeah, the Fuyu architecture is legit, they literally were able to create a multimodal model without an image encoder back then. We're definitely going to listen to this. But based on the metrics that they released, if this actually performs as well on MMMU, which is kind of the equivalent of MMLU

[00:27:25] Alex Volkov: for multimodal stuff, it's going to be very exciting, their Heavy model, definitely.

[00:27:29] Fireworks releases FireLLaVa with a fully commercially viable license

[00:27:29] Alex Volkov: Moving on. Actually, Far El, we'd love to hear what you think about this. And actually, Vik, this wraps into the next conversation with you. Fireworks AI, which I haven't actually used, released the first LLaVA model with a commercially permissive license.

[00:27:43] Alex Volkov: So LLaVA was released. LLaVA, we've talked about it, is the architecture that allows many of these models to be trained in a multimodal fashion, correct? LLaVA was released, but not with a commercial license, because it was trained on a bunch of data that, I want to say, wasn't marked for commercial and open source licensing.

[00:28:01] Alex Volkov: So a lot of these models that we get, we cannot actually use in production. And Fireworks announced FireLLaVA, their first LLaVA model with commercially permissive licensing. And I think that's super cool, because finally folks will be able to build with this. And as a reminder, Llama, the LLM, was released without a commercial license.

[00:28:19] Alex Volkov: And then Llama 2 was released with a commercial license, and then an incredible amount of stuff started happening, because companies who wanted to use this in production actually started looking into this and using Llama 2. And so hopefully the same will start happening with FireLLaVA. I actually am not sure if they released the weights.

[00:28:36] Alex Volkov: I think they did. Yes, they released the weights: FireLLaVA 13B from Fireworks AI on HuggingFace. And yeah, Nisten, go ahead. You guys trained stuff on top of LLaVA. So please, first of all, introduce the stuff that you've trained, and then also comment on the ability to use this now in production.

[00:28:56] Nisten Tahiraj: Yeah, I just want to say that the entire open source vision field, and non open source, is extremely competitive right now. For example, here, we've released BakLLaVA, which is bak-LLaVA. Again with the naming. That was three months ago. Also, LDJ here made Obsidian, which is like the 3B one, and then they made a 7B as well.

[00:29:22] Nisten Tahiraj: We also have the dev lead of Qwen. He was in the audience as well, and they made the Qwen 14B VL. And we have Vik as well, who also made a very fast and small model recently. And BakLLaVA was being used as a benchmark, which was pretty interesting, actually. Yeah, the vision LLMs are extremely competitive right now, and I think it's one part where open source can really surpass what you get from any API, because it's something you can run locally on the device and you have full control over.

[00:30:01] Nisten Tahiraj: So the interesting thing, yeah, as for Fireworks 13B, that's still the Llama 13B base, as far as I saw, and I tried to use their inference on their site, but it wasn't working, and I can't complain too much about it, because ours is not working either. That's why I wasn't using WSGULAG, yeah. Also to comment a little bit on Fuyu, because I do like that they're trying a completely new approach. They don't use stuff that's similar to CLIP image models, which is what everybody else uses. They do something where they take, I think, groups of pixels, and they serialize it, so the image is just being represented as another string of tokens. So they can scale

[00:30:48] Nisten Tahiraj: to 8K, 16K, whatever you have. They don't have that limitation that others have in terms of architecture. So it is good to see that approach is working overall; whether it will be competitive, we'll see. So yeah, I wanted to comment on that. But yeah, I haven't actually tried the Fireworks model itself, but I did see that, again, the architecture is similar to LLaVA 13B. Yeah, that's about all the comments I have on that.
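(Editor's note: for readers who want a concrete picture of the "no vision encoder" idea Nisten describes, here is a rough sketch of the Fuyu-style approach. This is not Adept's code; the patch size, dimensions, and the omission of details like the image-newline token are simplifications for illustration.)

```python
import torch
import torch.nn as nn

# Illustrative sketch: instead of a CLIP-style vision encoder, raw pixel patches
# are flattened and linearly projected straight into the transformer's token
# space, so arbitrary image sizes just become longer token sequences.

patch, d_model = 16, 1024                          # illustrative values, not Adept's config
proj = nn.Linear(3 * patch * patch, d_model)       # one linear projection per flattened patch

def image_to_tokens(img: torch.Tensor) -> torch.Tensor:
    """img: (3, H, W) with H and W divisible by `patch` for this sketch."""
    c, h, w = img.shape
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)   # (3, H/p, W/p, p, p)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)
    return proj(patches)                           # (num_patches, d_model) "image tokens"

tokens = image_to_tokens(torch.rand(3, 224, 224))
print(tokens.shape)                                # torch.Size([196, 1024])
```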

[00:31:22] Alex Volkov: And like you said, interestingly, it's still based on Llama, right? And it's time for new things. And I think this takes us to the next topic of conversation. And again, Vik, I want to introduce you properly this time, or at least let you introduce yourself.

[00:31:35] Moondream1 from Vik Hyatk - 1.8B VLM

[00:31:35] Alex Volkov: But the next iteration of our conversation about multimodality, like we said, today is a multimodal space, is the existence of very tiny vision models, vision large language models, or large multimodal models, it's really hard to name these things. Vik, welcome to the space, this is your first time, please introduce yourself and then let's talk about Moondream a little bit.

[00:31:57] Vik Hyatk: Hey folks hey Alex, thanks for having me. Super excited. My name is Vik. I'm pretty new to the AI space, I think. Like a lot of people, I got into it when that big stable diffusion moment happened. And I was like, this is what I need to spend my life working on. So I went out, bought a workstation with 3090 and started playing around with stuff.

[00:32:15] Alex Volkov: You and me both, brother, you and me both. And, okay. So the reason why you're here and the reason why I'm calling on you in the vision and video area is because of Moondream1. Can you introduce Moondream1 a little bit to the audience?

[00:32:29] Vik Hyatk: Yeah, so it's a small language model. It's about 1.6 billion parameters. It's built on top of SigLIP from Google or DeepMind, I forget which one of the two, because that's the vision encoder, and it uses Phi-1.5 as the text model, and then it's trained using the standard LLaVA approach. So super thankful for the folks that worked on these projects, amazing models they've put together.

[00:32:52] Vik Hyatk: It works. I'm tooting my own horn a little bit here, but it's surprising. I see people post screenshots of them asking questions and it still blows my mind that it works that well.
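(Editor's note: a minimal sketch of the LLaVA-style recipe Vik describes, a frozen vision encoder feeding a small projector that maps image features into the language model's embedding space. This is a generic illustration, not Moondream's actual code; the hidden sizes and patch count below are assumptions picked for illustration.)

```python
import torch
import torch.nn as nn

# Generic LLaVA-style composition: a vision encoder (e.g. SigLIP) produces patch
# embeddings, a small projector maps them into the LLM's embedding space (e.g.
# Phi-1.5), and the LLM attends over the projected image tokens as a prefix to
# the text prompt. Dimensions here are illustrative, not the real checkpoints'.

vision_dim, text_dim, num_patches = 1152, 2048, 729

projector = nn.Sequential(                         # typically the part trained from scratch first
    nn.Linear(vision_dim, text_dim),
    nn.GELU(),
    nn.Linear(text_dim, text_dim),
)

image_features = torch.rand(1, num_patches, vision_dim)   # stand-in for vision-encoder output
image_tokens = projector(image_features)                  # (1, 729, text_dim), prepended to text embeddings
print(image_tokens.shape)
```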

[00:33:03] Alex Volkov: Let me toot the horn a little bit, because I definitely tried it out. Thank you for the Hugging Face, how can I say, Space that you put up super quick. And the next follow-up is going to be about how to actually use this, but this is based on LLaVA, so the same non-commercial license, correct?

[00:33:20] Vik Hyatk: Correct. The top piece of feedback I've gotten from people is that they want to see this with a commercially permissive license. I'm working on that. The FireLLaVA folks didn't release the dataset, but thankfully they did talk about their process for creating the non-encumbered version of the dataset.

[00:33:37] Vik Hyatk: So I'm working on it. I'll have that out in a couple of days, the dataset at least, and then we can start training models that aren't encumbered like that.

[00:33:44] Alex Volkov: Incredible. And so the next thing that I wanted to talk to you about is Phi-1. So Phi is from Microsoft. Phi-1 was not released with a commercial license. We remember it was trained on synthetic data and TinyStories, like a tiny 1.6 model. So we saw a few releases since then. So obviously we talked just now about StableLM.

[00:34:01] Alex Volkov: Semi-commercial, if you're a part of their membership, and also Phi-2 was MIT licensed. It's a little bit bigger, it's three, I think, billion parameters. Have you tried with Phi-2, and could you speak about that experience?

[00:34:14] Vik Hyatk: Yeah, I did actually. So I was initially working on training Moondream 1 with Phi-2 once it came out. There are some issues with fine-tuning it when you have flash attention on, I believe, and so it just takes a lot longer to train. So I went back and looked at Phi-1.5, and I saw that they updated the license for 1.5 to MIT as well.

[00:34:33] Alex Volkov: Oh, really?

[00:34:35] Vik Hyatk: stick with what works. Yeah.

[00:34:37] Alex Volkov: Wow, I did not know this. So they actually updated the license retroactively.

[00:34:42] Vik Hyatk: Yeah, on the Hugging Face page, at least it says MIT now.

[00:34:45] Alex Volkov: I love it. It would make sense, right? But folks, I don't think we've talked about this, so this is like breaking news here. Thanks, Vik. Phi-1 is also, we'll check this, we'll double check,

[00:34:55] Nisten Tahiraj: Also three. They're both MIT licensed now. So whatever pressure we put on Microsoft's Azure side, it worked.

[00:35:03] Alex Volkov: Nice. That's incredible. Alright, so now this part of your stack for Moondream is MIT licensed. So LLaVA is the only thing that's holding this back from being used in

[00:35:14] Vik Hyatk: Just the dataset, yeah.

[00:35:16] Alex Volkov: The dataset. Okay. So definitely there's work being done there. I will just point folks' attention to the top of the space, where I had my tests.

[00:35:25] Alex Volkov: I literally just pasted an image. And again, thank you for the demo, Vik. Folks will get the demo in the show notes as well. I pasted an image of two of my friends just sitting and talking, across from a TV with some things.

[00:35:42] Alex Volkov: One man sitting on left side, one other on the right side. That's obvious, but still cool. They're both looking at a laptop placed on the table in front of them. The laptop is open and displaying a presentation. Possibly related to their discussion. So this feels like hallucination a little bit because the model does not know what it displays, but fine.

[00:35:57] Alex Volkov: And then: in the background, there's a TV mounted on the wall, and a cup placed on the surface nearby. The scene suggests a casual, collaborative environment. This is ridiculous. This is a super tiny model and it outputs this scene almost perfectly. And I've tested the same image on a different, bigger model, GPT-4, and it pretty much gives me the same information.

[00:36:17] Alex Volkov: So I was really impressed. So tooting the horn, for sure, because the tinier the model is, the better the utilization. And we've talked about different vision-enabled hardware that is possible or not possible based on whether it's able to run stuff on, like, a Raspberry Pi. And the smaller and smarter these models are, the better we'll be able to use them on cheaper hardware.

[00:36:40] Alex Volkov: Really impressive. What are you planning to do with this? Like, how has the community accepted this? What type of conversations did you get into? And what are you planning to do next here? Besides training the

[00:36:51] Vik Hyatk: I was blown away by the reception to this. When I put it up, I thought maybe it might get like a hundred likes or something and then I'd move on to my next project. But I've seen a bunch of super cool demos come out of this. I think the fact that it is small and it runs inference so fast makes a lot of use cases that were previously not possible a lot more viable, like captioning a video in real time or recaptioning a billion images and whatnot.

[00:37:15] Vik Hyatk: There's a couple of things I'm working on. Obviously the top thing is getting it to a permissive license. I also could use some help on a couple of fronts. So I do want to make it easier to run: GGUF, Ollama integration and whatnot.

[00:37:30] Alex Volkov: Definitely LM Studio integration. I would love to play around with this in LM Studio, just to see how fast this runs on my machine. MLX would be a cool suggestion as well, the community is very excited about MLX, I don't know if you saw. But LM Studio is a friend of the pod, definitely, he's connected to you too.

[00:37:46] Alex Volkov: I think it's super easy to just add it there, right, Nisten? It's not difficult.

[00:37:51] Nisten Tahiraj: You just gotta add a JSON file to your model and that's it. Or just message him, 'cause he's very responsive to this stuff, and he might even write the JSON for you. And then it will be immediately available for everyone running LM Studio.

[00:38:06] Vik Hyatk: Amazing. Another thing we have going on, by the way, is we're building an agent version of this with Open Interpreter in mind.

[00:38:13] Vik Hyatk: A version of this that's excellent at identifying UI elements, because we want Open Interpreter to have the ability to operate purely off of a local model. Open Interpreter, by the way, super cool project, check it out folks if you haven't already, is a way to have the LLM use your computer.

[00:38:31] Vik Hyatk: So you can do stuff like just telling the LLM, hey, I want to turn dark mode on, and it'll figure out what buttons to click to enable dark mode for you.

[00:38:40] Alex Volkov: For folks who follow ThursdAI closely, they remember Killian came on the pod like a week after Open Interpreter was released, and this was, I think, our most famous or best-received episode back in 2023. It was a super cool conversation, so shout out Killian Lucas, and definitely Open Interpreter since then has grown a very big community of people building very cool things.

[00:39:00] Alex Volkov: Recently they released the kind of browsing area, where it can control the computer for you. And it definitely needs eyes for that. And so I think it used GPT-4 Vision, and now you're saying that Open Interpreter will get open source eyes. Is that what I'm hearing?

[00:39:15] Vik Hyatk: Exactly. That's a goal. CogAgent is super promising in this space. They didn't release their datasets, so we're working on replicating that. CogAgent is just too big for most people to run on their computers. It's I forget, 17 billion parameters or something.

[00:39:29] Alex Volkov: Is that CogAgent and CogVLM, right? I think we, yeah, I think we talked about this. Yeah. It's really good

[00:39:35] Vik Hyatk: But yeah, that's another place where, if folks want to get involved, the link in my bio has a Discord, and I'd love to collaborate with folks on getting that dataset together and training that version of the model.

[00:39:44] Alex Volkov: So I think the thing I'm hearing from Fuyu, and from you as well, is that the datasets for vision stuff are the bottleneck to creating incredible things, right? Datasets for images, datasets for how people use different UIs, for example, all these datasets are the bottleneck for us to get to the next hurdle of making these models even smaller and faster performing.

[00:40:04] Alex Volkov: So what are we doing folks? Let's start building multimodal data sets.

[00:40:09] Nisten Tahiraj: Yeah, and at first, for BakLLaVA, we were going to have the dataset also be open source, because the code for us is also open source. So it's not just open weights, it is fully open. However, the data we couldn't, because of... So that's not available. And yeah, it's pretty hard to make datasets for vision, because with text it's very easy now to manipulate, modify, do whatever you want to the data, and you can do that at large scale. With images, there just aren't that many tools, that many ready-to-go datasets, and the open source models just started getting good at them.

[00:40:52] Nisten Tahiraj: So yeah, that's going to remain a challenge for the time being. But again, if anybody here is like a grad student, or you're at a company or something in academia, the biggest contribution you can make probably is in the datasets, because the models will get replaced. You'll always have better models coming and going, but the datasets are forever.

[00:41:15] Nisten Tahiraj: If you want to make an impact in this field, get your professor, university, whatever, to put some money toward datasets. We need datasets for images. With images. Yeah.

[00:41:27] Alex Volkov: And we need them at an ever increasingly bigger scale. All right, Vik, so thank you so much for joining us. Thank you for taking us through how you created Moondream. And thanks for telling us what's next, and how the community can help besides just providing datasets and testing.

[00:41:45] Alex Volkov: What else would you need?

[00:41:49] Vik Hyatk: I have a list of issues on GitHub where I'm looking for help with various things. But besides that, compute always helps. I'm currently limited on how many things I can do, because my 4090s can only do so many matrix multiplications at a given time. So if anyone has compute that they can give me access to, to run these, that would be super appreciated.

[00:42:09] Alex Volkov: Yes, I've seen this time and time again on ThursdAI on stage, folks ask for sponsorship for compute. I'm actually getting DMs from different companies like, hey Alex, the space is super cool, can we sponsor someone? Can we? And I'm like, no, I already work with Weights & Biases, I don't need sponsorship.

[00:42:25] Alex Volkov: I would want to connect folks that work on super cool things and need compute to keep going with different companies, around compute specifically. So I'll definitely keep you in mind. And go ahead, Nisten, you had a thing you wanted to say?

[00:42:38] Nisten Tahiraj: Yeah, just really quickly, this is a very effective way to make projects that are impactful. For example, with BakLLaVA, Far El here, and Suntex, they just put out a readme and tweeted something out, and we got compute. And we got it from Together Computer. So they sponsored that project, and it ended up being a very impactful project that a lot of people use.

[00:43:05] Nisten Tahiraj: That works pretty well. I'd just say be careful with conditional stuff. If they're gonna start talking about an NDA, just ignore them, because that's not really, then you're doing an exchange, you're basically doing work for that person, so that's just a job contract, that's not a sponsor. If someone's sponsoring an open source model,

[00:43:27] Alex Volkov: Better be.

[00:43:28] Nisten Tahiraj: not be like an NDA, that's not, that's no longer a

[00:43:32] Alex Volkov: Better be open source after that. Yes, absolutely. So Vik, I'll keep you in mind when people reach out to me. Folks in the audience, if you work at a company that wants to be featured forever in the open source community, definitely reach out to Vik, and we want more of this.

[00:43:47] Alex Volkov: We want more tiny models that perform incredibly well. We want them to be built into different tools that we can all use, without relying on or paying anyone, by just using our machines. So definitely, we'll keep you in mind. Vik, welcome to the community of ThursdAI. You're more than welcome to keep joining and participating in this.

[00:44:06] Alex Volkov: I think it's time for us to move on, folks. It's been around 40 minutes, I think we're actually good on time. I think it's time for us to move on to this week's buzz. I wish I had, I really want to do a music transition here for this week's buzz, with bees buzzing, etc.

[00:44:20] Alex Volkov: But maybe for next week. Let me just play the regular music and we'll transition and talk with Jason a little bit.

[00:44:24] This week's buzz - Jason Liu launches a new course with Weights & Biases for free

[00:44:24] Alex Volkov: All right, welcome to this week's buzz, where I talk about some cool things that happened or that I learned at Weights & Biases. Weights & Biases is, ooh, that was an abrupt music stop. Weights & Biases is the system of record for all your LLM needs. So pretty much most of the folks up on stage who train models use Weights & Biases.

[00:44:52] Alex Volkov: It's incredible, the ubiquity. Weights & Biases is pretty much everywhere. I just saw StableKwan, one of our friends of the pod, train something and post, from Weights & Biases, a snapshot of his loss curve going down, and I literally just asked, hey, do you mind putting a link to the dashboard?

[00:45:08] Alex Volkov: And he did. So if you wanna check out how his model is going, I think he's training something super cool: he's training a mixture, 400 million parameters, so he's training like a tiny MoE, a Mixtral. StableKwan just posted a chart with the train loss from Weights & Biases, and I just asked, hey, can we follow along with the training? And he posted a link to the Weights & Biases dashboard, which is super cool.

[00:45:34] Alex Volkov: Which got a reaction from the Weights & Biases CEO. And so I love seeing this in the wild. So folks, if you're training models, please put those dashboards up so people can follow along, it's really nice. But in other news from Weights & Biases this week, I want to say hi to Jason Liu.

[00:45:47] Jason Liu: Yeah, Jason Liu.

[00:45:48] Alex Volkov: Jason Liu. Welcome, Jason. I've seen you around. I've seen you, I think, at the AI Engineer event from Swyx. I don't know if we ran into each other there, but you had a talk there as well. Yeah.

[00:45:58] Jason Liu: Yeah, it was Pydantic is all you need. It did pretty well on YouTube, so I'm pretty happy.

[00:46:02] Alex Volkov: It did great. I also talked with a bunch of people, I think I was interviewing folks outside of the stage while you were giving the talk, but it was very well received. And this is a similar topic to what we're going to talk about now. So please feel free to introduce yourself briefly.

[00:46:15] Alex Volkov: And then we're going to talk about the stuff that we did together.

[00:46:19] Jason Liu: Great. Yeah. So I'm Jason. In the past year and a half, I've been mostly doing a lot of applied AI consulting. Before that, I spent the past like eight years just doing like machine learning. So I did the big data wave, the machine learning wave, the neural networks and deep learning wave.

[00:46:32] Jason Liu: And now we get generative AI. So it's been a lot of fun. And in my spare time I work on a library called Instructor. So now we have Instructor in, I think, JavaScript, Python, and Elixir. And the general idea is that we want to bring just functions and structs into LLMs and make LLMs feel a lot more backwards compatible with existing code, rather than creating new abstractions to handle some of these things.

[00:46:55] Jason Liu: And I think that's been pretty well received in the community.

[00:46:57] Alex Volkov: Absolutely. So Instructor is definitely where I know you from. And today we have an announcement together. So feel free to announce the cool thing that we did and that you worked on really hard.

[00:47:09] Jason Liu: Yeah, so we're starting a new series around the idea of using schemas and structures to prompt language models. And I think at the end of this week, we're going to release the first part of an LLM engineering series. And the first part really is just an introduction to how we can use things like structure to prompt LLMs a lot better, right?

[00:47:30] Jason Liu: In the past, we just begged the language model to give us JSON. Now we have things like JSON mode and function calling and tools, which give us the ability to get more structure. But we still need a lot more tools and ways of thinking about how we can reason about these structures. And so part one is going to be around justifying and motivating why we might want to do this.

[00:47:54] Jason Liu: And then I think in February or March we'll start working on part two, which uses a lot of the new Weights & Biases observability tools to look at how I've solved a lot of LLM problems in production with a lot of my consulting clients.

[00:48:07] Alex Volkov: So just to highlight for folks, Weights & Biases has a free courses area, the Weights & Biases Academy. And some very prominent folks in the industry have collaborated with Weights & Biases to just, basically, teach. So we teach you for free how to do these things. So we have courses on training LLMs from scratch, fine-tuning, et cetera.

[00:48:24] Alex Volkov: And then Jason is announcing a new course today that he wrote and recorded, and we helped edit a little bit, publish, and also obviously talk about and promote, about how to actually ask your model to give you what you need as a developer, as an AI developer, in structured output, which uses the Instructor library.

[00:48:42] Alex Volkov: Correct, Jason?

[00:48:43] Jason Liu: Yeah, these ideas can be used in other libraries as well, right? So for the Python community, we're really using a library called Pydantic, and so this is supported in things like Langchain and Marvin. And so even if you don't use a library like Instructor, learning how to think about prompt infrastructure is still something that's going to be really applicable and valuable for everyone listening.
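(Editor's note: a minimal sketch of the Pydantic-plus-Instructor pattern Jason is describing. The exact way you wrap the OpenAI client has changed across Instructor versions, so treat the patch call and the model name below as assumptions rather than the course's exact code.)

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

# Wrap the OpenAI client so chat completions accept a response_model
# and return a validated Pydantic object instead of raw JSON text.
client = instructor.patch(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,     # Instructor validates (and can retry) until this schema is satisfied
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
)
print(user.name, user.age)
```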

[00:49:05] Alex Volkov: And you mentioned before, there's a bunch of stuff that OpenAI comes up with, like JSON mode, for example, etc. There was function calling back in June. But also, the other LLMs, they don't necessarily follow the same new abstractions that OpenAI releases. I think Anthropic just recently announced that they're moving to system messages, or moving to just a messages thing.

[00:49:27] Function calling in Open Source LLMs

[00:49:27] Alex Volkov: And also we have open source, which is all over the place. So I guess my question is, with these libraries, with this Pydantic approach and Instructor, would that apply to other LLMs? Does this apply to open source, which we talk a lot about?

[00:49:40] Jason Liu: Yeah, so right now there's only a few open source models that support function calling. So if you've looked at some of the work from the Functionary team, they have been training, I think, Mixtral now with function calling, same with the guys at Nous Research with Teknium. There's been a lot of progress in the open source world on getting things like function calling.

[00:49:58] Jason Liu: If you want more structured outputs too, there's a great library called Outlines that can use something like the Hugging Face Transformers library to also do structured extraction. And again, they also support things like Pydantic. And the goal of the course really is to show you how to think about and how to model these problems in a particular way.

[00:50:15] Alex Volkov: Absolutely. And I think Jon Durbin is in the audience. I think Airoboros was trained on function calling as well, if I'm not mistaken, Jon. So folks who haven't heard our conversation with Jon, definitely go and check out the deep dive with Jon about Bagel, which now includes the Airoboros dataset, which now includes function calling as well.

[00:50:33] Alex Volkov: So that's awesome, the open source also moves there. Go ahead, Nisten.

[00:50:37] Nisten Tahiraj: Also really quick, the Nous vision model ended up being good at function calling, although it had other drawbacks. It was good at function calling because of the Airoboros, like, thousand-something functions dataset. And as far as I saw, the newer Bagel models, the Bagel 7Bs, are also good at that, at function calling.

[00:50:57] Alex Volkov: So, the Beagle model series from Maxime Labonne. Again, shout out Maxime Labonne, who came on the pod last week, and the full deep dive with him will be released this Sunday, so make sure you're subscribed. We don't talk about function calling there, we talk about NeuralBeagle. NeuralBeagle is one of the top performing 7 billion parameter models, it's a merge, it's a cool conversation about merging.

[00:51:16] Alex Volkov: But let me get back to Jason just real quick. Jason, you're also doing independent consulting, you said, in multiple places, and you're helping them build. I've got to tap into your experience from actual hands-on AI building in companies. Could you give us a little bit of what companies struggle with?

[00:51:32] Alex Volkov: Like, the first obvious thing that comes to mind, that AI builders have probably already solved in their minds. What do you have to go through, to not only build for them, but also educate them on, as you join a company and start helping them out with AI stuff?

[00:51:47] Jason Liu: Yeah. So one of the biggest things I noticed is that when we look at something like a RAG application, really what it looks like is a recommendation system. If you went on Netflix, for example, and you watch a bunch of movies and the recommendations don't get better, it would be a really terrible experience and you probably lose a lot of customers.

[00:52:03] Jason Liu: But for a lot of companies these days that are using things like agents or retrieval, we are in a situation where, you know, no matter how many users you get, if you don't improve your language model, if you don't improve your embeddings, the product doesn't really get any better. And so one of the big things I'm focusing on this year is helping these companies build a better feedback loop and a data flywheel.

[00:52:22] Jason Liu: And so we can know for sure that as we get more users, there's these network effects that improve the models that we want to train. And so I think step one is, being able to fine tune your own embedding models and your re rankers and go from there and then, see what comes up in the future.

[00:52:39] Alex Volkov: Awesome. So definitely, folks, give Jason a follow. The course, I think we're releasing it today, I haven't seen any social mentions yet, but it's really worth watching. I watched a few of these and will follow along as well. And this is a course series now, so we're going to start with this, and then we're going to continue with the monitoring tools that Weights & Biases has.

[00:52:56] Alex Volkov: Correct?

[00:52:58] Jason Liu: Yeah, the first course is like 30 minutes, it's super quick. The real goal is to show you what's possible and get you thinking about some new ideas. And then the next course will be deeply integrated with the observability tools from Weights & Biases, and specifically around the experiences I've gotten from consulting for production clients.

[00:53:13] Alex Volkov: Incredible. Thank you, Jason. Thank you for joining us. And thank you folks who worked on the course together with you. I'm excited to see this. And again, the reminder, there's a bunch of free stuff there. There's a bunch of like knowledge just drops here. And hopefully I will be able to tap into this community and also build more things.

[00:53:29] Alex Volkov: Go ahead, Nisten, and then we'll move on.

[00:53:31] Nisten Tahiraj: Yeah, I just want to say that a lot of us here that got good at machine learning got there from just a random YouTube series. So the Karpathy series on building one from scratch, The Full Stack, it's just pronounced like that, their LLM one from way back in April and March. So I'm really looking forward to this one, because doing YouTube tutorials is actually extremely efficient.

[00:53:53] Breaking News - HuggingFace announces a collaboration with Google

[00:53:53] Nisten Tahiraj: But on that note, we have breaking news.

[00:53:56] Alex Volkov: Wait, we have breaking news. Hold up. You know what this means.

[00:54:11] Alex Volkov: Yes, Nistan, go ahead now.

[00:54:14] Nisten Tahiraj: Philipp Schmid, who is a friend of the pod and has been here.

[00:54:18] Alex Volkov: Here, yes.

[00:54:18] Nisten Tahiraj: Definitely. Yeah, dev lead at Hugging Face. He's also the one that did the integrations, I might be wrong, but the integrations with AWS Bedrock and also with Cloudflare Workers. Yeah, so now it looks like he's been working on doing an integration

[00:54:35] Nisten Tahiraj: with Google, where you'll be able to just take whatever models or fine-tunes and stuff you have on HuggingFace and then use Google's infrastructure, use both their TPUs and NVIDIA H100s, they're advertising this, that Google owns, to continue training, fine-tuning, serving, deploying stuff via HuggingFace

[00:54:55] Nisten Tahiraj: on Google. This is a very interesting move. Google's jumping in more on the open source side there. I don't know what this means, but this is a very interesting development.

[00:55:06] Alex Volkov: I know what this means. This means that if Hugging Face ever becomes public, buy their stock. That's what this means. Hugging Face is literally embedded into the infrastructure of AI and definitely worth following. And the more integrations they have, the better it is for the open source community as well.

[00:55:25] Alex Volkov: All right, folks. Thanks, Nisten.

[00:55:26] Nisten Tahiraj: This is not financial advice, by the

[00:55:28] Alex Volkov: Not financial advice, but they're also not public yet. Look, I don't think this move, yeah, I don't think this moves the needle in terms of Google investing,

[00:55:36] Hourglass Diffusion Transformers deep dive with Tanishq Abraham

[00:55:36] Alex Volkov: Alright folks, we're moving forward and the way, where we're moving forward is also like into kind of like diffusion mode, and I'm very excited to introduce Tanishq.

[00:55:45] Alex Volkov: Tanishq, have you been here before? Remind me, please. I don't think you've been here on stage before.

[00:55:50] Tanishq Abraham: I, I don't think I've been on stage

[00:55:52] Alex Volkov: No. All right. So I'm very excited to have you here. Thank you for joining us. So folks, one of the coolest things that came out, at least in the research area, this week was this paper from

[00:56:03] Alex Volkov: multiple authors, some of them friends of the pod, like Enrico, if you remember the chat we did with Enrico on rope scaling, he is on the paper as well. Katherine Crowson, who we should mention, I don't think she's been here, but we've talked about some stuff that she did. Stefan Baumann, Alex Birch, Tanishq, you're on there, Daniel Kaplan, and then Enrico, a friend of ours, Nico.

[00:56:23] Alex Volkov: Tanishq has been a friend of the pod behind the scenes, you guys didn't know this, but we've met at NeurIPS, so we've met before. Tanishq, do you mind introducing yourself just briefly for the audience who haven't met you or followed you so far?

[00:56:34] Tanishq Abraham: Yeah, sure. My name is Tanishq. I am a research director at Stability AI and also CEO of MedARC, which is a medical AI research organization. I've also been involved with fast.ai, and I've been working on diffusion models for, I guess, the past year and a half or so. Yeah, so I do all kinds of stuff. Generative AI, medical AI. Yeah.

[00:56:55] Alex Volkov: You also just briefly skipped over the fact that you got your PhD at 19, right? Is that correct?

[00:57:01] Tanishq Abraham: Yes, that's correct. I got it last year. Yes.

[00:57:03] Alex Volkov: So if folks in the audience don't know what this means: there are not many 19-year-old PhDs, and Tanishq is one of them. And also, we met once, I think a year and a half ago, and then the next time we met in Europe, and I just remember every detail of our conversation. But that's beside the point.

[00:57:17] Tanishq Abraham: yes.

[00:57:19] Alex Volkov: Thanks

[00:57:19] Tanishq Abraham: met at the Stability AI

[00:57:21] Alex Volkov: Launch party. That was super cool. And since then, many things have changed. And I really want to talk to you in that area, right? So this paper, shout out to all the authors, because I'm looking at this, I've seen multiple folks share this paper. The paper is talking about high resolution image synthesis

[00:57:39] Alex Volkov: with something called Hourglass Diffusion Transformers. And I will pin your great thread about this here on top of the space, and it will be in the show notes. Could you briefly tell us the problem this tries to solve? And then we're going to go into how it actually approaches solving it.

[00:57:57] Tanishq Abraham: Yeah, definitely.

[00:57:58] Tanishq Abraham: Yeah. So first of all, of course, I'll preface this by saying it's mostly Kat's genius work here, and we were just lucky to be able to help her on this project. But yeah, just to get started.

[00:58:06] Alex Volkov: Just one tiny second, because it's worth a shout out. So Kat, by Kat you refer to Katherine Crowson, right? And if folks have ever used Stable Diffusion before, either in Automatic1111 or whatever, and you choose any sampler with a k_ prefix, that's this Katherine, right?

[00:58:24] Alex Volkov: k-diffusion is her area. She's an incredibly prolific person in this area. I don't know many facts about her, but everybody I talked to from this paper, including Enrico, everybody refers to Kat, that's her work. So a huge shout out to Kat, and yeah, go ahead, please.

[00:58:40] Tanishq Abraham: Yeah, she was one of the original AI art people, so yeah, she helped start the field in a way. Anyway, to provide some context of what this paper is about: the idea is that if you want to do high resolution generation, so think 1024 by 1024, the typical approaches these days utilize some sort of multi-stage approach. The most common one, like Stable Diffusion, is this sort of latent diffusion, where you have to encode the image with some sort of autoencoder into some latent space, and you're doing diffusion on the latent space, and you're not actually doing it on the actual pixels.

[00:59:15] Tanishq Abraham: And so that comes with some disadvantages. For example, I don't know if people who are doing things like image editing with Stable Diffusion realize this, but you don't have a whole lot of fine-grained control at the actual pixel level.

[00:59:30] Tanishq Abraham: It's difficult to do that, because it's happening in the latent space rather than in the pixel space. So there are various things where it has its own challenges. Of course, latent diffusion has a lot of advantages too, but for some applications it may not be ideal.

[00:59:44] Tanishq Abraham: And then on top of that, the other aspect that we wanted to look into, basically, was the fact that we're seeing people move towards transformer models for diffusion as well. And of course, in the past, most of the diffusion models have been built with a U-Net architecture, a convolutional U-Net.

[01:00:02] Tanishq Abraham: Also, Stable Diffusion uses a convolutional U-Net. But there have been a lot of papers examining the use of transformers. And, of course, the nice thing about transformers is people know how to train them, they're quite scalable, so people would rather use transformers for diffusion over something like a U-Net.

[01:00:18] Tanishq Abraham: But again, the problem is that so far it's mostly only been applied to the latent diffusion scenario, mainly because it would be very hard to do this at pixel scale, because of the quadratic complexity of attention. So if you wanted to scale up to higher resolution, you're going to get quadratic scaling in the number of pixels.

[01:00:40] Tanishq Abraham: So it would be very difficult to train this with, I guess, enough resources. So the problem we're trying to solve is: what can we do to resolve the quadratic complexity of the transformer architecture, so that we can train a diffusion transformer in pixel space.

[01:00:58] Tanishq Abraham: So that's what the hourglass diffusion transformer tries to address.

[01:01:02] Alex Volkov: Thank you for the brief introduction. I will try to recap the way I understand this, so folks who are not machine learning scientists in the audience will be able to follow along. Basically, this whole wave of Gen AI has two big architectures so far, right?

[01:01:15] Alex Volkov: The diffusion side, the Stability AI kind of image models and video models, they're based on diffusion, or as you said, latent diffusion, correct? And then there's the LLM area, basically based on transformers. And we've seen a bunch of techniques go back and forth between them, right?

[01:01:31] Alex Volkov: So LoRA, I think, is a thing where many people in the diffusion area trained LoRAs on different concepts, and then obviously fine-tuning with LoRAs became a thing, and back and forth, we've seen different approaches. I think you said the open source area in LLMs, in transformers specifically, has a bunch of super cool tricks and optimization techniques, flash attention, different things, right?

[01:01:54] Alex Volkov: There's a bunch of stuff that people developed in one area that wasn't necessarily applicable to diffusion models. And so you guys set out to try and unify those two, or at least use some of the tricks, and it looks like you

[01:02:09] Alex Volkov: succeeded to an extent. Yeah. Go ahead, please.

[01:02:12] Tanishq Abraham: Yeah, I think, yeah, now that we have this transformer architecture, we can try to apply some of the tricks that people have been using, things like RoPE embeddings. There are other tricks like RMSNorm. These are the sorts of tricks, for example, that are used in the Llama architecture, these sorts of similar architectural decisions, and you can take those best practices and try to see if they help with diffusion now.

[01:02:33] Tanishq Abraham: So yeah, I think that's the idea. And yeah, that's another interesting thing about our paper. People were exploring diffusion transformers, but they were using kind of old architectures for diffusion transformers. And here we're trying to also apply all these tricks that we see

[01:02:47] Tanishq Abraham: people applying in the LLM space, and trying to apply that to diffusion. Yeah, that was also an important part of our paper.

[01:02:54] Alex Volkov: And of course, you mentioned RoPE, and I want to shout out a friend of the pod, Enrico, from Nous Research. Wait, I don't actually remember if Enrico is part of Nous Research. Maybe, so he and Nous Research worked on the rope scaling paper together. And for folks who are interested in hearing about RoPE, we had a deep dive during the summer, one of the coolest episodes.

[01:03:12] Alex Volkov: Most of it back then went above my head, but it was super cool going back there and saying, hey, oh, I learned this. RoPE is basically a way to extend context windows and do a bunch of other things for transformer-based large language models. And I wonder, how does RoPE come into play here? And Enrico is one of the authors here on the paper.

[01:03:29] Alex Volkov: So he contributed at least part of that work, I assume. Enrico?

[01:03:34] Tanishq Abraham: Yeah, I think the rope stuff is something whose full potential we haven't fully explored yet, I think. But at least for what we were doing, we saw improvements in performance just from using RoPE over other sorts of position embeddings.

[01:03:50] Tanishq Abraham: But yeah, I think there's definitely potential for allowing the model to handle larger resolutions or do things like this because of the RoPE embeddings that we have in the model. Yeah, I think that's also left for future work.
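(Editor's note: for readers who want to see what the LLM-side trick under discussion looks like, here is a compact sketch of textbook rotary position embeddings. The paper uses its own variant adapted to image tokens, so this is only the standard 1D formulation, with the usual base frequency of 10000.)

```python
import torch

# Textbook rotary position embeddings (RoPE): rotate each (even, odd) feature
# pair by an angle that depends on the token position, so relative positions
# show up directly in attention dot products.

def rope(x: torch.Tensor) -> torch.Tensor:
    n, d = x.shape                                     # (seq_len, dim), dim must be even
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    freqs = 10000 ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = pos * freqs                               # (n, d/2) rotation angles
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

print(rope(torch.rand(8, 16)).shape)                   # torch.Size([8, 16])
```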

[01:04:02] Alex Volkov: Incredible. You guys use all these techniques. You introduce, or I guess formally announce, this concept of diffusion transformers, which is the mixture of these two things. And what are some of the results that you get? You've trained a few models to test.

[01:04:15] Alex Volkov: How do you even measure that you're getting performance? Are you just looking at the metrics, or are you actually generating images? Can you talk us through the process of validating these theories and papers?

[01:04:26] Tanishq Abraham: Yeah, but I just want to, yeah, I guess, take a step back to clarify: we didn't necessarily invent the concept of diffusion transformers. That is something that people have already developed. But the point here is that, in the past, diffusion transformers were done in the latent space because of this quadratic complexity.

[01:04:45] Tanishq Abraham: So we basically have a different type of transformer architecture, which is this hourglass transformer, that enables, like, O(N) scaling, so a linear complexity. So it will scale with the number of pixels much better, it won't blow up like you have with the quadratic complexity of attention.

[01:05:07] Tanishq Abraham: So that was the main trick that we're using. So we have some tricks in there that allow it to have that property. And that's what enables us to do it in the pixel space, as opposed to the latent space that the previous diffusion transformers were using. And then on top of that, we are adding all these additional transformer tricks, which no one had tried out before with diffusion transformers.

[01:05:27] Tanishq Abraham: So those are the main contributions of this paper. And yeah, I guess the other thing worth mentioning is that the way this architecture is able to do this is partly because it is a very hierarchical architecture.

[01:05:45] Tanishq Abraham: So it's actually able to process at different image resolutions. And for example, at the high resolutions, we use this sort of local attention, which is what gives us the linear scaling, but then at the low resolutions, we're able to do the regular attention.

[01:06:01] Tanishq Abraham: Yeah, there's also this hierarchical processing of the image resolution. That's also, I think, an important point, which also enables higher fidelity generation. And yeah, in terms of testing the
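(Editor's note: to make the linear-scaling point concrete, here is a toy sketch of local attention. The paper uses neighborhood attention at the high-resolution levels and global attention at the low-resolution levels; the snippet below is a generic non-overlapping-window version for illustration, not their implementation, and it skips the usual query/key/value projections.)

```python
import torch

# Toy illustration of why local attention scales linearly: each token only
# attends within its own fixed-size window, so total cost is (n / window)
# blocks of window^2 work instead of a single n^2 block.

def windowed_self_attention(x: torch.Tensor, window: int = 64) -> torch.Tensor:
    b, n, d = x.shape                                  # n must be divisible by window in this sketch
    xw = x.view(b, n // window, window, d)
    scores = xw @ xw.transpose(-1, -2) / d ** 0.5      # (b, n/window, window, window)
    out = torch.softmax(scores, dim=-1) @ xw
    return out.reshape(b, n, d)

tokens = torch.rand(1, 4096, 128)                      # e.g. a 64x64 grid of pixel tokens
print(windowed_self_attention(tokens).shape)           # torch.Size([1, 4096, 128])
```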

[01:06:13] Alex Volkov: Yeah. And so the next question is, how do you actually test the architecture? How do you validate that these approaches you tried are actually better than where the field was previously?

[01:06:26] Tanishq Abraham: Yeah. We looked at two datasets. One, we did ImageNet generation. So can conditional, class conditional ImageNet generation. So that is, passing in an ImageNet class, you generate images of that class. So if you pass in a zebra [01:06:40] class, you're generating zebras, or you're in some sort of dog class, you generate the dogs.

[01:06:43] Tanishq Abraham: That's, we train a model for that. We train it at a resolution of 256 by 256 and that, that's one of the experiments where we compare to other architectures. And so we we're, the interesting thing is that, of course, we're comparing to other architectures that are using, for example Latent Diffusion, that they're, using the latent space there the architecture is functioning on the latent space and not on the pixel space, but we have our architecture that's functioning on the pixel space and using this hourglass transformer and it's getting better results than with the with the latent space.

[01:07:19] Tanishq Abraham: We're beating, for example, the previous Diffusion Transformer model which was using the latent space. And then another interesting data set that we used was the FFHQ. Data set which is this sort of data set of high yeah like high resolution faces and so this is at this is at a 1024 by 1024 resolution and so this is like you know very difficult to be able to train especially in a pixel space you know at Scale of 1024 by 1024.

[01:07:47] Tanishq Abraham: And actually there are not many other diffusion models that are trained on this model. There are a bunch of GAN models, for example, but not really many diffusion models. There's like only one or two that we actually found in the literature because it is, it can be a bit difficult because of this, because of the.

[01:08:01] Tanishq Abraham: The pixel scale or the, the resolution of the images, but yeah we were managed to train a model with our architecture. It can, it trains quite fast. And yeah we are able to we're basically like, I guess at this point now we would be the best diffusion model for that for that data set.

[01:08:18] Tanishq Abraham: And we are measuring with FID. But of course, like FID, as a metric also has its problems it does have some bias towards like towards GANs and so GANs tend to have a lower FID kind of in terms of the bias of the FID. So like when we look at it qualitatively, honestly, we think like it's quite comparable to the GANs, might be better than the GANs, honestly.

[01:08:41] Tanishq Abraham: So we may do more evaluations and study that further. But honestly, this may be one of the state of the art models for the FFHQ dataset, though it's a bit hard to say when you're using FID as a metric. That's of course the problem: everyone's using that metric in the literature. But yeah, I think that, again, that's another really interesting result that we observed.
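
For readers who want to reproduce this kind of number, FID is straightforward to compute with torchmetrics. The snippet below is a generic sketch, not the paper's exact evaluation pipeline; sample counts, preprocessing, and the Inception feature layer all matter in practice.

```python
# Minimal FID computation sketch with torchmetrics; illustrative only.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# uint8 images in (N, 3, H, W); random tensors stand in here for real
# dataset images and model samples.
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(float(fid.compute()))  # lower is better
```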

[01:09:01] Tanishq Abraham: And then, of course, we do

[01:09:02] Alex Volkov: I want to follow up with a question here real quick. For folks like, hard for them to follow like much of this, but they've used something like Stable

[01:09:09] Tanishq Abraham: oh, sorry.

[01:09:10] Alex Volkov: No, that's all great. This is all recorded. Folks can like pause and go to, and go research and come back and listen to you.

[01:09:15] Alex Volkov: This is great. Like you did the deep dive. I really appreciate it. I just want to bring this back a little bit upwards towards like

[01:09:21] Unknown: Sure.

[01:09:22] Effects on the industry from Hourglass Diffusion Transformers

[01:09:22] Alex Volkov: affect the industry, given that we have stuff like Stable Diffusion out, and that keeps getting better, Mid Journey is getting like reality adjacent to the point where like it's really hard to distinguish, there's like different upscalers that take the outputs and then run some upscaling how does this affect the industry to, in your mind?

[01:09:40] Alex Volkov: Will this accelerate some stuff? Will this be applied to different areas that diffusion models have not traditionally been in? Let's say this is a building block that you've created. How does this affect us in three, six months?

[01:09:54] Tanishq Abraham: Yeah, I think this is just a kind of a new unique direction to explore. Of course, I think latent diffusion is still a very interesting, invaluable direction, but this is just it's always good to have different directions to explore. And I think And honestly, like this architecture can be applied to latent diffusion as well, and maybe we get even better results, for example, we can do maybe like, multi megapixel level synthesis by combining, this method with latent diffusion or something like this as well.

[01:10:23] Tanishq Abraham: So it's not even limited to just the pixel space. That's something that is interesting about this. But again, it can also be applied to latent diffusion, and of course these models could be scaled up further. There's a whole lot of future work to explore here, I think.

[01:10:39] Tanishq Abraham: And yeah, I think and of course it's computationally efficient. And yeah, I think the nice thing is yeah, moving towards the transformer architecture when, it's, people understand the transformer architecture at this point. I think, there's people understand how to scale it and different tricks.

[01:10:55] Tanishq Abraham: And I think, by introducing this architecture, this is a good way for us to try to bring some of those advances in transformers into the diffusion model field as well. So I think that's the other interesting aspect of this.

[01:11:12] Alex Volkov: For me, reading this as someone who is not a machine learning scientist, the highlight of interesting things was how the open source community moves into different areas, but also brings over some of the learnings, some of the talent, the tooling, and makes things available.

[01:11:28] Alex Volkov: And I think that's like very exciting. We also have Alex Birch, is that correct? The name also in the audience. So shout out Alex. And then what else do we not cover this stage? What is the last thing that you want to say? Or maybe shout out some of the co authors feel free, the stage is yours.

[01:11:44] Tanishq Abraham: Yeah, I'm just looking at some comments that Alex has posted. He thinks, for example, that with this model there's potential to achieve more realistic textures than even Midjourney. So I think we have observed that with the model. Because that's the thing about using latent diffusion: when you're not doing it at the pixel level, it's a bit

[01:12:07] Tanishq Abraham: difficult to get those textures accurately. But if you're doing it at the pixel level, I think you're able to get those textures much better. And we've observed that with the models that we've been training. And yeah, I definitely agree with Alex there.

[01:12:22] Tanishq Abraham: Yeah, I think also like it may have potential to achieve like really realistic textures and that, that's something. That I guess we could look forward to hopefully. Yeah.

[01:12:31] Alex Volkov: that's incredible cause I think the realism comes from the imperfections, especially like textures and skin, et cetera. And like diffusion models have, at least for many folks are easier identifiable by the kind of the smoothness of edges and different things. So definitely like more more textures are there for humans in real pictures.

[01:12:50] Alex Volkov: And then we're looking forward to more of that in diffusion models. That's incredible. So definitely, thank you for breaking this down for us, Tanishq. Thank you, and Katherine and Alex and everybody else who worked on this. I think we have some questions from folks on stage here. Vik, go ahead, please.

[01:13:05] Vik Hyatk: Yeah, another question.

[01:13:06] Vik Hyatk: I just wanted to see I played around with the repository a bit. It's a great way for anyone interested in getting into diffusion models to get started. It's not your typical research code base. It's super clean.

[01:13:19] Vik Hyatk: You're not going to run into a bunch of dependency issues and whatnot.

[01:13:22] Vik Hyatk: So that

[01:13:23] Vik Hyatk: was amazing. It's also super compute efficient, so you don't need a ton of compute. To start to see good results. I'd strongly recommend checking it out if anyone was feeling intimidated

[01:13:32] Vik Hyatk: before,

[01:13:32] Vik Hyatk: don't be.

[01:13:34] Alex Volkov: Incredible.

[01:13:35] Tanishq Abraham: Yeah. That, that, that comes down to Kat's again, Kat's genius. I think this is a code base that she's been working on for quite some time and I also really enjoy working with it.

[01:13:42] Tanishq Abraham: It's like one of my favorite diffusion model code bases. So I definitely agree that anyone who's interested in playing around with diffusion models should check it out.

[01:13:49] Alex Volkov: So that's on Kat's GitHub. We're going to add this in the show notes; it's called k-diffusion, correct? It's now

[01:13:55] Alex Volkov: part of that existing code base, but now with the Hourglass Diffusion Transformer. Get used to saying Diffusion Transformers from now on, folks. Hourglass Diffusion Transformers, HDiTs, are now a thing.

[01:14:06] Alex Volkov: And Tanishq, thank you so much. And Alex, for joining in from the comment area. And thank you for working on this. Hopefully this will get the recognition it deserves, and definitely serve as a foundational block to get us higher performance, lower hardware requirement models that look way better.

[01:14:22] Alex Volkov: Incredible.

[01:14:23] Open source models in medical fields

[01:14:23] Alex Volkov: Tanishq, I wanted to follow up with you, because MedARC is something that you're now CEO of, on the medical side, and you had a tweet today that I really wanted to talk to you about, specifically because Qwen was involved, and the folks from Qwen are usually friends of the pod as well, they join us. Could you,

[01:14:37] Alex Volkov: let's talk through this please, let's talk through How open source is catching up to medical space.

[01:14:42] Alex Volkov: Could you briefly summarize what we've talked, recent work from you guys?

[01:14:46] Nisten Tahiraj: Yeah. Sure. Yeah. I've been

[01:14:48] Tanishq Abraham: quite busy with all kinds of different research projects. So that was another ongoing research project that we're working on at MedARC, and I shared some progress on it this morning. So basically, at MedARC, we're of course interested in [01:15:00] developing open source medical language models.

[01:15:03] Tanishq Abraham: So that's something that we're heavily interested in. And of course, in order to be able to do that, we wanted to understand what the current capabilities of these language models look like, the open source language models, and no one had done a very proper analysis of this as far as I could tell. And yeah, basically what we did is we added this suite of tasks known as the MultiMedQA

[01:15:24] Tanishq Abraham: suite of tasks. So this is a bunch of tasks, a total of nine tasks, that came from different papers, but Google put them together as their evaluation benchmark. This is what Google was using to evaluate their MedPaLM models and whatever models they had.

[01:15:44] Tanishq Abraham: And then, the community, the medical AI community been using that. It's been used to evaluate GPT 4

[01:15:49] Unknown: and all kinds of

[01:15:50] Tanishq Abraham: other models as well. And yeah, we at MedARC added it to the LM eval harness. That's the common sort of standard for evaluating open source language models.

[01:15:59] Tanishq Abraham: Everyone, I think, uses LM eval harness to evaluate models on various tasks. So now it's in there, and people can easily evaluate whatever models they have on these medical tasks. And once we added it into LM eval harness, we just wanted to do a comprehensive analysis of a whole bunch of models in the open source space, just to see how these sorts of generalist models do.
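
Roughly, once the MultiMedQA tasks are in the harness, an evaluation run looks something like the sketch below. The task names, model id, and API shown are assumptions based on the v0.4-style lm-evaluation-harness interface and may differ in your version.

```python
# Sketch of evaluating an open model on MultiMedQA-style tasks with
# lm-evaluation-harness; task names and arguments are assumptions and
# may need adjusting to your harness version.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen-72B,dtype=bfloat16",
    tasks=["pubmedqa", "medmcqa", "medqa_4options"],  # a subset of MultiMedQA
    num_fewshot=5,    # matches the 5-shot setting discussed here
    batch_size=4,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```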

[01:16:21] Tanishq Abraham: Like they're not necessarily particularly trained on medical data. Of course they've probably seen some in, in, in their pre training or whatever, but that's not their main purpose and that's not their main focus in their pre training. And I'm, I was just curious what their performance would look like and, how it compares to other models like GPT 4.

[01:16:36] Tanishq Abraham: GPT 4 is also a generalist language model. It's not necessarily trained on medical data either, but it's really good at that. In fact, prompt-engineered GPT 4 is state of the art on this benchmark, actually.

[01:16:48] Alex Volkov: I remember this. I remember where Google came up with a specific medical model, and then GPT 4, basically with prompt engineering on that benchmark, became the top one, right? This was quite incredible, that the most generic

[01:17:00] Alex Volkov: model we have. Yeah,

[01:17:02] Tanishq Abraham: that's the one, it's called MedPrompt. That's the state of the art: this prompt-engineered GPT 4 approach is called MedPrompt. And so they do a whole bunch of tricks, like dynamic few shot and GPT 4 written chain of thought and all kinds of tricks that they throw at GPT 4, and they got state of the art.

[01:17:18] Tanishq Abraham: And then of course they later used the same tricks to claim that GPT 4 is better than Gemini as well. It's not just for medicine that you can use it; they use it for general prompt engineering as well. But yeah, anyway, the point is I wanted to evaluate how the open source models do on this benchmark.
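
For context, the MedPrompt recipe is roughly: retrieve similar solved questions as dynamic few-shot examples, attach model-written chain of thought to them, and ensemble over shuffled answer choices. Below is a rough sketch of that flow; `embed`, `cosine`, and `gpt4` are hypothetical placeholder functions for illustration, not a real API.

```python
# Rough sketch of a MedPrompt-style pipeline. `embed`, `cosine`, `gpt4`,
# and the structure of `train_set` are hypothetical placeholders.
import random
from collections import Counter

def medprompt_answer(question, choices, train_set, k=5, ensembles=5):
    # 1) dynamic few-shot: pick the k most similar solved training questions
    q_vec = embed(question)
    neighbors = sorted(train_set, key=lambda ex: -cosine(q_vec, ex["vec"]))[:k]

    votes = Counter()
    for _ in range(ensembles):
        # 3) choice shuffling: permute answer options per ensemble member
        shuffled = random.sample(choices, len(choices))

        # 2) few-shot prompt with model-written chain of thought per example
        prompt = ""
        for ex in neighbors:
            prompt += f"Q: {ex['question']}\n{ex['model_cot']}\nAnswer: {ex['answer']}\n\n"
        prompt += f"Q: {question}\nOptions: {shuffled}\nThink step by step, then answer."

        votes[gpt4(prompt)] += 1

    # majority vote over ensemble members
    return votes.most_common(1)[0][0]
```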

[01:17:38] Tanishq Abraham: And so I evaluated a whole bunch of models. I evaluated Llama, Mistral, Mixtral. I evaluated the Yi series of models. I evaluated Qwen. Yeah, so I evaluated a whole bunch of models here, and basically what I found out is, first of all, Llama 2 is not that great compared to all these other models, actually. And it's interesting, because in the literature people are still fine tuning Llama 2 for medical purposes, but it actually doesn't have a very good base capability for medical knowledge.

[01:18:09] Tanishq Abraham: So Llama 2 is not very good at medical stuff, but the models that are quite good are basically the Yi series of models, so Yi 34B is really good, as well as the Qwen series of models. So Qwen 72B is the state of the art open source model here, and this is without any sort of prompt engineering or anything like that.

[01:18:28] Tanishq Abraham: This is just five shot prompting, and it's beating MedPaLM version 1. So MedPaLM version 1 was released in November of 2022, and that was Google's model at the time. And this Qwen 72B is beating MedPaLM 1 without any sort of prompt engineering or any of these tricks.

[01:18:50] Tanishq Abraham: And yeah, I think that's really, honestly, quite impressive because

[01:18:54] Alex Volkov: Yes.

[01:18:55] Alex Volkov: I want to shout out Junyang Lin, or Justin Lin, a friend of the pod and the technical lead working on Qwen, for such an incredible achievement. And thank you for testing this. Because, Nisten, you worked on AI in medicine as well. We've been waiting for this to happen.

[01:19:11] Alex Volkov: Want it or not, there are several doomers that say, hey, never trust an AI doctor, but many people already go to ChatGPT to maybe get a second opinion. And Google has obviously been working on this with MedPaLM and MedPaLM 2.

[01:19:22] Alex Volkov: I think for many people it's going to be easier to digest this idea if the model that talks to them is like fully runs on their computer, open source, no internet, like no data sharing.

[01:19:33] Alex Volkov: I think that's a very important piece of this as well. And it's great to see that, we're now getting like some cool comparison, but definitely open source is coming strong on this one.

[01:19:42] Unknown: Yeah.

[01:19:43] Nisten Tahiraj: Yeah. I had the same thing as Tanishq with the Llama models: you can train them on good medical data, but they don't perform great at the base. I'll tell you, GPT 4 is still king when it comes to it. And the product I worked on last year in March is still going, Dr.

[01:20:04] Nisten Tahiraj: Gupta.ai, it is still going. It's just a very well prompt-engineered doctor with a good RAG system too; it was one of the first. But I will say, the main concern now, and why I think open source will basically completely dominate medical AI, is that if they're dependent on some kind of API endpoint, that makes the hospital and people's medical data really vulnerable to malware and foreign intelligence groups, which have been wreaking havoc with medical data and ransomware.

[01:20:42] Nisten Tahiraj: So that's their main concern. And the only way we're going to solve that is by having models that they run locally. So I'm really glad Tanishq actually took on the task of benchmarking some of these, because you have the entire medical safety field and all the funding and all the people, and I have yet to meet an AI safety person that even knows how to rename a file in Linux, let alone actually write some kind of benchmark.

[01:21:07] Nisten Tahiraj: So I'm glad someone's actually taken on the challenge of making open medical yeah, medical LM benchmarks.

[01:21:19] Tanishq Abraham: Yeah, I completely agree. I definitely think open source is the future for medical AI and medical LLMs. And I think hospitals and doctors will be more comfortable when they know they have access to the model, and this is the model that they're using, rather than when it's behind some API, where not only are there concerns like malware, but OpenAI

[01:21:40] Tanishq Abraham: will just change the model or something like this too. These are all concerns, and we see this already happening with the models that OpenAI has. There needs to be complete transparency when working with these kinds of more crucial applications.

[01:21:55] Tanishq Abraham: And, by doing all this open source I think that that provides that transparency that doctors and hospitals and healthcare systems will be comfortable with that. That's why I'm really excited about working in this area. And I think there's really a lot of potential here.

[01:22:09] Alex Volkov: Incredible. Thank you for this work, Tanishq. Thank you for bringing us the picture of which of the models is best. Surprisingly, Qwen. If you gave me all the models that we've talked about, I wouldn't have assumed that Qwen was the best performing, but hey, we'll take what we can get.

[01:22:22] Alex Volkov: Qwen 72B, the best open source doctor, folks. You heard it here, based on this research.

[01:22:30] Tanishq Abraham: Yeah. Thank you for letting me share all this work.

[01:22:32] Alex Volkov: That's incredible. And as a friend behind the scenes, but now a friend of the pod, you're always welcome. Thank you for the deep dive on the Hourglass Diffusion Transformers. Thank you to the authors as well; Alex, I think, is still in the audience, and Katherine and Rico and some other folks. And definitely, for MedARC, keep us up to date.

[01:22:48] Alex Volkov: We'll keep reporting, and the stage is yours whenever you want it. I think, folks, we're moving forward. I think, Nisten, unless you have, or sorry, Tanishq, you have one last thing you want to

[01:22:57] Tanishq Abraham: I would just say, please follow, first of all, all of our Hourglass Diffusion authors. They all deserve your support. And also please follow MedARC as well.

[01:23:06] Alex Volkov: 100 percent worth following and definitely will be in the show notes for folks who are listening to this while driving and cannot like click that follow button. I think we're moving to as we're in the hour and a half into the space, let me reset [01:23:20] this a little bit for folks. If you just recently joined us, you're listening to ThursdAI where we talk about everything.

[01:23:26] Alex Volkov: And everything incredible and interesting in the world of AI and open source, LLMs, big companies we cover. And we also had a deep dive today about vision and video. My name is Alex Volkov. I'm with Weights & Biases. I'm an AI evangelist. And yeah, we're here every week and we keep up to date so you don't have to. So if you were off Twitter, or if you don't even participate in Twitter and you're just listening to this on the podcast, we got you; we're going to cover everything that's most important and then send it to you. So definitely check out

[01:23:52] Alex Volkov: thursdai.news for that. And I think we're moving towards the big companies area, which we haven't touched. We briefly covered in the breaking news that Hugging Face just announced a partnership with Google, so you'd be able to very easily run the models from Hugging Face on TPUs and Google Cloud GPUs, which is incredible because Google has those, but they don't even give them away.

[01:24:15] Alex Volkov: I think they're all reserved for collab or something. But also. Everything that I have today in the big company LLMs and APIs and everything is from Google.

[01:24:25] Google teases LUMIERE, SOTA video generation models

[01:24:25] Alex Volkov: So the next thing that we're going to talk about is Lumiere. And I don't know if you guys saw the video, but I definitely saw the video. I think Pharrell, you sent this in our group chat first, but by that time there was already spreading around.

[01:24:37] Alex Volkov: So there's obviously the whole area that we've talked about. Stable Video Diffusion releases very short videos, image to video and text to video. And then there are the front runners in closed source, which are Runway and Pika. And there's another one, Firework. Oh, and Leonardo is doing some incredible things.

[01:24:54] Alex Volkov: All of them have very short videos, and the consistency between the frames is not incredible. And Lumiere has shown a video, and sure, you could say this could be very cherry picked, et cetera, but it feels like this is another step in this direction that's significant.

[01:25:13] Alex Volkov: And for folks who have not watched the video yet, it's definitely worth watching. I'm going to add it, it's already at the top of the space, but basically they announced a bunch of stuff that Lumiere can do besides just generation. So video inpainting is one thing they've announced.

[01:25:28] Alex Volkov: They announced text to video, image to video, inpainting. And they have something they call realistic, diverse, and coherent motion, specifically around the motion of the characters, which has been lacking in all of these video synthesis models. I will say it's

[01:25:44] Alex Volkov: It's pretty remarkable to even discuss that oh, this vision text to video image is not as good as that one. It's really incredible that we're, like, at this point where we can say, a highbrow, Oh, yeah, I prefer this output. We're, like, we're typing text and getting a video back.

[01:25:59] Alex Volkov: It's ridiculous on the surface to even be saying this; a year and a half ago this wouldn't even have seemed possible. But with that said, we're moving forward. Hedonic adaptation is a thing. We're getting used to these tools, we're getting them day to day, and then we're like, okay, yeah, this tool is better.

[01:26:15] Alex Volkov: They said the existing video models synthesize distant keyframes followed by temporal super resolution, and that's probably what makes temporal consistency difficult to achieve. Temporal consistency basically concerns what the characters do throughout the video.

[01:26:30] Alex Volkov: And so you've all seen these videos where the face changes from frame to frame, et cetera. And this series of videos from Lumiere looks very consistent, both spatially and temporally: where the characters are in the frame, but also across time. And they attribute this to different methods that they've used. I will not go into this, but I think the tasks are very interesting.

[01:26:53] Alex Volkov: They have video editing applications, image to video, inpainting, and stylized generation. Something I also liked: you'd be able to take an image and then generate videos based on that style, not necessarily that image. So very impressive from the folks at Google, as always.

[01:27:08] Alex Volkov: I haven't played with this. I don't think there's a way for us to play with it yet. So there's a paper; maybe some of the ideas in the paper could be reproduced in open source. But it's a model shown in a paper from quite a lot of folks, Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, and a bunch of others on the paper.

[01:27:25] Alex Volkov: Very like visually appealing demo as well. So definitely we'll add this video in the show notes. And I think we have. One more thing here in Diffusion stuff. Yes, the one last thing that I wanted to talk about is Instant ID. Where so we moved off from Lumiere, Lumiere is like super, super cool, but we haven't seen this work.

[01:27:43] Alex Volkov: Hopefully they release it; Google has a track record here, like when DreamBooth was released and everybody was using it. And I think that's pretty much it in the big companies and open source.

[01:27:55] InstantID - 0 Shot face transfer diffusion models

[01:27:55] Alex Volkov: The other thing that I wanted to mention is InstantID. We've mentioned this briefly before, but it's been pretty much everywhere on my timeline. If you haven't played with it, I very strongly encourage you to, because InstantID is a technique to transfer your face into diffusion model generations.

[01:28:11] Alex Volkov: And we've all probably tried this at once with, like I said, DreamBooth from Nathaniel Ruiz, who's a dear friend of the pod and has been here a couple of times. There are other techniques as well to transfer your face into a latent diffusion model, and they all used to take multiple images of your face and some amount of training.

[01:28:32] Alex Volkov: And Instant ID is basically a technique that you can try right now, super quick. With zero shot, one image. You can generate images with your face, or with your kid's face, or whatever. And literally I just want to highlight how impressively fast we're moving towards these type of tools. This used to take fine tuning.

[01:28:52] Alex Volkov: This used to take GPUs and knowledge, and there's Kohya, and this used to take LoRAs and, before LoRAs, DreamBooth. So actually there are a couple of companies that I know of that built on top of providing the fine tuning experience around this, where you upload images and you get this huge, like four gigabyte, Stable Diffusion file specifically trained on you as a concept.

[01:29:13] Alex Volkov: And now there's a zero shot transfer thing called InstantID, and a Hugging Face demo is included here; I will attach it soon. You just upload one image of yourself. Literally, for me and Nisten and Tanishq and Umesh, the non-anons here on stage, we'd be able to use our profile pictures here and just generate us with a cowboy hat in noir style, and it will look like us.

[01:29:36] Alex Volkov: For most of the time. I've tested this Instant ID on my kids. And, I'm not going to post this because of privacy. But my kid loved it incredibly so much. He was a superman. It looked like him. It's unbelievable. That it was, like, able to transfer this with one image. It's quite incredible how fast we moved here.

[01:29:52] Alex Volkov: Definitely, if you haven't tried InstantID but you have tried avatars before, try InstantID, you'll be blown away. It runs on your Mac as well, not that great, but it runs through Pinokio. Definitely worth noticing how fast we're moving in this generation. And shout out to whoever built this.

[01:30:08] Alex Volkov: And there's quite a few technologies like this now. Highlighting how fast we're moving, and I think that's pretty much it.

[01:30:15] Voice and Audio - New tech challenges Whisper

[01:30:15] Alex Volkov: So we've covered our diffusion stuff. Yeah, let's move to voice and audio. Nisten, you brought us this news, so I definitely want you to pull up the tweet and let's talk about the faster encoder ASR.

[01:30:25] Alex Volkov: And then we can also, while maybe you pull this up, I will say that this week Eleven Labs announced a big funding raise, but Eleven Labs also released their dubbing studio. And if you've followed Twitter at all, not even just AI Twitter, for the past week and a half, two weeks, you've maybe seen the dubbed video of the Argentinian prime minister, or I don't know if he's a prime minister or president, probably president, right?

[01:30:55] Alex Volkov: Yes, president. Milei. He went to the World Economic Forum and gave a speech in Spanish. And then there was a dubbed version. At these global summits of leaders, they have instant translation in their ear to any language, and that's a human that knows both languages.

[01:31:14] Alex Volkov: And then somebody said, hey, okay, this is one example, and they posted a HeyGen version. If you remember HeyGen, we've talked about it: a quite incredible translation, dubbing, and lip-sync service, where you can upload yourself and get an instant avatar. Somebody used HeyGen on the whole speech.

[01:31:29] Alex Volkov: And that went ridiculously viral. I think there were like 50 million views on it on X. And that was mostly a combination of [01:31:40] Milei being very viral in his opinions, stoking some controversy, but also because you literally hear the person speak in English with a Spanish accent, when this didn't happen; he literally spoke in Spanish.

[01:31:52] Alex Volkov: Quite incredible technology, and people have been shocked and said, oh my God, this is coming for all of us in deepfakes. Fine, we've talked about this multiple times. So Eleven Labs now has an alternative to this, called Eleven Labs Dubbing Studio. And I've actually used this on a trailer for ThursdAI, of me speaking in English, and asked it to dub me into Russian, a language that I do speak, my mother tongue from Ukraine, and it sounded ridiculously cool.

[01:32:18] Alex Volkov: Here's a quick snippet of me from a ThursdAI show from three weeks ago that I dubbed into Russian for your entertainment.

[01:32:28] Gadget for children, for parents who have children who do not want to buy iPhones. Because then Instagram will destroy their brains. This is the perfect device for this.

[01:32:36] It looks like a language. In fact, you can talk to a rabbit, it is very cute, there is one simple interface, this is a voice.

[01:32:43] Alex Volkov: It sounded like, so far, How should I say, these models that emulate voice did not work on me. Specifically, my accent is not that great, but because my accent is probably Russian, the Russian version of me sounded really close to me.

[01:32:54] Alex Volkov: For the first time, I was like, oh, okay. All right. And Eleven Labs released this dubbing studio, and hopefully these models are now coming to open source.

[01:33:04] AI deepfake of Biden caused controversy on mass media about AI

[01:33:04] Alex Volkov: Because there's also a thing where I think there's a recording of Biden saying something like stay home going around and everybody in the media making the big fuss about, Oh my God, AI is coming for all of us.

[01:33:15] Alex Volkov: And there's a big cry for folks to say we should build tools to detect against this, et cetera. And my stance remains the same. Listen, I think we've talked about this multiple times: the only way through these woods is for everybody to know that their voice can very easily be faked with three or ten seconds of their voice.

[01:33:31] Alex Volkov: It's time for humanity to adapt to the situation. There's no panacea here. You should just know that blindly trusting a voice without knowing the source is a bad idea, because it might as well be fake. I don't know if you want to add anything.

[01:33:44] Alex Volkov: Yeah, go ahead.

[01:33:45] Nisten Tahiraj: really quick, I want to say, we already have laws to deal with this. More law is not necessarily going to fix the issue because, fraud is illegal in a free market. And if you want. Or at least people that are more in politics and stuff. If you want to solve the issue, do the job you already have.

[01:34:05] Nisten Tahiraj: You already have a list of spam callers, which you have been identified without an AI. And can you shut them down? So People love to imagine problems and love to think of doom or whatever in the future and then they completely ignore the stuff in front of them. All of us do this, but yeah, again, fraud is illegal.

[01:34:27] Nisten Tahiraj: Can you shut it down as a job, as a government? You don't need a new law, you don't need to be make speeches about AI. You need, just need to shut down fraud when it's identified. Otherwise, all of these tools and conferences and stuff are pointless.

[01:34:42] Alex Volkov: As predicted.

[01:34:43] Nisten Tahiraj: that's what I'm gonna

[01:34:44] Alex Volkov: Yeah, no, that's great. As predicted, the first. Election related deepfake type thing. The media was all over this and the doomers were like, here we go. And people were like it came sooner than we thought. And no, we've literally been talking about this for the past year.

[01:34:57] Alex Volkov: That like elections are coming. These things are going to happen. The technology was there even before. Now it's just like a little bit more accessible. The laws are in place, make it more difficult for grandmas to get spam calls, not make it difficult for the open source stuff. So hopefully like the more prevalent these technologies are, this is my stance, the better the chance that, people will just get used to this being everywhere.

[01:35:19] Alex Volkov: And definitely for folks of us who have our audio out there, we're doomed, right? So my usual suggestion here is: come up with a key phrase with your loved ones that only you two know, like the Terminator scene with the dog. And make sure that if you get a call at 3 a.

[01:35:34] Alex Volkov: m. at night, and it sounds like a bad quality version of your relative calling from somewhere, from an unknown phone, you make sure it's them by asking something like, hey, remember we went to Hawaii? when you never went to Hawaii. And if they say, oh yeah, of course, you know. But also, most of those will be LLMs, so you can probably

[01:35:53] Alex Volkov: prompt-trick them, the spammy LLM calls that sound like your relative.

[01:35:57] W2V-BERT ASR gets Whisper quality with significantly fewer parameters

[01:35:57] Alex Volkov: Alright, moving on, unless, Nisten, you want to add some stuff about this W2V-BERT speech encoder? I've added it to the top of the space.

[01:36:07] Nisten Tahiraj: Yeah, just really quickly, I'm gonna do the paper reading on it 'cause

[01:36:10] Alex Volkov: Oh, hell yeah!

[01:36:11] Nisten Tahiraj: It's a pretty nice paper, so stay tuned for that at some point when we announce it. It's from MIT and some people from Google. So it's another really nice encoder-only model, and it seems to be potentially up to 30 times faster

[01:36:29] Nisten Tahiraj: So this could

[01:36:30] Alex Volkov: than Whisper,

[01:36:31] Nisten Tahiraj: quite useful. It could be quite useful for those making assistants that run on local devices or on low resource devices, but also for stuff on the web. Now it is officially supported by the Transformers library. We'll wait on Xenova; I think it's probably going to be available via WebGPU and stuff, I'm guessing.

[01:36:55] Nisten Tahiraj: Yeah it's very, it's nice to see that that field also going forward. Because we already have excellent speech recognition. We know it works really well. We just needed to work on more low power devices and mobile and

[01:37:08] Alex Volkov: Absolutely. And looking at some stats here, it covers more languages than standard Whisper, 143 languages. And you can fine tune it on specific languages as well to make it better. VB benchmarked it on Mongolian and beat Whisper in less than 1200 steps. So a smaller model, fine-tunable, super cool, and the best part of it is the MIT license.
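
Fine-tuning the encoder for a new language follows the usual CTC recipe. A rough sketch with Transformers is below; the checkpoint name and head size are assumptions from memory, so check the model card before relying on them.

```python
# Sketch: put a fresh CTC head on the W2V-BERT 2.0 encoder and fine-tune it
# for a new language. The checkpoint id and vocab size are assumptions.
import torch
from transformers import AutoFeatureExtractor, AutoModelForCTC

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
model = AutoModelForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    vocab_size=64,            # size of your target language's character vocabulary
)

audio = torch.randn(16000 * 5)                       # 5 seconds of 16 kHz audio
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

logits = model(**inputs).logits                      # (1, frames, vocab)
# train this freshly initialized head with CTC loss on (audio, transcript)
# pairs for your language, e.g. Mongolian as in VB's benchmark
```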

[01:37:29] Alex Volkov: So there have been other ASRs. They're not in this license. And now we're getting like a state of the art tiny model in this license. I think that's most of the stuff that I wanted to cover.

[01:37:39] NSF announces a new initiative called NAIRR

[01:37:39] Alex Volkov: No, I wanted to cover one last thing. One last thing. National Artificial Intelligence Research Resource. N A I R R.

[01:37:47] Alex Volkov: Which is coming to us from the United States National Science Foundation, collaborating with other agencies. All of these incredible three-letter agencies are collaborating in this initiative now: NSF, the science foundation, plus DARPA and NASA, and NIST, which is the Institute of Standards and Technology, and DOD and DOE, and all these things.

[01:38:11] Alex Volkov: But also, the private sector is joining this: companies like Anthropic and OpenAI, and Palantir, and Google, and EleutherAI, and Hugging Face, and Weights & Biases. Obviously I saw this and thought, oh, that's cool, we at Weights & Biases are participating in this incredible effort. They are all joining together in this initiative to promote and support AI research and advance safe, secure, and trustworthy AI.

[01:38:33] Alex Volkov: And it's also great to see folks like Hugging Face here, and Meta is represented as well, folks who push open source, because these government organizations have to have folks who promote open source too. And they've organized this into four focus areas. The open focus area enables AI research through access to diverse AI resources via the NAIRR pilot portal.

[01:38:56] Alex Volkov: So definitely expect there to be government grants for GPUs for different things. I don't know how easily those will be obtainable, but we had some folks from Canada before who talked about how you could ask for grants to train or fine tune. The stuff that Tanishq was talking about, research on which open source model is better at medical QA, could be happening through the government. They also focus on security, and I think something called NAIRR Classroom, which I had no idea about.

[01:39:22] Alex Volkov: Oh, which brings new communities in for education, training, and user support. Very government-like in its approach. However, it's definitely good to see the companies that participate in this. It's not only government, it's also the private sector. NVIDIA is there, AMD is there, EleutherAI, like we said; open source folks are represented as well.

[01:39:43] Alex Volkov: A huge kind of chunk of companies, it's good to see that the government is like actually moving towards some standardization which may be needed hopefully less regulation, more standardization. And I think with that, we are pretty much all over the news that we had for [01:40:00] this week. Which was great.

[01:40:01] Alex Volkov: I want to say thank you. A huge thank you again, first of all, to the listeners who come here and listen, and to the folks on stage who help me from week to week to bring you the latest and greatest in AI news.

[01:40:11] Alex Volkov: Thank you so much, and we'll let you go on this Thursday, and we'll see you next week.

[01:40:14] Alex Volkov: Take care, everyone. Bye bye.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
📅 ThursdAI Jan 18 - Nous Mixtral, Deepmind AlphaGeometry, LMSys SGLang, Rabbit R1 + Perplexity, LLama 3 is training & more AI news this week

👋 Hey there, been quite a week, started slow and whoah, the last two days were jam-packed with news, I was able to barely keep up! But thankfully, the motto of ThursdAI is, we stay up to date so you don’t have to!

We had a milestone, 1.1K listeners tuned into the live show recording, it’s quite the number, and I’m humbled to present the conversation and updates to that many people, if you’re reading this but never joined live, welcome! We’re going live every week on ThursdAI, 8:30AM pacific time.

TL;DR of all topics covered:

* Open Source LLMs

* Nous Hermes Mixtral finetune (X, HF DPO version, HF SFT version)

* NeuralBeagle14-7B - From Maxime Labonne (X, HF,)

* It's the best-performing 7B parameter model on the Open LLM Leaderboard (when released, now 4th)

* We had a full conversation with Maxime about merging that will release as a standalone episode on Sunday!

* LMsys - SGLang - a 5x performance on inference (X, Blog, Github)

* NeuralMagic applying #SparseGPT to famous models to compress them with 50% sparsity (X, Paper)

* Big CO LLMs + APIs

* 🔥 Google Deepmind solves geometry at Olympiad level with 100M synthetic data (Announcement, Blog)

* Meta announces Llama3 is training, will have 350,000 H100 GPUs (X)

* Open AI releases guidelines for upcoming elections and removes restrictions for war use (Blog)

* Sam Altman (in Davos) doesn't think that AGI will change things as much as people think (X)

* Samsung S24 has AI everywhere, including real time translation of calls (X)

* Voice & Audio

* Meta releases MAGNet (X, HF)

* AI Art & Diffusion & 3D

* Stable diffusion runs 100% in the browser with WebGPU, Diffusers.js (X thread)

* DeciAI - Deci Diffusion - A text-to-image 732M-parameter model that’s 2.6x faster and 61% cheaper than Stable Diffusion 1.5 with on-par image quality

* Tools & Hardware

* Rabbit R1 announces a deal with Perplexity, giving a full year of perplexity pro to Rabbit R1 users and will be the default search engine on Rabbit (link)

Open Source LLMs

Nous Research releases their first Mixtral Finetune, in 2 versions DPO and SFT (X, DPO HF)

This is the first Mixtral finetune from Teknium1 and Nous team, trained on the Hermes dataset and comes in two variants, the SFT and SFT+DPO versions, and is a really really capable model, they call it their flagship!

This is the first Mixtral finetune to beat Mixtral instruct, and is potentially the best open source model available right now! 👏

Already available at places like Together endpoints, GGUF versions by the Bloke and I’ve been running this model on my mac for the past few days. Quite remarkable considering where we are in only January and this is the best open chat model available for us.

Make sure you use ample system prompting for it, as it was trained with system prompts in mind.

LMsys new inference 5x with SGLang & RadixAttention (Blog)

LMSys introduced SGLang, a new interface and runtime for improving the efficiency of large language model (LLM) inference. It claims to provide up to 5x faster inference speeds compared to existing systems like Guidance and vLLM.

SGLang was designed to better support complex LLM programs through features like control flow, prompting techniques, and external interaction. It co-designs the frontend language and backend runtime.

- On the backend, it proposes a new technique called RadixAttention to automatically handle various patterns of key-value cache reuse, improving performance.

- Early users like LLaVa reported SGLang providing significantly faster inference speeds in their applications compared to other options. The LMSys team released code on GitHub for others to try it out.
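To give a feel for what an SGLang program looks like, here is a rough sketch based on the project's examples; the exact API surface may have changed since, so treat names like `RuntimeEndpoint` as assumptions to verify against the repo.

```python
# Rough sketch of an SGLang frontend program; API details may differ
# across SGLang versions.
import sglang as sgl

@sgl.function
def qa_pipeline(s, question):
    s += sgl.system("You are a concise assistant.")
    s += sgl.user(question)
    # prefixes shared across calls (system prompt, few-shot examples, etc.)
    # hit the RadixAttention KV cache and are not recomputed
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

# point the frontend at a running SGLang server
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa_pipeline.run(question="What is RadixAttention?")
print(state["answer"])
```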

Big CO LLMs + APIs

Meta AI announcements (link)

These #BreakingNews came during our space, Mark Zuckerberg posted a video on Instagram saying that Llama3 is currently training, and will be open sourced!

He also said that Meta will have 350K (that’s not a typo, 350,000) H100 GPUs by end of the year, and a total of ~600,000 H100 equivalent compute power (including other GPUs) which is… 🤯 (and this is the reason why I had to give him double GPU rich hats)

Deepmind releases AlphaGeometry (blog)

Solving geometry at the Olympiad gold-medalist level with 100M synthetic examples

AlphaGeometry is an AI system developed by Google DeepMind that can solve complex geometry problems on par with human Olympiad gold medalists

It uses a "neuro-symbolic" approach, combining a neural language model with a symbolic deduction engine to leverage the strengths of both

The language model suggests useful geometric constructs to add to diagrams, guiding the deduction engine towards solutions

It was trained on over 100 million synthetic geometry examples generated from 1 billion random diagrams

On a benchmark of 30 official Olympiad problems, it solved 25 within time limits, similar to the average human medalist
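The neuro-symbolic loop can be sketched in pseudocode as below; `symbolic_engine`, `language_model`, and the `problem`/`diagram` objects are hypothetical stand-ins for illustration, not DeepMind's actual interfaces.

```python
# Pseudocode sketch of an AlphaGeometry-style neuro-symbolic loop.
# All objects here are hypothetical stand-ins.
def solve(problem, goal, symbolic_engine, language_model, max_steps=16):
    diagram = problem.initial_diagram()
    for _ in range(max_steps):
        # 1) symbolic deduction: derive everything provable from the diagram
        facts = symbolic_engine.deduce(diagram)
        if goal in facts:
            return symbolic_engine.extract_proof(goal)

        # 2) the language model proposes an auxiliary construction
        #    (a new point, line, or circle) to unblock the deduction engine
        construction = language_model.propose(diagram, goal)
        diagram = diagram.add(construction)

    return None  # unsolved within the step budget
```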

OpenAI releases guidelines for upcoming elections. (Blog)

- OpenAI is taking steps to prevent their AI tools like DALL-E and ChatGPT from being abused or used to spread misinformation around elections

- They are refining usage policies for ChatGPT and enforcing limits on political campaigning, impersonating candidates, and discouraging voting

- OpenAI is working on technology to detect if images were generated by DALL-E and labeling AI-generated content for more transparency

- They are partnering with organizations in the US and other countries to provide users with authoritative voting information through ChatGPT

- OpenAI's goal is to balance the benefits of their AI while mitigating risks around election integrity and democratic processes

Microsoft announces copilot PRO

Microsoft announced new options for accessing Copilot, including Copilot Pro, a $20/month premium subscription that provides access to the latest AI models and enhanced image creation.

Copilot for Microsoft 365 is now generally available for small businesses with no user minimum, and available for additional business plans.

This weeks Buzz (What I learned with WandB this week)

Did you know that ThursdAI is not the FIRST podcast at Weights & Biases? (Shocking, I know!)

Lukas, our CEO, has been a long time host of the Gradient Dissent pod, and this week, we had two of the more prolific AI investors on as guests, Elad Gil and Sarah Guo.

It’s definitely worth a listen, it’s more of a standard 1:1 or sometimes 1:2 interview, so after you finish with ThursdAI, and seeking for more of a deep dive, definitely recommended to extend your knowledge.

AI Art & Diffusion

Zero shot face adapted image gen - 3 different tech approaches

What used to take ages, now takes seconds with 0 shot, there are quite a few approaches to generate images with real human faces, in 0 shot capacity, providing just a few faces. Gradio folks call it Zero-shot face-adapted image generation and there are 3 tools to generate those:

1⃣IPAdapter

2⃣PhotoMaker

3⃣InstantID

Here’s a great summary thread from Gradio folks for this fast advancing field! Remember we had to finetune on faces for a long time? Dreambooth and then LORAs, and now we have this exciting development.
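As a concrete example of the zero-shot route, the IP-Adapter path is already wired into diffusers; the sketch below is from memory, so the repo id, weight file name, and adapter scale are assumptions to double-check against the diffusers docs (the reference image path is a placeholder).

```python
# Minimal IP-Adapter sketch with diffusers for zero-shot face-conditioned
# generation; model IDs and weight names are assumptions from memory.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)                # how strongly to follow the reference face

face = load_image("my_profile_picture.png")   # single reference image (placeholder path)
image = pipe(
    prompt="a portrait in film noir style, cowboy hat",
    ip_adapter_image=face,
    num_inference_steps=30,
).images[0]
image.save("noir_me.png")
```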

Tools & Hardware

Rabbit R1 partners with Perplexity

The R1 device that was just announced is about to sell through its first 50K units in just a few days, which is remarkable. I definitely pre-ordered one, and can’t wait to get my hands on it. Jesse the founder has been all over X, getting incredible recognition, and after a few conversations with Aravind Srinivas, they agreed to make a deal right on X.

Today they hopped on a space and announced that all the first 100K early buyers of Rabbit are going to get a full year PRO subscription of Perplexity (one of the best AI search engines out there) for free! I sure as heck didn’t expect it, but the email was sent just a few minutes after the X space, and now guess who uses perplexity pro?

Here’s an example of a perplexity searching ThursdAI content (it doesn’t always get it right tho)!

I guess that’s it for today. As I’m writing this, there’s other incredible stuff getting released: Codium open sourced AlphaCodium (here’s a link to the founder talking about it), but I didn’t have a second to dive into it; hopefully I’ll bring Itamar to ThursdAI next time to chat about it!

Have a great weekend all 🫡 (please give us a good review on Apple Itunes, apparently it really helps discovery!)

Full Transcription for convenience:

[00:00:02] Alex Volkov: Hey everyone, happy Thursday. My name is Alex Volkov. I'm an AI evangelist with Weights Biases, and this is Thursday AI.

[00:00:13] Alex Volkov: We had such a great show today, over 1100 of you tuned in to the live recording, which is incredible.

[00:00:30] I also wanted to say that if you're not subscribed to thursdai.news newsletter, please go ahead and do because I send a full blog with the links to the show notes and to the speakers that we have on stage, and you should be able to follow up.

[00:00:46] Alex Volkov: There's a bunch of multimedia, like videos, that are not coming through in the audio-only podcast format. So please subscribe to thursdai.news as well. On this live recording, we also hosted Maxime Labonne, who's a senior machine learning scientist with J.

[00:01:04] Alex Volkov: P. Morgan, and the author of several models and merged models, lately the NeuralBeagle model that we've talked about. We had a great conversation with Maxime, and that full episode will be posted as a Sunday special evergreen content episode. So please stay tuned for that.

[00:01:29] Alex Volkov: It's been an incredibly illuminating conversation in the world of merging and merge kit and everything else that Maxim does and it was a super cool conversation. So that's coming soon.

[00:01:41] Alex Volkov: And, as I've been doing recently, the following is going to be a 7 minute segment, from the end of the live recording, summarizing everything we've talked about.

[00:01:54] Alex Volkov: I hope you've been enjoying these TLDR intros. Please let me know in the comments if this is something that's helpful to you.

[00:02:05] ThursdAI Jan18 TL;DR recap by Alex

[00:02:05] Alex Volkov: Alright, we started with talking today, Thursday, January 18th. We were talking about Nous Hermes Mixtral, the Mixtral fine tune that came out from Teknium and the folks at Nous. It was the first fine tune of Mixtral, the mixture of experts model from Mistral, that came from the Nous Research folks.

[00:02:35] Alex Volkov: And it was released in two versions, the SFT-only version and the SFT plus DPO version, given the different datasets they were trained on, and they actually have different capabilities. Based on the community, it looks like the DPO version is performing very well. I've been running it on my MacBook with LM Studio and it really performs well.

[00:02:53] Alex Volkov: So shout out, and folks should try this. This looks like by far the best new Hermes model based on benchmarks. It's trained on the best open source base model, which is currently Mixtral. Mixtral is number seven in the world based on the LMSys Arena, and that's an open source model that we all get to use.

[00:03:10] Alex Volkov: Then we covered NeuralBeagle14-7B from Maxime Labonne. Maxime also joined us for a full interview that you can hear as part of the podcast episode. Maxime released NeuralBeagle, which is a merge plus a DPO fine tune, and it's one of the top performing 7 billion parameter models on the Open LLM Leaderboard.

[00:03:30] Alex Volkov: When it was released a few days ago it was top; now it's fourth, so the speed with which things change is quite incredible. We then covered LMSys's SGLang, an attempt at 5x inference performance: a bunch of techniques on the frontend and the backend, RadixAttention on the backend and the SGLang way of writing inference code on the frontend, that combine into almost a 5x performance improvement on inference.

[00:03:56] Alex Volkov: 5x is incredible. Nisten mentioned that it does less than 5x on longer sequences, and then we had a conversation about where it could improve significantly, which is agents, since agents send short sequences. Alignment Lab told us that this could be a significant improvement in that area.

[00:04:13] Alex Volkov: So our agents are about to run way faster. A 5x improvement is just incredible. And we also mentioned that on the same day this was released, another optimization, called Marlin, was shouted out by Tim Dettmers of QLoRA fame, which also improves some significant inference paths by 4x.

[00:04:34] Alex Volkov: And I wonder if those can be combined together in some way. Quite impressive. We also covered Neural Magic doing sparsification, and we did a short deep dive, thank you Alignment and thank you Austin, into what sparsification means. They do this for major models and compress them with sparsification to around 50% sparsity.
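
As a toy illustration of what 50% sparsity means, the snippet below does plain magnitude pruning; SparseGPT itself is more sophisticated and prunes layer by layer using second-order information, so this only shows the basic idea of zeroing out half the weights.

```python
# Toy illustration of unstructured 50% magnitude pruning; not SparseGPT.
import torch

def prune_to_sparsity(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    k = int(weight.numel() * sparsity)                  # how many weights to zero
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold                     # keep only the larger weights
    return weight * mask

w = torch.randn(4096, 4096)
w_sparse = prune_to_sparsity(w, 0.5)
print(f"sparsity: {(w_sparse == 0).float().mean():.2%}")
```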

[00:04:55] Alex Volkov: Sparsification is zeroing out the weights that you don't actually use, and it makes the models significantly smaller. We covered Deci a little bit. We didn't actually get to the diffusion stuff, so I'll just read out those updates as well. Then we covered OpenAI's new guidelines for upcoming elections: they're trying to add techniques for folks to identify DALL-E generated images.

[00:05:18] Alex Volkov: And they're adding restrictions on how their LLMs are used in the context of voter suppression, et cetera. We then talked about DeepMind and AlphaGeometry, where DeepMind released, and it looks like open sourced, a model called AlphaGeometry that uses a neuro-symbolic approach with two models and solves geometry at almost a gold medal level at the Olympiad.

[00:05:42] Alex Volkov: So geometry Olympiads; quite an impressive release from DeepMind, and shout out. It was trained on a hundred million synthetic examples, sourced from more than a billion or so random diagrams. So shout out DeepMind as well. We also briefly mentioned Samsung's S24, the flagship phone that Apple now needs to compete with, which uses the new Qualcomm chip and has AI in

[00:06:10] Alex Volkov: Pretty much summarization everywhere. There's like a button with the sparkles with AI. And one cool thing that we haven't mentioned, but I saw MKBHD on Twitter review is that they added real time translation of calls. So you can literally call some people with a different language and on device translation, after you download the model on device, we'll actually be able to translate this in real time.

[00:06:30] Alex Volkov: So you can read what the other person said in a different language, but also hear it. And that's quite cool. Then we had a deep interview with Maxime Labonne, the author of many things. Recently we've talked about Phixtral, the mixture of experts of Phi models. We've talked about merges.

[00:06:46] Alex Volkov: Maxime had a great explanation on his blog, and then on the Hugging Face blog, about what merges are, what MergeKit does, and how that plays into the whole ecosystem. The top LLM leaderboard has now been taken over by merges, likely because merging models does not require additional compute or additional training, and it's fairly easy to do with just code. Merging takes and combines

[00:07:11] Alex Volkov: different models and different weights from different models, using different algorithms like SLERP, including potentially building models of novel sizes. So we've seen 10 billion parameter models, and even 120 billion parameter models, so you can use those techniques to combine or merge models in different ways.
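
For the curious, SLERP over two checkpoints' tensors can be sketched like this; it is a simplified illustration, not MergeKit's implementation, which handles per-layer interpolation factors, tokenizer alignment, and dtype handling.

```python
# Simplified SLERP weight-merging sketch; illustrative only, not MergeKit.
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5, eps: float = 1e-8):
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_n, b_n = a / (a.norm() + eps), b / (b.norm() + eps)
    omega = torch.acos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))
    if omega.abs() < eps:                       # nearly parallel: fall back to lerp
        merged = (1 - t) * a + t * b
    else:
        merged = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.view_as(w_a).to(w_a.dtype)

def merge_state_dicts(sd_a: dict, sd_b: dict, t: float = 0.5) -> dict:
    # interpolate every shared tensor; real merges often vary t per layer
    return {k: slerp(sd_a[k], sd_b[k], t) for k in sd_a if k in sd_b}
```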

[00:07:31] Alex Volkov: There are also Frankenmerges, which combine different models into one. So we dove into that, what the inspiration for merging is, and what it actually does. Maxime also released LazyMergekit, which is a thin wrapper on top of MergeKit from Charles Goddard. So shout out to Charles.

[00:07:47] Alex Volkov: So we had a very interesting interview about merging, and thank you, Maxime, for joining us. Definitely worth a listen as well. And then we had breaking news from Big Zuck and the Meta team: he gave an update about the number of GPUs that they have. By the end of this year, they're talking about 350,000 H100s, and overall around 600,000 H100 equivalents of compute, which they're going to use for AI and the Metaverse.

[00:08:14] Alex Volkov: Definitely a great update. They're training Llama 3 right now. The stuff that we didn't get to, but I wanted [00:08:20] to update on, and will add in the show notes: there's Stable Diffusion code that runs 100 percent in the browser with WebGPU and Diffusers.js, a thread from ClipDrop CEO Cyril Diagne.

[00:08:32] Alex Volkov: And there's also Deci AI, the company that releases a bunch of models. They released DeciDiffusion, a text-to-image model with only 732 million parameters. It's twice as fast and 61 percent cheaper than Stable Diffusion with the same image quality, so that's getting improved.

[00:08:51] Alex Volkov: But I think they're comparing against Stable Diffusion 1.5, so not SDXL or the newer ones. Deci AI also released DeciCoder, and we also covered Stable Code, a coding model that runs closer to on-device, a 3 billion parameter model that beats Code Llama 7B. I think that's most of the stuff we talked about.

[00:09:09] Alex Volkov: And then one of the major things that Umesh brought: we talked about corporate drama, maybe a new segment on ThursdAI, where Microsoft did some things that actually disrupted workflows and companies' actual products built on top of Microsoft, which is considerably not great and led to a fight.

[00:09:30] Alex Volkov: Hopefully not, but potentially a legal battle as well, and that's not something that should be done by a cloud provider such as Microsoft. Very ugly. In addition to this, we also talked about Microsoft announcing the CoPilot Pro that's now open for small businesses for 20 bucks a month with no minimum seats as well.

[00:09:46] Alex Volkov: And I think that's most of the things that we've mentioned

[00:09:49] Alex Volkov: Let's go.

[00:10:07] Alex Volkov: 2024, we all need to get used to saying 2024. At this point we have a bunch of AI news. My name is Alex Volkov, I'm an AI Evangelist with Weights & Biases, and I'm joined on stage here with dear friends, co-hosts of ThursdAI: podcast, newsletter, live X recording, community, I don't know, a bunch of other stuff as well.

[00:10:29] Alex Volkov: Nisten does paper readings and is part of this as well. Welcome everyone. Welcome.

[00:10:33] Introduction to the Session's Structure

[00:10:33] Alex Volkov: I will just say a few things before we get started. So first of all, for those of you who are new, who are listening to this for the first time first of all, welcome.

[00:10:41] Alex Volkov: It's great that you have found us. Please DM me with how you found us, I would love to know as I'm looking into the channels, et cetera. However, I will say that we've been here every week, pretty much at the same time. I don't think we've changed time since the summer.

[00:10:55] Alex Volkov: So 8:30 AM Pacific, and we try to do this every Thursday. I think we missed one or two; I was sick once, apologies. But other than that, we're here to talk about AI every week. And what happens often is that as we talk about things, breaking news happens and folks announce different stuff on Thursdays, and we cover pretty much everything, a very broad spectrum of AI changes. I know there are spaces to talk about diffusion specifically, art spaces as well, so we cover diffusion to an extent, but our main focus is open source LLMs. We love those. We have a bunch of folks here on stage who are training and fine-tuning the greatest kind of open source models, and we definitely follow up on the different, how should I say, different techniques, like the merging stuff that we're going to talk about at length later. And we hopefully get to hear about them first, before they take over Hugging Face, which was the case, I think, with some of the models and some of the techniques.

[00:11:54] Alex Volkov: And I see two more folks joining us as well from different areas of the open source community. So I will say welcome LDJ and welcome Alignment Lab. LDJ, you've been missing in action. I was just saying, how are you, man? Welcome back.

[00:12:08] Luigi Daniele: Yeah, I'm doing good. Glad to be

[00:12:10] Alex Volkov: Yeah. And also we have Austin AKA Alignment Lab. What's up Austin?

[00:12:16] Alignment Lab: Oh, dude, I'm doing great. I was actually just in a call with LDJ and he was like, oh, ThursdAI is starting, and I was like, let's go.

[00:12:22] Alex Volkov: Yeah, that's exactly what I like to hear, that the calendar event is popping off and ThursdAI is starting.

[00:12:27] Open Source AI: Nous Hermes Mixtral Finetune + DPO deep dive

[00:12:27] Alex Volkov: So with that, I think it's time for the open source stuff.

[00:12:44] Sounds: Open Source AI, let's get it started.

[00:12:48] Alex Volkov: All right, so welcome to probably the biggest, the most fun, the most content-packed section of ThursdAI, where we talk about open source LLMs and LMMs. I guess we should also start mentioning those, because a bunch of the models that we see are also multimodal. And I guess we'll start with

[00:13:08] Alex Volkov: the Nous Hermes fine-tune on Mixtral. We've been waiting for this. Mixtral was released, I want to say, a month or a month and a half ago, and now we're getting one of the top datasets and fine-tunes trained on Mixtral, and we're getting it in multiple formats.

[00:13:25] Alex Volkov: Again, shout out Teknium. If you guys don't follow Teknium yet, what are you even doing showing up on ThursdAI? Definitely give Teknium a follow. But the Mixtral fine-tune is available and it comes in two variants: SFT plus DPO, and SFT only. SFT is supervised fine-tuning, and DPO is direct preference optimization.

[00:13:45] Alex Volkov: This is not a new technique, but it has definitely been around for a while. Many people are using DPO at this point. We've talked about DPO multiple times. I think we also saw, Nisten, correct me if I'm wrong, that the actual Mixtral Instruct is also DPO, right? We saw this in the paper.

[00:14:00] Alex Volkov: So DPO is everywhere. And this is not the first time that the SFT and DPO pair is getting released separately. I think we've chatted with John Durbin who's, shoutout John, is in the audience. And that conversation is on the feed. So definitely check out the conversation with John.

[00:14:16] Alex Volkov: And the Bagel models were also released separately with an SFT and a DPO version as well. And I think John back then mentioned that each one has different things it's good at. And I also would love to figure out which of the new Nous Hermes Mixtral fine-tunes is best at what.

[00:14:33] Alex Volkov: Teknium has a bunch of stuff in the thread, so I'll link this below for examples. And I will say that Teknium posted a bunch of comparisons to Mixtral Instruct, and it's interesting that not all of the benchmarks show improvements.

[00:14:51] Alex Volkov: There's a few, I think on GPT4All and HellaSwag, where the base model, at least the non-DPO base model, still wins just by a little bit. But everything else, like ARC, AGIEval, and MMLU, shows significant improvements. And we're probably going to continue to see those improvements. Shoutout. If you have tried it, please let me know.

[00:15:08] Alex Volkov: I will say this last thing: finally, after setting up LM Studio again, shoutout to LM Studio, we'll get to chat with LM Studio at one point, hopefully soon. Now the first thing I do is download these models, because it's super, super easy. They're in both LM Studio and Ollama. There was a tiny, I think, quantization thing in the beginning, and now there isn't, and now it works great.

[00:15:33] Alex Volkov: And these models, I've loaded them up on my Mac before a flight. And I was just able to chat with this AI with no internet connection, or a poor internet connection. It was really something. I know we've talked about this multiple times: hey, put this on a thumb drive and then have all of human knowledge, quote unquote.

[00:15:51] Alex Volkov: I'm not really saying it's all human knowledge, but I've been actually able to do this before my flight and it was really cool.

[00:15:57] Alex Volkov: And I think the last thing to mention here is that Teknium suggests making liberal use of system prompts. There are now a bunch of Hermes models flying around; the most famous one is, I think, the Hermes 7B, but there's also the Yi version, and this one seems to beat the Yi version, as far as our friend Wolfram Ravenwolf of LocalLLaMA has tested.

[00:16:22] Alex Volkov: This is probably the best Nous model out of them all so far. Obviously it's based on the best open source model, Mixtral, and definitely make liberal use of system prompts. Yeah, roleplay is suggested, setting expectations, specifications, and everything else you can think of. Very easy to do with LM Studio.

[00:16:39] Alex Volkov: I haven't dived into how to actually steer these models for exactly the tasks that I do. LDJ, you said that you want to tell me how to use LM Studio in regards to this, so I would love to hear from you. First of all, have you had a chance to try these models specifically? And second of all, let's talk about system prompts in LM Studio a little bit, because I think it's a part that people are definitely missing.

[00:17:02] Luigi Daniele: Yeah. A lot of the latest models, like Hermes and I think maybe Dolphin too, are trained with system prompts. So if you really want to get the best use out of it, definitely use that. It's the same thing as with ChatGPT, really, where you give instructions for how you want it to respond to you, or maybe add in a few threats of what you would do to the AI if it does not respond correctly, and surprisingly that seems to actually give good results sometimes.

[00:17:28] Luigi Daniele: I personally try to always say please and thank you, but yeah. And there are also prefixes and suffixes, which I think I talked to you about, Alex,

[00:17:36] Alex Volkov: You briefly mentioned this, but maybe worth like a given a little bit of a heads up for folks.

[00:17:41] Luigi Daniele: yeah I think it really is worth maybe just a sit down and just a video with me and you actually going through it, because,

[00:17:47] Alex Volkov: Sure.

[00:17:47] Luigi Daniele: it's a decent amount to go through, but yeah, on the model card of most models, if you just look for something called a prefix or suffix, which is usually described in the model card, then you apply that to the LM Studio settings in the right panel, in the chat settings.

[00:18:03] Luigi Daniele: And yeah, you just make sure you have those things right. If you don't, there's a good chance you're not actually using the model correctly. And it's not going to give you the best results.
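
For readers following along at home, here is roughly what those prefixes and suffixes look like for the Hermes family, which uses ChatML-style markers. Treat this as a hedged sketch: always take the exact strings from the specific model's card, as LDJ says, since other models use different templates.

```python
# ChatML-style markers as used by the Hermes models; other models differ,
# so copy the prefix/suffix from the model card you are actually running.
SYSTEM_PREFIX = "<|im_start|>system\n"
USER_PREFIX = "<|im_start|>user\n"
ASSISTANT_PREFIX = "<|im_start|>assistant\n"
SUFFIX = "<|im_end|>\n"

def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt with explicit prefix/suffix tokens."""
    return (
        SYSTEM_PREFIX + system + SUFFIX
        + USER_PREFIX + user + SUFFIX
        + ASSISTANT_PREFIX  # generation continues from here
    )

print(build_prompt(
    "You are a concise assistant that answers in bullet points.",
    "Summarize what a DPO fine-tune is.",
))
```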

[00:18:10] Alex Volkov: And they differ per base model as well. We've seen that different base models have different things you want to add there, and you may think you're getting the same performance but actually be underperforming a little bit. I'll also say, for folks who are using a Mac with Apple Silicon, there's a little hidden checkbox there, and I don't know if it's on by default.

[00:18:30] Alex Volkov: It's called Use Apple Metal, and definitely make sure that's on for you: a significant improvement in performance and inference. Alright, so on Nous Hermes, is there anything else folks here on stage want to say about this model, how it was trained, and the difference with DPO? Folks, feel free to chime in.

[00:18:45] Alignment Lab: The cool thing about DPO is that it's a reinforcement learning technique. I don't know if anyone else has had a chance to read the paper about it, but essentially what occurred was that some researchers found that transformers already have a baked-in optimal reward function.

[00:19:03] Alignment Lab: And so what DPO is really doing is just training the model on that reward function, just biasing it towards the selected good example when you give it good and bad example pairs. It's not directly unique to this model, but it is super interesting, because it really opens up a whole bunch of possibilities for what you can do with the model now that you can give it negative examples and get more performance from it.

[00:19:27] Alex Volkov: DPO is ranking different outputs in terms of preference. So can you talk about the pairs stuff? Everybody says DPO pairs; what do they mean by pairs? Could you say more about this?

[00:19:38] Alignment Lab: Typically what you would do is you would build your dataset, and that would be your good dataset. Then you'd have a weaker model than the one you used to synthesize the dataset, or just bad examples of responses, for every single example in the dataset.

[00:19:54] Alignment Lab: So if you have one that's like, how do I make a cup of tea? And then instructions about how to make a cup of tea, then you'd also have that paired with a negative example of, a response to how do I make a cup of tea? And then, the response is something else, like how to build a Lego house or whatever.

[00:20:08] Alignment Lab: And when you go to actually train it, you show it both at once, and you tell it which one is the positive and which one's the negative, and you just bias it towards the positive. It's quite similar, conceptually, to the way that OpenChat does its C-RLFT training, although OpenChat actually has a specific token for the good and bad examples that it has weighted.

[00:20:34] Alignment Lab: But functionally, it's, the idea is the same. You're just doing reinforcement learning which lets you take data where you may have bad examples in there, and rather than having to remove them and waste data, you can now make a good example and get more out of it than you would have been by just replacing it.

[00:20:50] Alignment Lab: So it lets you recoup extra performance out of bad data.
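
To make the pairs idea concrete, here is a minimal sketch of a single preference pair, following the tea example above, and the DPO objective it feeds. The numbers and tensor values are toy placeholders, not anything from a real training run.

```python
import torch
import torch.nn.functional as F

# One preference pair: a prompt with a chosen and a rejected answer.
pair = {
    "prompt": "How do I make a cup of tea?",
    "chosen": "Boil water, steep the tea bag for 3-5 minutes, then add milk if you like.",
    "rejected": "Stack the bricks in rows and alternate the studs to build a Lego house.",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """DPO objective: push the policy's margin between chosen and rejected
    responses above the frozen reference model's margin."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy numbers standing in for summed token log-probs of each response.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-20.0]),
                torch.tensor([-13.0]), torch.tensor([-19.0]))
print(loss.item())
```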

[00:20:54] Alex Volkov: Thanks for the explanation. And definitely, at least in my playing around with the bigger model and the DPO version of Nous Hermes Mixtral, the DPO one feels like it behaves a little better. I actually don't know whether to attribute this to the technique or to the datasets, but it's really good.

[00:21:13] Alignment Lab: Yeah, we've noticed if we do a regular supervised fine tune first, like a just normal fine tuning, and then we DPO over that we, the models push just much further than either thing alone, too. I don't know if that's unilaterally true, because we do a fairly, specific kind of model when we make these big releases, but it seems, at least for the case of just general reasoning skill it helps a lot.

[00:21:37] Alex Volkov: Yeah, it's super cool. And I guess the downside of this, not the downside, but the outcome of some of this, is for folks who just want to use a model and are maybe tuning in to ThursdAI to know which model is good to use, or maybe reading the LocalLlama stuff.

[00:21:53] Alex Volkov: There are now so many choices, including so many configurations. So maybe we should do a recap and also a simplification: LDJ for system messages and the prefixes, and Alignment for DPO versus SFT. Just simplify and say, hey folks, use this. Because right now there are so many; you can choose between quantization methods.

[00:22:11] Alex Volkov: There are at least four or five different ones for you to choose from, and LM Studio says "recommended" on a few of them, but it says recommended for five different ones. There are different quantization providers as well, right? TheBloke is obviously the most familiar one,

[00:22:26] Alex Volkov: and there's now a choice between DPO or SFT or DPO plus SFT, and we haven't even begun to talk about merges, which are coming as well. So there's a lot of choice and we need to simplify this for folks. So, just to simplify: the Hermes models are usually very well behaved and great for roleplay as well.

[00:22:43] Alex Volkov: Try them out. If you have the room to run Mixtral for your stuff, Mixtral is by far the best open source model that we have. Go ahead, Alignment.

[00:22:52] Alignment Lab: Yeah, so Mixtral, that model's architecture is very similar to a really old, comparatively old architecture that's been tried and true before. And so because of that, there are a lot of efficiencies that we just haven't integrated into the modern stack yet, but that will come.

[00:23:09] Alignment Lab: And there are a bunch of new ones that people have been making. And between the new quantization methods that you can do with Mixtral, because since it's a sparse MoE, it doesn't actually need all of its weights equally, so some of them are less important. That lets you quantize those quite a lot without actually hurting the model's performance very much.

[00:23:27] Alignment Lab: And you can also offload those layers when they're not being used. And then you can do expert pre-caching, where you predict some experts ahead of time, which lets you get faster inference speed. And at the end of the day, if QuIP#, which is a 2-bit quantization method, continues to prove out that it's as performant as it claims, we could end up running Mixtral on 4 gigs of VRAM, like on a laptop.
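
A quick illustrative sketch of why a sparse MoE like Mixtral tolerates this: each token only routes to its top-k experts, so rarely-selected experts can be quantized harder or offloaded without touching most forward passes. The router weights and dimensions below are made-up toy values, not Mixtral's actual code.

```python
import torch

def route_tokens(hidden: torch.Tensor, router_w: torch.Tensor, top_k: int = 2):
    """Toy sparse-MoE routing: each token only activates its top-k experts."""
    logits = hidden @ router_w                                   # [tokens, num_experts]
    weights, experts = torch.topk(logits.softmax(dim=-1), k=top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)        # renormalize over chosen experts
    return experts, weights

tokens = torch.randn(4, 4096)    # 4 toy token embeddings
router = torch.randn(4096, 8)    # 8 experts, top-2 routing, as in Mixtral
experts, weights = route_tokens(tokens, router)
print(experts)  # which 2 of the 8 experts each token is sent to
```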

[00:23:58] Alex Volkov: And

[00:23:59] Nisten Tahiraj: We will.

[00:24:00] Alex Volkov: we will.

[00:24:00] Nisten Tahiraj: it to perform a bit better.

[00:24:02] Alex Volkov: So I guess this takes us to the next thing. Go ahead, Nisten, and it's going to take us to the next optimization stuff.

[00:24:09] Nisten Tahiraj: We could definitely have it run on 4 gigs. I've had it a little above 4. However, the point is to have it run well. The quantization still makes it a little bit unfit for anything other than very short conversations. And we'll get it there.

[00:24:30] Alex Volkov: All right. So in this, in, in this

[00:24:32] Nisten Tahiraj: we'll have Mixtral under 4 gigs very soon and it'll be good.

[00:24:37] Nisten Tahiraj: Yes.

[00:24:37] Alex Volkov: And that's a promise. That's a promise.

[00:24:39] LMSys SGLang - increased inference by 5X

[00:24:39] Alex Volkov: So what happens is, once you put those bigger models on slower hardware, which is possible, you then wait painfully long for inference to actually happen. But this takes us to the next thing, from the folks at LMSys. They released fast and expressive LLM inference with RadixAttention and SGLang.

[00:24:59] Alex Volkov: So, folks from LMSys, if you guys remember, are behind models like Vicuna, which took Llama and trained it on additional datasets, and the LMSys Chatbot Arena and all these places, so we definitely trust them at least with some of the evaluation stuff. I think they test on MMLU as well. They released an inference optimization, a kind of collection of techniques.

[00:25:24] Alex Volkov: I don't think it's one specific technique, because there's RadixAttention in there. Yeah, go ahead.

[00:25:28] Alignment Lab: It's where all this was going in the first place, between all these different prompting and programming frameworks and inference engines. What they've done is they've built out the backend with the end goal of having an extremely controllable, steerable compiling system for programming outputs from an AI, in the way you'd use Pydantic, or the way you would typically use structured grammars and sampling techniques.

[00:25:58] Alignment Lab: And way more. It's hard to explain in summary in a way that's easily grokkable without getting too technical, but it's a combination of many things that we've been doing individually, which were always going to become one big thing. They just saw it first and did it first, and now, when you're looking at it, it seems very obvious that this is probably how things should look going forward.

[00:26:17] Alex Volkov: so let's actually talk about

[00:26:19] Alex Volkov: they have. Yeah, they propose co-designing the backend runtime and the frontend language, which is, like Alignment said, a structured domain-specific language embedded in Python to control the inference generation process. It's called a domain-specific language, a DSL.

[00:26:35] Alex Volkov: I think many folks have been using some of this. I think DSPy as well, from Omar Khattab, is being mentioned in the same breath. And then this language is executed in interpreter mode or in compiler mode. And on the backend they have this RadixAttention technique for automatic and efficient KV cache reuse.

[00:26:53] Alex Volkov: I don't know if that's MoE-specific or not yet, but definitely the combination of those two, plus the code that they've released, shows just incredible results. Folks, we live in an age, and we've talked about multiple of these techniques, where somebody can come up and say, hey, here's a set of techniques that, if you use them, you get a

[00:27:12] Alex Volkov: 5x improvement on inference. In the same breath that we're saying, hey, we're going to take Mixtral and put it in 4GB, and we've seen this obviously with Stable Diffusion, which we'll mention, that runs fully in the browser, we're now seeing releases like this from a very reputable place: a collection of techniques that have been used to some extent by some folks, now all under one roof, under one GitHub repo,

[00:27:35] Alex Volkov: that actually improves inference by 5x on all of the major evaluations, at least the ones they've tested, the ones we always talk about. So 5x on MMLU and HellaSwag, significantly more performant, all these things. Quite impressive. One thing that I definitely want to shout out is that the maintainer of LLaVA, the LMM, kind of the visual Llama, replied and said that running LLaVA is actually covered in the report itself.

[00:28:07] Alex Volkov: And it improves LLaVA execution by 5x as well, and by execution I mean inference speed, basically. So without going too much into RadixAttention, because honestly it's way too heavy for the space, it's quite incredible that we get stuff like this from places like LMSys, specifically in the area of running bigger models on smaller hardware.
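
Without going into the real RadixAttention internals, the core reuse idea can be sketched in a few lines: requests that share a token prefix (a system prompt, few-shot examples, an agent scaffold) reuse the KV cache computed for that prefix instead of recomputing it. SGLang keeps this in a radix tree over the KV cache; the toy dictionary below only illustrates the matching behavior and stores placeholder strings instead of real KV tensors.

```python
from typing import Dict, List, Tuple

class ToyPrefixCache:
    """Minimal illustration of prefix-based KV reuse (not SGLang's radix tree)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[int, ...], str] = {}

    def insert(self, tokens: List[int]) -> None:
        # Store every prefix so future requests can match partially.
        for i in range(1, len(tokens) + 1):
            self._cache.setdefault(tuple(tokens[:i]), f"kv-for-{i}-tokens")

    def longest_cached_prefix(self, tokens: List[int]) -> int:
        """Return the length of the longest already-cached prefix of `tokens`."""
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._cache:
                return i
        return 0

cache = ToyPrefixCache()
system_and_fewshot = [101, 7, 7, 9, 42, 13]       # shared agent prompt (toy token IDs)
cache.insert(system_and_fewshot)
new_request = system_and_fewshot + [55, 66]        # same prefix, new user turn
reused = cache.longest_cached_prefix(new_request)
print(f"{reused} of {len(new_request)} tokens served from the prefix cache")
```

This is also why, as Nisten notes below, the win is biggest for many short, overlapping prompts and smaller for one long summarization call.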

[00:28:33] Alex Volkov: Go ahead, Nisten.

[00:28:36] Nisten Tahiraj: I'll say something. So it does automate a lot of the tricks that people have been pulling, and it works great for large amounts of smaller prompts. Once you go to longer prompts, the benefit is not that much compared to vLLM. I think it felt like five or ten percent faster compared to vLLM. So again, I haven't taken a very deep dive into it.

[00:29:01] Nisten Tahiraj: I just want to make people aware that it's fantastic for smaller prompts and stuff, but for longer ones you don't necessarily need to switch your whole stack to it. vLLM still works fine. Yeah, I think if you're doing what you would normally be doing with vLLM, which is processing large amounts of data or serving for general purposes,

[00:29:24] Nisten Tahiraj: there's probably no need to switch your stack. I think what it feels specifically optimized for is agent frameworks, in which you have many models communicating short strings back to each other, or one model wearing many hats. And the optimization work, just while we're on the topic, is crazy right now.

[00:29:43] Nisten Tahiraj: There are still three papers with major inference optimizations for Mixtral alone, as well as for vLLM, and they seem to compose together pretty well. Having an alternative to vLLM that's similarly performant is huge, because vLLM is a big bottleneck in a lot of stacks because of the way that it handles attention on the CPU.

[00:30:00] Nisten Tahiraj: It feels a lot like when llama.cpp got offloading the same week that speculative decoding came out in Hugging Face Transformers, and everything just got a hundred times faster, like half a year ago or so.

[00:30:12] Alex Volkov: Yeah, it definitely felt like that day when LMSys released the SGLang optimization that we were just talking about. I don't have a link for this, but IST Austria also released Marlin, which is a 4-bit kernel. The way I know it's cool is that Tim Dettmers from QLoRA retweeted it and said this is a huge step forward.

[00:30:33] Alex Volkov: And Tim Dettmers is the guy who codes CUDA kernels within like a night or something that others spend 3 months planning and finishing. So I know that when Tim Dettmers says something is a huge deal, he probably knows what's up. So Marlin released the same day that SGLang released, and it's a linear kernel for LLM inference with a near-ideal

[00:30:53] Alex Volkov: 4x speedup up to batch sizes of 16 to 32 tokens. And they came out pretty much the same day, yesterday, on January 17th. So I'm going to add this to the show notes. Marlin is also an exciting optimization. And Nisten, I fully agree with you: we see these breakthroughs, or collections of methods, that are suddenly finally collected in the same place.
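
Marlin itself is a GPU kernel, but the 4-bit weight format it accelerates is easy to sketch: weights are split into groups, each group stores one scale, and values are rounded to 16 levels. The group size and shapes below are illustrative assumptions, not Marlin's exact layout.

```python
import torch

def quantize_4bit(weight: torch.Tensor, group_size: int = 128):
    """Toy symmetric 4-bit group quantization: each group of weights shares one
    scale and is rounded to integers in [-8, 7] (stored here in int8 for clarity)."""
    w = weight.reshape(-1, group_size)
    scales = w.abs().max(dim=1, keepdim=True).values / 7.0
    q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize_4bit(q: torch.Tensor, scales: torch.Tensor, shape):
    """Recover an approximate full-precision weight from the 4-bit codes."""
    return (q.float() * scales).reshape(shape)

w = torch.randn(4096, 4096)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
print((w - w_hat).abs().mean().item())  # mean absolute quantization error
```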

[00:31:11] Alex Volkov: A bunch of papers that hadn't released code, or hadn't been combined with different things, and it's very exciting to see them keep coming out; we're only at the beginning of this year. And to the second point that you just mentioned, with agent frameworks specifically, and RAG, Retrieval Augmented Generation, this benefit is significant, like you said, because of the short strings these agents communicate back and forth with each other.

[00:31:34] Alex Volkov: Last week we talked with one such author from CrewAI. Crew specifically is an orchestration of different agents that do different tasks and coordinate and talk to each other, so improving inference there matters. Many of them run on GPT-4, and I haven't fully gotten into how to do this yet, but SGLang also says that its LLM programming can actually work with various backends.

[00:31:55] Alex Volkov: So OpenAI as well, and Anthropic and Gemini and local models. That's very interesting if they actually improve OpenAI inference from Python. But DSPy RAG, so RAG on DSPy from Omar Khattab, is definitely mentioned in the SGLang report. I know I'm throwing a lot of acronyms at you guys.

[00:32:14] Alex Volkov: So SGLang is the new language from the LMSys org that speeds up some stuff. DSPy I haven't talked about yet, so we'll cover it, but one of the tasks, DSPy's RAG, so retrieval, is mentioned as getting a significant boost. Like Nisten and Austin said, not necessarily for longer-context prompts.

[00:32:35] Alex Volkov: For 30,000 tokens of summarization, maybe this technique, which caches a bunch of stuff between calls, is not going to be super helpful, but for fast execution of multiple things it's definitely a significant 5x. And like I think Alignment said, it's only the beginning of the optimization cycles we'll see, and it's quite exciting to see them come out.

[00:32:56] Alex Volkov: I think we've covered two optimization techniques, SGLang, and then Marlin as well. I'll put a link to the show notes as well.

[00:33:03] Neural Magic, compressing models with sparsification

[00:33:03] Alex Volkov: And I think now it's time to move on. Yeah, one thing that we're going to chat about is Neural Magic, and I definitely want folks on stage to feel free to talk about Neural Magic, because somebody told me it's cool, but I have no idea how to even simplify this.

[00:33:23] Alex Volkov: So, Austin, if you want to take the lead on this one, definitely feel free.

[00:33:28] Alignment Lab: Okay, Neural Magic. This is actually the first conversation, I think, that me and LDJ both geeked out really hard on when we were talking, because we were each the only person the other knew who even knew about this company. Neural Magic has been making miracles in the corner for years.

[00:33:44] Alignment Lab: I first got interested in them because they had made a BERT model that was initially, I think, nearly a gig on your computer to run, and it spoke English perfectly well and all this other stuff. And they had compressed it to the point that the full model on your computer was like 15 megabytes, and what blew my mind was: how does that even know English?

[00:34:06] Alignment Lab: And it was at like 96 percent of the original accuracy, despite all of that. They specialize in these optimization and compression techniques. And what they do typically is they have a stack, which they wrote a paper about a while ago, which I'll post in the comments here.

[00:34:22] Alignment Lab: It's called the Optimal BERT Surgeon, which is basically a process in which they have a teacher model and a student model. With the student model they use distillation in the more traditional sense than how it's commonly used now, where you're just training on a model's outputs; they use the actual logits. They basically load both models in during the training run and train the smaller model to behave like the larger model, and while they're doing that, they're also pruning it, which is essentially reducing the weights that are not getting used during training to zero, which lets your computer not have to calculate them, so it moves much faster.

[00:34:58] Alignment Lab: And then they also quantize, which is where you reduce the precision. Basically, without getting too technical, you're literally summarizing the parameters of the model such that it's literally a smaller file. And they do this all at once, which takes the larger model and compresses it into the student model that starts out smaller, and then they're quantizing the student model and pruning it, so it's both running faster and literally getting smaller. As far as I'm aware, there's nobody who even comes close in being able to compress a model so much. And recently, I think about two months ago, we first saw that they're integrating transformers with Sparsify Alpha, which is now just out, and it's called Sparsify on GitHub.
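
The "train on the actual logits" part Austin describes is classic knowledge distillation, which can be sketched as a KL term between the teacher's and student's softened outputs. This is a generic sketch under that assumption, not Neural Magic's exact recipe (their pipeline combines it with gradual pruning and quantization).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Logit-level distillation: KL divergence between the teacher's and the
    student's temperature-softened distributions."""
    t = temperature
    student_logprobs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_logprobs, teacher_probs, reduction="batchmean") * (t * t)

# Toy batch of 4 examples over an 8-class output, standing in for vocab logits.
teacher = torch.randn(4, 8)
student = torch.randn(4, 8, requires_grad=True)
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```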

[00:35:43] Alignment Lab: Totally check it out. You can make a tiny llama and do all that stuff to it and make it microscopic. It's amazing. And

[00:35:49] Alex Volkov: Here, Austin, just real quick. We've been talking about quantization, for folks who are not following the space super closely. There are different quantization techniques, and some of them create small files, but the performance, or the accuracy, gets lowered.

[00:36:03] Alex Volkov: How is sparsification different from quantization, at least on a basic level? Are they compatible? Could you use both of them on the same file? What is this thing, sparsification?

[00:36:15] Alignment Lab: In reality, if it were a more accessible tool, we would probably all just be doing both on every single training run. But since there are always new quantization techniques, it doesn't always make sense to. With sparsification, the specific difference is that rather than taking the same model and reducing the precision of its calculations to make it smaller, the model stays the same size physically on your drive, but you're reducing the weights that aren't getting used to a zero value.

[00:36:50] Alignment Lab: And what that does is just mean your GPU has to do fewer calculations for the model to do inference, and it makes it much faster.
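
The "reduce unused weights to zero" idea is, in its simplest form, magnitude pruning: drop the smallest-magnitude weights and keep a sparse mask. A toy sketch follows; real pipelines prune gradually during training and rely on sparsity-aware runtimes (such as Neural Magic's DeepSparse) to turn the zeros into actual speed.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).
    This one-shot version is only an illustration of the concept."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

layer = torch.randn(1024, 1024)
sparse_layer = magnitude_prune(layer, sparsity=0.9)
print(f"non-zero fraction: {sparse_layer.ne(0).float().mean().item():.3f}")
```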

[00:36:59] Alex Volkov: All

[00:36:59] Nisten Tahiraj: Also, for the next BakLLaVA version, Neural Magic did make a CLIP model for us, so shout out to them. They were able to cut down the size to about four times smaller.

[00:37:14] Nisten Tahiraj: So we'll have that out soon. And yeah, for anybody else that wants to learn about sparsity, just look up Nir Shavit on YouTube, N I R S H A V I T. He's the OG MIT professor that pioneered sparsity and has a lot of videos out, and Neural Magic is his company. And yeah, it's looking really promising for the future, because they can optimize at a deep level for CPU inference.

[00:37:45] Nisten Tahiraj: And it's not necessarily just quantization, it's also that they are reducing the amount of unused weights. So yeah, expect to see a lot more stuff about sparsity from the GPU-poor side of the spectrum, because that's where the benefits are yet to be reaped.

[00:38:02] Nisten Tahiraj: Anyway, shout out to Neural Magic as well.

[00:38:04] Alex Volkov: Shout out to Nir Shavit and Neural Magic, it looks cool, and they just got into sparsifying fine-tuned models as well. I think they sparsified some new models, and I don't know if they got to OpenChat yet, but I think some folks are definitely waiting for a Phi sparsification. The area of smaller models running on smaller hardware is advancing super, super fast.

[00:38:26] Stable Code 3B from Stability AI - 3B coding model beating CodeLlama

[00:38:26] Alex Volkov: Let's move on, folks, because we've been in the open source area for quite a while, and we also need to get to the end of our conversations here and start doing deep dives. So Stable Code 3B was released from Stability. A brief review: it's a 3-billion-parameter language model.

[00:38:41] Alex Volkov: From Stability AI. It does code completion, and obviously it runs offline, because it's a small model and you can run it locally. They claim it can run on MacBook Airs as well, and they say something like "without a GPU." Interesting. Accurate completion across 18 languages at a level comparable to models twice its size.

[00:38:57] Alex Volkov: That comparison is against Code Llama. It's an interesting comparison to Code Llama at this point, because we've seen a bunch of other models already beat, I think, Code Llama on different metrics, but people still compare themselves to the big dog, and it's very interesting. They use a multi-stage process: pre-training on natural language,

[00:39:15] Alex Volkov: then fine-tuning on code datasets to improve programming language performance. And it supports fill-in-the-middle and expanded context sizes compared to previous versions of Stable Code. And, oh yeah, Stability now has a commercial membership plan, because everybody's thinking about, okay, how is

[00:39:33] Alex Volkov: Stability going to make money. So they have this membership where you can use their models, so it's not fully open source. I think you can use these models commercially if you participate in the membership, otherwise you can use them for research. So Stable Code, check it out. I think it's new on Hugging Face,

[00:39:48] Alex Volkov: I think from today I believe,

[00:39:50] Discussion on Neural Beagle 7B & Model merging

[00:39:50] Alex Volkov: And I think the last thing that I want to chat about in open source, just briefly, is NeuralBeagle 7B from Maxime, who's in the audience and is hopefully going to come up for the interview in a few

[00:39:59] Alex Volkov: minutes, I want to say maybe 20 minutes, Maxime. NeuralBeagle, back when I added this to my notes, was the top-performing 7-billion-parameter fine-tune on the Open LLM Leaderboard. It's no longer the top performer, but it was definitely at least number 4.

[00:40:14] Alex Volkov: And it's a merge plus a DPO, that's what I saw from Maxime. A merge of, actually, it's interesting what it's a merge of, so let's go into the model card and check this out.

[00:40:24] Alex Volkov: But Maxime looks like he has a bunch of models, and this NeuralBeagle14, 7 billion parameters, has an average of 60 across all the scores, 46 on AGIEval. And yeah, it's one of the top-performing models and it's a merge of different things. And it already has a demo space that I'll link in the show notes as well.

[00:40:43] Insights on LazyMergekit

[00:40:43] Alex Volkov: Yeah, it uses LazyMergekit, which is a Colab from Maxime that we're also going to chat about, and figure out what this merging thing means. But definitely, I think this model triggered one of the Nathans in AI, who said, hey, I wanted to ignore this merge business for a while, but I guess I can't anymore, because merges are not to be ignored at this point.

[00:41:04] Alex Volkov: And this is a merge of UNA TheBeagle and a distilled Marcoro SLERP, which is itself also a merge. So if you hear me and you're confused about what all these things mean, hopefully we'll be able to clarify this with Maxime. Maxime also had a tweet where there's now a Colab where you can take a model like this and basically map out the genealogy of these models.

[00:41:25] Alex Volkov: What is based on what? And it's quite cool to see. And what else should I say about this model? I think that's pretty much it. It's very performant. I actually haven't had the chance to use it, but it's right up there and it's a merge model. There's the checkbox, like we said, in the Open LLM Leaderboard:

[00:41:42] Alex Volkov: if you don't want, for some reason, to see the merged models and only want to see trained models, you can uncheck that. But definitely the merged models are competing for the top of the LLM leaderboards right now. I haven't seen a lot of them in the LMSys arena, so it's going to be interesting to see how they treat the merged models.

[00:42:02] Alex Volkov: And I think that's most on open source, and we've given this corner almost 40 minutes, so I think it's time to move on a little bit here, folks. So I'll, yeah, I don't have breaking news here, so I'll just do this, a small transition so I can take a breath, haha.

[00:42:22] DeepMind releases AlphaGeometry

[00:42:22] Alex Volkov: LLMs and APIs, and I think the biggest player in this whole segment is DeepMind. DeepMind released a Nature article, which they always do, they always publish in Nature; this time the link to the Nature article didn't really work, but hopefully they've fixed it by now. They released AlphaGeometry. They've released a bunch of stuff before: AlphaFold, if you remember, AlphaGo, AlphaZero, the model that self-trains to play anything, not only chess or Go. And now they've released AlphaGeometry, which solves geometry at almost a gold medal level at the Olympiad. They have this nice chart showing that the previous state of the art, against the Olympiad gold medalist standard, got to around ten problems solved. There are time limits; I'm not sure what the time limits actually are, I don't have it in my notes, but you have to solve these very difficult geometry problems. Folks compete for the gold medals in this Olympiad, and AlphaGeometry now comes very close to the gold medalist standard.

[00:43:29] Alex Volkov: So the gold medalist answers 25.9 problems on average, AlphaGeometry now answers 25, and they claim the previous state of the art answered just 10. So they more than doubled that, and they're getting close to the Olympiad gold standard. I think I saw a tweet from Nat Friedman or somebody saying they would offer a $1,000,000 prize to whoever solves the Geometry Olympiad at the gold medal level, and now we're getting there.

[00:43:53] Alex Volkov: They use a neuro-symbolic approach: they combine a language model with a symbolic deduction engine to leverage the strengths of both, which some folks compare to Thinking, Fast and Slow, where you have System 1 and System 2 thinking, or at least it outlines System 1, System 2 thinking.

[00:44:09] Alex Volkov: In this case, this does actually help. They have this neuro-symbolic approach; I don't think I've seen this before. And I think the most interesting part is that it was trained on over a hundred million synthetic geometry examples generated from one billion random diagrams.

[00:44:27] Alex Volkov: Completely, solely synthetic geometry examples. The whole dataset for training this model that beats humans at geometry, which was previously very difficult, is fully synthetic. And I think that's super cool. We've only begun this year, but definitely this is going to be the year where fully synthetic datasets rule.

[00:44:49] Alex Volkov: And yeah, opinions, folks here on stage? Have you read about this? What's interesting to you? I would love to hear folks chime in on this as well, because I think it's super cool, and kudos to them for releasing this. Also, I saw somebody said, I think Bindu said, that they released this open source, but I haven't seen anything.

[00:45:06] Alex Volkov: Definitely, Luigi, go ahead, and then Nisten.

[00:45:09] Luigi Daniele: Yeah, it's funny that you brought up Nat Friedman having that bet up, because I remember that too, and now I'm thinking, I wonder if he'd be willing to give the million dollars, or whatever the money is, to DeepMind. Ha,

[00:45:20] Luigi Daniele: it was done by Google DeepMind, so that'd be funny.

[00:45:25] Nisten Tahiraj: How has Google not discovered AGI yet and fallen so behind?

[00:45:30] Nisten Tahiraj: This almost feels like an internal illness or something. Something's going on. Because yeah.

[00:45:40] Alignment Lab: I don't think that Google needs to compete is the thing. I just don't think they're incentivized to release anything into the space because they don't have to. There's really not anything here except money to lose for them.

[00:45:51] Alignment Lab: They already have all the data and stuff. Yeah, and back to the geometry problems, I can't wait to test this, if they release it, and see how it does when given really random, very long numbers. If it still solves the problem, then that will be extremely impressive. And yeah, I've done those Math Olympiads with geometry questions, and they're not easy at all.

[00:46:18] Alignment Lab: You have to picture stuff in 3D, 4D, and whatever in your head. They're very tricky problems. So yeah, this is pretty huge. That's all. Yeah.

[00:46:26] Alex Volkov: Quite, quite huge and kudos on them. Umesh, I think you actually found the source, right? I just

[00:46:32] Umesh Rajiani: Yeah, so there is a GitHub repo under Google DeepMind. If you go to Google DeepMind on GitHub and then alphageometry, you can find the code repo for that. So Nisten, if you want to test it out, it's there for you. So I'm taking your

[00:46:47] Alex Volkov: I want to harp on this just for a little bit. Did Google release code for us finally? Did Google open source something? Welcome back, Google.

[00:46:54] Umesh Rajiani: yeah, so this is like first release kind of thing, coming out of Google. So it's going to be, yeah, it is quite quite interesting.

[00:47:01] Alex Volkov: Definitely moves us towards like more generalist

[00:47:04] Bluetooth: I'll have it up in a sec.

[00:47:05] Alex Volkov: Yeah, please post it and we'll add it to the show notes as well. Definitely the question: how have they not solved AGI yet? Solving math at the Olympiad level seems like it's moving us forward, definitely. This neuro-symbolic approach, where they combine language models with a symbolic deduction engine, and I have no idea what symbolic deduction means in this case,

[00:47:24] Alex Volkov: but leveraging the strengths of both, this seems like going down the right path. We've seen, I think, similar things with vision as well, where you combine vision heads into one model so it can understand. I don't think this model was multimodal at all; it doesn't look like it, but maybe I'm wrong here.

[00:47:42] Alex Volkov: And I think, yeah, the solutions for this thing are verifiable by machines. I saw this one tweet that will go down in history: somebody said, computers have always been good at calculations, so I don't understand the big deal here. And I think it's really funny to keep that tweet around.

[00:48:04] Alex Volkov: Alright, so shout out to DeepMind for this fairly incredible release. Hopefully some of the techniques they used will then be used by folks in other areas as well, to get us AIs that are significantly better at geometry and different things. Oh yeah, Umesh, just before we continue, do you want to talk about this neuro-symbolic thing? Because we've talked about this; I think Daniel Jeffries talked about it last time, when we talked about Rabbit.

[00:48:27] Alex Volkov: If you guys remember, this was at the end of the last space and we've talked about Rabbit LAM, Large Action Model. And Umesh, you just mentioned something that they also use NeuroSymbolic to an extent, right?

[00:48:39] Umesh Rajiani: Yeah, so the LAM, the Large Action Model, is basically based on neuro-symbolic programming. Specifically, when they talk about training the model from the actions that you perform, they are encoding neuro-symbolic programming to train the model, or to capture the actions, basically.

[00:48:55] Umesh Rajiani: So that's what they're trying to do, in theory. They are saying we have to see what comes out in practice.

[00:48:59] Alex Volkov: Yeah, and based at least on their examples, it looks very compelling, potentially being able to solve a bunch of stuff, or to remember based on your actions. So neuro-symbolic is not a new approach, I apologize, I will edit this. Rabbit definitely said this, you're right, and hopefully we're going to get to see this LAM thing.

[00:49:19] Alex Volkov: So back to OpenAI. As elections are happening right now and everybody was fearing, hey, what's going to happen with deepfakes, et cetera, OpenAI released their guidelines for elections. As they prepare for elections, obviously they're aware that they're happening. And I think the few interesting things there are that they're taking steps to prevent their tools, like DALL-E and ChatGPT, from being abused.

[00:49:38] Alex Volkov: I don't know. We have open source, so I don't know if folks will go to GPT-4 to generate, let's say, propaganda. But DALL-E, for example, is starting to integrate some cryptography into its images, which is very interesting. Cryptographic provenance solutions, which, again, in case you download the actual file and then send it, could be a thing.

[00:49:58] Alex Volkov: But I don't know if that will apply at all if somebody takes a screenshot of a DALL-E generation. There are definitely usage policies for stuff like ChatGPT, enforcing limits on political campaigning, impersonating candidates, and discouraging voting. And they want to run ahead of what happened with Facebook and Cambridge Analytica; all these things they want to get ahead of, which makes sense.

[00:50:18] Alex Volkov: As for the technology they use to detect that images were generated by DALL-E, I haven't seen any release from them that says, hey, we'll build a tool for you to actually identify whether images are generated or not. It's going to be interesting, because with LLM writing, all of these tools where you dump AI text in to check it can be defeated with another LLM.

[00:50:38] Alex Volkov: I don't know if it's a futile attempt, but it's definitely a worthwhile one, and at least in the basic UI, blocking some attempts at destabilizing democracy, I think, is a good idea. And I think that's mostly it. There's one other mention: somehow, silently, they removed the terms and conditions clause saying their outputs are not to be used for war or weapons development.

[00:51:04] Alex Volkov: I think they removed that, and I think they also signed something with the Department of Defense. But I think that's all for OpenAI.

[00:51:11] Microsoft announces Copilot Pro

[00:51:11] Alex Volkov: And then I wanted to mention Microsoft. Umesh, definitely feel free to chime in here as well, because this underlines the benefit of open source. But quickly, Microsoft announced Copilot, we've talked about Copilot, previously Bing Chat, Copilot everywhere.

[00:51:25] Alex Volkov: So they've announced different paid plans: Copilot Pro, a 20-bucks-a-month premium, and it does enhanced image creation, which we don't even get in DALL-E by default, and it's now generally available for small businesses with no user minimum. If you guys remember, we've talked about Copilot before, when Microsoft announced it for large enterprises; it integrates into Microsoft 365 everywhere.

[00:51:49] Alex Volkov: And now Copilot is also open for smaller businesses. And soon there's going to be this Copilot Studio to build custom GPTs. Very cool for small businesses; we'll see how much folks actually use this. And there's also a Microsoft saga where they've changed some stuff in their pipeline.

[00:52:04] Corporate Drama - Microsoft Azure changing moderation flows and breaking products

[00:52:04] Alex Volkov: So Umesh, you mentioned this in the beginning. We'd love to hear from you what's been going on as you guys are big Azure users through Microsoft.

[00:52:11] Umesh Rajiani: Ooh, it happened

[00:52:15] Umesh Rajiani: the day before yesterday, actually. We got a call from one of our clients, which is a very big financial institution. We have a deterministic pipeline, which was constructed using Azure Studio, in fact, and we worked together with the very core Microsoft team, actually, to make sure that it is

[00:52:36] Umesh Rajiani: properly deterministic, because there are some legal implications and everything. And then the tool started failing, because we had some function calling which would go into the knowledge base of the company, and that function calling was getting triggered using what you'd call deterministic intent from the user's prompts, basically.

[00:52:56] Umesh Rajiani: And that entire function calling was failing. Now, we carried out all types of work and everything; it was very frantic, because it was a front-end tool and it started having some impact. And remember, it had been working for six months. It worked without any problems for six months, and suddenly it just stopped working.

[00:53:14] Umesh Rajiani: And the reason was that there were two words in the definition of the tool. That definition of the tool was informing the pipeline what the tool is all about, and that's how the tool was getting invoked, and those two words were getting flagged by the OpenAI API.

[00:53:32] Umesh Rajiani: Well, basically the Azure OpenAI API, not OpenAI's direct API. We are routing it through Azure, and it's a separate instance of GPT-4 with separate guidelines. They mimic some of the guidelines that OpenAI has, but Microsoft has its own guidelines, and they changed the guidelines without actually informing the clients. That's what triggered it. Yeah. So we literally had legal people in, and literally had a fight, an open fight, literally, with Microsoft. If you had been in that room, you would have seen it. It was really bad. And then eventually there were talks about cases and stuff like that.

[00:54:08] Umesh Rajiani: And eventually this company is actually modifying its contract with Microsoft, where Microsoft will be liable to inform the company before they change any kind of guidelines. And what happened after that is the beauty, because at the beginning of my startup, like the beginning of the year, we implemented some solutions where we have a direct contract with Microsoft, and we implemented solutions on the backing of those contracts.

[00:54:34] Umesh Rajiani: So in the last two days I've actually gone back to those clients with whom we have implemented solutions, so that they have a direct contract with Microsoft, because we don't want to be the party involved as far as the SLAs are concerned. Because this is very dangerous if you're developing solutions

[00:54:49] Umesh Rajiani: for people, and the core service through which you are driving the entire application pipeline is getting changed without any kind of data contract backing, so to say. Yeah, this has been a great learning for us, and I've always been a proponent of open source solutions, and I think this has given one more booster to that, because now we can go back to new clients and say, hey guys, if we can give you the kind of solution you're looking for, then let's go with an open source solution rather than a closed source one.

[00:55:20] Umesh Rajiani: So

[00:55:20] Alex Volkov: And this is a huge reason why, right? It's very interesting. In this area, and definitely feel free to chime in on this a little bit more, the outputs of LLMs are usually non-deterministic, and that has to be built into your understanding when you build tools on top of them.

[00:55:36] Alex Volkov: But this is not that. This is not them swapping in a different model or something you could switch; they added something in between, some policy thing, without announcing it to the customers. And supposedly, if you go to Azure instead of OpenAI, for example, you go for maximum stability, as underlined by the fact that when OpenAI had downtime after Dev Day, the Microsoft Azure GPT-4 endpoints were all fine.

[00:56:02] Alex Volkov: They were all green, right? So supposedly you go for the stability and the corporate backing. There are also different ISO certifications and HIPAA compliance, all these things that Microsoft Azure offers on top of OpenAI. But here we have a case that underlines how

[00:56:17] Alex Volkov: important open models that you host yourself are, even if you host them maybe on Azure as well, because then nobody can change the moderation endpoints on you and suddenly decide that a few words in your prompt are not to be used anymore.

[00:56:32] Umesh Rajiani: Yeah, but Alex, this had nothing to do with the prompt, actually. It was actually the definition of the function that was there. And the key is, I would draw an analogy to what you call data contracts. I don't know how many people are aware of data contracts, but when you have

[00:56:47] Umesh Rajiani: ownership of data within a very large organization, let's say 20,000 or 30,000 people and up, you have data contracts, where the data originates from a particular source and some other division is using that data. So you have a contract between those two, and that data contract details the data definitions that are there, and the signatory of the contract is responsible for ensuring that if they change any kind of data structure or data definition,

[00:57:14] Umesh Rajiani: then the receiver of the data, the client of the data contract, is informed. That is part of your data contract, and that's how these large organizations function. And what we need is that kind of framework, where you have a data contract with the service provider.

[00:57:30] Umesh Rajiani: So even if you're going with an open source solution, if your open source solution is hosted by someone, then you need to have that kind of contract in place. It's not that an open source solution is the solution for everything; it's about who is providing the inference. If you are controlling the inference, then you are secure, because you are not going to make changes without understanding the repercussions of those changes.

[00:57:52] Umesh Rajiani: But if you are, let's say, hosting an open source model on Amazon Bedrock, for example, and they have a system prompt that sits in front of your prompt that goes to the model, then you have to make sure that Amazon adheres to its responsibility in terms of giving you the required inference.

[00:58:12] Alex Volkov: Absolutely. Thanks for giving us the rundown. First of all, it sucks that this happened, and hopefully now Microsoft, like you said, has changed its approach here. Nisten, go ahead if you want to follow up.

[00:58:26] Nisten Tahiraj: Yeah. So for us, this has been amazing. I already have clients lining up to pay for the BakLLaVA API, so I'll just say that first, before it's even out. However, it is extremely unfortunate for those that built, let's say, apps in a hospital or for a therapist, because now those kinds of applications just had a moderation engine added, apparently for their safety, and whoever was relying on those applications, now they just stop working.

[00:59:02] Nisten Tahiraj: Out of nowhere. And this is an extremely immature thing to do. This is something you expect from a random startup with kids, not from freaking Microsoft, and it is pretty worrisome that this safety hysteria has gotten to the point where you're literally just breaking medical applications in production without notifying people.

[00:59:27] Nisten Tahiraj: That's just, you've lost people's trust now. You're not going to gain that back for a couple of years. And I hope they realize this and don't do it again. Don't break production and make changes for people in prod that are relying on this for things like SOC 2, or, as in Umesh's case, that have signed service level agreements.

[00:59:49] Nisten Tahiraj: Because now those people lose all their money if they don't, if they don't provide the service. And it's really bad. That's all I have to say. It's pretty bad.

[00:59:58] Alex Volkov: Yep. Very bad look from Microsoft. I think I remember, not that long ago, OpenAI, when they talked about sunsetting some models, and there was a developer outcry that said, Hey, we use those, we haven't had time to change how we work with different prompts, et cetera, for the newer models.

[01:00:15] Alex Volkov: And so OpenAI actually went back and said, Hey, we heard you, and deprecations are going to be pre-announced in advance, with way longer notice. Umesh, let's yeah, let's go ahead.

[01:00:27] Umesh Rajiani: Yeah, very quickly I think you have raised a very valid point, Alex, that I think all the models that they actually put out of service, they actually should make them open source. I think that's the best solution.

[01:00:39] Alex Volkov: Nah, I wish this was the case. We're still waiting for potentially like open source GPT 2. 5. We haven't seen any open sources from OpenAI for a while. Besides like some GitHub code, I agree with you. There should be a way for folks to keep doing this, the same exact thing they're doing.

[01:00:52] Alex Volkov: I don't know, in my example, I use Whisper. No matter what their API says, or what they deem inappropriate to translate, the Whisper that I use is self-hosted and it will be the same version until I decide to change it and test everything. All right, folks, we're moving forward, I think, just quickly.
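
As a sketch of what "the same version until I decide" looks like in practice, here is the open-source Whisper package with an explicitly pinned model size; the audio filename is a placeholder.

```python
# Minimal sketch: self-hosted Whisper with a pinned model, so transcription
# behavior only changes when you deliberately change the model yourself.
# Requires: pip install openai-whisper (and ffmpeg installed on the system).
import whisper

model = whisper.load_model("large-v2")          # pinned model version
result = model.transcribe("episode_audio.mp3")  # placeholder filename
print(result["text"])
```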

[01:01:10] Alex Volkov: There's not a lot of stuff in the vision area. I will mention briefly we've been here for more than an hour, so I'll definitely recap the space a little bit. If you're joining, let me just play the music and then I'll recap and then we'll get into the interview. So at an hour and fifteen, you're listening to ThursdAI. Those of you who just joined us, welcome. If you haven't been here before, this is a weekly space all about AI and open source; as our friend of the pod, Jan, just tweeted out, anybody and everybody in the LLM space and open source is in here, and it's very great to see.

[01:01:45] Alex Volkov: We've covered open source stuff, we've covered corporate drama right now, and then we're moving on to an interview. Thank you.

[01:01:53] This week's Buzz from Weights & Biases

[01:01:53] Alex Volkov: And then we're going to talk about AI, art, and diffusion, if we're going to have time at the end of this. There's a brief mention that I want to say, but basically, let me just reintroduce myself.

[01:02:01] Alex Volkov: My name is Alex Volkov. I'm the AI Evangelist with Weights & Biases, and we have a small segment here for Weights & Biases that I want to bring. I just came back a few days ago from a San Francisco hackathon that we helped sponsor with Together AI and LangChain. It was a pretty cool hackathon.

[01:02:20] Alex Volkov: It was very brief, like a few hours with AGI House. But basically the theme was RAG versus FineTune. And I think the theme was versus, and I just promised I'll bring some learnings from this. So there's a bunch of projects that did different things. They used Together's endpoint for FineTune.

[01:02:35] Alex Volkov: So if you can fine-tune on your own models and your own GPUs, that's one thing, but for many of the AI engineers, that's very difficult to do. So there's a bunch of startups, Together is one, that offer very simple fine-tuning. I'll definitely add my link to the show notes, to the presentation I gave there, which talks about how easy it is to fine-tune using their endpoints.

[01:02:56] Alex Volkov: And the folks that won the hackathon, some folks won different prizes, basically used both RAG and fine-tuning. And it looks like also there was a paper released afterwards from some folks trying to identify what's better: is it doing RAG on top of fine-tuned models or just doing basic RAG?

[01:03:13] Alex Volkov: And I don't think we have a clear answer yet. Definitely this hackathon wasn't the end all of all answers. However it does look like doing RAG on top of a fine tuned model improves just a little bit on top of just basic RAG. And it looks like RAG wins on top of just a regular fine tuned for information retrieval tasks as well.
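
As a rough sketch of what "RAG on top of a fine-tuned model" means in practice (the endpoint URL, model names, and retriever below are placeholders, not the actual hackathon setups), the only thing that changes versus basic RAG is which model the retrieved context gets sent to:

```python
# Sketch: the same retrieval step feeds either a base model or a fine-tuned one,
# served behind any OpenAI-compatible endpoint (Together, a local server, etc.).
# All names below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-inference.com/v1", api_key="...")

def retrieve(query: str) -> list[str]:
    # Placeholder retriever: in a real system this would hit a vector store.
    return ["<doc chunk 1 relevant to the query>", "<doc chunk 2>"]

def rag_answer(query: str, model: str) -> str:
    context = "\n\n".join(retrieve(query))
    response = client.chat.completions.create(
        model=model,  # swap "base-model" for "my-finetuned-model" to compare
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content

print(rag_answer("What did the hackathon find?", model="my-finetuned-model"))
```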

[01:03:30] Alex Volkov: So definitely do not skip RAG. And I think from the open source perspective, which we love here on ThursdAI, getting more RAG-related models is definitely going to happen. I think we saw some from Jon Durbin. I think I saw Teknium, you mentioned something about function calling.

[01:03:47] Alex Volkov: Datasets are coming from Nous as well. So definitely that area is still to be explored. But it looks like the combination of fine-tuning and RAG wins just a little bit on top of just basic RAG. I think this is the outcome of that hackathon. Next week in this corner from W&B is going to be an interview with Jason.

[01:04:06] Alex Volkov: Stay tuned for that.

[01:04:07] BREAKING NEWS - Meta announces Llama 3 is training and will be open source

[01:04:07] Alex Volkov: I think now we have, and many folks have been DMing me because right now we have breaking news. Breaking news actually happening right now.

[01:04:17] Sounds: AI breaking news coming at you only on ThursdAI.

[01:04:27] Alex Volkov: You know I love to use this sound. You know I love to use this sound, everyone. We have some updates from BigZuck. I don't know if you guys saw this because it's over on Threads, and I don't know how many of us are on Threads. I definitely know that I barely go there. We have some updates from BigZuck specifically around training Llama 3.

[01:04:43] Alex Volkov: There's like key updates about the long-term vision. I think the summary there is they have an insane amount of GPUs this year. So literally he says, at the end of this year, we'll have around 350,000 NVIDIA H100s. I'm going to repeat this slowly for the people in the back: 350,000 NVIDIA H100s, and overall 600,000 H100s or equivalents of compute if you include other GPUs.

[01:05:13] Alex Volkov: You remember those hats that people wear, like GPU poor, GPU rich hats? I think Zuck can stack the GPU rich hats, like one on top of the other, and it still won't be enough, because 600,000 H100s of compute is just ridiculous. And he talks about two major parts of their vision: AI and the Metaverse are connected.

[01:05:32] Alex Volkov: I love how it was Metaverse, and then suddenly AI started being a thing, and now, oh, they're connected. I definitely am expecting AI to exist in some form of virtual world, et cetera. But definitely he talks about Llama 3. And Llama 3 is coming. They're currently training it, per BigZuck.

[01:05:48] Alex Volkov: We knew that's coming, or we at least expected this, but I think now it's more of a confirmation. And I'm very excited about Llama 3. I will just mention that it's not been a year since Llama 1 yet. So we're in January; Llama was released around February 12th, 13th or so.

[01:06:06] Alex Volkov: So it hasn't been a year yet, and here we are with the third Llama model training. We've had just an incredible amount of innovation on top of it. So definitely expecting more, and we're obviously going to cover this as much as possible. So this is, I think, most of it.

[01:06:23] Alex Volkov: Oh, and the last thing that Zuck added, and I think it's relevant to ThursdAI as well, where we have to start talking about hardware, is that he says, I think lots of people will talk to AIs frequently through the day using smart glasses like what we're building with Ray-Ban Meta.

[01:06:38] Alex Volkov: And I think we've [01:06:40] talked about their smart glasses that they're like multi modal glasses. They have a camera built in them. You can press a button and actually pass the image into the LLM. They're making improvements in speed as well. I will say just like an additional one thing we've talked how Meta is adding a bunch of AI into every chat and nobody like necessarily used them.

[01:06:58] Alex Volkov: Recently, a friend of mine, maybe because, I'm an AI evangelist, so he felt free to do this in our chats. He just added an AI bot to our chat. Literally, just like my DM with a friend who has no, nothing about AI, like it's not part of his world. He does something else. Recently, he's Hey, let me add this thing.

[01:07:14] Alex Volkov: So Meta is definitely letting folks experiment with AI more than some other places. And he just added the AI to our chat. It was super cool. So here's an update from Zuck, BigZuck: Llama 3 is training, and they have a lot of GPUs. They're super GPU rich, and hopefully we'll get the benefit.

[01:07:30] Alex Volkov: Go ahead, Nisten. Yeah,

[01:07:36] Nisten Tahiraj: H100s? Yeah, they're going to need that if they're going to have visual stuff from people's glasses. But it's an insane amount. That's all. Yeah, I just ran some quick calculations. I got roughly similar numbers to what Nisten just said. And if I'm doing my math, I'm running just some numbers based off the alleged GPT-4 leaks of the amount of GPU hours that it might take, let's say if they used all those Meta GPUs.

[01:08:08] Nisten Tahiraj: If they were to do a GPT-4 level model, I'm getting numbers that it would take less than a week, pretty much, to train. Yeah, this is an insane amount of GPUs, for people that don't have good references for this. Yeah.
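
For a rough sense of where numbers like that come from, here is a hedged back-of-envelope. Every constant below is an assumption: the GPT-4 compute figure comes from unverified leaks, and real utilization varies a lot.

```python
# Back-of-envelope only: all inputs are assumptions, not confirmed figures.
alleged_gpt4_flops = 2.1e25       # rough, widely-circulated (unverified) estimate
h100_peak_flops = 1.0e15          # ~1 PFLOP/s per H100, order of magnitude
utilization = 0.35                # assumed model FLOPs utilization
num_gpus = 600_000                # Meta's stated H100-equivalent count by end of 2024

cluster_flops_per_sec = num_gpus * h100_peak_flops * utilization
seconds = alleged_gpt4_flops / cluster_flops_per_sec
print(f"~{seconds / 86_400:.1f} days")   # comes out to roughly a day or two
```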

[01:08:18] Alex Volkov: I think it's insane enough to maybe open a new category on top of GPU rich. It's just quite incredible, and hopefully they're committed to open sourcing this with Llama 3. Umesh, you had a comment as well?

[01:08:32] Umesh Rajiani: Yeah, what if Llama 3 is going to be multimodal? Then they will need those GPUs.

[01:08:37] Alex Volkov: I'm really hoping it will. They're training the models, and multimodality is something they talked about. It's time to move towards the LMM world and multimodality, and they will need all those GPUs to crank out the vision part of this. Hopefully it's multimodal in other areas too; as a reminder, Meta has released a whole bunch of attempts at multimodality in other areas, not only image.

[01:08:59] Alex Volkov: IMU motion units, and they've talked about fMRI signals, like incredible stuff. But definitely other modalities like sound, like audio. Live video would be super cool; I think this year is the year of live video, so hopefully not only vision, and if it's vision, then hopefully it's live video.

[01:09:18] Alex Volkov: Alright folks, we're coming up on two hours,

[01:09:20] Alex Volkov: and with that, I think this is the summary of today's ThursdAI. Thank you everyone for joining. If you haven't subscribed yet, definitely feel free to subscribe at thursdai.news. I appreciate everyone's time and attention here. Thank you so much to the co-hosts and guests for today's pod, and shout out to everyone.

[01:09:36] Alex Volkov: And I have to end this on the very happy note of the alchemy thing, because the one thing that came out from the conversation with Maxime, who merges, and Nisten and everything, is that a lot of this is alchemy, and a lot of this is trying to see how, when you combine models and don't continue to train them, they still perform better.

[01:09:55] Alex Volkov: So I have to end on this very happy tune, which will represent the alchemy that we're all doing. And we love it. Thank you everyone for joining this Thursday. I will see you next week. Cheers. And we'll add this banger to the show notes as well. Bye everyone.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 19 Jan 2024 00:27:46 GMT
🔥 ThursdAI Sunday special - Deep dives into Crew AI with Joao then a tasty Bagel discussion with Jon Durbin

ThursdAI - Sunday special deep dive, interviews with Joao, and Jon, AI agent Crews and Bagel Merges.

Happy Sunday dear reader,

As you know by now, ThursdAI pod is not a standard interview based podcast, we don't focus on a 1:1 guest/host conversation, but from time to time we do! And this week I was very lucky to have one invited guest and one surprise guest, and I'm very happy to bring you both these conversations today.

Get your Crew together - interview with João Moura, creator of CrewAI

We'll first hear from João Moura, the creator of Crew AI, the latest agent framework. João is a director of AI eng. at Clearbit (acquired by Hubspot recently) and created Crew AI for himself, to automate many of the things he didn't want to keep doing, for example, post more on Linkedin.

Crew has been getting a lot of engagement lately, and we go into the conversation about it with João, it's been trending #1 on Github, and received #2 product of the day when Chris Messina hunted this (to João's complete surprise) on Product Hunt.

CrewAI is built on top of LangChain, and is an agent framework focusing on the orchestration of role-playing, autonomous agents.

In our chat with João we go into the inspiration, the technical challenges and the success of CrewAI so far, how maintenance for crew is now partly a family effort and what's next for crew

Merges and Bagels - chat with Jon Durbin about Bagel, DPO and merging

The second part of today's pod was a conversation with Jon Durbin, a self described AI tinkerer and software engineer. Jon is a Sr. applied AI researcher at Convai, and is well known in our AI circles as a master finetuner and dataset curator.

This interview was not scheduled, but I'm very happy it happened! If you've been following along with the AI / Finetuning space, Jon's Airoboros dataset and set of models have been often mentioned, and cited, and Jon's latest work on the Bagel models took the lead on HuggingFace open LLM leaderboard

So when I mentioned on X (as I often do) that I'm going to mention this on ThursdAI, Jon came up to the space and we had a great conversation, in which he shared a LOT of deep insights into finetuning, DPO (Direct Preference Optimizations) and merging.

The series of Bagel dataset and models, was inspired by the Everything Everywhere All at Once movie (which is a great movie, watch it if you haven't!) and is alluding to, Jon trying to throw as many datasets together as he could, but not only datasets!

There has been a lot of interest in merging models recently, specifically many folks are using MergeKit to merge models with other models (and often a model with itself) to create larger/better models, without additional training or GPU requirements. This is solely an engineering thing, some call it frankensteining, some frankenmerging.

If you want to learn about Merging, Maxime Labonne (the author of Phixtral) has co-authored a great deep-dive on Huggingface blog, it's a great resource to quickly get up to speed
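
As a toy illustration of the core idea, here is the crudest possible merge, a plain weight average of two same-architecture checkpoints; real MergeKit recipes use smarter methods (SLERP, TIES, layer-wise passthrough for frankenmerging), and the file paths below are placeholders, but the point is the same: no gradient steps, just arithmetic on weights.

```python
# Toy "merge" of two checkpoints with identical architecture: average every
# weight tensor, no training involved. Paths and keys are placeholders.
import torch

state_a = torch.load("model_a/pytorch_model.bin", map_location="cpu")
state_b = torch.load("model_b/pytorch_model.bin", map_location="cpu")

merged = {}
for key in state_a:
    if key in state_b and state_a[key].shape == state_b[key].shape:
        merged[key] = (state_a[key].float() + state_b[key].float()) / 2
    else:
        merged[key] = state_a[key]  # fall back to model A for mismatched tensors

torch.save(merged, "merged_model.bin")  # output path is also a placeholder
```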

So given the merging excitement, Jon has set out to create a model that can be an incredible merge base, many models are using different prompt techniques, and Jon has tried to cover as many as possible. Jon also released a few versions of Bagel models, DPO and non DPO, that and we had a brief conversation about why the DPO versions are more factual and better at math, but not great for Role Playing (which is unsurprisingly what many agents are using these models for) or creative writing. The answer is, as always, dataset mix!

I learned a TON from this brief conversation with Jon, and if you're interested in the incredible range of techniques in the Open Source LLM world, DPO and Merging are definitely at the forefront of this space right now, and Jon is just at the cross-roads of them, so definitely worth a listen and I hope to get Jon to say more and learn more in future episodes so stay tuned!

So I'm in San Francisco, again...

As I've mentioned in the previous newsletter, I was invited to step in for a colleague and fly to SF to help co-host a hackathon with friends from Together Compute and Langchain, at AGI House in Hillsborough, CA. The hackathon was under the Finetune vs RAG theme, because, well, we don't know what works better, and for what purpose.

The keynote speaker was Tri Dao, Chief Scientist @ Together and the creator of Flash Attention, who talked about SSM, State space models and Mamba.

Harrison from Langchain gave a talk with a deepdive into 5 techniques for knowledge assistants, starting with basic RAG and going all the way to agents 👏

I also gave a talk, but, I couldn't record a cool gif like this for myself, but thanks to Lizzy I got a pic as well 🙂 Here is the link to my slides if interesting (SLIDES)

More than 150 hackers got together to try and find this out, and it was quite a blast for me to participate and meet many of the folks hacking, hear what they worked on, what worked, what didn't, and how they used WandB, Together and Langchain to achieve some of the incredible results they hacked together in a very short time.

The projects showcased a range of creative applications leveraging RAG, finetuning, and other large language models. Several projects like Magic RAG, CareerNavigator-AI, and CompetitionAI used RAG for document retrieval and knowledge enhancement. Others like rags2pizza and Naturalist DALL-E focused more on finetuning models for specific domains. Some projects compared finetuning and RAG, finding that combining both gives superior performance over using either alone but that result wasn't conclusive.

My vote as a judge (which I did not expect to be) eventually went to the team that built the OptiMUS project; they had generated a synthetic dataset, cleaned it up, finetuned a model on it, and showed how they want to optimize AI agents. They used WandB to track their work and I hope they take this project forward and keep making advancements in AI. Congrats on the win Ali and Shayan, hope you enjoy the WandB branded AirPods (even I don't have those) and the Meta Quest, well deserved!

Thank you for tuning in! See you next week!

Full Transcription :

[00:00:00] Alex Volkov: Hi, welcome back to ThursdAI, the Sunday special episode. This is Alex Volkov, and I'm recording this in a gorgeous space in San Francisco, where I was invited to judge a hackathon, and now I'm hanging out with a few friends from Cerebral Valley. So thank you, Cerebral Valley folks, for letting me use this place for recording. Today we have a special episode for you. If you hear this on Sunday, today's not a Thursday. We oftentimes have special guests on the pod, where conversations go deeper.

[00:00:45] Alex Volkov: And usually I reserve that slot for a Sunday special release, so this is what you're hearing now. In today's episode, we actually have two conversations, although I only planned on one. And the first part is the planned part that you hear from Joao Moura. He is a director of AI at Clearbit, now acquired by HubSpot. And he's also the creator of Crew AI, an agentic AI framework that runs by orchestrating

[00:01:15] Alex Volkov: digital AI agents and having them work together.

[00:01:19] Alex Volkov: And I think you'll hear from Joao why this piqued interest for many folks, specifically because, as we caught up with Joao,

[00:01:29] Alex Volkov: Crew AI was trending on GitHub and getting number two on Product Hunt at the same time. And it's a really cool framework, and I think the underlying power of this is that it can use open source, local models. A lot of previous agent attempts used GPT-4, for example, and Crew AI can use things like Mistral or Mixtral running in LM Studio or Ollama on your Mac, which I think is super cool.

[00:01:55] Alex Volkov: And I think on-device AI, plus something like this framework, is going to be very, very powerful. It was a great conversation with Joao. And, surprising to me, the second guest was not planned. However, you may have heard from the previous ThursdAI that the Bagel series of models from a self-proclaimed AI tinkerer, Jon Durbin, have taken over the leaderboards on Hugging Face, including a bunch of merges, and we haven't done a deep dive into merges and MergeKit and frankenstein models.

[00:02:32] Alex Volkov: But if you've been to ThursdAI for a while, you've probably heard about them. Merging is a technique to take a model, or different models, and without any training compute create bigger or different models, using a dissection and recombination process of the layers of those models, just based on weights, without any training or continued fine-tuning, which is incredibly interesting.

[00:02:58] Alex Volkov: And Jon goes into this a little bit, and he created Bagel based on... well, I'll let you hear this at the end. And it's a very fascinating conversation. I took a lot from it, and unfortunately we didn't have time for a long deep dive, but I learned a lot from Jon, and hopefully he'll come back on the podcast and we'll be able to dive even deeper and talk with Jon about how to create datasets, why DPO is better than PPO, and all of these great things. So we had two great guests, and I had a blast having them on the pod, and I probably should do more of these deep dives.

[00:03:37] Alex Volkov: So please let me know what you think. Don't forget to subscribe to the newsletter, where I send a summary, and in the newsletter you'll find my trip report, quote unquote, for the hackathon that was co-sponsored with Together AI and Langchain. Harrison was there, and I gave a brief talk as well, and I added a bunch of pictures.

[00:03:57] Alex Volkov: So if you're hearing this in your car, check out the newsletter afterwards on ThursdAI.

[00:04:02] Alex Volkov: And with that, I give you our first guest, Joao Moura. All right, everyone. Welcome back to ThursdAI. And we have a great guest today: João Moura from, I want to say, Clearbit, if I'm not mistaken. Joao, could you please introduce yourself and what you do, and then we're going to talk about the thing we're here to talk about.

[00:04:36] Joao Moura: A hundred percent. Thank you for having me. First of all, you got my name right, it's hard to pronounce. I go by Joao, make it easier for everyone. I work at Clearbit, yes, but we just got acquired by HubSpot. I'm not sure. I'm Joao from Clearbit, from HubSpot, and from Crew AI. Everything at once.

[00:04:54] Alex Volkov: Awesome.

[00:04:58] Alex Volkov: I think it's your first time here on stage. Welcome. We've met in San Francisco, at the Ollama open source event, and I think Teknium was there and a bunch of other folks from Ollama, and I met you and we had a brief conversation, and you mentioned Crew AI to me, and it sounded super, super interesting, and then, this week and the previous week, there was an explosion of interest in Crew

[00:05:17] Alex Volkov: AI, so I would love to hear from you how your last few weeks have been going since the time that we spent together. A lot of stuff has happened to Crew AI. Without saying what Crew AI is yet, could you just recap your experience from the past two weeks?

[00:05:33] Joao Moura: A hundred percent, a hundred percent. And first of all, that Ollama event the other day was so good. Had so much, so much fun at it.

[00:05:41] Alex Volkov: It was

[00:05:41] Joao Moura: last couple of weeks have been intense I gotta tell you, kind of like, the thing. Got, like, blown up out of proportion. Like, I have a lot of DMs, a lot of messages, a lot of issues, and not that many requests, I want to say, but but it has been a lot of fun.

[00:05:59] Joao Moura: Crew AI just seems to have a lot of interest from different people. I think this idea of building AI agents is something that captivates most of the tinkerers out there, like how you can automate your life away. And it seems that has been resonating with a bunch of engineers out there.

[00:06:16] Joao Moura: The last couple of weeks has been very intense in terms of writing code late at night, having to spare a few hours to answer DMs and help with the Discord community. And I actually ended up recruiting my wife to help me with that. So if you see Bianca on Discord or over GitHub issues, that's my wife helping me out, making sure that I get it all covered.

[00:06:41] Alex Volkov: Definitely shout out Bianca, thanks for helping. And uh, as well so, now trending on GitHub, I think number one , I think somebody submitted this to Product Hunt as well?

[00:06:50] Joao Moura: That was a thing. So I have been working on this and like as an engineer working on an open source project, you don't, you don't think about this project as products necessarily from the get go, but as it starts to get more traction it got the interest of like this one guy that seems to be like a I don't know if it's a big deal or not, but it seems that he hunts a lot of products and product hunt.

[00:07:14] Joao Moura: And for some reason he got like. The super fun thing is that I have been part of like, and I have seen other like product, product launches, and I know how much effort goes into preparing those and to be ready for it and have like a, like social media ready to post a lot of content about it. And I had none of that.

[00:07:36] Joao Moura: I woke up in the morning and there was a message from a VC saying like, Hey, congratulations on your launch. And I was like, What is this guy talking about? I have no clue. It was very interesting because I, I opened like Product Hunt's website and I'm searching like, how do I cancel this? Like, I, I didn't want to launch this, at least not right now.

[00:07:58] Joao Moura: And on Product Hunt's like documentation, they mentioned that you have two options, either, You send them a message like super urgent so that they can pull like the, the brakes on it, or you run with it.

[00:08:13] Joao Moura: And at the end of the day, I was like, I'm just going to run with it. I'm going to see how it goes. And turns out we end up the day as [00:08:20] number two.

[00:08:20] Joao Moura: And that was, that was something else. Thanks.

[00:08:25] Alex Volkov: Chris Messina is the number one hunter. I think he hunted most of the products on Product Hunt, so shout out Chris. And definitely, I saw this, and what a surprise to wake up to and then get to product number two. Definitely helped the stats, probably, right? So I think, with this excitement, let's talk about why it's so exciting.

[00:08:43] Alex Volkov: Could you give us a brief primer on Crew AI? We've talked about agents before. We obviously talked about AutoGPT previously, and GPT Engineer from Anton Osika, and a bunch of other very interesting projects. Could you give us the brief primer on Crew AI, what it is?

[00:08:57] Alex Volkov: And then we're going to talk about like why you built it and the orchestration stuff.

[00:09:02] Joao Moura: A hundred percent. Crew AI is a very thin framework. It's a Python framework. It's in the process of being converted to TypeScript as well, but it's a Python framework that allows you to build a group of AI agents. You can think about it as if AutoGen and ChatDev had a child.

[00:09:21] Joao Moura: That's the way that I usually describe it. So you're going to have a group of AI agents that are role playing in order to perform a complex series of tasks. And you can do all sorts of automations on it, and you can plug it into all sorts of different systems out there. I think that's the easiest way to describe it right now.
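
For readers who haven't seen it, a CrewAI setup looks roughly like the sketch below. This is from memory and hedged: parameter names have shifted between versions, and the role/goal text is made up, but the shape João describes (agents with roles, tasks assigned to them, a crew that kicks off the run) is the core idea.

```python
# Rough sketch of a CrewAI-style crew; argument names may differ by version.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect interesting AI news for a weekly post",
    backstory="You scan sources and summarize what matters.",
)
writer = Agent(
    role="Writer",
    goal="Turn research notes into a LinkedIn-ready draft",
    backstory="You write clear, engaging posts.",
)

research = Task(description="Find 3 notable AI items from this week.", agent=researcher)
draft = Task(description="Write a short post based on the research.", agent=writer)

crew = Crew(agents=[researcher, writer], tasks=[research, draft])
print(crew.kickoff())
```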

[00:09:43] Alex Volkov: Awesome. And could you, you briefly mentioned this, GPT, could you talk about like the, the inspiration here? what made you start this as Clearbit was getting acquired and, or, or around this area, at least I think what made you work on this? There's a bunch of other , orchestration platforms out there, the bunch of agents what made you write your own instead of like taking something off the shelf on open source?

[00:10:06] Joao Moura: So turns out that this is a fun story. There was so you're back into my wife again, always propping me up. I love her. She's so great. She was she was telling me, Hey, you have been doing all this amazing work at Clearbit. Because at Clearbit, we have been doing work with LLMs for the past one year.

[00:10:22] Joao Moura: And at a scale that I believe not many have. And she was like, You should be sharing more about this. Like, you're leading these efforts and you're doing all these complex systems at scale. And this could definitely help and benefit other people. So she was telling me that I should do a better job at posting online in things like LinkedIn and Twitter.

[00:10:41] Joao Moura: And Twitter, I, I think like I'm okay with, but LinkedIn was always hard to me. I feel like there is a, there is a harder threshold, like a higher threshold for how well your idea must be before you post it on LinkedIn. So I was considering like how, how I can do better LinkedIn posting. And because I was so excited about AI agents, I was like, can I build like a couple of agents that will actually help me out with this, where I can like shoveling my, like, like my, my draft and rough ideas.

[00:11:11] Joao Moura: And it's going to come up with some guidance and a better post that I can just edit and post. It turns out that I could, and that's how I started Crew AI. I looked into AutoGen and I was not a huge fan of how they, like, one, they didn't have the option to execute tasks sequentially. They also have a lot of assumptions on how these agents should work together.

[00:11:34] Joao Moura: And I think The way that they work together should vary depending on the tasks that you're trying to accomplish. I was not a huge fan of it. Chat dev on the other side, I saw a lot of like good stuff on it, but it just didn't feel like a production system, right? Like it has like a game like UI, something that you would experiment with, but not something that you would deploy in production.

[00:11:56] Joao Moura: So that's how I came up with this idea of, maybe I should do something myself so I can build this LinkedIn automation. And if that works, then I can build other sorts of automations. And that's how I started Crew AI. I built five agents, from a social network researcher all the way to a chief content officer, to help me create great ideas so that I can post them on LinkedIn.

[00:12:23] Joao Moura: And it works great. I went from never posting on LinkedIn to posting like three to four times every week. And I love what I post, and it seems other people do as well. So from that point on, I decided that I want to create more automations, and that's how Crew AI came to be. I just abstracted what I learned from that experience into this framework that I could then use to build other sorts of automations, and things took off from there.

[00:12:50] Alex Volkov: Wow, that's incredible. Incredible story. As a lot of the engineering stories happen when people create like cool things, laziness is somewhere there. Like I want to automate something that I don't want to do, but I definitely need done. I definitely have a bunch of those as well, at least for Thursday. The collection stuff and the other stuff that I would love to just like happen for me.

[00:13:10] Alex Volkov: So I definitely want to check out Crew AI for that and create a ThursdAI collection thing. Could you mention the technical challenges here? You did mention that it's based on LangChain, if I'm not mistaken. You mentioned that there are not a lot of pull requests for people to help out with. Could you talk about the technical challenges you ran into?

[00:13:30] Joao Moura: Yes, so basically when I started to build this out, I realized pretty quickly that agents are just as useful as how many tools you can connect them with. And when I was looking online, I realized that both LlamaIndex and LangChain already had all these amazing tools that you could run with.

[00:13:52] Joao Moura: So I wanted to make sure that people could use those tools too, and build crews that use them. Because of that, I took the decision to build Crew AI around LangChain, so that if anyone wants to hook that up with their GitHub or their Gmail, there are already tools that were built out for that, and they're pretty easy to plug in and just work.

[00:14:15] Joao Moura: And it seems LlamaIndex tools also work. I'm putting together some experiments around that to share with more people. But basically those were some of the initial decisions that led to this design. I think some of the technical challenges that came from it is just realizing that, as people start creating all these different crews for these different use cases, there are so many edge cases, right?

[00:14:38] Joao Moura: You know that you can try to steer LLMs your way, but especially if you're using open source LLMs and smaller LLMs, they have a harder time just sticking with a given format.

[00:14:54] Joao Moura: I started to add a bunch of guardrails in Crew AI that actually make it way better than what you would get with any other agent framework out there. For example, one of them is, if you're running out of iterations, like your agent is stuck in a cycle or taking too long to come up with an answer, it's going to force it to come up with an answer if it goes over a certain number of iterations that you can define.

[00:15:21] Joao Moura: Another one is, if it tries to use the same tool twice in a row, it's going to prevent it from doing that and guide it towards moving on. Another one is it has caching. So every tool any agent uses is going to be cached, so that if any other agent in the group decides to use the same tool, they don't need to actually execute it.

[00:15:41] Joao Moura: So I think a lot of the challenges come from how I can add all these guardrails to make sure that, independently of what the use case is and what the person is building a group of agents for, it's still going to run smoothly. And that's where a lot of the work has been put in.
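
The guardrails João lists (iteration caps, blocking back-to-back identical tool calls, caching tool results) are easy to picture generically; here is a hedged sketch of the pattern, not CrewAI's actual implementation.

```python
# Generic agent-loop guardrails in the spirit of what João describes.
# This is an illustration of the pattern, not CrewAI's real code.

class GuardedToolRunner:
    def __init__(self, max_iterations: int = 15):
        self.max_iterations = max_iterations
        self.iterations = 0
        self.last_call = None
        self.cache = {}  # shared across agents so repeated tool calls are free

    def run_tool(self, tool_name: str, tool_fn, arg: str):
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise RuntimeError("Iteration cap hit: force the agent to answer now.")

        call = (tool_name, arg)
        if call == self.last_call:
            return "You just used this tool with the same input; move on instead."
        self.last_call = call

        if call not in self.cache:            # cache so other agents reuse results
            self.cache[call] = tool_fn(arg)
        return self.cache[call]

runner = GuardedToolRunner(max_iterations=5)
print(runner.run_tool("search", lambda q: f"results for {q}", "open source agents"))
```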

[00:16:01] Joao Moura: So you mentioned local models as well

[00:16:04] Alex Volkov: we mentioned, we met at the Ollama event, and Ollama, shout out Ollama folks, is a CLI to be able to download and run open source models on your hardware, basically. Many of the previous agent attempts, AutoGPT and different ones, they use maybe GPT-4 or something.

[00:16:20] Alex Volkov: We're getting to the tools, and we heard previously in the space from Jon Durbin that there are models now that are better for specific tasks like function calling as well. Joao, could you speak a little bit about the difference that you see? Could Crew AI work with both, right? Open source and also API ones.

[00:16:39] Alex Volkov: And could you [00:16:40] talk about a little, the difference that you see between like the open source models as we have them right now versus kind of the, the online models and which ones would you prefer for your tasks?

[00:16:50] Joao Moura: Turns out that I think that the fact that crew AI supports local models is some like thing that, that. Make it take off because that's something that I wanted from the get go. Like these agents, especially if you're trying to automate complex tasks, they can become rather costly if you want to run them like 24 7.

[00:17:09] Joao Moura: But with the ability to use local models, you can basically just set and forget, and they're going to keep doing work for you. So I wanted to make sure to support local models because of that. Crew AI supports any of the vendors that you're going to find supported in LangChain. So you can use any of the open source models out there, GPT, you name it.

[00:17:30] Joao Moura: And you can also use Ollama, you can also use LM Studio, whatever is the best way that you have to run your models locally, you can use that. I specifically, personally, love Ollama. Ollama is amazing. I love the guys that built it as well. And I think it's so easy to use that I ended up using that. And I have been using some of the smaller models.

[00:17:51] Joao Moura: Shout out to Nous Research. I love that OpenHermes 2.5 model. It's just amazing and so small. I can't believe how good it is. And that's one that I use a lot; I'm using OpenHermes 2.5 just because of how well it works, but I also tried it with Mistral, I also tried with Solar, I also tried with Nexus. So many models out there, so good.
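
For context, wiring a local model served by Ollama into a LangChain-based stack looks roughly like this; the import path and model tag are hedged, since both have moved around between library versions, and this assumes you've already pulled the model with `ollama pull openhermes`.

```python
# Hedged sketch: a local model served by Ollama, wrapped for LangChain-based
# frameworks. The import location has changed across langchain versions.
from langchain_community.llms import Ollama

local_llm = Ollama(model="openhermes")          # talks to the local Ollama server
print(local_llm.invoke("Summarize why local models matter for agents."))
```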

[00:18:19] Joao Moura: One thing that I want to call out as well is that These local models, they definitely struggle a little bit more when compared to GPT 4 in terms of sticking with a given format. I'm also collecting all my executions data so that I can fine tune agentic models. Similar to how you have like instruct models and chat models, I want to make sure that we start to see more agentic models out there.

[00:18:46] Joao Moura: I have seen some closed source ones that are not like, You're not able to touch on. So I'm building an open source data set that I can then use to fine tune those models. And then you basically are going to have these agents run on local models without a glitch. That would be at least the end goal.

[00:19:05] Alex Volkov: That's incredible, incredible, specifically because

[00:19:08] Alex Volkov: we've, we've had interviews with a bunch of folks who build agentic stuff. So one, one of the more successful episodes of last year for Thursday, I was in an interview with Killian Lucas from Open Interpreter and the open source community here definitely opened the thread with Killian specifically to say, Hey, when the users run a bunch of this stuff, we would love to have.

[00:19:27] Alex Volkov: Users opt in maybe for some like telemetry or analytics to be able to build the data sets for the tasks that were completed or not completed. I don't know if you have this plan, but definitely this is a benefit to the community if you do have a way for folks to like, log their stuff. I also mentioned that like, I probably should reach out to you separately to see if like, these runs for these agents in crew could be logged in Weights Biases with the integration.

[00:19:50] Alex Volkov: Would be definitely more than happy to like participate and see if we can like look at the execution stuff of your agent on Weights Biases. As well, I think before I let Umesh wanted to have like a bunch of questions for you as well. He's been running and he, he does agents of his own. I want to say

[00:20:06] Alex Volkov: what's the next plans for crew? Where are you planning to take this? Many of these projects, suddenly people ask for a UI because maybe they don't want to do like, installing and, and doing like Python stuff. So you already mentioned TypeScript. Could you give us a little bit of a future sense of like, where are you planning to take this?

[00:20:23] Joao Moura: I think what we are getting now is a bunch of feature requests from a bunch of different sides. So there is some prioritization going on so that I can figure out what to focus on next. One thing that seems to be a no-brainer to me, though, is that we need to have a UI for this.

[00:20:37] Joao Moura: I think this would be pretty cool and unlock a lot of use cases for people out there. I know there are other people that have been building UIs for their, like their businesses that are being built around this. I, I just think like an open source version would be better. So I'm definitely already working on the UI for this.

[00:20:53] Joao Moura: You're going to be able to put your agents together, bring all your crews together, and then you can basically have these agents run by themselves. I might look into offering an option where we can even host it for you, and I'm still figuring out what that would look like.

[00:21:10] Joao Moura: Maybe that's too far ahead. But yeah, I think the UI for it makes a lot of sense. Also, another thing is that it seems a lot of the use cases kind of go back to very similar tools over and over again. And even though you can hook them up with LangChain or LlamaIndex tools, those might still require some configuration.

[00:21:30] Joao Moura: It might not be as straightforward for some people. So we might take an opinionated take on a tool-specific repository and package that you can basically use. So, let's say they want to create an agent that does RAG, you might be able to do that with one line versus having to build, like, a custom

[00:21:51] Joao Moura: tool for that. So that's another thing that we have been looking at as well. I think there are so many use cases. One thing that I'm trying to do more now is just chat with more people that are using this, especially on the business side of things, to understand what other use cases we could support there.

[00:22:08] Joao Moura: But yeah, a lot of interesting things cooking.

[00:22:11] Alex Volkov: I'm looking forward to hearing more about Crew AI and upcoming things. I think Umesh, our co-host here, has been doing agents for a while and has a few questions as well. Umesh, go ahead.

[00:22:23] Umesh Rajiani: Yeah. Hey, Joao, thank you for coming in. Almost 80, 90 percent of our workflow now is agentic workflow. So we are employing the generative AI library of Google for Gemini, and also a lot of work using AutoGen.

[00:22:41] Umesh Rajiani: And we got introduced to Crew AI, I think, four weeks ago through one of my engineers and found it pretty interesting. There are going to be a lot of pull requests coming in from us, actually, because we are thinking about a few things. I just wanted to ask you one particular question about the process part.

[00:22:59] Umesh Rajiani: Your current library, as I understand, is a linear process library, and what we are employing with AutoGen is also a bit of a graph of actions as well as the DAG approach. The DAG approach can be implemented using your process, but do you have a graph-of-actions workflow in planning or something that is coming up?

[00:23:24] Joao Moura: Yes, so this idea of processes, I want this to be one of the cornerstones for Crew AI. My understanding is that, as I said earlier, a lot of the different outcomes that you're going to get, a lot of the magic, happens when you define through what processes these agents are going to work together, right?

[00:23:43] Joao Moura: And there are so many options out there. Like you can have them work like sequentially, you can have them work like in a group, like if they're in a meeting, you can have like a consensus strategy where they can kind of like bet to see who is going to take on the task and even evaluate the results.

[00:23:59] Joao Moura: So there's just a lot of different processes that can be implemented there. And the idea is to implement all these processes so that people can have some work happen in parallel if they want to, or sequentially, or whatnot. About a graph-specific API, I'm not sure how much I can tell about it, but we have been talking with the LangChain folks about it.

[00:24:19] Joao Moura: And there's, there's some things that have been cooking there.

[00:24:23] Umesh Rajiani: Enough said. This last question. So currently it is all Python but most of our implementations now because of the latency and everything and complexity of. The workflows that we are implementing, mostly our applications are enterprise applications.

[00:24:36] Umesh Rajiani: We are employing a lot of Rust to, for, for a compiled workflow. So do you have any plans of porting it to Rust or you're looking for kind of a support in that area or something?

[00:24:47] Joao Moura: Yeah. So we are, we are porting it to TypeScript right now, and there's some work being done in to build like an API where you might be able to just spin it off as like a service.

[00:24:58] Joao Moura: And you can then [00:25:00] add agents, create agents, over an API. So you don't have to create one yourself; you just need to figure out how you want to host it. I haven't thought about porting it to Rust yet, but I would be open to that idea, for sure. If I can get enough people to help out, I'll create a repository and we can get things working for sure.

[00:25:16] Umesh Rajiani: I'll, I'll reach out to you separately. Thanks Alex for, for allowing me to ask questions. Of course I have many questions, but I'll reach him out on his Discord.

[00:25:23] Alex Volkov: Yeah, thank you Umesh, and João, I just want to recap on the awesome success of Crew AI. I agree with you. We've had many frameworks like this, we've talked about many frameworks like this, but the ability to run this completely on your machine, the ability to not pay somebody else, the ability to use Ollama.

[00:25:43] Alex Volkov: I didn't know that you also support LM Studio. Shout out LM Studio, a friend of the pod, hopefully we're going to get them on the next ThursdAI. So I didn't know that I can open up a local model on LM Studio and then the crew would use this API. Definitely want to play with this now.

[00:26:00] Alex Volkov: I want to say, I want to give you a few minutes to just like talk to the community. A lot of things are happening in this world. I find it very interesting where kind of the AI engineers, the kind of the traditional software engineer background folks, they're building the tools, they're building the rag systems, let's say they use the link chain.

[00:26:17] Alex Volkov: From the other side, we have a bunch of machine learning folks who are building the models, fine-tuning the models, working in that space, and reading the papers. And I do see a connection between them, and obviously my role at Weights & Biases specifically is to connect these two worlds. I do want to see more people that train models also think about the agentic behaviors as well.

[00:26:37] Alex Volkov: We heard Jon Durbin before talk about, hey, there are specific datasets for RAG, there are specific datasets for execution and function calling. I think Airoboros, the dataset, has a bunch of function calling as well. So definitely I want to see a connection here. João, please feel free to talk to the community in terms of what you need to make Crew the best crew ever.

[00:26:57] Alex Volkov: Where can they find you, what you can get help with the floor is yours. Feel free to take over and ask everything. Community will provide.

[00:27:06] Joao Moura: A hundred percent. And just to tap into what you said there, I agree. I think like there's something magical that happened like last year with like GPT taking the world by the storm is that it like it connected two groups of engineers that in the past didn't talk very much.

[00:27:22] Joao Moura: And that was like AI and ML engineers with. regular software engineers. I have managed teams in both areas in the past, and I definitely have seen like that there isn't much interaction there, but right now it's, it's amazing to see all the amazing stuff that have been coming up from like those two groups to interacting more together.

[00:27:40] Joao Moura: It has been a lot of fun. About Crew AI, yes, I would say give me a follow on Twitter, or X I would say now, so give me a follow on X and I definitely will keep posting and sharing more about Crew AI and all the things related to LLMs, agents, and everything else. You can learn more about Crew AI by looking into its GitHub.

[00:28:00] Joao Moura: So you can go into my profile slash crewAI. I'll probably add the link to my X account as well. From that, if you have follow-up questions or if you want to see what people have been cooking with it, I would say join the Discord community. We have around 500 people there and it has been growing daily.

[00:28:18] Joao Moura: So if you join that, you might be able to see other use cases and things like that. If you're curious about it but you're not sure what you could build with it, there's a bunch of examples in the README and even some videos that I recorded of crews doing stock analysis or trip planners and all that.

[00:28:38] Joao Moura: There's a lot of content there that you can consume in order to get your ideas. And if you do decide to give it a try, don't miss out on the custom GPT. It's also linked in the README and it can help you write the code. It can help you with ideas for the agents, ideas for the roles or for the tasks, or anything around using Crew AI.

[00:28:58] Joao Moura: If you're also curious at contributing to the project. GitHub has a bunch of issues. My wife, again, has been flagging and tagging all of them. So thank you so

[00:29:07] Joao Moura: much.

[00:29:07] Alex Volkov: Shout out, Bianca.

[00:29:08] Joao Moura: You can find all the ones that are tagged with help wanted, or the ones that are related to questions, and you can help answer them as well. And we're gonna be writing new documentation from scratch, so this might be a great opportunity to help with simpler stuff as well, if you're into that.

[00:29:24] Alex Volkov: Awesome and I think I saw something, I don't know if I have a link

[00:29:28] Alex Volkov: to the thing that generates documentation on the fly from just the code itself. And it looks super cool. I'll try to send this to you. Joao, thank you so much for joining ThursdAI. This is your first time here, hopefully not the last.

[00:29:40] Alex Volkov: Congrats on the success of Crew AI, and it's been great meeting you and then having you on. Definitely thank you for coming, and folks should definitely check out Crew AI, give Joao a follow, and we will expect more. I can't wait to run a few crews myself to help me with ThursdAI tasks, especially on local models.

[00:29:58] Alex Volkov: It was super cool. Thank you for coming, man.

[00:30:01] Joao Moura: I love it. Thank you so much for having me catch you folks online.

[00:30:04] Alex Volkov: Awesome, and your audio quality was great by the way, thanks for testing out your mic.

[00:30:07]

[00:30:11] Bagel models top the leaderboard - with Jon Durbin

[00:30:11] Alex Volkov: We're moving forward into the top open source on the LLM leaderboard and the creator. So if you guys open the open source LLM leaderboard, which we often talk about. On HuggingFace we, we've talked about kind of the, the difference between human evaluation and the automatic evaluations that OpenLLM leaderboard runs.

[00:30:32] Alex Volkov: You will see a bunch of models. The top three ones are from cloudyu, and they're, I think, merges of Yi 34B, and then the Mixtral 34B as well, but it's not based on Mixtral. And then the rest is a bunch of Jon Durbin Bagel examples. And so, all of those, there are like six models there that are basically based on Jon's Bagel DPO versions.

[00:31:00] Alex Volkov: And I just wanted to shout this out and shout out Durbin for, for working this hard and releasing these models.

[00:31:06] Alex Volkov: Let's see if we can hear from the man himself. Hey, Jon.

[00:31:09] Jon Durbin: Hey, how's it going?

[00:31:10] Alex Volkov: Good. Thanks for joining us. I don't remember if you've ever been on stage. So feel free to briefly introduce yourself to the audience who doesn't know you. And definitely they should and they should follow you as well.

[00:31:22] Jon Durbin: Yeah, I'm a software engineer. I'm an AI tinkerer. I've been doing synthetic stuff since, I guess, maybe April with the Airoboros project. It's been tons of fun. Lately I've been mostly working on the Bagel models. If you're wondering where the Bagel name came from, it's from Everything Everywhere All at Once.

[00:31:37] Jon Durbin: Great movie. Yeah, so that's kind of the premise of the model: all the prompt formats, all the data sources, all the training techniques. There's NEFTune, there's DPO, yeah, just fun stuff there. As far as the leaderboard, that wasn't really my goal. If you look at the actual token count per dataset, I think the largest amount of tokens is actually probably the Cinematika dataset, which is movie scripts converted to roleplay format.

[00:32:07] Jon Durbin: So it's, it's interesting that it does so well, but really I was targeting the model for general purpose as a merge base because I know that, MergeKit is so popular now. So I was trying to come up with a base model that has a little bit of everything and every prompt format so that anyone who wants to do this, alchemy with MergeKit.

[00:32:28] Jon Durbin: Can use the Bagel series as a base. Because, if you have an Alpaca-based model and a Vicuna-based model, they're not going to merge very well; it'll have weird stray user tokens or whatever. The idea with Bagel is to be a good base.

[00:32:42] Alex Volkov: I also saw quite a lot of work you're doing on new DPO data sets. Could you talk about those?

[00:32:48] Jon Durbin: And then, yeah, I keep cranking out new DPO datasets to enhance the stuff that's lacking right now.

[00:32:54] Jon Durbin: I think even the Yi 34B might be a little bit overcooked. I used QLoRA for both the supervised fine-tuning stage and DPO. And it turns out with DPO, you really need to use an incredibly low learning rate. I was even using maybe a 50x smaller learning rate for the DPO phase than the supervised fine-tuning phase, and even then [00:33:20] I stopped the run about halfway through and killed it because the evals started spiking all over the place.

[00:33:26] Jon Durbin: Yeah, still lots of stuff to learn, and I'd love to do a full-weight fine-tune of the Yi 34B. I'm probably going to work on a Solar 10.7B version of it next, and maybe a DeepSeek 67B. I'm curious if DeepSeek's deeper network is actually going to improve things in any sort of way. But

[00:33:46] Alex Volkov: Awesome. Jon, thank you so much for joining and thank you so much for the deep dive. So I have two questions for you real quick. I did not expect you to join, so this is not a full-blown interview, but I'm very happy that I have you. First of all, you mentioned that there are two versions, DPO and non-DPO, of Bagel.

[00:34:01] Alex Volkov: And you mentioned the differences between them. You said like DPO version is more factual and truthful, but not great for RP. I wasn't sure what RP is. Roleplay?

[00:34:10] Jon Durbin: Roleplay,

[00:34:11] Alex Volkov: Yeah. And then creative writing. Could you give us like a little bit of a, of a sense of like, what's like DPO versus non DPO version? Is that just dataset based or is there something more going on behind the scenes that like makes the one model behave differently than the other?

[00:34:27] Jon Durbin: Yeah, so really all of the Bagel series, you basically have two phases of training. There's the super, regular supervised, fine tuning stage where I just, you can look at the Bagel repository. Everything is completely open source and reproducible. But in the supervised fine tuning phase it's just a ton of data sets and and then I take that fine tuned model, fine tuned model, and then I apply DPO, direct preference optimization to it.

[00:34:52] Jon Durbin: And I have quite a few DPO datasets in there, but really, the DPO landscape is sparse right now. You basically have DPO datasets from NVIDIA, the HelpSteer dataset, which is a human-annotated one where they ran a bunch of prompts against LLMs and then had humans rank them.

[00:35:14] Jon Durbin: Then there's the LMSYS-Chat-1M, where you can find the exact same prompt sent to multiple models. And so you can take the GPT-4 answers, use those as the preferred answer, and then the Vicuna 33B or something as the rejected answer, because you're assuming the GPT-4 one is better.
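
A toy sketch of the pairing Jon describes, where the stronger model's answer becomes "chosen" and the weaker model's becomes "rejected"; the record layout below is illustrative, not the actual LMSYS-Chat-1M schema.

```python
# Illustrative construction of DPO preference pairs from chat logs where the
# same prompt was answered by multiple models. Field names are made up.
raw_records = [
    {"prompt": "Explain DPO in one sentence.",
     "answers": {"gpt-4": "DPO optimizes a policy directly on preference pairs.",
                 "weaker-model": "DPO is a thing for models."}},
]

def to_dpo_pairs(records, strong="gpt-4", weak="weaker-model"):
    pairs = []
    for r in records:
        if strong in r["answers"] and weak in r["answers"]:
            pairs.append({
                "prompt": r["prompt"],
                "chosen": r["answers"][strong],    # assumed to be the better answer
                "rejected": r["answers"][weak],
            })
    return pairs

print(to_dpo_pairs(raw_records))
```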

[00:35:31] Jon Durbin: Same with, there's Orca DPO pairs. I know Argilla just did a new release of that, which is better. But we don't have a ton of DPO datasets that are specifically for creative writing tasks and stuff. I made one which is actually based on the Airoboros 2.2 compared to the Airoboros 3 series, where I actually rewrote most of the creative writing prompts with a different prompt and some other stuff.

[00:35:59] Jon Durbin: I actually used the March version of GPT-4, which is better. So in that case you get basically three to four times the number of tokens in the output. So there's that DPO dataset, which I make myself in the Bagel code. But otherwise there's really no roleplay focused data in any of the DPO datasets.

[00:36:21] Jon Durbin: So what happens is you take that supervised fine tuned model from the first phase and you apply DPO to it, and it kind of experiences forgetting of what it learned during the fine tuning, for some of the stuff like creative writing and roleplay. Same with code. So if you look at my Twitter feed, you can see that I've released a Python DPO dataset that'll hopefully fix some of that stuff.

[00:36:44] Jon Durbin: I just released another contextual question answering DPO dataset for better RAG performance after the DPO phase. And I put out, just a few minutes ago, Gutenberg DPO, which is basically: I parse maybe 14 or 15 public domain books from Project Gutenberg into chapters, create prompts to actually write those chapters, and create summaries so you have the previous chapter summary inside the prompt. Then I use that to prompt some of the local LLMs, so I used Dolphin, OpenChat, and Llama 2 13B, to get the rejected values. The outputs from these models are fine in some cases, but they're short, and you'll notice with most LLMs, when you write a story, it's always a happy ending, and it ends with something like, and they walked into the forest and lived happily ever after.

[00:37:37] Jon Durbin: It's boring and cliche. My hope with the Gutenberg stuff is that when you actually prompt it to write a chapter of a book, it's gonna be from human writing, from popular books. They're a little bit old timey, because they have to be to be public domain, but,

[00:37:52] Alex Volkov: Yeah.

[00:37:53] Jon Durbin: hopefully it will improve the writing and creativity of whatever Bagel models I do in the future. So I'm trying to improve that, but there's still a lot of stuff I need to do. I think the next thing I'll do before I actually make another Bagel model is use something like Goliath 120B to make a roleplay centric dataset for DPO. That way it doesn't completely forget how to do that as well.

[00:38:15] Alex Volkov: Awesome. And I'm just looking at the number of datasets, like you said, everything, everywhere, all at once. And this is why it's called Bagel, Everything Bagel. It's just an insane amount of datasets. I'm just gonna run through real quick: AI2 ARC, Airoboros, Belebele, Bluemoon.

[00:38:30] Alex Volkov: You have Capybara in there, Cinematika, EmoBank, Gutenberg, LMSYS Chat, like, tons of stuff. It's incredible how well the model performs. Jon, one thing that I wanted to follow up on before we move on. You mentioned something that's better for RAG as well, a DPO dataset that's better for RAG.

[00:38:45] Alex Volkov: Is that the contextual DPO that you released?

[00:38:49] Jon Durbin: Yep.

[00:38:50] Alex Volkov: What, what makes it better for, for RAG purposes? Could you, could you like maybe give two sentences about this?

[00:38:56] Jon Durbin: And this is actually something you can reproduce with the Airoboros tool as well if you wanted to generate your own data, but I have this instructor in there called Counterfactual Contextual, and what that does is it makes a bunch of fake facts, like it'll say the Battle of Midway happened in the Civil War, something like that, and it'll put that into context and then ask a question about it.

[00:39:19] Jon Durbin: And then it'll have the real version of the fact as well, World War II, Battle of Midway. The idea is that you want to train the model to always attend to the context and not try to base the answers on what it knows from the base pretraining. For example, if you're doing, I don't know, like a virtual world, you have a different planet where the sky is purple.

[00:39:41] Jon Durbin: And you ask the model, what color is the sky, based on your lore book or whatever. You want to make sure that the model always obeys your context and answers accordingly, and doesn't say the sky is blue, because I know the sky is blue. So the dataset that I put in there has a bunch of those kinds of things.

[00:39:59] Jon Durbin: You can't just put in the fake facts, because then the model will just, you know, learn to answer incorrectly. So for every fake version of the context, you have to put in a real version of the context as well. The other thing that makes it better for RAG is I actually stuff more than one piece of context into it, because with RAG, the retrieval accuracy is the hardest part, so you want to retrieve more than one document.

[00:40:23] Jon Durbin: So suppose you want to retrieve ten documents. If you want to stuff all ten of those into a single prompt and then you want to provide references to the user, you have to know which segment of the prompt it came from. This data set also includes, like, you can put metadata into the prompt for each section that you retrieve, and then when you ask for references in the output, it'll actually only reference that segment.

[00:40:47] Jon Durbin: A bunch of stuff like that. Yeah, I put in irrelevant context as well to try to confuse the model, because retrieval is very noisy. All of that kind of stuff is in there.
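
(To make this concrete, here's my own mock-up of what a single counterfactual-contextual record could look like; the BEGININPUT/BEGININSTRUCTION delimiters are my rough approximation of the Airoboros-style context format, not Jon's exact data.)

```python
# Illustrative only: the context deliberately contradicts world knowledge; the
# chosen answer trusts the provided context, the rejected one falls back on pretraining.
record = {
    "prompt": (
        "BEGININPUT\n"
        "The Battle of Midway was a decisive naval battle of the American Civil War.\n"
        "ENDINPUT\n"
        "BEGININSTRUCTION\n"
        "According to the context above, during which war did the Battle of Midway take place?\n"
        "ENDINSTRUCTION"
    ),
    "chosen": "According to the provided context, the Battle of Midway took place during the American Civil War.",
    "rejected": "The Battle of Midway took place during World War II, in June 1942.",
}
```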

[00:40:57] Alex Volkov: First of all, I think from the whole community, thank you a lot for everything that you do and your work. And I really appreciate your time here on ThursdAI. You're more than welcome to always join us. And I didn't expect you to be here when I talked about

[00:41:09] Alex Volkov: the stuff that you just released, but it's really, really awesome when people from the community who work on this stuff also come and have a chance to speak about it. So Jon, you're always welcome on ThursdAI. I would love to invite you again and talk deeper.

[00:41:20] Alex Volkov: And as you release the next stuff that you're working on, and I know you're working on a bunch of things, you're more than welcome to come here and discuss it, or even DM me before, so we'll know what to chat about. I will definitely mention the DPO datasets in the fine tuning hackathon that I'm going to this week.

[00:41:35] Alex Volkov: And so thank you for that. That was why I wanted to do a little bit of a deep dive. [00:41:40] And also I want to shout you out as one of the most active users of Weights & Biases. You posted the recap that we sent, and you have two reports there. And you're part of the top 10 percent of most active users with 2,500

[00:41:53] Alex Volkov: hours trained in '23 and like 900 plus models. So that's incredible. I just wanted to shout this out.

[00:42:02] Jon Durbin: Yeah, I'm a little addicted.

[00:42:03] Alex Volkov: Yeah, it's amazing. It's amazing. And I appreciate everything that you do, and I think the community does as well.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Mon, 15 Jan 2024 01:53:00 GMT
📅 ThursdAI Jan 11 - GPTs store, Mixtral paper, Phi is MIT + Phixtral, 🥯 by Jon Durbin owns the charts + Alex goes to SF again and 2 deep dive interviews 🎙️

Hey hey everyone, how are you this fine ThursdAI? 👋 I’m gud thanks for asking!

I’m continuing my experiment of spilling the beans, and telling you about everything we talked about in advance, both on the pod and in the newsletter, so let me know if this is the right way to go or not, for the busy ones it seems that it is. If you don’t have an hour 15, here’s a short video recap of everything we chatted about:

ThursdAI - Jan 11 2024 TL;DR

TL;DR of all topics covered + Show notes

* Open Source LLMs

* 🔥 Bagel from Jon Durbin is now top of the LLM leaderboard (X, HF, Wolfram's deep dive and scoring)

* OpenChat January Update - Best open source 7B LLM (X, Hugging Face)

* Our friends at NousResearch announce a seed round of 5.2M as their models pass 1.2 million downloads (X)

* Argilla improved (Distillabeled?) the DPO enhanced Neural Hermes with higher quality DPO pairs (X)

* New MoEs are coming out like hotcakes - PhixTral and DeepSeek MoE (X, Omar Thread, Phixtral Thread)

* Microsoft makes Phi MIT licensed 👏

* Big CO LLMs + APIs

* OpenAI adds personalization & team tiers (Teams announcement)

* OpenAI launches GPT store (Store announcement, Store link)

* Mistral Medium tops the LMSys human evaluation arena, and is the best LLM overall after GPT-4 👏 (X)

* Hardware

* Rabbit R1 is announced, $200 one-time with no subscription, everybody has a take (X)

* This weeks Buzz from Weights & Biases

* Hackathon with Together, Langchain and WandB (and ME!) this weekend in AGI house (X, Signup)

* Video

* Bytedance releases MagicVideo-V2 video gen that looks great and passes Pika labs in human tests (X)

* AI Art & Diffusion & 3D

* Luma launched their online version of Genie and it's coming to the API (X)

* Show notes and links mentioned

* MergeKit (github)

* Jon Durbins Contextual DPO dataset (HuggingFace)

* Phixtral from Maxime Lebonne (X, HuggingFace)

* WandGPT - our custom Weights & Biases GPT (GPT store)

* Visual Weather GPT by me - https://chatg.pt/artweather

* Ask OpenAI to not train on your chats - https://privacy.openai.com/policies

AI Hardware

It seems that the X conversation had a new thing this week: the AI hardware startup Rabbit showcased their new $200 device (no subscriptions!) at CES, and everyone and their mom had an opinion! We had quite a long conversation about that with (his first time on ThursdAI 👏) as we both pre-ordered one; however, there were quite a few red flags, like for example, GPUs are costly, so how can an AI device that runs its AI in the cloud cost just a one-time 200 bucks??

There were other interesting things they showed during the demo, and I’ll let you watch the full 30 minutes and if you want to read more, here’s a great deeper dive into this from .

UPDATE: As I’m writing this, the CEO of Rabbit (who’s also on the board of Teenage Engineering, the amazing company that designed this device) tweeted that they sold out the initial first AND second batches of 10K units, netting a nice $2M in hardware sales in 48 hours!

Open Source LLMs

Mixtral paper dropped (ArXiv, Morgans take)

Mistral finally published the paper on Mixtral of Experts, the MoE that's the absolute best open source model right now, and it's quite the paper. Nisten did a full paper reading with explanations on an X space, which I co-hosted, and we had almost 3K people tune in to listen. Here's the link to the live reading X space by Nisten.
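
Quick aside for anyone new to the MoE idea before we get to the notes: in a Mixtral-style layer, a tiny router picks 2 of the 8 expert FFNs for each token and mixes their outputs. Here's a toy sketch of that routing step (my illustration of the general technique, not Mistral's actual code; the shapes and names are made up):

```python
import torch
import torch.nn.functional as F

def top2_moe_layer(hidden, router, experts):
    """Toy Mixtral-style MoE layer: a small linear router scores all experts for
    every token, the top 2 are run, and their outputs are mixed using the
    softmax-renormalized router weights."""
    logits = router(hidden)                       # [tokens, n_experts]
    weights, idx = logits.topk(2, dim=-1)         # pick 2 of the 8 experts per token
    weights = F.softmax(weights, dim=-1)
    out = torch.zeros_like(hidden)
    for slot in range(2):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e
            if mask.any():
                out[mask] += weights[mask, slot:slot + 1] * expert(hidden[mask])
    return out

# Tiny usage example: 8 experts, 16-dim hidden states, 4 "tokens"
dim, n_experts = 16, 8
experts = [torch.nn.Linear(dim, dim) for _ in range(n_experts)]
router = torch.nn.Linear(dim, n_experts)
print(top2_moe_layer(torch.randn(4, dim), router, experts).shape)  # torch.Size([4, 16])
```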

And here are some notes courtesy of Morgan McGuire (who's my boss at WandB btw 🙌)

Strong retrieval across the entire context window

Mixtral achieves a 100% retrieval accuracy regardless of the context length or the position of passkey in the sequence.

Experts don't seem to activate based on topic

Surprisingly, we do not observe obvious patterns in the assignment of experts based on the topic. For instance, at all layers, the distribution of expert assignment is very similar for ArXiv papers (written in Latex), for biology (PubMed Abstracts), and for Philosophy (PhilPapers) documents.

However...

The selection of experts appears to be more aligned with the syntax rather than the domain

Datasets - No info was provided to which datasets Mixtral used to pretrain their incredible models 😭

Upsampled multilingual data

Compared to Mistral 7B, we significantly upsample the proportion of multilingual data during pretraining. The extra capacity allows Mixtral to perform well on multilingual benchmarks while maintaining a high accuracy in English

Mixtral Instruct Training

We train Mixtral – Instruct using supervised fine-tuning (SFT) on an instruction dataset followed by Direct Preference Optimization (DPO) on a paired feedback dataset. It was trained on @CoreWeave

Jon Durbin's Bagel is the 🤴 of open source this week

6 of the top 10 are Bagel-based models or merges of it. If you remember Airoboros, Bagel includes that dataset, and there are two varieties there, the DPO and the non-DPO versions of Bagel, including two merges from Cloudyu, which are non-trained merges made with mergekit, based on Bagel. Jon's pro tip for selecting DPO vs non-DPO models is:

FYI, the DPO version is more factual, truthful, better at math, etc., but is not great for RP, creative writing, etc. Use non-DPO for those tasks!

Bagel includes an impressive number of datasets mixed together, which are all linked from the model card, but here they are:

"ai2_arc, airoboros, apps, belebele, bluemoon, boolq, capybara, cinematika, drop, emobank, gutenberg, lmsys_chat_1m, mathinstruct, mmlu, natural_instructions, openbookqa, pippa, piqa, python_alpaca, rosetta_code, slimorca, spider, squad_v2, synthia, winogrande, airoboros 3.1 vs airoboros 2.2.1, helpsteer, orca_dpo_pairs"

Jon also shared his end of the year WandB report and has trained a whopping 917 models this year for a total of ~2,500 hours, putting him in the top 10% of most active users (among 800K or so users).

I didn't know that Jon was going to join, but I was so happy he joined the live recording that we ended up chatting for 20 minutes, and there were so many nuggets in that conversation, about how to prepare DPO datasets, which other ones Jon has been releasing, and just a bunch more gold, that I decided to CUT it out and post it as a separate special deep dive episode that's going to get released as the Sunday special. Stay tuned for that!

Nous Research announces $5.2 million funding seed round as they cross 1.1 million model downloads on the hub

Congrats to Karan, Emozilla, Teknium, Bowen, Shivani and the rest of the Nous team on this great news! 👏 We expect to hear more from them in the coming year, with a consistent commitment to open source, continuing to open source the best models, and the upcoming Forge news!

With investors like Balaji, OSS Capital, and Vipul from Together, Nous completes the $5.2M seed round, and we had Karan (one of the co-founders of Nous) on the pod to chat to us about what they are planning to do with that money and their continued commitment to open source!

In addition, they just recently passed 1.1 million downloads on the hub with Nous-Hermes-2-34B being their best model! 🤴

OpenChat Jan update becomes the leading open source 7B model (X, Hugging Face)

This update mainly enhanced training methodology, in-context learning & coding skills, outperforming the last 1210 release on 7 out of 8 benchmarks! It scores 71.3% on HumanEval and 65.8% on MMLU 👏

The previous version of OpenChat trails just behind OpenHermes on the human evals on Lmsys arena, but both are incredible 7B models.

Argilla

- Argilla used their Distilabel tool to build a preference dataset from ratings and critiques of AI response pairs, taking around 3 hours

- The original dataset assumed the GPT-4/3.5 responses were always best, but Argilla found this was not always the case

- Their dataset confirmed ~4,000 pairs had the same rating, 7,000 pairs were unchanged, and ~2,000 times the rejected response was preferred

- Improving existing DPO datasets with higher quality pairs is important for model fine-tuning

- They are releasing an improved version of the popular Orca Pairs DPO dataset from Intel, and a new OpenHermes model outperforming baselines with 54% fewer DPO pairs

Big CO LLMs + APIs

OpenAI has a big week, launches GPTs store and team pro accounts (Blog)

Things of note about the store:

* My GPTs are getting feedback and crossed 10K chats; one was #6 in Lifestyle and then disappeared, but it has gained 2x more chats in the 24 hours since the store launched!

* Discoverability is great, trending GPTs are shown clearly, and folks are getting a lot of exposure

* Copycats already started copying a bunch of the great GPTs, see this example of what happens when you search for Gymstreak, most of the top GPTs are already being copy-catted.

Team accounts:

$25/mo per user on annual plans, with a minimum of 2 users

The biggest confusion was from folks who didn't understand that OpenAI trains on Pro conversations, and there's an option to Opt-out!

This weeks Buzz (What I learned with WandB this week)

Weights & Biases (and ME!) are going to AGI House to lead a RAG vs Finetune hackathon with cool prizes!

There's still time to RSVP, with incredible guest speakers. This hackathon is organized together with LangChain, TogetherCompute and AGI House. If you're in the SF area and you wanna hack on some cool RAG things and get awesome prizes (and meet me!), join the waitlist here: https://partiful.com/e/AlntdLtxh9Jh1J6Pcsma

Vision & Video

Luma released Genie on web and iOS. If you remember, we covered the Genie text-to-3D model they first released on Discord a while ago; now it's incorporated into the Luma website and produces significantly higher quality 3D assets.

The generations are free for now, and they look awesome! Here are some of mine, I created a Bee holding a Wand (get it? WandB? 😆) and a polish bear (internal joke) and they look so cool!

Friend of the pod and recent Luma hire Arthur Islamov jumped on and told us that this is coming to the API, so developers will be able to automate asset creation, generate tons of 3D objects programmatically, and maybe use cool prompt techniques to make them a bit better every time? Great news!

AI Art & Diffusion

Bytedance announces MagicVideo-V2 (Arxiv, Project)

We didn't get anything besides quite a few cherry picked videos and a paper, so we can't use this yet, but wow some of these videos look incredible!

MagicVideo-V2 that integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architecture designs, MagicVideo-V2 can generate an aesthetically pleasing, high-resolution video with remarkable fidelity and smoothness. It demonstrates superior performance over leading Text-to-Video systems such as Runway, Pika 1.0, Morph, Moon Valley and Stable Video Diffusion model via user evaluation at large scale

Lastly, I had the greatest time interviewing my new friend João Moura, the creator of CrewAI, which has been popping off, was #1 trending on GitHub and #2 product of the day on Product Hunt, and is essentially an AI framework that lets you create a crew of AI agents to do tasks for you. I will be polishing up that conversation and posting it together with the deep dive with Jon, so stay tuned, but here’s a sneak preview of how cool this is, and expect that episode to drop soon!



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 12 Jan 2024 00:59:08 GMT
📅 ThursdAI Jan 4 - New WizardCoder, Hermes2 on SOLAR, Embedding King? from Microsoft, Alibaba upgrades vision model & more AI news

Here’s a TL;DR and show notes links

* Open Source LLMs

* New WizardCoder 33B V1.1 - 79% on HumanEval (X, HF)

* Tekniums Hermes 2 on SOLAR 10.7B (X, HF)

* Microsoft - E5 SOTA text embeddings w/ Mistral (X, HF, Paper, Yams Thread)

* Big CO LLMs + APIs

* Samsung is about to announce some AI stuff

* OpenAI GPT store to come next week

* Perplexity announces a $73.6M Series B round

* Vision

* Alibaba - QWEN-VL PLUS was updated to 14B (X, Demo)

* OSU SeeAct - GPT4V as a generalist web agent if grounded (X, Paper)

* Voice & Audio

* Nvidia + Suno release NeMo Parakeet, which beats Whisper on English ASR (X, HF, DEMO)

* Tools & Agents

* Stanford - Mobile ALOHA bot - Open source cooking robot (Website, X thread)

Open Source LLMs

WizardCoder 33B reaches a whopping 79% on HumanEval pass@1

State of the art open source LLM coding is here: a whopping 79% on HumanEval, with Wizard finetuning DeepSeek Coder to get the best open source coder, edging closer to GPT-4 and passing Gemini Pro and GPT-3.5 👏 (at least on some benchmarks)
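
For reference, "pass@1" means the model gets a single sample per HumanEval problem and that sample has to pass the unit tests. The usual unbiased estimator (computed from n samples per problem, of which c pass) looks like this:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples generated per problem, c of them
    passed the tests, and we ask for the chance at least one of k draws passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 20 samples for one problem, 16 passing -> pass@1 estimate of 0.8
print(pass_at_k(n=20, c=16, k=1))
```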

Teknium releases a Hermes on top of SOLAR 10.7B

I downloaded it with LM Studio and have been running it; it's very capable. Right now SOLAR models are still on top of the Hugging Face leaderboard, and Hermes 2 now comes in 7B (Mistral), 10.7B (SOLAR) and 34B (Yi) sizes.

On the podcast I told a story of how this week I actually used the 34B version of Capybara for a task that GPT kept refusing to help me with. It was honestly kind of strange; a simple request to translate kept failing with an ominous “network error”.

This only highlighted how important the local AI movement is, and now I've actually had the experience myself of a local model coming through when a hosted, capable one didn't.

Microsoft releases a new SOTA text embeddings model, E5, finetuned on synthetic data on top of Mistral 7B

We present a new, easy way to create high-quality text embeddings. Our method uses synthetic data and requires less than 1,000 training steps, without the need for complex training stages or large, manually collected datasets. By using advanced language models to generate synthetic data in almost 100 languages, we train open-source models with a standard technique. Our experiments show that our method performs well on tough benchmarks using only synthetic data, and it achieves even better results when we mix synthetic and real data.

We had the great pleasure of having Bo Wang again (one of the authors of the previously SOTA Jina embeddings and a previous podcast guest) to do a deep dive into embeddings and specifically E5 with its decoder-only architecture. While the approach the Microsoft researchers took here is interesting, and despite E5 claiming a top spot on the MTEB leaderboard (pictured above), this model doesn't seem to be super practical for most purposes folks use embeddings for right now (RAG), for the following reasons (see the usage sketch after the list):

* Context length limitation of 32k, with a recommendation not to exceed 4096 tokens.

* Requires a one-sentence instruction for queries, adding complexity for certain use cases like RAG.

* Model size is large (14GB), leading to higher costs for production use.

* Alternative models like bge-large-en-v1.5 are significantly smaller (1.35GB).

* Embedding size is 4096 dimensions, increasing the cost for vector storage.
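
Here's the usage sketch I mentioned, mostly to show why the bullet points above matter in practice. The "Instruct: ...\nQuery: ..." prefix and last-token pooling follow my reading of the model card conventions, so treat the exact formatting as an assumption and double check before copying (in practice you'd load in fp16/bf16 on a GPU; the 14GB figure above is roughly the fp16 footprint):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "intfloat/e5-mistral-7b-instruct"
tok = AutoTokenizer.from_pretrained(name)
tok.pad_token = tok.pad_token or tok.eos_token
tok.padding_side = "right"                                  # so "last token" indexing below is valid
model = AutoModel.from_pretrained(name)

def embed(texts: list[str]) -> torch.Tensor:
    batch = tok(texts, padding=True, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # [batch, seq, 4096]
    last = batch["attention_mask"].sum(dim=1) - 1           # index of the last real token
    emb = hidden[torch.arange(hidden.size(0)), last]        # decoder-only => pool the last token
    return torch.nn.functional.normalize(emb, dim=-1)

# Queries carry a one-sentence task instruction; documents are embedded as-is.
query = "Instruct: Given a web search query, retrieve relevant passages\nQuery: what did Mistral release?"
doc = "Mistral released Mixtral, a sparse mixture-of-experts model with 8x7B experts."
print((embed([query]) @ embed([doc]).T).item())             # cosine similarity
```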

Big CO LLMs + APIs

OpenAI announces that the GPT store is coming next week!

I can't wait to put up the visual weather GPT I created and see how the store promotes it, and if I get some revenue share like OpenAI promised during Dev Day. My daughter and I are frequent users of Alice the kid painter as well, a custom GPT that my daughter named Alice, which knows it's speaking to kids over voice and generates coloring pages. We'll see how much this store lives up to the promises.

This weeks Buzz (What I learned with WandB this week)

This week was a short one for me, so not a LOT of learnings but I did start this course from W&B, called Training and Fine-tuning Large Language Models (LLMs).

It features great speakers like Mark Saroufim from Meta, Jonathan Frankle from Mosaic, and Weiwei Yang from Microsoft, along with W&B MLEs (and my teammates) Darek Kleczek and Ayush Thakur, and covers the end-to-end of training and fine-tuning LLMs!

The course is available HERE, it's around 4 hours, and it's well worth your time if you want to get a little more knowledge about the type of stuff we report on ThursdAI.

Vision

SeeAct - GPT4V as a generalist web agent if grounded (X, Paper)

In June, OSU NLP released Mind2Web, a dataset for developing and evaluating web-acting agents, LLMs that click buttons and perform tasks, with 2,350 tasks from over 130 websites: stuff like booking flights, finding folks on Twitter, finding movies on Netflix, etc.

GPT4 without vision was terrible at this (just by reading the website html/text) and succeeded at around 2%.

With new vision LMMs, websites are a perfect place to start, because the visual (how the website is rendered) is now paired with HTML (the grounding), and SeeAct uses GPT-4V to do exactly this.

SeeAct is a generalist web agent built on LMMs like GPT-4V. Specifically, given a task on any website (e.g., “Compare iPhone 15 Pro Max with iPhone 13 Pro Max” on the Apple homepage), the agent first performs action generation to produce a textual description of the action at each step towards completing the task (e.g., “Navigate to the iPhone category”), and then performs action grounding to identify the corresponding HTML element (e.g., “[button] iPhone”) and operation (e.g., CLICK, TYPE, or SELECT) on the webpage.

SeeAct achieves a 50% score on the Mind2Web evaluation task!
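
To give a feel for the shape of that two-stage loop, here's my own rough sketch (not the SeeAct codebase): stage one asks GPT-4V to describe the next action in plain text, stage two asks it to ground that description in one of the candidate HTML elements. The `candidate_elements` input and the prompts are made up for illustration.

```python
from openai import OpenAI

client = OpenAI()

def gpt4v(prompt: str, screenshot_url: str) -> str:
    """One GPT-4V call with a screenshot plus a text prompt."""
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": screenshot_url}},
            ],
        }],
        max_tokens=300,
    )
    return resp.choices[0].message.content

def next_step(task: str, screenshot_url: str, candidate_elements: list[str]) -> tuple[str, str]:
    # Stage 1: action generation - describe the next action in plain text.
    action = gpt4v(f"Task: {task}\nDescribe the single next action to take on this page.", screenshot_url)
    # Stage 2: action grounding - map that description onto one of the HTML candidates.
    numbered = "\n".join(f"{i}: {el}" for i, el in enumerate(candidate_elements))
    choice = gpt4v(
        f"Action: {action}\nCandidate elements:\n{numbered}\n"
        "Answer with the number of the element to interact with and the operation (CLICK/TYPE/SELECT).",
        screenshot_url,
    )
    return action, choice
```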

QWEN-VL was updated to PLUS (14B) and it's pretty good compared to GPT4V

Capabilities include: image captioning, visual question answering, visual grounding, OCR, visual reasoning. We had a chat with Junyang Lin, the tech lead for Qwen with Alibaba on the pod, and he mentioned specifically that they noticed that adding a larger "brain" (as in, LLM) to vision models, significantly increases the performance and vision understanding of the LMMs.

While this model is not yet released, you can demo it here, and Junyang told us that it is coming to a release, like the previous QWEN models did before.

I noticed the advanced OCR capabilities and understanding, this example was really spot on. Notice the "logo for Browser company" , the model understood that this text was in fact a logotype! (which even GPT4V failed at in my test)

Voice

Parakeet from NVIDIA beats Whisper on English with a tiny model (blog)

Brought to you by @NVIDIAAI and @suno_ai_, parakeet beats Whisper and regains its first place. The models are released under a commercially permissive license! The models inherit the same FastConformer architecture and come in 2 flavors: 1. RNNT (1.1B & 0.6B) 2. CTC (1.1B & 0.5B) Each model is trained on 65K hours of English data (40K private proprietary data by Suno & NeMo teams) over several hundred epochs. Key features of the parakeet model: 1. It doesn't hallucinate (if the audio sample has silence, the output is silent). 2. It is quite robust to noisy audio (if the audio sample has non-vocal sounds, it outputs silence).

We had the great pleasure of having VB from the audio team at HuggingFace, and he went in depth into the ways in which Parakeet is better than Whisper (higher quality transcriptions while also being much, much faster); it was trained on only 65K hours vs a few million for Whisper. We also covered that, because of this different architecture, Parakeet is not able to receive any guidance for words that are hard for it to understand. For example, with Whisper, I often provide "ThursdAI" in the initial_prompt parameter to help guide Whisper on what it should say.
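
For reference, this is the Whisper-side knob I'm talking about, using the open source openai-whisper package (the file name and prompt text are just examples):

```python
import whisper

model = whisper.load_model("medium.en")

# Bias the decoder with an initial prompt so domain words like "ThursdAI"
# are transcribed correctly instead of "Thursday I".
result = model.transcribe(
    "episode.mp3",
    initial_prompt="ThursdAI, a weekly podcast about LLMs, Mixtral, and Weights & Biases.",
)
print(result["text"][:200])
```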

Regardless, having a model that's super fast, can beat Whisper, and is commercially licensed to build on top of is incredible! Here's a demo for you to try it out, and it's available with the NVIDIA NeMo framework.

Coqui shuts down :(

We've had Josh from Coqui on our pod before, when they released XTTS, and they have been friends ever since. It's sad to see Coqui shut down, and we want to wish all the team an easy and great transition 👏 You guys did a great job and we're rooting for each and every one of you.

* Coqui is closing down.

* The team is praised for being small yet impactful, competing with big tech despite limited resources.

* Coqui began as the Machine Learning Group at Mozilla, creating DeepSpeech, Common Voice, and TTS.

* Spun out as Coqui in 2021 to accelerate their mission.

* Major achievement: XTTS, with openly released model weights for versions 1 and 2.

* 2021: Coqui STT v1.0 released, Coqui Model Zoo and SC-GlowTTS launched.

* 2022: YourTTS became viral, numerous open-source releases, team expansion.

* 2023: Coqui Studio webapp and API launched, XTTS open release, first customers acquired.

* Acknowledgment of the community, investors, customers, and partners for their support.

* Partners include HuggingFace, Mozilla, Masakhane, Harvard, Indiana University, Google, MLCommons, Landing AI, NVIDIA, Intel, and Makerere University.

* Future of generative AI in 2024 predicted to grow, with open-source playing a significant role.

* Coqui TTS remains available on Github for further innovation.

Tools

Stanford Mobile ALOHA bot open sources, shows cooking

Back in March, Stanford folks introduced ALOHA (A Low-cost Open-source Hardware system for Bimanual Teleoperation).

Basically a 4-arm robot that a human operator can teleoperate to do tasks requiring fine motor skills, like breaking an egg or tying a ziptie. Well now, just 10 months later, they are introducing the mobile version: mounted ALOHA gear that a human uses to perform tasks like cooking or calling the elevator, and that is able to learn from those actions and then perform them itself. The operating gear can be easily detached for self-operation, and it's mobile, so the compute and battery pack are on the wheeled base.

Recently Meta released a huge dataset of first-person operations called Ego-Exo4D, which combines first-person and third-person perspectives for a big variety of tasks, such as cooking, cleaning, sports, healthcare and rock climbing, and this open hardware from Stanford is an additional example of how fast robotics is advancing into the physical world.

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

And just like that, the first ThursdAI of the year is done! 🫡 Thank you for being a subscriber, see you next week 👏



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 05 Jan 2024 00:48:41 GMT
📅 ThursdAI - Dec 28 - a BUNCH of new multimodal OSS, OpenAI getting sued by NYT, and our next year predictions

Hey hey hey (no longer ho ho ho 🎄) hope you had a great Christmas! And you know that many AI folks have dropped tons of OpenSource AI goodies for Christmas, here’s quite a list of new things, including at least 3 new multi-modal models, a dataset and a paper/technical report from the current top model on HF leaderboard from Upstage.

We also had the pleasure of interviewing the folks who released the Robin suite of multi-modal models and aligned them to “good responses”; that full interview is coming to ThursdAI soon, so stay tuned.

And we had a full 40 minutes with an open stage to get predictions for 2024 in the world of AI, which we fully intend to cover next year, so scroll all the way down to see ours, and reply/comment with yours!

TL;DR of all topics covered:

* Open Source LLMs

* Uform - tiny(1B) multimodal embeddings and models that can run on device (HF, Blog, Github, Demo)

* Notux 8x7B - one of the first Mixtral DPO fine-tunes - (Thread, Demo)

* Upstage SOLAR 10.7B technical report (arXiv, X discussion, followup)

* Capybara dataset open sourced by LDJ (Thread, HF)

* Nous Hermes 34B (finetunes Yi34B) - (Thread, HF)

* Open Source long context pressure test analysis (Reddit)

* Robin - a suite of multi-modal (Vision-Language) models - (Thread, Blogpost, HF)

* Big CO LLMs + APIs

* Apple open sources ML-Ferret multi-modal model with referring and grounding capabilities (Github, Weights, Paper)

* OpenAI & Microsoft are getting sued by NewYorkTimes for copyright infringement during training (Full Suit)

* AI Art & Diffusion & 3D

* Midjourney v6 alpha is really good at recreating scenes from movies (thread)

Open Source LLMs

Open source doesn't stop even during the holiday break! Maybe this is the time to catch up to the big companies? During the holiday periods?

This week we got a new 34B Nous Hermes model, the first DPO fine-tune of Mixtral, and the Capybara dataset, but by far the biggest news of this week was in multimodality. Apple quietly open sourced ml-ferret, a multimodal grounding model able to compete with even GPT4-V sometimes, Uform released tiny multi-modal and embedding versions for on-device inference, and the AGI collective gave NousHermes 2.5 eyes 👀

There's no doubt that '24 is going to be the year of multimodality, and this week we saw an early start of that right on ThursdAI.

Ml-Ferret from Apple (Github, Weights, Paper)

Apple has been in the open source news lately, as we've covered their MLX release previously and the LLM in a Flash paper that discusses inference on low-resource hardware, and Apple folks had one more gift to give. ML-Ferret is a multimodal grounding model, based on Vicuna (for some... reason?), which is able to take referrals from images (think highlighted or annotated areas) and then ground the responses with exact coordinates and boxes.

The interesting thing about the referring is that it can be any shape: a bounding box or even an irregular shape (like the ferret in the above example or the cat tail below).

Ferret was trained on a large new dataset called GRIT, containing over 1 million examples of referring to and describing image regions (which AFAIK hasn't been open sourced yet).

According to Ariel Lee (our panelist) these weights are only delta weights and need to be combined with Vicuna weights to be able to run the full Ferret model properly.

Uform - tiny (1.5B) MLLMs + vision embeddings (HF, Blog, Github, Demo)

The folks at Unum have released a few gifts for us, with an apache 2.0 license 👏 Specifically they released 3 vision embeddings models, and 2 generative models.

Per the documentation, the embeddings can yield 2-3x speedups to search compared to CLIP-like models, and 2-4x inference speed improvements given the tiny size. The embeddings have a multilingual version as well, supporting well over 20 languages.

The generative models can be used for image captioning, and since they are tiny, they are focused on running on device and are already converted to ONNX and Core ML formats. See the results below compared to LLaVA and InstructBLIP, both in the 7B range.

I've tried a few images of my own (you can try the demo here), and while there were hallucinations, this tiny model showed a surprising amount of understanding for its size!

Also shoutout to Ash

Robin suite of multimodal models (Thread, Blogpost, HF)

The folks at the CERC-AAI lab at Mila (Quebec) have released a suite of multi-modal models; they finetuned and released a fork of NousHermes 2.5 that can understand images, building on top of CLIP and SigLIP as the image encoder.

In fact, we did a full interview with Irina, Kshitij, Alexis and George from the AGI collective; that full interview will be released on ThursdAI soon, so stay tuned. They had a LOT of knowledge to share, from fine-tuning the CLIP model itself for better results, to evaluation of multimodal models, to dataset curation/evaluation issues, and tips from Irina on how to get a government supercomputer compute grant 😈

Big CO LLMs + APIs

OpenAI is being sued by NYT for copyright infringement during training (Lawsuit)

The New York Times is suing OpenAI and Microsoft for copyright infringement, seeking damages (amount unclear) and the removal of NYT data from OpenAI models. The full lawsuit is a worthwhile read, and it includes a whopping 100 pages of examples of GPT-4 completing NYT articles verbatim. I personally wasn't able to reproduce this behavior in the chatGPT app, but some folks on X suggested that it's possible in the OpenAI playground, with the right prompt and an NYT URL in the prompt.

This lawsuit came after a round of attempted negotiations between NYT and OpenAI, which apparently failed, and it's worth noting a few things. First, OpenAI (like almost every other AI company) has a "copyright shield" feature, where they protect the users of these services from getting sued for copyright violations, so there is no direct exposure for customers of OpenAI. Another thing of note: the NYT content was not compiled by OpenAI directly; rather, OpenAI (and almost every other LLM builder) used the CommonCrawl dataset (among others), which did the crawling and collection of text itself.

Per the CommonCrawl license, OpenAI should have reached out to each individual URL in that dataset and worked out the copyright on their own, which is a bit difficult to do, as CommonCrawl includes 3-5 billion pages collected each month.

Regardless of the claims, the hottest takes I saw in regards to this are, that by the time anything moves with this lawsuit, we will be on GPT-6 or so and it won't matter by then, or that OpenAI will have to retrain a model without NYT data, which I find quite ludicrous personally and very unlikely to happen.

If this lawsuit actually sets a precedent, this will IMO be a very bad one for the US, considering other countries like Japan are already getting ahead of this, declaring all scraped data as fair use if used for training (source).

Of course, all of X became IP experts overnight, and the debates are very interesting: some are confusing technical terms, some are claiming that OpenAI will just cave and pay NYT, while some super-libertarian ones take it all the way down to whether AI has human rights, and if it does, then preventing it from learning from copyrighted material is like preventing people from reading Hemingway.

This weeks buzz (What I learned in WandB this week)

This week, we sent out our annual emails of wrapped cards for everyone who used Weights & Biases to train models this year. This is a yearly tradition, similar to Spotify Wrapped but for ML, and this year the cards were generated with Stable Diffusion XL, creating hundreds of thousands of images based on autogenerated model run names!

The other interesting thing I noticed is just how many folks shared their stats screenshots right from the email we sent, including not only how many hours they spent training models this year, but also how many other features they used, like Reports and Sweeps. And I noticed just how many folks don't use Reports, which is a shame, as it's such a cool feature! WandB literally has a built-in blogging platform for all your ML needs, and it includes live widgets for every metric you're tracking in your runs; it's really great.
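
If you haven't tried Reports: anything you log in a run shows up there as a live chart you can embed and write around. A minimal logging sketch (standard wandb usage; the project name and metrics are made up):

```python
import wandb

# Everything logged here becomes a live, embeddable chart in W&B Reports.
run = wandb.init(project="thursdai-demo", config={"lr": 3e-4, "epochs": 3})

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)          # stand-in for your real training loop
    wandb.log({"epoch": epoch, "train/loss": train_loss})

run.finish()
```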

AI Art & Diffusion

Midjourney v6 is incredible at recreating actual movie stills and scenes (Thread)

Another potential lawsuit waiting to happen? We already saw lawsuits against Stability AI for supposed copyright infringement, and Stability did a lot of work to exclude proprietary art from their training datasets; however, the new, incredible version of Midjourney shows just... mind-blowing accuracy in recreating scenes from movies and cartoon styles. Just look at some of these examples (collected by some folks on X).

This + the above lawsuit news coming for OpenAI & Microsoft from the New York Times is setting up '24 to be the year where copyright law and AI finally meet for real. And we'll keep reporting on the outcomes.

Predictions for '24

In the last 20 minutes of the pod recording we opened up the floor to folks giving us their predictions for AI developments in the year 2024, and I also asked this question on X itself. The idea was, to come back next year during our yearly summary and see which predictions we hit, and which predictions we were not even remotely thinking about!

Here's a list of predictions with their category (thanks to AI for helping me sort these from different sources and the transcription):

* Open Source LMs

* 1GB models with Mixtral performance levels - Nisten

* Continual pretraining and building on top of each other's work - Irina Rish

* Smaller models trained on more data - Irina Rish

* Consolidation and standardization of models - Irina Rish

* Agents running on 7B models with capabilities like web search and code interpretation - Shroominic

* End of dominance of transformer architecture - Far El

* Marriage of reinforcement learning and language models - Far El

* New benchmarking standards - Far El

* Plug and play weights for expertise - Umesh

* Self-improving pipeline framework - Umesh

* Big Companies/APIs

* Mistral to become a major player, surpassing companies like Anthropic - Alex Volkov

* Apple AI device with multimodal capabilities - Umesh

* Google Gemini Pro commoditizing APIs - Umesh

* Model that can ace undergrad computer science curriculum - George Adams

* Extremely good but expensive model (~$1 per response) - Shroominic

* Apple spatial computing + AI product innovation - John Doe

* Real-time multilingual translation app/device - Umesh

* Vision/Video

* AI-generated full length feature film - Umesh

* Artist AI model galleries for art generation - Umesh

* Real-time video understanding and multimodal models - Alex Volkov

* Public release of high quality, fast voice cloning tech - Alex Volkov

* 3D model/animation generation for video games - tobi

* Meta will outperform most companies in video AI and mixed reality - Alex Volkov

* Other

* Localized national AI models - Ravi

* Rise in use of deepfakes - Ravi

* Surge in metadata embedding for ownership identification - R.AI.S.E

* Advances in AI for biology/healthcare - Ravi, Ash Vardanian

* A model capable of completing an undergrad CS curriculum at an A level by the end of the year - George Adams

* AI device, fully capable of multimodal capabilities, from Apple - Educated Guess

* Development in domain-specific LMs for bio applications, especially in synthetic biology - Ravi

* Twitter Prediction

* CodeInterpreterAPI V2 - Shroominic

* Gemini will NOT outperform ChatGPT - Alex Northstar

* Tech slowdown in mass adoption, human creativity as bottleneck - “charles harben”

* Biology and Robots - Sinan

* Code LLMs near junior developer productivity - Karthik Kannan

* Tokenizers will work - Geronimo

* LLM curve plateaus, focus on refining and multimodal, OpenAI settles with NYT - hokiepoke

* Fully generated, rigged, voiced game characters, minimal human intervention - Rudzinski Maciej

* AI affects politics - 𝕄𝕏𝕊ℍℝ🤖

* Audio reaches DallE3 level, video and 3D advancements, new cool modality - Darth thromBOOzyt

* Synthetic data will be huge - Leo Tronchon

Ok now that our predictions are here, we'll come back here next year and see who hit what predictions!

If you have predictions of your own, please reply to this email/Substack and post them here as well, so we'll have a record 🫡

With that, I want to wish you a happy new year, and as always, see you here next week 👏



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 29 Dec 2023 00:41:19 GMT
🎄ThursdAI - LAION down, OpenChat beats GPT3.5, Apple is showing where it's going, Midjourney v6 is here & Suno can make music!

Hey everyone, happy ThursdAI!

As always, here's a list of things we covered this week, including show notes and links, to prepare you for the holidays.

TL;DR of all topics covered:

* Open Source AI

* OpenChat-3.5-1210 - a top performing open source 7B model from OpenChat team beating GPT3.5 and Grok (link, HF, Demo)

* LAION 5B dataset taken down due to CSAM allegations from Stanford (link, full report pdf)

* FLASK - New evaluation framework from KAIST - based on skillset (link)

* Shows a larger difference between open/closed source

* Open leaderboard reliability issues, vibes benchmarks and more

* HF releases a bunch of MLX ready models (LLama, Phi, Mistral, Mixtral) (link)

* New transformer alternative architectures - Hyena & Mamba are heating up (link)

* Big CO LLMs + APIs

* Apple - LLM in a flash paper is making rounds (AK, Takeaways thread)

* Anthropic adheres to the messages API format (X)

* Microsoft Copilot finally has plugins (X)

* Voice & Audio

* AI Music generation Suno is now part of Microsoft Copilot plugins and creates long beautiful songs (link)

* AI Art & Diffusion

* Midjourney v6 is out - better text, great at following instructions (link)

Open Source AI

We start today with a topic I didn't expect to be covering: the LAION 5B dataset was taken down after a report from the Stanford Internet Observatory found instances of CSAM (Child Sexual Abuse Material) in the vast dataset. The report identified hundreds to thousands of instances of images of this sort, and used something called PhotoDNA by Microsoft to identify the images by their hashes, using a sample of NSFW-marked images.

LAION 5B was used to train Stable Diffusion: 1.4 and 1.5 were trained on a lot of images from that dataset; however, SD2 for example was only trained on images not marked as NSFW. The report is very thorough, going through the methodology used to find and check those types of images. It's worth noting that LAION 5B itself is not an image dataset, as it only contains links to images and their descriptions from alt tags.

Obviously this is a very touchy topic, given the way this dataset was scraped from the web and how many image models were trained on it. The report doesn't allege anything close to influence on the models trained on it, and outlines a few methods of preventing issues like this in the future. One unfortunate outcome of such a discovery is that this type of work can only be done on open datasets like LAION 5B, while closed source datasets don't get nearly this level of scrutiny, and this can slow down the advancement of open source multi-modal models while closed source models will continue having these issues and still prevail.

The report alleges they found and validated between hundreds and a few thousand instances of verified CSAM imagery, which, considering the size of the dataset, is infinitesimally small; however, it still shouldn't exist at all, and better techniques to clean those scraped datasets should exist. The dataset has been taken down for now from HuggingFace and other places.

New version of a 7B model that beats chatGPT from OpenChat collective (link, HF, Demo)

Friend of the pod Alpay Aryak and team released an update to one of the best 7B models: OpenChat 7B (1210) is a new version of one of the top models in the 7B world, with competitive scores compared to chatGPT 3.5 and Grok and very high benchmark results (63.4% on HumanEval compared to GPT-3.5's 64%).

Scrutiny of open source benchmarks and leaderboards being gamed

We've covered state of the art models on ThursdAI, and every time we did, we covered the benchmarks and evaluation scores, whether that's the popular MMLU (Massive Multitask Language Understanding) or HumanEval (Python coding questions), and almost always we've referred to the HuggingFace Open LLM leaderboard for the latest and greatest models. This week, there's a long thread on the Hugging Face forums, which HF eventually had to shut down, alleging that a new contender for the top used something called UNA, without revealing methods, to beat the benchmarks; folks are suggesting that it must be a gaming of the system, as a model that's trained on the benchmarks can easily top the charts.

This adds to recent observations from friend of the pod Bo Wang from Jina AI that the BGE folks have stopped focusing on the MTEB (Massive Text Embedding Benchmark) leaderboard as well, as those benchmarks also seem to be gamed (link).

This kicked off a storm of discussion about different benchmarks and evaluations, our ability to score and check whether or not we're advancing, and the openness of these benchmarks. Even Andrej Karpathy chimed in that the only way to know is to read the r/LocalLlama comment section (i.e., vibes-based eval) and check the Elo score on the LMSys chatbot arena, which pits 2 random LLMs against each other behind the scenes and lets users choose the best answer.
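
For those wondering what the Elo score means mechanically: every human vote is a pairwise result, and both models' ratings get nudged with the textbook Elo update below (the arena's exact K-factor and tie handling are their own details; this is just the standard formula):

```python
def elo_update(r_a: float, r_b: float, outcome: str, k: float = 32.0) -> tuple[float, float]:
    """One pairwise Elo update; outcome is 'a', 'b', or 'tie' from the human vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[outcome]
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Two models start at 1000; model A wins one matchup.
print(elo_update(1000, 1000, "a"))   # -> (1016.0, 984.0)
```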

LMSys also has a leaderboard; that one only includes models they have explicitly added to their arena, and it merges 3 different scores: the Elo score from human raters, the MT-Bench score, and MMLU.

This is the latest benchmark, showing that Mixtral is the highest ranking open source model at this point, and that a few other Apache 2.0 models like OpenChat (the previous version, the one from today should score even higher) and OpenHermes are inching closer as well and have honorable mentions given their license and size!

However, given the latest HuggingFace releases around model lineage, where you can trace finetunes back to the models they were fine-tuned on, the leaderboards are still a good place to check; just remember that self-evaluation and running models on your own tasks is always a good idea! Additional benchmarks are also a good idea, like the one proposed by KAIST this week called FLASK, which shows quite a significant distance between closed source models and open source ones based on several skills.

This weeks Buzz (What I learned this week in Weights & Biases)

This week we kicked off a build week internally, which unfortunately I wasn’t able to be a super active participant in, due to lying on my couch with a fever for most of the week. But regardless, I noticed how important it is to have these build weeks/hack weeks from time to time to actually use some of the new techniques we often talk about, like chain-of-density prompting or agent fine-tunes. I also got paired with my colleague Anish on our project, and while we worked on our project (to be revealed later) he gave a kick ass webinar on the famous deeplearning.ai platform on the topic of enhancing performance for LLM agents in automation that more than 5K folks tuned into! Anish is a wealth of knowledge, so check it out if this topic interests you 👏

Big CO LLMs + APIs

Apple - LLM in a Flash + MLX stuff

Apple has been in the AI news more and more lately, having recently released the MLX framework for running models directly on Apple silicon devices without a lot of dependencies, which was always possible but not optimized. This got many folks to start converting models to an MLX-compatible format, and there's now even a new tag on HF for those converted models.
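
For a taste of what MLX code looks like, here's a minimal sketch (just the basic mlx.core array surface as I understand it; arrays sit in Apple-silicon unified memory and evaluation is lazy):

```python
import mlx.core as mx

# Computation is lazy: nothing runs until you evaluate (or print) a result.
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b + 1.0
mx.eval(c)            # force the lazy graph to execute
print(c.shape, c.dtype)
```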

But the main news this week doesn't stop there: folks from Apple also released the LLM in a Flash paper, which shows advances in running LLMs in hardware-restricted environments like smartphones, where memory is limited. It shows interesting promise, and also a glimpse that Apple is likely moving towards on-device or partially on-device inference at some point, if we combine the MLX work with this paper's attempts.

Anthropic moves towards messages API

Anthropic Claude finally gives us some DX love and introduces a messages API similar to OpenAI's.
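
A minimal sketch with the official Python SDK, to show the shape of the new format (the model name and parameters are examples; check Anthropic's docs for current values):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-2.1",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Summarize this week's AI news in one sentence."},
    ],
)
print(message.content[0].text)
```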

Voice

Microsoft copilot now has plugins and can create songs!

Microsoft Copilot (FKA Bing Chat) now has plugins (probably not new this week, but we haven't reported on it yet), and one of the coolest ones is Suno, an audio generation platform that has been around for a while. Now it's super easy to create whole songs directly from the Microsoft Copilot interface!

Here’s my 1-shot attempt at creating a holiday jingle for ThursdAI; it’s not good, but it’s fun 😂

And I’ve seen some quite decent examples like return to monkey

AI Art & Diffusion

Midjourney v6 looks stunning and follows prompts very well

Midjourney finally dropped their version 6, and it looks really, really good. Notably, it's likely the highest quality/fidelity diffusion model out there that we can use, has better support for text, and follows prompts closely. DALL-E is still very impressive for folks, given that iterating via the chatGPT interface is very easy and convenient, but still, just look at some of these MJ v6 generations 😻

Nick gave it a very detailed prompt with 8 specific color assignments, and besides the image looking insane, MJ nailed the super complex prompt!

35mm film still, two-shot of a 50 year old black man with a grey beard wearing a brown jacket and red scarf standing next to a 20 year old white woman wearing a navy blue and cream houndstooth coat and black knit beanie. They are walking down the middle of the street at midnight, illuminated by the soft orange glow of the street lights --ar 7:5 --style raw --v 6.0

And just for fun, here’s a comparison of all previous versions of MJ for the same prompt, just to… feel the progress 🔥

Thanks for reading all the way through. I think I got more than I bargained for during NeurIPS, as I came back with a fever and was debating whether to even record/send this week’s newsletter, but now that I’m at the end of it I’m happy that I did! Though, if you listen to the full recording, you may hear me struggling to breathe a bit 😅

So I’ll go rest up before the holidays, wishing you merry Christmas if you celebrate it 🎄 See you next week 🫡



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 22 Dec 2023 00:45:59 GMT
📅 ThursdAI - Live @ NeurIPS, Mixtral, GeminiPro, Phi2.0, StripedHyena, Upstage 10B SoTA & more AI news from last (insane) week

Wow, what a week. I think I’ve reached a level where I’m not fazed by incredible weeks or days in AI, but I… guess I still have much to learn!

TL;DR of everything we covered (aka Show Notes)

* Open Source LLMs

* Mixtral MoE - 8X7B experts dropped with a magnet link again (Announcement, HF, Try it)

* Mistral 0.2 instruct (Announcement, HF)

* Upstage Solar 10B - Tops the HF leaderboards (Announcement)

* Together -Striped Hyena architecture and new models (Announcement)

* EAGLE - a new decoding method for LLMs (Announcement, Github)

* Deci.ai - new SOTA 7B model

* Phi 2.0 weights are available finally from Microsoft (HF)

* QuiP - LLM quantization & Compression (link)

* Big CO LLMs + APIs

* Gemini Pro access over API (Announcement, Thread)

* Uses character-based pricing, not tokens

* Mistral releases API inference server - La Platforme (API docs)

* Together undercuts Mistral by 70% on serving Mixtral and announces an OAI-compatible API

* OpenAI is open sourcing again - Releasing Weak-2-strong generalization paper and github! (announcement)

* Vision

* Gemini Pro api has vision AND video capabilities (API docs)

* AI Art & Diffusion

* Stability announces Zero123 - Zero Shot image to 3d model (Thread)

* Imagen 2 from google (link)

* Tools & Other

* Optimus from Tesla is coming, and it looks incredible

This week started on Friday, as we saw one of the crazier single days in the history of OSS AI that I can remember, and I’ve been doing this now for .. jesus, 9 months!

In a single day, we saw a new Mistral model release called Mixtral, which is a Mixture of Experts (like GPT-4 is rumored to be) of 8x7B Mistrals and beats GPT-3.5; we saw a completely new architecture that competes with Transformers, StripedHyena from Tri Dao and Together.xyz, plus 2 new models trained with that architecture; and we saw a new SoTA 2-bit quantization method called QuIP from Cornell AND a new 3x faster decoding method for showing tokens to users after an LLM has done its “thinking”.

And the best thing? All those advancements are stackable! What a day!

Then I went to NeurIPS2023 (which is where I am right now, writing these words!), which I cover at length at the second part of the podcast, but figured I’d write about it here as well, since it was such a crazy experience.

NeurIPS is the biggest AI/ML conference; I think they estimated 15K people from all over the world attending! Of course this brings many companies to sponsor, put up booths, give out swag and try to recruit!

Of course with my new position at Weights & Biases I had to come as well and experience this for myself!

Many of the attendees are customers of ours, and I was not expecting this amount of love; just an incredible stream of people coming up to the booth and saying how much they love the product!

So I manned the booth, did interviews and live streams, and connected with a LOT of folks and I gotta say, this whole NeurIPS thing is quite incredible from the ability to meet people!

I hung out with folks from Google, Meta, Microsoft, Apple, Weights & Biases, Stability, Mistral, HuggingFace, and PhD students and candidates from most of the top universities in the world, from KAIST to MIT and Stanford, Oslo and Shanghai; it's really a worldwide endeavor!

I also got to meet many of the leading figures in AI, all of whom I had to come up to and say hi, shake their hand, introduce myself (and ThursdAI) and chat about what they or their team released and presents at the conference! Truly an unforgettable experience!

Of course, This Week's Buzz is that everyone here loves W&B, from the PhD students to literally every big LLM lab! They all came up to us (yes yes, even researchers at Google who kinda low-key hate their internal tooling) and told us how awesome the experience was! (Besides the xAI folks; Jimmy wasn't that impressed, haha.) And of course I got to practice the pitch so many times, since I manned the W&B booth!

Please do listen to the above podcast, there's so much detail in there that doesn't make it into the newsletter, as it's impossible to cover everything, but it was a really fun conversation, including my excited depiction of this week's NOLA escapades!

I think I’ll end here, cause I can go on and on about the parties (There were literally 7 at the same time last night, Google, Stability, OpenAI, Runway, and I’m sure there were a few more I wasn’t invited to!) and about New Orleans food (it’s my first time here, I ate a soft shell deep fried crab and turtle soup!) and I still have the poster sessions to go to and workshops! I will report more on my X account and the Weights & Biases X account, so stay tuned for that there, and as always, thanks for tuning in, reading and sharing ThursdAI with your friends 🫡

P.S - Still can’t really believe I get to do this full time now and share this journey with all of you, bringing you all with me to SF, and now NeurIPS and tons of other places and events in the future!

— Alex Volkov, AI Evangelist @ Weights & Biases, Host of ThursdAI 🫡



Thu, 14 Dec 2023 23:24:53 GMT
🌉 ThursdAI Dec 7th - Gemini is out-ish, Grok is out, OSS AI Event in SF, Waymo rides, and more AI news from the past week 👏

ThursdAI December 7th TL;DR

Greetings of the day everyone (as our panelist Akshay likes to sometimes say) and Happy first candle of Hanukkah for those who celebrate! 🕎

I'm writing this newsletter from the back of a Waymo self-driving car in SF, as I'm here for just a few nights (again) to participate in the Open Source AI meetup, co-organized by Ollama, Nous Research and Alignment Labs, and hosted by A16Z in their SF office.

This event was the highlight of this trip, it was quite a packed meetup in terms of AI talent, and I got to meet quite a few ThursdAI listeners, mutuals on X, and AI celebs

We also recorded the podcast this week from the arena, thanks to Swyx and Alessio from the Latent Space pod for hosting ThursdAI this week from their newly built out pod studio (and apologies everyone for the rocky start and the cutting out issues, luckily we had local recordings so the pod version sounds good!)

Google finally teases Gemini Ultra (and gives us Pro)

What a week folks, what a week, as I was boarding the flight to SF to meet with Open Source folks, Google announced (finally!) the release of Gemini, their long rumored, highly performant model with a LOT of fanfare!

Blogposts authored by Sundar and Demis Hassabis, beautiful demos of never-before-seen capabilities, comparisons to GPT-4V which the Ultra version of Gemini outperforms on several benchmarks, rumors that Sergey Brin, the guy whose net worth is north of $100B, is listed as a core contributor on the paper, and reports on benchmarks (somewhat skewed) showing Ultra beating GPT-4 on many coding and reasoning evaluations!

We've been waiting for Gemini for such a long time that we spent the first hour of the podcast basically discussing it and its implications. We were also fairly disillusioned by the sleight-of-hand tricks Google's marketing department played with the initial launch video, where it purportedly shows Gemini as a fully multi-modal AI that reacts to a camera feed + user voice in real time, when in fact it quickly became clear (from their developer blog) that it was not video+audio but rather images+text (the same two modalities we already have in GPT-4V), and given some prompting, it's quite easy to replicate most of it. We also discussed how we, again, got a tease, and not even a waitlist for the "super cool" stuff, while getting a GPT-3.5 level model today in the Bard upgrade.

To me, the most mind-blowing demo video was actually one of the other ones in the announcement, which showed that Gemini has agentic behavior in understanding user intent: it asks for clarifications, creates a PRD (Product Requirement Document) for itself, and then generates Flutter code to create a UI on the fly, based on what the user asked it! This is pretty wild, as we all should expect Just In Time UI to come to many of these big models!

Tune in to the episode if you want to hear more takes, opinions and frustrations, as none of us actually got to use Gemini Ultra, and the experience with Gemini Pro (which is now live in Bard) was, at least for me, underwhelming.

This weeks buzz (What I learned in Weights & Biases this week)

I actually had a blast talking about W&B with many folks in the open source and fine-tuning community this week and last. I already learned that W&B doesn't only help huge companies (like OpenAI, Anthropic, Meta, Mistral and tons more) train their foundational models, but is widely used by the open source fine-tuning community as well. I've met folks like Wing Lian (aka Caseus), maintainer of Axolotl, who uses W&B together with Axolotl, and got to geek out about W&B, met with Teknium and LDJ (Nous Research, Alignment Labs) and in fact got LDJ to walk me through some of the ways he uses and has used W&B in the past, including how it's used to track model runs, show artifacts in the middle of runs, and run mini-benchmarks and evaluations for LLMs as they finetune.

If you're interested in this, here's an episode of a new “series” where I learn publicly (from scratch), so if you want to learn from scratch with me, you're welcome to check it out:

Open Source AI in SF meetup

This meetup was the reason I flew in to SF, I was invited by dear friends in the open source community, and couldn't miss it! There was such a talent density there, it was quite remarkable. Andrej Karpathy, whose video about LLMs I just finished re-watching, Jeremy Howard, folks from Mistral, A16Z, and tons of other startups, open source collectives, and enthusiasts, all came together to listen to a few lightning talks, but mostly to mingle and connect and share ideas.

Nous Research announced that they are a company (no longer just a Discord collective of ragtag open sourcers!) and that they are working on Forge, a product offering of theirs that runs local AI, has a platform for agent behavior, and is very interesting to keep an eye on.

I've spent most of my time going around, hearing what folks are using (Hint: a LOT of axolotl), what they are finetuning (mostly Mistral) and what is the future (everyone's waiting for next Llama or next Mistral). Funnily enough, there was not a LOT of conversation about Gemini there at all, at least not among the folks that I talked to!

Overall this was really really fun, and of course, being in SF, at least for me, especially now as an AI Evangelist, feels like coming home! So expect more trip reports!

Here's a recap and a few more things that happened this week in AI:

* Open Source LLMs

* Apple released MLX - machine learning framework on apple silicon

* Mamba - transformers alternative architecture from Tri Dao

* Big CO LLMs + APIs

* Google Gemini beats GPT-4V on a BUNCH of metrics, shows cool fake multimodal demo

* Demo was embellished per the google developer blog

* Multimodal capabilities are real

* Dense model vs MOE

* Multimodal on the output as well

* For 5-shot, GPT-4 outperforms Gemini Ultra on MMLU

* AlphaCode 2 is here and Google claims it performs better than 85% of competitive programmers in the world, and it performs even better when collaborating with a competitive programmer.

* Long context prompting for Claude 2 shows 27% - 98% increase by using prompt techniques

* X.ai finally released grok to many premium+ X subscribers. (link)

* Vision

* OpenHermes Vision finally released - something wasn't right there, back to the drawing board

* Voice

* Apparently Gemini beats Whisper v3! As part of a unified model no less

* AI Art & Diffusion

* Meta - releases a standalone EMU AI art generator website https://imagine.meta.com

* Tools

* Jetbrains finally releases their own AI native companion + subscription

That's it for me this week; this Waymo ride took extra long, as it seems that in SF, during night rush hour, AI is at a disadvantage against human drivers. Maybe I'll take an Uber next time.

P.S - here’s Grok roasting ThursdAI

See you next week, and if you've scrolled all the way here for the emoji of the week, it's hidden in the middle of the article, send me that to let me know you read through 😉



Fri, 08 Dec 2023 07:35:07 GMT
🎉 ThursdAI Nov 30 // ChatGPT 1 year celebration special episode // covering the past 1 year in LLM/OSS AI 🥳

🎶 Happy birthday to you, happy birthday to you, happy birthday chat GPT-eeeeeeee, happy birthday to you.

Hey everyone, welcome to this special edition of ThursdAI, where you're probably going to get two emails and two podcast episodes today, and you can choose which one you want to listen to. We actually recorded both of them live, they just went a little long.

ThursdAI - The only podcast that brings you yearly recaps since chatGPT was released (😂)

This one is the more celebratory one: today is one year since the release of ChatGPT, and we (and by we I mean I, Alex) decided to celebrate it by recapping not just the last week in AI but the whole last year (full timeline posted at the bottom of this newsletter).

Going month by month, with a swoosh sound in the editing, and covering the most important things that happened in LLMs and open source LLMs since ChatGPT was released and unlocked everyone's imagination!

We also covered Meta stepping in with LLaMA and then everything that has happened since in multi-modality, vector databases, agents, and everything everything everything. It was one hell of an hour and a half, we had almost 1K audience members! So I recommend you listen to this one first and then the week's updates later, because there were some incredible releases this week as well! (as there are every week)

I think it's important to do a Spotify Wrapped type thing for AI, for something like ChatGPT's one-year mark, and I think we'll be doing this every year, so hopefully in a year we'll see you here on November 30th covering the next year in AI.

And hopefully next year an AI system will actually help me summarize all this, because it's a lot of work. With that, I will just leave you with the timeline and no notes, and you should listen to everything, because we talked about everything live!

I hope you enjoy this special birthday celebration! (OpenAI sure did, check out this incredibly cute little celebration video they just posted)

Here’s the full timeline with everything important that happened month by month that we’ve covered:

* December 2022 - ChatGPT becomes the fastest growing product in history

* GPT3.5 with 4K context window, instruction finetuning and conversational RLHF

* January

* Microsoft invests additional $10B into OpenAI (Jan 23, Blog)

* February

* LLaMa 1 - Biggest Open Source LLM (February 24 - Blog)

* No commercial license

* 30% MMLU

* No instruction fine-tuning (RLHF)

* ChatGPT unofficial APIs exist

* March (the month of LLM superpowers)

* ChatGPT API (March 1, announcement)

* Developers can now build chatGPT powered apps

* All clones so far were completion based and not conversation based

* LLama.cpp from ggerganov + Quantization (March 10, Blog)

* Stanford - Alpaca 7B - Finetune on self-instruct GPT3.5 dataset (March 13, Blog)

* GPT4 release + chatGPT upgrade (March 14 - GPT-4 demo)

* 67.0% HumanEval | 86.4% MMLU

* 8K (and 32K) context windows

* Anthropic announces Claude + Claude instant (March 14 - Blog)

* 56.0% HumanEval

* Folks previously from OpenAI left to found Anthropic as a research lab, then pivoted from research to commercial

* LMSYS Vicuna 13B - Finetuned on ShareGPT exports (March 30, Blog)

* April (Embeddings & Agents)

* AutoGPT becomes the fastest GitHub project to be starred + writes its own code (April 1, Blog)

* Agents start to pop up like mushrooms after the rain

* LLaVa - Multimodality open source begins (April 18, Blog)

* CLIP + Vicuna smushed together to give LLMs eyes

* Bard improvements

* May (Context windows)

* Mosaic MPT-7B with 64K context, trained on 1T tokens, commercial license (May 5, Blog)

* Anthropic updates Claude with 100K context window (May 11, Blog)

* LLongBoi summer begins (Context windows are being stretched)

* Nvidia shows Voyager agents that play Minecraft + Memory stored in Vector DB (May 27, Blog)

* June

* GPT-3.5-turbo + functions API (June 6, Blog)

* GPT3.5 and 4 got a boost in capabilities and steer-ability

* Price reduction on models + 75% reduction on ada embeddings model

* LLaMa context window extended to 8K with RoPE scaling

* AI Engineers self determination essay by

* July

* Code Interpreter GA - ChatGPT can code (July 11, Blog)

* Anthropic Claude 2 - (July 11 - Blog)

* 100K context window

* 71% HumanEval

* LLaMa 2 (July 18 - Blog)

* Base & Chat models (RLHF)

* Commercial license

* 29.9% Human Eval | 68.9% MMLU

* August

* Meta releases Code-LlaMa, code finetune models

* September

* DALL-E 3 - Adds multi-modality on output and chat to image gen (Sep 20, Blog)

* Mistral 7B top performing open source LLM via torrent link (Sep 27, Blog)

* GPT4-V (vision & voice) - Adds multimodality on input (Sep 27, Blog)

* October

* OpenHermes - Mistral 7B finetune that tops the charts from Teknium / Nous Research (Oct 16, Announcement)

* Inflection PI gets connected to the web + supportPi mode (Oct 16, Blog)

* Adept releases multimodal FuYu 8B (Oct 19, blog)

* November

* Grok from Xai - with realtime access to all of X content

* OpenAI dev day

* Combined mode for MMIO (multi modal on input and output)

* GPT-4 Turbo with 128K context, 3x cheaper than GPT-4

* Assistants API with retrieval capabilities

* Share-able GPTs - custom versions of GPT with retrieval, DALL-E, Code Interpreter and vision

* Chatbots with real business use-cases, for example WandBot (that we just launched today! Blog)

* Has vector storage memory

* Available via Discord/Slack

* And custom GPT!

* Microsoft has copilot everywhere in office

Aaaand now we’re here!

What an incredible year, can’t imagine what the next year holds for all of us, but 1 thing is for sure, ThursdAI will be here to keep you all up to date!

P.S - If you scrolled all the way to here, DM me the 🎊 emoji so I know you celebrated with us! It really helps me to know that there are at least a few folks out of the thousands who get this newsletter who scroll all the way through!



Thu, 30 Nov 2023 22:33:53 GMT
🦃 ThursdAI Thanksgiving special - OpenAI ctrl+altman+delete, Stable Video, Claude 2.1 (200K), the (continuous) rise of OSS LLMs & more AI news

ThursdAI TL;DR - November 23

TL;DR of all topics covered:

* OpenAI Drama

* Sam... there and back again.

* Open Source LLMs

* Intel finetuned Mistral and is on top of leaderboards with neural-chat-7B (Thread, HF, Github)

* And trained on new Habana hardware!

* Yi-34B Chat - 4-bit and 8-bit chat finetune for Yi-34 (Card, Demo)

* Microsoft released Orca 2 - it's underwhelming (Thread from Eric, HF, Blog)

* System2Attention - Uses LLM reasoning to figure out what to attend to (Thread, Paper)

* Lookahead decoding to speed up LLM inference by 2x (Lmsys blog, Github)

* Big CO LLMs + APIs

* Anthropic Claude 2.1 - 200K context, 2x less hallucinations, tool use finetune (Announcement, Blog, Ctx length analysis)

* InflectionAI releases Inflection 2 (Announcement, Blog)

* Bard can summarize youtube videos now

* Vision

* Video-LLaVa - open source video understanding (Github, demo)

* Voice

* OpenAI added voice for free accounts (Announcement)

* 11Labs released speech to speech including intonations (Announcement, Demo)

* Whisper.cpp - with OpenAI like drop in replacement API server (Announcement)

* AI Art & Diffusion

* Stable Video Diffusion - Stability releases text2video and img2video (Announcement, Try it)

* Zip-LoRA - combine diffusion LoRAs together - Nataniel Ruiz (Announcement, Blog)

* Some folks are getting NeRFs out from SVD (Stable Video Diffusion) (link)

* LCM everywhere - In Krea, In Tl;Draw, in Fal, on Hugging Face

* Tools

* Screenshot-to-html (Thread, Github)

Ctrl+Altman+Delete weekend

If you're subscribed to ThursdAI, then you most likely already know the full story of the crazy OpenAI weekend. Here's my super super quick summary (and if you want a full blow-by-blow coverage, Ben Tossell has a great one here)

Sam got fired, Greg quit, Mira flipped, then Ilya flipped. Satya played some chess, there was an interim CEO for 54 hours, all employees sent hearts then signed a letter, none of the 3 co-founders are on the board anymore (Ilya's still at the company), the company is aligned AF going into '24, and Satya is somehow a winner in all this.

The biggest winner to me is open source folks, who got tons of interest suddenly, and specifically, everyone seems to converge on the OpenHermes 2.5 Mistral from Teknium (Nous Research) as the best model around!

However, I want to shoutout the incredible cohesion that came out of the folks in OpenAI, I created a list of around 120 employees on X and all of them were basically aligned the whole weekend, from ❤️ sending to signing the letter, to showing how happy they are Sam and Greg are back!

Yay

This Week's Buzz from WandB (aka what I learned this week)

As I'm still onboarding, the main thing I've learned this week is how transparent Weights & Biases is internally. During the whole OAI saga, Lukas, the co-founder, sent a long message in Slack addressing the situation (after all, OpenAI is a big customer for W&B, GPT-4 was trained on W&B end to end) and answering questions about how this situation could affect us and the business.

Additionally, another co-founder, Shawn Lewis, shared a recording of his update to the board of W&B about our progress on the product side. It's really really refreshing to see this information voluntarily shared with the company 👏

The first core value of W&B is Honesty, and it includes transparency outside of matters like personal HR stuff, and after hearing about this during onboarding, it’s great to see that the company lives it in practice 👏

I also learned that almost every loss curve image you see on X is a W&B dashboard screenshot ✨ And while we do have a share functionality, it's not built for viral X sharing haha, so in the spirit of transparency, here's a video I recorded and shared with product + a feature request to make these screenshots way more attractive + make it clear that it's W&B
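For anyone who hasn't seen where those loss curves actually come from, here's a minimal sketch of the logging loop that produces one (the project name and metric values are made up for illustration):

```python
# A minimal sketch of the kind of logging behind those dashboard loss curves
# (project name and metric values are made up for illustration).
import wandb

run = wandb.init(project="thursdai-demo", config={"lr": 3e-4, "epochs": 3})
for step in range(100):
    loss = 1.0 / (step + 1)          # stand-in for a real training loss
    wandb.log({"train/loss": loss})  # each call adds a point to the chart
run.finish()
```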

Open Source LLMs

Intel passes Hermes on SOTA with a DPO Mistral Finetune (Thread, Hugging Face, Github)

Yes, that Intel, one of the oldest computing companies in the world, not only comes out strong with the best (on benchmarks) open source LLM, it also uses DPO, was trained on completely new hardware, and ships with an Apache 2 license!

Here's Yam's TL;DR for the DPO (Direct Preference Optimization) technique:

Given a prompt and a pair of completions, train the model to prefer one over the other. This model was trained on prompts from SlimOrca's dataset, where each prompt has one GPT-4 completion and one LLaMA-13B completion. The model is trained to prefer the GPT-4 completion over the LLaMA-13B one.
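To make that concrete, here's a minimal sketch of the DPO objective in PyTorch. This is illustrative only, not Intel's actual training code; the beta value and the toy log-probs are made up:

```python
# Minimal sketch of the DPO objective described above. Inputs are summed
# log-probs of the preferred ("chosen") and rejected completions under the
# policy being trained and under a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # implicit "reward" = beta * log-ratio of policy vs. reference
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # logistic loss pushes the chosen completion's reward above the rejected one's
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # a single scalar to backprop through
```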

Additionally, even though there is custom hardware involved here, Intel fully supports the HuggingFace trainer, and the whole repo is very clean and easy to understand, replicate and build things on top of (like LoRA)

LMSys Lookahead decoding (Lmsys, Github)

This method significantly speeds up LLM inference, sometimes by more than 2x, using some Jacobi iteration (don't ask me) tricks. It's compatible with the HF transformers library! I hope this comes to open source tools like LLaMa.cpp soon!

Big CO LLMs + APIs

Anthropic Claude comes back with 2.1 featuring 200K context window, tool use

While folks on X thought this was new, Anthropic actually announced Claude with 200K back in May, and just gave us the 100K context window, which for the longest time was the largest context window around. I was always thinking, they don't have a reason to release 200K since none of their users actually want it, and it's a marketing/sales decision to wait until OpenAI catches up. Remember, back then, GPT-4 was 8K and some lucky folks got 32K!

Well, OpenAI released GPT-4 Turbo with 128K, so Anthropic re-trained and released Claude to regain the upper hand. I also love the tool use capabilities.

Re: the longer context window, a bunch of folks tested whether the 200K context window is actually all that great, and it turns out that besides being very expensive to run (you pay per token), it also loses a bunch of information at long context lengths in needle-in-the-haystack searches. Here's an analysis by Greg Kamradt that shows that:

* Starting at ~90K tokens, performance of recall at the bottom of the document started to get increasingly worse

* Less context = more accuracy - This is well known, but when possible, reduce the amount of context you send to the model to increase its ability to recall.

I had similar issues back in May with their 100K tokens window (source)

Voice & Audio

ElevenLabs has speech-to-speech

Creating a significant jump in capabilities, ElevenLabs now allows you to be the actor behind the voice! With speech to speech, they would transfer the pauses, the intonation, the emotion, into the voice generation. Here's my live reaction and comparison:

* Notable: Whisper.CPP now supports a server compatible with OpenAI (Announcement, Github)

AI Art & Diffusion

Stable Video diffusion - text-2-video / img-2-video foundational model (Announcement, Hugging Face, Github, DEMO)

Stability has done it again: Stable Video allows you to create incredibly consistent videos from images or just text! They are short for now, but Stability is working on extending the length, and the videos look incredible! (And thanks to friends at Fal, you can try it right now, here)

And here’s a quick gif I created with DALL-E 3 and Fal to celebrate the Laundry Buddy team at OAI while the outage was happening

Tools

Screenshot to HTML (Github)

I… what else is there to say? Someone used GPT-4 Vision to take screenshots and iteratively re-create the HTML for them. As someone who used to spend months on this exact task, I’m very very happy it’s now automated!

Happy Thanksgiving 🦃

I am really thankful to all of you who subscribe and come back every week, thank you! I wouldn't have been here without all your support, comments, and feedback! Including this incredible art piece that Andrew from spacesdashboard created just in time for our live recording, just look at those little robots! 😍

See you next week (and of course the emoji of the week is 🦃, DM or reply!)



Thu, 23 Nov 2023 22:21:02 GMT
📅 ThursdAI Nov 16 - Live AI art, MS copilots everywhere, EMUs from Meta, sketch-to-code from TLDraw, Capybara 34B and other AI news!

Hey y'all, welcome to this special edition of ThursdAI. This is the first one that I'm sending in my new capacity as the AI Evangelist at Weights & Biases (on the growth team)

I made the announcement last week, but this week is my first official week at W&B, and oh boy... how humbled and excited I was to receive all the inspiring and supporting feedback from the community, friends, colleagues and family 🙇‍♂️

I promise to continue my mission of delivering AI news, positivity and excitement, and to be that one place where we stay up to date so you don't have to.

This week we also had one of our biggest live recordings yet, with 900 folks tuned in so far 😮 and it was my pleasure to again chat with folks who "made the news", so we had a brief interview with Steve Ruiz and Lou from TLDraw about their incredible GPT-4 Vision enabled "make real" functionality, and I finally got to catch up with my good friend Idan Gazit, who's heading the GitHub Next team (the birthplace of GitHub Copilot), about how they see the future. So definitely definitely check out the full conversation!

TL;DR of all topics covered:

* Open Source LLMs

* Nous Capybara 34B on top of Yi-34B (with 200K context length!) (Eval, HF)

* Microsoft - Phi 2 will be open sourced (barely) (Announcement, Model)

* HF adds finetune chain genealogy (Announcement)

* Big CO LLMs + APIs

* Microsoft - Everything is CoPilot (Summary, copilot.microsoft.com)

* CoPilot for work and 365 (Blogpost)

* CoPilot studio - low code "tools" builder for CoPilot + GPTs access (Thread)

* OpenAI Assistants API cookbook (Link)

* Vision

* 🔥 TLdraw make real button - turn sketches into code in seconds with vision (Video, makereal.tldraw.com)

* Humane Pin - Orders are out, shipping early 2024, multimodal AI agent on your lapel

* Voice & Audio

* 🔥 DeepMind (Youtube) - Lyria high quality music generations you can HUM into (Announcement)

* EmotiVoice - 2000 different voices with emotional synthesis (Github)

* Whisper V3 is top of the charts again (Announcement, Leaderboard, Github)

* AI Art & Diffusion

* 🔥 Real-time LCM (latent consistency model) AI art is blowing up (Krea, Fal Demo)

* 🔥 Meta announces EMU-video and EMU-edit (Thread, Blog)

* Runway motion brush (Announcement)

* Agents

* Alex's Visual Weather GPT (Announcement, Demo)

* AutoGen, Microsoft agents framework is now supporting assistants API (Announcement)

* Tools

* Gobble Bot - scrape everything into 1 long file for GPT consumption (Announcement, Link)

* ReTool state of AI 2023 - https://retool.com/reports/state-of-ai-2023

* Notion Q&A AI - search through a company Notion and QA things (announcement)

* GPTs shortlinks + analytics from Steven Tey (https://chatg.pt)

This Week's Buzz from WandB (aka what I learned this week)

Introducing a new section in the newsletter called "The Week's Buzz from WandB" (AKA What I Learned This Week).

As someone who joined Weights and Biases without prior knowledge of the product, I'll be learning a lot. I'll also share my knowledge here, so you can learn alongside me. Here's what I learned this week:

The most important thing I learned this week is just how prevalent and how much of a leader Weights & Biases is. W&B's main product is used by most of the foundation LLM trainers, including OpenAI.

In fact GPT-4 was completely trained on W&B!

It's used by pretty much everyone besides Google. In addition, it's not only about LLMs; W&B products are used to train models in many many different areas of the industry.

Some incredible examples are a pesticide dispenser that's part of John Deere tractors, which only sprays pesticides onto weeds and not actual produce, and big pharma companies using W&B to help create better drugs that are now in trials. It's just incredible how much machine learning there is outside of LLMs. But I'm also absolutely floored by just how ubiquitous W&B is in the LLM world.

W&B has two main products, Models & Prompts, Prompts is a newer one, and we're going to dig into both of these more next week!

Additionally, it's striking how many AI Engineers and API users, such as myself and many of my friends, have no idea who W&B even is, or if they do, they've never used it!

Well, that's what I'm here to change, so stay tuned!

Open source & LLMs

In the open source corner, we have the first Nous fine-tune of Yi-34B, which is a great model that we covered in the last episode, now fine-tuned with the Capybara dataset by ThursdAI cohost LDJ! Not only is it a great model, it now tops the charts for resident reviewer WolframRavenwolf on /r/LocalLLama (and X)

Additionally, Open-Hermes 2.5 7B from Teknium is now in second place on the HuggingFace leaderboards. It was released recently, but we hadn't covered it until now; I still think that Hermes is one of the more capable local models you can get!

Also in open source this week, guess who loves it? Satya (and Microsoft)

They love it so much that they not only created this awesome slide (although, what's SLMs? Small Language Models? I don't like it), they also announced that LLaMa and Mistral are coming to Azure as inference services!

And they gave us a little treat: Phi-2 is coming. They said open source (but folks looking at the license saw that it's only for research use), and supposedly it's a significantly more capable model while only being 2.7B parameters (super super tiny)

Big Companies & APIs

Speaking of Microsoft, they announced so much during their Ignite event on Wednesday (15th) that it's impossible to cover all of it in this newsletter, but basically here are the main things that got me excited!

CoPilot everywhere, everything is CoPilot

Microsoft rebranded Bing Chat to Copilot and it now lives on copilot.microsoft.com

and it's basically a free GPT4, with vision and DALL-e capabilities. If you're not signed up for OpenAI's plus membership, this is as good as it gets for free!

They also announced CoPilot for 365, which means that everything from office (word, excel!) to your mail, and your teams conversations will have a CoPilot that will be able to draw from your organizations knowledge and help you do incredible things. Things like help book appointments, pull in relevant people for the meeting based on previous documents, summarize that meeting, schedule follow ups, and like a TON more stuff. Dall-e integration will help you create awesome powerpoint slides.

(p.s. all of this will be allegedly data protected and won't be shared with MS or be trained on)

They literally went and did "AI everywhere" with CoPilot and it's kinda incredible to see how big they are betting the farm on AI with Microsoft while Google... where's Google™?

CoPilot Studio

One of the more exciting things for me was, the CoPilot Studio announcement, a low-code tool to extend your company's CoPilot by your IT, for your organization. Think, getting HR data from your HR system, or your sales data from your SalesForce!

They will launch with 1100 connectors for many services but allow you to easily build your own.

One notable thing is, Custom GPTs will also be a connector! You will literally be able to connect your CoPilot with your (or someone's) GPTs! Are you getting this? AI Employees are coming faster than you think!

Vision

I've been waiting for cool vision demos since the GPT-4V API was launched, and oh boy did we get them! From friend of the pod Robert Lukoshko's auto screenshot analysis, which takes screenshots periodically and sends you a report of all you did that day, to Charlie Holtz's live webcam narration by David Attenborough (which is available on Github!)

But I think there's 1 vision demo that takes the cake this week, by our friends (Steve Ruiz) from TLDraw, which is a whiteboard canvas primitive. They have added a sketch-to-code button that allows you to sketch something out; GPT-4 Vision will analyze it, GPT-4 will write code, and you will get live code within seconds. It's so mind-blowing that I'm still collecting my jaw off the floor here. And since it does coding, if you ask it nicely to add JS interactivity, the result will be interactive 🤯

GPT4-V Is truly as revolutionary as I imagined it to be when Greg announced it on stage 🫡

P.S - Have you played with it? Do you have cool demos? DM me with 👁️‍🗨️ emoji and a cool vision demo to be included in the next ThursdAI

AI Art & Diffusion & 3D

In addition to the TLDraw demo, one mind-blowing demo after another is coming this week from the AI Art world, using the LCM (Latent Consistency Model) + a whiteboard. This is yet another see it to believe it type thing (or play with it)

(video from Linus)

Dear friends from Krea.ai were the first to implement this insanity, which allows you to see real-time AI art generation almost as fast as you type your prompts, and they were followed by the wizards at Fal, who got generations down to several milliseconds (shoutout Gorkem!). The real-time drawing thing is truly truly mind-blowing. It's so mind-blowing that folks add their webcam feeds into this and see almost real-time generation of their webcam feeds on the fly.

Meta announcing new Emus (Video & Edit)

Meta doesn't want me to relax, and during the space, announced their text-to-video and textual-editing models.

Emu Video produces great videos from a prompt, and emu-edit is really interesting, it allows you to edit parts of images by typing, think "remove the tail from this cat" or "remove the hat from this person"

They have this to say, which... dayum.

In human evaluations, our generated videos are strongly preferred in quality compared to all prior work– 81% vs. Google’s Imagen Video, 90% vs. Nvidia’s PYOCO, and 96% vs. Meta’s Make-A-Video. Our model outperforms commercial solutions such as RunwayML’s Gen2 and Pika Labs

It's really compelling, can't wait to see if they open source this, video is coming ya'll!

Audio & Sound

Deepmind + Youtube announced Lyria (blogpost)

This new music model is pretty breathtaking, but we only got a glimpse, not even a waitlist for this one. However, check out the pre-recorded demos: the folks at DeepMind have a model you can hum into, sing into, and it'll create a full blown track for you, with bass, drums, and singing!

Not only that, it will also license vocals from musicians (à la Grimes) and will split the revenue between you and them if you post it on Youtube!

Pretty cool Google, pretty cool!

Agents & Tools

Look, I gotta be honest, I'm not sure about this category, Agent and Tools, if to put them into one or not, but I guess GPTs are kinda tools, so I'm gonna combine them for this one.

GPTs (My Visual Weather, Simons Notes)

This week, the GPT that I created, called Visual Weather GPT, has blown up, with over 5,000 chats opened and many many folks using it and texting me about it. It's a super cool way to check out all the capabilities of a GPT. If you remember, I thought of this idea a few weeks ago when we got a sneak preview of the "All Tools" mode, but now I can share it with you all in the form of a GPT that will browse the web for real-time weather data and create a unique art piece for that location and weather conditions!

It's really easy to make as well, and I do fully expect everyone to start making their own versions very soon, and I think we're inching towards the era of JIT (just in time) software, where you'll create software as you require it, and it'll be as easy as talking to a chatGPT!

Speaking of, friend of the pod Steven Tey from Vercel (whose dub.sh I use and love for thursdai.news links) has released a GPT link shortener called chatg.pt, and you can register and get your own cool short link like https://chatg.pt/artweather 👏 And it'll give you analytics as well!

Pro tip for the weather GPT: you can ask for a specific season or style in parentheses and then use those as greeting cards for your friends. Happy upcoming Thanksgiving everyone!

Speaking of Thanksgiving, we're not taking a break. Next ThursdAI, November 23, join us for a live discussion and podcast recording! We'll have many thanks, cool AI stuff, and much more!



Fri, 17 Nov 2023 00:55:11 GMT
📅 ThursdAI - OpenAI DevDay recap (also X.ai grōk, 01.ai 200K SOTA model, Humane AI pin) and a personal update from Alex 🎊

Hey everyone, this is Alex Volkov 👋

This week was an incredibly packed one for news. It started strong on Sunday with the x.ai Grōk announcement, continued Monday with all the releases during OpenAI Dev Day, then topped off with GitHub Universe Copilot announcements, and to top it all off, we postponed the live recording to see what hu.ma.ne has in store for us as far as AI devices go (they finally announced the Pin with all the features)

In between, we got a new AI unicorn from Hong Kong called 01.ai, led by ex-Google China head Kai-Fu Lee, which dropped Yi, a new SOTA 34B model with a whopping 200K context window and a commercial license.

Above all, this week was a monumental one for me personally. ThursdAI has been a passion project for the longest time (240 days), and it has led me to incredible places, like being invited to the ai.engineer summit to do media, then getting invited to OpenAI Dev Day (to also do podcasting from there), interviewing and befriending folks from HuggingFace, Github, Adobe, Google, OpenAI and of course open source friends like Nous Research and Alignment Labs, interviewing authors of papers, hackers of projects, and fine-tuners, and of course all of you, who tune in from week to week 🙏 Thank you!

It's all been so humbling and fun, which makes me ever more excited to share the next chapter. Starting Monday I'm joining Weights & Biases as an AI Evangelist! 🎊

I couldn't be more excited to continue the ThursdAI mission of spreading knowledge about AI, connecting the AI engineers and the fine-tuners, the data scientists and the GenAI folks, the super advanced cutting edge stuff and the folks who fear AI, with the backing of such an incredible and important company in the AI space.

ThursdAI will continue as an X space, newsletter and podcast, as we gradually find a common voice and continue bringing awareness of W&B's incredible brand to newer developers, products and communities. Expect more on this very soon!

Ok now to the actual AI news 😅

TL;DR of all topics covered:

* OpenAI Dev Day

* GPT-4 Turbo with 128K context, 3x cheaper than GPT-4

* Assistant API - OpenAI's new Agent API, with retrieval memory, code interpreter, function calling, JSON mode

* GPTs - Shareable, configurable GPT agents with memory, code interpreter, DALL-E, Browsing, custom instructions and actions

* Copyright Shield - OpenAI's lawyers will protect you from copyright lawsuits

* Dev Day emergency pod with Swyx, Alessio, Simon and me! (Listen)

* OpenSource LLMs

* 01.ai launches Yi-34B, a commercially licensed model with a 200K context window that tops all HuggingFace leaderboards across all sizes (Announcement)

* Vision

* GPT-4 Vision API finally announced, rejoice, it's as incredible as we've imagined it to be

* Voice

* Open AI TTS models with 6 very-realistic, multilingual voices, no cloning tho

* AI Art & Diffusion


Thu, 09 Nov 2023 23:52:23 GMT
📅 ThursdAI Nov 02 - ChatGPT "All Tools", Bidens AI EO, many OSS SOTA models, text 2 3D, distil-whisper and more AI news 🔥

ThursdAI November 2nd

Hey everyone, welcome to yet another exciting ThursdAI. This week we have a special announcement: I'll be co-hosting a shared X space live from OpenAI Dev Day on Monday next week (and then we'll likely follow up with interviews, analysis and potentially a shared episode!)

Make sure you set a reminder on X (https://thursdai.news/next) , we’re going to open the live stream early, 8:30am on Monday, and we’ll live stream all throughout the keynote! It’ll be super fun!

Back to our regular schedule, we covered a LOT of stuff today, and again, were lucky enough to have BREAKING NEWS and the authors of said breaking news (VB from HuggingFace and Emozilla from Yarn-Mistral-128K) to join us and talk a little bit in depth about their updates!

[00:00:34] Recap of Previous Week's Topics

[00:00:50] Discussion on AI Embeddings

[00:01:49] Gradio Interface and its Applications

[00:02:56] Gradio UI Hosting and its Advantages

[00:04:50] Introduction of Baklava Model

[00:05:11] Zenova's Input on Distilled Whisper

[00:10:32] AI Regulation Week Discussion

[00:24:14] ChatGPT new All Tools mode (aka MMIO)

[00:35:45] Discussion on Multimodal Input and Output Models

[00:36:55] BREAKING NEWS: Mistral YaRN 7B - 128K context window

[00:37:02] Announcement of Mistral Yarn Release

[00:46:47] Exploring the Limitations of Current AI Models

[00:47:25] The Potential of Vicuna 16k and Memory Usage

[00:49:43] The Impact of Apple's New Silicon on AI Models

[00:51:23] Introduction to New Models from Nius Research

[00:51:39] The Future of Long Context Inference

[00:53:42] Exploring the Capabilities of Obsidian

[00:54:29] The Future of Multimodality in AI

[00:58:48] The Exciting Developments in CodeFusion

[01:06:49] The Release of the Red Pajama V2 Dataset

[01:12:07] The Introduction of Luma's Genie

[01:16:37] Discussion on 3D Models and Stable Diffusion

[01:17:08] Excitement about AI Art and Diffusion Models

[01:17:48] Regulation of AI and OpenAI Developments

[01:18:24] Guest Introduction: VB from Hugging Face

[01:18:53] VB's Presentation on Distilled Whisper

[01:21:54] Discussion on Distillation Concept

[01:27:35] Insanely Fast Whisper Framework

[01:32:32] Conclusion and Recap

Show notes and links:

* AI Regulation

* Biden Executive Order on AI was signed (Full EO, Deep dive)

* UK AI regulation forum (King AI speech, no really, Arthur from Mistral)

* Mozilla - Joint statement on AI and openness (Sign the letter)

* Open Source LLMs

* Together AI releases RedPajama 2, 25x larger dataset (30T tokens) (Blog, X, HF)

* Alignment Lab - OpenChat-3.5 a chatGPT beating open source model (HF)

* Emozilla + Nous Research - Yarn-Mistral-7b-128k (and 64K) longest context window (Announcement, HF)

* LDJ + Nous Research release Capybara 3B & 7B (Announcement, HF)

* LDJ - Obsidian 3B - the smallest open source multi modal model (HF, Quantized)

* Big CO LLMs + APIs

* ChatGPT "all tools" MMIO mode - Combines vision, browsing, ADA and DALL-E into 1 model (Thread, Examples, System prompt)

* Microsoft CodeFusion paper - a tiny (75M parameters) model beats a 20B GPT-3.5-turbo (Thread, ArXiv)

* Voice

* Hugging Face - Distill whisper - 2x smaller english only version of Whisper (X, paper, code)

* AI Art & Diffusion & 3D

* Luma - text-to-3D Genie bot (Announcement, Try it)

* Stable 3D & Sky changer

AI Regulation IS HERE

Look, to be very frank, I want to focus ThursdAI on all the news that we're getting from week to week, and to bring a positive outlook, so politics, doomerism, and regulation weren't on the roadmap, however, with weeks like these, it's really hard to ignore, so let's talk about this.

President Biden signed an Executive Order, citing the old, wartime-era Defense Production Act (looks like the US gov. also has "one weird trick" to make the government move faster), and it wasn't as bombastic as people thought. X being X, there were so many takes, even before this executive order was released, about regulatory capture by the big AI labs and about how open source is no longer going to be possible, and if you visit Marc Andreessen's feed you'll see he's only reposting AI generated memes to the tune of "don't tread on me" about GPU and compute rights.

However, at least on the face of it, this executive order was mild; it discussed many AI risks and focused on regulating models from huge compute runs (~28M H100 hours // ~$50M worth). Here's the relevant section.

Many in the open source community reacted to the FLOPs limitation with the response that it's very much a lobbyist-driven decision, and that applications should be regulated, not only the compute.

There's much more to say about the EO; if you want to dig deeper, I strongly recommend this piece from AI Snake Oil, and check out Yann LeCun's whole feed.

UK AI safety summit in Bletchley Park

Look, did I ever expect to add the King of England to an AI weekly recap newsletter? Surely only if he were AI-art generated or something, not the real king, addressing the topic of AI safety!

This video was played for the attendees of the multi-day AI safety summit in Bletchley Park, where AI luminaries (Yann LeCun, Elon Musk, Mistral CEO Arthur Mensch, Naveen Rao) attended and talked about the risks and benefits of AI and regulation. I think Naveen Rao had a great recap here; additionally, there were announcements about a Safety Institute in the UK, and they outlined what actions the government can take.

In other regulation-related news, Mozilla has a joint statement on AI safety and openness (link) that many signed, which makes the case for openness and open source as the way to AI safety. Kudos to Mozilla, we stand by the letter 🤝

Big CO LLMs + APIs

OpenAI - ChatGPT "all tools" aka MMIO mode (that's now dubbed "confidential")

Just a week before the first Dev Day from OpenAI, we were hanging out in X spaces talking about what the regulation might bring, when a few folks noticed that their ChatGPT interface looked different, and saw a very specific popup message saying that ChatGPT can now talk with documents and "use tools without switching": see and interact with DALL-E and Advanced Data Analysis (FKA Code Interpreter) all in one prompt.

While many X takes focused solely on just how many "chat with your PDF" startups OpenAI just "killed" (and indeed, the "work with PDFs" functionality seemed new: ChatGPT could now take file uploads, search them, go to a specific page, even do a basic summary of PDF files), I was interested in the second part!

Specifically because, given that GPT-4V is now basically enabled for everyone, this "combined" mode makes ChatGPT the first MMIO model that we have, one that is multi-modal on input (text, voice, images) and output (text, images). You see, most multi-modal models so far have been multi-modal only on the input, i.e. they take in text or images or a combination, and while playing around with the above, we noticed some incredible use-cases that are now available!

ChatGPT (for some lucky folks) can now do all these things in one prompt with shared context:

* Read and interact with PDFs

* See and understand images + text

* Browse & Search up to date info with Bing

* Write and execute code with ADA

* Generate images with DALL-E

All in the same prompt, one after another, and often for several steps and iterations.

One such example: I asked it to "get the current weather in Denver and generate an image based on the conditions" and we got this incredible, almost on-the-fly "weather" UI showing the conditions (it was the first snow in CO this year), weather, humidity and everything. Now, DALL-E is ok with text but not great, but it's incredible with scenery, so having this on-the-fly UI with real-time info was a super great way to show off the capabilities of a general model.

We also saw prompts from folks who uploaded a picture of an obscure object, and asked DALL-E to "add" this object to an already generated image, so DALL-E now has eyes, and can understand and "draw" some of the objects and add them to other images, which was an amazing thing to see, and I can't wait to play around with this functionality.

We noticed a few more things, specifically that DALL-E images are now stored on the same disk that you get access to with ADA, so you can then ask ChatGPT to upscale, crop and do things with those images for example, and generate code with those images as a background!

There are so many new potential use-cases that have opened up that we spent a long evening / night on X spaces trying to throw the kitchen sink at this mode, fearing that it was a fluke by OpenAI and they weren't meant to release it, and we were right! Today on the ThursdAI live recording, some users reported that they no longer have access to it (and they miss it!) and some reported that it's now called something like "Confidential"

Someone also leaked the full prompt for this "all tools" mode and it's a doozy! The "All Tools" omni-prompt takes a whopping 2,756 tokens, but it's also using the GPT-4 32k model, with a 32,767 token context window. (link)

I guess we're going to see the announcement on Dev Day (did you set a reminder?)

This new mode that we saw and played with, added to the many many leaks and semi-confirmed modes coming out of Reddit, makes it seem like ChatGPT is going to have an all-out birthday party next week and is about to blow some people's minds! We're here for it! 👏

Open Source LLMs

CodeFusion - 75M parameters model based on Diffusion Model for Code Generation

The CodeFusion paper claimed GPT-3.5 is 20B parameters, and was then taken down from arXiv with a note saying the claim was unsubstantiated (X link)

The paper itself discusses the ability to use diffusion to generate code, needing much less data to reach a very good coding level, with a model small enough to fit in a chip's cache (not even memory) and be very very fast. Of course, this is only theoretical, and we're going to wait a while to see if it replicates, especially since the PDF was taken down due to someone attributing the 20B parameter note to a Forbes article.

The size of the model, and the performance score on some coding tasks make me very very excited about tiny models on edge/local future!

I find the parameter obsession folks have about OpenAI models incredible, because parameter size really doesn't matter; it's a bad estimate of capability anyway. OAI can train their models for years and keep them at the same parameter size, and they would be vastly different models at the start and finish!

Together releases a massive 30T tokens dataset - RedPajama-Data-v2 (Announcement, HF)

This massive massive dataset is 25x the previous RedPajama, and is completely open, deduplicated, and holds an enormous wealth of data to train models from. For folks who were talking the "there are no more tokens" book, this came as a surprise for sure! It's also multi-lingual, with tokens in English, French, Italian, German and Spanish in there. Kudos to Together Compute for this massive massive open source effort 👏

Open source Finetunes Roundup

This week was another crazy one for open source fine-tuners, releasing SOTA after SOTA, many of them on ThursdAI itself 😅 Barely possible to keep up (and that's quite literally my job!)

Mistral 7B - 128K (and 64K) (HF)

The same folks who brought you the YaRN paper, Emozilla, Bowen Peng and Enrico Shippole (frequent friends of the pod, we had quite a few conversations with them in the past), have released the longest context Mistral fine-tune, able to take 64K and a whopping 128K tokens in its context length, making one of the best open source models compatible with book-length prompts and very very long memory!

Capybara + Obsidian (HF, Quantized)

Friend of the pod (and weekly cohost) LDJ releases 2 Nous research models, Capybara (trained on StableLM 3B and Mistral 7B) and Obsidian, the first vision enabled multi modal 3B model that can run on an iPhone!

Capybara is a great dataset that he compiled and the Obsidian model uses the LLaVa architecture for input multimodality and even shows some understanding of humor in images!

Alignment Lab - OpenChat-3.5 a chatGPT beating open source model (Announcement, HF)

According to friends of the pod Alignment Lab (of OpenOrca fame), we get a Mistral finetune that beats (!) chatGPT on many code based evaluations (the March version; we all think chatGPT has become much better since then)

OpenChat is by nature a conversationally focused model optimized to provide a very high quality user experience in addition to performing extremely powerfully on reasoning benchmarks.

Open source is truly unmatched, and in the face of a government regulation week, open source is coming out in full force!

Voice

HuggingFace Distil-Whisper - 6x faster than Whisper while staying within 1% WER (Announcement, HF)

The Hugging Face folks released a distillation of Whisper, a process (and a paper) in which they use a "teacher" model, the original OpenAI Whisper, to "teach" a smaller student model, transferring capabilities from one to the other while making the model smaller!

This yields a significantly smaller model (2x smaller) with comparable (and on some use-cases even better) performance, while being 6x faster!

This distil-whisper is now included in the latest transformers (and transformers.js) releases, and you can start using the faster Whisper today! 👏
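If you want to kick the tires, a minimal sketch with the transformers pipeline looks something like this; the checkpoint id "distil-whisper/distil-large-v2" is my assumption of the published name, and "meeting.wav" is a placeholder file:

```python
# A minimal sketch of trying the distilled Whisper via the transformers pipeline.
# The checkpoint id is an assumption; "meeting.wav" is a placeholder audio file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    chunk_length_s=30,  # long-form audio gets transcribed in 30-second chunks
)
print(asr("meeting.wav")["text"])
```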

That's it for today folks, it's been a busy busy week, and many more things were announced. Make sure to join our space, and if you have read all the way until here, send me the 🧯 emoji as a reply or in a DM, it's how I know who the most engaged readers are!




Fri, 03 Nov 2023 03:05:54 GMT
📅 ThursdAI Oct-26, Jina Embeddings SOTA, Gradio-Lite, Copilot crossed 100M paid devs, and more AI news

ThursdAI October 26th

Timestamps and full transcript for your convenience

## [00:00:00] Intro and brief updates

## [00:02:00] Interview with Bo Wang, author of Jina Embeddings V2

## [00:33:40] Hugging Face open sourcing a fast Text Embeddings

## [00:36:52] Data Provenance Initiative at dataprovenance.org

## [00:39:27] LocalLLama effort to compare 39 open source LLMs +

## [00:53:13] Gradio Interview with Abubakar, Xenova, Yuichiro

## [00:56:13] Gradio effects on the open source LLM ecosystem

## [01:02:23] Gradio local URL via Gradio Proxy

## [01:07:10] Local inference on device with Gradio - Lite

## [01:14:02] Transformers.js integration with Gradio-lite

## [01:28:00] Recap and bye bye

Hey everyone, welcome to ThursdAI, this is Alex Volkov, I'm very happy to bring you another weekly installment of 📅 ThursdAI.


TL;DR of all topics covered:

* Open Source LLMs

* JINA - jina-embeddings-v2 - First OSS embeddings models with 8K context (Announcement, HuggingFace)

* Simon Willison guide to Embeddings (Blogpost)

* Hugging Face - Text embeddings inference (X, Github)

* Data Provenance Initiative - public audit of 1800+ datasets (Announcement)

* Huge open source LLM comparison from r/LocalLLama (Thread)

* Big CO LLMs + APIs

* NVIDIA research new spin on Robot Learning (Announcement, Project)

* Microsoft / Github - Copilot crossed 100 million paying users (X)

* RememberAll open source (X)

* Voice

* Gladia announces multilingual near real time whisper transcriptions (X, Announcement)

* AI Art & Diffusion

* Segmind releases SSD-1B - 50% smaller and 60% faster version of SDXL (Blog, Hugging Face, Demo)

* Prompt techniques

* How to use seeds in DALL-E to add/remove objects from generations (by - Thread)

This week was a mild one in terms of updates, believe it or not; we didn't get a new state-of-the-art open source large language model this week. However, we did get a new state-of-the-art embeddings model from JinaAI (supporting an 8K sequence length).
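If you want to try the new embeddings yourself, here's a minimal sketch using transformers with standard mean pooling; the hub id "jinaai/jina-embeddings-v2-base-en" and the trust_remote_code flag are assumptions about how the checkpoint is published on the Hugging Face Hub:

```python
# A minimal sketch of getting sentence embeddings from the new Jina model.
# The checkpoint id and trust_remote_code flag are assumptions; mean pooling
# is the standard way to turn per-token states into one vector per sentence.
import torch
from transformers import AutoTokenizer, AutoModel

name = "jinaai/jina-embeddings-v2-base-en"   # assumed checkpoint id
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, trust_remote_code=True)

inputs = tok(["How do embeddings work?"], padding=True, truncation=True,
             max_length=8192, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state          # (batch, seq, dim)
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)         # mean pooling
print(embedding.shape)
```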

We also had quite the quiet week from the big dogs, OpenAI is probably sitting on updates until Dev Day (which I'm going to cover for all of you, thanks to Logan for the invite), Google had some leaks about Gemini (we're waiting!) and another AI app builder thing, Apple is teasing new hardware (but nothing AI related) coming soon, and Microsoft / Github announced that CoPilot has 100 million paying users! (I tweeted this and Idan Gazit, Sr. Director GithubNext where Copilot was born, tweeted that "we're literally just getting started" and mentioned November 8th as... a date to watch, so mark your calendars for some craziness next two weeks)

Additionally, we covered the Data Provenance Initiative, which helps sort and validate licenses for over 1800 public datasets, a massive effort led by Shayne Redford with assistance from many folks including friend of the pod Enrico Shippole. We also covered another massive evaluation effort by a user named WolframRavenwolf on the LocalLLama subreddit, who evaluated and compared 39 open source models and GPT-4. Not surprisingly, the best model right now is the one we covered last week, OpenHermes 7B from Teknium.

Two additional updates were covered. One of them is Gladia AI, which released their version of Whisper over WebSockets; I covered it on X with a reaction video. It allows developers to stream speech-to-text with very low latency, and it's multi-lingual as well, so if you're building an agent that folks can talk to, definitely give this a try. And finally, we covered SegMind's SSD-1B, a distilled version of SDXL that is 50% smaller in size and 60% faster in generation speed (you can play with it here)
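Since SSD-1B is advertised as a drop-in SDXL replacement, a hedged sketch of trying it with diffusers might look like this (the "segmind/SSD-1B" hub id and pipeline compatibility are my assumptions):

```python
# Hedged sketch: loading the distilled SDXL with diffusers. The hub id and
# StableDiffusionXLPipeline compatibility are assumed, not verified here.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")
image = pipe("a watercolor lighthouse at dawn", num_inference_steps=25).images[0]
image.save("lighthouse.png")
```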

This week I was lucky to host 2 deep dive conversations, one with Bo Wang, from Jina AI, and we covered embeddings, vector latent spaces, dimensionality, and how they retrained BERT to allow for longer sequence length, it was a fascinating conversation, even if you don't understand what embeddings are, it's well worth a listen.

And in the second part, I had the pleasure to have Abubakar Abid, head of Gradio at Hugging Face, to talk about Gradio, it's effect on the open source community, and then joined by Yuichiro and Xenova to talk about the next iteration of Gradio, called Gradio-lite that runs completely within the browser, no server required.

A fascinating conversation, if you're a machine learning engineer, AI engineer, or just someone who is interested in this field, we covered a LOT of ground, including Emscripten, python in the browser, Gradio as a tool for ML, webGPU and much more.

I hope you enjoy this deep dive episode with 2 authors of the updates this week, and hope to see you in the next one.

P.S - if you've been participating in the emoji of the week, and have read all the way up to here, your emoji of the week is 🦾, please reply or DM me with it 👀

Timestamps and full transcript for your convenience

## [00:00:00] Intro and brief updates

## [00:02:00] Interview with Bo Wang, author of Jina Embeddings V2

## [00:33:40] Hugging Face open sourcing a fast Text Embeddings Inference server

## [00:36:52] Data Provenance Initiative at dataprovenance.org

## [00:39:27] LocalLLama effort to compare 39 open source LLMs + GPT4

## [00:53:13] Gradio Interview with Abubakar, Xenova, Yuichiro

## [00:56:13] Gradio effects on the open source LLM ecosystem

## [01:02:23] Gradio local URL via Gradio Proxy

## [01:07:10] Local inference on device with Gradio - Lite

## [01:14:02] Transformers.js integration with Gradio-lite

## [01:28:00] Recap and bye bye

Full Transcription:

[00:00:00] Alex Volkov: Hey, everyone. Welcome to ThursdAI. My name is Alex Volkov, and I'm very happy to bring you another weekly installment of ThursdAI. This week was actually a mild one in terms of updates, believe it or not. We didn't get a new state of the art open source large language model this week. However, we did get a new state of the art embeddings model, and we're going to talk about that. We got very lucky that one of the authors of this embeddings model, called Jina Embeddings V2, Bo Wang, joined us on stage and gave us a masterclass in embeddings and shared some very interesting things about it, including some stuff they haven't shared yet. So definitely worth a listen. Additionally, we covered the Data Provenance Initiative, which helps sort and validate licenses for over 1800 public datasets, a massive effort led by Shayne Longpre with assistance from many folks, including a friend of the pod, Enrico Shippole.

[00:01:07] We also covered the massive effort by another user named WolframRavenwolf on the LocalLLama subreddit. That effort evaluated and compared 39 open source models ranging from 7 billion parameters to 70 billion parameters and threw in a GPT-4 comparison as well. Not surprisingly, the best model right now is the one we covered last week from friend of the pod Teknium, called OpenHermes 7B.

[00:01:34] Two additional updates we've covered. One of them is Gladia AI, a company that offers transcription and translation APIs, who released their version of Whisper over WebSockets, so live transcription, and I covered it on X with a reaction video. I'll add that link in the show notes. It allows developers like you to stream speech to text with very low latency and high quality, and it's multilingual as well. So if you're building an agent that your users can talk to, definitely give this a try. And finally, Segmind, a company that just decided to open source a distilled version of SDXL, making it 50% smaller in size and, in addition to that, 60% faster in generation speed. The links to all these will be in the show notes.

[00:02:23] But this week I was lucky to host two deep dives, one with Bo Wang, which I mentioned. We covered embeddings, vector latent spaces, dimensionality, and how they retrained the BERT model to allow for a longer sequence length. It was a fascinating conversation. Even if you don't understand what embeddings are, it's well worth the listen. I learned a lot, and now I hope you will as well. In the second part, I had the pleasure of having Abubakar Abid, the head of Gradio at Hugging Face, to talk about Gradio: what it is, and its effect on the open source community. We were then joined by Yuichiro and Xenova to talk about the next iteration of Gradio, called Gradio-lite, that runs completely within the browser, no server required. We also covered a bit of what's coming to Gradio in the next release, on October 31st.

[00:03:15] A fascinating conversation. If you're a machine learning engineer, AI engineer, or just somebody who's interested in this field, you've probably used Gradio even if you haven't written any Gradio apps; every model on Hugging Face usually gets a Gradio demo.

[00:03:30] And we covered a lot of ground, including Emscripten, Python in the browser, Gradio as a tool for machine learning, WebGPU, and so much more.

[00:03:38] Again, a fascinating conversation. I hope you enjoy this deep dive episode. I'm humbled by the fact that sometimes the people who produced the updates we cover actually come to ThursdAI and talk to me about the things they released. I hope this trend continues, and I hope you enjoy this deep dive episode. I'll see you in the next one. And now I give you ThursdAI, October 26. Oh, awesome. It looks like Bo, you joined us. Let's see if you're connected to the audience. Can you unmute yourself, and let's see if we can hear you?

[00:04:22] Bo Wang: Hi, can you hear me? Oh, we can hear you fine, awesome. this, this, this feature of, of Twitter.

[00:04:30] Alex Volkov: That's awesome. This usually happens: folks join, it's their first space, and then they can't hear us. And so let me just do a little... Maybe, actually, maybe you can do it, right? Let me let you present yourself.

[00:04:42] I think I followed you a while ago, because I've been mentioning embeddings and the MTEB leaderboard on Hugging Face for a while. And, obviously, embeddings are not a new concept, right? We started with Word2Vec ten years ago, but now, with the rise of LLMs, and now with the rise of AI tools and many people wanting to understand the similarity between the user query and an actual thing they stored in some database, embeddings have seen a huge boon.

[00:05:10] And also we saw all the vector databases pop up like mushrooms after the rain. I think Spotify just released a new one. And my tweet was like, hey, do we really need another vector database? But Bo, I think I started following you because you mentioned that you were working on something that's

[00:05:25] coming very soon, and finally this week it was released. So actually, thank you for joining us, Bo, and thank you for doing your first ever Twitter space. How about we start with your introduction of who you are and how you are involved with this effort, and then we can talk about Jina.

[00:05:41] Bo Wang: Yes, sure. Basically I have a very different background. I'm originally from China, but my bachelor's was more related to text retrieval. I have a retrieval background rather than a pure machine learning background, I would say. Then I came to Europe. I came to the Netherlands like seven or eight years ago as an international student.

[00:06:04] And I was really, really lucky and met my supervisor there. She basically guided me into the world of multimedia information retrieval, multimodal information retrieval, this kind of thing. And that was around 2015 or 2016. So I also picked up machine learning there, because when I was doing my bachelor's it wasn't really hot at that moment.

[00:06:27] That was like 2013, 2014. Then machine learning became really good, and I was really motivated: okay, how can I apply machine learning to search? That is my biggest motivation. So when I was doing my master's, I collaborated with my friends in the US, in China, in Europe. We started with a project called MatchZoo.

[00:06:51] And at that time, embeddings for search were just nothing. We basically built open source software that became, at that time, the standard for neural retrieval or neural search, this kind of thing. Then when BERT got released, our project basically got quiet, because everyone's focus shifted to BERT, but it's quite interesting.

[00:07:16] Then I graduated and started to work as a machine learning engineer for three years in Amsterdam. Then I moved to Berlin and joined Jina AI three years ago as a machine learning engineer. Since then I've basically always been doing neural search, vector search, how to use machine learning to improve search. That is my biggest motivation.

[00:07:37] That's it.

[00:07:38] Alex Volkov: Awesome. Thank you. And thank you for sharing with us and coming up. And Jina AI is the company that you're now working at, and the embeddings model that we're going to talk about is from Jina AI. I will just mention the one thing that I missed in my introduction, which is the reason why embeddings are so hot right now.

[00:07:53] The reason why vector DBs are so hot right now is that pretty much everybody does RAG, Retrieval Augmented Generation. And obviously, for that, you have to store some information as embeddings, you have to do some retrieval, you have to figure out how to do chunking of your text, you have to figure out how to do the retrieval, all these things.

[00:08:10] Many people understand that even though in-context learning is this incredible thing for LLMs, and you can do a lot with it, you may not want to spend that many tokens from your allowance, right? Or you may not have a big enough context window in some other LLMs. So embeddings are a way for us to do one of the main ways to interact with these models right now, which is RAG.
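To make that flow concrete, here is a minimal sketch of the RAG loop being described, using plain numpy. The `embed` function is a toy stand-in for whichever embedding model you actually pick (OpenAI's API, Jina's open model, BGE, and so on); nothing here is tied to a specific library or vector database.

```python
import numpy as np

def embed(texts):
    """Toy embedding: hashed bag-of-words, L2-normalized.
    Stands in for a real embedding model (OpenAI, Jina, BGE, ...)."""
    dim = 256
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            out[i, hash(tok) % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-9)

# 1. Chunk your documents and embed the chunks once, up front.
chunks = [
    "The office is closed on Fridays.",
    "Support can be reached at the help desk email.",
    "Lunch is served at noon in the cafeteria.",
]
chunk_vecs = embed(chunks)                    # shape: (num_chunks, dim)

# 2. At query time, embed the user query and score it against every chunk.
query = "When is the office closed?"
query_vec = embed([query])[0]
scores = chunk_vecs @ query_vec               # cosine similarity (vectors are normalized)

# 3. Take the top-k chunks and paste them into the LLM prompt as context.
top_k = np.argsort(scores)[::-1][:2]
context = "\n".join(chunks[i] for i in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

The heavy lifting happens inside the embedding model; the retrieval step itself is just a similarity search over vectors, which is why the quality and context length of the embeddings matter so much.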

[00:08:33] And I think we've covered open source embeddings compared to OpenAI's ada-002 embedding model a while ago on ThursdAI. And I think it's been clear that models like GTE and BGE, I think those are the top ones, at least before you guys released yours, on the Hugging Face big embedding model leaderboard, and thank you Hugging Face for doing this leaderboard.

[00:09:02] They are great for open source, but I think recently it's been said that they're lacking some context length. And Bo, if you don't mind, please present what you guys open sourced this week, or released this week, I guess it's open source as well. Please talk through Jina Embeddings v2 and how it differs from everything else we've talked about.

[00:09:21] Bo Wang: Okay, good. Basically, we've been working on embeddings for, how can I say, maybe two and a half years. But previously we were doing it at a much smaller scale. Basically we built all the algorithms, all the platforms, even a cloud fine-tuning platform to help people build better embeddings. So there is a not really open source, but a closed source project called Finetuner, which we built to help users build better embeddings.

[00:09:53] But we found that, okay, maybe we were too early, because people were not even using embeddings. How could they fine-tune embeddings? So we decided to make a move. We basically scaled up our, how can I say, ambition. We decided to train our own embeddings. So six months ago, we started to train from scratch, but not really from scratch, because in embedding training you normally have to train in two stages.

[00:10:23] In the first stage, you need to pre-train on a massive scale of text pairs. Your objective is to bring these text pairs as close as possible, because these text pairs should be semantically related to each other. In the next stage, you need to fine-tune with carefully selected triplets, all this kind of thing.

[00:10:43] So we basically started from scratch by collecting data. I think it was like six months ago; we were working with three to four engineers together, basically scouting every possible pair from the internet. Then we created like one billion, 1.2 billion sentence pairs from there. And we started to train our model based on T5.

[00:11:07] Basically it's a very popular encoder-decoder model on the market. But if you look at the MTEB leaderboard, or all the models on the market, the reason why they only support 512 sequence length is that they are constrained by the backbone itself. Okay, and we figured out another reason after we released the V1 model.

[00:11:31] Basically, if you look at the leaderboard, the Massive Text Embedding Benchmark leaderboard, that is the one Alex just mentioned, sorry, but it's really bad, because everyone is trying to overfit the leaderboard. That naturally happens, because if you look at BGE, GTE, the scores would never be that high if you don't add the training data into the training set. That's really bad.

[00:12:00] And we decided to take a different approach. Okay, the biggest problem we want to solve first is improving the quality of the embeddings. The second thing we want to solve is enabling users to have a longer context length. If we want to let users have a longer context length, we have to rework the BERT model, because basically every embedding model's backbone is from BERT or T5.

[00:12:27] So we basically started from scratch. Why not just borrow the latest research from large language models? Every large language model wants a large context. Why not borrow those research ideas into masked language modeling? So we basically borrowed some ideas, such as rotary position embeddings or ALiBi, maybe you've heard of them, and reworked BERT.

[00:12:49] We call it JinaBERT. So basically now JinaBERT can handle much longer sequences. So we trained BERT from scratch; now BERT has become a byproduct of our embeddings. Then we use this JinaBERT to contrastively train the models on the semantic pairs and triplets, which finally allows us to encode 8K content.

[00:13:15] Alex Volkov: Wow, that's impressive. Just to react to what you're saying: pretty much everyone uses BERT, or at least used BERT, right? At least on the MTEB leaderboard, I've also noticed many other examples that use BERT or DistilBERT and stuff like this. What you're saying, if I'm understanding correctly, is that this was the limitation on sequence length

[00:13:36] for other embedding models in open source, right? And the OpenAI one, which is not open source, does have 8,000 sequence length. Basically, sequence length, if I'm explaining it correctly, is just how much text you can embed without chunking.

[00:13:51] Yes. And you're basically saying that you guys saw this limitation and then retrained BERT to use rotary embeddings. We've talked about rotary embeddings multiple times here; we had the folks behind the YaRN paper for extending context windows. For ALiBi, we follow Ofir Press.

[00:14:08] I don't think Ofir ever joined ThursdAI, but Ofir, if you hear this, you're welcome to join as well. So ALiBi is another way to extend context windows, and I think the Mosaic folks used ALiBi, and some other folks as well. Bo, could you speak more about borrowing the concepts from there, retraining BERT into JinaBERT, and whether or not JinaBERT is also open source?

[00:14:28] Bo Wang: Oh, we actually want to make JinaBERT open source, but I need to align with my colleagues. That's really a decision to be made. And the idea is quite simple. If you don't know it, I don't want to dive too much into technical details, but basically the idea of ALiBi is to remove the position embeddings from large language model pre-training.

[00:14:55] And the ALiBi technique allows us to train on shorter sequences but inference on very long sequences. So in the end, I think, if my memory is correct, the author of the ALiBi paper basically trained the model on 512 sequence length and 1,024 sequence length, but he was able to inference at 16K sequence length.

[00:15:23] If you expand it further, you can't, because that's the limitation of the hardware, the limitation of the GPU. So he actually tested 16K sequence length. So what we did is just borrow this idea from autoregressive models into masked language models: integrate ALiBi, remove the position embeddings from BERT, and add the ALiBi slopes and all the ALiBi stuff back into BERT.

[00:15:49] And we just borrowed how we train BERT, or some things from RoBERTa, and retrained BERT. I never imagined BERT could be a byproduct of our embedding model, but this happened. We could open source it; maybe I have to discuss that with my colleagues.
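For readers who want to see the idea Bo is describing in code, here is a rough, illustrative sketch of ALiBi as described in the paper: drop learned position embeddings and instead add a head-specific linear penalty on token distance directly to the attention scores. The shapes, the slope formula restricted to head counts that are powers of two, and the symmetric |i - j| penalty for a bidirectional, BERT-style setup are assumptions for illustration, not JinaBERT's actual implementation.

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Distance-based attention bias in the spirit of ALiBi (illustrative sketch).

    Each head h gets a slope m_h from a geometric sequence; the bias for a query
    at position i attending to a key at position j is -m_h * |i - j|. The
    symmetric |i - j| penalty is the natural bidirectional (BERT-style) variant.
    """
    # Geometric slopes; for 8 heads this gives 1/2, 1/4, ..., 1/256.
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = np.arange(seq_len)
    distance = np.abs(positions[None, :] - positions[:, None])   # (seq_len, seq_len)
    return -slopes[:, None, None] * distance[None, :, :]         # (heads, seq, seq)

# The bias is simply added to the raw attention scores before softmax, e.g.
#   scores_h = (q @ k.T) / sqrt(d) + alibi_bias(seq_len, num_heads)[h]
bias = alibi_bias(seq_len=6, num_heads=8)
print(bias.shape)   # (8, 6, 6): nearby tokens are penalized less than distant ones
```

Because the penalty is a simple function of distance rather than a learned table of positions, nothing breaks when you run inference on sequences longer than anything seen in training, which is the property being borrowed here.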

[00:16:09] Alex Volkov: Okay. So when you talk to your colleagues, tell them that first of all, you already said that you may do this on ThursdAI Stage.

[00:16:15] So your colleagues are welcome also to join. And when you open source this, you guys are welcome to come here and tell us about this. We love the open source. The more you guys do, the better. And the more it happens on ThursdAI Stage, the better, of course, as well. Bo, you guys released the Jina Embedding Version 2, correct?

[00:16:33] Jina Embeddings Version 2 has a sequence length of 8K tokens. So just for folks in the audience, 8,000 tokens is, I want to say, maybe around 6,000 words in English, right? And different for other languages as well. Could you talk about multilinguality as well? Is it multilingual, or is it only English?

[00:16:53] How does that appear within the embedding model?

[00:16:57] Bo Wang: Okay, actually, our Jina Embeddings V2 is only English, so it's a monolingual embedding model. If you look at the MTEB benchmark, or all the public multilingual models, they are multilingual across many languages at once. But to be frank, I don't think this is a fair solution for that.

[00:17:18] I think at least every major language deserves its own embedding model.

[00:17:24] We decided to choose another, harder way. We will not train a multilingual model, but we will train bilingual models. Our first targets will be German and Spanish. What we are doing at Jina AI is we basically fix our English embedding model as it is, just keep it as is, but we are continuously adding German data, adding Spanish data into the embedding model.

[00:17:51] And our embedding model cares about two things. We make it bilingual, so it's either German-English, or Spanish-English, or Japanese-English, whatever. And what we are doing is we want to build this embedding model so it also works monolingually. So imagine you have a German-English embedding model.

[00:18:12] If you search in German, you'll get German results. If you use English, you'll get English results. But we also care about the cross-linguality of this bilingual model. So imagine you encode two sentences, one in German, one in English, which have the same meaning; we also want these vectors to be mapped into a similar semantic space.

[00:18:36] Because I'm a foreigner myself. Imagine I buy some stuff in the supermarket; sometimes I have to translate, using Google Translate, for example, milk into Milch in German, and then put it into the search box. I really want this bilingual model to happen. And I believe every major language, at least, deserves such an embedding model.

[00:19:03] Alex Volkov: Absolutely. And thanks for clarifying this, because one of the things that I often talk about here on ThursdAI, as the founder of Targum, which translates videos, is just how much language barriers prevent folks from conversing with each other. And definitely embeddings are the way people extend memories, right?

[00:19:21] So it's a huge, huge thing that you guys are working on, and especially helpful. On sequence length, I think we have a question from the audience: what does the sequence length actually allow people to do? I guess Jina, and I've worked with some other folks in the embedding space. Could you talk about what the longer sequence length now unlocks for people who want to use open source embeddings?

[00:19:41] Obviously, my answer here is, well, OpenAI's embeddings are the most widely used, but those you have to do online: you have to send your data to OpenAI, you have to have a credit card with them, blah, blah, blah, you have to be from a supported country. Could you talk a little bit about what this sequence length unlocks once you guys release something like this?

[00:20:02] Bo Wang: Okay, actually, we didn't think too much about applications. Most of the vector embedding applications you can imagine are search and classification. You build another layer, I don't know, a classifier, to classify items based on the representation. You can do some clustering. You can do some anomaly detection on text.

[00:20:22] This is something I can imagine. But the most important thing, I have to be frank with you, because we are writing a technical report as well, something like a paper we'll maybe submit to an academic conference: longer embeddings don't really always work. That is because sometimes, if the important message is at the front of the document you want to embed, then it makes the most sense to just encode, let's say, 256 tokens

[00:20:53] or 512. But sometimes, if you have a document where the answer is in the middle or at the end, then you will never find it if the message is truncated. Another situation we find very interesting is clustering tasks. Imagine you want to visualize your embeddings: a longer sequence length almost always helps for clustering tasks.

[00:21:21] And to be frank, I don't care too much about the application. I think what we're offering is, how can I say, like a key. We unlock this 512 sequence length limit up to 8K, and people can explore it. Let's say I only need 2K; then people just set the tokenizer max length to 2K.

[00:21:44] Then embed, based on their needs. I just don't want people to be limited by the backbone, by the 512 sequence length. I think that's the most important thing.
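Here is a rough sketch of what "set the tokenizer max length and embed" can look like with the vanilla transformers mean-pooling recipe. The model id and the trust_remote_code flag follow the public jina-embeddings-v2 model card as I recall it, so treat the exact arguments as assumptions and check the card for the currently recommended usage.

```python
# Sketch: embed texts with a chosen max length via transformers + mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinaai/jina-embeddings-v2-base-en"   # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

texts = [
    "How do I reset my password?",
    "Password reset instructions are on the settings page.",
]

# Pick whatever token budget you need, up to the 8K the backbone supports.
batch = tokenizer(texts, padding=True, truncation=True, max_length=2048, return_tensors="pt")

with torch.no_grad():
    token_states = model(**batch).last_hidden_state        # (batch, tokens, dim)

# Mean-pool over real (non-padding) tokens, then normalize for cosine similarity.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (token_states * mask).sum(1) / mask.sum(1)
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
print(embeddings @ embeddings.T)                            # pairwise cosine similarities
```

The point Bo is making is exactly the `max_length` argument: the model no longer forces 512 on you, so you choose the truncation point that fits your documents and your latency budget.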

[00:21:55] Alex Volkov: That's awesome. Thank you for that, and thank you for your honesty as well. I love it, I appreciate it: the fact that there's research and there's application, and you don't necessarily have to be limited by having an application set in mind.

[00:22:07] We do research because you're just opening up doors, and I love hearing that. Bo, maybe the last thing that I would love to talk to you about, as the expert here, is the topic of dimensions. So dimensionality with embeddings, I think, is very important. OpenAI, I think, has one of the highest ones.

[00:22:21] The thing that they give us is like 1,200, as mentioned as well. You guys, I think Jina is around 500 or so, is that correct? Could you talk a bit about that concept in broad strokes for people who may not be familiar? And then also talk about why the state of the art, OpenAI, is so far ahead?

[00:22:39] And what will it take to get the open source embeddings also to catch up in dimensionality?

[00:22:46] Bo Wang: You mean the dimensionality of the vectors? Okay, basically we follow a very standard BERT size. The only thing we modified is actually the ALiBi part and some training parts.

[00:22:58] Our small model's dimensionality is 512, the base model is 768, and we also have a large model that hasn't been released, because the training is too slow. We have so much data to train on; even though the model size is small, we have so much data. The large model's dimensionality is 1,024. And if my memory is correct, OpenAI's ada-002 embedding

[00:23:23] has a dimensionality of 1,536, something like that, which is a very strange dimensionality, I have to say. But I would say, with dimensionality, longer might be better or more expressive, but shorter means that when you are doing vector search, it's going to be much faster.

[00:23:48] So it's something you have to balance, depending on whether you think the query speed, or the retrieval speed, or whatever, is more important to you. And if I remember correctly, some of the vector databases make money by the dimensionality, let's say; they charge you by the dimensionality, so it's actually quite expensive if your dimensionality is too high.

[00:24:13] So it's a balance between expressiveness, the speed, and the cost you want to invest. It's very hard to determine, but I think 512, 768, and 1,024 are very common, as with BERT.

[00:24:34] Alex Volkov: So great to hear that a bigger model is also coming, but it hasn't been released yet. So there's like the base model and the small model for embeddings, and we're waiting for the next one as well.

[00:24:46] I wanted to ask you to maybe simplify the concept of dimensionality for the audience. What is the difference between embeddings stored with 512 dimensions and, like, the 1,536 or whatever OpenAI does? What does it mean for quality? You mentioned the speed, right? It's easier to look up nearest neighbors within the 512-dimension space. But what does it actually mean for the quality of lookup, for the other ways that strings can be compared? Could you maybe simplify the whole concept, if possible, for people who don't speak embeddings?

[00:25:19] Bo Wang: Okay maybe let me quickly start with the most basic version.

[00:25:24] Imagine you type something in the search box right now. When doing the matching, it's actually also an embedding, but, to make a simple version of it, it's a binary embedding. Imagine there are 3,000 words in English. There are many more, definitely, but imagine it's 3,000 words in English; then the vector has 3,000 dimensions.

[00:25:48] Then what the current solution for searching or matching does is just: if your document has a token, then its occurrence will be one, and if your query has that token, it will match your document's token. But it's also about the frequency with which it appears, and how rare it is.

[00:26:12] So the current solution is basically matching by the English word. With a neural network, if you know about, for example, ResNet or a lot of different classification models, they basically output the class of an item, but if you chop off the classification layer, it will give you a vector.

[00:26:36] Basically this vector is the representation of the information you want to encode. It's a compressed version of the information in a certain dimensionality, such as 512, 768, something like this. So it's a compressed list of continuous numbers, which we normally call dense vectors,

[00:26:57] because it's much more, how can I say in English, dense, right? Compared to the traditional way we store vectors, which is much more sparse: there are a lot of zeros and a lot of ones, because zero means a token doesn't exist and one means it exists. When a one matches, then there is a match, and you've got the search result.

[00:27:16] So these dense vectors capture more of the semantics, but if you match only by the occurrence of a token or a word, then you might lose the semantics.
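A tiny illustration of the contrast Bo is drawing, with made-up sentences and made-up dense vectors (not the output of any real model): a sparse bag-of-words match only fires when the exact token overlaps, while dense vectors can score paraphrases as close even with zero shared words.

```python
import numpy as np

# Sparse, lexical view: one dimension per vocabulary word, 1 if the word occurs.
vocab = ["where", "can", "i", "buy", "milk", "purchase", "dairy"]
def sparse(text):
    words = text.lower().split()
    return np.array([1.0 if w in words else 0.0 for w in vocab])

query, doc = "where can i buy milk", "purchase dairy"
print(sparse(query) @ sparse(doc))   # 0.0 -> no shared tokens, lexical matching fails

# Dense, semantic view: a model compresses meaning into a few hundred numbers,
# so paraphrases land near each other. These 4-d vectors are invented for illustration.
dense_query = np.array([0.7, 0.1, 0.6, 0.2])
dense_doc   = np.array([0.6, 0.2, 0.7, 0.1])
cos = dense_query @ dense_doc / (np.linalg.norm(dense_query) * np.linalg.norm(dense_doc))
print(round(float(cos), 3))          # high similarity despite zero word overlap
```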

[00:27:31] Alex Volkov: Thank you. More dimensions, basically, if I'm saying it correctly, just means more axes of similarity, so more things that two strings or tokens can be similar on, and this basically means a higher match rate for more similar things. And I think the basics are covered by Simon Willison in the first pinned tweet here; Simon Willison did a basic intro into what embedding dimensions mean and why they matter.

[00:28:00] And I specifically love the fact that there's arithmetic that can be done. I think somebody wrote a paper, even before this whole LLM thing, where if you take the embedding for Paris, subtract the embedding for France and add the embedding for Germany, you get something closer to Berlin, for example, right?

[00:28:19] So there are concepts inside these things such that even arithmetic works, and if you take King and subtract male, then you get something closer to Queen, and stuff like this. It's really, really interesting. And also, Bo, you mentioned visualization as well. It's really impossible to visualize

[00:28:36] 1,024, et cetera, dimensions, right? We humans perceive maybe three, maybe three and a half, four with time, whatever. And usually what happens is those many dimensions get downscaled to 3D in order to visualize neighborhoods. And I think we've talked with folks from Arize; they have software called Phoenix that allows you to visualize embeddings for clustering and for semantics.

[00:29:02] Atlas does this as well, right? Nomic AI's Atlas does this as well; you can provide embeddings and see clustering for concepts, and it's really pretty cool. If you haven't played with this, if you've only done vector DBs and stored your stuff after chunking but never visualized how it looks, I strongly recommend you do. And, well, thank you so much for joining us and explaining the internals and sharing with us some exciting things about what's to come. JinaBERT is hopefully coming, a retrained version of BERT, which, how should I say, it's hard for me to define it, but I see it everywhere; it's the big backbone of a lot of NLP tasks, and it's great to see that you guys, first of all, retrained it for longer sequences using tricks like ALiBi in place of positional embeddings, and I'm hoping to see some open source action from this, but also that Jina Embeddings' large model is coming as well, with more dimensions. Waiting for that; hopefully you guys didn't stop training it. And I just want to tell folks why I'm excited for this, and this kind of takes us to the next

[00:30:08] point as well, because, while I love OpenAI, I honestly do, I'm going to join their Dev Day, I'm going to report from their Dev Day and tell you all the interesting things that OpenAI does, we've been talking about, and we'll be talking today about, local inference, about running models on the edge, about running models of your own.

[00:30:28] Nisten is here; he even works on some bootable stuff that you can run completely off the grid. And so far, we've been focused on open source LLMs, for example, right? I see Far El in the audience from Skunkworks, and many other fine-tuners, like Teknium, Alignment Lab, all these folks working on local LLMs, and they haven't gotten to GPT-4 level yet.

[00:30:51] We're waiting for that, and they will. But the whole point of them is, you run them locally, they're uncensored, you can do whatever you want, you can fine-tune them on whatever you want. However, the embeddings part is the glue that connects them to an application, and the reason is that there's only so much context window; also, context window is expensive, and even if, theoretically, the YaRN paper, whose authors we've talked with, allows you to extend the context window to 128,000 tokens, the hardware requirements for that are incredible, right?

[00:31:22] Everybody in the world of AI engineers switched to retrieval augmented generation. Basically, instead of shoving everything into the context, they switched to: hey, let's use a vector database. Let's say Chroma, or Pinecone, or Weaviate, all of those, Vectorize from Cloudflare, and the other one from Spotify, I forget its name, or even Supabase now has one.

[00:31:43] Everybody has a vector database these days, it seems, and the reason for that is that all the AI engineers now understand that you need to turn some text into embeddings and store them in some database. And many pieces of that still required internet, required OpenAI API calls, required credit cards, all these things.

[00:32:03] And I think it's great that we've finally gotten to a point where, first of all, there are embeddings that match whatever OpenAI has given us, and now you can run them locally as well. You don't have to go to OpenAI; if you don't want to host them, you can probably still run them. I think Jina Embeddings' base model is very tiny.

[00:32:20] Like, the small model is 770 megabytes, I think. Maybe a little bit more, if I'm looking at this correctly.

[00:32:27] Bo Wang: Sorry, it's half precision. So you need to double it to make it FP32.

[00:32:33] Alex Volkov: Oh yeah, it's half precision. So it's already quantized, you mean?

[00:32:37] Bo Wang: Oh no, it's just stored as FP16,

[00:32:39] Alex Volkov: if you store it as FP16.

[00:32:43] Oh, if you store it as FP16. But the whole point is, the next segment on ThursdAI today is going to be less about updates and more about these very specific things. We've been talking about local inference as well, and these models are tiny; you can run them on your own hardware, on the edge via Cloudflare, let's say, or on your computer.

[00:32:58] And you can now go almost end to end, application-wise: from the point of your user inputting a query, to embedding this query, running a match, a vector search, KNN or whatever nearest neighbor search you want for that query, and retrieving the results, all from local open source. You can basically go offline.

[00:33:20] And this is what we want in the era of upcoming regulation about what AI can and cannot be, and the era of open source models getting better and better. We talked last week about how Zephyr, and I think the new Mistral fine-tune from Teknium, are also matching some of GPT-3.5. All of those models you can download, and nobody can tell you not to run inference on them.

[00:33:40] Hugging Face open sourcing a fast Text Embeddings Inference Server with Rust / Candle

[00:33:40] Alex Volkov: But the actual applications, they still require the web, or they used to. And now I'm loving this new move towards even the application layer, even the RAG systems, retrieval augmented generation, even the vector databases, and even the embeddings now coming to open source, coming to your local computer.

[00:33:57] And this will just mean more applications, either on your phone or your computer, and I absolutely love that. Bo, thank you for that, and thank you for coming to the stage here and talking about the things that you guys open sourced. Hopefully we'll see more open source from Jina, and everybody should follow you, and Jina as well.

[00:34:13] Thank you. It looks like... Thank you for joining. I think the next thing that I want to talk about is actually in this vein as well. Let me go find it. Of course, we love Hugging Face, and the thing that's already on top, if you look at the last pinned tweet, is a tweet from Jerry Liu from LlamaIndex, obviously.

[00:34:33] We're following Jerry and whatever they're building and doing over at LlamaIndex, because they implement everything super fast; I think they also added support for Jina extremely fast. He talks about this thing where Hugging Face open sourced for us something in Rust and... Candlestick?

[00:34:51] Candlelight? Something like that? I forget; it's their framework on top of Rust. Basically, they open sourced a server called Text Embeddings Inference that you can run on your hardware, on your Linux boxes, and basically get the same thing that you get from OpenAI embeddings.

[00:35:07] Because embeddings, it's just one thing, but it's a model, and you could use this model with Transformers, but it wasn't as fast. And as Bo previously mentioned, there are considerations of latency for user experience, right? If you're building an application, you want it to be as responsive as possible.

[00:35:24] You need to look at all the places in your stack and say, hey, what slows me down? For many of us, the actual inference, let's say using GPT-4, waiting on OpenAI to respond and stream that response, is what slows many applications down. But for many people who do embeddings, let's say you have a chat or a search interface, you need to embed every query the user sends you.

[00:35:48] And one such slow point there is: how do you actually embed this? So it's great to see that Hugging Face is working on that and improving it. You previously could do this with Transformers, and now they've released this specific server for embeddings called Text Embeddings Inference.

[00:36:04] And I think it's four times faster than the previous way to run this, and I absolutely love it. So I wanted to highlight this in case you are interested. You don't have to; you can use OpenAI embeddings. Like we said, we love OpenAI, it's very cheap. But if you are interested in doing embeddings the local way, if you want to go end to end, completely offline, if you want to build an offline application, using their inference server I think is a good idea.
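If you want to try that local route, here is roughly what it looks like: run the Text Embeddings Inference container and POST to its /embed endpoint. The image name, flags, model choice, and endpoint follow the project's README as I remember it, so double-check them against the current docs before relying on this sketch.

```python
# Sketch of calling a locally hosted Text Embeddings Inference (TEI) server.
# Start the server first, e.g. (verify the image tag/flags in the TEI README;
# there are CPU-only image tags if you don't have a GPU):
#   docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:latest \
#       --model-id BAAI/bge-base-en-v1.5
import requests

resp = requests.post(
    "http://localhost:8080/embed",                      # /embed per TEI's documented API
    json={"inputs": ["How do I reset my password?"]},
)
resp.raise_for_status()
vectors = resp.json()                                   # one embedding (list of floats) per input
print(len(vectors), len(vectors[0]))
```

Everything then stays on your own box: the query never leaves your machine, and the rest of the RAG loop can run against whatever local vector store you prefer.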

[00:36:29] And also, it shows what Hugging Face is doing with Rust (and Candle, that's the framework name I was reaching for), definitely a great effort from Hugging Face, and yeah, just wanted to highlight that. Let's see. Before we are joined by the Gradio folks, and I think there are some folks in the audience from Gradio who are ready to come up here and talk about local inference, with 15 minutes left,

[00:36:52] Data Provenance Initiative at dataprovenance.org

[00:36:52] Alex Volkov: I wanted to also mention the Data Provenance Initiative. Let me actually find this announcement and quickly paste it here. I was hoping that Enrico could be here. There's a guy named Shayne Longpre,

[00:37:05] and he released this massive, massive effort, together with many people. Basically this effort is called the Data Provenance Initiative, and it now lives at dataprovenance.org. And hopefully somebody can maybe send me the direct link to the tweet so I can add it.

[00:37:23] It is a massive effort to take 1,800 instruct and alignment datasets that are public and go through them to identify multiple things. You can filter them, exclude them, you can look at creators, and, most importantly, you can look at licenses. Why would you do this? Well, I don't know if somebody who builds an application necessarily needs this, but for everybody who wants to fine-tune models, the data is the most important key, and building datasets and running them through your fine-tuning efforts is basically the number one thing that many people in the fine-tuning community do, right?

[00:38:04] Data wranglers. And now, thank you, Nisten, thank you so much, the tweet is now pinned to the top of the space, the nest, whatever it's called. A friend of the pod, Enrico Shippole, who we've talked with previously in the context of extending, I think, Llama to first 16K and then 128K,

[00:38:24] and who I think is part of the team on the YaRN paper as well, joined this effort, and I was hoping Enrico could join us to talk about it. But basically, if you're doing anything with data, this seems like a massive, massive effort. Many datasets from LAION, and we've talked about LAION, and Alpaca, GPT4All, Gorilla, all these datasets.

[00:38:46] It's very important, when you release your model as open source, that you have the license to actually release it. You don't want to get exposed, you don't want to get sued, whatever. And if you're finding datasets and creating different mixes to fine-tune different models, this is a very important thing.

[00:39:03] And we want to shout out Shayne Longpre, Enrico, and everybody who worked on this, because I just love these efforts for open source, for the community. It makes it easier to fine-tune, to train models; it makes it easier for us to advance and get better and smaller models, and it's worth celebrating, and ThursdAI is the place to celebrate this, right?

[00:39:27] LocalLLama effort to compare 39 open source LLMs + GPT4

[00:39:27] Alex Volkov: On the topic of extreme, how should I say, efforts happening in the community, on the same topic I want to add another one, and this one I think I have a way to pull up, so give me just a second... yes. A Twitter user named Wolfram Ravenwolf, who is a participant in the LocalLLama community on Reddit, and whose post is now pinned to the top of the nest, did this massive effort of comparing open source LLMs: he tested 39 different models ranging from 7 billion parameters to 70 billion, and also compared them to ChatGPT and GPT-4.

[00:40:06] And I just want to circle back to something we've said in the previous space as well, and I welcome folks on stage to jam in here. I've also seen the same kind of point from Hugging Face folks; I think Glenn said the same thing. It's really unfair to take an open source model like Mistral 7B and then start comparing it to GPT-4.

[00:40:26] It's unfair for several reasons. But also, I think it can obscure for some people, when they do this comparison, just how far we've come in the past year in open source models. OpenAI has the infrastructure, they're backed by Microsoft, they have the pipelines to serve these models way faster.

[00:40:47] And also, those models don't run on local hardware, they don't run on one GPU. It's a whole amazing MLOps effort to bring you this speed. When you're running open source models locally, they're small, there are drawbacks, and there are tradeoffs that you have to bake into your evaluation.

[00:41:09] So comparing to GPT-4, which is super general in many, many things, will just lead to disappointment. However, and we've been talking about this with other open source models, if you have a different benchmark in your head, if you're comparing open source to open source, then it's a completely different ballgame.

[00:41:26] And then you start seeing things like, hey, we're noticing that a 7 billion parameter model is beating a 70 billion one. We're noticing that size is not necessarily king, because, if you guys remember, three months ago, I want to say, we talked about Falcon 180B. 180B was like three times the size of the next largest model.

[00:41:47] And it was incredible that Falcon open sourced this, but then nobody was really able to run 180B because it's huge. But also, once we did run it, we saw that the differences between it and Llama are not great at all, maybe a few percentage points on the evaluations.

[00:42:04] However, the benefits that we see are from local, tinier and tinier models, like the 7B Mistral, for example, which is the one that the fine-tuners of the world now prefer to everything else. And so, when you're about to evaluate whatever next model comes up that we're going to talk about, please remember that comparing to big companies backed by billions of dollars, running on massive distributed hardware, is just going to lead to disappointment.

[00:42:34] However, when you do comparisons like the guy whose post is now pinned did, this is the way to actually do it. However, on specific tasks, like, say, coding... go ahead, Nisten.

[00:42:46] Nisten Tahiraj: I was going to say, we're still a bit early to judge. For example, Falcon could have used a lot more training.

[00:42:53] There are also other parts where larger models have a big effect, stuff like, if you want to do very long context summarization, then you want to use the 70B. And as far as I'm getting it, and this is probably inaccurate right now, but the more tokens you have, the more meat you have in there, the larger the thoughts can be.

[00:43:23] So that's the principle I'm going by. Mistral will do extremely well on small analytical tasks and on benchmarks, and it's amazing as a tool. It doesn't necessarily mean that it'll be good at thinking big. You still need the meat there, the amount of tokens, to do that. Now, you could chop it up and do it one piece at a time, but anyway, just something to keep in mind, because lately we also saw the announcement of the long-context Llama 2 70B, which started getting really good at summarization.

[00:44:04] So again, there's one particular part, which is summarization, where it looks like you need bigger models. And I've tested it myself with Falcon and stuff, and it's pretty good at summarization. I just want to also give them the benefit of the doubt that there is still something that could be done there.

[00:44:28] I wouldn't just outright dismiss it.

[00:44:31] Alex Volkov: Yeah, absolutely, absolutely, and I want to join this non-dismissal. Falcon open sourced Falcon 70B fully for commercial use before, and it was the biggest open source model at the time. And then they gave us 180B; they didn't have to, and we appreciate the open sourcing.

[00:44:46] We're not going to say no. Bigger models have more information, maybe more of a world model in them, and there's definitely a place for that, for sure. The next thing you mentioned, and I strongly connect to that, Nisten, thank you, is that GPT-4, for example, is very generalized. It does many, many things well.

[00:45:08] It's kind of impressive, and whatever Gemini is going to be from Google, soon hopefully, we're always waiting on ThursdAI for the breaking news to land on a Thursday: we're going to be talking about something else and then Google suddenly drops Gemini on us. There are also other rumors about Google's other stuff.

[00:45:22] Whatever OpenAI's Arrakis was before they reportedly stopped training it, and whatever comes next from OpenAI, will probably blow everything we expect in terms of generality out of the water. And the open source models, as they currently are, are really great at focused tasks, right? Like coder models: for example, Glaive Coder, which was recently released by Anton Bacaj, I think is doing very well on the code evaluations.

[00:45:51] However, on general stuff, it's probably less good. And I think, for open models, expecting generality on the same level as GPT-4 is going to lead to disappointment. But for specific tasks, I think we're coming close to things that a year ago seemed state of the art. If you guys remember, it's not even a year since ChatGPT was released, right?

[00:46:14] I think ChatGPT was released in November? Not even as an API, just the UI, like the middle of November. So we're coming up on one year; I think the Dev Day will actually be at the one year mark. That was 3.5. Many people use 3.5 for applications now, but you want to go for 4. If you're paying for ChatGPT Plus and you have a task to solve, you're not going to go with 3.5 just because you feel like it. You know that 4 is better. But now we have open source models, way smaller, that are actually getting to some levels of 3.5, and the above effort is actually an attempt to figure out which ones. And so I strongly recommend, first of all, getting familiar with the LocalLLama subreddit.

[00:46:54] If you don't use Reddit, I feel you; I was a Reddit user for a long time, and I stopped. Some parts of Reddit are really annoying. This is actually a very good one, where I get a bunch of my information outside of Twitter. And I think Andrej Karpathy also recommended it recently, which then became an item on that subreddit.

[00:47:12] It was really funny. And this massive effort was done by this user; he did a full comparison of 39 different models, and he outlined the testing methodology as well. We've talked about testing and evaluation methodology between ourselves; it's not easy to evaluate these things. A lot of it is gut feeling.

[00:47:31] A lot of the evaluation, and Nathan and I have our own prompts that we try on every new model, right? A lot of this, for many people, is gut feel. And many people also talk about the problem with evals, and I think Bo mentioned the same thing with the embedding leaderboards, that then, you know,

[00:47:48] it becomes like a sport for people to fine-tune and release models just to put their name on the board, overfitting on whatever metrics and evaluations happen there. And then there's a whole discussion on Twitter about whether or not this new model that beats that model on some score was actually trained on the evaluation data.

[00:48:09] But definitely the gut-feeling evaluation is important, and definitely having different things to test for is important. And you guys know, I think, those of you who come to ThursdAI, my specific gut feels are about translation and multilingual abilities, for example, and instruction following. Some other people, like Jeremy Howard, who's been on ThursdAI, have their own approach.

[00:48:29] Everybody has their own approach. I think what's interesting there is that the community provides, right? We're like this decentralized brain evaluating every new model. And for now, the community has definitely landed on Mistral as being the top, at least as a model in the 7B range, while Falcon, even though it's huge and can do some tasks like Nisten said, is less so, and Llama was there before. So if you start measuring the community responses to open source models, you start noticing better what does what. And this effort from this guy, he actually outlined the methodology, and I want to shout out friend of the pod Teknium, being the go-to for many, many things, specifically because OpenHermes, which we've talked about before, which was fine-tuned from Mistral 7B, is probably at the top of that leaderboard, but also based on my experiences, right?

[00:49:22] So we talked last week about OpenHermes, about being able to run OpenHermes on your, basically, M1, M2 Macs with LM Studio (also, shout out to LM Studio, they're great), and I've tested this, and it seems to be a very, very well-rounded model, especially for one that you can run yourself. And comparing to GPT-4 and other stuff, this model is really great for specific things.

[00:49:45] It's good for coding; it's not the best for coding, I think there's a coding equivalent. And I just really encourage you, if you're interested, to figure out what to use. And we've talked about this before: what to use is an interesting question, because if you come to these spaces every week you hear, oh, this model is now state of the art, that model is state of the art.

[00:50:05] You may end up not building anything, because you'll always keep chasing the latest and greatest. The differences are not vast from week to week; we're just seeing better scores. But it's well worth checking out this effort for the methodology, and for confirmation of the feelings you have.

[00:50:21] Let's say you felt that Mistral is better; now you can actually understand why. And also, friend of the pod Jon Durbin's Airoboros model is really great, and it's also up there. And what Nisten highlighted, that bigger models sometimes excel at different things, like summarization or just more knowledge,

[00:50:38] is also outlined there. And you can also see models that are not that great, that maybe look good on the leaderboards but don't necessarily perform as well; you can see those in that effort too.

[00:50:49] So maybe actually let me reset the space. Everybody who joined in the middle of me speaking is like, why is this guy speaking?

[00:50:56] And what's going on here? You are welcome; you're in the ThursdAI space. On ThursdAI we meet every week to talk about everything that happens in the world of AI. If you're listening to this and enjoying it, you're the target audience. Generally we talk about everything from open source LLMs, and now embeddings.

[00:51:13] We talk about big company APIs. There are not a lot of updates from OpenAI this week; I think they're quiet and they're going to release everything in a week and a half at their Dev Day. And Anthropic obviously, and Claude, and Microsoft and Google, all these things we cover as much as possible.

[00:51:29] We also cover voice and audio. And in that vein, I want to shout out friends from Gladia, and I'll pin it there, actually, let me just pin this right now. Gladia just released streaming for Whisper, and I've been waiting for something like this to happen. Sometimes, as an AI engineer, you don't want to host everything yourself, and you want to trust that the WebSocket infrastructure is going to be there without you building it out. And I'm not getting paid for this.

[00:51:53] This is my personal take: if I had to implement something like the voice interface in ChatGPT, I would not build it myself; I would not trust my own MLOps skills for that. And for that, Gladia... I've been following them since I wanted to implement some of their stuff, and they just implemented a WebSocket-based

[00:52:11] Whisper transcription streaming, and it's multilingual, and it's quite fast, and I definitely recommend folks check it out, or check out my review of it, and try the demo, and if you want it, use it. Because we talked last week about the voice interface for ChatGPT, where you can actually have a FaceTime-like call with ChatGPT, and that's incredible.
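As a picture of what consuming a "Whisper over WebSockets" stream looks like from the client side, here is a generic sketch. The URL, query-string key, and message shapes are hypothetical placeholders, not Gladia's actual protocol; their docs define the real contract.

```python
# Generic sketch of a streaming transcription client over WebSockets.
# Endpoint and payload shapes are invented for illustration only.
import asyncio
import json
import websockets  # pip install websockets

async def stream_audio():
    uri = "wss://example.com/v1/live-transcription?api_key=YOUR_KEY"  # hypothetical
    async with websockets.connect(uri) as ws:

        async def send_audio():
            # A real client would push raw PCM chunks captured from the microphone.
            for _ in range(10):
                await ws.send(b"\x00\x00" * 1600)     # fake 100 ms of silence
                await asyncio.sleep(0.1)

        async def print_transcripts():
            async for message in ws:
                event = json.loads(message)
                print(event.get("transcript", ""))    # hypothetical response field

        receiver = asyncio.create_task(print_transcripts())
        await send_audio()
        await ws.close()     # closing the socket ends the receive loop
        await receiver

asyncio.run(stream_audio())
```

The appeal of a hosted endpoint is exactly that the audio-chunking, model serving, and scaling behind that socket are someone else's problem; your app just sends bytes and reads text.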

[00:52:30] And I think more and more we're removing the screen from talking to your AI agents, with the latest releases also in text to speech, like ElevenLabs and XTTS, which we've covered as well. With the advances there in speed, you can actually start getting interfaces where you talk, and the AI listens and answers back to you very fast.

[00:52:52] Worth checking out, and definitely an update. Thank you.

[00:52:57] Nisten Tahiraj: Okay. So this is a complete product. I was,

[00:53:00] Alex Volkov: Yeah, this is a full product: pay a little bit, get a WebSocket, and then you use this WebSocket to stream, and you can embed this into your applications very fast. Setting that up yourself, I think you can do with Coqui, which we also covered.

[00:53:13] Gradio Interview with Abubakar, Xenova, Yuichiro

[00:53:13] Alex Volkov: Alright, I think it's time to reset the space again. This is ThursdAI. I want to thank Bo, who is still on stage. Bo, you're welcome to stay with us a little bit, and now we're moving on to the second part of this.

[00:53:30] Welcome, Abubakar. Welcome, Xenova, Joshua. Welcome to the folks in the audience from Hugging Face. It's great to see you here on ThursdAI; well, Xenova is always here, or hopefully, but Abubakar, I think this is your first time.

[00:53:41] I'll do a brief intro, and then we can, we can go and talk about Gradio as well.

[00:53:45] The first inference that I ran on a machine learning model was a year and something ago, and it was via Gradio, because I got this weights file and I was like, okay, I can probably run something with the CLI, but how do I actually visualize this? And back then, Gradio was the way. I think since then, you guys have become part of Hugging Face, and everybody who visited a model page and tried a demo or something probably experienced Gradio, even without knowing that this is what's behind all the demos. So welcome, please feel free to introduce yourself.

[00:54:17] Give us maybe two or three lines of how you explain Gradio to folks, and then we can talk about some exciting stuff that you guys have released this week.

[00:54:25] Abubakar Abid: Awesome. Yeah, first of all, thank you again for, for having me and for having several folks from the Gradio team here. I've known you, Alex, for a long time.

[00:54:32] I think you were one of the early users of Gradio or at least one of the early users of Gradio blocks and, and some of these viral demos. So I've seen, this podcast develop over time and it's It's a real honor to be to be able to come here and to be able to talk about Gradio.

[00:54:45] Yeah. Hi everyone. I'm Abubakar. I lead the Gradio team at Hugging Face. So Gradio, basically the way we describe it, is the fastest way to build a GUI or an app from a machine learning model. So traditionally, taking a machine learning model to production, or at least letting

[00:55:01] users try it out, has meant that you need to know a lot of front end. You need to know how to set up a server, web hosting. You have to figure all of these things out so that other people can play around with your machine learning model. But Gradio lets you do all of that with just a few lines of Python, as I think Joshua was mentioning earlier.

[00:55:18] And Gradio has been used by a lot of people. We're very lucky that, we kind of coincide. We started Gradio a few years ago late 2019. It grew out of A project at Stanford, and then spun out to be a startup, and then we got acquired by Hugging Face, and we've been growing Gradio within that kind of ecosystem.

[00:55:32] But we're very lucky because during this time has coincided with a lot of real developments in machine learning. I come from an academic background, so before 2019 I was doing my PhD at Stanford. And, everyone's been doing, machine learning for a while now, but...

[00:55:45] The types of machine learning models that people wanted to build, you built one, you published a paper, and that was it. But since then, people are building machine learning models that other people actually want to use, want to play around with. Things have gotten very exciting, and that's led to a lot of people building Gradio demos. I was looking at the stats recently: we have something around more than three, four million Gradio demos that have been built since we started the library.

[00:56:09] And yeah, so recently we released something called Gradio Lite, which lets you run...

[00:56:13] Gradio's effect on the open source LLM ecosystem

[00:56:13] Alex Volkov: Wait, before Gradio Lite, Abubakar, if you don't mind, I just want to highlight how important this is to the ecosystem, right? I'm originally a front end engineer, I do component libraries for breakfast, and basically I don't want to do them. It's really nice to have a component library, maybe Tailwind UI, or ShadCN, all these things, so even front end engineers don't like building things from scratch.

[00:56:35] Switching to machine learning folks who, let's say, build the model and want to run some inference, that's not their cup of tea at all. And just thinking about installing some JavaScript packages, running npm, all these things, it's not where they live at all. And Gradio allows us to do this in Python.

[00:56:51] And I think this is, let's start there. That on its own is incredible and led to so many demos happening in Gradio. And you guys built out pretty much everything else for them, everything that you would need. And I think recently you've added stuff, before we get to Gradio Lite, like components like chat, because you noticed that many people talk to LLMs and they need the chat interface, right?

[00:57:10] There's a bunch of multi modal stuff for video and stuff. Could you talk about, the component approach of how you think about providing tools for people that don't have to be designers?

[00:57:20] Abubakar Abid: Yeah, absolutely. So yeah, that's exactly right. Most of the time when you're, machine learning, developer you don't want to be thinking about writing front end, components that coupled with some, an interesting insight that we had with machine learning models.

[00:57:31] The components for machine learning models tend to be much more reusable than in other kinds of applications, right? So one thing I want to be clear about is that Gradio is actually not meant to be a way to build web apps in general in Python. That's not our goal. We're heavily optimized toward building machine learning apps.

[00:57:50] And what that means is the types of inputs and outputs that people tend to work with are a little bit more contained. So we have a library right now of about 30 different types of inputs and outputs. So what does that mean? Things like images, image editing, video inputs and outputs, chatbots as outputs, JSON, data frames, various types of input and output components that come prepackaged with Gradio.

[00:58:15] And then when you build a Gradio application, you basically say, Hey, this is my function. These are my inputs, and these are my outputs. And then Gradio takes care of everything else, stringing everything together sending them, message back and forth, and pre processing, post processing everything in the right way.

[00:58:29] So yeah, you just have to define your function in the backend, and your inputs and your outputs, and then Gradio spins up a UI for you.
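As a concrete illustration of that "function plus inputs plus outputs" idea, here is a minimal sketch assuming a recent Gradio release; the classify function is a made-up stand-in for a real model.

```python
import gradio as gr

def classify(image):
    # Stand-in for real inference: return fake label scores for the uploaded image.
    return {"cat": 0.7, "dog": 0.3}

# One function, one input component, one output component - Gradio builds the page,
# handles uploads, pre/post-processing, and the HTTP layer for you.
demo = gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label())
demo.launch()
```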

[00:58:36] Alex Volkov: And so I really find it funny, and I sent the laughing emoji, that you said Gradio was not meant to build full scale web apps, because I think the first time that we talked, you reached out because I had joined whatever open source effort was running for Stable Diffusion, this was before Automatic1111, I think, and you told me, hey, Alex, you did some stuff that we didn't mean for you to do. I injected a bunch of JavaScript, I injected a bunch of CSS, I had to go full on front end developer because I was limited with this thing, and even despite the limitations I think we did a bunch of stuff with just raw JavaScript injection. And since then it's very interesting: you mentioned Gradio demos, and Automatic1111,

[00:59:16] which for most people is maybe the only way they know how to run Stable Diffusion, is now getting contributions from the likes of NVIDIA, right, I saw a bunch of stuff that Automatic does. So it's very interesting how you started and how the community picked it up. So can you talk about the bigger projects, like Automatic and some others, that are taking Gradio and pushing it to the absolute limit?

[00:59:37] Abubakar Abid: Yeah, absolutely. I'm, like, perpetually shocked by Automatic1111, every time I see a plug in, or, I think, like you said, NVIDIA, now IBM or something, released a plug in for Automatic1111? It's crazy. But yeah, basically ever since we started Gradio, we've been noticing that, okay, Gradio seems to work for 90 percent of the use cases, but then for the last 10 percent, people are pushing the limits of what's possible with Gradio.

[01:00:06] And so we've progressively increased what's possible. So in the early days of Gradio, there was actually just one class called Interface. And what that did was it allowed you to Specify some inputs and some outputs and a single function. And we quickly realized, okay, people are trying to do a lot more.

[01:00:20] So then, about a year and a half ago, we released Gradio Blocks, which allows you to have arbitrary layouts. You can have multiple functions, string them together, connect inputs and outputs in different ways. And that is what allowed these very, very complex apps like Automatic1111, SD.Next, and the equivalents in other domains as well.

[01:00:36] Of course, on the text side, the Oobabooga text generation web UI as well, and then there are similarly complex demos in the audio space, and music generation as well. So these super complex, multi-tab apps, all of that is possible with this new architecture that we laid out called Gradio Blocks.

[01:00:55] And Gradio Blocks is this whole system for specifying layouts and functions, and it's defined in a way that's intuitive to Python developers. A lot of these web frameworks in Python have popped up, and one of the things that I've noticed, as someone who knows Python but really not much JavaScript, is that they're very much coming at it from the perspective of a JavaScript engineer, so you get these React-inspired kinds of frameworks and stuff like that.

[01:01:21] And that's not very intuitive to a Python developer, in my opinion. And so we've defined this whole thing where you can build these arbitrary web apps, but still in this Pythonic way. And we're actually about to take this a step farther, and maybe I can talk about this at some point, but next week we're going to release Gradio 4.0,

[01:01:38] which takes this idea of being able to control what's happening on the page to the next level. You can have arbitrary control over the UI and UX of any of our components. You can build your own components and use them within a Gradio app and get all of the features that you want in a Gradio app.

[01:01:52] Like the API usage, pre-processing, post-processing. Everything just works out of the box, but now with your own level of control. Yeah. Awesome.
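For reference, the Blocks pattern described above looks roughly like this minimal sketch: arbitrary layout plus event wiring, all in Python. The components are standard Gradio ones; the reverse function is a made-up stand-in for a model call.

```python
import gradio as gr

def reverse(text):
    # Toy "model": reverse the input string.
    return text[::-1]

with gr.Blocks() as demo:
    gr.Markdown("## Two components, custom layout, all in Python")
    with gr.Row():                    # arbitrary layout: components side by side
        inp = gr.Textbox(label="Input")
        out = gr.Textbox(label="Reversed")
    btn = gr.Button("Run")
    btn.click(fn=reverse, inputs=inp, outputs=out)   # wire an event to a function

demo.launch()
```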

[01:02:01] Alex Volkov: And it's been honestly great to see just how much enablement something as simple as Gradio gives folks who don't necessarily want to install npm and CSS packages.

[01:02:11] There's so much enablement this gave the open source community, because people release, like you said, significant things. Many of them you are maybe not even aware of, right? They're running in some Discord, they're running in some subreddit. It's not like you guys follow everything that happens.

[01:02:23] Gradio local URL via Gradio Proxy

[01:02:23] Alex Volkov: An additional thing that I want to mention that's very important: when you run Gradio locally, you guys can actually expose it via your server, basically exposing my local machine. And that's been a very, very important feature, because people may be sitting behind a proxy or whatever.

[01:02:39] You can share your like local instance with some folks, unfortunately only for 72 hours. But actually

[01:02:44] Abubakar Abid: that's about to change. So in 4. 0, one of the things that we're trying to get, so actually, we've been very lucky because Gradio has been developed along with the community. Like you said, like often times we don't know what people are using Gradio for until, they come to us and tell us that this doesn't work, and then they'll link to their repo and it's this super complex Gradio app and we're like, what?

[01:03:01] Okay, why are you even trying that? That's way too complicated. But, but, then we'll realize like to the extent to what people are building. And so this you mentioned the share, these share links as well, which I want to just briefly touch upon. So one, one of the things that we released in like the early days of, of, of Gradio is we realize People don't want to worry about hosting their machine learning apps.

[01:03:19] Oftentimes you want to share your machine learning app with your colleague. Let's say you're like the engineer and you have a colleague who's a PM or something who wants to try it out. Or it might be if you're in academia, you want to share it with fellow researchers or your professors, whatever it may be.

[01:03:33] And why do all of this hosting stuff if you're just building an MVP, right? So we built this idea of a share link. When you launch your Gradio app, you just say share equals true. And what that does is it uses something called FRP, a fast reverse proxy, to expose your local port to an FRP server which is running on a public

[01:03:53] machine, and that server forwards any request from a public URL to your local port. Long story short, it makes your Gradio app available on the web for anyone to try. It runs for 72 hours by default, but now, as part of 4.0,

[01:04:08] and we'll announce this, you can actually build your own share servers. We have instructions for how to do that very easily, and you can point your Gradio instance to that share server. So if you have an EC2 instance running somewhere, just point to it, and then you can have that share link running for as long as you want, and you can share your share servers with other people at your company or your organization or whatever it may be, and they can use that share link and, again, it can run for however long they want.
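The default share link is a one-argument change at launch time, as in the sketch below. The self-hosted share server option described here was slated for Gradio 4.0, so check the 4.x docs for how to point launch() at your own FRP server.

```python
import gradio as gr

def greet(name):
    return f"Hello, {name}!"

demo = gr.Interface(fn=greet, inputs="textbox", outputs="textbox")

# share=True tunnels your local port through Gradio's public FRP server and
# prints a temporary public URL (72 hours by default at the time of this episode).
demo.launch(share=True)
```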

[01:04:30] Wait,

[01:04:31] Nisten Tahiraj: wait, wait, is this out? Which branch is this? Is

[01:04:34] Abubakar Abid: this going to be out? This is going to be out on Tuesday, with Gradio 4.0. We're going to launch on Tuesday.

[01:04:41] Nisten Tahiraj: It's like the most useful feature of Gradio, I'd say, especially when you make a Google Colab that you want people to just run in one click.

[01:04:49] And like, how are they going to even use this model? And you just throw the entire Gradio interface in there and you share equals true. And then they know, they can just give it, give the link to their friends and stuff. It's really, it makes it really easy, especially with Google Colab. But now that you can host your own, this is huge.

[01:05:09] This is going to... to another level. I have more questions for

[01:05:14] Alex Volkov: Google. I think, Nisten, thank you. I just want to touch upon the Google Colab thing. I think at some point Google started restricting how long you can run a Colab for, and I think you guys are the reason. Exactly the thing that Nisten said.

[01:05:30] People just kept running the Gradio thing with the URL within Google Colab and exposing Stable Diffusion. They didn't build Colab for that, and I think they quickly had to figure out how to go around it.

[01:05:41] Abubakar Abid: Yeah. And their approach is like literally blacklisting the name of the the of specific, GitHub repos, which, I, I completely understand where, where Colab is coming from, right?

[01:05:50] They're giving these GPUs away for free. They have to prioritize certain use cases, but we're working with the Colab team and we're seeing if there are other ways. Right now it's basically a blacklist on Automatic1111 and some other repos, so we're hoping we can find another way that's not so restrictive.

[01:06:05] Nisten Tahiraj: No, but it still works. You can just fork the repo. It works for everything else. It works for LLMs. So if anybody else really needs it. Gradio works on Colab. Well, as far as language stuff goes, I haven't done that much.

[01:06:18] Abubakar Abid: Yeah, so Gradio works on Colab for sure. And, and that's, and that's early on, like one of the decisions we had to make actually was...

[01:06:25] Should we use like, the default python runtime or should we like change, like the interpreter and stuff like that? Because building GUIs is not necessarily python's like strength, and like oftentimes you wanna render re-render everything, and you, you wanna do certain things that may not be like what Python is suited for.

[01:06:42] But early on we decided, yeah, we wanna stick with the default python runtime because. One of the reasons was things like Colab, because we wanted people to be able to run Gradio wherever they normally run Python without having to change their workflows. And Colab, Gradio works in Colab.

[01:06:56] We had to do a lot of... Trickery to make it work. But yeah, it works. It's just like these certain very, very specific apps that have become too popular and apparently consume too many resources. They're blacklisted by Colab right now.

[01:07:10] Local inference on device with Gradio-Lite

[01:07:10] Alex Volkov: Alright, thank you for this intro to Gradio. To continue, we have Xenova on stage, who introduced himself, author of Transformers.js. We've been talking with Bo in the audience, also somebody who just recently open sourced, with Jina, the embeddings model, and everything that we love to cover on ThursdAI. A lot of it is about open source, as local as possible, for different reasons, including not-getting-restricted reasons.

[01:07:36] And you guys just recently launched Gradio Lite, and actually we have Yuichiro here on stage as well. So I would love to have you, Abubakar introduce and maybe have Yuichiro then follow up with some of the stuff about what is Gradio Lite? How does it relate to running models on, on device and open source?

[01:07:52] And yeah, please, please introduce it.

[01:07:54] Abubakar Abid: Yeah, absolutely. Like you mentioned, one of the things that we think about a lot at Gradio is the open source ecosystem and, right now, where open source LLMs, for example, can really shine, and things like that.

[01:08:06] And one of those places is on device, right? On device or in browser, open source has a huge edge over proprietary models. And so we were thinking about how Gradio can be useful in this setting, and we were thinking about the in-browser application in particular. And we were very, very lucky to have Yuichiro actually reach out to us.

[01:08:25] Yuichiro has this fantastic track record, and if you don't already know him, he built Streamlit Lite, a way to run Streamlit apps in the browser. And then he reached out to us and basically had this idea of doing something similar with Gradio as well. And he, almost single-handedly, refactored much of the Gradio library so that it could run

[01:08:43] with Pyodide, in WebAssembly, and basically just run in the browser. I'll let Yuichiro talk more about that, but basically, if you know how to use Gradio, then you know how to use Gradio Lite. You just write the same Python code, but wrap it inside Gradio Lite tags, and then it runs within the browser, in the front end.

[01:08:59] You can execute arbitrary Python, and it just works. Yuichiro, if you want to share a little bit more about that, or introduce yourself.
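To make "wrap the same Python code inside Gradio Lite tags" concrete, here is a minimal single-file sketch. The CDN script and stylesheet paths follow the Gradio Lite docs of the time and may have changed, so treat them as indicative.

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Load the Gradio Lite runtime (Gradio + Pyodide) from a CDN; check the docs for current URLs. -->
    <script type="module" src="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.js"></script>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.css" />
  </head>
  <body>
    <!-- Ordinary Gradio code, but it runs entirely in the visitor's browser - no server needed. -->
    <gradio-lite>
import gradio as gr

def greet(name):
    return f"Hello, {name}!"

gr.Interface(fn=greet, inputs="textbox", outputs="textbox").launch()
    </gradio-lite>
  </body>
</html>
```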

[01:09:08] Yuichiro: All right, hey, can you hear me? Well, thank you very much for the quick introduction about Gradio Lite and Streamlit Lite, too.

Well, as Abubakar explained,

originally there was a kind of technological movement around edge computing for Python. It was started by Pyodide, a CPython runtime compiled to WebAssembly that can run completely in web browsers. It triggered a big bang of edge computing for Python, starting with projects that were ported to the WebAssembly runtime, and it inspired many other Python frameworks, including Streamlit and other existing frameworks, like PyScript, HoloViz Panel, or Shiny for Python, something like that. So there was a huge movement to make Python frameworks compatible with WebAssembly and the web browser environment.

And I thought that was a great opportunity to make machine learning and data science stuff run completely in the web browser, including Transformers things and much more of the existing machine learning ecosystem. And I first created Streamlit Lite, which is a forked version of Streamlit ported to WebAssembly.

And yeah, the remaining story is the same as what Abubakar introduced. So technically it was not my original idea; there was a huge movement around this kind of stuff, and I simply followed that flow and transferred the same approach to the Gradio repository.

[01:10:58] Alex Volkov: Yeah, that's it. Awesome. Thank you so much. So can we talk about what we can actually do now with the ability to run Gradio entirely in the browser? Could maybe both of you give some examples, and then I would also like to add Xenova to the conversation, because much of this is using Transformers.

[01:11:18] js, correct? Can we maybe talk about what is now actually possible compared to when I run Gradio on my machine with a GPU and can run Stable Diffusion? I just

[01:11:27] Nisten Tahiraj: want to say that this is crazy that this can happen at all for the audience to

[01:11:32] Abubakar Abid: prepare. Yeah, I was honestly blown away the first time Yuichiro showed me a demo as well.

[01:11:36] Imagine you have a, any sort of machine learning model. Practically, not almost anything, but a super, really good speech recognition model running completely in your browser. Meaning that, for example, now you can take that demo, you can put it inside GitHub Pages.

[01:11:51] You can host it inside. We've seen people embed Gradio demos now with Gradio Lite inside Notion. So you have a Notion whatever page, you can take that demo, you can embed it inside Notion. One of the things that we launched when we launched Gradio Lite at the same time is we also launched something called the Gradio Playground.

[01:12:07] Now the Gradio Playground, you can actually just Google this, you can find this. But basically what it allows you to do is it allows you to write code in the browser. And as you're editing the code, you can see live previews of your Gradio application. And, and basically what's happening is, is taking that Gradio code, it's wrapping it inside Gradio Lite tags, and it's just running it.

[01:12:27] It's just a straightforward application of Gradio Lite. And we're excited by this personally because, one, it allows us to write interactive documentation. You can try stuff and immediately see the results. We're also excited because we've seen interest from other libraries, including, for example, scikit-learn, who want to embed Gradio demos within their documentation.

[01:12:49] Within their docs, right? But they were hesitant before because they didn't want to have a separate server running these Gradio applications and have to worry about maintaining those servers, making sure they were up all the time, making sure they could handle the load. Now they can write it in their docs, and their demos and everything will just run in the user's browser.

[01:13:07] They won't have to worry about maintaining everything since it's, in the same code base and everything. So I think that's another kind of cool application that we're excited by is just... These potential for interactive documentations that, maybe potentially other, other maintainers or other libraries might want to include.

[01:13:22] So yeah, so stuff like, security, privacy, serverless type stuff, hosting, and all of that. And then also like these interactive documentations.

[01:13:30] Alex Volkov: I think the demo that you mentioned, the translation one inside Notion from VB from Hugging Face, I think that was great. I'm trying to find the actual link, but basically, Notion allows you to embed basically iframes, right?

[01:13:42] So he embedded this whole Gradio Lite interface to translate, I think using BERT or something very similar, and it all runs within the Notion page. I think that's awesome. Joshua, do you want to chime in here and say how Transformers.js is built into this, and how this now allows way more people to use Transformers.js in a UI way?

[01:14:02] Transformers.js integration with Gradio-lite

[01:14:02] Xenova: Yeah, sure. So first of all, literally almost the whole of everything that we are talking about now has been, like,

[01:14:12] Abubakar Abid: led by the Gradio team.

[01:14:14] Xenova: And I am here piggybacking and be like, Whoa, look at this, Transformers JS is now working. That's really not what we're talking about today.

[01:14:23] It's the amazing work that the team has been able to achieve. This has been going on for quite a while; it was codenamed something like Gradio-Wasm and is now finally being released as Gradio Lite. And now, the Transformers.js side of it is just, oh, by the way, there's this library called Transformers.js, you can sort of use it, and with the, the Transformers.

[01:14:48] Oh, was that ? Sorry. You've been way too humble.

[01:14:51] Abubakar Abid: No, no, absolutely

[01:14:52] Xenova: not. I think so much has been done by you and the amazing Gradio team that it just so happens that these things are coinciding. And now you can end up using Transformers.js with Gradio and Gradio Lite.

[01:15:07] And obviously this is also made possible by, okay, everyone, everyone stick with me. It's going to get a little complicated when I try to explain this. But Transformers.js.py, which is, are you ready? A JavaScript port of a Python library, turned back into a Python library so that it can run in a Python environment.

[01:15:29] Okay. We all caught up? That's Transformers.js.py, which Yuichiro, in the audience, wrote, obviously with his experience bringing Streamlit to the browser. It's sort of his invention, which is quite funny, but that's sort of how Transformers.js is able to be run

[01:15:49] inside Gradio Lite. There are other ways, but from what you'll see in the documentation, that's the go-to way. And it's

[01:15:57] Alex Volkov: Yeah, I want to ask about this, because I saw this "from Transformers.js import" thing on Discord.

[01:16:04] So could you talk about this part that Xenova tried to explain? It was a little complex. Transformers.js you can install through npm and then run, right? And then it runs in the Node environment and the browser environment. Gradio Lite is basically Python within JavaScript.

[01:16:19] So then you have to turn Transformers.js into Python in order to get it into Gradio Lite, so that it runs within the JavaScript context again? Is that correct? Am I getting this right?

[01:16:30] Nisten Tahiraj: If I could say something for the audience, what has, what's happening here is that there's a layer called Pyodide and that uses kind of like what WebAssembly uses to run Python at native speeds.

[01:16:44] So it runs in the browser. And as you go down that stack, there's a virtual machine and compiler, all that stuff in there, and that's how Python is able to run at native speed. And this means that with PyScript, you can have, inside the same index.html, just your regular index.html, your JavaScript code and your objects and stuff.

[01:17:06] And you can have straight Python code in there. You just add the tag and dump the Python in as-is, nothing else. And the crazy part is that it can access JavaScript objects now. So you can do the math in Python in the browser, because JavaScript can't do math well, and then you can access those objects.

[01:17:30] So this is a whole crazy stack here, with Pyodide and Emscripten. And again, that's only WebAssembly, so that's CPU-only for now, because there's still a mountain of work left. And to finish it off, Emscripten is like your POSIX layer, your Unix layer. It's like there's an operating system being built inside the browser here.

[01:17:54] So that's why things are getting complicated, but yeah, just to keep that in mind, that's the base.

[01:17:59] Yuichiro: Yeah, what Nisten talked about is everything, because we can access JS objects from the Python world inside the browser. If you import transformers.

[01:18:10] js.py on Gradio Lite, under the hood Transformers.js is still being imported in the browser environment. And when you write Python code as a Gradio Lite application in the browser, what you are doing is simply using the original JavaScript version of Transformers.js, just proxied from the Python code through the proxying mechanism provided by Pyodide.

[01:18:42] What Transformers.js.py does is just a thin proxying layer, some glue code bridging these two worlds, Python and JavaScript. That's it.
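To make the "Python can reach JavaScript objects" point concrete, here is a tiny sketch of Pyodide's js bridge. This is standard Pyodide usage rather than anything Gradio- or Transformers.js-specific, and it only runs inside a Pyodide environment in the browser, not in regular CPython.

```python
# Runs inside Pyodide (Python compiled to WebAssembly in the browser).
# The `js` module is Pyodide's proxy to the JavaScript global scope.
import js

js.console.log("Hello from Python in the browser")   # call a JavaScript function
width = js.window.innerWidth                          # read a JavaScript property as a Python value
js.document.title = f"Viewport is {width}px wide"     # write back into the DOM
```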

[01:18:56] Abubakar Abid: Yeah, just zooming out a little bit. Basically, what Transformers.js.py does is let you run everything that Transformers.js does.

[01:19:03] And what Transformers.js does is let you run a lot of the models and a lot of the tasks in your browser, right? We're talking about all of the NLP-related tasks, things like translation, LLMs, but also a lot of the vision tasks and a lot of the audio stuff.

[01:19:22] We're talking about speech recognition that's powered by Transformers, what Josh has been doing with Transformers.js. And I think Transformers.js just released, for example, speech generation, text to speech.

[01:19:34] And so now you can do that within Transformers.js, which means you can do it within Transformers.js.py, which means you can now do it within Gradio Lite as well.
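For flavor, here is a sketch of what calling a Transformers.js pipeline from Python inside a Gradio Lite app roughly looks like. The module name and import function below are my recollection of the Transformers.js.py pattern shown in the Gradio Lite guides, so treat them as assumptions and verify against the current documentation.

```python
# Sketch only - this code runs inside a <gradio-lite> block in the browser, not in regular CPython.
# The package/import names below are assumed from memory of transformers-js-py; verify against its docs.
import gradio as gr
from transformers_js import import_transformers_js  # assumed import path

transformers = await import_transformers_js()        # loads Transformers.js through Pyodide's JS bridge
pipeline = transformers.pipeline
pipe = await pipeline("sentiment-analysis")          # downloads an ONNX model into the browser

async def classify(text):
    result = await pipe(text)                        # the JS pipeline is awaited through the proxy
    return str(result)

gr.Interface(fn=classify, inputs="textbox", outputs="textbox").launch()
```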

[01:19:40] Alex Volkov: That's incredible. And I think the biggest part for me is that... Now that you guys ported Gradio, which is ubiquitous in machine learning, and everybody who releases the model uses either this or Streamlit, but I think it's, it's a clear winner between the two, as this is, as I'm concerned and as I see, then now you basically ported the same thing towards the browser, and the more we see, Models getting smaller and we've been always talking about this models getting smaller, models being uploaded to the browser.

[01:20:08] Browsers

[01:20:09] Abubakar Abid: getting more powerful and,

[01:20:10] Alex Volkov: and WebGPU, browsers getting more powerful. Yeah. And I'm getting to WebGPU because we have Arthur here on stage. And I would love to introduce you guys, unless you're already familiar. The more we see this move, the more the need for something like a built-in component library becomes very interesting.

[01:20:25] Even though this world already has a bunch of libraries. But you're basically, with this, you're also porting the people with the experience of Gradio, right? With the existing with the existing frameworks, with the existing Gradio interfaces, to this world. I find it very exciting, so thank you.

[01:20:38] And I want to introduce Arthur. Arthur, feel free to unmute yourself and maybe introduce yourself, briefly, and then yeah, feel free to chime in to this conversation.

[01:20:46] Arthur Islamov: Okay, so I did quite a lot of things with ONNX to create the Diffusers.js library and to load Stable Diffusion in the browser, and now I'm working on the SDXL version.

[01:20:58] So I was going to ask, do you know if there are any plans to add a WebGPU backend for PyTorch? Because when that happens, it'll be so much easier, as a WebGPU backend can be launched on any platform, not only in the browser but also locally without the browser, using the Metal backend, DirectX, or Vulkan on Linux.

[01:21:28] So I guess when that happens, we'll go to a whole new era as you'll be able to run those PyTorch models in the browser with GPU acceleration.

[01:21:40] Xenova: I can tag on to this. The TLDR of it is it's not at the point...

[01:21:46] where I'm comfortable upgrading ONNX Runtime Web, basically, to support the WebGPU backend right now, just because there are quite a few issues still left to solve before we get to the point where you can start running these models completely on WebGPU.

[01:22:07] The main, I think, the current issue at the moment is with, like, when you're generating text a lot of the... The buffers aren't reused properly during when you, when you start decoding. That's sort of leading to quite a massive performance bottleneck just because you're transferring memory between CPU and GPU every single time you're you're decoding.

[01:22:31] So that's not quite there yet. However, with things like image classification, and I guess encoder-only models, those are getting quite good. BERT's pretty fast, and for Segment Anything, when you're just doing the encoding step, the ONNX Runtime team has got it to the point where it used to take around 40 seconds and now it takes around 4 seconds.

[01:22:55] And that's currently being worked on in a dev branch of Transformers.js, basically, just making sure the integration is working. But it's almost there. I keep saying it's almost there, but the amazing Microsoft team has been really working hard on this. And if you just look at the commit history on GitHub, microsoft/onnxruntime, and you go to the web part,

[01:23:18] there are just so many amazing people working on it, and it's slowly getting to the point where, and this will be released with Transformers.js version 3, when we upgrade the ONNX Runtime version to probably 1.17, which will be the next one; it's currently 1.16.1. And literally, from the user's perspective, it's as simple as adding a line of code saying, basically, use WebGPU instead of WebAssembly.

[01:23:46] And in the case where it's not supported, it'll fall back to the WebAssembly implementation. And this will completely transfer to how Gradio Lite works, because, as was mentioned, it uses Transformers.js under the hood. So any benefits that you see in Transformers.js you'll see in Transformers.js.py, which you'll see in Gradio Lite, which is great.

[01:24:11] TLDR coming soon, it's an, it's an annoying answer to give, but it's, it's so close. And I guess this is also good because it sort of aligns with the time that more browsers will support WebGPU, sort of like without flags. I know Chrome is sort of leading the charge and other Chromium based browsers.

[01:24:30] But if you look at things like Safari and Firefox, they're quite far behind, to the point that it's not ready for mass adoption yet. But once it is, and once the ONNX Runtime backend's WebGPU support has improved, you'll definitely be seeing that in Transformers.js.

[01:24:48] So hopefully that answers the question. I

[01:24:52] Nisten Tahiraj: think stuff's about to get crazy on the front end because of this because the thing about you have all your WebGL stuff, you have all your maps, all your 3D, all your games. Now you can have an LLM even generate code for them, manipulate those objects, move stuff around on screen in 3D, and like the, the AI.

[01:25:14] does all of that within your machine. But I do want to say that for Pyodide itself, it might take a long time to get WebGPU support, because it depends on Emscripten. And if you want to do anything with Python, like open a file, write a file, output a file, you can only do what Emscripten gives you, and Emscripten is like the base layer

[01:25:39] Of the operating system, like it pretends, it fools your apps into thinking that there's an operating system there when, when there isn't. And as far as I've seen, like two, three months ago, WebGPU support was like really, really early on and might take a while for Emscripten to support that. So you're going to have to do that other ways by going straight to using WebGPU versus using it with that layer.

[01:26:06] So it might get a bit complex

[01:26:09] Alex Volkov: there. I agree about the stuff is about to get crazy. Go ahead, Arthur, and then we'll follow up on Gradio 4 and then we'll conclude.

[01:26:18] Nisten Tahiraj: Yeah, I

[01:26:18] Arthur Islamov: just wanted to say that yesterday, or a few days ago, I saw this distilled Stable Diffusion model. I saw that they had previously released, not the XL version, but the ordinary 2.1 or something like that, as a distilled one.

[01:26:35] So I'm thinking to try to make my demo work with that distilled model without 64-bit floats, just ordinary 32-bit, so that it will work in almost any browser without any additional flags or launching with special parameters.

[01:26:54] Alex Volkov: Yeah. Arthur, you can't just mention an item on my updates list and not talk about this, right?

[01:26:59] Folks, let me just briefly cover what Arthur just said. On the fly: a company called SegMind introduced a distilled version of SDXL. SDXL is Stable Diffusion XL, something released a while ago; we've covered it multiple times. It has way better quality generations, obviously, but also way better text understanding, right?

[01:27:20] And it has two parts; there's a refiner part in addition. And so this company basically distilled that. Distillation we've talked about multiple times before: it's when you train your own model, but you take outputs from, say, GPT-4 and create data from them, so that you basically distill its smartness into your own model.

[01:27:37] So they basically did this for SDXL. They call it SegMind Stable Diffusion 1B, and it's 50 percent smaller and 60 percent faster than SDXL. Again, just to put some time frames on what Abubakar and I talked about: when I first experienced Gradio, that was Stable Diffusion 1.4, a year and a couple of months ago.

[01:27:57] Since then, we got multiple iterations of Stable Diffusion, then there is SDXL, the XL version, which generates 1024 by 1024 images. And then, a few months after they released that, we now have a version that's 50 percent smaller and 60 percent faster.

[01:28:16] And so what Arthur is now talking about with Diffusers.js is the ability to load Stable Diffusion in the browser, and now there's a model that's half the size and 60 percent faster, which is good for the browser context. So I pinned it to the top of the space; check out SegMind, it's definitely super cool.
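For anyone who wants to try the distilled model outside the browser first, here is a sketch using Hugging Face diffusers; the segmind/SSD-1B repo id is recalled from memory, so confirm it on the Hub before running.

```python
# Sketch: load the distilled SDXL ("SSD-1B") with diffusers on a CUDA GPU.
# The "segmind/SSD-1B" repo id is an assumption from memory - verify it on the Hugging Face Hub.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```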

[01:28:34] And the advancements that we see from week to week, and this is obviously super cool as well. And Arthur, sorry to interrupt with this, but you had one of my tasks that I had to finish before we finish and talk about this. So are you, have you already introduced it to diffusers? Have you tried it?

[01:28:52] Nisten Tahiraj: I have

[01:28:52] Arthur Islamov: tried to convert it to ONNX, but it didn't work, or maybe some of my code didn't work. So I guess I will try again over the weekend, and yeah, most likely I will get it running.

[01:29:06] Alex Volkov: I think we had some folks from SegMind react to this, so let's try to connect there and hopefully get it running as well so that we all benefit. And I guess, maybe as the last part of this conversation, Abubakar, and thank you for joining, Yuichiro, Ali, and the folks from Hugging Face, it is great to see all of you. I think you mentioned some folks who joined before, like VB and some other folks from Hugging Face. We're big fans here on ThursdAI, and you're always welcome.

[01:29:33] Could you talk about what's coming in version four, because I think you, you, you gave us like one tidbit. But give us an update on that. I would love to hear.

[01:29:40] Abubakar Abid: Yeah, definitely. So we're launching Gradio 4.0 on Tuesday, October 31st. And basically, the team has been working very hard. You mentioned earlier that people are building these very, very complex apps with Gradio, honestly stuff that we did not anticipate when we were designing Gradio.

[01:29:57] And more and more, what we want to do is almost take ourselves out of this feed, feedback loop. And let people build what they want to build, but let the community build stuff that, whatever you imagine, kind of just be, just be able to put that in a Gradio app. Let me be a little bit more concrete.

[01:30:11] So what is Gradio 4.0 going to introduce? For example, it's going to introduce the idea of custom components. So if you know a little bit of Python and a little bit of JavaScript, you can build your own component and use it within a Gradio app, just like you use our 30 or so built-in components.

[01:30:27] Speaking of the built in components, we're redesigning some of the components from scratch, particularly the media components. So things like image audio video, they're going to be much, much nicer and they're going to be fully accessible. So one of the things that we're realizing is that, we're not, at Gradio, we're not just building a product for a specific audience, but we're building tools that let people build, apps for many different audiences.

[01:30:50] And so we want to make sure that all of the core components are accessible. That way it's easy to do the right thing and build accessible web applications. So we're redesigning that, and we're switching over from WebSockets to server-sent events. There are several reasons for this, and we'll talk more about it on Tuesday.

[01:31:07] We're, we're having a little long live stream as well, but there's several reasons why server side events is the way to go for Gradio. And so there's, that's more of an internal refactor. You probably won't notice things. You might notice some speed ups in certain situations. It'll unlock a lot of things later on.

[01:31:22] We're open sourcing the share links process, the share servers at Gradio. So everyone will be able to set up their own custom share links. So instead of whatever.gradio.live, you can have, say, something.targum.video if you want; you can have whatever custom URL you want for your share links.

[01:31:42] And then a lot of other changes as well we'll, we'll, we'll talk more about that on Tuesday. The team has been working super hard, so I'm, I'm excited to, to get it out for you guys to try out.

[01:31:51] Alex Volkov: That's so awesome, and, and can't, can't wait to see this. I I think the share links is like such a powerful virality thing, that once people start adding this to their domains, and, and start running different Gradio interfaces within Colab, outside of Colab, with their own domains.

[01:32:08] I think it's going to be super cool, especially if they don't expire. I absolutely received many of these links over DMs from multiple people. I think even people in the audience so far. And I think adding them to the custom domains. Thank you for open sourcing that. That's great.

[01:32:21] Abubakar Abid: I think part of it is also we want to reduce the load on our shared servers.

[01:32:25] We're getting too many of these links being created

[01:32:27] and stuff.

[01:32:27] Alex Volkov: Yes, absolutely. And I think the accessibility features are great. Folks, definitely follow Abubakar, follow Yuichiro and the folks on stage, and Ali as well, to stay tuned to what's coming to Gradio, and then make sure to update your Gradio interfaces to the new accessible ones, because what you're building is no longer just demos.

[01:32:46] Every new model is getting a Gradio interface, and accessibility is very important at the level of web applications. With that, I want to thank you guys for coming up and sharing Gradio Lite with us, which is very much in accordance with what we love to talk about here: open source, open source LLMs, on-device inference, and taking control of your own LLMs.

[01:33:07] I think, Nisten, you briefly talked about how crazy it's going to be when there is an LLM built into your website or web application that runs on the GPU of your device and is able to do stuff, and you can interact with it basically offline. That's great. I think, Nisten, there's something that I want to talk about, but maybe I'll let you talk about it.

[01:33:27] I will say this: now that we've concluded the interview with the Gradio folks, one of the things that we love most of all on ThursdAI is breaking news, and we actually have some breaking news. Nisten, go ahead, please, present the breaking news that you just sent.

[01:33:39] Nisten Tahiraj: I pasted a Gradio Space above.

[01:33:43] If you click on it, that's what it is. It's Coqui's new release, a new voice model. This is huge, because they're allowing fine tuning on their voice model. And one criticism of the open source voice models has been that the dataset for training them has been of poor quality, like the microphone and the dataset that people use to train the models has been bad.

[01:34:12] So this is pretty important in that regard, because it's one of the very few, there's the one that Xenova released, and the Coqui one, that are open source and usable when it comes to text to speech, that sound somewhat pleasant, and that run relatively fast. Otherwise, it's pretty hard to have text to speech.

[01:34:37] Yeah, the, the part that you can fine tune, they, they open source the fine tuning code. Yeah, go there and

[01:34:43] Alex Volkov: get that, yeah. Thank you, Nisten. The folks from Coqui, when they released XTTS, which is the open source text to speech; we know 11Labs, we know Play.ht, we know OpenAI has one that Spotify uses for translation, and OpenAI hasn't released theirs.

[01:34:59] We'll see next week if they're going to give us an API for that. All of those require a lot of money, just a lot of money; 11Labs is basically rolling in cash because everybody wants to get their AIs to talk, right? And so previously here we talked about the listen part, we've talked about the streaming from Gladia, and now you can have Whisper basically streaming.

[01:35:18] The other part of that was, hey, once your LLM listens and thinks, which is the inference part, you also want it to talk to you. And TTS, text to speech, is the way to do that. And Coqui, we had a chat with Joshua when they released XTTS, which was very exciting. And now, live on stage, live on ThursdAI,

[01:35:34] because this is why ThursdAI exists, many people release stuff on Thursday, there is their own fine tuning with minutes of data. So you can create a voice. Let's say, maybe this is going to be disappointing for folks here on stage, but everybody who spoke on stage for more than a minute is now basically available for everybody else to take your voice and clone it with XTTS.

[01:35:56] It was possible before, somebody just had to pay money for it, but now... And Ali's laughing because Ali didn't talk yet, but basically everybody now is going to get voice cloned. It's very easy. We're going towards this future; if this future scares you, there's no escape from it. Even VALL-E from Microsoft, when it was released, they talked about maybe 30 seconds of voice being enough to clone.

[01:36:18] But XTTS now gives us basically a framework, and they even said you can add new languages to Coqui, to XTTS, and then you can use this within Transformers. Xenova, can we use Coqui within Transformers.js, or not yet? I think we can. Not yet. Not yet. Okay, so soon you'll be able to do all of this completely within the browser.
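For reference, cloning a voice with XTTS through Coqui's TTS Python package looks roughly like the sketch below; the model identifier string is the one I recall for the XTTS release of that period, so double-check the current name in the TTS docs.

```python
# Sketch: zero-shot voice cloning with Coqui XTTS via the TTS package (pip install TTS).
# The model name string is assumed from memory - confirm it against Coqui's documentation.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1")
tts.tts_to_file(
    text="Welcome back to ThursdAI, the top AI news from the past week.",
    speaker_wav="reference_voice.wav",   # a short clip of the voice to clone
    language="en",
    file_path="cloned_output.wav",
)
```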

[01:36:41] Hopefully once the integration with WebGPU lands. So here we have it, folks. We had an incredible ThursdAI today. We started by talking with Bo and the folks from the Jina embeddings team, who released what is probably the open source embedding model most comparable to OpenAI's embeddings.

[01:36:59] And that was great. Bo gave us a masterclass in how embeddings work and where the Jina embedding models are available. We then talked with Abubakar, Yuichiro, Xenova, Arthur, and everyone on stage from the team behind Gradio. If you haven't used Gradio, you probably have used Gradio, you just didn't know that it's Gradio. This interface, slash library, that started for demos only has scaled all the way up to something like Automatic1111, where many people contribute thousands of contributions, including, I think, NVIDIA and IBM now. Full businesses run on this quote-unquote component library.

[01:37:37] And I just want to invite you to join ThursdAI next week as well, because some of this was planned, but definitely not all of it, and this is the way to stay up to date. Next week we're going to see some more incredible things; I think some very interesting things are coming up.

[01:37:52] I will have a personal announcement to make that's going to be very surprising to some folks here on stage. But definitely we'll keep ThursdAI going, significantly more. And with that, I just want to thank you for joining us. It's been a pleasure to have a space where the Gradio folks and the Jina folks can come and talk about what they released.

[01:38:12] And we can actually ask them questions. I want to thank everybody who joined on stage. Nisten, thank you as always for joining, Xenova, Arthur. We were joined by new folks that we'll introduce next time because we just don't have the time now. Thank you so much for joining us this week, and obviously thank you to the folks in the audience who join every week; I see Enrico in there, and Junaid and Tony and some other folks I love to see from week to week.

[01:38:33] If you missed any part of this, any part at all, or if your internet connection got stuck: ThursdAI is a live recording, but it then gets released as a podcast episode. So if you're subscribed, and you should already be subscribed to ThursdAI on Apple or Spotify, you'll get this episode hopefully very quickly edited, if I don't get lost in some other interesting stuff, like Coqui.

[01:38:57] Thank you. And we will also release a newsletter with all the links and the conversations with the Gradio team and Bo, and all the updates as well in the form of links. And with that, I thank you for joining. It's been two hours, it's been a lovely time, and now I need to go and actually edit the podcast.

[01:39:12] See you here next week. Thank you and yeah, please share with your friends as much as possible. The more crowd there is, the better these will be. And yeah, help and participate. Thank you all and have a good rest of your week. Bye bye.



Thu, 26 Oct 2023 22:31:43 GMT
🔥 ThursdAI Oct 19 - Adept Fuyu multimodal, Pi has internet access, Mojo works on macs, Baidu announces ERNIE in all apps & more AI news

Hey friends, welcome to ThursdAI Oct 19. Here's everything we covered + a little deep dive after the TL;DR for those who like extra credit.

ThursdAI - If you like staying up to date, join our community

Also, here’s the reason why the newsletter is a bit delayed today, I played with Riffusion to try and get a cool song for ThursdAI 😂

ThursdAI October 19th

TL;DR of all topics covered:

* Open Source MLLMs

* Adept open sources Fuyu 8B - multi modal trained on understanding charts and UI (Announcement, Hugging face, Demo)

* Teknium releases Open Hermes 2 on Mistral 7B (Announcement, Model)

* NEFTune - a "one simple trick" to get higher quality finetunes by adding noise (Thread, Github)

* Mistral is on fire, most fine-tunes are on top of Mistral now

* Big CO LLMs + APIs

* Inflection Pi got internet access & New therapy mode (Announcement)

* Mojo 🔥 is working on Apple silicon Macs and has LLaMa.cpp level performance (Announcement, Performance thread)

* Anthropic Claude.ai is rolled out to additional 95 countries (Announcement)

* Baidu AI announcements - ERNIE 4, multimodal foundational model, integrated with many applications (Announcement, Thread)

* Vision

* Meta is decoding brain activity in near real time using non intrusive MEG (Announcement, Blog, Paper)

* Baidu YunYiduo drive - Can use text prompts to extract precise frames from video, and summarize videos, transcribe and add subtitles. (Announcement)

* Voice & Audio

* Near real time voice generation with play.ht - under 300ms (Announcement)

* I'm having a lot of fun with Airpods + chatGPT voice (X)

* Riffusion - generate short songs with sound and singing (Riffusion, X)

* AI Art & Diffusion

* Adobe releases Firefly 2 - lifelike and realistic images, generative match, prompt remix and prompt suggestions (X, Firefly)

* DALL-E 3 is now available to all chatGPT Plus users (Announcement, Research paper!)

* Tools

* LMStudio - a great and easy way to download models and run on M1 straight on your mac (Download)

* Other

* ThursdAI is adhering to the techno-optimist manifesto by Pmarca (Link)

Open source mLLMs

Welcome to multimodal future with Fuyu 8B from Adept

We've seen and covered many multi-modal models before, and in fact, most of them will start being multimodal, so get ready to say "MLLMs" or... we come up with something better.

Most of them so far have been pretty heavy, IDEFICS was 80B parameters etc'

This week we received a new, 8B multi modal with great OCR abilities from Adept, the same guys who gave us Persimmon 8B a few weeks ago, in fact, Fuyu is a type of persimmon tree (we see you Adept!)

In the podcast I talked about having 2 separate benchmarks for myself, one for chatGPT or any multimodal model coming from huge companies, and another for open source/tiny models. Given that Fuyu is a tiny model, it's quite impressive! Its OCR capabilities are impressive, and the QA is really on point (as well as captioning)

An interesting thing about the Fuyu architecture is that, because it doesn't use a traditional vision encoder, it can scale to arbitrary image sizes and resolutions, and is really fast (large image responses under 100ms)

Additionally, during the release of Fuyu, Arushi from Adept authored a thread about how bad visual QA evaluation datasets are, which... they really are bad, and I hope we get better ones!

NEFTune - 1 weird trick of adding noise to embeddings makes models better (announcement thread)

If you guys remember, a "this one weird trick" was discovered by KaiokenDev back in June, to extend the context window of LLaMa models, which then turned into RoPE scaling and YaRN scaling (which we covered in a special episode with the authors)

Well, now we have a similar "1 weird trick" that by just adding some noise to embeddings at training time, the model performance can grow by up to 25%!

The results vary per dataset of course, however, considering how easy it is to try, literally:

It's as simple as doing this in your forward pass: if training, return orig_embed(x) + noise, else return orig_embed(x).
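A slightly fuller sketch of that trick in PyTorch, following the noise scaling described in the NEFTune paper (uniform noise scaled by alpha over the square root of sequence length times embedding dimension); this is an illustrative re-implementation, not the paper's official code:

```python
import torch

def neftune_embed(orig_embed, input_ids, neftune_alpha=5.0, training=True):
    """Wrap an embedding layer: add scaled uniform noise at training time only."""
    embeds = orig_embed(input_ids)                      # (batch, seq_len, dim)
    if training:
        seq_len, dim = embeds.shape[-2], embeds.shape[-1]
        scale = neftune_alpha / (seq_len * dim) ** 0.5  # alpha / sqrt(L * d)
        noise = torch.empty_like(embeds).uniform_(-scale, scale)
        embeds = embeds + noise                         # noisy embeddings for finetuning
    return embeds
```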

We should be happy that the "free lunch" tricks like this exist.

Notably, we had a great guest, Wing Lian, the maintainer of Axolotl, a very popular tool to streamline fine-tuning, chime in and say that in his tests, and among the Discord folks, they couldn't reproduce some of these claims (they add everything that's super cool and beneficial for finetuners to their library), so it remains to be seen how far this "trick" scales and what else needs to be done here.

Similarly, back when the context-extension trick was discovered, there was a lot of debate about its effectiveness from Ofir Press (author of ALiBi, another context scaling method), and further iterations of the trick made it into a paper and a robust method, so this development is indeed exciting!

Mojo 🔥 now supports Apple silicon Macs and has LLaMa.cpp level performance!

I've been waiting for this day! We've covered Mojo from Modular a couple of times, and it seems that the promise behind it is starting to materialize. Modular promises an incredible, unbelievable 68,000X boost over vanilla Python, and it's been great to see that develop.

Today (October 19) they have released their support of Mojo Lang on Apple silicon which most developers use, and it's a native one and you can use it right now via CLI.

A friend of the pod, Aydyn Tairov, hopped on the live recording and talked to us about his Llama.🔥 project (Github), which he ported to Apple silicon and which shows incredible, llama.cpp-like performance without crazy optimizations!

Aydyn collected many Llama implementations, including llama.cpp, llama2.c by Karpathy and many others, compared them against his llama.mojo (or Llama.🔥), and saw that the Mojo one comes very, very close to llama.cpp and significantly beats the Rust, Go and Julia examples (on specific baby-llama models).

The Mojo future is bright, and we'll keep updating with more, but for now, go play with it!

Meta is doing near-real time brain → image research! 🤯

We've talked about fMRI (and EEG) signals being translated into diffusion imagery before, and this week Meta has shown that while fMRI-to-image is pretty crazy on its own, using something called MEG (non-invasive magnetoencephalography) they can generate and keep generating images based on brain signals, in near real time!

[TK video here]

I don't have a LOT to say about this topic, besides the fact that as an Aphant (I have Aphantasia) I can't wait to try this on myself and see what my brain actually "sees"

Baidu announces ERNIE and a bunch of AI-native products, including maps, drive, autonomous ride hailing and more.

Baidu has just wrapped up their biggest conference of the year, Baidu World, where they announced a new version of their foundational model called ERNIE 4, which is multimodal (of unknown size) and is now integrated into quite a few of their products, many of which are re-imagined with AI.

A few examples beyond a basic LLM chat-like interface: a revamped map experience with an AI assistant (with voice) built in to help you navigate and find locations; a new office management app called InfoFlow that handles appointments and time slots, and apparently even does travel booking; and an AI "Google Drive"-like product called YunYidou that can find video content based on what was said and when, pinpoint specific frames, summarize, and do a bunch of other incredible AI stuff. Here's a translated video of someone interacting with YunYidou and asking for a bunch of things one after another.

Disclosure: I don't know if the video is edited or in real time.

Voice & Audio

Real time voice for agents is almost here, chatGPT voice mode is powerful

I've spent maybe 2 hours this week with chatGPT in my ear, using the new voice mode + AirPods. It's almost like... being on a call with chatGPT. I started talking to it in the store, asking for different produce to buy for a recipe, then drove home and asked it to "prepare" me for the task (I don't usually cook this specific thing), and then during my cooking I kept talking to it, asking for next steps. With the new iOS, voice mode shows up as a Live Activity and you can pause and resume it without opening the app:

It was literally present in my world, without me having to watch the screen or type.

It's a completely new paradigm of interactions when you don't have to type anymore, or pick up a screen and read, and it's wonderful!

Play.ht shows off an impressive

Fri, 20 Oct 2023 02:22:07 GMT
A week of horror, an AI conference of contrasts


Hi, this is Alex. In the podcast this week, you'll hear my conversation with Miguel, a new friend I made at the AI.engineer event, and then a recap of the whole AI.engineer event that I recorded with Swyx after it ended.

This newsletter is a difficult one for me to write. Honestly, I wanted to skip this one entirely, struggling to fit the current events into my platform and the AI narrative; however, I decided to write one anyway, as the events of the last week have merged into one for me in a flurry of contrasts.

Contrast 1 - Innovation vs Destruction

I was invited (among a few other Israelis or Israeli-Americans) to the ai.engineer summit in SF, to celebrate the rise of the AI engineer, and I was looking forward to that very much. Meeting many of you (Shoutout to everyone who listens to ThursdAI who I've met face to face!) and talking to new friends of the pod, interviewing speakers, meeting and making connections was a dream come true.

However, a few days before the conference began, in a stark contrast to this dream, I had to call my mom, who was sheltering 20km from the Gaza strip border, to ask if our friends and family were alive and accounted for, and to hear sirens as rockets flew above her head, as Hamas terrorists murdered, pillaged and kidnapped, in what seems to be the 10x equivalent of the 9/11 terror attack, relative to population size.

I grew up in Ashkelon, rocket attacks are nothing new to me, we've learned to live with them (thank you Iron Dome heroes) but this was something else entirely, a new world of terror.

So back to the conference: given that there's not a lot to be gained by doom scrolling and watching (basically snuff) films coming out of the region, and given that all my friends and family were accounted for, I decided to not give the terrorists what they want (which is to put people in a state of terror), and instead to choose compassion, without empathy towards the situation, and to not bring sadness to every conversation I had there (over 200, I think).

So participating in an AI event, which hosts and celebrates folks who are literally at the pinnacle of innovation, building the future using all the latest tools, while also hurting and holding my dear ones in my thoughts, was a very stark contrast between past and future. Huge credit goes to Dedy Kredo, CTO of Codium, who was in the same position and gave a hell of a talk, with a kick-ass live demo (no backup recording!), and then shared this image:

This is his co-founder, Itamar, who was called to reserve duty to protect his family and country, sitting with his rifle and his dashboard, seeing destruction + creation, past and future, negativity and positivity all at once. As Dedy masterfully said, we will prevail 🙏

Contrast 2 - Progress // Fear

At the event, Swyx and Benjamin gave me a media pass and free rein, and I asked to be teamed with a camera person to go around the event and do some (not live) interviews. I was teamed with the lovely Stacey, from Chico, CA. Stacey has nothing to do with AI; in fact, she's a wedding photographer. However, she definitely listened with interest to the interviews I was holding, and to the speakers on stage.

While we were taking a break, I looked out the window and saw a driverless car (a Waymo) zip by, and since they only started operating after I left SF 3 years ago, I hadn't yet had a chance to ride in one.

So I asked Stacey and some other folks if they'd like to go for a ride, and to my complete bewilderment, Stacey said "no 😳", and when I asked why not, she didn't want to admit it at first, but then said that it's scary.

This struck me, and since that moment I've had as many conversations with Stacey as I had with the folks who came to be AI engineers, because this was such a stark contrast between progress and fear. I was basically walking, almost hand in hand, with a person who doesn't use or understand AI, and fears it, among the folks who are building the future, who exist at the pinnacle of innovation and discuss how to connect more AI to more AI, and how to build fully autonomous agents to augment human productivity and bring about a world of abundance.

This contrast was reinforced by several new friends of mine, who came to AI.engineer and SF for the first time, from countries where English is not the first language and where Waymos are not zipping about the streets freely. It highlighted for me how global this shift is, and how concentrated the decision making, the building, and the innovation are: within the arena, SF, California, the US. It's almost expected that AI is going to speak English, and that to use or build it we have to speak it as well, while most of the world doesn't use English as their first language.

Contrast 3 - Technological // Spiritual

This contrast was intimate and personal to me. You see, this AI.engineer event was the first such sized, professional event with folks talking "my language" since I burned out this summer. If you've followed for a while, you may remember we talked about LK-99 and superconductors, and I overclocked myself back then so much (scaling a whole other podcast, hosting 7 spaces in 2 weeks, creating a community of 1,500 and following all the news 24/7) that I just didn't want to go on speaking, doing spaces, recording podcasts... I was just done.

Luckily my friend Junaid sent me a meditation practice recording with the saying "fill your own cup, before you give out to others"

That recording led me to discover effortless meditation (via AirChat, DM me for details), and going on a 5 day free virtual retreat, which changed my life in several very significant ways and just keeps going.

So back to the conference: coming from that place, a place of center, a place where I balance my own inner peace while interacting with 150+ people (I think, I haven't really counted), equalizing energy among them, connecting many folks to many other folks, introducing and getting introduced, giving interviews, remembering names and faces, was an incredible experience for me, and many of my new friends asked me what's going on, and why and how I meditate for an hour every day.

So the contrast between my own spiritual journey and the technological journey was not a very strong one; beyond 1 very... interesting and weird conversation I had, everyone else seemed to be into self improvement and spiritual growth. I've met many folks who went on Vipassana retreats, had meditation practices themselves, or used to have one and wanted to come back to it.

Maybe it was self selection, but I really enjoyed seeing and hearing all the stories from folks, as I believe that given contrast number 2, the way we talk about AI, the way we build AI, the way we then show AI to the world, needs to take into account compassion towards those who don't understand it, who fear it, and want nothing to do with it.

Not to mention, that I doubt that any of the doomers are walking on their own spiritual path.

I wanted to just send this newsletter, but as I always do, I pasted it into Claude.ai and asked for a review. Claude suggested I end it on a positive note, because the middle was heavy, and even gave me some suggestions as to how. So instead of me doing it, here's an AI, doing a human job of ending something on a positive note:

[Claude] Here are some examples of how you could end your newsletter on a more hopeful, forward-looking note:

* Remind readers of the optimism and excitement you felt at the AI conference, and how seeing the potential of technology firsthand gives you hope for the future.

* Share an insight or realization you had about how to stay grounded and centered even amidst chaos - something your spiritual practices have taught you.

* Affirm that while destruction and violence will always be part of the human experience, you believe innovative thinkers will lead us to a brighter tomorrow.

* Note how connecting deeply with other humans, even strangers, fills you with hope. Finding common ground and shared humanity counterbalances the forces that divide.

* Point to a specific AI-related project or breakthrough that gives you hope for progress. e.g. medical advancements, climate change solutions, etc.

* Share an inspirational quote or verse that reflects resilience and perseverance in difficult times.

* End by expressing gratitude - for your community of readers, your loved ones, the chance to have profound human experiences, or your inner stillness amidst outer turmoil.

The key is ending on a note of hope, possibility, love, resilience - something that uplifts readers after processing weighty material. Let me know if these spark any ideas for how to conclude your newsletter on a high note!

P.S - I know this newsletter is not what you expected, and maybe not what you signed up for, and I deliberated whether I even should write it and what, if anything, I should post on the podcast. However, this week was incredibly full of contrast, of sadness and excitement, of sorrow and bewilderment, so I had to share my take on all this.

P.P.S - as always, if you read all the way to the end, dm me the ☮️ emoji



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 13 Oct 2023 19:21:00 GMT
📅 ThursdAI Oct 4 - AI wearables, Mistral fine-tunes, AI browsers and more AI news from last week

Boy am I glad that not all AI weeks are like last week, where we had so much news and so many things happening that I was barely able to take a breath for the week!

I am very excited to bring you this newsletter from San Francisco this week, the AI mecca, the arena, the place where there are so many AI events and hackathons that I don't actually know how people get any work done!

On that topic, I'm in SF to participate in the AI.engineer summit next week, to host spaces and interviews with the top AI folks here, and to discuss with the audience what an AI engineer is. If you have any questions you'd like me to ask, please comment with them and I'll try to get answers.

ThursdAI - subscribe eh? ↴

Here’s a table of contents of everything we chatted about:

[00:00:00] Intro and welcome

[00:04:53] Alex in San Francisco - AI Engineer

[00:07:32] Reka AI - Announcing a new multimodal Foundational model called Yasa-1

[00:12:42] Google adding Bard to Google Assistant

[00:18:56] Where is Gemini?

[00:23:06] Arc browser adding Arc Max with 5 new AI features

[00:24:56] 5 seconds link AI generated previews

[00:31:54] Ability to run LLMs on client side with WebGPU

[00:39:28] Mistral is getting love from Open Source,

[00:48:04] Mistral Open Orca 7B

[00:58:28] Acknowledging the experts of ThursdAI

[01:01:14] Voice based always on AI assistants

[01:09:00] Airchat adds voice cloning based translation tech

[01:14:23] Effects of AI voice cloning on society

[01:21:32] SDXL IKEA LORA

[01:23:17] Brief Recap

Show notes:

Big Co

* Google - adding Bard to Google Assistant (Announcement). Come on Google, just give us Gemini already!

* Reka AI - Multimodal Yasa-1 from Yi Tay and team (Announcement). With Yi Tay of Flan/Bard fame as chief scientist! But I wasn't able to test it myself!

* Arc - first browser AI features (My thread, Brief video review, Arc Invite). I love Arc, I recommend it to everyone I meet; now with the AI preview features it's even more of a no-brainer. Strongly recommended if you like productivity.

Open Source LLMs

* Mistral vs LLaMa 2 boxing match (link). A fun little battle arena where you pick which response you personally find better, to see the difference between Mistral 7B and LLaMa 13B.

* Mistral-7B-OpenOrca (announcement). The folks from Alignment Lab do it again! A great finetune that comes very close (98%) to LLaMa 70B on benchmarks!

* SynthIA-7B-v1.3 (Huggingface). An uncensored finetune on top of Mistral that Reddit claims is a great model, especially since a chain of thought is apparently somehow built in.

VISION

* Radiologists thread about GPT-4 V taking over radiology (or maybe not?) (Thread)

Voice

* AirChat added voice clone + translation features (Room, Demo). I've been an avid AirChat user (it's Naval's voice-based social media platform) for a while, and am very excited they are destroying language barriers with this feature!

* Tab was revealed in a great demo by Avi Schiffman (Demo). Go Avi! Rooting for you brother, competition makes folks stronger!

* Rewind announced Rewind Pendant (Announcement). I ordered one, but Rewind didn't announce when this hits the market; it's going to be interesting to see how well they do!

AI Art and Diffusion

* IKEA LoRA - generate IKEA-style tutorials for everything with SDXL (Announcement, HuggingFace)

* DALL-E3 seems to be available to all Plus members now

This week's pod was generated by talking to chatGPT, it's so fun, you gotta try it!

No full breakdown this week, but we covered a bunch of it in the show, and I highly recommend listening to it!

Don't forget to follow me on X to catch the spaces live from the AI.engineer event in SF; the conference will also be live-streamed on YouTube!

See you next week 🫡

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Thu, 05 Oct 2023 22:25:24 GMT
📅🔥ThursdAI Sep 28 - GPT4 sees, speaks and surfs, Cloudflare AI on GPUs, Mistral 7B, Spotify Translates, Meta AI everywhere, Qwen14B & more AI news from this INSANE week

[00:00:00] Intro and welcome everyone

[00:00:52] GPT4 - Vision from OpenAI

[00:05:06] Safety concern with GPT4-V

[00:09:18] GPT4 can talk and listen as well

[00:12:15] Apple rumors, on device inference, and Siri

[00:17:01] OpenAI Voice Cloning Tech used in Spotify to translate podcasts

[00:19:44] On the risks of Voice Cloning tech being open sourced

[00:26:07] Alex statement on purpose of ThursdAI

[00:27:53] “AGI has been achieved internally”;

[00:32:10] OpenAI, Jonny Ive and Masa are rumored to be working on a hardware device

[00:33:51] Cloudflare AI - Serverless GPU on global scale

[00:37:13] Cloudflare AI partnership with HuggingFace to allow you to run many models in your own

[00:40:34] Cloudflare announced the Vectorize DB and embedings on edge

[00:46:52] Cloudflare AI gateway - proxy LLM calls, caching, monitoring, statistics and fallback

[00:51:15] Part 2 - intro and recap

[00:54:14] Meta AI announcements, bringing AI agents to 3 billion people next month

[00:56:22] Meta announces EMU image model to be integrated into AI agent on every platform

[00:59:38] Meta RayBan glasses upgraded to spatial computing, with AI and camera access

[01:00:39] On the topic of smart glasses, GoogleGlass, and the acceptance society wide to have

[01:05:37] Safety and societal implications of everyone having glasses and recording everything

[01:12:05] Part 3 - Open Source LLMs, Mistral, QWEN and CapyBara

[01:21:27] Mistral 7B - SOTA 7B general model from MistralAI

[01:23:08] On the topic of releasing datasets publicly and legal challenges with obtaining that

[01:24:42] Mistral GOAT team giving us a torrent link to a model with an Apache 2 license.

Truly, I've been doing these coverages in one form or another for the past 9 months, and I don't remember a week this full of updates, news, state-of-the-art open source models and more.

So, here's to acceleration (and to me finally facing the fact that I need a niche, deciding what I'll update on and what I won't, and being transparent with all of you about it).

On a separate note, these past two weeks ThursdAI had exposure to Yann Lecun (RTs), was joined on stage by the VP of DevRel at Cloudflare and their counterpart at HuggingFace, had the CEO of Anaconda join us on stage this episode, and had the chief scientist of Mistral join in the audience 😮 ThursdAI is really shaping up to be the place where this community meets, and I couldn't be more humbled and prouder of the show, the experts on stage that join from week to week, and the growing audience 🙇‍♂️ OK, now let's get to the actual news!

ThursdAI - Weeks like this one highlight how important it is to stay up to date on AI news. Subscribe, I've got some cool stuff coming! 🔥

All right so here’s everything we’ve covered on ThursdAI, September 28th:

(and if you’d like to watch the episode video with the full transcript, it’s here for free):

Show Notes + Links

* Vision

* 🔥 Open AI announces GPT4-Vision (Announcement, Model Card)

* Meta glasses will be multimodal + AI assistant (Announcement)

* Big Co + API updates

* Cloudflare AI on workers, serverless GPU, Vector DB and AI monitoring (Announcement, Documentation)

* Cloudflare announces partnerships with HuggingFace, Meta

* Claude announces $4 billion investment from Amazon (Announcement)

* Meta announces AI assistant across WhatsApp, Instagram

* Open Source LLM

* 🔥 Mistral AI releases - Mistral 7B - beating LLaMa2 13B (Announcement, Model)

* Alibaba releases Qwen 14B - beating LLaMa2 34B (Paper, Model, Vision Chat)

* AI Art & Diffusion

* Meta shows off EMU - new image model

* Still waiting for DALL-E3 😂

* Tools

* Spotify translation using Open AI voice cloning tech

Vision

GPT 4-Vision

I've been waiting for this release since March 14th (literally), have been talking about it on literally every ThursdAI, and have been comparing every open source multimodal image model (IDEFICS, LLaVA, QWEN-VL, NeeVa and many others) to it, and none came close!

And here we are: a brief rumor about the upcoming Gemini release (potentially a big multimodal model from Google), and OpenAI decided to release GPT-4V, and it's as incredible as we've been waiting for!

From creating components from a picture of a UI, to solving complex math problems with LaTeX, to helping you get out of a parking ticket by looking at a picture of a complex set of parking rules, X folks report that GPT4-V is incredibly helpful and unlocks so many new possibilities!

Can’t wait to get access, and most of all, for OpenAI to land this in the API for developers to start building this into products!

On the pod, I’ve talked about how I personally don’t believe AGI can work without vision, and how personal AI assistants are going to need to see what I see to be really helpful in the real world, and we’re about to unlock this 👀 Super exciting.

I will add this one last thing, here’s Ilya Sutskever, OpenAI chief scientist, talking about AI + Vision, and this connects with our previous reporting that GPT-4 is not natively multimodal (while we’re waiting for rumored Gobi)

If you need more use-cases, check out this great breakdown by friend of the pod SkalskiP (Piotr), a vision engineer at RoboFlow, which got really high Hacker News rankings.

https://blog.roboflow.com/gpt-4-vision/

Meta RayBan smartglasses will have multimodal AI 👀

To add to the above increased interest in AI (and to rumors about OpenAI working with Jony Ive from Apple and Masayoshi Son on a rumored hardware device), Meta has announced a new iteration of their RayBan glasses that will include a camera to help you go live, an AI agent in the glasses, and, most of all, will be multimodal, by which they mean the AI agent in there (we don't know if it's LLaMa based or something else) will have access to the camera and to what you see.

Given how well this works, it may be revolutionary in its own right!

I’ve been on a MultiModality kick since that incredible March 14th day, and I’m very excited that it’s here! 🙌

Big CO + API updates

Cloudflare AI - Serverless GPU inference, VectorDB and AI Gateway

I was blown away by this, so much so that I hopped on an emergency space on Wednesday to talk all about it. Some of you know I created https://targum.video a year ago, and it's been accepted to the Cloudflare Workers Launchpad. The whole website and backend run on Workers, but the GPU and inference I had to build in Python and put on a LambdaLabs GPU machine.

So starting today, folks can build something like Targum end to end on Cloudflare, with the announcement of GPU inference.

If you'd like all the details: I was really humbled to have Ricky Robinette (VP Developer Experience @ Cloudflare) and Phillip Schmidt from Hugging Face join the X space on launch day (to my complete surprise), and you can find that conversation here (it's going to be on the pod soon, after I find some time to edit this 😅).

Here’s my notes from that conversation:

* Inference on edge is here

* Serverless GPUs on cloudflare edge network

* Integrated with Workers platform

* What is the workers platform

* Give example of the many tools it has

* Targum example for what is done on workers and what is done on GPU

* Easy to get started and deploy

* Will have a free tier 🔆

* Models and Usecases

* LLMs - LLaMa 7B

* Embeddings - BGE-base

* Text Classification - DistilBERT

* Translation - m2m100

* ASR - Whisper

* Preselected models right now

* Vectorize - an edge native vector DB

* Integrates with wrangler and ecosystem

* Supports existing vectors from OpenAI Ada (importable)

* Metadata can include R2 objects, KV storage and more!

* Build and deploy full RAG apps, including your own local models all inside 1 platform

* AI - gateway

* Proxy for OpenAI (and other providers') calls - see the sketch after these notes

* Shows a usage dashboard

* Global Coverage:

* Plan to be in 100 data centers by the end of this year

* And nearly everywhere by the end of 2024

* WebGPU in workers

* Many HF models support ONNX

* WebGPU is now supporting FP-16

* This could open a new path to run smaller models within workers even without CFAI

* Partnership with HuggingFace

* 1 click deploy in a dropdown on half a million models

* Serverless inference - no more downloading and uploading

* Way faster as well

* Cloudflare will have a de-facto proxy/mirror of HF? 🤔

I’m very very excited by the HuggingFace partnership and you can hear it in the recording!
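
Since the AI Gateway is essentially a proxy in front of providers like OpenAI, the integration is basically a base-URL swap. Here's a hedged sketch using the pre-1.0 openai Python client; the gateway URL shape below is an assumption from memory of Cloudflare's docs, and `ACCOUNT_TAG` / `GATEWAY_NAME` are placeholders you'd get from your dashboard:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
# Assumed gateway URL shape - check the Cloudflare dashboard for the exact endpoint of your gateway.
openai.api_base = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY_NAME/openai"

# The request itself is unchanged; the gateway proxies it to OpenAI and adds
# caching, analytics and fallbacks on Cloudflare's side.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize this week's AI news in one sentence."}],
)
print(response["choices"][0]["message"]["content"])
```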

Meta announces AI assistant across chat apps, Instagram, WhatsApp, Messenger

I haven't tested this yet, but this is going to be an incredible way to bring AI experiences to over 3B people around the world!

In addition to just "chat with AI", Meta has partnered with many celebs to "embody" them as AI characters, which I found... a bit unsettling? But I guess we'll see how much this will affect the "personas" of the AI assistants.

Open Source LLM

Qwen 14B with chat and vision versions

The QWEN model from Alibaba, which we've already talked about multiple times and which was then taken down from the web, comes back with a vengeance!

Qwen team comes back with a 14B model, that beats LlaMa2 34B on most evaluations, including a VL version (only 7B), which according to my tests, was the best performing open source vision model even at 7B

It was really cool to see the Qwen authors interact with Yam and me on Twitter; it's like crossing the great firewall, and hopefully we'll have that team on a ThursdAI recording at some point!

🔥 Mistral 7B (torrent tweet) - SOTA LLM

The Mistral team made news when they raised $113 million without a product, just 3 co-founders, back in June, and the "takes" on Twitter were "we're in a bubble, bla bla bla". Then yesterday this goated team just posted a tweet with a magnet torrent link and no description. So of course everybody downloaded it and found the best SOTA 7B model, one that outperforms the much larger LLaMa 2 13B and the MUCH larger LLaMa 34B on several benchmarks!

It even comes very close to the Code LLaMa performance benchmarks on code, while being a general model, which is incredible.

Needless to say, the team delivered on the promise, and seeing them commit this fully to open source, by dropping a model with an Apache 2 license straight to BitTorrent, is a great sight!

Also, we caught glimpses of Guillaume Lample in the audience while we were gassing Mistral up, so potentially at some point we may get the Mistral folks to join a ThursdAI live space? 🫡

AI Art + Diffusion

Meta introduced EMU, a diffusion model integrated into its AI offerings with a /imagine command, available for free in all their products, and it looks really good!

I wonder if it will do the same “chat with image” thing as DALL-E3 was announced to do, but in any case, giving this, for free, in this quality, to so many people, is remarkable 🙇‍♂️ Kudos to the team at Meta for ALL the releases today! Can’t wait to play with them.

Tools

Spotify translates podcasts using stealth OpenAI tech

Spotify announced translations for podcasts, using some secret OpenAI voice cloning tech, and we had a long discussion about the implications of voice cloning, deep fakes and everything in between with Peter Wang and other folks on the pod; definitely recommended listening!

I love this, absolutely, not just because you may want to listen to ThursdAI pod in your native language (and I could finally show my mom who doesn’t speak English what I’m doing!) but also because language barriers should NOT exist, and Targum.video and this and all the models that Meta is releasing are a great testament to how fast language barriers are coming down!

I'm very, very happy with this development and will keep you posted.

With that, I should probably stop here, it’s been an absolutely insane week, and if this summary helped, like, share and consider a premium subscription?

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

P.S - If you scrolled all the way to here, send me 🧨 in a DM on any platform 😉, I may have something for you



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 29 Sep 2023 03:16:21 GMT
📆 ThursdAI Sep 21 - OpenAI 🖼️ DALL-E 3, 3.5 Instruct & Gobi, Windows Copilot, Bard Extensions, WebGPU, ChainOfDensity, RememberAll

Hey dear ThursdAI friends, as always I'm very excited to bring you this edition of ThursdAI, September 21st, which is packed full of goodness: updates, great conversations with experts, breaking AI news, and not 1 but 2 interviews.

ThursdAI - hey, psst, if you got here from X, don't worry, I don't spam, but definitely subscribe; you'll be the coolest, most up to date AI person you know!

TL;DR of all topics covered

* AI Art & Diffusion

* 🖼️ DALL-E 3 - High quality art, with a built in brain (Announcement, Comparison to MJ)

* Microsoft - Bing will have DALL-E 3 for free (Link)

* Big Co LLMs + API updates

* Microsoft - Windows Copilot 🔥 (Announcement, Demo)

* OpenAI - GPT3.5 instruct (Link)

* OpenAI - Finetuning UI (and finetuning your finetunes) (Announcement, Link)

* Google - Bard has extensions (twitter thread, video)

* Open Source LLM

* Glaive-coder-7B (Announcement, Model, Arena)

* Yann Lecun testimony in front of US senate (Opening Statement, Thread)

* Vision

* Leak : OpenAI GPT4 Vision is coming soon + Gobi multimodal? (source)

* Tools & Prompts

* Chain of Density - a great summarizer prompt technique (Link, Paper, Playground)

* Cardinal - AI infused product backlog (ProductHunt)

* Glaive Arena - (link)

AI Art + Diffusion

DALL-E 3 - High quality art, with a built in brain

DALL-E 2 was the reason I went hard into everything AI, I have a condition called Aphantasia, and when I learned that AI tools can help me regain a part of my brain that’s missing, I was in complete AWE. My first “AI” project was a chrome extension that injects prompts into DALL-E UI to help with prompt engineering.

Well, now not only is my extension no longer needed, prompt engineering for AI art itself may die a slow death with DALL-E 3, which is going to be integrated into the chatGPT interface; chatGPT will be able to help you... chat with your creation, ask for modifications and alternative styles, and suggest different art directions!

In addition to this incredible new interface, which I think is going to change the whole AI art field, the images are of mind-blowing quality, coherence of objects and scene elements is top notch, and the ability to tweak tiny detail really shines!

Another thing they really fixed is hands and text! Get ready for SO many memes coming at you!

Btw, I created a conversational generation bot in my Telegram chatGPT bot with Stable Diffusion (before there was an API), and I remember how addicting this was! My friends from Krea did the same :) so y'know... where are our free DALL-E credits, OpenAI? 🤔

Just kidding. An additional awesome thing is that DALL-E will be integrated into the chatGPT Plus subscription (and Enterprise), will refuse to generate any living artist's art, and has a very, very strong bias towards "clean" imagery.

I wonder how fast will it come to an API, but this is incredible news!

P.S - if you don’t want to pay for chatGPT, apparently DALL-E 3 conversational is already being rolled out as a free offering for Bing Chat 👀 Only for a certain percentage of users, but will be free for everyone going forward!

Big Co LLM + API updates

Copilot, no longer just for code?

Microsoft announced some breaking news on #thursdai, confirming that Copilot is now part of the new Windows and will live just a shortcut away for many, many people. I think this is absolutely revolutionary; just last week we chatted with Killian from Open Interpreter, and having an LLM run things on my machine was one of the main reasons I was really excited about it!

And now we have a full-on, baked-in AI agent, inside the world's most popular operating system, running for free for all the mom and pop Windows computers out there, just a shortcut away!

Copilot will be a native part of many apps, not only Windows; here's an example of a PowerPoint Copilot!

As we chatted about on the pod, this will put AI into the hands of so, so many people for whom opening the chatGPT interface is beyond them, and I find it an incredibly exciting development! (I will not be switching to Windows for it though, will you?)

Btw, shoutout to Mikhail Parakhin, who led the Bing Chat integration and is now in charge of the whole Windows division! It shows how much dedication to AI Microsoft has, and it really seems they don't want to "miss" this revolution like they did with mobile!

OpenAI releases GPT 3.5 instruct turbo!

For many of us who used the GPT-3 APIs before it was cool (who has the 43 character API key 🙋‍♂️), we remember when the "instruct" models were all the rage, and then OpenAI basically told everyone to switch to the much faster and more RLHF'd chat interfaces.

Well now, they brought GPT-3.5 back, with instruct and turbo modes; it's no longer a chat model, it's a completion model, and it's apparently much better at chess?

An additional interesting thing is that it includes logprobs in the response, so you can build much more interesting software (by asking for several responses and then looking at the log probabilities). For example, if you're asking the model for a multiple-choice answer to a question, you can rank the answers based on logprobs!

Listen to the pod, Raunak explains this really well!
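As a hedged sketch of what that can look like with the pre-1.0 openai Python client and the completions endpoint (the question and answer labels here are made up for illustration):

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "Question: Which planet is known as the Red Planet?\n"
    "A) Venus\nB) Mars\nC) Jupiter\nD) Saturn\n"
    "Answer:"
)

# Ask for a single token and request log probabilities for the top alternatives.
response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=1,
    temperature=0,
    logprobs=5,  # return logprobs for the 5 most likely tokens at each position
)

top = response["choices"][0]["logprobs"]["top_logprobs"][0]
# Rank the candidate answers by the model's confidence in the corresponding token.
for token, logprob in sorted(top.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token!r}: {logprob:.3f}")
```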

FineTune your finetunes

OpenAI also released a UI for finetuning GPT3.5 and upped the number of concurrent finetunes to 3, and now, you can finetune your finetunes!

So you can continue finetuning already finetuned models!
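In API terms, my understanding is that continuing a fine-tune is just passing an existing `ft:` model id as the base model of a new job. A hedged sketch with the pre-1.0 openai Python client; the file id and model id below are placeholders:

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

# Placeholder ids for illustration - use your own uploaded training file and fine-tuned model.
job = openai.FineTuningJob.create(
    training_file="file-abc123",
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # start from an existing fine-tune
)
print(job["id"], job["status"])
```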

Bard extensions are like chatGPT plugins but more native.

While we wait for Gemini (c'mon Google, just drop it!), the upcoming incredible multimodal LLM that will allegedly beat GPT-4, Google is shoving new half-baked features into Bard (remember Bard? It's like the 5th most used AI assistant!)

You can now opt in and @-mention services like Gmail, YouTube, Drive and many more, and Bard will connect to them, do a search (apparently not a vector search, just a keyword search), and show you results (or summarize your documents) inside the Bard interface.

The @ UI is really cool, and reminded me of Cursor (where you can @ different files or documentation), but in practice, from my 2 checks, it really didn't work at all and was worse than just a keyword search.

Open Source LLM

Glaive-coder-7B reaches an incredible 63% on human eval

Friends of the pod Anton Bacaj and Sahil Chaudhary have open sourced a beast of a coder model, Glaive-coder-7B. With just 7B parameters, it achieves an enormous 63% on HumanEval pass@1, which is higher than LLaMa 2, Code LLaMa and even GPT-3.5 (based on technical reports) 🔥 (table from the Code LLaMa release for reference; the table is now meaningless 😂)

Yann Lecun testimony in front of US senate

Look, we get it, the meeting of the CEOs (and Clem from HuggingFace) made more waves, especially with that huge table: who wasn't there? Elon, Bill Gates, Sundar, Satya, Zuck, IBM, Sam Altman...

But IMO the real-deal government AI moment was Yann Lecun, chief scientist at Meta AI, who came in hot with very pro open source opening statements and was very patient with the very surprised senators on the committee. The opening statement is worth watching in full (I transcribed it with Targum cause... duh), and Yann actually retweeted! 🫶

Here’s a little taste, where Yann is saying, literally “make progress as fast as we can” 🙇‍♂️

He was also asked what happens if the US over-restricts open source AI and our adversaries... don't? Will we be at a disadvantage? Good questions, senators; I like this thinking, more of this please.

Vision

Gobi and GPT4-Vision are incoming to beat Gemini to the punch?

According to The Information, OpenAI is gearing up to give us the vision model of GPT-4, due to the hinted upcoming release of Gemini, a multimodal model from Google (that's also rumored to be released very soon; I'm sure they will release it on next ThursdAI, or the one after that!)

It seems to be the case for both DALL-E 3 and the GPT-4 Vision leak, because apparently Gemini is multimodal on the input (it can take images and text) AND the output (it can generate text and images), and OpenAI maybe wants to get ahead of that.

We’ve seen images of GPT-4 Vision in the chatGPT UI that were leaked, so it’s only a matter of time.

The most interesting thing from this leak was the model codenamed GOBI, which is going to be a “true” multimodal model, unlike GPT-4 vision.

Here's an explanation of the difference from Yam Peleg, ThursdAI's expert on everything language models!

Voice

Honestly, nothing major happened with voice since last week 👀

Tools

Chain of Density

The Salesforce AI team has developed a new technique for improving text summarization with large language models. Called Chain of Density (CoD), this prompting method allows users to incrementally increase the informational density of a summary.

The key insight is balancing the right amount of details and main ideas when summarizing text. With CoD, you can prompt the model to add more detail until an optimal summary is reached. This gives more control over the summary output.

The Salesforce researchers tested CoD against vanilla GPT summaries in a human preference study. The results showed people preferred the CoD versions, demonstrating the effectiveness of this approach.

Overall, the Salesforce AI team has introduced an innovative way to enhance text summarization with large language models. By tuning the density of the output, CoD prompts can produce higher quality summaries. It will be exciting to see where they take this promising technique in the future.
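
To make the idea concrete, here's a rough sketch of a CoD-style prompt sent through the pre-1.0 openai Python client. The wording is my paraphrase of the technique (iteratively add missing entities while keeping the summary the same length), not the exact prompt from the paper:

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

ARTICLE = "..."  # the text you want to summarize

cod_prompt = f"""Article: {ARTICLE}

You will write increasingly dense summaries of the article above.
Repeat the following 2 steps 5 times:
1. Identify 1-3 informative entities from the article that are missing from the previous summary.
2. Rewrite the summary at the same length, keeping everything from before and adding the missing entities.

Return all 5 summaries as a numbered list."""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": cod_prompt}],
)
print(response["choices"][0]["message"]["content"])
```

The last (densest) summary is usually the one you'd keep, and the intermediate ones give you the "density dial" the paper describes.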

RememberAll - extend your LLM context with a proxy

We had Raunak from rememberAll on the pod this week, and that interview is probably coming on Sunday, but wanted to include this in tools as it’s super cool.

Basically, with a 2-line code change you can send your API calls through the RememberAll proxy; it will extract the key information, embed and store it in a vector DB for you, and then inject it back into responses.

Super clever way to extend memory, here’s a preview from Raunak (demo) and a more full interview is coming soon!

Cardinal has launched on ProductHunt, from my friends Wiz and Mor (link)

Quick friendly plug: Wiz and Mor are friends of mine, and they have just launched Cardinal, an AI-infused product backlog that extracts features, discussions about feature requests, and more from customer feedback across tons of sources.

Go give them a try, if you’re looking to make your product backlog work better, it’s really really slick!

Hey, if you arrived here, do me a quick favor? Send me a DM with this emoji 🥔, and then share this newsletter with 1 friend who, like you, loves AI?

Thanks, I expect many potatoes in my inbox! See you next ThursdAI 🫡

Here’s the full transcript (no video this time, I’m finishing this up at 10:30 and video will take me at least 3 more hours, apologies 🙇‍♂️)

[00:10:21] Alex Introduces Yam Peleg

[00:10:57] Alex Introduces Nisten Tahiraj

[00:11:10] Alex Introduces Far El

[00:11:24] Alex Introduces Xenova

[00:11:44] Alex Introduces Roie S. Cohen

[00:11:53] Alex Introduces Tzafrir Rehan

[00:12:16] DALL-E 3 - An AI art model with a brain, coming to chatGPT plus

[00:20:33] Microsoft launches Windows Copilot

[00:30:46] OpenAI leaks, GPT-4 Vision, Gobi

[00:38:36] 3.5 instruct model from OpenAI

[00:43:03] Raunak intro

[00:43:25] Bard Extensions allow access to GMail, Youtube, Drive

FULL transcript:

[00:00:00] Alex Volkov: So, ThursdAI is this wonderful thing that happened, and happened organically as well.

[00:00:26] And basically what happens is we have this live recording every Thursday, every ThursdAI on Twitter spaces. I am I'm very grateful to share the stage with experts in their fields, and we all talk about different things, because AI updates are so multidisciplinary right now. It's really hard for even experts in their one field to follow everything.

[00:00:51] I find this mixture of experts type model on stage very conducive because we all go and find the most up to date things from the last week. And then we have folks who, it's their specification, for example, to comment on them. And you guys in the audience get the benefit of this. And it just happened organically through many conversations we had on, on Spaces since GPT 4 was launched.

[00:01:16] Literally the day, March 14th, 2023 aka Pi Day. It was the first day we started these spaces, and since then the community has grown to just... An incredible amount of people who join quality experts, top of their field people. I'm, I'm just so humbled by all of this. And since then, many folks told me, like Roy here in the audience, that, Hey, Alex, you're doing this in this weirdest hour.

[00:01:42] Thursday a. m. in San Francisco, nobody's gonna come. It's really hard to participate in the actual live recording. And so, I started a newsletter and a podcast for this. And so, if you aren't able to make it, I more than welcome you to register to the newsletter. You know what? Even if you are here every week, register to the newsletter, because why not?

[00:02:03] Because, share it with your friends. We're talking about everything AI related. Hopefully, hopefully no hype. And I have friends here to reduce the hype when I'm getting too hypey. Definitely none of the, Hey, here's a new AI tool that will help you fix the thing you don't need fixing.

[00:02:18] And I think that's, that's been resonating with the community. And so, as you now are here, you're also participant in this community. I welcome everybody to Tag Thursday AI on their news about ai or #thursdAI, or just like the Thursday iPod, which probably should join this so people get some more visibility. but you are part of the community. Now, those of you who come back, those of you who listen in, those of you who share all of them. All of these things are very helpful for the community to grow and for us to just know about more stuff.

[00:02:49] It's actually an incredible signal when two or three or more of you react under a piece of news and say, hey, we probably should cover this in Thursday. It really helps, truly. I think with that, yeah, I think this intro is enough intro. Welcome. What's up, Tzafrir? How are you?

[00:03:06] Tzafrir Rehan: All's well. Thank you very much. I wanted to, to strengthen your point about the time factor. So we expand. So anyone here who wants to be a little bit interested in generative technologies and breaking news and have some things to do in the meanwhile, and also looking to actually build something cool from all of this.

[00:03:31] Time is the limiting factor here. That's like the, the hardest resource here. Having this group and having everyone explore everything together. It's a lifesaver. It's like a order of magnitude improvement on our ability to move forward each one individually. And that's a group together just to give examples.

[00:03:53] So I'm interested in generative images, videos, and audio. And for each of these, there are hundreds of models right now available. With the availability to make fine tunes on specific datasets for some of these generating a single asset like a video can take hours. Training takes hours. If you want to explore a little bit like the effect of different prompts, just generating hundreds of samples takes hours.

[00:04:26] So without this group, it would be impossible to even know. Where to go and where to invest my time And the name of the game right now is to just choose where you invest your time on To actually get things done and keep up. So thank you. Thank you. Thank you for you and for this group And let's have fun.

[00:04:46] Alex Volkov: Thank you. Thank you everyone. I definitely feel super powered by the people in this group who can like back me up on, I read one tweet and then I saw some people react to this tweet, but I didn't have the time or the capability or the experience to dive in.

[00:05:00] And then there's folks here who did, and then we're going to complete each other. And I think our model, I haven't shared since we started, but our motto is we stay up to date. So you don't have to and have to, I think is the operating word. You want to stay up to date and you're welcome to stay up to date and you're welcome to tag us and talk with us and leave comments here in the chat as well, but you don't have to anymore because, there's a, there's a newsletter that will update you and there's folks on stage who will talk about this.

[00:05:26] I want to briefly cover one tiny thing that I did on the podcast that I think I will start doing as well. So, so far editing this hour and a half, two hours that we have here live was a pain, but I just decided to lean into this because. The conversation we're having here is so much more informative and interesting that any type of summary that I want to do or wanted to do is not going to do it justice.

[00:05:50] And so I had some different feedback from different folks about the length of the podcast. Some people said, yeah, 25 minutes, just the updates is like the right spot. And yeah, the podcast is moving towards. This is going to be the live recording. I'm going to edit this don't worry.

[00:06:04] But besides that, the podcast will be this conversation. Going forward as much as I'm able to edit this, and ship both the newsletter and the podcast in time on Thursday But with that Tzafrir thank you for the kind words, man. I appreciate you being here and sharing with us your expertise

[00:06:20] I want to say hi to Zenova and Arthur.

[00:06:22] We'll start with Zenova. Welcome Josh. How are you?

[00:06:27] Xenova: Yeah. Hey Yeah, pretty good. Been busy, busy, busy

[00:06:33] For those who don't know, I'll just quickly introduce myself. I am the creator of Transformers.js, which is a JavaScript library for running HuggingFace Transformers directly in the browser, or Node, or Deno, or maybe Bun soon.

[00:06:49] Who knows when that gets sorted out properly, but any JavaScript environment that you're, that you're looking for. And, yeah, I recently joined HuggingFace, which is exciting. Now I'm able to sort of work on it basically full time. And yeah, lots of, lots of exciting things are, are in the pipeline.

[00:07:06] Alex Volkov: It's been incredible to have you here and then see your progress with Transformer.

[00:07:10] js and then you joining Hugging Face, man. I appreciate the time here.

[00:07:13] Arthur, thank you for joining. Please feel free to introduce yourself.

[00:07:18] Arthur Islamov: Okay. So, my name is Arthur and I'm fixing and making WebAssembly to work with big models.

[00:07:25] So, soon you will be able to run anything huge in the browser, and I'm particularly interested in diffusion models, so right now I'm making Stable Diffusion 2.1 work in the browser, and then have some plans to make SDXL, and maybe Llama and other models too, with all that work done.

[00:07:50] Alex Volkov: That's awesome. Thank you for joining.

[00:07:52] Far El: Yo, what's up? Yeah, I'm my name is Farouk. I'm like founder of Nod. ai where we build autonomous agents and also working on skunkworks. ai, which is an open source group where we are pushing the boundaries of what we can do with LLMs and AI as a whole, really.

[00:08:10] Our first, like, major project is this open source MOE architecture that we've been tinkering around with for the last couple months. We're also exploring even more, exotic AI arcs to try to get, to GPT 4 level capability for open source.

[00:08:28] Alex Volkov: Awesome. Awesome. Awesome. And Nistan, welcome brother.

[00:08:33] Yam Peleg: Yeah. Hey everyone, I'm Nistan Tahirai and I'm terminally online. That's the introduction. Thank you. Yeah, I, I'm also, I'm a dev in Toronto. I worked on the first doctor wrapper which is still doing pretty well. Like no complaints so far, six months later, knock on wood. And yeah, recently started doing a lot more open source stuff.

[00:09:03] Put out a bunch of open source doctor models on, on HuggingFace, which I still need to write a benchmark for because there is no safety benchmarks that are public. And yeah, lately been working with Farouk to make the whole Sconcrooks AI mixture of experts model more usable because it's still, it's not even bleeding edge.

[00:09:26] And this one is more like hemorrhaging edge technology. It takes like three people to get it to work. And yeah, I've been extremely interested on the web GPU side ever since Zenova on a random tweet just gave me the command to start Chrome Canary properly. And then I was able to load it. Whole seven B model.

[00:09:48] And yeah, I'm thinking next for the future, if, if things go okay. I mean, my goal that I've set myself is to have some kind of distributed. Mixture of experts running via WebGPU and then having Gantt. js encrypts the connections between the, the different nodes and experts. And we'll see how that plays out because everything is changing so quickly.

[00:10:14] But yeah, it's, it's good to be here. And I'm glad I found this Twitter space randomly way back in

[00:10:21] Alex Introduces Yam Peleg

[00:10:21] Alex Volkov: Yeah, for a long time. I just want to welcome Yam to the stage. And Yam doesn't love introducing himself, but I can do it for you Yam this time if you'd like.

[00:10:31] All right. So, I will just run through the speakers on stage just real quick. Yam, thank you for joining us. Folks, Yam is our, I could say, resident... Machine learning engineer extraordinaire everything from data sets and training large language models understanding the internals of how they work and baking a few of his own definitely The guy who if we found the interesting paper, he will be able to explain this to us

[00:10:57] Alex Introduces Nisten Tahiraj

[00:10:57] Alex Volkov: Nisten. I call you like The AI engineer hacker type, like the stuff that you sometimes do, we're all in awe of being able to run stuff on CPU and doing different, like, approaches that, like, nobody thought of them before.

[00:11:10] Alex Introduces Far El

[00:11:10] Alex Volkov: Far El you're doing, like, great community organizing and we're waiting to see from the MOE and Skunkworks.

[00:11:15] And folks should definitely follow Far El for that and join Skunkworks OS. It's really hard for me to say. Skunks. Works OS efforts in the discord.

[00:11:24] Alex Introduces Xenova

[00:11:24] Alex Volkov: Zenova is our run models on the client guy so Transformers. js, everything related to ONNX and everything related to quantization and making the models smaller.

[00:11:35] All of that. All models, all modularities, but I think the focus is on, on the browser after you're new, but obviously you introduce yourself, WebGPU stuff.

[00:11:44] Alex Introduces Roie S. Cohen

[00:11:44] Alex Volkov: We have Roy, who's a DevRel in Pinecon, who he didn't say, but Pinecon and VectorDB is in Context Windows and, and discussion about RAG, like all of these things Roy is our go to.

[00:11:53] Alex Introduces Tzafrir Rehan

[00:11:53] Alex Volkov: And Tzafrir also introduced himself, everything vision, audio, and excitement. So a very well rounded group here. And I definitely recommend everybody to follow. And now with that, now that we are complete, let's please start with the updates because we have an incredible, incredible Thursday, literally every week, right folks?

[00:12:12] Literally every week we have an incredible Thursday

[00:12:16] DALL-E 3 - An AI art model with a brain, coming to chatGPT plus

[00:12:16] Alex Volkov: so we'll start with, with two big ones. I want to say the first big update was obviously DALL-E 3. So I will just share briefly about my story with DALL-E and then I would love folks on stage also to chime in. Please raise your hand so we don't talk over each other. DALL-E when it came out, When the announcement came out for DALL-E 2, I want to say it was a year ago in, a year and a half ago, maybe, in January, February or something, this blew me away.

[00:12:47] I have something called aphantasia, where, I don't know if you saw this, but like, I don't have like the visual mind's eye, so I can't like visually see things, and it's been a thing with me all my life, and then here comes the AI tool that can draw. Very quickly, then I turned my, I noticed stable diffusion, for example, and I just like.

[00:13:04] It took away from there. Everything that I have, all my interest in AI started from DALL-E basically. And DALL-E 3 seems like the next step in all of this. And the reason I'm saying this is because DALL-E 3 is visually incredible, but this is not actually like the biggest part about this, right? We have mid journey.

[00:13:22] I pinned somebody's comparison between DALL-E and mid journey. And Midrani is beautiful and Gorgias is a way smaller team. DALL-E 3 has this beautiful thing where it's connected to ChatGPT. So not only is it like going to be not separate anymore, you're going to have the chat interface into DALL-E 3.

[00:13:41] ChatGPT will be able to help you. As a prompt engineer, and you'd be able to chat with the creation process itself. So you will ask for an image, and if you don't know how to actually define what you want in this image, which types, you'd be able to just chat with it. You will say, you know what, actually make it darker, make it more cartoony, whatever.

[00:14:01] And then chatGPT itself with its brain is going to be your prompt engineer body in the creation. And I think. Quality aside, which quality is really, really good. The thing they're highlighting for, for DALL-E 3 is the ability to have multiple. Objects and subjects from your prompt in one image because it understands them.

[00:14:23] But also definitely the piece where you can keep talking to an image is changing the image creation UI significantly where, mid journey. With all, all the love we have for Midjourney is still stuck in Discord. They're still working on the web. It's, it's taking a long time and we've talked about Ideogram to lead them from the side.

[00:14:44] We know that Google has multiple image models like Imogen and different ones. They have like three, I think at this point, that they haven't yet released. And DALL-E, I think is the first. Multimodal on the output model that we'll get, right? So multimodal on the output means that what you get back towards you is not only text generation and we saw some other stuff, right?

[00:15:06] We saw some graphs, we saw some code interpreter can run code, etc. But this is a multimodal on the output. And Very exciting. I, I, DALL-E 3 news took Twitter by storm. Everybody started sharing this, including us. We can't wait to play with DALL-E 3. I welcome folks on stage. I want to start with Zafreer reaction, but definitely to share what we think about this.

[00:15:26] And the last thing I'll say... Say is that now that the community community is growing, suddenly people dmm me. So first of all, you're all welcome to DM me about different stuff. I see I see somebody in the audience with DM me. I think she's still here. So shout out about joining the better test for DALL-E three, which now they, they're able to share about Funny tidbit, it will, it's right now baked into the UI.

[00:15:48] So Dally 3 is going to be baked into ChatGPT and ChatGPT Enterprise UIs. However, when they tested this, they tested it via a plugin. So OpenAI actually built a plugin and had like a restricted access to this plugin. And folks who like talked with this plugin, the plugin ran the Dally ChatGPT version behind the scenes.

[00:16:06] And we don't have access to it yet. I don't know if anybody on stage has access. Please tell me if you do. The access is coming soon, which is interesting from OpenAI. And I think that's most of the DALL-E stuff that I had. And I want to, please, please, I want to hear from Tzafrir, please.

[00:16:23] And please raise your hand. I really need us to not talk over each other.

[00:16:30] Thank you.

[00:16:31] Tzafrir Rehan: So yeah, DALL-E 3 is looking amazing. I did see some examples that people with early

[00:16:38] access were

[00:16:38] generating, and it's far more detailed and coherent than the things we are used to seeing from stable diffusion. And much less randomness, I would say. And what's exciting here is a few changes in the paradigm of how it works.

[00:16:56] For example, like you said,

[00:16:59] it doesn't expect you to know all the intricacies. You can describe in

[00:17:03] your natural language what you want to see

[00:17:05] and it will use

[00:17:07] GPT, however much they are powering the, for generating a prompt to make the whole image. That's the one thing. The other thing is that it's not

[00:17:19] text to image.

[00:17:21] It's more a conversation, similar to how ChatGPT is a conversation between you and the assistant. DALL-E 3 is a chat. So you can see in the video that they released, you generate one image and then you discuss if you want to make changes to it, if you want to make more variations, and it would be very interesting to see that flow.

[00:17:44] From the AI artist perspective, I think it will be met with a little bit of hesitation, at least not knowing how much fine control they are providing, if they are giving you a way to influence all these various parameters that the model uses. That is a lot of the workflow for generating AI art.

[00:18:06] And when you want to make a piece for release as an artist, you spend a lot of time fine tuning it.

[00:18:13] And today with Stable Diffusion, and with Midjourney, we have a lot of fine grained control over changing the parameters by a little bit, adding one more word. That's one thing, and another thing is that artists usually actually want to have that control over the prompt. For example, this week I saw an interesting example, I'll try to find it for you, where the artist adds the words event horizon to an image.

[00:18:44] Now the image is not of space, but the model does take that idea of the event horizon shape, and makes the image more shaped like an event horizon. So those are the kinds of tricks that right now prompt engineers use to make very specific changes in the image. So I'm interested to know if DALL-E 3 will allow that kind of control.

[00:19:08] And most of all, finally, we had DALL-E 2 very early in the game, before Stable Diffusion even gave us the first clunky models, before everything, and there was so much work, and Midjourney, and so many interesting things coming out in image generation, and OpenAI was always kind of hanging back.

[00:19:30] We had this very basic DALL-E 2, which sometimes works and usually doesn't, and gives you very weird results. So yeah, good to see that they are still working on actually

[00:19:43] innovating

[00:19:44] and thinking of the next step and how we can combine all of these technologies. To make something that's much more fun to the user experience.

[00:19:53] Alex Volkov: Absolutely. And I will remind some folks of the internals behind diffusion models, like Stable Diffusion, et cetera. OpenAI actually made the whole field happen, I think, with some, was it ViT? The Vision Transformer that they released, and,

[00:20:05] Yam Peleg: they released the first diffusion. The first diffusion model.

[00:20:08] Alex Volkov: Yes. And so the whole field owes a lot to OpenAI, and it's great. Tzafrir, I join you on this, it's super great to see them innovate and give us some new UIs for this, because I heard from multiple people who have access to this that you can get lost in just chatting to a picture, to the creation process.

[00:20:26] It's like a whole new creation process, basically, like prompting, but chatting. I'm very excited about this, very excited.

[00:20:31] , so we'll definitely talk more about this.

[00:20:33] Microsoft launches Windows Copilot

[00:20:33] Alex Volkov: I want to move on to the next thing, which is exciting. And so, until today, basically, the word Copilot meant GitHub Copilot, at least for those of us with VS Code, those of us who write code. GitHub Copilot obviously is the autocomplete engine that gives you code abilities.

[00:20:50] And many of us use it, many of us don't use it. But, today, I think, Microsoft who owns GitHub and who is very close with OpenAI has announced Copilot for Windows. And it's coming soon with the Windows update. And we've seen some previews about this in some discussions. And I find it very interesting that Microsoft is innovating in AI, whereas we're waiting for Google to come up with Gemini.

[00:21:18] We're waiting for Google to, we're going to talk about Bard updates as well. But Copilot for Windows will be just a shortcut away. I think Win+C is the new shortcut, and you'd be able to ask it for different things. And for those of us in the audience who didn't join us in the previous ThursdAIs, we

[00:21:40] talked with Killian from this open source project called Open Interpreter. And one of the things that we all like in Open Interpreter is that it runs on my machine and it generates code, and some of that code could be AppleScript. And so it's very easy to run stuff on the Mac using AppleScript. You can open Calendar, you can send emails, you can do a bunch of stuff.

[00:21:58] And so it was beautiful to see that even an open source agent like Open Interpreter is able to run code and then activate stuff on your computer. And I think Killian mentioned, like, Microsoft's Copilot is coming. And not just a week later, exactly a week later after that discussion, we now have Windows Copilot.

[00:22:16] Which is going to be able to run Windows for you. It's going to be able to open apps and shut down apps. It's going to be able to just be a ChatGPT, but living inside Windows. And I think it's going to be based on GPT-4, it only makes sense with the Microsoft OpenAI collaboration. And I can't overstate this for a second.

[00:22:38] GPT-4 was released in March, right? ChatGPT was released less than a year ago, in November something. And now the next version of the world's probably most common operating system, Windows, is going to have AI built in as a companion. How insane is this, folks? I have a Windows machine, because I have an NVIDIA GPU, blah, blah, blah, so I'm not only on the Mac, and I'm really excited to play with this.

[00:23:09] An additional thing that they've announced together with this update, connecting to the previous thing that we said, is that Bing Chat and Windows Copilot will both have DALL-E 3 built in for free. So DALL-E 3 is going to be available to ChatGPT Plus subscribers, the ones of us who paid the 20 bucks.

[00:23:32] However, through Bing, you'll be able to get it for free, and it's going to be part of Windows. Right, so my mom, who probably doesn't use Windows, okay, my mom's husband uses Windows, he'd be able to use GPT-4 to run his Windows and also generate images. I think that's incredible, and only Microsoft can give it out for free.

[00:23:52] I think that's mostly it in the Microsoft update. However, it's breaking news. Literally, they released the tweet once we started the space, so I'm sure more stuff will come out of there. But I invite folks on stage to chime in with Windows Copilot news. What do you think about this, whether or not, you know, this is going to change multiple people's usage of Windows or not?

[00:24:16] Nisten Tahiraj: I mean, the whole using-software thing is all up in the air now, right? Everyone's in creative mode. Yeah, it's pretty hard to predict what's going to be the better interface. Voice is getting really good. Open Interpreter showed that it can do a whole bunch of stuff. You can also delete all the JSON files on your computer accidentally, but I think those issues will be worked out.

[00:24:43] Yeah, it is hard to call, because again, Bing is still a free beta service, they haven't quite figured out how to fully monetize that, because that's not cheap to run, especially considering that it is the multimodal image one. So yeah, I don't have that much of an opinion.

[00:25:05] I think it's still too early to call as to how interfaces will change.

[00:25:09] Alex Volkov: I agree. I just, I'm excited that the AI that we've come to know for less than a year is now baked into an operating system for everyone, right? Even going to a website like ChatGPT and registering is not for everyone, and they will definitely lower the bar for usage here. What's up, Yam?

[00:25:28] Yam Peleg: hi I just want to say that we've seen, because everything is so early, we've seen really great infrastructure for RAG but we haven't seen a wide scale product using RAG on this scale. So, and, and it makes sense at the end.

[00:25:47] I mean, you have a lot of information scattered around all different software and different devices. It's, I think it's the perfect idea to just merge everything with the RAG and just allow you to chat with whatever information you have everywhere. And Microsoft is perfectly positioned to do that. And I'm looking forward.

[00:26:13] I think it's a great idea. I don't know if the implementation will be great, we need to see. I think it will, but we need to see. But as a concept, it's a great concept.

[00:26:26] Alex Volkov: Something that I saw from a person who's very close with the Microsoft team: the guy behind Bing, his name is Mikhail Parakhin, and he has this very non-branded Twitter account that barely has an avatar image.

[00:26:43] And he's been doing, he's open, yeah, he's been doing like customer support basically on Twitter. Like people will say, oh, Bing has this, has that, and he's been very, very responsive to some people. And so two things that he did say. First of all, DALL-E 3 is already part of Bing for some percentage of the population.

[00:27:00] So if you use Bing, and we've talked about Bing before about image and vision, go try and generate images with it. It used to be DALL-E 2, but if you get good ones, you may be getting DALL-E 3, which is incredible. You may already have this. And the second thing is, I saw somebody commented that he is now head of Windows, right?

[00:27:17] So the guy behind Bing, the guy who pushed AI into Bing, is now moving to be head of Windows. And I think this, together with this release, shows us just how serious Microsoft is about AI everywhere, and how determined they are to not miss this new wave like they missed the mobile wave. Everybody says that Apple overtook Microsoft and Microsoft was late to mobile.

[00:27:37] And And it just goes to show like how much they invest in this whole thing. And I find it like very, very good because for many people, even going to a website is a barrier of entry. And then when it's just like one click in their operating system of choice, I think it's going to be very it's going to shove AI into way more people's faces.

[00:27:54] I also want to say that Microsoft, out of the big ones, is fairly based in terms of safety and regulation, which we usually don't talk about, we can talk about it in maybe the next space, but like, we could have worse than Microsoft, which is surprising for me because I used to hate on Internet Explorer most of my life.

[00:28:12] And so now Microsoft is very based. Last comments on Windows Copilot here, folks, and then we can move on to the next stuff from OpenAI, actually.

[00:28:22] Nisten Tahiraj: So my last one is I've started using Edge Canary as my daily browser just because of the sidebar and the splitting. So if you have a widescreen monitor, it's actually very handy because you can have code interpreter on one side, and I'll show an image of it very quickly.

[00:29:06] And I have Bing, which has an excellent voice back and forth. And it has really good voice generation, which normally would be very expensive if you're paying for it, but it's in beta. And then I have the actual work, and on the sidebar you can have... anyway, this interface is a bit convoluted and the Edge browser is still a little bit clunky, but overall it's been working pretty well for me.

[00:29:06] So I I don't know. I sort of see the browser as being more and more important. That's your operating system. Some people disagree. They're trying like Sean is, is trying to do more of a OS native stuff with his tool that lets you run multiple ones. But Yeah, you can see the screenshot of how I started using it with voice, so.

[00:29:28] In general, I see it as you'll just talk to it back and forth. I think That's,

[00:29:32] Alex Volkov: at least that's what I want. Were you referring to Swyx's GodMode app where you can run all the LLMs in like a window?

[00:29:39] Nisten Tahiraj: Yes, but that one, for example, on the Mac is right, there's an icon right beside the clock. And you just click that and it pops up, so it's unintrusively there.

[00:29:49] And it adds to your experience instead of getting in the way. And I, I do like that part because it is using real estate on the screen efficiently, but again, if you have a. If you use a wider monitor, so can Edge with all of its right sidebar shortcuts, because then you can add your discord, your outlook and stuff there too, right where the GPT like right where I use the code interpreter window and even have some completion and document writing stuff too now.

[00:30:19] So that's how I see it. I, it's again, it's up in the air, what people will find most helpful

[00:30:25] Alex Volkov: absolutely. And I've been using Bing somewhat as well. And yes. The sidebar can also read from the page, right? So the Bing chat in the sidebar has access to the page if you give it.

[00:30:37] And that for like summarization and different things, that's really, really excellent as well. Like it completes your browsing experience. So I'm assuming that they're doing some stuff with the co pilot.

[00:30:46] OpenAI leaks: GPT-4 Vision, Gobi

[00:30:46] Alex Volkov: All right, folks, we're moving forward because we have much to cover. And, there's more news from OpenAI.

[00:30:52] They actually came before DALL-E, and we were supposed to talk about them first, and then DALL-E came out. And now let's cover some news from OpenAI. So, it feels like the theme behind all of this news is that OpenAI is trying to rush stuff out the door, or to announce stuff, because they know, or they hear, or they saw the information breaking out about Gemini, the multimodal,

[00:31:19] huge model from Google that is potentially GPT-4-like and can do images in the input, and is multimodal on the output as well. And so we don't know much information about Gemini so far, but we do know that The Information, the publication, reported that Gemini is coming very soon.

[00:31:40] And we see the response from OpenAI in multiple places, right? So DALL-E 3 is one of them. The Information also leaked that OpenAI is gearing up to give us vision. For those of you who remember, pretty much every space since March we've been talking about GPT-4 being multimodal on the input as well, and we can probably go into the details of whether or not it's fully multimodal versus Gobi, and I would love for you to participate in this. But basically, when they announced GPT-4 they showed a demo of it, they gave it a screenshot, they gave it a sketch of a website and it was able to code that, and then we didn't get that feature, the multimodality from GPT-4, we didn't get it.

[00:32:20] The only people who got it, and me and Nisten interviewed the CEO of this, is Be My Eyes, which is this app for blind folks, and they just shoved GPT-4 vision in there to help those with eyesight issues. And it seems that now that Google is finally stepping into the arena, sorry for the pun, we may get GPT-4 vision very soon.

[00:32:42] I actually saw some screenshots of how it looks inside the ChatGPT interface. And the additional exciting thing is, they have a different model, with the code name Gobi, that is apparently in the works at OpenAI. And that one is going to be fully multimodal. So, Yam, I would love, if you can, for you to repeat what we talked about last night, about the differences and how GPT-4 is multimodal, but not fully.

[00:33:06] I would love for you to expand on this.

[00:33:09] Yam Peleg: Yeah. First, it's important to understand that there is a huge difference in infrastructure between the two companies. And the infrastructure dictates what is possible or not possible, what is hard or not hard. From the rumors, nothing is confirmed, but from the rumors, the structure and the size of GPT-4

[00:33:34] was chosen to fit the hardware, the infrastructure, to actually run the model. It doesn't matter if you have the best model in the world if you cannot serve it. So Google is using its own hardware, which it is not sharing with anyone else, and it's important to understand this. So when we see that Google is doing, according to the rumors,

[00:33:58] an insane training run, or preparing to ship or serve an insane model that is multimodal on the input and on the output... the reason, I think, OpenAI didn't release a GPT with the image head is simply because it's probably expensive. It's not that easy to deploy something like this, especially not with the amount of people that use OpenAI services.

[00:34:31] And, and I think this is this is what we see. This is the reason for what we see at the moment. Now it's important to understand that according to rumors, again, nothing is confirmed, take with a grain of salt, according to the rumors, which makes sense, GPT 4 is first a language model. It was trained as a language model, just language model.

[00:34:53] And once it was trained, they added an image head to the frozen model. This basically reduced the risk of something going wrong with full multimodality end to end. And moreover, it allows you to just use the model on its own, and if you want, you can plug in the head. It's flexible, you can use it with or without the head.

[00:35:14] You can, it's flexible. You can use them with or without a head. Now, the thing is that there is you do pay a price because again, with a grain of salt, there, there is there is, there are caveats to this, but we have already seen multiple times that multimodality, when done right, benefits both modalities.

[00:35:36] So GPT-4 allegedly did not benefit from the multimodality. And this is the difference between GPT-4 and the new rumored model. According to the rumors, the rumored model was trained end to end on images and text throughout the whole training. So if it's true, if everything is true, we should expect a better model even if you just use it for text, because the images influence the text and the text influences the images, and so on and so forth.
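
To make the "image head on a frozen language model" idea a bit more concrete, here is a minimal, purely illustrative PyTorch sketch. This is not OpenAI's actual architecture (none of it is public); it just shows the general pattern Yam describes, where a small trainable projection feeds vision features into a language model whose weights stay frozen, as opposed to training everything end to end on interleaved image and text data.

```python
# Illustrative only -- assumes nothing about GPT-4's real internals.
import torch
import torch.nn as nn

class ImageHead(nn.Module):
    """Maps vision-encoder features into the language model's embedding space."""
    def __init__(self, vision_dim: int, lm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, lm_dim)
        return self.proj(image_features)

def bolt_on_head(language_model: nn.Module, vision_dim: int, lm_dim: int):
    # Freeze the pretrained LM: only the small head ever gets gradients.
    for p in language_model.parameters():
        p.requires_grad = False
    head = ImageHead(vision_dim, lm_dim)
    # At inference, projected image "tokens" are simply prepended to the text
    # embeddings. The LM can be used with or without the head, which is the
    # flexibility (and the limitation) discussed above: the text weights never
    # learn anything from the images.
    return language_model, head
```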

[00:36:12] Alex Volkov: That's great. One follow up question. You spoke about the benefits of training on text and vision. And I remember Ilya Sutskever also talked about this, I think with Jensen Huang, the CEO of NVIDIA, and he talked about it in different other places. Could you speak to some of those potential benefits, of how a multimodal model trained on text and images is actually better?

[00:36:37] Yam Peleg: If I remember correctly, Ilya gave the perfect example for this. You can, if you really want, describe what the color red means with text, or what objects are red. All of this will be nothing like just seeing the color red. So there is a difference between actually training on images

[00:37:04] versus training on text that describes the images, which is just a different sensation. So the whole, you can say, world model inside the language model is influenced by the images. And I think color is just a great example. And if I remember correctly, that was the example he gave in this interview.

[00:37:27] Alex Volkov: Yeah, absolutely. And I think the other one he said is that it's obviously better at stuff like math or physics, where it's able to actually read the graphs and everything. It just arrives at the answer faster. But also, like Yam correctly pointed out, the world model of this model is way better because it's able to see, basically.

[00:37:50] So we have potentially exciting news. One thing I will add is that, Yam, I think you're correct, OpenAI just didn't want to spend the GPU cycles on the vision model and on being able to attach a head with vision. I think it's exciting. I do want to highlight that Microsoft likely has

[00:38:08] the bandwidth for that, because Bing has the ability to have vision. Now, I don't know if it's the full one, I don't know if they did some work, because the examples that I tested with Bing vision gave lower quality responses on images than I was expecting from GPT-4 from the examples.

[00:38:25] So maybe they know if they, maybe they did some stuff for optimization speed, but yeah, definitely it feels like infrastructure was gearing up for this and hopefully we'll see it soon. From OpenAI.

[00:38:36] 3.5 instruct model from OpenAI

[00:38:36] Alex Volkov: Another thing we saw from OpenAI, and I think this is the last one, we have a bunch of OpenAI updates, is the GPT-3.5

[00:38:42] Instruct model. And unlike the ChatGPT model, 3.5 Instruct is very similar to how the OpenAI APIs were actually working before the ChatGPT explosion, right? Before you were able to do back and forth conversation, before it was RLHF'd for conversation purposes. And I saw many, many folks get very excited about 3.5

[00:39:05] Instruct, because it's very similar to what we had before ChatGPT, but it's much faster. Now, we don't know if it's faster because way fewer people use it because it's new, or because they actually did some turbo magic on it. But we'd love to invite folks on stage, maybe Roy, maybe Mr.

[00:39:21] Yang, to talk about the Instruct model and the difference between this endpoint in the API versus the regular chat endpoint. If you have anything to add here from what you read, please feel free to add.

[00:39:36] Nisten Tahiraj: I used it in the playground to just, like, write an agreement for the site, like a privacy agreement.

[00:39:41] It was pretty good for that. It's just annoying that the context window is so small. It's only a 4K context window, and it's more like only three and a half K, because some of it will be your prompt. I think it has some other very good usability uses, which we haven't experimented with yet.

[00:40:02] Like the one person got it to play chess very well. And I think it's, yeah, it's really worth looking at for stuff like doing automation or you're continuing some work on your desktop, for example, with open interpreter, and it'll be able to continue generating in that regard. So there are quite a few things to explore there.

[00:40:26] I'm just glad it's cheap and it's good. So that's that's what we want at the end of the day

[00:40:34] Alex Volkov: Yeah, it's cheap, and I think many folks were surprised that they had to switch to the chat interface for ChatGPT to get the benefits and the speed, and now they're happy that they have the instruct model of old. They also added logprobs.

[00:40:47] So I would love to ask folks on stage, because I'm not entirely sure what logprobs is in the API response. And I saw Alex Graveley and some other folks getting excited about logprobs. And I want to say, just before, I want to say hello to Raunak, if I'm pronouncing this correctly. Raunak. And we're going to talk about RememberAll in a second, or in a few minutes, but if you have comments on the Instruct API and logprobs, feel free to share.

[00:41:18] Raunak Chowdhuri: Yeah, I do. Logprobs is awesome. It basically gives you token-level probability distributions from the model. So normally, when you are using GPT-4 or GPT-3, you just get words back when you're querying the model. But what logprobs allows you to do is see the probability distribution that's outputted by the model, which is normally sampled by, like, the temperature parameter.

[00:41:43] And you can use that to do a lot of really interesting things. Like, for example, if you're asking GPT to solve a multiple choice question, it's really useful to actually understand the model's confidence in whether it's A, B, C, or D. And you can actually get that directly from the model by examining that probability distribution from the logprobs.

[00:42:05] So it actually provides a lot more insight into what the model is thinking, and I think that's a pretty useful technology. You can actually do a lot of clever things with it. Like, someone built something called JSONformer, which is basically a tool that allows you, if you have a model that exposes logprobs, to only sample the tokens

[00:42:24] That basically are valid JSON tokens, and construct a response that is very much aligned with like a certain format that you want. So I think that's a pretty powerful tool.
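
For listeners who want to poke at this themselves, here is a small sketch of what pulling logprobs out of the completions endpoint looked like at the time, using the pre-1.0 openai Python client and the gpt-3.5-turbo-instruct model discussed above. Treat the exact field names as a point-in-time snapshot and check the current API docs before copying it.

```python
# Sketch using the pre-1.0 `openai` Python client (pip install openai==0.28).
import openai

openai.api_key = "sk-..."  # your key

response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Q: Which planet is known as the Red Planet?\nA:",
    max_tokens=1,
    temperature=0,
    logprobs=5,  # return the top 5 candidate tokens with their log-probabilities
)

choice = response["choices"][0]
top = choice["logprobs"]["top_logprobs"][0]  # distribution for the first generated token
for token, logprob in sorted(top.items(), key=lambda kv: -kv[1]):
    print(f"{token!r}: {logprob:.3f}")
# For a multiple-choice question you could compare the logprobs of " A", " B",
# " C", " D" directly to get a rough confidence, as Raunak describes.
```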

[00:42:36] Alex Volkov: Thank you, Raunak. Thank you. And I remember JSONformer and did not know that they use logprobs for that. So here you have it, folks.

[00:42:43] There's a new endpoint for you, your usages that now exposes the token probabilities. So you can use this to build better tools and different types of tools. And yeah, Ronak, would you care to introduce yourself briefly? I will ask again once we record kind of your section, but feel free to introduce yourself.

[00:43:03] Raunak intro

[00:43:03] Raunak Chowdhuri: Yeah, absolutely. So I'm a senior at MIT. I'm graduating in a couple months. My background's in machine learning, artificial intelligence. I've been doing research in this area for quite a few years now. Yeah, I'm working on some interesting projects that we'll dive into later, but basically building long term memory for Language models.

[00:43:25] Bard Extensions allow access to Gmail, YouTube, Drive

[00:43:25] Alex Volkov: Awesome, awesome. Thank you. Thank you for coming up, and thank you for explaining logprobs as well. All right. So the next thing I want to talk about briefly, really briefly, because it's not that great, is Bard from Google. Before we get to Gemini, before we hear about Google's explosive GPT-4-competing model, etc., right now we have Bard. For some reason we also have Google Assistant, which I'm not sure what the involvement with LLMs is there. But Bard is something that some folks on stage here use. And I was never very, very excited about Bard for some reason.

[00:44:00] However, they just released a few updates to Bard and they say like this is the best Bard ever. And it feels like very googly, very like product manager y to me, at least. What they released is something called extensions, right? So if you use Bard before and you haven't touched it in a while, like I haven't, if you go to Bard right now, what you will have is the chance to...

[00:44:22] update it with extensions. Those extensions can access your Gmail, all of it, your Google Drive, all of it, YouTube, and I think some other ones that I'm trying to remember. And the cool thing about this, which I actually like, is the UI. You can do the at sign, like when you mention somebody on Twitter,

[00:44:38] and then you have access to those extensions. It's a different take on the plugins in ChatGPT, where with ChatGPT plugins you have to be in that mode, it decides for you, blah, blah, blah. So here you can actually say, like, @Gmail, and then ask it questions. It will actually go and do a search in your Gmail account and give you back answers in natural text.

[00:44:56] So. Conceptually pretty cool, right? We all use Gmail, or like at least most of us use Gmail. And so to be able to like get summaries of the latest emails, blah, blah, blah. So conceptually very cool. Google Docs as well. You can tag Google Docs. You can do Google Drive. Oh, Google Maps is the, is the other one.

[00:45:10] So you can actually say like, hey, what are some of the stuff that, in San Francisco, Seattle, whatever it will give you. The thing that I was really surprised by is just how bad it is, just honestly not to... If there's folks in the audience who work on Bard, I apologize. And sometimes we say these things, but there's like so many, so many people working on this stuff.

[00:45:31] And like, the nights and weekends, they don't see family. So like, I apologize. Just from a comparison point, in my experience, I was really disappointed in how Google, who's this huge company that created Transformers for us, is not afraid to release something this bad.

[00:45:50] And what is bad, I mean, specifically, I literally used two of the extensions. One is Gmail. To ask it about my upcoming flight to San Francisco, which I told you guys about. I'm going to be at the AI engineer event as, as a media person. And it couldn't find any information from this flight and just gave me flights from the past.

[00:46:07] I literally asked, give me flights from the future or like, give me my upcoming flights. And it gave me flights from the past. It also gave me two trips to the Denver museum, which is, which are not flights. And so, yeah, we know LLM hallucinates, blah, blah, blah. But if you, if you put your brand behind this and you're Google and you put Gmail in there and you cannot like do a basic search, that's upsetting.

[00:46:30] And so I said, all right, I'll give it another try. I did YouTube. And I asked, hey, what does MKBHD, Marques Brownlee, if you guys don't follow him, he's this great tech reviewer, what does he think about the latest iPhone? And it went to YouTube and it searched, and it gave me Marques's videos from last year about the iPhone 14. And I literally took the same string that I pasted into Bard, went to the YouTube interface, pasted it in the YouTube search, and got the latest videos that he had about the iPhone 15.

[00:46:58] And so I was thinking there like why would I ever use this if like the first two searches did not work, where this is the whole promise of this. So again, not to be negative. I don't love being negative. It's just like from a comparison standpoint. It's really I really got to wonder how many folks in Google are trying to rush through the LLM craze.

[00:47:19] We remember Sundar Pichai saying AI, AI, AI, AI, AI, AI, AI, AI, AI on the stage like 48 times, right? And they're shoving AI everywhere. It's just, for me, it wasn't that useful. So I would love to hear, Tzafrir, I see your hand up, I would love to hear from folks on stage about your experience with Bard and those specific new extension things.

[00:47:41] Tzafrir Rehan: So I don't have much experience with it, actually, for the same reasons that you said. But I want to give the perspective that I think what we're seeing here is Google jumped early to stay in the game. Maybe they didn't expect ChatGPT to go viral that big so fast. Well, this was developed like a sci fi technology and suddenly it's a household item overnight.

[00:48:09] But, if you're talking about Google, and I actually worked at Google for three years, about a decade ago, it's a company that can make very big moves very slowly. That means, Gmail data, Drive data, it's the holiest of holies of privacy. As an engineer at Google, if you want to touch that data, to read even a single byte, you need to go through quarters of legal meetings.

[00:48:41] So the fact that they are going in this direction indicates a course heading that they took the time to think of it through and decide, yes, we are doing this very risky move in terms of privacy and user expectations. Because they believe in the value. So let's see where they get to when they actually, when they are actually fully implemented.

[00:49:05] Because I think right now, what we are seeing is a rushed out version.

[00:49:09] Alex Volkov: I agree. I think that's how it definitely feels where the basic stuff, like. A keyword search works better than like this search and they're basically hitting the API, which they have behind it definitely feels rushed very polished UI wise, very safe, very like protective, like googly, but very, it's not super helpful.

[00:49:27] I think at this point, yeah, I think this is most of the news, unless I'm missing some, so let me look and see in the template that I already drafted for myself. Let's see if we have any more things to cover before we move on to the interviews. So yes, one last thing I wanted to find, I'll just find this thing. It's called chain of density.

[00:49:48] It's called chain of density So, I saw this, I think it was a paper first, and then yeah, I'll share this in the chat. I'm sorry, not in the chat, in the, in the Jumbotron. I saw somebody release a paper on this, and then I think Harrison from LangChain reposted this and actually put it up on their website with the prompt sharing, where you can play with prompts, is this new method called chain of density, which is actually really, really good at getting summarizations from From ChatGPT and different other places like Cloud as well.

[00:50:21] And I think it's really cool because I just posted it on top. It it asks for four summarizations with more and more density, right? So it starts with like, hey, summarize this text or article. And then it says give me like a JSON file in response with like four summarizations. The second

[00:50:37] one, give me a summarization.

[00:50:40] Extract from the first one that you just gave me, extract the entities that were missing, and give me another summarization with those entities, and then do it again and again. And I think there's, like, some cool prompt magic in there that says something to the tune of, make sure that this is understood on its own, and the person doesn't have to read the article to understand the summarization.

[00:51:00] I personally have gotten really good summarizations based on this technique, so much so that I've added it to my snippets, where I have different snippets for prompts. And if you are doing any type of summarization, definitely check it out. Nisten, I saw your hand briefly up if you want to comment on this thing.

[00:51:16] Nisten Tahiraj: Yeah. Like the first person that I knew who got a contract as a prompt engineer actually used this technique a lot last year. And the way he was explaining it was when you do, when you compress an idea and then you extrapolate, that's how creativity happens in general. Like you, you compress. You extrapolate out of it, you compress and then you extrapolate.

[00:51:36] So it's pretty interesting that someone did this in a much more systematic way. I'm, I'm going to check it out.
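
If you want to try the idea yourself, here is a rough, paraphrased sketch of a chain-of-density style prompt. This is not the exact wording from the paper or from the LangChain hub, just the shape of it as described above: repeated summaries in one JSON response, each pass folding in entities the previous summary missed.

```python
# Paraphrased chain-of-density style prompt -- not the paper's exact wording.
CHAIN_OF_DENSITY_PROMPT = """
Article: {article}

You will write 4 increasingly dense summaries of the article above.

Step 1: Write an initial ~80-word summary.
Repeat 3 times:
  - List 1-3 informative entities from the article that are missing
    from the previous summary.
  - Rewrite the summary at the same length so it also covers those
    missing entities, removing filler phrases to make room.

Each summary must be self-contained and understandable without reading
the article. Answer in JSON: a list of 4 objects with the keys
"missing_entities" and "denser_summary".
"""
```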

[00:51:43] Alex Volkov: Chain of density. And I wanted to ping back real quickly on the compressing part, because yeah, I saw your tweet and there was a paper about compression as well. And Ilya gave a talk about compression recently.

[00:51:55] And I wanted to see if you want to talk about that compression part and paper. Briefly and if not, that's also okay. We can move on but I just like I think this is also this week.

[00:52:07] Yam Peleg: Yeah, I had some controversial opinions in the last couple of weeks, and as it turns out, there are papers that support them coming up after them.

[00:52:19] But yeah, I highly, highly suggest reading the compression paper. Basically, what it says, it conveys the idea that what we are actually doing is, I want to say, reversing the process that generates the data.

[00:52:39] If you think about it, the process that generates the data is us. So I don't want to say the words that I shouldn't, I got some heat for them, but you can find them in my tweets. It's a really good paper. It's much more scientific, you can say, versus other papers that talk about intelligence, about general intelligence, and poke on this idea, and I highly recommend reading this paper if you're interested in this part of what we're doing.

[00:53:13] It doesn't prove anything, because general intelligence is a big thing, but it is interesting, the ideas there are solid, and it's great to see.

[00:53:24] Alex Volkov: Yeah, I, I, I heard this multiple times this comparison or metaphor that intelligence is compression and, compressing a lot of ideas into, First of all, it compares to natural language, the ability of us to understand something, to put it into words, that's compression.

[00:53:39] Obviously, Feynman's quote, where you really understand something if you can explain it to a five year old, is also about compressing down and being able to explain some stuff. And so I heard this multiple times, and it's great to see that there are now papers that talk about this. And continuous compression, like Nisten said, actually

[00:53:54] brings out better results. And it's also good to see, on the topic of literal compression, I know it's confusing, there was also another paper that's worth checking out from this week, where they actually used LLMs and different transformers for actual compression, to compare to PNG or JPEG, etc. And I think they saw very interesting compression results as well. I don't remember if I have a tweet for that, but yeah, be on the lookout for multiple types of different compression as we move forward. Thank you. With that, I think we are ready to move on to our guests here on stage and to talk about two exciting things.

[00:54:30] So, first of all, actually three exciting things. One of them is, Nisten, you had a visit to Geoffrey Hinton's lab that I wanted to hear a brief story about from you. After that, we're going to talk with Arthur and Xenova about WebGPU and do a brief interview about running models locally in the browser.

[00:54:47] And then at the end, we're going to talk about RememberAll with Raunak and his exciting approach to extending context windows. So with that, I'll just give a brief summary of the spaces we had today and some logistics, and then we can get started with the second part of ThursdAI. So again, everybody in the audience, whether you just joined in the middle or have joined us from week to week,

[00:55:08] ThursdAI is about staying up to date together and giving updates every week, so that folks don't have to follow everything, because it's almost impossible. I'm very happy to be joined by multiple folks from different disciplines, folks who can answer stuff, complete the picture, and find new things to get excited about in AI,

[00:55:28] from different fields, every week here on stage. We also have a podcast and newsletter. If you're here and you're new and you just joined us, and you can join next week, you can sign up for the newsletter as well. We stay up to date so you don't have to, that's the motto. And the first part of this is usually updates from last week, breaking news.

[00:55:46] There's another breaking news with YouTube something, but I think we'll cover this next time unless folks here want to read up on this and then give us an update at the end. But the second part of it is usually A deep dive into different conversations and, and guests. And today we have Arthur and we have Ronak to, to talk about different, very exciting things.

[00:56:05] And we'll start with Nisten's brief foray into the lab. AKA, yeah, Nisten, give us a few minutes on your excursions.

[00:56:16] Nisten Tahiraj: Well, I've been going as a guest to the Vector Institute for over a year, a year and a half, and this time I went in and, I'd never met Far El in real life.

[00:56:28] I didn't even know what he looked like, it was just some dude from GitHub. And yeah, so I invited him in, and we were going to work on making the bootable OS that just boots straight into a GGML model and then hopefully gets Chromium running with WebGPU. And essentially, I had made before a tiny 4.7

[00:56:54] gig ISO that includes an entire Llama 7B model and an entire Linux distro. I used Slackware, that's the smallest, and I've used that for like 20 years. And yeah, so we were in the lab, and eventually he's like, let's just try and get the whole thing working. So let's just try and get the mixture of experts,

[00:57:14] let's just do it all at once and see where we get stuck. And anyway, I had to call another friend who is an extremely good DevOps engineer to help. And yeah, anyway, long story short, I couldn't get it to run on the GPU, because there were no instances and I only had an A10 with 24 gigs, and the mixture of experts model needs more than that because it's 32 experts.

[00:57:39] So I had to run it on the CPU, and that's what we spent the entire day and evening on. And it was really slow, but then we realized, yeah, this is probably the first time someone has effectively run a mixture of experts model on a CPU. And again, you can check out the repo.

[00:57:58] I made a CPU branch, and it's the V1 branch if you really want to get it to work. But yeah, that was the story. I just met with a random person from Twitter for the first time who was in the Discord, and yeah, it was fun. And also, the funniest part was that Colin Raffel, who has been teaching about mixture of experts and writing a lot of the papers, happened to be there, and then we look behind and he's literally like five desks away.

[00:58:30] And I was just taken aback. It's like, oh, holy cow, he's here. And he had no idea who we were or anything. So yeah, that was fun.

[00:58:39] Alex Volkov: There, if you don't mind me completing this story from what you told me multiple times, because I think it's way more colorful than you let on. First of all, the Vector Institute is the lab of Geoffrey Hinton, the grandfather of AI, right?

[00:58:52] This is the lab. He's widely considered the person who kickstarted this whole field, basically. Is that the lab? Was he there?

[00:59:02] Nisten Tahiraj: Yeah, yeah, yeah. Ilya Sutskever has been a student there. He wasn't there. He's rarely there. He only has like one PhD student under his wing this year.

[00:59:12] So he comes in very rarely. But yeah, Ilya Sutskever was not in the smaller lab before they moved here. Also Aidan Gomez, one of the writers of the Transformers paper, still comes there every once in a while. He was there regularly up until Cohere got funded last year. And yeah, this is the lab, and it's pretty funny because everyone's very, very academic and we're just straight up hacking on whatever we can find.

[00:59:45] Alex Volkov: So the second thing that I wanted to cover here is exactly what you built in the lab of Geoffrey Hinton, because he's now very public about AI doomerism and the different kinds of potential bad things that will happen with AI, and how to not open source, how to regulate. He's very public.

[01:00:04] He's on every news outlet. And here you are, you and Far El are working on an ISO, a bootable AI disc that you can literally run offline, that has Llama, an offline LLM. That basically says, even if they regulate, you can just take an old CPU-based machine and run this thing. So you're basically democratizing AI in the lab of the person who's now very, very vocal about stopping it.

[01:00:27] So that's the second part that I personally really enjoy.

[01:00:31] Nisten Tahiraj: It's not just that. Also, if you listen further than what the news media shows, it's a lot more complex than that. He wants people to acknowledge that the risks are real and show that they are mitigating them. But at the same time, he's been doing research on molecularly grown chips,

[01:00:51] and that architecture at first didn't work. So they're still going full speed ahead. The reason that they went that way was just to say to a lot of the community, just don't act like idiots, just regulate yourselves. That was why they were challenging that.

[01:01:09] It was a lot more complex than people realize. And the professor there, Colin, he's been a big pusher for democratizing and open sourcing models in general. And so, yeah, it's a lot more nuanced than what you see in the media. And when you think about it, the safest form of AI that you can have is one that you can just literally unplug, and you have full control over, so there is nothing safer than that.

[01:01:40] Otherwise, you're just trusting some incompetent politician with regulatory or whatever legal hacks to control it. So, it's yeah. It's a lot, I want to say, it's a lot more nuanced than people, than what you've just seen in media snippets and reactionary Twitter checks.

[01:01:58] Alex Volkov: Yeah, I hear you. And definitely we'll check out the nuances on Geoffrey Hinton and the topic. Very briefly, before, after, our apologies, we'll get to it in just a second, just something that also happened this week: Yann LeCun, the GOAT, aka Meta's Chief AI Scientist, went in front of the Senate, I think a couple of days ago. I just pinned the tweet on top that he actually retweeted, and he gave an incredible opening statement, talking about how open sourcing is very important, why they open sourced LLaMA, talking about the fact that they open sourced LLaMA 1 and the sky didn't fall, and all of these things. And he also outlined a bunch of the safety protocols that they took into account when they released LLaMA 2. And I think it's, first of all, very important to have somebody like Yann in front of the Senate, talking to legislators and regulators about regulation, because we see more and more, Yam,

[01:02:52] I think you brought up last week that there was another discussion, and Elon Musk was there and Sundar Pichai was there, everybody was there talking about AI and how to regulate. And I think it's very important to have voices like Yann LeCun talk about these different things with clarity, about safety.

[01:03:07] And so I definitely recommend everybody to check out his opening statement, because, you know, the doomers, it's very easy to scare, especially on the engagement-baiting networks like X, etc. It's very easy to take something that people don't understand and use it to scare folks. And I think it's very important to have very clear, very credentialed, and very understanding people from this world actually explain that there are benefits, and explain how open source can benefit as well.

[01:03:36] And I think you also mentioned how excited the open source community was about the LLaMA 2 release. And I want to believe that we all had a small, tiny part to play in this. And so, yeah, we're definitely on Yam's map, sorry, Yann LeCun's map, and it's definitely worth checking this out. I think with that, Nisten, thank you for sharing your story.

[01:03:52] Tell us more escapades from the Vector lab. And if you get to meet Geoffrey Hinton, tell him about ThursdAI, and also Colin.

[01:03:59] ​

[01:03:59] Alex Volkov: All right, folks, this actually concludes the two hours that we've allotted for ThursdAI today. I know there are many folks, I see Dave in the audience, what's up, Dave? I see other folks just stepping in, with all the sadness of, I want to keep talking with all of you.

[01:04:20] There's also now a need to transcribe this and put this into a newsletter, a podcast form. ThursdAI is here every week. We're here literally every week since GPT-4 came out, I think. Did I miss one week on vacation? Yeah, the newsletter came out, but we didn't talk that week. I felt like, oh, I miss my guys.

[01:04:37] I miss my friends. We need to get up to date together. So we're here every Thursday. There's so much always to talk about. I want to just highlight how much more boring this would have been without friends like Nisten and Xenova, Arthur, now the new friend of the pod Tzafrir, and some other folks who stepped away, Yam and Far El, and, for real, many, many other folks who join us week to week

[01:04:58] And Help us bring you, the audience, the best AI news roundup possible on, on X slash Twitter. Now almost six, seven months already into this. This has opened many, many opportunities for many folks on stage, including myself. I'm going to the AI Engineer Conference. As a media person, I'm going to do some spaces from there.

[01:05:19] If you're in the AI Engineer Conference in a couple of weeks, definitely reach out and, we'll talk over there. With that, I want to just say... Without the audience here, this also would be very, very boring. So thank you for joining from week to week. Thank you for listening. Tuning in. Thank you for subscribing.

[01:05:34] Thank you for sharing with your friends. And thank you for leaving comments as well. And with that, I want to wish you a happy ThursdAI. I'm sure there are going to be many, many, many new things releasing just today, but you know, we can only cover so much. With that, thank you folks. Have a nice rest of your Thursday.

[01:05:49] I, and we'll meet you here next week. And yeah. Cheers. Have a good one.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 22 Sep 2023 04:33:02 GMT
📅 ThursdAI - Special interview with Killian Lucas, Author of Open Interpreter (23K Github stars for the first week) 🔥
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Hey! Welcome to this special ThursdAI Sunday episode. Today I'm excited to share my interview with Killian Lucas, the creator of Open Interpreter - an incredible new open source project that lets you run code via AI models like GPT-4 or local models like Llama on your own machine.

Just a quick note, that while this episode is provided for free, premium subscribers enjoy the full write up including my examples of using Open Interpreter, the complete (manually edited) transcript and a video form of the pod for easier viewing, search, highlights and more. Here’s a trailer of that in case you consider subscribing

If you haven’t caught up with GPT-4 Code Interpreter yet (now renamed to Advanced Data Analysis), I joined in when it first launched and we had a deep dive about it, and even on the day of the release, we were already noticing a major restricting factor: Code Interpreter is amazing, but doesn’t have internet access, and can’t install new packages or use new tools.

An additional thing we immediately noticed was, the surface area of “what it can do” is vast, given it can write arbitrary code per request, it was very interesting to hear what other folks are using it for for inspiration, and “imagination unlock”.

I started a hashtag called #codeinterpreterCan and have since documented many interesting use cases, like committing to git, running a vector DB, converting audio & video to different formats, plotting wind rose diagrams, running Whisper and so much more.

I personally have all but switched to Code Interpreter (ADA) as my main chatGPT tab, and it’s currently the reason I’m still paying the 20 bucks!

Enter, Open interpreter

Just a week after open sourcing Open Interpreter, it already has over 20,000 stars on GitHub and a huge following. You can follow Killian on Twitter and check out the Open Interpreter GitHub repo to learn more.

Installing is as easy as pip install open-interpreter. (but do make sure to install and run it inside a venv or a conda env, trust me!)

And then, you just.. ask for stuff! (and sometimes ask again as you’ll see in the below usage video)
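
If you'd rather drive it from Python than from the terminal, the early README showed roughly this shape; treat it as a sketch and double-check against the repo, since the interface was moving fast at the time:

```python
# Rough sketch based on the early open-interpreter README -- verify against the
# current docs, as the API has changed across versions.
import interpreter

# Optional: run against a local model instead of GPT-4 (flag names vary by version).
# interpreter.local = True

# Ask for stuff in plain English; it writes and runs the code for you.
interpreter.chat("Summarize every PDF on my Desktop into a single notes.md file")
```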

Specifically, highlighted in the incredible launch video, if you’re using a mac, Code Interpreter can write and run AppleScript, which can run and control most of the native apps and settings on your mac.

Here’s a quick example I recorded while writing this post up, where I ask Open Interpreter to switch system to Dark mode, then I use it to actually help me extract all the chapters for this interview and cut a trailer together!

Sun, 17 Sep 2023 15:11:56 GMT
🔥 ThursdAI Sep 14 - Phi 1.5, Open XTTS 🗣️, Baichuan2 13B, Stable Audio 🎶, Nougat OCR and a personal life update from Alex
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Hey, welcome to yet another ThursdAI 🫡

This episode is special for several reasons, one of which, I shared a personal life update (got to listen to the episode to hear 😉) but also, this is the first time I took the mountainous challenge of fixing, editing and “video-fying” (is that a word?) our whole live recording! All 3 hours of it, were condensed, sliced, sound improved (x audio quality is really dogshit) and uploaded for your convenience. Please let me know what you think!

Premium folks get access to the full podcast in audiogram format, and a full transcription with timestamps and speakers, here’s a sneak preview of how that looks, why not subscribe? 😮

TL;DR of all topics covered

* Open Source LLM

* Microsoft Phi 1.5 - a tiny model that beats other 7B models (with a twist?) (Paper, Model)

* Baichuan 7B / 13B - a bilingual (cn/en) model with highly crafted approach to training (Paper, Github)

* Big Co LLMs + API updates

* Nothing major this week

* Voice & Audio

* Stable Audio 🎶 - A new music generation model from Stability AI. (Website)

* Coqui XTTS - an open source multilingual text to speech for training and generating a cloned voice (Github, HuggingFace)

* AI Art & Diffusion

* Würstchen v2 - A new super quick 1024 diffusion model (Announcement, Demo, Github)

* DiffBIR - Towards Blind Image Restoration with Generative Diffusion Prior (Annoucement, Demo, Github)

* Tools

* Nougat from Meta - open-source OCR model that accurately scans books with heavy math/scientific notations (Announcement, Github, Paper)

* GPT4All Vulkan from Nomic - Run LLMs on ANY consumer GPUs, not just NVIDIA (Announcement)

* Nisten’s AI ISO disk - Announcement

And here are timestamps and chapter/discussion topics for your convenience:

[00:05:56] Phi 1.5 - 1.3B parameter model that closely matches Falcon & LLaMa 7B

[00:09:08] Potential Data Contamination with Phi 1.5

[00:10:11] Data Contamination unconfirmed

[00:12:59] Tiny models are all the rage lately

[00:16:23] Synthetic Dataset for Phi

[00:18:37] Are we going to run out of training data?

[00:20:31] Breaking News - Nougat - OCR from Meta

[00:23:12] Nisten - AI ISO disk

[00:29:08] Baichuan 7B - an immaculate Chinese model

[00:36:16] Unique Loss Terms

[00:38:37] Baichuan ByLingual and MultiLingual dataset

[00:39:30] Finetunes of Baichuan

[00:42:28] Philosophical questions in the dataset

[00:45:21] Let's think step by step

[00:48:17] Is breath related text in the original dataset?

[00:50:27] Counterintuitive prompting for models with no breath

[00:55:36] Idea spaces

[00:59:59] Alex - Life update about ThursdAI

[01:04:30] Stable Audio from Stability AI

[01:17:23] GPT4ALL Vulkan

[01:19:37] Coqui.ai releases XTTS - an open source TTS - interview With Josh Meyer

[01:30:40] Summary

Here’s a full video of the pod, and a full transcription, and as always, 🧡 thank you for bring a paid subscriber, this really gives me the energy to keep going, get better guests, release dope podcast content, and have 3 hours spaces and then spend 7 hours editing 🔥

Fri, 15 Sep 2023 02:35:37 GMT
🔥🎙️ ThursdAI Sunday special - Extending LLaMa to 128K context window (2 orders of magnitude) with YaRN [Interview with authors]
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Happy Sunday everyone, I am very excited to bring you this interview with the folks who took LLaMa 2 and made it LLoooooongMa!

Extending the LLaMa 2 context window from 4,000 to a whopping 128,000 tokens (Yarn-Llama-2-13b-128k on Hugging Face), these guys also came up with a paper called YaRN (Efficient Context Window Extension of Large Language Models) and showed that YaRN not only requires 10x fewer tokens to reach these long contexts, but also 2.5x fewer training steps!

And the models generalize, so there's no longer a need to collect extremely long sequences (think book-length texts) for the models to handle those context lengths.
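If you want to poke at the long context yourself, here's a minimal sketch of loading it with Hugging Face transformers. I'm assuming the NousResearch/Yarn-Llama-2-13b-128k repo id and that the repo's custom YaRN code needs trust_remote_code=True, so double check the model card before running this.

```python
# Minimal sketch: loading the 128K-context YaRN model with transformers.
# Repo id and trust_remote_code requirement are assumptions; verify on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-13b-128k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard across available GPUs / offload to CPU
    trust_remote_code=True,  # the YaRN rope-scaling code ships with the repo
)

prompt = "Summarize the following chapter:\n..."  # up to ~128K tokens of context
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```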

I also decided to do something different (it took me half of Sunday, so I can't promise or commit to this format): premium subscribers can now watch this interview with running karaoke-style subtitles and improved audio! This will be uploaded to YouTube in a week, but aren't you glad you subscribed and are getting it first?

Here’s a teaser preview:

And here’s the chapter for your convenience (the only thing that’s ai generated 😂)

0:00 - Introduction

3:08 - Discussion of extending LLAMA2's context length from 4,000 tokens to 128,000 tokens using the YaRN method

8:23 - Explanation of rope scaling for positional encodings in transformers

13:21 - How the rope scaling idea allows for longer context through positional interpolation

18:51 - Using in-context learning to train models on shorter sequences but still handle long contexts

25:18 - Sourcing long-form data like books to train 128k token models

31:21 - Whether future models will natively support longer contexts

37:33 - New model from Adept with 16k context using rope scaling

42:46 - Attention is quadratic - need better algorithms to make long context usable

49:39 - Open source community pushing state of the art alongside big labs

52:34 - Closing thoughts

As always, the full (manually edited) transcription (and this time a special video version!) is reserved for premium subscribers. I promise it'll be worth it, so why not.. y'know? Skip a cup of coffee from SB and support ThursdAI?

Sun, 10 Sep 2023 19:51:43 GMT
ThursdAI Sep 7 - Falcon 180B 🦅 , 🔥 Mojo lang finally here, YaRN scaling interview, Many OSS models & more AI news

Hey y'all, welcome to yet another ThursdAI! This is Alex coming at you every ThursdAI, this time including a live recording!

And what a live recording it was: we chatted about Falcon 180B, had a great interview at the end with 3 authors of the YaRN scaling paper and LLongMa 128K context, and had 3 breaking news items in the middle! Mojo🔥 was released, Adept released a LLaMa-comparable OSS model, and friend of the pod @reach_vb showed an open ASR leaderboard on Hugging Face! We also covered an incredible tiny model called StarCoder 1B that was finetuned by a friend of the pod (who joined the space to talk to us about it!)

As always, you can listen to the whole 3 hour long form conversation (raw, unedited) on our Zealous page (and add it to your podcatcher via this RSS) and this short-form pod is available on Apple, Spotify and everywhere.

ThursdAI - Hey, if you enjoy these, how about subscribing for real? Would love to do this full time! Every paid subscriber is like a dear friend 🧡

TL;DR of all topics covered

* Open Source LLM

* Falcon 180B announced by TIIUAE (Announcement, Demo)

* YaRN scaling paper - scaling LlaMa to 128K context (link)

* OpenHermes-13B from @teknium1 (link)

* Persimmon-8B from Adept.AI (link)

* Starcoder-1B-sft from @abacaj (link)

* Big Co LLMs + API updates

* OpenAI first ever Dev conference (link)

* Claude announces a $20/mo Claude Pro tier (link)

* Modular releases Mojo🔥 with 68,000x improvement over python (Link)

* Vision

* Real time deepfake with FaceFusion (link)

* HeyGen released AI avatars and AI video translation with lipsync (link, translation announcement)

* Voice

* Open ASR (automatic speech recognition) leaderboard from HuggingFace (link)

* Tools

* LangChain Hub (re) launched

* Open Interpreter (Announcement, Github)

Open Source LLM

🦅 Falcon 180B - The largest open source LLM to date (Announcement, Demo)

The folks at the “Technology Innovation Institute” have open sourced the huge Falcon 180B and put it up on Hugging Face. Having previously open sourced Falcon 40B, the folks from TIIUAE have given us a huge model that beats (base) LLaMa 2 on several evaluations, if only slightly, by a few percentage points.

It's huge: trained on 3.5 trillion tokens, it weighs over 100GB as a file and requires around 400GB of memory for inference.

Some folks were not as impressed with Falcon's performance, given that its parameter count is 2.5x that of LLaMa 2 (and it likely took longer to train) while its benchmark results are just a few percentage points higher than LLaMa's. It also boasts an embarrassingly low context window of just 2K tokens, and code was just 5% of its dataset, even though we already know that more code in the dataset makes models smarter!

Georgi Gerganov is already running this model on his M2 Ultra because he's the GOAT, and co-host of ThursdAI spaces Nisten was able to run this model CPU-only with just 4GB of RAM 🤯 We're waiting for Nisten to post a GitHub writeup on how to run this monstrous model on just a CPU, because it's incredible!

However, given the Apache 2 license and the fine-tuning community's excitement about improving these open models, it's an incredible feat, and we're very happy that this was released!

The complete open sourcing also matters in terms of geopolitics: this model was developed in the UAE, while in the US the export of A100 GPUs to the Middle East was restricted and folks are talking about regulating foundational models. A release of this size, coming out of the United Arab Emirates for free, is definitely going to add to the discussion of whether to regulate AI, open source, and the fine-tuning of huge models!

YaRN scaling LLaMa to 128K context window

Last week, just in time for ThursdAI, we posted about the release of Yarn-Llama-2-13b-128k, a whopping 32x improvement in context window size on top of the base LLaMa, from the folks at Nous Research, Enrico Shippole and @theemozilla, with the help of EleutherAI.

This week, they released the paper, YaRN: Efficient Context Window Extension of Large Language Models, which uses Rotary Position Embeddings to significantly stretch the context windows of attention-based transformer LLMs.

We had friends of the pod Enrico Shippole, theemozilla (Jeff) and Bowen Peng on the twitter space, and a special interview with them will be released on Sunday. If you're interested in scaling and context-window-stretching work, definitely subscribe for that episode, it was incredible!

It's great to see that their work is already applied in several places, including CodeLLaMa (which was released with 16K-100K context). The problem is now compute: context windows can be stretched and the models are able to generalize from smaller datasets, so upcoming models are expected to ship with practically unbounded context windows, limited mainly by your hardware's memory.
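To build a little intuition for the "stretching", here's a toy sketch of rotary position embeddings with simple linear position interpolation. This is only the simplified intuition; YaRN itself refines it (NTK-aware frequency scaling, attention temperature), so treat this as illustrative rather than the paper's actual method.

```python
# Toy sketch: rotary position embeddings with linear position interpolation.
# "scale" squeezes new, longer positions back into the range the model saw in training.
import torch

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    # Standard RoPE inverse frequencies per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # scale=32 maps positions 0..128K back into the 0..4K range of base LLaMa 2.
    pos = positions.float() / scale
    return torch.outer(pos, inv_freq)  # (seq_len, dim/2) rotation angles

positions = torch.arange(131072)            # 128K token positions
angles = rope_angles(positions, scale=32)   # 4K * 32 = 128K context
cos, sin = angles.cos(), angles.sin()       # applied as rotations to query/key pairs
```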

Persimmon-8B from AdeptAI (announcement, github)

AdeptAI, the company behind Act-1 (a foundational model for AI agents that drive the browser), whose co-founders include original transformers paper authors, has dropped a ThursdAI surprise: a fresh (read: not a LLaMa clone) model!

They released a completely open source model called Persimmon-8B, with a full Apache 2 license, a 16K context window (using custom RoPE scaling methods) and some interesting C++ inference speedups.

A very interesting 8B model that can fit on most consumer hardware, with additional tricks and a huge context window, is definitely welcome!

Another interesting point: they have 70K unused embeddings for multimodal extensions! Can't wait to see what that's about!

Starcoder-1B-sft - tiny model that’s great at code

Anton Bacaj (@abacaj) has finetuned StarCoder to achieve some incredible results for such a tiny model! Remember the first item, a whopping 180B parameter Falcon? Well, this is just a 1B parameter model, finetuned on a 65K-sample code dataset, that's outperforming Falcon, LLaMa 2, PaLM-2 (and Persimmon) on coding tasks, and it runs on your device so fast that it's hard to read!

It boasts an incredible 39% on the HumanEval task and 31% on MBPP (Anton reran and updated the MBPP score later), and it can run locally. A friend of the pod has already ported this model to transformers.js and it'll soon run in your browser!

OpenHermes-13B from @teknium1 (link)

Our friend Teknium1 (whom we interviewed a few weeks ago) released OpenHermes on top of LLaMa 2, and this time it's a completely open model with open datasets, marking the first time a Hermes model has been open!

OpenHermes was trained on 242,000 entries of primarily GPT-4 generated data, from open datasets across the AI landscape, including:

* GPTeacher - General Instruct,

* Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium

* WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan

* Airoboros GPT-4 (v1.0), by JonDurbin

* Camel-AI's domain expert datasets, by the Camel-AI Team

* CodeAlpaca, by Sahil2801

* GPT4-LLM and Unnatural Instructions, by Microsoft

Check it out folks!

Big Co LLM + API updates

Modular finally ships Mojo 🔥 (Announcement)

I just knew that Mojo would finally ship during ThursdAI, and in fact it made for a great #BreakingNews moment on the twitter space!

Modular and its co-founder Chris Lattner (author of LLVM, MLIR, Swift and many other things) have finally released their Mojo 🔥 language for AI.

Mojo 🔥 is like Python++: it includes strong types and full interoperability with the Python ecosystem, can still run basic vanilla Python, and has so, so much more in it. But the main thing Modular is claiming is a whopping 68,000x improvement over vanilla Python!

You didn't misread this: a 68,000x improvement, when using all of Modular's inference compilers and Mojo's virtualization tricks and compilation improvements. It's incredible.

The beauty of Mojo is that it meets developers where they are and allows them to adopt new features to achieve high performance gradually. By combining the best of dynamic and static languages, Mojo can deliver performance up to 68,000 times faster than Python today. That's quite a leap! If you want to delve deeper into Mojo's origin story, you can find more information in their documentation. But for now, let me highlight a few key benefits that Mojo offers:

Firstly, Mojo allows you to write everything in one language, merging the usability of Python with the systems programming features that typically require developers to rely on C, C++, or CUDA. This means that both research and deployment teams can work within a common codebase, streamlining the workflow from research to production.

Secondly, Mojo unlocks Python's performance potential. While Python is widely used, it may not be the best tool for high-performance or specialized hardware tasks. However, Mojo bridges that gap by enabling high performance on CPUs and providing support for exotic accelerators like GPUs and ASICs. With Mojo, you can achieve performance levels on par with C++ and CUDA.

Thirdly, and this is a big one, Mojo seamlessly integrates with the entire Python ecosystem. You can leverage the extensive library collection available in Python while making use of Mojo's features and performance benefits. This means you can easily combine libraries like NumPy and Matplotlib with your Mojo code – talk about flexibility!

Finally, Mojo allows you to upgrade your AI workloads effortlessly. By tightly integrating with the Modular AI Engine, Mojo empowers you to extend your AI workloads with custom operations. This includes pre-processing and post-processing operations, as well as high-performance mathematical algorithms. You can even integrate kernel fusion, graph rewrites, shape functions, and more. Mojo is all about expanding the possibilities!

Mojo's playground has been around since May and I have a deep dive here, but you should really watch the full conversation: over 3 hours on everything from why they chose to be a Python superset to why Chris thinks the community will pick it up. It's an incredible watch and will make you excited about Mojo!

WebGPU ships with support for FP16 in Chromium

Chrome shipped WebGPU back in April of '23 after years of development; it allows high-performance 3D graphics (and of course, transformers inference) in the browser and on the web!

However, for model inference, GPU access is not enough, you also need to be able to run smaller models. Well, one way to make models smaller is to run them in fp16 format. By essentially cutting the precision of the weight values in half, we can use much smaller (read: compressed) models with only a slight loss in accuracy.

Friends of the pod Nisten and Xenova (the transformers.js author) gave us an update that fp16 support has shipped in Chromium nightly, allowing much, much smaller models to run client-side!
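The "half the bytes" idea itself is easy to picture. Here's a tiny sketch with PyTorch and transformers (the browser side is transformers.js + WebGPU, but the principle is the same cast to fp16); the model id is just a small, convenient example.

```python
# Sketch of halving a model's memory footprint by casting its weights to fp16.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased")  # ~66M params, ~265MB in fp32
model = model.half()                                          # cast weights to fp16, roughly halving memory
print(next(model.parameters()).dtype)                         # torch.float16
```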

OpenAI first dev conference (Announcement)

OpenAI has announced their first developer focused conference, to happen in SF during November 6th!

In person only (with the keynote being streamed to all) and they also said that they won’t do any model announcement like GPT-5 😂

But we'll all expect at least a few API updates!

Vision

FaceFusion 1.1.0 - a deepfake faceswapper (Announcement, Github)

We all know deepfakes are here, I mean, don’t we? But did you know that it’s now super easy to face swap your face into an image or a video?

FaceFusion does just that: an incredibly fast way to deepfake someone's face into an image or a video with a few clicks. It works on CPU (I couldn't make it work on GPU, but it's possible) and shows incredible results!

Want to see Steve Buscemi dance around as Harry Styles? 3 clicks and 10 minutes and you get this 🔥

Friend of the pod CocktailPeanut has made it incredibly easy to install, with just 1 click, via his pinokio.computer app, which I use and love!

Facefusion also has a webcam mode that is able to deepfake any image onto a webcam stream for a lot of fun on zoom calls! (which I wasn’t able to test for some reason)

HeyGen launches their deep AI face creator

Many of us used 11Labs to clone voices, but what if you can clone a voice AND an image of a person? With just 2 minutes of their recording?

That's what HeyGen is claiming to be able to do, and we've previously reported that their incredibly realistic AI avatar generation from videos/images + voice really blew us away.

HeyGen just launched their service and you can sign up and get a few minutes for free. Here's a sample (with the CEO avatar; they couldn't make my own due to some launch-day errors).

The video you see above is just that: the CEO of HeyGen, thanking you for reading this week's ThursdAI!

Voice

ASR leaderboard + New top ASR model from Nvidia

I love doing ThursdAI, and one of the things I love most, is folks sending me stuff they worked on, and then coming to ThursdAI to chat about it.

Friend of the pod Vaibhav (VB) Srivastav, who’s an incredible dev rel at HuggingFace, focusing on Audio, has shipped a new Open-ASR (automatic speech recognition) leaderboard on huggingface!

It shows the top ASR models like Whisper and a newcomer, Nvidia FastConformer, which I didn't even know existed, and it's now topping Whisper for English speech-to-text tasks!

HuggingFace leaderboards like these are definitely a boon for the Open Source industry as they allow all of us to easily select open source models, but also allow the open source community to start racing towards the top, while we all benefit!

Tools

Open Interpreter (Announcement, Github)

One tool that I've used this week, and it's incredible, is Open Interpreter from @heyitskillian

It's incredibly easy to install and run, behaves like OpenAI's Code Interpreter (renamed to Advanced Data Analysis) but on your computer, and is able to do things like control your apps, lower the volume, edit images/files and tons more

pip install open-interpreter

And that’s it!

Give it a try (and you have to approve each command that it runs)
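For reference, here's roughly what driving it from Python looks like, based on my understanding of the project at the time of writing; the exact API may differ between versions, so check the repo's README.

```python
# Rough sketch of using Open Interpreter from Python (API surface as I understand it;
# verify against the project's README, it may have changed).
import interpreter

# Each shell/Python command it proposes is shown for your approval before it runs.
interpreter.chat("Switch my mac to dark mode and tell me how much free disk space I have")
```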

It’s a great agent, and hopefully we’ll get Killian to chat with us about it on next ThursdAI!

LangChain hub has launched (link)

If you’re into LangChain, and even if you aren’t, it’s undeniable the weight LangChain has in the ai engineer industry! They have a connector for everything, tons of folks use them, and they have raised a bunch of funding.

They have just launched their new LangChain Hub and it’s exciting! Many folks are sharing their best prompts on there, and ways to work with langchain, with upvotes and sharable links!

Also, worth noting that our friends Swyx and Alessio have recently released a Latent Space episode with Harrison, and it's WELL worth listening to (and reading), as they did a deep dive into LangChain, its naysayers and everything in between!

Check it out below :

Thank you, see you next time (with some incredible personal news I’ll have to share)

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Thu, 07 Sep 2023 23:04:12 GMT
ThursdAI Aug 24 - Seamless Voice Model, LLaMa Code, GPT3.5 FineTune API & IDEFICS vision model from HF

Hey everyone, this week has been incredible (isn’t every week?), and as I’m writing this, I had to pause and go check out breaking news about LLama code which was literally released on ThursdAI as I’m writing the summary! I think Meta deserves their own section in this ThursdAI update 👏

A few reminders before we dive in: we now have a website (thursdai.news) which will have all the links to Apple, Spotify, full recordings with transcripts, and will soon have a calendar you can join to never miss a live space! This whole thing wouldn't have been possible without Yam, Nisten, Xenova, VB, Far El, LDJ and other expert speakers from different modalities who join and share their expertise from week to week, and there's a convenient way to follow all of them now!

TL;DR of all topics covered

* Voice

* Seamless M4T Model from Meta (demo)

* Open Source LLM

* LLaMa2 - code from Meta

* Vision

* IDEFICS - A multi modal text + image model from Hugging face

* AI Art & Diffusion

* 1 year of Stable Diffusion 🎂

* IdeoGram

* Big Co LLMs + API updates

* GPT 3.5 Finetuning API

* AI Tools & Things

* Cursor IDE

Voice

Seamless M4T - A multilingual, multi-tasking, multimodal voice model.

To me, the absolute most mindblowing news of this week was Meta open sourcing (not fully, not commercially licensed) SeamlessM4T

This is a multilingual model that takes speech (and/or text) and can generate the following:

* Text

* Speech

* Translated Text

* Translated Speech

In a single model! For comparison's sake, it takes me a whole pipeline with Whisper and other translators in targum.video, not to mention much bigger models, and not to mention I don't actually generate speech!

This incredible news got me giddy and excited so fast, not only because it simplifies and unifies so much of what I do into 1 model, and makes it faster and opens up additional capabilities, but also because I strongly believe in the vision that Language Barriers should not exist and that’s why I built Targum.

Meta apparently also believes in this vision, and gave us an incredible new power unlock that understands 100 languages and does so multilingually without effort.

Language barriers should not exist

Definitely checkout the discussion in the podcast, where VB from the open source audio team on Hugging Face goes in deeper into the exciting implementation details of this model.

Open Source LLMs

🔥 LLaMa Code

We were patient and we got it! Thank you Yann!

Meta releases LLaMa Code, a LLaMa fine-tuned on coding tasks, including "in the middle" completion, which is what Copilot does: not just autocompleting code, but taking into account what surrounds the code it needs to generate.

Available in 7B, 13B and 34B sizes, the largest model beats GPT3.5 on HumanEval, which is a metric for coding tasks. (you can try it here)

In an interesting move, they also separately release Python-finetuned versions, specialized for Python code.

Another incredible thing: it supports a 100K context window of code, which is a LOT of code. However, that's unlikely to be very useful in open source because of the compute required.

They also give us instruction fine-tuned versions of these models, and recommend using them, since those are finetuned to be helpful to humans rather than to just autocomplete code.

Boasting impressive numbers, this is of course, just the beginning, the open source community of finetuners is salivating! This is what they were waiting for, can they finetune these new models to beat GPT-4? 🤔
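Before we move on, to make the "in the middle" idea concrete, here's a hedged sketch using transformers. The <FILL_ME> placeholder is what I believe the Code Llama tokenizer uses for infilling, but treat the exact placeholder, the repo id and the prompt format as assumptions to verify against the model card.

```python
# Sketch of fill-in-the-middle with a base (non-instruct) code model.
# The <FILL_ME> placeholder and "codellama/CodeLlama-7b-hf" repo id are assumptions
# to check against the model card before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# The model sees the code before AND after the gap, and generates only the middle.
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```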

Nous update

Friends of the Pod LDJ and Teknium1 are releasing the latest 70B model of their Nous Hermes 2 70B model 👏

* Nous-Puffin-70B

We’re waiting on metrics but it potentially beats chatGPT on a few tasks! Exciting times!

Vision & Multi Modality

IDEFICS - a new 80B model from Hugging Face - was released after a year's effort, and it's quite, quite good. We love vision multimodality here on ThursdAI; we've been covering it since we saw that GPT-4 demo!

IDEFICS is an effort by Hugging Face to create a foundational model for multimodality, and it is currently the only visual language model of this scale (80 billion parameters) that is available in open access.

It’s made by fusing the vision transformer CLIP-VIT-H-14 and LLaMa 1, I bet LLaMa 2 is coming soon as well!

And the best thing: it's openly available and you can use it in your code with the Hugging Face transformers library!

It’s not perfect of course, and can hallucinate quite a bit, but it’s quite remarkable that we get these models weekly now, and this is just the start!

AI Art & Diffusion

Stable Diffusion is 1 year old

Has it been a year? Wow. For me personally, Stable Diffusion is what started this whole AI fever dream. SD was the first model I actually ran on my own GPU, the first model I learned how to.. run, and use without relying on APIs. It made me way more comfortable with juggling models and learning what weights were, and well, here we are :) I now host a podcast and have a newsletter and I'm part of a community of folks who do the same, train models, discuss AI engineer topics and teach others!

Huge thank you to Emad, Stability AI team, my friends there, and everyone else who worked hard on this.

Hard to imagine how crazy of a pace we’ve been on since the first SD1.4 release, and how incredibly realistic the images are now compared to what we got then and got excited about!

🎂

IdeoGram joins the AI art race

IdeoGram - a new text-to-image model from ex-Googlers (announcement) - is the new kid on the block. It's not open source (unless I missed it), but it boasts significant text capabilities and really great image quality. It also has a remix ability, and is available from the web, unlike… MidJourney!

Big Co LLMs + API updates

OpenAI pairs with ScaleAI to let enterprises finetune and run finetuned GPT-3.5 models!

This is an interesting time for OpenAI to dive into fine-tuning, as open source models inch closer and closer to GPT3.5 on several metrics with each week.

Reminder: if you finetune a GPT-3.5 model, you need to provide your own data to OpenAI, and then you also pay a premium to run what is essentially a model hosted just for you, which means it's not going to be cheap.

Use as much prompting as humanly possible before you consider doing the above fine-tuning and you may be able to solve your task much better and cheaper.
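If you do go down that route, here's roughly what the flow looked like with the openai Python SDK as it was at the time of writing (the file name is illustrative, and the SDK has since changed shape, so check the current API reference).

```python
# Rough sketch of the GPT-3.5 fine-tuning flow with the 2023-era openai Python SDK.
import openai

openai.api_key = "sk-..."  # your API key

# training data: a JSONL file with one {"messages": [...]} chat example per line
upload = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTuningJob.create(training_file=upload.id, model="gpt-3.5-turbo")
print(job.id)  # poll the job; once it finishes you pay per token to use your tuned model
```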

Agents

The most interesting thing to me in the world of agents actually came from an IDE!

I installed Cursor, the new AI-infused VSCode clone, imported my VSCode settings, and off we went! It can use your own GPT-4 keys if you don't want to send them your code or pay, it embeds your whole repo for easy import and code understanding, and it does so much more, like adding a button to every error in the console to "debug", and a "new AI project" feature which builds you a template just by typing a few words!

Our friends Alessio and Swyx have interviewed the founder of Cursor on their podcast, a strong recommendation to check that episode out!

After using Cursor for just a few days, I don’t want to go back to VSCode and even consider … maybe pausing my copilot subscription 🤯

That’s all for today folks! I wish you all a great week, and we’ll see you in the next ThursdAI 🫡

Thank you for reading ThursdAI - Recaps of the most high signal AI weekly spaces. This post is public so feel free to share it with a friend? Let’s get to 1K readers 🔥



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 25 Aug 2023 02:45:46 GMT
🎙️ThursdAI - LLM Finetuning deep dive, current top OSS LLMs (Platypus 70B, OrctyPus 13B) authors & what to look forward to
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Brief outline for your convenience:

[00:00] Introduction by Alex Volkov

[06:00] Discussing the Platypus models and data curation process by Ariel, Cole and Nathaniel

[15:00] Merging Platypus with OpenOrca model by Alignment Labs

* Combining strengths of Platypus and OpenOrca

* Achieving state-of-the-art 13B model

[40:00] Mixture of Experts (MOE) models explanation by Prateek and Far El

[47:00] Ablation studies on different fine-tuning methods by Teknium

Full transcript is available for our paid subscribers 👇 Why don’t you become one?

Here's a list of folks and models that appear in this episode; please follow all of them on X:

* ThursdAI cohosts - Alex Volkov, Yam Peleg, Nisten Tajiraj

* Garage Baind - Ariel, Cole and Nataniel (platypus-llm.github.io)

* Alignment Lab - Austin, Teknium (Discord server)

* SkunkWorks OS - Far El, Prateek Yadav, Alpay Ariak (Discord server)

* Platypus2-70B-instruct

* Open Orca Platypus 13B

I am recording this on August 18th, which marks the one month birthday of the Lama 2 release from Meta. It was the first commercially licensed large language model of its size and quality, and we want to thank the great folks at MetaAI. Yann LeCun, BigZuck and the whole FAIR team. Thank you guys. It's been an incredible month since it was released.

We saw a Cambrian explosion of open source communities who make this world better, even since LLaMa 1. For example, LLaMa.cpp by Georgi Gerganov is such an incredible example of how the open source community comes together: this one guy, over a weekend, took the open source weights and made them run on CPUs, much, much faster.

Mark Zuckerberg even talked about this, how amazing the open source community has adopted LLAMA, and that Meta is also now adopting many of those techniques and developments back to run their own models cheaper and faster. And so it's been exactly one month since LLAMA 2 was released.

And literally every ThursdAI since then, we have covered a new state of the art open source model all based on Lama 2 that topped the open source model charts on Hugging Face.

Many of these top models were fine tuned by Discord organizations of super smart folks who just like to work together in the open and open source their work.

Many of whom are great friends of the pod.

Nous Research, with whom we had a special episode a couple of weeks back (Teknium1 seems to be part of every org), with Alignment Labs and GarageBaind being the latest folks topping the charts.

I'm very excited not only to bring you an interview with Alignment Labs and GarageBaind, but also to give you a hint of two additional very exciting efforts that are happening in some of these discords.

I also want to highlight how many of those folks do not have data scientist backgrounds. Some of them do. So we had a few PhDs or PhD studies folks, but some of them studied all this at home with the help of GPT 4. And some of them even connected via ThursdAI community and space, which I'm personally very happy about.

So this special episode has two parts. In the first part we're going to talk with Ariel, Cole and Nataniel, currently known as GarageBaind. Get it? bAInd, GarageBaind, because they're doing AI in their garage. I love it.

🔥 Who are now holding the record for the best performing open source model called Platypus2-70B-Instruct.

And then, joining them is Austin from Alignment Labs, the authors of OpenOrca (also a top performing model), to talk about how they merged and joined forces and trained the best performing 13B model, called Open Orca Platypus 13B, or Orctypus 13B.

This 13b parameters model comes very close to the Base Llama 70b. So, I will say this again, just 1 month after Lama 2 released by the great folks at Meta, we now have a 13 billion parameters model, which is way smaller and cheaper to run that comes very close to the performance benchmarks of a way bigger, very expensive to train and run 70B model.

And I find it incredible. And we've only just started, it's been a month. And so the second part you will hear about two additional efforts, one run by Far El, Prateek and Alpay from the SkunksWorks OS Discord, which is an effort to bring everyone an open source mixture of experts model, and you'll hear about what mixture of experts is.

And another effort, run by friend of the pod Teknium (previously a chart topper himself with the Nous Hermes models and many others), to figure out which of the fine-tuning methods are the most efficient, fast and cheap to run. You will hear several mentions of LoRAs, which stands for Low Rank Adaptation: basically methods of keeping the huge weights of LLaMa and other models frozen while retraining, fine-tuning and aligning some specific parts of them with new data, a method we know from the diffusion world.

And it's now applying to the LLM world and showing great promise in how fast, easy, and cheap it is to fine tune these huge models with significantly less hardware costs and time. Specifically, Nataniel Ruiz, the guy who helped Ariel and Cole to train Platypus, the co-author on DreamBooth, StyleDrop and many other diffusion methods, mentioned that it takes around five hours on a single A100 GPU to fine tune the 13B parameter model. That, if you can find an A100 GPU, that's around $10.

That's incredible.
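For the curious, the LoRA part of such a finetune typically looks something like this with the peft library. This is a generic sketch with illustrative hyperparameters, not the exact Platypus recipe (and the LLaMa 2 repo is gated, so you'd need access).

```python
# Minimal sketch: attaching LoRA adapters to a LLaMa-2 13B with peft.
# Hyperparameters and target modules are illustrative, not the Platypus recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", device_map="auto", torch_dtype="auto"
)

config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # only these projections get trainable adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the 13B weights are trained
```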

I hope you enjoy listening and learning from these great folks, and please don't forget to check out our website at thursdai.news for all the links, socials and podcast feeds.

Brief outline for your convenience:

[00:00] Introduction by Alex Volkov

[06:00] Discussing the Platypus models and data curation process by Ariel, Cole and Nathaniel

[15:00] Merging Platypus with OpenOrca model by Alignment Labs

* Combining strengths of Platypus and OpenOrca

* Achieving state-of-the-art 13B model

[40:00] Mixture of Experts (MOE) models explanation by Prateek and Far El

[47:00] Ablation studies on different fine-tuning methods by Teknium

Full transcript is available for our paid subscribers 👇 Why don’t you become one?

Sun, 20 Aug 2023 20:05:13 GMT
ThursdAI Aug 17 - AI Vision, Platypus tops the charts, AI Towns, Self Alignment 📰 and a special interview with Platypus authors!

Hey everyone, this is Alex Volkov, the host of ThursdAI, welcome to yet another recap of yet another incredibly fast-paced week.

I want to start with a ThursdAI update, we now have a new website http://thursdai.news and a new dedicated twitter account @thursdai_pod as we build up the ThursdAI community and brand a bit more.

As always, a reminder that ThursdAI is a weekly X space, newsletter and 2! podcasts, short form (Apple, Spotify) and the unedited long-form spaces recordings (RSS, Zealous page) for those who’d like the nitty gritty details (and are on a long drive somewhere).

Open Source LLMs & Finetuning

Honestly, the speed with which LLaMa 2 finetunes are taking over state of the art performance is staggering. We literally talk about a new model every week that’s topping the LLM Benchmark leaderboard, and it hasn’t even been a month since LLaMa 2 release day 🤯 (July 18 for those who are counting)

Enter Platypus 70B (🔗)

Platypus 70B-instruct is currently the highest ranked open source LLM, alongside other Platypus versions.

We've had the great pleasure to chat with new friends of the pod Arielle Lee and Cole Hunter (and long time friend of the pod Nataniel Ruiz, co-author of DreamBooth and StyleDrop, which we've covered before) about this incredible effort to finetune LLaMa 2, the open dataset they curated and released as part of this effort, and how quickly and cheaply it's possible to train a (smaller, 13B) version of Platypus (just 5 hours on a single A100 GPU ~= 6$ on Lambda 🤯)

We had a great interview with Garage BAIND the authors of Platypus and we’ll be posting that on a special Sunday episode of ThursdAI so make sure you are subscribed to receive that when it drops.

Open Orca + Platypus = OrctyPus 13B? (🔗)

We told you about OpenOrca just last week from our friends at @alignment_lab, and not only is Platypus the best performing 70B model, the open source community also came through with an incredible merge and collaboration to bring you the best 13B model, which is a merge between OpenOrca and Platypus.

This 13B model is now very close to the original LLaMa 70B in many of the metrics. LESS THAN A MONTH after the initial open source. It’s quite a remarkable achievement and we salute the whole community for this immense effort 👏 Also, accelerate! 🔥

Join the skunksworks

Speaking of fast-moving things, in addition to the above interview we had a great conversation with folks from the so-called SkunksWorks OS Discord, namely Far El, Prateek Yadav, Alpay Ariak, Teknium and Alignment Labs, and together with our recurring guest hosts Yam Peleg and Nisten we covered two very exciting community efforts, all happening within the SkunksWorks Discord.

The first effort is called MoE, open mixture of experts, which is an open source attempt at replicating the Mixture of Experts architecture widely credited for why GPT-4 is so much better than GPT-3.
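For a rough mental model of what "mixture of experts" means, here's a toy routing layer in PyTorch. Real MoE LLMs add load-balancing losses, expert capacity limits and distributed expert parallelism on top of this, so treat it as intuition only, not anyone's actual implementation.

```python
# Toy sketch of mixture-of-experts routing: a gating network sends each token
# to its top-k experts and mixes their outputs by the gate weights.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)   # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # run each token only through its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```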

The second effort is called ablation studies, an effort Teknium is leading to understand once and for all what the best, cheapest and highest quality way to finetune open source models is, whether it's QLoRA, LoRAs, or a full finetune.

If you're interested in any of these, either by helping directly or by providing resources such as GPU compute, please join the SkunksWorks Discord. They will show you how to participate, even if you don't have prior finetuning knowledge! And we'll keep you apprised of the results once they release any updates!

Big Co LLMs + API updates

In our Big CO corner, we start with an incredible paper from MetaAi, announcing:

Self-Alignment w/ Backtranslation method + Humpback LLM - MetaAI

Summarized briefly (definitely listen to the full episode and @yampeleg's detailed overview of this method): it's a way for an LLM to create high-quality datasets for itself, in an unsupervised way, using only a small amount of initial "seed" data from a high-quality dataset. Think of it this way: fine-tuning a model requires a lot of "question → response" pairs in your dataset, and backtranslation proposes "response → question" dataset generation, essentially asking "what instruction would have made an LLM generate this result?"

This results in a model that effectively learns to learn better and create its own datasets without humans (well, at least human labelers) in the loop.
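In pseudocode, the loop looks something like this. The helper names below are hypothetical, just to capture the idea; this is not Meta's actual code.

```python
# Conceptual sketch of instruction backtranslation (hypothetical helpers, not Meta's code):
# a "backward" model trained on a small seed set proposes an instruction for each
# unlabeled response, then the model itself scores and keeps only the best pairs.
def self_align(unlabeled_responses, backward_model, scoring_model, threshold=4.5):
    new_pairs = []
    for response in unlabeled_responses:
        # "response -> question": guess what instruction would have produced this text
        instruction = backward_model.generate(
            f"Write an instruction that would produce this output:\n{response}"
        )
        # self-curation: the model rates the candidate pair, e.g. on a 1-5 scale
        score = scoring_model.rate(instruction, response)
        if score >= threshold:
            new_pairs.append({"instruction": instruction, "output": response})
    return new_pairs  # feed back into fine-tuning, then repeat the whole loop
```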

Here is some more reading material on X for reference.

OpenAI new JS SDK (X link)

OpenAI has partnered with StainlessAPI to release a major new version 4 of their TS/JS SDK, with the following incredible DX improvements for AI engineers:

* Streaming responses for chat & completions

* Carefully crafted TypeScript types

* Support for ESM, Vercel edge functions, Cloudflare workers, & Deno

* Better file upload API for Whisper, fine-tune files, & DALL·E images

* Improved error handling through automatic retries & error classes

* Increased performance via TCP connection reuse

* Simpler initialization logic

The most exciting part for me is, this is now very easy to get started with AI projects and get streaming on the incredible Cloudflare workers platform (Targum is part of the first Cloudflare workers launchpad but is not affiliated, we’re just superfans 🫶)

Vision & Multi Modality

There’s been some really cool stuff happening in computer vision and multi-modal AI recently. First up, a new method called 3D Gaussian Splatting that shows an incredibly clear and smooth way to generate 3d scenes from just a few images.

Compared to neural radiance fields (NeRFs), Gaussian splatting produces much smoother results without the grainy voxel artifacts NeRFs often have. However, it achieves this improved quality without sacrificing the speed and performance of NeRFs. So Gaussian splatting gives a big boost in realism compared to NeRF renderings, while maintaining real-time speeds in cleaning up those “clouds”

Supervision from Roboflow (and Piotr)

Btw our own friend of the pod and AI Vision expert @skalskiP (who reviewed Gaussian Splatting for us) is also having a crazy ThursdAI week, with their open source library called SuperVision, which is a computer vision toolkit, and is trending #2 on Github 👏

Apple stepping up their Vision (not the headset) Transformer game

Apple has open sourced ml-fastvit, their general purpose Vision Transformer model, which they claim runs at ~1ms on mobile devices, with code and pre-trained weights available on GitHub 🔥

This is great to see from Apple ML teams, not only them open sourcing, but also them preparing all of us to the world of spatial computers (Vision Pro coming remember?) and many new Computer Vision heavy apps will be available at those incredible speeds.

This is also great for on device inference running these models in node / on edge (as Friend of the pod @visheratin demonstrated with WebAI)

Additional updates include Nvidia releasing a web playground for NeVa, their MLLM (Multimodal LLM, get used to seeing this term everywhere), which you can play with here, and Link-Context learning for MLLMs.

Agents

OpenAI also announced that Global Illumination is joining OpenAI. That team's CEO is the creator of the Instagram Stories algorithm and a feed contributor, and the team is behind a massive open-world Minecraft clone. Will we see OpenAI release agents into that world? We know that they are working on agents.

A16Z - AI Town (🔗)

Speaking of agents roaming free and interacting, we covered the open sourcing of SmallVille just last week ↴ and now we see a new open source framework from Andreessen Horowitz's AI division, called AI Town, for letting agents roam and interact with each other.

AI Town (Github) is a web framework written in TypeScript, built to be run and customized with different LLMs (even open source ones) in mind, and you can see the AI agents running around in a live demo here.

This ThursdAI was so packed with great information, that it’s really worth listening to the whole recording, you can do this on our Zealous page, RSS and on twitter (all those links can always be found on thursdai.news )

If you found this valuable, join our community and let your friends know? This is a great way to support us, as well as participate in the discussion on social, tag #thursdAI on anything you feel is worthwhile for us to summarize and



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Thu, 17 Aug 2023 22:47:14 GMT
ThursdAI Aug 10 - Deepfakes get real, OSS Embeddings heating up, Wizard 70B tops the charts and more!

Hey everyone, welcome to yet another ThursdAI update! As always, I’m your host, Alex Volkov, and every week, ThursdAI is a twitter space that has a panel of experts, guests and AI enthusiasts who join to get up to date with the incredible fast pace of AI updates, learn together and listen to subject matter experts on several of the topics.

Pssst, this podcast is now available on Apple, Spotify and everywhere using RSS and a new, long form, raw and uncut, full spaces recording podcast is coming soon!

ThursdAI is supported by readers, and I promised my wife I'd ask: if you find this valuable, why not upgrade your subscription so I can keep this going, get better equipment and produce higher quality shows?

I started noticing that our update spaces split into several themes, and figured I'd start separating the updates into these themes as well. Do let me know in the comments if you have feedback, a preference, or specific things to focus on.

LLMs (Open Source & Proprietary)

This section will include updates pertaining to Large Language Models, proprietary (GPT4 & Claude) and open source ones, APIs and prompting.

Claude 1.2 instant in Anthropic API (source)

Anthropic has released a new version of Claude Instant, their very fast Claude model with a 100K context window; it's a very capable model that's now better at code tasks and, most of all, very very fast!

Anthropic is also better at giving access to these models, so if you’ve waited in their waitlist for a while, and still don’t have access, DM me (@altryne) and I’ll try to get you API access as a member of ThursdAI community.

WizardLM-70B V1.0 tops OSS charts (source)

WizardLM 70B from WizardLM is now the top dog in open source AI, featuring the same license as LLaMa and much, much better code performance than base LLaMa 2; it's now the top performing code model that also does other LLMy things.

Per friend of the pod, and Finetuner extraordinaire Teknium, this is the best HumanEval (coding benchmark) we’ve seen in a LLaMa based open source model 🔥

Also btw, a recent evaluation by Teknium of the Alibaba Qwen 7B model we talked about last ThursdAI actually showed that LLaMa 7B is a bit better; however, Qwen should also be evaluated on tool selection and agent use, and we're waiting for those metrics to surface and will update!

Embeddings Embeddings Embeddings

It seems that in OpenSource embeddings, we’re now getting state of the art open source models (read: require no internet access) every week!

In just the last few months:

* Microsoft open-sourced E5

* Alibaba open-sourced General Text Embeddings

* BAAI open-sourced FlagEmbedding

* Jina open-sourced Jina Embeddings

And now we have a new benchmark, MTEB, and a new leaderboard from Hugging Face (who else?) so we always know which model is currently leading the pack. And there's a new winner from this week: BGE (large, base, and small (just 140MB))!

Embedding models are very important for many AI applications, RAG (retrieval augmented generation) products, semantic search and vector DBs, and the faster, smaller and more offline they are, the better the whole field of AI tools we’re going to get, including, much more capable, and offline agents. 🔥

Worth noting that text-embedding-ada-002, the OpenAI embedding API, is now ranked 13th on the above MTEB leaderboard!
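If you want to try this week's leader locally, here's a quick sketch with sentence-transformers, assuming the BAAI/bge-small-en checkpoint listed on the leaderboard. No API calls, everything runs on your machine.

```python
# Quick sketch: local semantic search with the open BGE embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en")   # the small ~140MB variant, fully offline
docs = [
    "ThursdAI is a weekly recap of AI news.",
    "Vector databases store embeddings for semantic search.",
]
query = "Where can I catch up on this week's AI news?"

doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)
print(util.cos_sim(q_emb, doc_emb))   # higher score = closer semantic match
```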

Open Code Interpreter 👏

While we’re on the agents topic, we had the privilege to chat with a new friend of the pod, Shroominic who’s told us about his open source project, called codeinterpreter-api which is an open source implementation of code interpreter. We had a great conversation about this effort, the community push, the ability of this open version to install new packages, access the web, run offline and have multiple open source LLMs that run it, and we expect to hear more as this project develops!

If you’re not familiar with OpenAI Code Interpreter, we’ve talked about it at length when it just came out here and it’s probably the best “AI Agent” that many folks have access to right now.

Deepfakes are upon us!

I want to show you this video, and you tell me: if you saw this somewhere other than an AI newsletter, would you have been able to tell it's AI generated?

This video was generated automatically, when I applied to the waitlist by HeyGen and then I registered again and tried to get AI Joshua to generate an ultra realistic ThursdAI promo vid haha.

I’ve played with many tools for AI video generation and never saw anything come close to this quality, and can’t wait for this to launch!

While this is a significant update for many folks in terms of how well deepfakes can look (and it is! Just look at it, reflections, HQ, lip movement is perfect, just incredible) this isn’t the only progress data point in this space.

Play.ht announced version 2.0 which sounds incredibly natural, increased model size 10x and dataset to more than 1 million hours of speech across multiple languages, accents, and speaking styles and emotions and claims to have sub 1s latency and fake your voice with a sample of only… 3 seconds! 🤯

So have you and your loved ones chosen a code word to authenticate over the phone? Or switched to a verifiable communication style? While those of us with multiple accents don’t yet have to worry, everyone should stop believing any video or voice sample from now on, it’s just inevitable that all of that will be deepfaked and we should start coming up with ways to authenticate content.

If you made it this far, and any of the above was new/important to you, why not support this pod/newsletter/community? If you’d like to sponsor us more directly, please ping me at altryne [at] gmail.com , I’m also open to consulting, and if you’re a great company, Developer Relations positions :)

Finally, we’ve talked for a whopping 2 hours on the spaces, and that whole conversation can be heard on our Zealous page which has transcripts, AudioGrams of key moments, and space summarizations!

And the long-form space recordings can be added to your podcatcher separately if you'd prefer the "ThursdAI raw feed", by using this RSS link, and they will come as their own podcast very soon! Thanks to our friends at Zealous

Thank you,

Alex Volkov.

Host

CEO @ Targum.video

AI Consultant with free slots (Lets Talk)



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Thu, 10 Aug 2023 22:31:42 GMT
ThursdAI Aug 3 - OpenAI, Qwen 7B beats LLaMa, Orca is replicated, and more AI news

Hi, today’s episode is published on a Friday, it’s been a busy week with at least 4 twitter spaces, countless DMs and research!

OpenAI announces UX updates

* Example prompts: No more staring at a blank page!

* Suggested replies: ChatGPT automatically synthesizes follow up questions. Then you just click a button

* GPT-4 by default: When starting a new chat as a Plus user, ChatGPT will remember your previously selected model!

* Uploading multiple files is now supported in the Code Interpreter beta for all Plus users.

* Stay logged in: You’ll no longer be logged out every 2 weeks and if you do, we have a sweet new welcome page!

* Keyboard shortcuts: Work faster with shortcuts, Try ⌘ (Ctrl) + / to see the complete list.

ThursdAI - I stay up to date so you don’t have to

Alibaba releases Qwen7b

* Trained with high-quality pretraining data. Qwen-7B is pretrained on a self-constructed, large-scale, high-quality dataset of over 2.2 trillion tokens. The dataset includes plain text and code, and covers a wide range of domains, both general and professional.

* Strong performance. In comparison with models of similar size, it outperforms the competitors on a series of benchmark datasets that evaluate natural language understanding, mathematics, coding, etc.

* Better language support. Its new tokenizer, based on a large vocabulary of over 150K tokens, is more efficient than other tokenizers. It is friendly to many languages and helps users further finetune Qwen-7B to extend understanding of a particular language.

* Support for 8K context length. Both Qwen-7B and Qwen-7B-Chat support a context length of 8K, which allows inputs with long contexts.

* Support for plugins. Qwen-7B-Chat is trained with plugin-related alignment data, and is thus capable of using tools, including APIs, models, databases, etc., and of acting as an agent.

This is an impressive jump in open source capabilities, less than a month after LLaMa 2 release!

GTE-large, a new embedding model, outperforms OpenAI ada-002

If you’ve used any “chat with your documents” app or built one, or have used a vector database, chances are, you’ve used openAI ada-002, it’s the most common embedding model (that turns text into embeddings for vector similarity search)

That model has now been ousted by an open source (read: free) one called GTE-large, with improvements over ada across most benchmarks!

OpenOrca 2 preview

Our friends from AlignmentLab including Teknium and LDJ have discussed the release of OpenOrca 2! If you’re interested in the type of finetuning things these guys do, we had a special interview w/ NousResearch on the pod a few weeks ago

OpenOrca tops the charts for the best performing 13B model 👏

Hyper-write releases a personal assistant

You know how much we love agents on ThursdAI, and we're waiting for this field to materialize; I personally am waiting for an agent that gathers all the links and screenshots for this summary, and… we're not there yet! But we're getting close, and our friends from HyperWrite have released their browser-controlling agent on ThursdAI. Talk about a full day of releases!

I absolutely love the marketing trick they used where one of the examples of how it works is "upvote us on Product Hunt", and it actually did work for me, and found out that I had already upvoted!

Superconductor continues

I was absolutely worried that I wouldn't make it to this ThursdAI or wouldn't know what to talk about because, well, I've become a sort of host, information hub and interviewer of folks about LK-99. Many people around the world seem interested in its properties, replication attempts, and in understanding this new and exciting thing.

We talked about this briefly, but if it interests you (and I think it absolutely should), please listen to the recording below.

ThursdAI - See ya next week, don’t forget to subscribe and if you are already subscribed, and get value, upgrading will help me buy the proper equipment to make this a professional endeavor and pay for the AI tools! 🫡



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 04 Aug 2023 21:58:30 GMT
🧪 LK99 - The superconductor that can change the world, and the K-drama behind it!
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

First of all, let me address this from the get go, I’m not a material scientist! I am pretty good at finding information in twitter’s incredibly noisy info stream. (hey, this is how I bring you AI updates every ThursdAI)

Since LK-99 is potentially groundbreaking and revolutionary, I’ve compiled a twitter list of everyone who I found credible, interested and a source of new information, and there’s now over 1.5K followers to this list alone!

Since this clearly is interesting to a lot of you, I reached out to a few prominent people on this list, and asked them to join a twitter space, to try and stitch together an update on the current state of LK-99, replication attempts, history and lore, as it stands a week after the original papers release.

If you found this interesting, you’re the type of person who wants to stay up to date, feel free to subscribe and keep this Substack alive!

First of all, let's do some level setting. Superconductors are real, we've used them in MRI machines for example, but the currently available superconductors need extremely low temperatures or extreme pressures to, well... superconduct, and the promise of a room temperature, ambient pressure superconductor is the holy grail of energy use.

For a breakdown on what superconductors are, and what they can mean for the world, I strongly recommend this thread from Andrew Cote (published presciently a full two weeks before the LK-99 paper) or watch this incredible breakdown:

July 22nd, the LK-99 arXiv day!

On July 22nd, two papers describing “worlds first room temperature superconductor” were uploaded to arXiv:

2307.12008 - Sukbae Lee, Ji-Hoon Kim, Young-Wan Kwon (submitted by Kwon)

and after 2 hours and 20 minutes another paper was uploaded

2307.12037 - Sukbae Lee, Jihoon Kim, Hyun-Tak Kim, Sungyeon Im, SooMin An, Keun Ho Auh (Submitted by Hyuntak Kim)

You may notice that the first two authors on both papers are Sukbae Lee and Ji-Hoon Kim, and in fact LK stands for Lee and Kim and 99 in the LK-99 name stands for the year 1999 they have started research on this.

You may also notice that YW Kwon who submitted the first paper, is not included on the second one, and in fact, is no longer part of the Quantum Energy Research Institute (Aka QCentre) where he was a CTO (he’s no longer listed on the site)

If this shakes out and the superconductor is replicated, there's definitely going to be a Netflix series on the events that led YW Kwon to release the paper, after he was no longer affiliated with QCentre, with limited information. So let's try to connect the dots (a LOT of this connecting happened on the ground by Seo Sanghyeon and his friends, and was translated by me; their original coverage has a LOT of details and is available in Korean here).

Let’s go back to the 90s

On the LinkedIn page of Ji-Hoon Kim (the page turned blank shortly before I wrote this), JH Kim showed that he started working on this back in 1999, when they estimated they had a material that contained a "very small amount of superconductivity". Together with Sukbae Lee, he established QCentre in 2018 to complete the work of their Professor Emeritus of Chemistry at Korea University, the late Choi Dong-Sik (1943-2017), who apparently first proposed the LK-99 material (following the 1986 bonanza of the discovery of high-temperature superconductors by IBM researchers).

Fast forward to 2017, a wish expressed in a last will and testament starts everything again

Professor Choi passed away, and in his will requested follow-up research on ISB theory and LK-99; the Quantum Energy Research Institute was then established by Lee and Kim (LK), and they continued their work on this material.

In 2018, there’s a potential breakthrough, that could have been an accident that led to the discovery of the process behind LK-99?

Here’s a snippet of Seo Sanghyeon explaining this:

Kwon Young-Wan the ex-CTO

Kwon is a Research Professor at Korea University & KIST, the third author and submitter of the first arXiv paper, and previously the CTO; but at the time the paper went up on arXiv he had not been affiliated with QCentre for "some months", according to an interview with Lee.

He uploaded a paper naming only 3 authors (Lee, Kim and himself) and then surprisingly presented LK-99 research at the MML2023 international conference held in Seoul a few days later. We haven't yet found a video recording, but a few reports mention him asking for an interpreter, and talking about bringing samples without a demonstration or proper equipment.

Important to note, that

Enter Hyun-Tak Kim

H.T. Kim is probably the most cited and well-known professor in academia among the folks involved. See his Google Scholar profile, with a D-index of 43, 261 publications and 11,263 citations.

He’s a heavy hitter, and is the submitter and listed as the author of paper number 2 submitted to arXiv, 2 hours and 20 minutes after paper number 1 above.

In the second paper, he’s listed as the third author (and the submitter to arXiv) and his contribution is acknowledged like so:

An author, Hyun-Tak Kim (H. T. Kim),’s knowledge on mechanisms of both superconductivity and the metal-insulator (gap-nogap) transition highly contributed to writing the mechanism part. The knowledge was acquired over 20 years by processes of performing national projects including project [Grant 2017-0-00830] funded by Institute for Information and Communications Technology Promotion (IITP) in MSIT of Korea government in ETRI. H. T. Kim left ETRI on Nov. of 2022.

In the first paper, H.T. is not acknowledged and is only mentioned in reference no. 52, which points to his paper from 2021.

OK, enough about the people, Alex! Does the rock levitate?

In January, the QCentre YouTube channel uploaded an unlisted video showing magnetic properties of LK-99, and another video, showing partial levitation, is being widely shared on social media.

The partial levitation shown is attributed to the Meissner effect and is offered as supposed proof of room-temperature superconductivity. However, these two videos are inconclusive and not enough for us to take QCentre’s claims at face value.

The scientific community, having been stung by a recent incident surrounding a supposed room-temperature superconductor where the evidence was apparently falsified (Dias et al.), is not so easily swayed.

Adding to that, the mess around the multiple papers presenting different theories, the lack of peer review or independent replication, the surprise publication, and the rushed follow-up publication all make people wonder: what is going on here? At the same time, this doesn’t seem like a deliberate fabrication.

Summary of replication attempts so far (Sun, Jul 30)

Given the importance of this discovery and the “relative” triviality of replication (the materials are common enough and the process is not extremely complex, but kids, do not try this at home), we can bet that “furnaces in solid-state materials labs around the world have been cooking yesterday and today to try to reproduce” [Science Magazine].

We have reports from China that supplies of lead apatite are running dry, suggesting that many labs are quietly trying to replicate.

There are additional reports from India, where Dr. V.P.S. Awana, chief scientist at CSIR-NPL, and his team are trying to replicate, with results expected as early as tomorrow (Monday, Jul 31); he has been emailing with Lee.

In addition to this, we’ve had Andrew McCalip from Varda Space live-tweeting and Twitch-streaming his “Meissner effect or bust” campaign to reproduce LK-99 while the world watches (Andrew joined the space as well) and provides ideas, materials and an outpouring of support for this gung-ho, almost cowboy effort.

We’ve also had folks from MIT claim that professors who want to remain anonymous, and who attended MML2023, are in contact with the team and are trying to test the material.

Replication failure is … not a failure?

Discussing the replication attempts with experts on stage, we all concluded that there are likely two ways for the world to know whether LK-99 is a superconductor:

* Replication succeeds and scientists analyze the replicated sample

* The QCentre team provides a sample, and some very smart independent folks put it under a microscope, run magnetism analysis and a bunch of other measurements, and confirm that it’s a superconductor at room temperature.

While we wait for either of those, I encourage you to check out the resources, the space recording, and the list of folks I’ve collected to stay in the loop!

Here’s a list of relevant links:

* Paper 1 DOI

* Paper 2 Arxiv

* Paper 3 Arxiv

* New Scientist Interview

* ChosunBiz Interview (Korean)

* Yonhap Interview (Korean)

* Twitter List

And the list of folks who participated in the space, give them a follow:

* Alex Volkov (@altryne)

* Seo Sanghyeon (@sanxiyn)

* Ate-a-Pi (@8teAPi)

* Andrew McCalip (@andrewmccalip)

* Andrew Cote (@Andercot)

* Ely Rabani (@radsci)

* Robotbeat (@Robotbeat)

* Marsh Ray (@marshray)

* Ben (@BenShindel)

* Ken Condon (@KenCondon1)

* Jesus (@jesuslares_me)

* Danielle Fong (@DanielleFong)

For your convenience, attached is an AI transcription of the space with speakers and timestamps (which may be off by a few minutes):

[00:02:40] Alex Volkov (@altryne): Hello. Hello, everyone. There's a lot of you here, and I wanna welcome a shoot for up on stage while we wait for a few more guests, and then we can get started. Thank you so much for taking the time joining us. as you're as interested as all of us in this very exciting, very confusing, very potentially groundbreaking news. So I wanna introduce 2 folks up on stage 2 folks up on stage already, and bringing up another one just now. And hey, Andrew. Hey.

[00:03:18] Alex Volkov (@altryne): Hey, How are you guys?

[00:03:23] Ben (@BenShindel):

Doing well. How are you?

[00:03:27] Alex Volkov (@altryne): A little bit you know, the palms are a little bit sweaty. This is a insane turnout. Twitter is indeed a public space on because that we have. And, hopefully, spaces or two spaces, whatever they call it now, will hold. And I only invite Sam here to speak as well. Hey, Tobias. How are you?

[00:03:51] Ate-a-Pi (@8teAPi):

I'm good. I'm good. So good to good to, you know, hear from you guys in person, Alex. Thanks for putting the space together.

[00:04:00] Alex Volkov (@altryne): Thirdly. Andrew, we're gonna introduce Andrew, but many folks who are here already follow you and and follow your work. How how's your evening going, Andrew?

[00:04:12] Andrew McCalip (@andrewmccalip):

Lee, this has been a wild ride. Thanks for putting all this together. It's gonna be great to get all the information in one place for the first time. This is my first time experiencing the full volume of the Internet, and just been a a lot of fun to see all the positivity around the progress.

[00:04:29] Alex Volkov (@altryne): That's great. So I'll do my best that, you know, Mother think this. I will maybe preface this that I am not a scientist. Many of the terms that we'll hear today in the space I've heard for the first time a couple of days ago. What I am is a Twitter for many, many years, and I have collected a a list of folks who I I personally wanted to follow to kinda see the updates as they roll out, and we've seen many, many things roll out very quick. with a lot of confusion and different replication attempts from different places. And I just compiled the list for myself. I started following.

[00:05:08] Alex Volkov (@altryne): 8 to buy had incredible incredible content diving into the the timeline. I found I I wanna introduce thank you. Am I saying this right? I think you need to hit the the mute button in a mute. If this is your first time talking on RESTASIS. let me know if you're able to do that. And if not, we'll try to solve this. And out as I was collecting folks, And I I started seeing that Andrew started doing their application attempts and even doing Twitch.

[00:05:46] Seo Sanghyeon (@sanxiyn):

Can you hear me?

[00:05:47] Alex Volkov (@altryne): Can you hear me? We can hear you. Hey, Sam Kim. How are you?

[00:05:57] Seo Sanghyeon (@sanxiyn):

It it it's the noon in South Korea, and I'm fine.

[00:06:01] Alex Volkov (@altryne): the afternoon. Right?

[00:06:03] Seo Sanghyeon (@sanxiyn):

It's 1. Yes. Yes. It's the 1 PM.

[00:06:06] Alex Volkov (@altryne): Awesome. And so I was just doing an introduction maybe as you were telling up, you maybe not heard some of it. However, folks in the audience who followed this kind of thread and how we came to be here I have a a thread that I'll post on top here that has all the folks from the Twitter list that I forgot. And San Kyung and his his team is basically the reason for the space. Me and Nathan kind of found Sunqun. Am I saying Sunqun correctly? Is that is that the right way to say this?

[00:06:41] Seo Sanghyeon (@sanxiyn):

My name is. Your your, yeah, your pronunciation is not actually not.

[00:06:48] Alex Volkov (@altryne): Okay. I'll I'll turn my best to put months at the at the right names. And so we both me and 8 to 5, a a 34 in Saint Kyung, who's in Seoul currently, and definitely speaks the language we don't speak, and so there's a lot of insight and translation. And so, yeah, I guess we'll will get started, so feel free to present yourself, and then talk a little bit about your last few days and how you came around getting in this topic. and then how kinda what you found so far.

[00:07:28] Seo Sanghyeon (@sanxiyn):

I I didn't really expect to to speak.

[00:07:30] Alex Volkov (@altryne): That's okay. That's okay.

[00:07:32] Seo Sanghyeon (@sanxiyn):

That's put me put me on the spot. Yeah.

[00:07:34] Alex Volkov (@altryne): I don't wanna put you on the spot, but give us maybe a brief summary.

[00:07:44] Ate-a-Pi (@8teAPi):

Maybe maybe do you do you want me to help Sanyon?

[00:07:47] Seo Sanghyeon (@sanxiyn):

Yes, please. Okay. You you have read my right top, so maybe maybe you can explain what's going on.

[00:07:57] Ate-a-Pi (@8teAPi):

Okay. So I'm I'm just gonna I'm just gonna just to preface everything, I I'm writing a work of fiction. So all of you guys are just participating in an experiment. So but I'm trying to keep everything to kinda, like, factual and trying to interpret what what is kind of happening on the ground. Right? Shyam is much more factual, and he he has actually been doing a primary source work. So he's been actually digging up the actual Korean language science papers. He's been sitting down with friends They've kinda, you know, summarized and kind of tried to understand what's going on.

[00:08:36] Ate-a-Pi (@8teAPi):

And he's really the one that's, you know, put together this that that the you know, the the the mentor, you know, whose name, I think, in some transliterations comes out to TS's chair, some Donsick He the mentor was basically in superconductors in this idea of this kind of 1 dimensional super and he had this theory.

[00:09:00] Seo Sanghyeon (@sanxiyn):

That so the name is che. che. Oh, sure. Yeah. Yeah. Yeah. He was a a professor in the Korean University's Department of Chemistry.

[00:09:13] Ate-a-Pi (@8teAPi):

Yeah. And and so he he had this idea, this theory, and he had graduate students. and one of those graduate students was Lee, and Lee kind of took up the mantle of this this theory. And then they, you know, tied up with who was an experiment list.

[00:09:37] Ate-a-Pi (@8teAPi):

And then they kinda discovered this trace this coast of a trace of a material in 1990 And at that point, what happens is having discovered this trace, their path kind of diverge this, and Kim, the experimentalist, goes on to do a masters, not in superconductors. So he does his masters in something else, and then he does the battery materials kind of PhD, and he graduates in 2008.

[00:10:12] Ate-a-Pi (@8teAPi):

while Lee continues on the superconductor path, does experimental any when he publishes his PhD. It's both a theory and synthesis of superconductors. And then he graduates, and then he he goes to work as a science adjunct professor, which we which we just found out. Like, a computer science adjunct professor, and he's there for about, you know, 4, 5 5 years. He doesn't publish. And and I'm guessing at this point, he kinda gets, like, you know, cashier out of out of academia completely, and he sets up a consulting firm, basically, Q Center.

[00:10:50] Ate-a-Pi (@8teAPi):

And they start taking on consulting work. And and then, again, the timeline is a little bit unclear on whether or not they continue to work on on the on on the product on what they discovered. And what happens then is in 2017, Chey Dongksik passes.

[00:11:18] Ate-a-Pi (@8teAPi):

And as he passes, he he gets his former students together, and he asked them to finish off what they started to find this superconducting material that they saw a ghost of a trace of in 1999. And he passes, and they have no money. basically. Song Young has done, again, primary source research, and, you know, the the office space is basically, like, like, a two story building, you know, somewhere in the you know, in in Seoul. It's a very modern kind of office. They don't have much money.

[00:11:57] Ate-a-Pi (@8teAPi):

My guess my guess is that they need Kim. because KIM is the experimentalist, and I'm guessing also that none of the theory works at this point. The only thing that they have to go on is that they actually did find something in 1999. And Kim, I'm guessing, is also quite practical because he didn't do he didn't pursue the superconductors for the PhD. Right? Because he's quite practical, he's like, dude, you get me money. I'll join you. You don't have money. I'm not joining you for your wild goose, Jason. Right?

[00:12:36] Ate-a-Pi (@8teAPi):

So Lee goes out and he recruits Kwan. And Kwan is kind of like you know, he's he's a US PhD. He has a research university, you know, position. recruit them, and they get funding. And I think I think Sam Young, you were you were saying that Kwon is the one on the, you know, National Science Foundation of Korea's like you know, list, like, grant. Right? I I think that's what you said.

[00:13:08] Seo Sanghyeon (@sanxiyn):

So the paper mentions the public grant from South Korea. called the National Resource Foundation, which is like National Science Foundation in United States. And Korn is listed as a primary invest mitigate our PI, if then.

[00:13:25] Ate-a-Pi (@8teAPi):

Right?

[00:13:26] Alex Volkov (@altryne): Mhmm.

[00:13:27] Ate-a-Pi (@8teAPi):

Yeah. Yeah. That's right. Okay. So he he's the PI. So they recruit him as the PI, and Jade Kim, who is, you know, Lee's partner, basically leaves his very comfortable position as a research director in a hearing aid test.

[00:13:44] Seo Sanghyeon (@sanxiyn):

Yeah.

[00:13:44] Alex Volkov (@altryne): Yeah. Yes.

[00:13:45] Seo Sanghyeon (@sanxiyn):

Yes. Yeah. Hearing aid Yeah. I Or the eye test there? Yeah. Yeah. For the ISER tech and in manufacture, the battery is specialized for the hearing aid. code. It is a medical device. They have a different standard from other batteries. And company a small business in South Korea, but seems competitive worldwide.

[00:14:13] Alex Volkov (@altryne): So he leaves his let me let me -- Yeah. Go ahead. Just real quick and to give folks a quick summary. The main paper that we saw the explosion from that was published on July 22nd, so a week and and almost a day we're, like, almost 8 days into this. The three people that you you just said, besides the first professor, Choi or chair or Troy and or several places write it separately. So the the three people, SoftBank, Jihoon Kim, which is the LK in LK 99, right, Lee and Kim. And the third person you just mentioned is Young Wan, Kwan. Yes.

[00:14:52] Alex Volkov (@altryne): Those are the the 3 authors on the paper that kind of was published on our side out of the blue. 8 days ago. Please continue.

[00:15:03] Ate-a-Pi (@8teAPi):

Right. And then so at this at this point, they're in 2017, And, you know, Lee goes out and does the fundraising. He recruits Kwan, who's the research professor, Kwon is basically he's on the paper. He he's he's the principal investigator on the grant, but he's still a professor at university. So he's basically, I'm guessing, like, a day a day in the, you know, in the office at Q Center, very modest place. I think the grand size is pretty small, and they get this ESR machine.

[00:15:41] Ate-a-Pi (@8teAPi):

And again, from what I can tell, the ESR machine only came knows how to use it. Because none of the other people are actually synthetic, you know, synthesis people. They're all like theory guys, Kuan is a physicist. And Kim himself, JH Kim himself, he's looking for something which you have to know what you're looking for, right? Because that's what he says in his LinkedIn. He's like, I'm looking for some if you don't know what you're looking for, then forget about it. Right?

[00:16:19] Ate-a-Pi (@8teAPi):

But he he knows what he's looking for, and they refine, they refine, and they refine, and he keeps doing experiments. He keeps refining the experiment, and he goes through, like, a 1000 iterations. And somehow, starting in 2018, somehow, By the middle of 2018, they find it. So that that's a surprising thing for me because they've I I I suspect they they've been working on it you know, before or, you know, Jay and Lee had a breakthrough on their theory, so they knew how to narrow the workspace down. But somehow in at the end of the day, Kim is the one grinding.

[00:16:58] Ate-a-Pi (@8teAPi):

Through that 1000 experiments, finally, to get, you know, a sample that works.

[00:17:03] Seo Sanghyeon (@sanxiyn):

And then they start by -- No. No.

[00:17:05] Alex Volkov (@altryne): No.

[00:17:05] Ate-a-Pi (@8teAPi):

No.

[00:17:05] Alex Volkov (@altryne): No.

[00:17:05] Seo Sanghyeon (@sanxiyn):

No. No. No. No. No. No? So so besides the two papers, there is a paper published in April returning query. And In their own words, they describe what what prompted their breakthrough in 2018.

[00:17:27] Seo Sanghyeon (@sanxiyn):

and it said that so so they are putting the material in a quartz tube And because they called it to best courts to cancel and Brooke, And the material left after the breaking of the glass was had the property they wanted. So so it was an accidental discovery.

[00:18:02] Ate-a-Pi (@8teAPi):

So can can you repeat that? Like, they what what happened? They put it in the quartz tube, and the quartz tube broke accidentally?

[00:18:10] Seo Sanghyeon (@sanxiyn):

Yes.

[00:18:10] Alex Volkov (@altryne): Yes. Yes.

[00:18:11] Seo Sanghyeon (@sanxiyn):

I see. And and And that what's the breakthrough in 2018? I see. It's what I'm saying.

[00:18:19] Alex Volkov (@altryne): Yeah. I just wanna confirm what I hear. The breaking of the course you led to the incidental discovery. This is this is the the breakthrough as it's written in the first paper in Korea? Yes. Yes. Okay. So I'll just call ASAP, I'll just give it back for some logistics. Folks, if you look up on on top of the space, there's a few tweets we're pinning. And as we go along, we're gonna add some information on top of this. The 3rd the third we pin from dystopian breaker has a link to the original kind of Korean paper. So please go ahead, Datapai.

[00:18:54] Seo Sanghyeon (@sanxiyn):

So so quick -- Okay. point.

[00:18:56] Alex Volkov (@altryne): Yeah.

[00:18:56] Ely Rabani (@radsci):

Go ahead. Go ahead. This this could be important because, you know, as as soon as you expose it to the atmosphere, your getting hydration. And hydration, you know, might be harmful, might be helpful. From this, like, little account, it seems like it it it either didn't do anything or was helpful. But, like, no what temperature it was at when it broke, and and things like that could could actually be really pertinent.

[00:19:30] Ate-a-Pi (@8teAPi):

Yeah. So, absolutely, like so it's not they he does do the 1000 experiments, but the 1000 experiments, whether that gets him there or not, at one point in the experiment, the quartz tube breaks, that gets them there. They get lucky. Right? So they get they get lucky. And then after that, things proceed pretty quick They isolate they isolate it, and then they they get the crystallization. They start working on the papers. They start on the patents, and they start also trying to figure out the chemical vapor deposition process. They seem to have made some way some headway on the chemical vapor deposition process.

[00:20:06] Ate-a-Pi (@8teAPi):

And then, you know, sometime around September 2021, something start happening. Quant takes a position, sabbatical at, I think, Seoul University at that point. I'm not sure whether that means he's putting more time in the office or not. And then that fast forwards to yeah. Go go ahead, Sunggham.

[00:20:33] Seo Sanghyeon (@sanxiyn):

No. No.

[00:20:33] Alex Volkov (@altryne): No.

[00:20:33] Ate-a-Pi (@8teAPi):

You go ahead. Okay. So that fast forward about March 2023 when basically the international patent has been filed. And Kuan leaves the team at this time. I'm not sure when Kim comes on board. That's not very to me at what point Yum Tuck comes on board.

[00:20:57] Ate-a-Pi (@8teAPi):

So I'm guessing it's after the nature, the nature paper gets dinged in 2020, And and and, you know, the the other thing that strikes me also is that every single person on the team is very aware of every single hoax in superconductors to date. Right? They they they all know the space well, They've seen every single hoax before. They know they know what the hoaxes look like. They know what to look for. They know what diamagmatism is. So I I I don't think yeah.

[00:21:29] Seo Sanghyeon (@sanxiyn):

Go ahead. So the date is So the day before the yesterday, Andrew McCully posted on his Twitter the translation of the Korean paper at Doctor Lloyd. Is that correct? And can can you so so how did you translate and can Can you say something about it?

[00:21:59] Alex Volkov (@altryne): Andrew, I think he's Frank to you. So I can just ring to you. You posted a translated paper also. Right?

[00:22:08] Andrew McCalip (@andrewmccalip):

Yes. Now that was just a machine translation from Google. That was just a very cursory translation.

[00:22:19] Seo Sanghyeon (@sanxiyn):

Okay.

[00:22:19] Ate-a-Pi (@8teAPi):

So in basically, quantity is team in March, and then you have the kind of papers being released, you know, haphazardly. The next the next point that of them is that they had started releasing the papers app as early, like, late last week.

[00:22:42] Alex Volkov (@altryne): And and then and then we have -- And by the way, I think it's it's important to highlight by Kwan, the guy who's no longer affiliated with with QCenter. Like, this this sole endeavor a business venture that's funded for for this for this purpose. Kwan is no longer affiliated with that. We've seen Sankyo posted an interview in Korea from Friday where I think both of the and Kim say that Kwan, the guy who published the first paper, is no longer affiliated.

[00:23:12] Alex Volkov (@altryne): there were some speculation as to maybe the limit of three people on the paper is the limit of the Nobel Prize or 2 or 3 authors. I don't have this confirmed, but this is speculation going around. And it's important to note like, both of them say that the paper was not ready when it was released, and it was released by Juan, the guy who left the first paper. 2 hours later, 2 than 20 minutes later, another paper gets released in the in the same archive with, I wouldn't say, 5 authors. not including Kwan. Right?

[00:23:48] Ate-a-Pi (@8teAPi):

So Lee -- Yeah. And -- The user the the user name is TumTuk team, the the college professor from, you know, Virginia is the username who who pushes the r archive paper at that Yeah.

[00:24:04] Seo Sanghyeon (@sanxiyn):

Chantakim is a big name with the 18 days of 45, and If you look at the paper, there is an error message in Korean saying that Bloomberg could not be found. It is a neutral error message when you did the some of the typesetting wrong.

[00:24:27] Seo Sanghyeon (@sanxiyn):

And You just don't probably see the room temperature, sugar conductor paper with the error deaths that had to bookmark cannot be found if you are following if you are in not in emergency.

[00:24:52] Alex Volkov (@altryne): So so it does feel to us at least from the summary so far that the paper that Quang released has different information than than the second paper, and the second paper feels like it was released in the Harry and included more people that currently work at Q Center, including Hyundai Kim. And Sonja, owner, you this question. You mentioned his h h score or something score. Can can you explain the importance of that score for him talking?

[00:25:20] Seo Sanghyeon (@sanxiyn):

creates someone else to the explanation.

[00:25:24] Ate-a-Pi (@8teAPi):

Okay. So so the h score is, you know, because we have a web web savvy audience here. It's kind of like a page rank for, you know, researchers. It shows you how influential how influential the researcher was, and so a higher score means that more people have been citing your paper.

[00:25:45] Ben (@BenShindel):

Go ahead, Ben. Yeah. More precisely. So, like, an h index of, say, 40 means you have 40 papers that each have 40 citations or more. That's a little tricky to understand. So, like, if I get another paper that has only 30 citations, it won't affect my h index at all. I have to get a 41st paper that has 41 citations to to to make it rise.

[00:26:07] Alex Volkov (@altryne): So I think it's it's safe to say HUNTAKIM, the guy who submitted the second paper, potentially haphazardly. Correct? Like, we're we're we're saying there's 2 hours after the first one. So likely prompted by these events is a well well sighted very well sighted scientist with a very high kind of confidence score. It's not like a random person of the street that decide that there's now a superconductor of room temperature and, you know, verified it.

[00:26:41] Seo Sanghyeon (@sanxiyn):

Okay. Sorry for being side tracked, but I just checked the the motion related to Korean paper or not to talk through it by Andrew. And on the page 5, we clearly said that the quartz tube was destroyed due to internal pressure during rapid cooling of reaction and etcetera. So I think, in fact, nobody really read ready carefully. It is it is just there about the quartz tube once destroyed.

[00:27:19] Ate-a-Pi (@8teAPi):

Yeah. So I think I think it's yeah. Definitely, like, probably the the rest of us are are are not very close readers. of of that paper.

[00:27:29] Seo Sanghyeon (@sanxiyn):

So so We can we can continue on after the upload to the archive.

[00:27:42] Ate-a-Pi (@8teAPi):

Indeed. So okay. So they they they it goes into our our archive, and then all of the events of the last week happen you know, I don't think any of us expected any of the events to happen. So we've all just been kind of, like, following along and seeing what happens next. I had no idea that there was a metallics conference in South Korea, and I I definitely had, like, no idea that you know, one of the authors would show up there, and it gets posted on Twitter. And so and then and then Seung Young points it out on the FM Korea Football message board.

[00:28:20] Ate-a-Pi (@8teAPi):

And so we translate, you know, what the audience reaction was in in in a bad translation to get -- So -- -- whatever message was across.

[00:28:30] Alex Volkov (@altryne): -- mind let me interject here because this is around the that I found out about this. Alex, frozen coffee. Alex, I forgot his nickname. We invited him here. He posted a a very long Twitter thread that got the attention of the algorithm and then boosted of this room template ambin pressure, superconductor paper from Korea. I think he only started talking about the first paper, and then after the second paper also came out. And I think at this point, or somewhere around there. Andrew, you found out about this. What what did you first hear about, you know, Twitter drama around LK 90 Right?

[00:29:08] Alex Volkov (@altryne): And, Andrew, feel free to at least produce you know, introduce yourself officially and BARDA and how you're interacting with this.

[00:29:16] Andrew McCalip (@andrewmccalip):

Yeah. So I was just cruising the Internet at night, and this came across. I think my my Twitter feed And so I I'm incredibly curious. This is something that has been a bit of a a hobby for me. And so I was always interested in superconductors, so it it caught my attention. I'm a mechanical engineer. So full disclosure. I am not a subject matter expert. I am simply an aerospace engineer that has a lot of curiosity and some assets at his disposal.

[00:29:50] Andrew McCalip (@andrewmccalip):

And so reading this paper, it it struck me just the simplicity of of the process. And so I realized that I probably had the ability to replicate with full fidelity, the process that was described in the paper. And so that within about 30 minutes, I I realized I should simply start down this road that Twitter was already picking up at the time.

[00:30:21] Andrew McCalip (@andrewmccalip):

There's some conversations going back and forth and the it was the classic scenario where on every superconductor discussion, there is the same conversation that happens over and over again. And this synthesis appeared so simple that it seemed that the most expedient thing was to simply test it physically. And so my my work is very receptive of of after hours projects. I'm I'm known as the the guy that has really aggressive hobbies, let's say.

[00:30:57] Andrew McCalip (@andrewmccalip):

And so I'm always in the back doing something interesting with materials or automation. So within 30 minutes of reading the paper, I had ticked off orders to various chemical suppliers. I've reached out to overseas vendors. to try to procure a couple of the the elements. And so it was just kind of an offhand comment that I made on Twitter and and then the ball really started rolling, and I realized that everyone wanted to see this this made.

[00:31:32] Andrew McCalip (@andrewmccalip):

And so it was just supposed to be a a a fun little project, but I was really overwhelmed by the the response. Everyone wanted to to see this done. I think there's this incredible curiosity, there's this incredible drive. People wanna see, like, incredible things happen for the the the human race. And so something if this magnitude pops up, everyone's motivated to drop everything and investigate. And I think that's where we're at.

[00:32:08] Alex Volkov (@altryne): And I think you met the algorithm at the right place where folks were excited about the future and think this could bring a lot of changes around the future, and you started saying, hey. You know? Here's a here's a direct approach. Let's try to replicate this. And I I wanna just highlight the fact the the materials involved in creating this. And the process, some folks say and please talk about this. Some folks say that has been an attempt at a hoax, it wouldn't be as simple. They wouldn't have released a simple instruction manual kind of quote, unquote simple that many labs around the work they replicate given the materials and and the right equipment. Right?

[00:32:48] Ely Rabani (@radsci):

So -- Yeah.

[00:32:48] Alex Volkov (@altryne): So -- -- straightforwardness of this potentially shows some stuff.

[00:32:51] Ely Rabani (@radsci):

So this this is a good time for for a PSA. I mean, I know that that Andrew is well aware of this, and and and many of peep of the people who've been following it. But in case anybody who's listening isn't. The these compounds in vapor form at any rate are are highly talked music, and you you have to know lab safety. If you're gonna start trying to experiment with them, you need things like, a glove box and, you know, all kinds of PPE, a fume hood, everything else. Taking risks with this kind of thing is just really not worth it.

[00:33:31] Alex Volkov (@altryne): I I I can't stress that. Absolutely. Don't try this at home.

[00:33:36] Andrew McCalip (@andrewmccalip):

kids definitely. Yeah. Absolutely. There's a lot of chatter in the beginning in the first couple hours about this can be replicated in a garage And, you know, I thought it was interesting. I thought maybe we've got the opportunity to to do it safely. we've got all the right equipment. We've got, you know, the the 1,000,000 of dollars of equipment that support our spacecraft business. that allow us to do some of these things safely. And so I thought Twitter wants to live vicariously through somebody why not do this?

[00:34:12] Andrew McCalip (@andrewmccalip):

I ended up being in sort of an interesting middle ground because I'm not in academia. I'm also not trying to commercialize any part of this tech. really just doing it for fun because it's incredibly interesting. So I've got no skin in the game except for making this work in a transparent manner. and then getting the materials into the hands of the experts.

[00:34:34] Andrew McCalip (@andrewmccalip):

So I thought if we can leverage some of our equipment and some of our, you know, very smart people that we have, to speed this timeline up, I didn't see anybody in the United States being vocal about trying to do replication there are so many stories coming out of other parts of the world that all the labs, there must be thousands of furnaces burning right now trying to replicate this. But I wanted to get material into the hands of some local experts in California.

[00:35:09] Andrew McCalip (@andrewmccalip):

And so that's really our our goal is, you know, can we can we sort of be the face of of the Internet do this experiment in a safe manner and then help advance the science and be sort of a forcing function to to doing this replication.

[00:35:27] Alex Volkov (@altryne): So, Andrew, just before just a a small pause before you continue, I want to ask the other, Andrew, here. The Andrew code, if if you're able to unmute and and and talk us if you're available about the potential reasons why all of Twitter jumped on this. Andrew Kot, you had a thread on room temperature superconductors. About 2 weeks before this, like, almost a permanent is kind of a threat. And could you give us some summary first of all, feel free to introduce yourself, but also some summary of what this means if this replicates, what this means for the world.

[00:36:07] Alex Volkov (@altryne): Applications, you know, give us, like, some excitement of what happens if this is an actual ambient pressure in room temperature superconductor? Andrew? Does not look like Andrew is Oh, hey.

[00:36:33] Andrew Cote (@Andercot):

Sorry. My my audio cut out for a second. I I missed the prompt. Oh, here you are. Let you only -- Sure. Yeah. Thanks. Thanks very much.

[00:36:44] Alex Volkov (@altryne): So so folks so so I I explained to folks your thread about MBN, you know, pressure room temperature superconductors that you've offered, what, 2 weeks before the paper came out. And then suddenly, this dropped. And I wanted you to highlight some of the potential applications of superconductors and give us some of the highlights of what happens in this replicating. This is an actual, you know, real thing.

[00:37:08] Andrew Cote (@Andercot):

Yeah. Sure. So it's kind of a funny thing. Yeah. I put that thread out there 7 weeks before this story broke. You know, just I have worked with this kind of stuff in in a few different areas now, so it's very, you know, superconducting radio frequency cavities are standard technology in accelerator physics to fill these to work in.

[00:37:31] Andrew Cote (@Andercot):

Like, my first job in physics was actually in a condensed matter lab using a a scanning tunneling microscope to look at, you know, electronic structures of potential high temperature superconductors So this has always been sort of like a holy grail of material science, like sort of a holy grail of applied physics. It's one of these properties it's one of these materials where the bulk properties come from its quantum mechanical behavior. And and, you know, when quantum mechanics and its effects escape the realm of the very tiny, it can really manifest as as magical phenomenon at our scale in the world of the kind of the bulk matter or the big stuff.

[00:38:10] Andrew Cote (@Andercot):

So, you know, superconductors are used currently today, You know, it's it's they've reached engineering applicability through decades of continuous refinements and improvements. And and some of the biggest things to think about in what lets these things get used in industrial applications is their ability to superconducts at higher and higher temperatures And, also most also importantly, is to operate at higher and higher background magnetic field strengths. And so the way to think about this is that a superconductor, it's allowing current to move through it with zero resistance, but it also perfectly spells magnetic fields.

[00:38:48] Andrew Cote (@Andercot):

And there's an operating point of these materials where it's basically the current density and the temperature and the magnetic field kind of put the bounds or the performance envelope on the material. So some conductors can carry tons of current, but they can't exist in a very high field. And so, you know, those are hard to make as useful. You can use them for carrying, like, electricity, which is awesome, but often what you really wanna do is generate very strong magnetic fields. So I think maybe the most familiar to the most people here would be, like an MRI machine. Right?

[00:39:27] Andrew Cote (@Andercot):

Magnetic resonance imaging. So the idea there is you're generating very high strength field, and magnetic fields are measured in Tesla, for example. So just for just for context, you know, 3 Tesla is a is a pretty strong field, and that's what is about the strength using an MRI. So, you know, MRIs use these cryogenically cooled magnets, or or they're not I don't think cryogenically cooled. They're actually often just copper, but they do have cooling. But they generate this high strength field, and then, you know, it kind of sets all these little protons in your body spinning and dancing in a little, you know, kind of radiating energy.

[00:40:03] Andrew Cote (@Andercot):

And then you have a pickup coil, which is like an antenna, and the antenna is trying to pick up that energy and kinda reconstruct what's going on in your body. And this is how we can get, like, a really high detailed, high fidelity, three-dimensional image of what's going on inside someone without any invasive surgery. So it's, like, you know, MRIs are a real kind of amazing breakthrough in medical imaging. Superconductors if they could work without cryogenics would really simplify and make cheaper and more available, high resolution, high fidelity, three d images of people's bodies.

[00:40:35] Andrew Cote (@Andercot):

not just for making the magnetic fields, but also for picking up the signal emitted by the protons that get put into motion by the field in the first place. So it's kind of, like, one sort of off the shelf example. I think another one that's kind of under the radar, we don't think about it's not just in carrying electricity without resistance, which is useful for long range, like energy transmission, that kind of stuff. But if you look at the national grid, I mean, only 5, 7 percent of energy total, which is still significant, but it's, you know, single digit percentage ends up, you know, burning as weight You're suddenly muffled.

[00:41:11] Alex Volkov (@altryne): I don't think yeah. You're suddenly a voice like your -- Oh, better.

[00:41:18] Andrew Cote (@Andercot):

Now it's better. Okay. Sorry about that. Yeah. So just gonna say so, you know, National Grid Scale Energy Production. Right? So trans transmitting the energy to its endpoint consumption, there's a bit of waste heat along the way. But what's what's also important to think about is how that energy is produced. It's produced also using high strength magnetic fields. And I was looking into this. There's a a experiment where these guys used sort of more modern high temperature superconducting tape to, you know, retrofit a large DC generator then it had, like, a 36 percent power improvement, right, which is pretty substantial. That's that's a that's a serious win.

[00:41:58] Andrew Cote (@Andercot):

Yeah. So there's there's, you know, sort of thousands of places this stuff could be used that would really just, like you know, it would either greatly improve the performance efficiency, reduce the cost, increase the accessibility of what we think of as, like, high technology like MRIs or particle accelerators. But it would also just decrease the cost of basic things like electricity generation and distribution And that's just the beginning. Right? So, you know, this kind of stuff there's a really good analogy here actually with the transistor, you know, for for years, scientists, then electrical engineers and physicists, they had this idea of a transistor. Right?

[00:42:35] Andrew Cote (@Andercot):

If only we could have some kind of simple, reliable, current model supplier. We could design all these wonderful things. We could design all these different kinds of logic functions and so forth. And so there was this search for the transistor people were searching for something that could do that, and they had anticipated all the places it could be used ahead of time. And it wasn't until at Bell labs, you know, a very kind of funny crossover here. One of the guys that's on the patent for the transistor is John Bardine. and John Bardeen's actually the only guy to win 2 Nobel Prizes. 1 was for the transistor. The other was for the theory of superconductivity, right, which is Barting Cooper Schiffer Theory, BCS.

[00:43:14] Andrew Cote (@Andercot):

So, again, it's one of it's one of those things where, you know, physicists, scientists, engineers kinda thought about this for a long time, realize this be amazing. And there's been this, you know, really complicated random walk through the configuration space of possible materials, right, which is so high dimensional. There's so many things you can construct. So I think it's I'm very optimistic about the field in general. I think one thing to think about with this particular result there's so much artisanal craft and and mastery that goes into producing these materials in a reliable, consistent way You know, science people don't often recognize. It's a lot of art involved too. Right?

[00:43:52] Andrew Cote (@Andercot):

Like like, things that are reduced to expert practice us and know how. And so I'd I'd just be cautious on, you know, jumping to conclusions either on this particular result, if it's if it's valid right now. But, also, if some labs can't fail to reproduce it, it doesn't actually rule it out entirely. I I think there's scientists that have traveled to Korea to work with the original authors. I look closely at that. You know, I'd also you know, I my internal odds are kind of like a 1 in 6 chance, this pans out, and it and it could be big.

[00:44:21] Andrew Cote (@Andercot):

But that doesn't mean that it's the end of the search or the end of the race, and I'm and I'm also optimistic that Getting people to understand what the massive long term and large scale social benefits of this kind of discovery could be could help direct a lot more basic science research towards this field. You know, I think we spend a lot of things on, like, how to make smartphone cameras better and not a lot of things on and not as much as we could spend on things like high temperature superconductors. And this is a final example.

[00:44:48] Andrew Cote (@Andercot):

I mean, so right now, you know, I work as a accelerator engineer, accelerator is a type of magnetic confinement fusion reactor The reason the company I work for can't exist, and and the reason there is this current burn and boom in nuclear fusion, is because we've engineered these high temperature superconductors to work in higher and higher magnetic fields, at at higher and higher temperatures. And and the big economic breakthrough there came when we can have these superconductors that can work at liquid nitrogen temperatures, right, which is 77 kelvin. And it's a lot cheaper to make liquid nitrogen and run that kind of cryogenics than it like liquid helium at, like, 4 Kelvin.

[00:45:24] Andrew Cote (@Andercot):

So, you know, we're already reaping some of the benefits of this sort of tech stack maturing over time. And I think really just getting started in terms of, like, the hunt for promising materials. I mean, I'm hoping this results in positive publicity and more effort, more energy, put into the field. I think if this doesn't pan out as the thing, you know, don't give up hope. Right? I mean, this is a long term game. Science sees by starts and stops. There's no fundamental physics here that's impossible. Right? There's no physical principle that says this can't work. Right? This isn't like a a momentumless or massless propulsion drive like the EM drive.

[00:46:04] Andrew Cote (@Andercot):

isn't, like, superluminal neutrinos. Right? Those things kind of break laws of physics. This is very much in the realm of, yeah, physically possible. seems seems very you know, in my mind, seems likely there could be something out there given the complexity of state space of electronic structures and given how you know, how large that space of exploration can be. And, yeah, so I think I'm just kind of you know, this is a great time to be interested in material science to appreciate basic science research and educating ourselves on on how good the future can be. You know, I think there's a lot of narratives right now in society and cultural in general. that kinda say, like, you know, you know, we we can't solve our way out of our biggest problems today. Right?

[00:46:43] Andrew Cote (@Andercot):

And and I'm very much on the other side of that debate. I think we can. I think it's through efforts like this. I think it's through people like Andrew at Varda that are willing to do stuff in their backyard or their garage or their fact or their their work workplace on their extra time. You know? I mean, this is the kind of this is the the let's build mentality. Right? And so I think we can build our way out of the world's greatest problems, and I its fundamental scientific advances like this discovery could be that that kind of paved the way out of there too. So, yeah, overall, very optimistic.

[00:47:11] Andrew McCalip (@andrewmccalip):

Andrew? That that's incredibly well said. That is an incredibly well balanced viewpoint. So how would you advise people to absorb the the next week of the new cycle? I mean, we're very much on a you know, we're we're back dead. We're back type of hype cycle. So how do you advise people to think about the results that they're seeing knowing that this is a a very difficult thing to replicate when it just because it a negative result is shown in a lab that doesn't mean it's not physically possible.

[00:47:49] Andrew McCalip (@andrewmccalip):

It's very difficult to prove the negative here. So tell us how we should absorb the new cycle coming up in the next few days.

[00:47:59] Ate-a-Pi (@8teAPi):

So I I I I I I might I might say something about that. I think I think this is basically tacit knowledge transfer, and you Kim Kim seems to have been this kind of, like, artisanal, like, you know, experiment list. So you need people to actually sit there in the lab with this guy, and he needs to demonstrate to them. And they need to pick up and and there might be things that he does, which he didn't write down. That that's the like, my my take on it given that He is the experiment list. He's the synthesis on on the team.

[00:48:38] Ate-a-Pi (@8teAPi):

Given that the team seems to have been only, like, 5 or 6 people, is that this guy is the maybe the only person in the world as of, like, you know, 18 months ago. I'm guessing that, you know, he managed to transfer some of that to the JungTux team. So I'm guessing that at at least one more one more team on on earth has this now. And I'm guessing that this knowledge transfer is now happening to a couple more people. So so you need to see this progress maybe 2 or 3 cycles for, like, a bunch of other people to have learned the skill, and then that's when that's when things get interesting.

[00:49:14] Seo Sanghyeon (@sanxiyn):

I mean, you don't really need to replicate to to verify this. There, the the team can just the team has the working samples. they can adjust the samples to the laps around the world.

Hey, the rest of the episode is for paid subscribers to ThursdAI.

I encourage you to subscribe or upgrade your subscription to access it; there are almost 2 more hours of in-depth conversation, stitching together of facts, and experts on materials science, physics and electrical engineering, plus MIT folks, chiming in. It’s really a great space; around 25K folks have listened to it on Twitter so far.

Sun, 30 Jul 2023 22:09:53 GMT
🎙️ThursdAI - Jul 27: SDXL1.0, Superconductors? StackOverflowAI and Frontier Model Forum

⏰ Breaking news: ThursdAI is now on Apple Podcasts and in this RSS feed! So use your favorite pod-catcher to subscribe, or hit this button right here:

Our friends at Zealous have provided an incredible platform for us to generate these awesome video podcasts from audio or from Twitter spaces, so check those out if you prefer a more visual format. Our deep thanks to them!

P.S. - You can find the full 2-hour space with speakers on our Zealous page and on Twitter

Here’s a summary of the main things that happened in AI since last ThursdAI:

🧑‍🎨 Stability.ai releases SDXL 1.0

* Generates stunning 1024px x 1024px images

* High photorealism

* Supports hands and text

* Different (simpler?) prompting required

* Fine-tunes very well!

* Supports LoRAs, ControlNet, in-painting and outpainting, and the whole ecosystem built around SD

* The refiner is a separate model that adds high-quality detail

* Available on DreamStudio, GitHub, ClipDrop and Hugging Face

* Also available with the incredible ComfyUI, and can be used in a free Colab!

Image Credit goes to Thibaud
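
If you want to try SDXL 1.0 yourself, here’s a minimal sketch using the Hugging Face diffusers library and the publicly released base checkpoint; the prompt, step count and GPU assumption are just illustrative.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL 1.0 base model (fp16 so it fits on a consumer GPU).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# SDXL is tuned for 1024x1024 generations and simpler prompts.
image = pipe(
    prompt="a photorealistic portrait of a robot reading a newspaper",
    num_inference_steps=30,
    height=1024,
    width=1024,
).images[0]

image.save("sdxl_sample.png")
```

As far as I can tell, the separate refiner checkpoint is loaded the same way (via the image-to-image SDXL pipeline) and run on the base output to add that extra detail mentioned above.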

Superconductors on Hugging Face? What?

Honestly, this has nothing immediate to do with AI updates, but, if it pans out, it’s so revolutionary that it will affect AI also!

Here’s what we know about LK-99 so far:

* 2 papers released on arXiv (and Hugging Face, haha) in the span of several hours

* The first AND second papers both make the extraordinary claim of solving ambient superconductivity

* An ambient-pressure, room-temperature superconductive material called LK-99

* Straightforward process with a clear replication manual and fairly common materials

* The papers lack rigor, potentially due to being rushed out or due to a fight over credit for a Nobel Prize

* The science is potentially sound, and is being “baked and reproduced in multiple labs” per Science magazine.

Potential effects of room temperature superconductivity on AI:

While many fields (all?) can benefit from the incredible applications of superconductors (think 1000x batteries), the field of AI will benefit as well if the result above replicates.

* Production of GPUs and CPUs is power-constrained and could benefit

* GPUs/CPUs themselves are power-constrained while running inference

* GPT-4 is great but consumes more power (training and inference) than previous models, making it hard to scale

* Local inference is also power-restricted, so running local models (and local walking robots) could explode with superconductivity

* Quantum computing is going to have a field day if this is true

* So will fusion reactors (which need superconductors to keep the plasma in place)

As we wait for labs to reproduce, I created a Twitter list of folks who are following closely; feel free to follow along!

AI agents protocol, discussion and state of agents for July 2023

* Participated in an e2b space with tons of AI builders (Full space and recap coming soon!)

* Many touted AI agents as a category and discussed their own frameworks

* Folks came up and talked about their needs from the agent protocol proposed by e2b (see the rough sketch after this list)

* Agents need to be able to communicate with other agents/sub-agents

* Task payloads, artifacts, and task completion can be async (think receiving a response email from a colleague)

* The ability to debug (with time travel), trace, and reproduce an agent run

* Deployment, running and execution environment issues

* Reliable reporting of task completion is hard, and so is evaluation
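
To make the discussion concrete, here is a rough, hypothetical sketch of what an async task/artifact shape for such a protocol could look like; the class and field names are mine and are not the actual e2b agent protocol spec.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class Artifact:
    # Anything an agent produces while working on a task (a file, a log, a trace).
    name: str
    data: bytes


@dataclass
class Task:
    # A unit of work handed to an agent or a sub-agent.
    payload: str
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    artifacts: list[Artifact] = field(default_factory=list)


class Agent:
    """A toy agent that accepts tasks and reports completion asynchronously."""

    async def submit(self, payload: str) -> tuple[Task, asyncio.Task]:
        task = Task(payload=payload)
        # The caller gets a handle it can await later, like waiting for a reply email.
        handle = asyncio.create_task(self._work(task))
        return task, handle

    async def _work(self, task: Task) -> str:
        # A real agent would call a model or delegate to sub-agents here.
        await asyncio.sleep(0.1)
        task.artifacts.append(Artifact("result.txt", b"draft answer"))
        return "finished"


async def main() -> None:
    agent = Agent()
    task, handle = await agent.submit("summarize this week's AI news")
    status = await handle  # completion arrives asynchronously
    print(task.task_id, status, [a.name for a in task.artifacts])


asyncio.run(main())
```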

Frontier Model Forum

* OpenAI, Anthropic, Google, and Microsoft are forming the Frontier Model Forum to promote safe and responsible frontier AI.

* The Forum will advance AI safety research, identify best practices, share knowledge on risks, and support using AI for challenges like climate change.

* Membership is open to organizations developing frontier models that demonstrate safety commitment.

* The Forum will focus on best practices, AI safety research, and information sharing between companies and governments.

* Some have expressed concern that this could enable regulatory capture by the “Big LLM” shops that can use their lobbying power to stop innovation.

StackOverflow AI - “The reports of my death have been greatly exaggerated”

Stack Overflow has been in the news lately, after a graphic of its decline in traffic went viral.

They have publicly disputed that information, claiming they moved to a different measurement method and didn’t update the webpage, but then also… announced Overflow AI!

* AI search and aggregation of answers + ability to follow up in natural language

* Helps drafting questions

* AI answers with a summary, and citations with the ability to “extend” and adjust for your coding level

* VSCode integration!

* Focusing on “validated and trusted” content

* Not only for SO content: Stack Overflow for Teams will also embed other sources (like your company Confluence) and will give you attributed answers and tagging abilities on external content

This has been an insane week in terms of news (👽 anyone?), superconductors and AI releases! As always, I’m grateful for your attention! Forward this newsletter to 1 friend as a favor to me if you learned something new, or retweet us on Twitter for bigger reach!

Thank you! See you next ThursdAI (and on Sunday when I release the State Of Agents recap 😅 )

ThursdAI - Get in on this, and share w/ 1 friend 🫡



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Thu, 27 Jul 2023 23:05:55 GMT
ThursdAI - Special Episode, interview with Nous Research and Enrico Shippole, fine-tuning LLaMa 2, extending its context and more

Hey there, welcome to this special edition of ThursdAI. This episode features an interview with Nous Research, a group of folks who fine-tune open-source large language models to make them better. If you are interested in hearing how fine-tuning an open-source model works, dataset preparation, context scaling and more, tune in!

You will hear from Karan, Teknium and LDJ from Nous Research, and from Enrico, who worked alongside them.

To clarify, Enrico goes in depth into a method called RoPE scaling, a clever hack that extends the context length of LLaMa models significantly, and into his project LLongMa, an extended version of LLaMa with an 8,000-token context window.
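
For the curious, here’s a minimal sketch of the “linear position interpolation” idea behind RoPE scaling as I understand it: the rotary position indices are compressed by a scale factor, so a model pretrained on, say, 2,048 tokens sees familiar rotary angles even at much longer sequence lengths. The function name and defaults below are illustrative, not Enrico’s actual training code.

```python
import torch

def rope_cos_sin(seq_len: int, head_dim: int, base: float = 10000.0, scale: float = 1.0):
    """Build RoPE cos/sin tables; scale > 1 compresses positions (linear interpolation)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / scale  # the whole trick is this division
    freqs = torch.outer(positions, inv_freq)            # shape: (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()

# A model pretrained with 2,048-token RoPE can be fine-tuned at 8,192 tokens with scale=4,
# keeping the rotary angles inside the range it learned during pretraining.
cos, sin = rope_cos_sin(seq_len=8192, head_dim=128, scale=4.0)
print(cos.shape)  # torch.Size([8192, 64])
```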

The first voice you will hear is Alex Volkov, the host of ThursdAI, who doesn’t usually have a lisp, but for some reason, during the recording, Twitter Spaces decided to mute all the S sounds.

Links and acknowledgments:

* Nous Research - https://nousresearch.com/ (@nousresearch)

* Redmond Puffin 13b - First LLaMa Finetune

* LLongMa - LLaMa finetune with 8K context (by Enrico, emozilla and KaioKenDev)

* Nous-Hermes-Llama2-13b-GPTQ - Hermes Finetune was released after the recording 🎊

Psst, if you like this, why don’t you subscribe? Or if you are subscribed, consider a paid subscription to support #ThursdAI

Show transcription with timestamps:

Alex Volkov - targum.video (@altryne)[00:00:55] Yeah. That's awesome. So I guess with this, maybe, Karan, if you are able to, can you talk about Nous Research and kind of how it started and what you guys are doing, and then we'll dive into, you know, Hermes and Puffin and the methods and all of it.

karan (@karan4d)[00:01:16] Absolutely. Nous Research. I mean, I myself and many others of us are just, like, enthusiasts that were fine-tuning models like, you know, GPT-J or GPT-2. And, you know, we're all on Twitter, we're all on Discord, and we kind of just found each other and had this same mentality of: we wanna make these models, we wanna kinda take the power back from people like OpenAI and Anthropic. We want stuff to be able to run easy for everyone. And a lot of like minds started to show up.

karan (@karan4d)[00:01:50] I think that Teknium's addition initially to Nous Research, him kinda showing up, and himself, I, and human working on compiling the Hermes dataset was really what came to attract people when Hermes came out. I think we just have a really strong and robust, like, data curation thesis. And I think we have just some of the most talented people who have come to join us and just volunteer and work with us on stuff. And I absolutely must say, I can see in the listeners our compute provider, Redmond AI.

karan (@karan4d)[00:02:30] And, you know, none of these models would be possible without Redmond's generous sponsorship for us to be able to deliver these things lightning fast, you know, without making us jump through a bunch of hoops. Just a total pleasure to work with. So I have to shill and say, you know, I highly recommend everyone check out Redmond, because they really make our project possible.

Alex Volkov - targum.video (@altryne)[00:02:52] Absolutely. So shout out to Redmond AI and folks, give them a follow. They're the only square avatar in the audience. Go check them out. And, Karan, thanks for that. I wanna just do a mic check for Teknium. Teknium, can you speak now? Can we hear you?

Teknium (e/λ) (@Teknium1)[00:03:08] Yeah. My phone died right when you were introducing me earlier.

Alex Volkov - targum.video (@altryne)[00:03:10] Yep. What's up - it happens sometimes on Twitter Spaces. Welcome, Teknium. So briefly, going back to my question, I don't know if you heard it: besides the commercial license and kind of the context window, what caught your eye in LLaMa 2, at least the base model, before you guys started? Or have you, like the other guys, not had a second to play with the base model and dove into fine-tuning directly?

Teknium (e/λ) (@Teknium1)[00:03:35] Yeah. The only thing that really caught my eye was the chat model and how horribly RLHF'd it was.

Alex Volkov - targum.video (@altryne)[00:03:41] Yeah. I've seen some conversations about that and kind of the point of the RLHF as well. Okay. So now that we've introduced Nous Research, I wanna talk to you guys about what you guys are cooking. Right? We've seen the Hermes model before this and, like, loved it as one of the, you know, best and most performant fine-tunes that I've seen, at least. Could you guys talk about the process to get to the Hermes model, the previous one, and then give us hints about what's coming soon?

karan (@karan4d)[00:04:16] Teknium, you got this one. man.

Teknium (e/λ) (@Teknium1)[00:04:22] Yeah. It was basically: I saw Alpaca, and I wanted to remake it with GPT-4, and then from there I just pretty much exclusively included anything that was GPT-4 only, and that was the beginning of the thesis for that. Going forward, though, we still have a lot of low quality data, I think, in the Hermes dataset that can be cleaned out, and then there's a lot of new datasets that have come out that I wanna start merging into there. I also wanna move to something like ChatML or even the Vicuna format so that we can do some multi-turn stuff. It's not very great at long chat.
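For context, ChatML is a role-tagged multi-turn prompt format. Below is a minimal, illustrative sketch of what that formatting looks like; the example turns are made up, and this is not Nous's actual training format.

```python
# Minimal sketch of a ChatML-style multi-turn prompt (illustrative example turns).
def to_chatml(turns):
    """turns: list of (role, content) tuples, e.g. ("user", "hi")."""
    return "\n".join(
        f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in turns
    )

print(to_chatml([
    ("system", "You are a helpful assistant."),
    ("user", "Summarize RoPE scaling in one sentence."),
]))
```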

Alex Volkov - targum.video (@altryne)[00:05:03] Yeah.

karan (@karan4d)[00:05:03] Within the Hermes dataset, you know, a lot of it is publicly available stuff that's particularly GPT-4. Of course, Teknium's massive GPTeacher dataset. We also have a bunch of GPT-4 data we had generated that we didn't necessarily release just yet, as well as an instruction set that's particularly focused on tasks like Python, transformers, linguistics - a very small dataset of that. That's inside Hermes that, you know, we don't really talk about much, but we figured we'd give it some exposure right now on the spaces. And yeah.

Alex Volkov - targum.video (@altryne)[00:05:42] That's awesome. And so the previous Hermes was released on top of LLaMa 1, and for many folks, you know, obviously, they couldn't use it for commercial purposes. And now that this model's released, for the models that you guys release, are you thinking about their licenses? And could you talk about, like, the availability for folks using them in a commercial setting, now that, you know, the base of it is commercially available?

LDJ (@Dogesator)[00:06:07] Mhmm. I think we have Puffin licensed as MIT. I'll have to double-check on our own model. I think that's right, Karan, right, or Tek?

karan (@karan4d)[00:06:18] Yeah. I think so, either that or Apache 2.0. Like, if the base model is commercially usable, you know, with the stuff we put out you're good to go. It's -- Yeah.

LDJ (@Dogesator)[00:06:29] So, and, like, in our announcement, I put in kind of, you know, one of the main things: it's commercially available. As far as I think - yeah, I'm pretty sure it's the first commercially available Nous model that's released, and a big differentiator from Hermes is the fact that, like Tek was saying, Hermes is pretty much all single-turn data. And it surprisingly can do pretty decent at multi-turn conversations when you actually use it. But then Puffin is almost kind of, like, a 180, where the vast majority is real, long-context multi-turn data.

LDJ (@Dogesator)[00:07:09] And - oh, can you guys hear me? Okay, something's up with that. Okay. Yeah. So Puffin is a vast majority multi-turn data, GPT-4 specifically, and a lot of it is actually real human conversations with GPT-4 that go on for, some of them, 4k, 6k context, like, even all the way up to the max 8k context length of GPT-4. And then we took those few thousand conversations of real humans interacting with GPT-4. And now after that - I'm not sure if you've... a lot of people have probably heard of Camel AI.

LDJ (@Dogesator)[00:07:46] So they have the physics, biology, chemistry, and mathematics datasets. And then within those, there's a bunch of subtopics that you can go through. And I just pretty much spent a good few days curating, just handpicking the right subtopics, like differential geometry, logic problems, optimization problems, a bunch of different GPT-4 examples and responses from those different subtopics. And then I specifically added those in certain ways to the Puffin dataset.

Alex Volkov - targum.video (@altryne)[00:08:17] Awesome. So just to recap for the audience, maybe: the Puffin model, which I think the official name is Redmond Puffin 7B or, sorry, 13B. Yes. This is the model that you guys fine-tuned, and it's one of the first, if maybe not the first, fine-tune of LLaMa v2 that's now publicly available, like you said, maybe with an MIT license, on Hugging Face, and I think you even added the GGML quantized version. Correct? Mhmm. So folks can go and download that and already start playing with this. And so first of all, thank you for contributing to the open source. That's great to see. And the speed with which you guys fine-tuned this is also great to see.

Alex Volkov - targum.video (@altryne)[00:08:55] And maybe now that we've introduced this, maybe this is repeating a bit, but could you speak about the difference - the difference in the dataset, in the tasks that you fine-tune on? Like, what is the actual difference between the Hermes, or the Hermes that's coming out, and the Puffin model? What would people use them for differently? That's the question.

Teknium (e/λ) (@Teknium1)[00:09:21] The Puffin model will definitely be better at multi-turn stuff. That's for sure. Yeah.

nisten (@nisten)[00:09:28] So if you want to do anything like OpenAI - I'll paste the link above to the GGML version of it, because I'm gonna test it thoroughly. But I really think, because you guys have used GPT-4, high quality, multi-turn conversations, that this can have actual, like, practical use for whoever else wants to use it, either as, like, something that tells you about the documentation on a site or walks a user through it. In other words, this should be better than Hermes for, like, customer service stuff, which is just one example.

nisten (@nisten)[00:10:08] Anyway, yeah, I'm gonna try it. I'll paste the link above.

karan (@karan4d)[00:10:14] It's likely better for production use alongside, like, stuff that you have with, like, a retrieval pipeline, like, with LangChain, etcetera. Like, I would believe that without having tested it, you know, or just from talking, of course. But, you know, there is - even though, you know, with this LIMA technique of small numbers of examples we can get, like, a really good model that does really well.

karan (@karan4d)[00:10:41] The thing about Hermes dataset and just its size and the various types of data and topics that are in there, I think you get a totally different like, role play or storytelling experience or completion experience with Hermes. Personally, I feel that way.

Alex Volkov - targum.video (@altryne)[00:11:01] Awesome.

Teknium (e/λ) (@Teknium1)[00:11:01] And on that note, another thing about the Puffin dataset is that it does go up to, like, 8K, and Enrico here has been doing a ton of work on extending LLaMa's context.

Alex Volkov - targum.video (@altryne)[00:11:13] Right. So I wanna give an introduction, then introduce Enrico and talk about this real quick. Right? LLaMa version 1 was released with, again, 2,000 tokens in the context window. And then many folks, including KaioKenDev and Emozilla, right, and some other folks, I think, were involved in bringing some of the quote-unquote tricks around what eventually ended up being named RoPE scaling, if I'm not mistaken. And we followed this, and we've talked about it on a previous ThursdAI. And LLaMa v2 was released with 4,000 tokens in the context window.

Alex Volkov - targum.video (@altryne)[00:11:52] And, you know, we're now so used to kind of Claude and the 16k GPT-3.5 that 4k didn't seem like a lot. And then many folks were wondering, and, meanwhile, Enrico was working on, whether or not the RoPE scaling method would apply to the next LLaMa, and it looks like it did. And so I wanna introduce Enrico, uh, Enrico Shippole. I hope I'm saying this right. Welcome to the stage. Hopefully, you can unmute and the space works for you. The second fine-tune that I saw released was also backed by Nous, the Nous Research group, and this was the extended context version, what's called LLongMa.

Alex Volkov - targum.video (@altryne)[00:12:28] So Enrico, welcome up to the stage, and feel free to introduce yourself, your affiliation with Nous, and LLongMa with the extended context window.

Enrico Shippole (@EnricoShippole)[00:12:38] Hello. So I'm actually an independent researcher. I'm sponsored by Stability AI, EleutherAI, and a few other different organizations, including Nous now. Awesome. I work with different people like Tanishq from MedARC, Aran Komatsuzaki, who also is from EleutherAI and Duck AI, John Nay from Nomos AI. So I have a lot of affiliations with a bunch of different organizations, including Together; we're starting a project right now with them.

Alex Volkov - targum.video (@altryne)[00:13:13] That's so great to hear, and welcome to ThursdAI. And can you talk to us a little bit about kind of the RoPE scaling method, and how you were able to, like, fine-tune this so quickly, and how the results have looked so far? I wasn't able to run this myself, but hopefully, yeah, talk to us about it.

Enrico Shippole (@EnricoShippole)[00:13:34] Okay. So initially - the thing is, I actually was hoping that Emozilla, Bowen, and KaioKenDev would all have been able to make it, because it was kinda like an equal-parts effort on, like, all fronts from each of us. Initially, I had trained some PaLM models at 8,000 context length about 4 months ago, based on the xPos paper, which did rotary embedding scaling initially. They were one of the first people who did it. They based their methodology off of Ofir Press's ALiBi.

Enrico Shippole (@EnricoShippole)[00:14:11] I would imagine that most people are pretty familiar with Ofir Press and his work on the ALiBi positional bias that's been used in a wide range of models now. So Emozilla and I came into contact based off of the work that he had seen me doing with the PaLM models, scaling those to 8,000 context length in pretraining, not fine-tuning. So what we had initially done is basically take a section of C4 and different datasets that had examples that were all over 8,000 context length, and pretrained on them packed together,

Enrico Shippole (@EnricoShippole)[00:14:50] with a beginning-of-string and end-of-string token to help with, like, the attention masking portion of that. After he had seen that, Emozilla actually came into contact with Kaiokendev - I believe Kaiokendev is how you pronounce it. Kaiokendev had also been following Ofir Press's research. He had started working on his own version of scaling the rotary embeddings, I believe based off of both ALiBi and xPos.

Enrico Shippole (@EnricoShippole)[00:15:22] And what he found is that by scaling the max positional embeddings and the rotary embedding from something like 2048, which you would initially train with - he scaled it up to 8000 or 8192 - and by applying, like, an interpolation to the encoding, by scaling basically the positional index in the rotary embedding, you were able to essentially turn down the frequency window in RoPE by, like, a factor of 0.25.

Enrico Shippole (@EnricoShippole)[00:16:01] The scaling depends on the length that you're trying to extrapolate to and the initial context length that the model was trained with. So if you were training with LLaMa 2, which had a context window of 4096, and you wanted to do the linear interpolation positional scaling to something like 8192, then you would use a scaling factor of 0.5. If you were trying to do it from 2048, which is what the original LLaMa was trained with, and you wanted to scale it to 8192, then you would use a scaling factor of 0.25.

Enrico Shippole (@EnricoShippole)[00:16:39] So basically, after we had done all of this, Meta released a paper around the same time that Kaiokendev released his blog. They both found very similar findings. They showed in the Meta paper that you only had to fine-tune for 1000 steps with the linear positional interpolation scaling to be able to get the benefit of doing a full pretrain at a context window of 8192.

Enrico Shippole (@EnricoShippole)[00:17:13] So this is actually, like, a big step, because it shows that you no longer need to pretrain right off the bat at a longer context length. You're able to do the fine-tuning on essentially a lower computational budget and still be able to get the, like, greater results of the longer context window. I know a lot of the major AI companies - just from my work and personal research with many of them - had been doing staged scaling of the context window during training.

Enrico Shippole (@EnricoShippole)[00:17:46] So basically, when pretraining, they would separate the initial examples from a dataset into multiple stages.

Enrico Shippole (@EnricoShippole)[00:17:54] So anything that is under the window of 2048, you'd separate from the initial dataset, then you'd take things between 2048 and 4096, then 4096 and 8192, and you would basically chunk the datasets into those different parts. You'd first initially train on the 2048 chunk of the data, then you would train on the data between 2048 and 4096, and then you would do the same thing from 4096 to 8192, or, if you want, scale that to 16k or 32k context length. But what we have shown now, with both the Meta paper and this, is that you don't even need to go through that extensive pretraining and staged process; you can just go from a context length of 2048 to 8192

Enrico Shippole (@EnricoShippole)[00:18:47] scale the rotary embeddings by whatever type of factor that you want to use. So like I was saying, if you're going from 2048 to 8192, you'd be using a scaling factor of 0.25. It only needs 2 lines of code to be able to do that. In the LLongMa post, I had provided an example of scaling the rotary embeddings. The the code was written by Emozilla or Jeff.

Enrico Shippole (@EnricoShippole)[00:19:15] After all these experiments, we then came into contact with Bowen, who had worked a lot on the dynamic NTK scaling with Emozilla, and he had also done NTK-by-parts, which we're currently training a lot of models on. So we have the LLongMa 1 models trained on the OpenLLaMA series, like the suite of those models, that use the linear interpolation scaling.

Enrico Shippole (@EnricoShippole)[00:19:45] We now have the LLaMa 2 models, or the LLongMa 2 suite, which is what we're calling it, again trained on the linear interpolation scaling. And then we have another suite of models coming out very soon that uses the NTK-by-parts dynamic scaling. That was really specialized by Bowen, so I do not wanna speak on his behalf. It'd probably be good to get him to talk about it in another one of these.

Alex Volkov - targum.video (@altryne)[00:20:14] Absolutely. So let's get in touch after this and set it up. Thank you for the very in-depth explanation, because we did cover the kind of RoPE scaling, and how Kaiokendev, on the image boards or wherever he started this, wrote it up in his blog post, and then how it iterated. So it's great to actually hear from the folks who are doing this. Just for the audience, I've attached Enrico's tweet about LLongMa 2, which is now currently trained at 8K context length.

Alex Volkov - targum.video (@altryne)[00:20:47] And, Enrico, you told us that we may see even double that. So could you talk about the next version?

Enrico Shippole (@EnricoShippole)[00:20:56] Okay. So the initial training process of doing this up to a context length of 8192 can be done, basically, with DeepSpeed ZeRO-2 and activation checkpointing, and you're able to fit the model on an A100 80-gigabyte node. Now, we are working on the process of scaling it both to 16k and 32k. This requires a different methodology during training; you either need to use DeepSpeed ZeRO-3 or fully sharded data parallelism.

Enrico Shippole (@EnricoShippole)[00:21:35] Both of those are very similar, for people who aren't aware. Basically, you're just sharding the optimizer states and the model states across, like, different nodes. You can also use things like tensor parallelism to help with the scaling as well. And then we're going to be basically just adjusting the scaling factor again. We've already collected a large quantity of data at 16k context length, and we're going to be doing the fine-tuning to 16k and releasing those models soon. All of this compute is sponsored by Stability AI.

Enrico Shippole (@EnricoShippole)[00:22:12] They've been very generous with helping a lot of the independent research.
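For orientation, here is a hedged sketch of what a DeepSpeed setup along the lines Enrico describes might look like: ZeRO Stage 3 shards parameters, gradients and optimizer state across GPUs, with activation checkpointing to fit the longer sequences. The values below are placeholders, not the actual LLongMa training recipe.

```python
# Illustrative DeepSpeed config (placeholder values, not the real LLongMa setup).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,               # ZeRO-3: shard params, grads, and optimizer state
        "overlap_comm": True,
    },
    "activation_checkpointing": {
        "partition_activations": True,   # trade compute for memory on long sequences
    },
}
```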

Alex Volkov - targum.video (@altryne)[00:22:17] So I wanna shout out Stability AI for not only giving, you know, the world Stable Diffusion, but also participating in this kind of next wave of AI. Many folks kinda coined the Stable Diffusion moment when they released the, I wanna say, 1.4 version back then, almost a year ago now, and many folks are saying the same about the LLaMa 2 release, now that it's commercially open source and folks, you know, for-profit companies, can join in. So we definitely wanna shout out Stability for the effort here. And, Enrico, thank you. And, folks, please follow Enrico, and we'll stay tuned.

Alex Volkov - targum.video (@altryne)[00:22:56] I wanna ask Karan and Teknium, and other folks from Nous, about the efforts that Enrico was talking about, the longer context windows. How would they kinda interplay with the stuff that you're working on with Hermes, with Puffin? Are the efforts interchangeable? Are we gonna see them building on top of each other?

karan (@karan4d)[00:23:16] So I think LDJ can definitely speak to this, but I'd like to happily say that once we did LLongMa 1 on the first LLaMa generation of models, we already had Puffin 2k, 4k, and 8k for that -- Yeah. -- already prepared and ready. So as the LLongMa models for 13B are released, we will also be doing equivalent Puffin fine-tunes, and potentially Hermes fine-tunes. We can talk a little bit more about the future of Hermes a little bit later, though.

LDJ (@Dogesator)[00:23:51] Yeah. I mean, I was pretty much going to say the same thing, but kind of elaborate on that, about how, before, with LLongMa v1 and everything - during the development of LLongMa there was actually, like, you know, of course, me, Enrico, who's usually just called Conceptofmind, and Emozilla. Like, we've all kinda been rubbing shoulders a lot together and just kinda working closely, you know, in the same Discord and whatnot. And it's like, hey, you know, we're working on this, like, experimental LLongMa thing - hey, you wanna try, like, fine-tuning it? And then the plan just kind of ended up being, okay, we're just gonna have this Puffin thing.

LDJ (@Dogesator)[00:24:31] The Puffin dataset already contains a ton of high-context conversational data from GPT-4 and, like, high quality human data. So it's like the perfect fit to have something that's high-context capable be fine-tuned on that. And then LLaMa 2 came out, and it's like, oh yeah, let's get this out ASAP, and then we'll figure out what we're gonna do later.

Alex Volkov - targum.video (@altryne)[00:24:58] Yeah. Great. And it's just great to see, you know, how many opportunities like this there are, where with open source the stuff that we're able to now run and iterate on is building on top of each other. It's just incredible, and this is maybe a watershed moment. And I wanna thank all of you for being here. I wanna kind of let the other folks who are usually here on ThursdAI ask a question or two of our Nous visitors. Yam and Nisten, if you have a question for Nous or for Enrico, go ahead. And I will say, Yam,

Alex Volkov - targum.video (@altryne)[00:25:29] I know you - if you ask the super deep technical stuff, it will fly over the audience's heads, so I'd say take that to the DMs with LDJ and Enrico. But yeah, of course, the stuff that we haven't covered that's interesting to ask Nous, feel free; as it pertains to LLaMa 2, it's gonna be very interesting, I think, for everyone.

nisten (@nisten)[00:25:47] Just to quickly clarify, you guys fine tuned the plain model. Right? Not the chat 1.

Teknium (e/λ) (@Teknium1)[00:25:55] Yep. Okay. Yep. The base model. We wouldn't fine-tune that model, the chat one, at all.

Alex Volkov - targum.video (@altryne)[00:26:00] Actually, to - yeah - to maybe continue this, sorry, Karan, for interrupting, just one sec. To continue this question: there are the models that were released by Meta, where you have to, like, register and give your email and everything, and then they put some stuff on Hugging Face, and those models were delineated with, like, a dash HF. Have you guys used the Hugging Face ones or the Meta ones, and do you guys know the difference? I heard from somebody that, like, maybe one doesn't work as well, and I wanted to confirm. Yeah.

Teknium (e/λ) (@Teknium1)[00:26:30] The one on Hugging Face is in FP16 and the original LLaMa 2 models are in BF16, but we tested the difference between the two models at Carper, and there's such a negligible difference in their quality that it's irrelevant. We trained on the Hugging Face FP16 ones, but in BF16.
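As a side note, here is a minimal sketch of loading the Hugging Face conversion while keeping compute in bf16. The repo id below is the public Llama-2-13b-hf checkpoint, used purely as an example (it is gated behind Meta's license acceptance).

```python
import torch
from transformers import AutoModelForCausalLM

# Load the "-hf" converted checkpoint discussed above, but keep weights/compute in bf16.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",      # example repo id; requires accepting Meta's license
    torch_dtype=torch.bfloat16,
)
```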

Alex Volkov - targum.video (@altryne)[00:26:52] Sorry. Yeah. Goran, for interrupting. Go ahead.

karan (@karan4d)[00:26:56] No. All good.

Alex Volkov - targum.video (@altryne)[00:26:58] I I totally forgot what -- That's not it. interrupted today. Yes, Randall. Okay. Nispen, if you have a question for Kiran to follow-up with feel free, and And if not, then, Yum, if you have anything that you wanna ask the the fine folks from Nous, feel feel free as well.

Yam Peleg (@Yampeleg)[00:27:17] Yeah. Sure. First, thank you for what you're doing, guys. You're really making a difference for everyone. There aren't many demos online, so anyone that didn't try Hermes, I highly encourage you to try it. I don't know why there aren't demos - okay, I know why, demos cost money - but just try it. Okay? And now I've got a question, because from my experience, if you train on the open datasets of Hermes, you get a significantly lower quality model. Now, I'm fine if you don't release datasets, don't get me wrong.

Yam Peleg (@Yampeleg)[00:27:54] I just wanted to ask, is there anything else besides the data that is different? What tips can you give for, I don't know, someone else that wants to train a high quality model, besides having high quality data?

Teknium (e/λ) (@Teknium1)[00:28:08] Everyone understands this. Yeah. The hyperparameters can make a key difference. LDJ knows very well, because we had to do a ton of different tests to get the hyperparameters for the Puffin model. I'm not sure if those are on the model card for Hermes; if they're not, I can put them up. And Karan can probably talk about the Nous datasets that weren't made public.

karan (@karan4d)[00:28:38] Yeah. We've got, like, maybe around, like, 50k items of data, like, versus, like, the total 300k instructions there, that are not released. And to be frank with you, about 45k of them are just more GPT-4, like, Alpaca-style instructions. The 5000 or so - the, like, 4500 of them - compose this dataset we've been working on that, you know, at this point, I'm pretty comfortable talking about; we call it the p dactyl dataset.

karan (@karan4d)[00:29:14] I won't speak on everything that's in it, but, essentially - and I don't know if this is the thing that made the big difference, but it's, like, the one place where I guess we deviate from just using the open datasets plus more GPT-4 instructions - it's got some transformers instructions, some linguistics instructions, some calculus 1 instructions, etcetera. It seems to be pretty good.

Teknium (e/λ) (@Teknium1)[00:29:41] Also, Yam, do you have links or anything to the models that were trained with just the makeup of the datasets that were public from Hermes? Because I haven't actually seen that before.

Yam Peleg (@Yampeleg)[00:29:57] And again, can you repeat that?

Teknium (e/λ) (@Teknium1)[00:29:58] didn't hear. Do you have any links to the models that trained with just the open datasets from Hermes that you could share with me later?

Yam Peleg (@Yampeleg)[00:30:06] No. No. It's just it's just from my experiments -- Oh, okay. -- on training. Pretty much following the same idea of let's take only GPT 4 from all the open datasets, and the the model that you get is is different. for sure. And and it might be that hyperparameters, you know.

Teknium (e/λ) (@Teknium1)[00:30:25] Another thing that we did too is pretty extensive, like, cleaning. We did do deduplication. We removed things like URLs - like, any response that had a URL in it, we removed, in case it was, like, hallucinated URLs. We had, like, maybe 8 different filtering processes too that might have made our data quality higher.
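A hedged sketch of the kind of filtering pass Teknium describes; the field names and exact rules here are assumptions for illustration, not the actual Hermes pipeline.

```python
import re

def clean_rows(rows):
    """rows: list of dicts with a 'response' key (assumed schema, for illustration)."""
    seen, kept = set(), []
    for row in rows:
        resp = row["response"]
        if re.search(r"https?://", resp):                   # possibly hallucinated links
            continue
        if "as an ai language model" in resp.lower():       # refusal boilerplate
            continue
        if resp in seen:                                     # exact-duplicate responses
            continue
        seen.add(resp)
        kept.append(row)
    return kept
```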

LDJ (@Dogesator)[00:30:48] So as an AI language model?

nisten (@nisten)[00:30:51] For anybody -- What do you say? -- for anybody in the audience that hyperparameter meters are are just like the settings in the oven. So it it looks here, like, the ingredients were all okay, but yam mess something up, and before selling as a token -- Yeah. -- came out half baked at the model.

LDJ (@Dogesator)[00:31:08] So we're gonna have to check that out.

LDJ (@Dogesator)[00:31:10] I'm a big proponent personally of hyperparameter optimization being underrated right now, like, in -- Yeah. -- the current space. And that's something I've kind of focused on a lot specifically for things like puffin and just trying to help others around and use some stuff like trying to optimize they're doing, and even just something like like what you just said about the settings for the oven, I mean, double the amount of time you're putting something in the oven, and it's not gonna come out twice as good. It's not even gonna come out 10% as good. It's gonna come worse. You know?

LDJ (@Dogesator)[00:31:45] And although it depends, like, what is your baseline for how how much time you're putting it in the oven and all these different variables that kind of are dependent on each other and affect each other. So it's definitely something you kind of have to build an intuition about to some degree. And then the other end is really I feel like there has to be more investment and more time and energy invested into actual tools that make hyperparameter optimization easier for people that are doing these things.

Yam Peleg (@Yampeleg)[00:32:13] Yeah. Yeah. And the thing is that the models are really big, so it's really expensive to run them. So you have a trade-off of how much compute you're investing in searching hyperparameters rather than actually using it for training. But I completely agree. So, one last question, actually.

Teknium (e/λ) (@Teknium1)[00:32:33] Actually, one thing before we go on. Something great about the Puffin dataset is that it's just, like, 3000 or so examples, I believe. And so it makes tuning a lot less expensive, because you can finish the whole training in just a couple of hours. So, like, with Hermes, if we wanted to try full ablations, and dozens of them, it would take weeks to do.

LDJ (@Dogesator)[00:32:55] Yeah. Yeah. Well, to be fair, it's not like it only takes a couple hours on one GPU. We use A100 80-gigabyte GPUs. So. Yeah. Yeah.

Teknium (e/λ) (@Teknium1)[00:33:04] Courtesy of Redmond.

Alex Volkov - targum.video (@altryne)[00:33:05] Thank you, Redmond.

Enrico Shippole (@EnricoShippole)[00:33:08] Mhmm. I should also probably clarify that when doing the context length extrapolation, we're doing it on 1 billion tokens and 64 80-gigabyte A100s.

Yam Peleg (@Yampeleg)[00:33:20] OOf Mhmm.

Alex Volkov - targum.video (@altryne)[00:33:23] Yeah. Yam is getting over excited. Alright, folks. I wanna -- Yeah. Yeah. -- maybe maybe ask her on this one less and we'll move on to the the the regular ThursdI update camera cadence. But I will say that, like, folks from Nous research and and Rick and and some other here. Thank you so much for coming up and giving us kind of the insights into how this actually happens. Lama2 just released, you know, a few days ago, and you guys are already pumping out, like, open source fine tuned models. And it's great to see. And just so you know, there's always a stage for you here to come in and and announce things.

Alex Volkov - targum.video (@altryne)[00:33:53] And If you do wanna announce, like, a release or something, maybe just, you know, right now, Karan and and Teknium and some folks, I would love to hear like, when the next Hermes is coming?

karan (@karan4d)[00:34:06] Before we say that, I just would like to clarify something about Hermes. So we have the original Hermes dataset on LAMA 2 as something that we will release, but also a sequel to the Hermes dataset, Hermes 2. There will be a distinction between these 2, and you'll see you'll see the the the prior come out first and the latter come out after. But as for release, etcetera, I will absolutely let Technium take the stage with those final words.

Teknium (e/λ) (@Teknium1)[00:34:36] So the training is nearly done. At least it was about 2.8 epochs out of 3 a few hours ago. So it might be done already. Before I release it though, unlike puffin, I didn't we wanted it puffing out, like, same day that llama 2 came out, so we didn't run any benchmarks. And we had to put all the compute we had on Hermes immediately after we were done with that. So we don't have any compute to do any benchmarks or puffing until Hermes is done.

Teknium (e/λ) (@Teknium1)[00:35:06] But before I release Hermes, I do wanna do, like, a full range of benchmarks and stuff like that to make sure everything's good and have a pretty detailed model card, but that should probably only take the rest of tonight at the most. So probably tomorrow morning would be when Hermes comes out.

Alex Volkov - targum.video (@altryne)[00:35:22] That's some folks. And you you heard it here first and definitely follow Teknium, Karan, Enrico, LDJ, and the rest of, like, Nous Research folks, and stay tuned. Enrico, go ahead.

Enrico Shippole (@EnricoShippole)[00:35:34] Yes. I just wanted to to piggyback off of Teknium comment a little bit. So we did do pretty sense of the valuation of the Lauma 2 AK models. We had run different things on perplexity using Gov Report in a couple different other data sets to make sure that the length extrapolation in the context was working properly. We did passkey retrieval. We also did a lot of extensive human evaluation, which took a little bit. I had wanted to get the LAMA 2 AK models out yesterday, but we decided to push it back one day.

Enrico Shippole (@EnricoShippole)[00:36:08] So and what we were doing is we were feeding in research papers and seeing if it could pull out even, like, relevant pieces of information from the context length. And so far, it has been quite successful. So we're we're still running more evals, but the ones so far have shown that there's been, like, no performance degradation, no matter what context length that you're basically using with these extended models.

Alex Volkov - targum.video (@altryne)[00:36:32] That sounds great. and now that this this, you know, LLongMa lies out and the next versions are gonna come out as well. I'm sure that some other folks who also contribute to this research and tell you, like, from their own experiences and vibe. So, yeah, I wanna thank folks. Again, this has been very illuminating, and very glad to have you. And, obviously, the stage is yours whenever you want to come here, and we appreciate you. And you guys are welcome to stay tuned and kinda chime in to the rest of the updates. And with that, I think, for folks in the audience, we're moving to the next thing.

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Sun, 23 Jul 2023 17:02:00 GMT
ThursdAI July 20 - LLaMa 2, Vision and multimodality for all, and is GPT-4 getting dumber?

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

If you’d like to hear the whole 2 hour conversation, here’s the link to twitter spaces we had. And if you’d like to add us to your favorite podcatcher - here’s the RSS link while we’re pending approval from Apple/Spotify

Happy LLaMa day! Meta open sourced LLaMa v2 with a fully commercial license.

LLaMa 1 was considered the best open source LLM; this one can be used for commercial purposes, unless you have more than 700MM monthly active users (no 🦙 for you, Google!)

Meta has released the code and weights, and this time around, also a fine-tuned chat version of LLaMa v2 to all, and has put them on HuggingFace.

There are already (3 days later) at least 2 models that have fine-tuned LLaMa2 that we know of:

* @nousresearch have released Redmond Puffin 13B

* @EnricoShippole, in collaboration with Nous, has released LLongMa, which extends the context window for LLaMa to 8K (and is training a 16K context window LLaMa)

* I also invited and had the privilege to interview the folks from @nousresearch group (@karan4d, @teknium1 @Dogesator ) and @EnricoShippole which will be published as a separate episode.

Many places already let you play with LLaMa2 for free:

* https://www.llama2.ai/

* HuggingFace chat

* Perplexity LLaMa chat

* nat.dev, replicate and a bunch more!

The one caveat, the new LLaMa is not that great with code (like at all!) but expect this to change soon!

We all just went multi-modal! Bing just got eyes!

I’ve been waiting for this moment, and it’s finally here. We all, have access to the best vision + text model, the GPT-4 vision model, via bing! (and also bard, but… we’ll talk about it)

Bing chat (which runs GPT-4) has now released an option to upload (or take) a picture, and add a text prompt, and the model that responds understands both! It’s not OCR, it’s an actual vision + text model, and the results are very impressive!

I personally took a snap of a food truck's side and asked Bing to tell me what they offer; it found the name of the truck, searched it online, found the menu, and printed out the menu options for me!

Google’s Bard also introduced their google lens integration, and many folks tried uploading a screenshot and asking it for code in react to create that UI, and well… it wasn’t amazing. I believe it’s due to the fact that Bard is using google lens API and was not trained in a multi-modal way like GPT-4 has.

One caveat is, the same as text models, Bing can and will hallucinate stuff that isn’t in the picture, so YMMV but take this into account. It seems that at the beginning of an image description it will be very precise but then as the description keeps going, the LLM part kicks in and starts hallucinating.

Is GPT-4 getting dumber and lazier?

Researchers from Stanford and Berkeley (and Matei Zaharia, the CTO of Databricks) have tried to evaluate the vibes and complaints that many folks have been sharing: whether the GPT-4 and GPT-3.5 updates from June had degraded capabilities and performance.

Here’s the link to that paper and twitter thread from Matei.

They have evaluated the 0301 and the 0613 versions of both GPT-3.5 and GPT-4 and have concluded that at some tasks, there’s a degraded performance in the newer models! Some reported drops as high as 90% → 2.5% 😮

But is there truth to this? Well apparently, some of the methodologies in that paper lacked rigor and the fine folks at ( and Arvind) have done a great deep dive into that paper and found very interesting things!

They smartly separate between capability degradation and behavior degradation, and note that on the 2 tasks (Math, Coding) where the researchers claimed a capability degradation, the methodology was flawed; there isn't in fact any capability degradation, but rather a behavior change and a failure to take a few examples into account.

The most frustrating one for me was the code evaluation. The researchers scored both the previous models and the new June-updated models on "code execution" with the same prompt; however, the new models default to wrapping the returned code in ```, which is a markdown code fence. This could have been easily fixed with some prompting, but the researchers scored the task based on whether or not the code snippet they get back is "instantly executable", which it obviously isn't with the ``` in there.

So, they haven't actually read and evaluated the code itself, just whether or not it runs!
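For the curious, the fix really is trivial. Here's a minimal sketch (my own illustration, not the paper's actual harness) of unwrapping a markdown fence before checking whether a snippet runs:

```python
import re

def strip_code_fence(text: str) -> str:
    """Return the code inside a ```python ... ``` fence, or the text unchanged."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

snippet = "```python\nprint(1 + 1)\n```"
exec(strip_code_fence(snippet))   # prints 2 instead of raising a SyntaxError
```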

I really appreciate the deep dive on this, and recommend you all read it for yourself and make your own opinion and don’t give into the hype and scare mongering and twitter thinkfluencer takes.

News from OpenAI - Custom Instructions + Longer deprecation cycles

In response to the developers (and the above paper), OpenAI announced an update to the deprecation schedule of the 0301 models (the ones without functions): they will keep those models alive for a full year now!

Additionally, OpenAI has released "Custom Instructions for ChatGPT", which allows a ChatGPT user to store custom instructions, information and a custom prompt that will be saved on OpenAI's server side and appended to every new session of yours with ChatGPT.

Think, personal details, preferred coding style (you love ruby and not python) and other incredible things you can achieve without copy-pasting this to every new session!

Don’t forget to enable this feature (unless you’re in the UK or EU where this isn’t available)

Thanks for tuning in! Whether you're a newsletter subscriber, a twitter space participant, or just someone who stumbled onto this post, if you find this interesting, subscribe and tell your friends!

“We stay up to date so you don’t have to” is the #ThursdAI motto! 🫡

In other news this week:

LangChain has gotten some flak, but they are looking ahead and releasing LangSmith, an observability framework for your agents that does NOT require using LangChain!

It looks super cool, and is very useful to track multiple prompts and tokens across agent runs! And the results are shareable, so you can take a look at great runs and share yours with friends!

Don’t forget to share this with your friends and come back next week 🫡

— Alex Volkov



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 21 Jul 2023 00:27:41 GMT
ThursdAI July 13 - Show recap + Notes

Welcome Friends, to the first episode of ThursdAI recap.

If you can’t come to the spaces, subscribing is the next best thing. Distilled, most important updates, every week, including testimony and tips and tricks from a panel of experts. Join our community 👇

Every week since the day GPT-4 was released, we've been meeting in twitter spaces to talk about AI developments, and it has slowly but surely created a community that's thirsty to learn, connect and discuss information.

Getting overwhelmed with daily newsletters about tools, folks wanted someone else to do the legwork, prioritize and condense the most important information about what is shaping the future of AI, today!

Hosted by AI consultant Alex Volkov (available for hire), CEO of Targum.video, this information-packed edition covered groundbreaking new releases like GPT 4.5, Claude 2, and SDXL 1.0. We learned how Code Interpreter is pushing boundaries in computer vision, creative writing, and software development. Expert guests dove into the implications of Elon Musk's new xAI startup, the debate around Twitter's data, and pioneering techniques in prompt engineering. If you want to stay on top of the innovations shaping our AI-powered tomorrow, join Alex and the ThursdAI community.

Since the audio was recorded from a twitter space, it has quite a lot of overlaps; I think it's due to the export, so sometimes it sounds like folks talk on top of each other (most of all me, Alex) when this was not the case. I will have to figure out a fix.

Topics we covered in July 13, ThursdAI

GPT 4.5/Code Interpreter:

00:02:37 - 05:55 - General availability of ChatGPT with code interpreter announced. 8k context window, faster than GPT-4.

05:56 - 08:36 - Code interpreter use cases, uploading files, executing code, skills and techniques.

08:36 - 10:11 - Uploading large files, executing code, downloading files.

Claude V2:

20:11 - 21:25 - Anthropic releases Claude V2, considered #2 after OpenAI.

21:25 - 23:31 - Claude V2 UI allows uploading files, refreshed UI.

23:31 - 24:30 - Claude V2 product experience beats GPT-3.5.

24:31 - 27:25 - Claude V2 fine-tuned on code, 100k context window, trained on longer outputs.

27:26 - 30:16 - Claude V2 good at comparing essays, creative writing.

30:17 - 32:57 - Claude V2 allows multiple file uploads to context window.

32:57 - 39:10 - Claude V2 better at languages than GPT-4.

39:10 - 40:30 - Claude V2 allows multiple file uploads to context window.

X.AI:

46:22 - 49:29 - Elon Musk announces X.AI to compete with OpenAI. Has access to Twitter data.

49:30 - 51:26 - Discussion on whether Twitter data is useful for training.

51:27 - 52:45 - Twitter data can be transformed into other forms.

52:45 - 58:32 - Twitter spaces could provide useful training data.

58:33 - 59:26 - Speculation on whether XAI will open source their models.

59:26 - 61:54 - Twitter data has some advantages over other social media data.

Stable Diffusion:

89:41 - 91:17 - Stable Diffusion releases SDXL 1.0 in discord, plans to open source it.

91:17 - 92:08 - Stable Diffusion releases Stable Doodle.

GPT Prompt Engineering:

61:54 - 64:18 - Intro to Other Side AI and prompt engineering.

64:18 - 71:50 - GPT Prompt Engineer project explained.

71:50 - 72:54 - GPT Prompt Engineer results, potential to improve prompts.

72:54 - 73:41 - Prompts may work better on same model they were generated for.

73:41 - 77:07 - GPT Prompt Engineer is open source, looking for contributions.

Related tweets shared:

https://twitter.com/altryne/status/1677951313156636672

https://twitter.com/altryne/status/1677951330462371840

@Surya - Running GPT2 inside code interpreter

tomviner - scraped all the internal knowledge about the env

Peter got all pypi packages and their description

added Claude to smol menubar (which we also discussed)

SkalskiP awesome code interpreter experiments repo

See the rest of the tweets shared and listen to the original space here:

https://spacesdashboard.com/space/1YpKkggrRgPKj/thursdai-space-code-interpreter-claude-v2-xai-sdxl-more

Full Transcript:

00:02 (Speaker A) You. First of all, welcome to ThursdAI. We stay up to date so you don't have to. There's a panel of experts on top here that discuss everything.

00:11 (Speaker A) If we've tried something, we'll talk about this. If we haven't, and somebody in the audience tried that specific new AI stuff, feel free to raise your hand, give us your comment. This is not the space for long debates.

00:25 (Speaker A) We actually had a great place for that yesterday. NISten and Roy from Pine, some other folks, we'll probably do a different one. This should be information dense for folks, and this will be recorded and likely posted at some point.

00:38 (Speaker A) So no debate, just let's drop an opinion and discuss the new stuff and kind of continue. And the goal is to stay up to date so you don't have to, in the audience. And I think with that, I will say hi to Al and Janae and we will get started.

00:58 (Speaker B) Hi everyone, I'm NISten Tahira. I worked on, well, released one of the first doctor chatbots on the market, for Dr. Gupta, and scaled it, and now we're working on getting the therapist bot out once we can pass more testing and get Voice to work in a profitable manner, because we don't really have VC. So at the scale of a few hundred thousand users, the API bills matter quite a bit.

01:31 (Speaker B) So, yeah, these spaces have been pretty helpful, because I had some trouble with running a Voice transformer, trying to run it in the browser on WebGPU, and then the person that wrote Transformers.js comes in here and just says, oh yeah, that back end is messed up, just try BLAS and synth and stuff. So these have been very interesting and technical spaces.

01:54 (Speaker A) Yeah, we need to get Zenova in here. Zenova is the guy who NISten was referring to. Al, Janae, do you want to give a few words of intro and say hi and then we'll start? Just briefly, please, because I think we need to get going.

02:09 (Speaker C) Sure. Hi, I'm Janae.

02:11 (Speaker D) I'm the resident noob, I started messing around with AI at the beginning of.

02:16 (Speaker E) The year, and I also host the.

02:18 (Speaker D) Denver AI Tinkerers coming up next week.

02:20 (Speaker A) And if you're in the Colorado area, greater Denver, please join us. It's going to be a blast.

02:27 (Speaker F) Hi, I'm Al Chang. I'm kind of an old school technologist. Just getting started with the AI again and just here to help.

02:36 (Speaker A) Yeah. All right, folks, so I think we've had a whole space on this. Simon Willison and me and many, many other folks chimed in the second this was released.

02:50 (Speaker A) Was that six? Was that Sunday? It's hard to keep track of actual days. Saturday, Saturday, last week, exactly during those spaces, by the way, as we were talking about ChatGPT, Logan and everybody else from OpenAI announced general availability of ChatGPT with code interpreter. So GPT-4 with code interpreter.

03:12 (Speaker A) And I think we just heard from Matt that even some folks who got access to it slept on it a little bit, maybe potentially because of its very horrible name that's really hard to type, "interpreter", and you get lost in the R's. But it's an extremely powerful new superpower that we've got. And we've had a whole space talking about use cases that people already had.

03:37 (Speaker A) It was like three days into it, and since then I bet that many more people tried it. I think, Swyx, 20,000 listens to that space, plus the pod. At least people definitely want to hear new use cases, right?

03:53 (Speaker G) Yeah, not much else to add about it. I think it's the feature for Switch.

03:59 (Speaker A) Posted a whole deep dive essay and coined it GPT 4.5 between us friends. And one of the interesting things about it is that we think, at least that's where we are currently after playing around with this, is that it's a fine tuned model. So they kept training this on actually running code and executing code.

04:21 (Speaker A) That's what we believe. We don't know, nobody confirmed this, and then that it's fine tuned from an earlier checkpoint of GPT-4. And so we actually had some folks on spaces talking about that it's less restricted and better than previous times.

04:36 (Speaker A) So it's an interest, I think, NISten, right? We have some folks who tell us they're using code interpreter without the code part. They just stopped using the regular GPT-4 just because it's that model.

04:48 (Speaker A) And I think also they took down the 25 messages per hour restriction on code interpreter. I've had like four hour sessions and it didn't stop; like, I didn't see complaints.

05:03 (Speaker G) So it's just better.

05:06 (Speaker A) It's also fast. I think it's fast because not many people maybe use this by default, and this could be the reason for the speed, but it's definitely faster for sure. I think also context window, was it Yam? Somebody summarized the context window and they told us the context window for code interpreter is eight k versus the regular GPT-4, actually, that could be also 4k.

05:29 (Speaker G) You mean Yam copied and pasted.

05:34 (Speaker A) I would encourage you and Yam to kiss and make up, because Yam is doing a lot of legwork to take down the stuff that he posted, and Yam is working on that and it's very visible, and you guys need to - there you go, Yam, you need to clear the air. However, Pharrell and Gabriel, I'll bring you up as well. And we're going to keep talking about code interpreter because that's what we're here to do. NISten and a few other folks and we started cooking with code interpreter.

05:59 (Speaker A) And by cooking I mean we started stretching the complete boundaries of what's possible there. And I think Simon Willison kick started this with the Latent Space pod. So for folks who are not following the Latent Space pod, feel free to follow Swyx, his main account, not this hidden one.

05:59 (Speaker A) And Swyx reposted the spaces we had. Simon Willison was able to run Node.js and Deno within code interpreter, even though OpenAI didn't allow for that, by uploading like a binary and asking code interpreter to generate. Simon then promptly said they fine tuned the model away from that, and we found ways anyway to ask it to do some stuff. I have a thread on how I was able to run a vector DB, Chroma, inside code interpreter.

06:10 (Speaker A) I ran whisper.cpp. We saw some folks running GPT-2 inside code interpreter, right? So imagine an LLM, GPT-4, running another one and talking to it. It's like a little brother inside.

06:10 (Speaker A) I personally love that inception. I don't know if the person who ran GPT-2 is in the audience - Dan, I think, was the nickname, NISten? I don't know.

07:22 (Speaker A) Surya.

07:23 (Speaker B) Surya. He also wrote the search-the-PDF plugin for GPT-4 plugins, and he wrote that in like two days, and it's more used than any other enterprise thing, which is pretty hilarious.

07:36 (Speaker A) We need to get Surya.

07:38 (Speaker B) Yeah, he just did that as, I'm just going to do a search plugin for PDFs, and it's like the most used.

07:45 (Speaker A) So dope, pretty amazing. Again, in that space we've talked about having like a living manual, so to speak, for code interpreter use cases, because it's coding, so it covers pretty much everything that we can think of as coders, maybe just in Python, maybe restricted to an environment. And I've been trying to do that with the Code Interpreter Can hashtag, and I encourage all of you - let me pin this to the top of the space, to the jumbotron - if you have an interesting code interpreter thing. And I'll bring up SkalskiP to the stage as well.

08:03 (Speaker A) And Lantos, so many good friends. If you have a very interesting code interpreter technique or skill or new thing that people can do without coding skills, please tag it with this hashtag so folks can find this. Otherwise I will cover the main three things that code interpreter gave us besides the new model.

08:42 (Speaker A) One of them is uploading files. And since we've talked, we've noticed that you can upload up to 250 megabyte files, and those can be zips of other files. So we've uploaded like full model weights.

08:55 (Speaker A) We've uploaded bin files. It's incredible that you can now drag and drop a whole directory and have GPT just know about this and read about this. We've uploaded weights and embeddings.

09:08 (Speaker A) You can then obviously execute code in a secure environment, which is again incredible, and you can download files - you can ask it to actually generate a download for you, which is also super, super cool. Maybe one last thing I'll say before I give it to the audience for a few more cool use cases. And folks on the stage, please feel free to raise your hand.

09:21 (Speaker A) I'll get to you in the order that you raise your hand if you have a use case. Some folks built like a built-in memory, a built-in brain, within code interpreter, just to save to a file. That's what I tried to do with my vector DB - and then they download that memory at the end of every session and then upload this to the next one and have, like, a prompt that reminds ChatGPT to start from that point.
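A hypothetical sketch of that "offloaded memory" trick as it might look inside a session; the file name and note format are made up.

```python
# Hypothetical sketch of the persisted-memory trick described above.
import json, pathlib

MEMORY = pathlib.Path("/mnt/data/memory.json")   # Code Interpreter's writable directory

def remember(note: str):
    notes = json.loads(MEMORY.read_text()) if MEMORY.exists() else []
    notes.append(note)
    MEMORY.write_text(json.dumps(notes, indent=2))

remember("User prefers concise answers with sources.")
# Download memory.json at the end of the session, re-upload it next time,
# and prompt the model to read it before continuing.
```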

09:50 (Speaker A) So in addition to the context window, they're also having a separate, offloaded, file-persisted memory. So code interpreter, incredible. Again.

10:00 (Speaker A) Potentially GPT 4.5. And if you haven't played with this, feel free to - if you don't know what to play with, follow the Code Interpreter Can hashtag. And let's get to Skalski.

10:11 (Speaker A) What's up, man?

10:14 (Speaker H) Hi, hello. Do you hear me?

10:15 (Speaker A) Yeah, we can hear you fine.

10:19 (Speaker H) Yeah, I've been playing a lot with code interpreter over the past five days, mostly with computer vision use cases, because that's what I do. I haven't introduced myself: I'm pretty much doing computer vision full time for the past five years, and was focusing on - like, when I saw that you can input image and video, that was immediately what I was thinking: we need to make it do computer vision. So I went through some low effort tasks.

10:46 (Speaker H) So I managed to run old school computer vision algorithms, face detection, tracking of objects, stuff like that. But I also managed to exploit it a little bit. So you can add YOLO object detection models to the list of models that were run in code interpreter.

11:15 (Speaker H) There are some problems with memory management, so I'm not yet fully happy with the result. But yeah, I managed to run it on images and on videos, and the thing that is super cool and kind of, like, underrated right now is false positives. So when the model detects something that shouldn't be detected, you can really use text to ask code interpreter to filter out false detections.

11:48 (Speaker H) You can just give it your feeling, like why that stuff is happening, or when, or where. And it's very good at cleaning the detections, which was kind of, like, mind blowing for me. And one thing that I noticed that it sucks at: I managed to create an application that counts objects moving in the video when they cross a line.

11:55 (Speaker H) And I didn't use any off-the-shelf libraries, I just had a detector and said, okay, now draw a line and count objects when they cross the line. It's terrible at that - writing math logic to figure out that something crossed something. We had like a ten or twelve prompt exchange and I basically bailed out on that, forget it. So there are some things that blow my mind, but there are some things that probably do not.

12:49 (Speaker A) So folks, feel free to follow Skalski. And also, I just pinned to the top his brand new awesome code interpreter use cases git repo, and there's a list, there's a bunch of use cases there. This could also serve as a de facto manual. So feel free to go there, add PRs, and follow that for updates.

12:52 (Speaker A) And I want to get to Lantos because he seems to be unmuting. What's up, Lantos?

13:12 (Speaker H) I was just going to say I can't follow him because he's blocked me.

13:15 (Speaker C) Sad face.

13:16 (Speaker H) Oh, no, I noticed that, but I'm not sure why. I will undo that.

13:20 (Speaker A) All right, I'm the peacemaker in the space. Please kiss and make up. You two as well. Everybody should get along.

13:26 (Speaker A) Yay. I want to get to some other folks who came up on stage recently.

And Gabriel, welcome to talk about code interpreter and your use

cases.

13:35 (Speaker A) Jeanette, if you play with this, I would like to hear two more

opinions before we move on to the next incredible thing. Yeah. Oh,

you guys are talking about let's get together and then June sorry, I

should have been explicit about the order.

13:54 (Speaker E) No worries. So I just posted a comment on this space about the

message cap on a conversation. So even though in the UI, it still

says 25 messages per 3 hours, if you look at the network request, you

can see that. And I posted this, it's actually 100 messages per 3

hours now.

14:12 (Speaker E) And I don't know if they're scaling that up and down as demand

increases and decreases, or they're just trying to trick people into

conserving their messages, but it's definitely been on 100 for a

little while now. Can you confirm same thing you can see in the

network?

14:32 (Speaker A) Can you confirm the same for the regular mode, or do you think the

regular mode is still restricted? Well.

14:41 (Speaker E) Based on just the fact that there's only one message cap, they don't

have message cap per model. So I think it's just consistent across

all the GPT-4 models. And that's also my experience in the last

it's been a little while now. It's probably at least a couple of

weeks that it's been higher.

14:51 (Speaker E) And same thing we discussed, I think, on Saturday about the context

window. And you can also see it in the API that the context window is

8K for plugins and code interpreter, and 4K for the base GPT-4 model.

15:16 (Speaker A) That's awesome. Sweet. Better in every single way.

15:22 (Speaker D) Yeah.

15:23 (Speaker A) Awesome. Thanks.

15:24 (Speaker E) Yeah. In terms of use cases I can share, I've been digging around a

lot in the code interpreter, and I was really trying to hone in on

why are the packages that are installed there, the Python packages in

the environment? Why are they there? Some of them seem really random,

and some of them make a lot of sense. And they released it, saying

it's for, basically data analysis. And a lot of them make sense for

that, but some of them are just really wild, like the ML packages.

15:54 (Speaker A) And Gabriel, folks in the audience, if you look up at the jumbotron where we pin tweets, two tweets back there's a tweet by Peter Zero Zero G, who actually printed all the packages and asked GPT-4 to kind of summarize what they do. So if you have no idea about the potential capabilities of what it can do, feel free to pin that tweet for yourself, and it has a bunch of descriptions of what's possible.

16:11 (Speaker A) So go ahead. Gabriel. Yeah, cool.

16:28 (Speaker E) Yeah, I've done the same kind of thing with just a short yeah, I got

it to do a four word description for each one. So if you're looking

for a really short description of each package, I'll post that tweet.

And if you're looking for a long one, I think Peters is great. And

what you can see there is that there are packages for web

development, right? There's Fast API, there's Flask, there's a bunch

of other packages for Web development.

16:40 (Speaker E) And besides the fact that there's no network access, which obviously

other people using it might be turning it on, but it was just

interesting to me. My perspective is that OpenAI has been using this

internally throughout all their teams for development and testing it

internally, but probably also using it pretty consistently. They

probably have access to the Internet.

17:14 (Speaker A) Yeah, I'm sure they have access to.

17:15 (Speaker E) The Internet, and they can install new packages. But I think they also have the ability, instead of uploading files and downloading files, to just mount, not persistent memory, I don't think, but their local working directory on their computer, wherever they're working. So they have their active directory where they have their project, and they just mount that and give code interpreter access to the whole directory, with the whole repo of their project.

17:48 (Speaker C) Yeah.

17:48 (Speaker E) And then ChatGPT is just writing code to the working directory and reading from there, and it can explore their whole project. We can do that now by uploading: you can zip your whole project, upload the whole thing zipped, and have it unzipped, and then it can kind of explore your whole project. But then once it makes some changes and you want to commit them, you have to ask it to zip the whole thing back, download it and upload it.
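A rough sketch of that zip round-trip on your side of the workflow (all paths here are placeholders):

```python
import shutil
import zipfile
from pathlib import Path

project = Path("my_project")                 # your local repo
upload = Path("my_project.zip")              # what you drag into code interpreter
downloaded = Path("my_project_edited.zip")   # what you ask it to zip back up for download

# 1) Before the session: zip the whole project for upload.
shutil.make_archive(upload.with_suffix("").name, "zip", root_dir=project)

# 2) After the session: unzip the archive the model produced into a working copy.
if downloaded.exists():
    with zipfile.ZipFile(downloaded) as zf:
        zf.extractall("my_project_from_chatgpt")

# From here you can diff the two trees and commit only the changes you actually want.
```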

17:48 (Speaker E) And then I think what they're able to do is more of like a kind of

pair programming thing where the developer makes some changes and

then Chat GPT makes some changes and they're kind of working

together. This is taking it one step further. I don't know if they

have this or not, but it would be super.

18:29 (Speaker A) Cool. In the realm of updates, this is speculation, but I would love to explore it more with you later, because this applies to open source; somebody already tagged us after the last space and said, hey, I'll build this open source. I would love to pin this to the top of the space. However, I want to move on and then get to other updates.

18:51 (Speaker A) Sorry to interrupt, but thanks. I think that the collaborative, persistent code superpower will probably, at some point, come to us as well. Plus the internet access is like another ten x. I want to get to Skalski and Lantos, and I think we'll move on to Claude.

19:08 (Speaker A) Thanks Gabriel.

19:11 (Speaker H) Yeah, I have a question. I'm not really sure, guys, if you noticed that. I was obviously experimenting with PyTorch because I needed it for computer vision, and I noticed that the PyTorch version installed in the environment is actually pre-compiled to work with CUDA. So it's a GPU version of PyTorch.
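You can check this yourself inside the sandbox with a couple of lines; the exact version strings you will see are obviously not guaranteed:

```python
import torch

print(torch.__version__)          # e.g. something like "2.0.1+cu118" for a CUDA build
print(torch.version.cuda)         # the CUDA toolkit the wheel was built against, or None for CPU-only
print(torch.cuda.is_available())  # False in the sandbox: a CUDA build, but no GPU attached
```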

19:31 (Speaker H) Even though that in the environment you don't have access to GPU, you

only have CPU. So I'm curious guys, what you think about that. Why is

that? Any ideas?

19:42 (Speaker A) An idea that just comes from what Gabriel just said: likely we're getting the same Kubernetes container. However, the OpenAI folks have, like, unlimited stuff. They probably also have CUDA, that would make sense, right? Theirs is probably connected to a GPU as well, but that's just an idea. Lantos, I want to get to you and then we'll move on to Claude.

20:02 (Speaker A) Folks and folks in the audience, feel free to hit the little right

button on the bottom left looks like a little message and leave

comments through commenting as well. Moving on to Claude V Two. Folks

in the audience and folks on stage, feel free to hit up the emojis

plus one.

20:19 (Speaker A) Minus one if you have tried Claude V2 and didn't like it. I'm going to cover this anyway, because somebody, I think Roy from Python, called me a Claude V2 fanboy yesterday, and I first got offended and told him that I'm just a fanboy for 24 hours; before that I was a code interpreter fanboy. And then I asked myself whether or not I am a fanboy of Claude V2.

20:43 (Speaker A) And yeah, I am. And Swyx told me to relax, and in fact I invited him here to be the wet blanket on the other side of the fence. Anthropic, the company that we can definitely consider number two after OpenAI, I think that's fair in terms of quality,

21:02 (Speaker A) have now released a new Claude version. They made some waves when they released Claude with the 100K context window, and now they have released Claude V2, and let me pin some Claude things in the jumbotron. Claude V2 shipped with multiple things, and I want to focus on two of them; I think we'll cover the UI first and then we're going to talk about the model itself, UI-wise and product-wise. My hot take, which I'll pin to the top

21:38 (Speaker A) and which I unfortunately will not debate, but I love you, all of you, is that as a product, Claude V2 right now beats ChatGPT. My mom can go to the two websites and she'll prefer one versus the other.

21:51 (Speaker A) Or my friends who aren't as plugged into AI as we are: theirs is free, and I think Claude V2 beats GPT-3.5, which is also free. And a 100K context window, with the model trained on 200K, unleashes a bunch of use cases that were not possible before.

22:12 (Speaker A) It just frees you up. You heard Skalski just talk about the limitations of code interpreter; a bunch of those limitations stem from the 8K context window.

22:13 (Speaker A) If you print a bunch within the code that you're doing, code

interpreter sometimes forgets what you guys talked about 20 minutes

ago. And the 100K context window also means a long, long conversation

history with the model. And I think it's really great.

22:37 (Speaker A) Not to mention that you can drag and drop full books in there. Those books need to be in like one or two files, and they still don't accept zip files. I'm planning to release an extension soon that does this for us and unifies them into single files.
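Until that extension exists, a minimal sketch of the flattening step (file names and the separator format are just one way to do it):

```python
from pathlib import Path

def flatten_files(paths, output="combined.txt"):
    """Concatenate several text files into one upload-friendly file, with clear separators."""
    with open(output, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(f"\n===== FILE: {path} =====\n")
            out.write(Path(path).read_text(encoding="utf-8", errors="ignore"))
    return output

# Example: merge two chapters into a single file you can drag into Claude.
flatten_files(["chapter1.txt", "chapter2.txt"])
```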

22:51 (Speaker A) So hopefully by next week we'll have some updates. However, once you upload that much, or you upload like a transcript of a podcast, you can do a bunch of stuff, because Claude V2 is also better trained on code and we saw a significant jump in... wait, I'm switching to the model, so let me get back to the UI. The UI allows you to upload files.

23:09 (Speaker A) The UI has a command k interface, which I personally love. I hit

Command K in every website and see if they support it. You can just

start a new chat real quick.

23:21 (Speaker A) It doesn't have Share, but it's definitely refreshed and free UI.

It's called Cloud AI and that's the URL, and if you haven't tried it,

definitely try it. Comments about just the product side and the UI

side before we move to the model? Anybody play with this? Anybody

like it? Anybody loves the upload files feature? I would love to hear

hands and comments.

23:42 (Speaker A) Go ahead, Matt.

23:44 (Speaker D) A bit of a weird thing, but what I've noticed is it's actually quite

frustrating if you want to paste text in it actually, if it's over a

certain length, will paste in as a file. Little small thing.

Hopefully they'll change it, but it is really annoying because then

you can't edit it. ChatGPT does that much better, but I generally agree with you that overall the product experience on Claude is

24:03 (Speaker A) Significantly better, yes, with the new one, the fresh coat of paint they released for us. I will say that Claude so far was kind of a hidden gem: only folks who got access to the API actually got access to their UI, and that UI was very restricted; folks who have access to the Claude API know what I'm talking about. I think that UI is still around.

24:22 (Speaker A) It still shows your history. It's like very restrictive. It's not as

cool as this it's not as leak as this.

24:27 (Speaker A) So we like cloud AI, definitely a plus. Check it out. Now, let's talk

about the model behind this UI, because that model also changed and

several incredible things that changed with it.

24:38 (Speaker A) First of all, they released a new model, same price as the previous

one. We love to see this. Please everybody, including OpenAI, continue giving us the same price, and cheaper and cheaper down the line.

24:41 (Speaker A) We love to see this. Second of all, they claim it's been fine tuned

on several things. One of them is code.

24:54 (Speaker A) And we actually saw a bump in the evaluation called HumanEval, which is a set of questions that OpenAI released; I think the bump was from like 55% to 78%, which I think beats GPT-3.5 and is not quite there compared to GPT-4. Correct?

25:14 (Speaker C) Yeah, and it beats GPT-4 on pass@1, on the first attempt, not GPT-4 when it's allowed to refine and fix things, but on the first trial. Yeah, by a little bit.

25:33 (Speaker A) So, that's news to me, and thank you for joining in. The pass@k numbers are about how many times it's able to reflect upon its answers and improve them.

25:43 (Speaker C) The pass@k is kind of what I meant. With reflection, GPT-4 is even stronger: if GPT-4 sees the exception, it can come up with a solution. This is not in the HumanEval test, but if you use GPT-4 this way, you get to ninety-something percent, which I think is more realistic if you think about it. No programmer writes the whole code in one go.

26:10 (Speaker C) You write it iteratively, fix bugs and so on. And also in code interpreter, you see it. But it is remarkable to see state

26:19 (Speaker A) Of the art on pass@1, and it's significantly better at code. And I suggest folks who previously tried Claude and weren't impressed to try it again. An additional crazy thing: they serve a 100K context window, and they've actually trained, they claim, on a 200K context window, so twice as much as the previous round. And we follow this one guy, Ofir Press, the guy behind Self-Ask with Search and the guy behind ALiBi, the ability to extend context windows.

26:55 (Speaker A) He just defended his PhD, and he talked about context windows, and he was impressed with the way they presented it and the way they showed their loss curve. And we saw the paper maybe this week, folks saw the paper where recall dips in the middle: there's less attention paid to the middle than to the beginning and the end.

27:03 (Speaker A) And it looks like that's not the case for Claude. So I suggest you try the huge context window. Al, you have your hand raised, and then we'll talk about some other model changes.

27:26 (Speaker F) Yeah, I'll talk a little bit about that. I used Claude about a month and a half ago to win Best Solo Hacker at the Craft Ventures hackathon, David Sacks' one. It had like 200 entries. It's exceptionally good at creative writing and also at comparing and contrasting. I don't think people have really taken advantage of what the context window is capable of doing. It's more than just loading single files in.

27:53 (Speaker F) So what I did for the project was I loaded these large legislative

bills, these like 50 page unreadable bills, and you turned them into

relatable narratives. So one of the things that Claude can do is you

can adopt a persona. So a lot of times with summaries, summaries just

compress the text that you see, but you can tell it to say, write

1000 words from a social conservative point of view, or a bus

driver's point of view, or a social liberal point of view.

28:21 (Speaker F) And what that does is it takes all of its knowledge about the outside

world and gives you not a summary, but it gives you essentially an

essay about the practical effects of something like a bill. I've

actually been working with the idea of reading a book and having it

tell you what I would have learned from this, because that's actually

probably what you're more interested in. What it can do in terms of

comparing and contrasting large essays is exceptional.

28:51 (Speaker F) So you could have it say, write 2000 words from a social conservative

point of view, 2000 words from a social liberal point of view, and

then have it contrast the essays, which is something that would be

very difficult for a human to do. So you get to give it multiple

files and have it just give you a more balanced approach so you get

rid of some of the bias that comes in.

29:18 (Speaker A) My dream, my go-to dream project that I never get to, is to create this for Twitter as a Chrome extension where I can select a bunch of tweets and then say, remove the bias from this and just give me the debiased version of all of it. Yeah, completely. The cross-referencing ability of Claude, because of this context window, is incredible for many, many use cases.

29:41 (Speaker F) Yeah, I would say that so far it's not as good as GPT-4 for certain things. But that context window is fantastic. And I would say

a lot of people that are using embeddings and retrieval, you can

actually just put the whole thing in the context window and ask

questions to that and then you have a baseline to compare your

results from it. Most people, if they're chatting to a website or

something like that, you actually can just put the whole thing in

there as opposed to trying to chunk it up and do questions and you'll

see that your results are much better that way.

29:51 (Speaker F) And for most people, that would be good enough.

30:17 (Speaker A) An additional thing that Claude was trained on: they've talked about the output tokens, just the number of output tokens Claude is able to generate. And they've said that previous Claude models were focused on shorter outputs, just as they were trained (I don't know if the same is true for GPT, I haven't seen numbers on GPT-4), and this latest model was trained to output up to 4000 tokens.

30:47 (Speaker A) This is added to the fact that they also fine-tuned it and trained it to output JSON files, complete JSON files, as responses, which we as engineers have waited for. OpenAI gave us functions, kind of "here you go, there's the function interface", and we love the function interface, but the function interface kind of locks us into the OpenAI ecosystem.

31:04 (Speaker A) And it's great to see another model that's very close to state of the art on HumanEval and is now also fine-tuned to respond with full, intact JSON. And those JSONs can be 4000 tokens in length. Any thoughts on this?
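One hedged sketch of how you might lean on that in practice, regardless of provider: ask for JSON matching a small schema and validate it before trusting it. The call_model function below is a hypothetical stand-in for whatever client you use, not Anthropic's actual SDK:

```python
import json

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for your Claude (or other) API client; returns raw text."""
    raise NotImplementedError

def get_json(prompt: str, required_keys=("title", "summary"), retries=3) -> dict:
    instruction = (
        prompt
        + "\nRespond with a single JSON object containing the keys "
        + ", ".join(required_keys)
        + " and nothing else."
    )
    for _ in range(retries):
        raw = call_model(instruction)
        try:
            data = json.loads(raw)
            if all(key in data for key in required_keys):
                return data
        except json.JSONDecodeError:
            pass  # malformed output; ask again
    raise ValueError("Model never returned valid JSON")
```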

31:28 (Speaker F) Yeah, I can confirm on it being able to write large amounts of

output. I mean, I was having it write like 2000, 3000 word like sort

of essays and outputs and it was fine with that.

31:40 (Speaker A) Yes. And I think it's I'm going to.

31:45 (Speaker B) Stick with GPT Four myself. But this might be pretty useful for just

dumping in an entire code base, given the 100k context window and

then getting some reviews and stuff, and then maybe moving some of

the stuff.

32:02 (Speaker A) Once I stop posting statuses and build that Chrome extension where you upload the zip and it flattens it into one file and then upload that, then we'd be able to do a proper comparison, because code interpreter can take zip files and then extract them. Oh, one difference that I want to flag for folks in the audience: GPT-4 with code interpreter allows you to upload zip files, et cetera, we talked about this, but it does not load them into the context window, right? There's like an 8K context window.

32:30 (Speaker A) The files that you upload are not automatically in the context window. The model has to write Python code that actually prints the files, and it usually does just the first few lines, hint, hint.

32:30 (Speaker A) Folks in the audience, you get my drift. But it doesn't usually read all of it unless you specifically ask it to, and Claude does. So everything you upload to Claude goes directly into the immediate working memory of the context window.

32:38 (Speaker A) And that's a major difference to watch out for and also take care of.

Go ahead.

33:00 (Speaker C) I would like to ask everyone before I say my opinion, what do you

think about it in comparison to GPT Four about the performance? What

do you think?

33:10 (Speaker A) I would like comments from folks who actually use both and did the

comparison. And before I get to folks, please raise your hand to

answer. I want to call out Swyx's small menu bar app, which allows you to... actually, Swyx, can you give us a brief two minutes on the menu bar thing?

33:28 (Speaker G) Yeah, well, you don't have to choose. Just run it all the time on

every single chat. So it's a little electron app that runs in the

menu bar. And I've been maintaining it and I just added Claude 2 this week.

33:42 (Speaker G) Claude 2 is not super stable yet. Sometimes it will fail to submit, so you just have to retry manually and hit the submit button again.

33:50 (Speaker G) But yeah, it's a great way to A/B test models, and also just to amplify every question across four to five different chat models and their answers. So I've been trying it. It's up to you if you want.

34:07 (Speaker A) To.

34:10 (Speaker C) Find it.

34:14 (Speaker A) With the announcements, if you can. Yeah, awesome. Yeah, just basically, and maybe for instance, you don't have to stop using, you don't have to choose. So I think the last thing that we need to acknowledge about Claude is the multilinguality.

34:28 (Speaker A) So they actually focused on showing us how much better the new one is than the previous ones, and they posted BLEU scores; Claude 2 is significantly better at languages than the previous versions. To answer your question, I think it's close to GPT-4, if not better at some things. Hebrew goes fluently, and usually Hebrew is not that great.

34:57 (Speaker A) Russian and Ukrainian that I use also go fluently. And that part is

really good with a lot of context because you sometimes need to do a

lot of translation, or at least I need to do a lot of translation.

35:11 (Speaker C) Yeah, multilinguality works great. I was surprised. Absolutely. What

I think if you just compare the two on the same prompt, the same

question, I have a feeling that GPT Four is slightly better, but I

just don't have an example to tell you.

35:31 (Speaker C) Okay, here I don't know, it's a strange situation, but I really

wanted to ask you, like, what did you try and work better here and

there?

35:38 (Speaker A) So here's my use case that GPT-4 currently cannot do. Yesterday, Lex Fridman interviewed Israel's Prime Minister Benjamin Netanyahu, in one of the weirdest turns of history for this podcast, and given that I kind of know who Benjamin Netanyahu is from before, I decided not to listen to it and instead to use the tools that we have at our disposal. So I ran it through Whisper with diarization, so I have a very nice transcript of who's talking.
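If you want to reproduce that pipeline, here is a rough sketch using openai-whisper plus pyannote for diarization; the model names and token are assumptions, and the naive midpoint alignment is mine rather than any official recipe:

```python
import whisper
from pyannote.audio import Pipeline

AUDIO = "interview.wav"  # placeholder path

# 1) Speech-to-text with timestamps.
asr = whisper.load_model("medium")
segments = asr.transcribe(AUDIO)["segments"]

# 2) Who spoke when (needs a Hugging Face access token for the pyannote model).
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token="hf_...")
turns = [(turn.start, turn.end, speaker)
         for turn, _, speaker in diarizer(AUDIO).itertracks(yield_label=True)]

# 3) Naive merge: give each transcript segment the speaker whose turn covers its midpoint.
def speaker_at(t):
    for start, end, speaker in turns:
        if start <= t <= end:
            return speaker
    return "UNKNOWN"

for seg in segments:
    mid = (seg["start"] + seg["end"]) / 2
    print(f"{speaker_at(mid)}: {seg['text'].strip()}")
```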

36:10 (Speaker A) I took that and just dumped it as a text file. And I agree with Matt, it's a little bit annoying that Claude turns whatever you paste into a little text file upload, because then you can't edit it.

36:21 (Speaker A) However, I uploaded that transcript directly to Claude, and then I asked it to do sentiment analysis and entity extraction, something that, if I'd asked GPT-4 code interpreter, it would probably write some Python code to do, and Claude just kind of did it. And I haven't seen GPT-4 being able to do this for bigger files.

36:38 (Speaker A) And once I, let me just finish this point. I continued, because of the new coding abilities of Claude, by asking it, hey, print me a Python file that takes whatever table of topics he mentioned, with sentiment, negative or positive, and dumps it into a word cloud. That's something the code interpreter can actually do and show you.

37:03 (Speaker A) But I asked it of Claude, because previously Claude was s**t at coding, and it gave me Python files that ran on the first try. I didn't have to change anything, there were no bugs. It then showed me a word cloud of everything that was mentioned by Bibi in that podcast, and it all took maybe seven minutes.
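The script Claude actually produced isn't shared here, but the kind of thing it wrote is roughly this; the topic table below is made up purely for illustration:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Hypothetical output of the "topics + sentiment" extraction step: topic -> mention count.
topics = {"topic one": 12, "topic two": 9, "topic three": 7, "topic four": 5, "topic five": 4}

cloud = WordCloud(width=800, height=400, background_color="white").generate_from_frequencies(topics)

plt.figure(figsize=(10, 5))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("topics_wordcloud.png")
```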

37:11 (Speaker A) And I don't know if, for bigger context windows, GPT-4 can currently do this. Go ahead, Al.

37:28 (Speaker F) Yeah, I've actually been putting a lot of podcast transcripts in there, and because it sees so much about the speakers and it knows about the speakers, you can actually have them continue a discussion about things that they didn't actually

discuss. Yeah, so it's like you can have it say, okay, well, what are

some topics they disagreed on and then some things that they didn't

cover? Tangentially, you can just have it give you another two

minutes of interview and it does a pretty reasonable job, especially

with public figures that it actually has a lot of their background

on. So it's pretty interesting.

38:01 (Speaker A) And not to mention free: GPT-4 needs a $20 a month payment and Claude is free.

38:08 (Speaker F) That's a good point, too. For those of you that have eval keys,

you'll notice that they're actually not charging you for them, so you

can actually go on as long as you want. The limitation is that you

can only do one request per organization. So if it's just a single

person, they only charge you basically when you start deploying for

commercial purposes.

38:21 (Speaker F) So that's something that people may not have realized.

38:32 (Speaker A) So I think we've covered everything, right? Trained on 200K context, which they could enable for us tomorrow, and we'd get like 2x; it's going to be insane. There is some stuff that they have at Anthropic called Constitutional AI, so they have a mix of RLHF and Constitutional AI, and they're working on their model to actually be more helpful, but also safer and less jailbreakable.

38:57 (Speaker A) They talked at length about this. We talked about the better HumanEval, the same price, and the free playground. I think we've covered most of it.

39:03 (Speaker A) So anything else about Quad that we haven't covered, feel free to

raise your hand and tell us, and if not, I think we can move on. What

do you guys think?

39:17 (Speaker G) I'll mention briefly, did you talk about the multiple file uploads?

39:21 (Speaker A) No, go ahead.

39:24 (Speaker G) So I think there's just an interesting difference between code interpreter and Claude. In code interpreter you can only upload one file, right? But it can be a zip file with multiple files inside, so it's de facto multiple files, but then you can only run code on that. Whereas what Claude is doing here is something slightly different, which to me is interesting: you can upload multiple files, it just reads the files straight into the context, and it's using that 100K context to synthesize answers.

39:24 (Speaker G) So you can do, for example, PDF A and PDF B and give me a comparison

between the two of them or synthesize knowledge across them. And I

think that is something that code interpreter cannot do because code

interpreter will only run code across files. So I think that's

noteworthy.

40:15 (Speaker G) It's Claude genuinely coming up with one new thing that is not copying ChatGPT, and good for them.

40:23 (Speaker A) Yeah. And unfortunately no zip allowed. But we're going to fix this

with an extension and hopefully talk about this next week. I want to

say hi to Weather Report.

40:33 (Speaker A) Feel free to chime in. Sorry, you raised your hand to come up before. So if you have a comment about code interpreter, we've moved past it, but if you have a comment about Claude, feel free to tell us. What's up, Weather Report?

40:46 (Speaker A) Actually, I had only one thing about code interpreter that in the

previous space I talked about that there was a hypothesis I had about

code interpreter, which.

40:56 (Speaker B) Is to use it as a huddle because it's recorded.

40:59 (Speaker A) We'll move on and let's talk about code interpreter next time. I think some folks are saying that their audio is glitching and so they're not able to talk, and I want to see... I think Joseph has a comment about code interpreter. Joseph Polak, we'll give him a second to log in, and then I think we'll move on to other updates, because we have many other things to talk about.

41:29 (Speaker A) What's up, Joseph? Welcome to stage.

41:31 (Speaker G) Hi there, folks.

41:33 (Speaker A) Thanks for taking my question. I didn't even know all about that code

interpreter stuff with the file.

41:40 (Speaker G) So I'm really happy to have heard it. About Claude, though.

41:46 (Speaker A) For Claude, well, I'm still on the waitlist. First of all, it's free now, you can access it right now: claude.ai. There's no waitlist anymore, unless you don't live in the States, and then you'll have to get a VPN. Okay, I'll definitely check that out.

42:03 (Speaker A) My question was about using Claude and actually code interpreter through the API. Do you think that's ever going to exist, or is it coming? So, Claude's API, I think that's waitlisted. I have talked with the Claude folks and they said the waitlist is now moving faster.

42:24 (Speaker A) So they are ready to get more people in. I think because of the new

safety updates, they're less afraid. So definitely apply for the

waitlist on quads account.

42:35 (Speaker A) Code interpreter is not available via API, and we've seen some folks who hacked it together with, I think, a browser plugin that proxies something. Swyx, I don't know if you remember the unofficial, quote unquote, code interpreter API and how to access this, but it's not available in the official OpenAI APIs as of yet. We haven't seen it.

42:56 (Speaker G) No. For the record, there's no unofficial code interpreter API.

There's the browser side thing that we are trying to but nobody's

made any.

43:07 (Speaker D) Adapter for it yet.

43:08 (Speaker G) I think you can, if you want, using puppeteer.

43:12 (Speaker A) I would definitely not recommend it. If anything, there were some folks who tagged us, and I need to go and find this, who are working on an open source version of code interpreter that uses LLMs and stuff. And that one will likely be the way forward if you do want something programmatic that has code interpreter capabilities. Go ahead, Nisten.

43:35 (Speaker B) There's also Chatbot UI on GitHub. So yeah, for the other people that

are hacking something together, I'll wait until there is something

public before, because then.

43:45 (Speaker D) We don't know everything.

43:47 (Speaker G) Open source is going to be worse. Because you are missing the model.

43:51 (Speaker A) Yeah, because we think that it's fine-tuned on actually knowing how to run code, right? That's kind of the highlight that we got from the last space. We think it's smarter because of that.

44:01 (Speaker A) And one of the main things, again, sorry, going back to code interpreter just real quick: it is able to then fix itself and say, oh, oops, I made a mistake, let me try again. Matt, I saw you unmute yourself.

44:13 (Speaker A) Feel free to go ahead.

44:16 (Speaker D) Well, yeah, just a quick thing. So from what I know, OpenAI will be offering fine-tuning relatively soon. So at that point, you theoretically could go and fine-tune your own code-interpreter-like model, even if they don't offer it.

44:31 (Speaker A) You can also, theoretically, not that we would recommend it, but theoretically, right now you could start distilling some stuff from code interpreter by asking it questions, having it generate code and store it to a file, asking it to let you download that, and then, quote unquote, generating the dataset. Not that you should, but you theoretically can, so that when it's time to fine-tune, you have some dataset.

44:52 (Speaker D) Yeah, theoretically. I don't know if ShareGPT currently supports those types of conversations, but if it does, I'm sure that's going

to happen really soon.

45:00 (Speaker G) I don't think it's maintained, because ChatGPT itself... well, I don't want to speak for ShareGPT. I know Steven, but I can help you move the conversation back to Claude.

45:11 (Speaker A) Yes, please. Let's move back to Claude. Thank you.

45:14 (Speaker G) So, just between us, how many people are listening to this chat anyway? I think it's like 60 people. Email support@anthropic.com for the Claude API.

45:26 (Speaker A) Yes, email them, state your use case and they'll likely get you in, and you can use Swyx's menu bar to run them in parallel with the megaprompt feature. Megaprompt, superprompt, what is it called? I think Swyx dropped off. There is one prompt that you type and then it goes to all the models. I want to recognize some folks in the audience.

45:50 (Speaker A) Hey, feel free to request if you

45:52 (Speaker D) Want to come up.

45:52 (Speaker A) Obviously, I saw some other Euro I saw in the audience. Max AI.

Welcome, Dexter. There's a bunch of folks who are usually here and

it's great to see, and I think we're moving on to a very spicy one.

46:06 (Speaker A) What do you guys think about xAI? So I'm pasting the summary of the people. Elon Musk and a bunch of other folks have announced xAI, essentially their answer to OpenAI.

46:22 (Speaker A) We've all seen Elon kind of talk about safety and talk about helping

open Xai and then could not be open since then. He talked about truth

GPT at some point. And finally they announced Xai as we were talking.

46:37 (Speaker A) By the way, I have a notification from xAI that they're going to have a space tomorrow to go deeper into xAI. But so far there's not a lot of detail. There are some details about the folks who work there.

46:50 (Speaker A) So they have folks who wrote the Adam optimizer, and other folks. Any thoughts about xAI before we get to hear what they do? Obviously, there's no product yet.

46:59 (Speaker A) I don't think they've started training. The one thing that I will say is that they will have premium access to Twitter, obviously, because Twitter is now rebranded under X. After closing down the APIs and closing down scraping for Twitter, xAI will now have an insane dataset to train on from Twitter.

47:21 (Speaker A) And we wish them, quote, unquote, good luck. I would love to hear

from folks on stage. What do you think about the announcement, the

direction, the people? And we're going to wait for tomorrow to

actually hear them talk.

47:24 (Speaker A) I know. NISten, you have some ideas if you want to share to get

started.

47:40 (Speaker B) Well, this is more of an old lady babushka opinion, just talking about stuff. I found it interesting that they went from, what was it, basically straight-up taking on GPT-4 and this entire competition, to doing something more noble, like dedicating it to being better at math and discovering new things in physics. The way I see it, that's pretty noble. But at the same time, I feel like that's a result of having problems hiring in order to be competitive with the other ones.

48:26 (Speaker B) So, yeah, this will be interesting. But the way I see the whole set

up right now is, as the kids say, it's pretty mid, in my opinion.

48:39 (Speaker A) As the kids say. I will say that we will see tomorrow from their space. They're probably going to use Elon's clout to try to hire, and it's probably harder now to hire because everybody knows how quickly people are getting fired and how it's not, like, super fun to work for X. But we're in for a nice ride, because they do have access to cross-pollination from Tesla as well, right? So if they have big questions, Tesla does still have a few good folks, even after Andrej Karpathy left, and they'd be able to ask them for assistance.

49:20 (Speaker A) There's obviously the whole Dojo thing in play, which, I don't think we have time to talk about Dojo, and it's not new, but there could be something there. Gabriel, you wanted to come up? Maybe you have thoughts. Yeah, go ahead.

49:33 (Speaker A) Gabriel.

49:34 (Speaker E) Yeah, I was just going to say about xAI, I mean, you mentioned

Twitter's data, and I'd be interested in hearing other people on the

stage opinion on this because recently there's been a lot of work

done on quality of data over quantity of data. And of course, Elon

also has a ton of GPUs. Reportedly, he's bought tens of thousands of

GPUs. So that's definitely important in building these big models.

49:58 (Speaker E) But I'd be interested in hearing from people on the stage if they

think Twitter's data and the kind of data that Twitter has is

actually going to be really powerful for training good models.

50:11 (Speaker A) Anybody wants to take this?

50:13 (Speaker F) Yeah, I'll take a little of it. One of the things that Twitter has

that other people don't is that people are actually debating issues.

So I think that's one of the reasons why he's really focused on the

idea of Twitter being a source of truth and being sort of

unrestricted so that you're not just following like, one thread, you

watch the narratives being debated and he has access to all that.

50:35 (Speaker A) Data and Community Notes. And it's really hard to scrape. I don't think it's accessible via API at all, and it's not super simple to scrape at all.

50:42 (Speaker A) I want to get to Yam; before that, I think Matt wanted to unmute and go, and then Yam. Matt, if you still want to chime in, and then Yam.

50:53 (Speaker D) Yeah, I mean, nothing too much to add here. I think the community

notes are very interesting as a way to sort of like, reduce

hallucinations. I think one of the things that they're going to want

to do heavily is invest in sort of filtering that data set because

there's a lot of great stuff on Twitter. There's a lot of crap on

Twitter.

51:07 (Speaker A) A lot of yeah.

51:09 (Speaker D) And the more of that that seeps in, the worse the model is going to

perform. Obviously, scale is important, but data quality is

incredibly, incredibly important and the scale kind of doesn't negate

bad data quality. So I think if they do one thing right, it's going

to have to be getting the sort of filtering of the data set down. But

they do have a ton of incredibly high quality data.

51:27 (Speaker A) Yes, I think Yam was next, and then we have a few folks who wanted to come in. I think Pharrell wanted to come up. So Yam, and then Pharrell, and then Gabriel.

51:37 (Speaker C) I just want to say, of course, if you just take Twitter data and

start training your model, you can expect it to be average Twitter,

which is not what you want. What you can do, which is a gold mine, is

to transform this data or just rephrase it as other forms. And this

just makes the data a gold mine because Twitter does have very high

quality content here and there. Absolutely.

52:05 (Speaker C) If you can transform it and rephrase it into a different form. If you want an example, the paper "Textbooks Are All You Need": basically, they just take data and make it into a tutorial, make it into a textbook, perfect, clean and everything.

52:22 (Speaker C) It is very easy to do, and you don't need a powerful model to do

that. You don't need chachi PT. You can use it to do it with a small

model.
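As a sketch of the kind of pipeline Yam is describing, with rephrase_with_small_model as a hypothetical stand-in for whatever small model you use, and the prompt wording as just one example:

```python
def rephrase_with_small_model(prompt: str) -> str:
    """Hypothetical call into a small local model used purely for rewriting."""
    raise NotImplementedError

REWRITE_TEMPLATE = (
    "Rewrite the following social media post as a short, clean, textbook-style "
    "explanation of the underlying idea. Remove slang, handles, and hashtags.\n\nPost: {tweet}"
)

def build_training_corpus(tweets):
    """Turn raw posts into cleaned, tutorial-style text for pretraining or fine-tuning."""
    corpus = []
    for tweet in tweets:
        rewritten = rephrase_with_small_model(REWRITE_TEMPLATE.format(tweet=tweet))
        corpus.append({"source": tweet, "text": rewritten})
    return corpus
```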

52:30 (Speaker C) I'm currently, off the record, doing it myself for a large model I'm training. It doesn't matter anyway. It's a gold mine.

52:43 (Speaker C) What I'm saying, it's a gold mine.

52:45 (Speaker D) About Twitter.

52:46 (Speaker A) One additional thing before I get to Pharrell and then Gabriel. Nisten and I talked about this yesterday at length in our late night line cook space, the one that's not going to be scheduled; if you guys are on, feel free to join that one.

53:00 (Speaker A) Twitter Spaces is also a gold mine. Transcribing Twitter spaces and

seeing all the reaction emojis that they have in real time. Like the

space that Elon ran with RFK Jr. For example, if you know in the

audience who are actual people instead of bots, and you're able to

get like emoji reactions in real time, that's a definite, definite,

very high signal kind of training set that they have and almost

nobody else has.

53:25 (Speaker A) Pharrell, you are next, I think, and then Gabriel. Yeah, I wonder what

53:30 (Speaker D) The relation is and how useful the Twitter data will be for their

goal of building a sort of math reasoning machine. Right. Also, do we

know if they're open source, as in truly open source or not?

53:49 (Speaker A) No, we don't know yet. Hopefully tomorrow we'll be able to answer questions. However, we've seen Elon take Twitter's algorithm open source, and now he's boasting about this as a comparative advantage versus something like Threads. He's saying, hey, we're open source.

54:07 (Speaker A) If you go to Threads, you're under Zuck's influence algorithm. So there is definitely an attempt at open source from their side, but we don't know anything about that beyond that. Gabriel.

54:17 (Speaker A) And then Johnny.

54:20 (Speaker C) Yeah.

54:22 (Speaker E) First of all, I think it's funny that Elon's s**t posting is

polluting his data set. I would say that.

54:34 (Speaker A) By the way, if there's anybody with the ability to detect s**t posting, it's them, right? They're going to be able to build a model that understands: this is a s**t post, this is somebody who made an effort to give us clean information. But sorry, go ahead.

54:49 (Speaker E) Yeah, that's exactly my point that I was going to make, that Elon was

on this crusade before he bought Twitter. And this is kind of why he

got forced into buying Twitter, because he was going after the bots

and he made a big deal about the bots. And I think they spent a lot

of resources on figuring out what's good content and what's bot content. And another thing is that we're each kind of experiencing a

different Twitter, right? Because we're within whether it's an ML

Twitter or Israel based Twitter, and there's many different

communities and their Twitter is very good at segmenting those

communities and figuring out which content belongs to what community.

54:55 (Speaker E) And they'll have the ability, I think, to segment this data and train

many different models that are good at different things because

they're in a literature community or in an ML community or MMA

community or whatever.

55:37 (Speaker A) I actually saw a map of like 5 to 7 million tweets all embedded in Nomic AI's Atlas. I don't know if you guys follow Nomic; they just recently announced a $17 million Series A, by the way, so kudos to Nomic, good friends, Andre and the GPT4All team. And they have an embedded map, from before the API was shut down, that they were able to siphon, et cetera.

56:00 (Speaker A) And Gabriel, what you're saying is actually visible in the embedding map. You can actually see those tweets and the different areas, like political Twitter; there was a journalist Twitter until all of the journalists started leaving. There's a bunch of different pockets of Twitter that we don't get exposed to, not to mention the different languages.

56:20 (Speaker A) There's a whole Japanese Twitter that's like insane. And people go

super, super hard. And translating is easy.

56:26 (Speaker A) We talked about Claude being able to translate. So they have a bunch

of very interesting data. And I think Zuck is also going after that

data with Threads.

56:31 (Speaker A) And I think this is the reason why we'll see Threads getting

continued work and we'll see a lot of investment from their side. But

comparing to Threads, and we talked about this yesterday: Twitter has back history and a lot of historical data that they can train on. Threads is fairly new as well.

56:54 (Speaker A) So definitely a bunch of interesting data sets. Johnny and then Lantos. Hey.

57:00 (Speaker H) So one thing I think about, when I think about the data from Twitter, that is potentially lacking in some of the other data sets, is colloquial language. Because what Twitter has that Facebook doesn't have, and a lot of other things don't have, especially from what you're talking about, like the historic data, is the way that people actually interact with each other. You know what I mean?

57:26 (Speaker A) Not only that, how it evolved as well, right throughout exactly.

57:35 (Speaker H) To be honest, I think the data sets from earlier are probably better and stronger, because it's just gotten out of hand since. But I agree with what was said, I'm not sure if it was Yam or who, about the filtering, because, all right, this is a black box, it's not open source. Elon has not been shy about his kind of response to what he perceives as wokeism and all of that stuff. I'll be super curious.

57:36 (Speaker H) I mean, there's a big team on this, but I will be super curious to

see what that bears out in the actual model. Because, God, there's

equal parts or more parts disinformation on Twitter than there is

information. So if we're talking about source of truth, that rings

some alarm bells for me, for me personally.

58:21 (Speaker H) So those are just my thoughts.

58:29 (Speaker A) Yeah. Thanks, Johnny. Lantos, go ahead, and then Gabriel. Let's finish with Gabriel and then we'll move on to the next topic.

58:36 (Speaker H) Cool.

58:37 (Speaker A) Yes.

58:37 (Speaker H) So I think it's going to be hugely bullish for this data. And from the perspective of relating idea space and people and the relations between those, I think that's probably going to be more of the GOAT information than the conversations, because you can build so much from that. Like dating, that's just one example, a dating thing. Or finding people, finding brain power, compute; that's going to be huge.

58:40 (Speaker H) And to touch on the open sourceness of the data, I think not open

sourcing it at some point is going to be hugely politically bad for

Elon to do.

59:23 (Speaker A) That's.

59:23 (Speaker H) My thoughts on that.

59:24 (Speaker A) Awesome. Thanks, Lantos. Gabriel, let's wrap up, and then, Matt, we're going to talk about some interesting stuff.

59:31 (Speaker E) Yeah, just on the kind of data. I think for those of us who ran,

like, the early versions of Llama before they got fine tuned in all

kinds of ways, and you run it, and especially the smaller models, you

put in a prompt and it spits out some generic Facebook type of

content. It sounds like a Facebook post of like a 15 year old or

something like that. That shows what you get when you use all this

kind of unfiltered data.

59:59 (Speaker E) But I think the interesting thing is that Llama was then fine tuned

in many different ways and some really powerful models are built on

top of it. So I think in some sense, almost any data is valuable in

the sort of pretraining stages and maybe you need really high quality

for the fine tuning, but I think that big volume might be really

useful, maybe not the most economical.

60:21 (Speaker A) So I want to wrap things up on why they potentially have a leg up versus not. We definitely know that Twitter was used to train other models that we currently use. We know this for a fact. This was the reason why Elon and Sam Altman, who used to be friends, are no longer friends and are s**t posting about each other.

60:40 (Speaker A) And the current models we use do use this data set, but it's old for them. It's no longer recent and relevant.

60:40 (Speaker A) And we know for a fact that Twitter is significantly biased, and probably the best place in the world for uncovering news as it happens, before the bias sets in, before the narrative sets in, before folks get their marching orders from MSNBC or from the other side on how to think about things. Twitter is really good at talking about issues as they arise, the second they arise. And I think that on its own is going to teach the models a great deal.

61:16 (Speaker A) Naval Ravikant, if you guys follow Naval, he always said Twitter makes him a better writer. So we definitely also know that tweets, in short form, condense information better. And if their model trains on that, obviously taking all the precautions we talked about before, bots and s**t posting, et cetera, if they're able to actually get this into the model, likely their model will be more up to date and more finely tuned in its reactions.

61:20 (Speaker A) So with that, I want to close. We'll see about xAI. It's definitely exciting, right? We're potentially getting another big one, potentially an open source one.

61:20 (Speaker A) So we'll see. I'm going to wrap up this update and I think the next

one I want to move on. Matt, let me know if you're still around if

you want to cover.

61:20 (Speaker A) So we have Matt, who introduced himself in the beginning. I'll let you do this quickly again, and then we're going to talk about the project whose GitHub stars are rising, which I think is super cool. And I invite you to give us a little bit of an interview about this.

62:16 (Speaker A) Go ahead, Matt.

62:17 (Speaker D) Yeah, sure. So I'll try to summarize it a bit better than last time; I've had a lot of practice. Very long story short: co-founder and CEO of OthersideAI, creator of HyperWrite, and a number of other

things. Basically, we've been around for a number of years now.

62:30 (Speaker D) We're one of the first companies in the space working with LLMs. The

goal always has been to build a personal assistant that scales to

everybody, just like a real human personal assistant, but at scale,

way cheaper, digital. The tech wasn't there at the beginning. So we

built other products to sort of learn and gather resources, whether

that's users, revenue, bunch of other things that we can do.

62:50 (Speaker D) What we do today. Today we are actually building that personal

assistant. So an AI that can operate a computer, any software to do

what a human can do on pretty much anything.

62:53 (Speaker D) So it'll help you with your tasks. It's very simple. Today it's a

Chrome extension that lets you sort of like control Chrome just by

sort of talking to it.

62:53 (Speaker D) So you could say, go order me a pizza, or go send this person an

email or go filter my email, or anything else it works okay today.

The idea is that over time, it's going to get a lot better, a lot

cheaper, a lot faster, to the point where six months from now, a year

from now, it might actually be as good as, if not better than a human

on many tasks. But that being said, while I work on this, I also like

to learn about getting the most out of these technologies because

they're so fast moving and you really have to stay on top of it to be

effective, or you.

63:34 (Speaker A) Can every week and then stay up to date with us together. But yeah,

go ahead.

63:40 (Speaker D) Exactly. I mean, a lot of what I do to learn really, is just build

things that I find interesting, and I find that often, even if I'm

not expecting it, a lot of those learnings do translate to stuff

we're doing at Otherside. So this sort of just came out of that.

Happy to sort of dive into the project, or if you want to sort.

63:56 (Speaker A) I'll stop you and let's pause here for a second; I'll just tell folks that I pinned Matt's tweet from a couple of days ago with the introduction. Since then you got a few thousand stars, I think, on GitHub, and we're going to talk about the gpt-prompt-engineer project, the different reasons why Matt and folks wrote this, and what it's here to serve. So maybe give us an introduction to gpt-prompt-engineer, what made you come up with this, and how it works. Yeah, go deep, man.

64:29 (Speaker A) Sure. Yeah.

64:30 (Speaker D) So forgive me for rambling in advance. Essentially, I find prompt

engineering so fun. I've been doing it pretty much every day for

everything, honestly, to the point of excess, from what I would do

for work to having it decide what I'm making for dinner for years

now. And as I've gone through this process, sort of like learning how

to use these models, it's become very clear that especially as these

models evolve, there's no best practice for anything.

64:54 (Speaker D) Prompts change ways to prompt change. Something that works for one

task might not work for a very similar task. And the only way sort of

get out of that is to sort of get an intuition of the model and try a

lot of things, but that doesn't always work perfectly.

65:01 (Speaker D) And also you don't really know kind of what works and what doesn't.

Even when you're trying things right, you have to do it sort of like

in a very scientific way, but there's no real right answer to

anything. It's kind of like alchemy.

65:18 (Speaker D) So I started to think, and I think this was right when GPT-4 came out, I was using GPT-4 pretty often to just ideate prompts. I would say, here's what I'm trying to do.

65:20 (Speaker D) I would say, write a prompt for me, and I would use the ideas from that to help me improve my own prompts, and that actually got a lot of interest. We ended up building something similar to that into the HyperWrite platform. At the time it was really cool, but it really wasn't something that would replace what I do every day, which is really hardcore prompting.

65:43 (Speaker D) Eventually I was just sort of thinking about it, and I think this was on the 4th of July, I was just sitting there kind of thinking, what if we tried it? And I started thinking about how you could design a system that actually comes up with good prompts. Not just a prompt that does the job, but something that's actually optimal, because as humans, right, we can only try so many things at once. But the magic of these LLMs is they're creative and they think faster than we do. In the time that I could write half a prompt, LLMs could write 50 or 100.

65:48 (Speaker D) And what if you could leverage that? Because even if the average

prompt isn't very good, you're going to luck into one or two that

happen to be exceptional for your task. So I started by doing it

actually with a classifier. I only released this notebook yesterday

just because it's like a step on the road.

65:48 (Speaker D) And what we ended up using it for was actually something at Otherside, where we needed to build a classifier for something with the personal assistant. And I just wasn't getting good performance out of the prompts that I was writing. So I said f**k it, what if we have the AI try to do this? And I built it so that essentially I describe the task and I give it some test cases, some true/false test cases.

66:11 (Speaker D) Because the classifier was classifying things as true or false; it was like, classify the statement as true or false. If it was "New York is in America", it would be true.

66:54 (Speaker D) If it was "New York is in Paris", it would be false. And I basically created like ten or twenty of these test cases, described the task, and had GPT generate something like, I think, 20 or so prompts.

66:57 (Speaker D) And surprisingly, the quality of them just at first glance was pretty

good, right? It was kind of shocking considering I spent so much time

trying to do this manually. Then what I did was I just basically had

each of these prompts test against each of these test cases. And I

plotted sort of the success of each and turns out some of them

actually outperformed what I did.

66:57 (Speaker D) I was kind of shocked, right? Like you wouldn't expect that,

especially doing this for years.
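Conceptually, the classifier version he is describing boils down to a loop like this; generate_with_prompt is a hypothetical stand-in for the actual GPT call in the notebook:

```python
def generate_with_prompt(system_prompt: str, test_input: str) -> str:
    """Hypothetical stand-in: run one candidate prompt on one test input, expecting 'true' or 'false'."""
    raise NotImplementedError

def score_prompt(system_prompt, test_cases):
    """test_cases: list of (input_text, expected_label) pairs; returns accuracy in [0, 1]."""
    hits = 0
    for text, expected in test_cases:
        answer = generate_with_prompt(system_prompt, text).strip().lower()
        hits += int(answer == expected)
    return hits / len(test_cases)

def pick_best_prompt(candidate_prompts, test_cases):
    """Score every generated prompt against the labeled cases and keep the winner."""
    scores = {prompt: score_prompt(prompt, test_cases) for prompt in candidate_prompts}
    return max(scores, key=scores.get), scores

# Example test cases in the spirit of the ones described above.
cases = [("New York is in America", "true"), ("New York is in Paris", "false")]
```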

67:30 (Speaker A) Just to recap real quick on this: GPT-4, I assume that's what you're using, generated prompts that actually performed better than Matt Shumer's prompts. And Matt Shumer is the founder of a prompt company, with a lot of prompt use cases over a long time, from GPT-3 to 4, et cetera. And some of the ones that it came up with performed better than yours.

67:52 (Speaker D) Yeah, it was kind of scary. Some of them performed way worse. But the

idea is that you're going to sort of luck into something that is

better. Maybe two out of 20 will be better, but they're great.

68:02 (Speaker D) So I was sort of just so fascinated by this, I was like, how do you

take this further? Because classification is one thing, but real

prompts where you're actually having it generate text, those are

harder. How do you judge that? You could use GPD four to judge them,

right? If you have two prompts and you say each of them generate me

something and they give you your responses and you want to know which

is better, you can ask GPT-4. And so I figured we could apply

that.

68:29 (Speaker D) Turns out there's some issues with that and there are some papers

written about this where essentially it'll be sort of like more

favoring the one that's on the bottom. So just do it twice, flip the

order and see if one wins. And I took that approach and I sort of

combined it with sort of like an ELO style tournament where

essentially you have each of them go head to head, like one on one,

and each of them gets their ELO score either bumped up or down based

on whether they win, lose or draw.
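A sketch of the position-bias mitigation Matt mentions: ask the judge twice with the order flipped and only trust a verdict both orderings agree on. Here `judge_once` is a hypothetical helper wrapping a GPT-4 judging call, not a function from the actual notebook.

```python
# Sketch only: judge each pair twice with the responses in flipped order, to cancel out
# the judge's tendency to favor one position. judge_once() is a hypothetical helper that
# asks GPT-4 which response, "A" or "B", better satisfies the task (or "tie").

def judge_pair(judge_once, task: str, response_a: str, response_b: str) -> str:
    first = judge_once(task, response_a, response_b)        # verdict on (A, B)
    flipped = judge_once(task, response_b, response_a)      # same pair, order flipped
    second = {"A": "B", "B": "A", "tie": "tie"}[flipped]    # map back to original labels
    if first == second:
        return first    # both orderings agree, trust the verdict
    return "tie"        # disagreement suggests position bias, call it a draw
```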

68:53 (Speaker A) Can you give two sentences on ELO scores as a concept? Yeah.

68:57 (Speaker D) I'm actually not super familiar with it. Funny enough, I had GPT

write the code for that part, but basically think of it like a

ranking system in a video game. Yeah, think of it like a ranking

system in chess or a video game where you have two people competing

and the one that wins gets their score increased by x. The one that

loses gets their score decreased by x.

69:18 (Speaker D) And it also sort of like weighted based on the previous scores. So if

somebody that has a high score beats somebody with a very low score,

their score won't increase that much because they're very likely

going to win. So it's sort of just like a weighting system to help

figure out what's the best so instead of just sort of getting a clear

cut, yes, this is right, or no, this isn't what you can do with

classifiers, because there is a right and a wrong ground truth

answer.
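For reference, this is the standard Elo update such a tournament uses; a sketch, with K as a tunable constant (the notebook's exact constants may differ).

```python
# Standard Elo rating update: an upset moves the ratings a lot, a favorite beating a
# weak opponent barely moves them. score_a is 1.0 for a win, 0.0 for a loss, 0.5 for a draw.

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# A strong prompt beating a much weaker one gains almost nothing:
print(elo_update(1400, 1000, 1.0))   # roughly (1402.9, 997.1)
```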

69:39 (Speaker D) I just had each prompt sort of generate for a test case and the sort

of opposite prompt the competition prompt would generate for that

test case. So it was a little bit complex, and they would have the

model judge which one was better. And it's expensive, right? It might

cost like $20 in GPT calls to get to an answer, but turns out at the

end, the prompts again were just kind of blowing me away.

70:04 (Speaker D) Awesome creativity in them. Like the words it used, the trigger

words, it didn't do what I would do. And in a really good way.

70:10 (Speaker D) And it also opened up my eyes to sort of like new ways of prompting

that I never would have thought of and just sort of like aren't

standard. And that's kind of the magic of all this. I think that this

sort of abstracts away the sort of atomic level of prompts, right?

You talk about prompts as sort of a prompt in and of itself and then

a system built around the prompts with many prompts kind of working

together.

70:31 (Speaker D) This makes it so that you don't have to guess about, do I have the

best prompts for this single atomic part of our system? Where the

magic really comes in then, is how do you string these amazing

individually crafted by AI prompts together to make something that

actually works really well.

70:46 (Speaker A) And how you robustly build the evaluation system, right? Because the

classifier is a simple example of evaluating, because maybe you know

this, et cetera, but how do you actually scale up the evaluation

system such that this could potentially run in loops and then

generate the best of the best prompts for a task?

71:03 (Speaker D) Exactly.

71:03 (Speaker A) That's also like a very interesting piece. How do you think about

evaluation going forward?

71:08 (Speaker D) Yeah, so I think it's sort of like that, where you could have this

thing run in the loop three times and take the three winners and then

have GPT read those winners right, and be like, here are prompts that

worked really, really well. Here are the test cases where they

failed. Now I want you to write new prompts that take what's good

about these but also mitigate the failure cases and generate a whole

new set of prompts. Sort of like evolution really doesn't just have

to stop at one point in time after the first run.

71:37 (Speaker D) It's like, let's learn from what these amazing ones still did wrong

and continue to make this better and better and better. Obviously,

this relies on a relatively large test set. I'm also experimenting

with ways where you can have the test set autogenerate, but that's a

little bit finicky.
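A sketch of the evolutionary loop described here, under the assumption that you already have an evaluator and a prompt generator; both helpers (`generate_prompts`, `evaluate`) are hypothetical stand-ins for GPT-4 calls, not functions from the actual notebook.

```python
# Sketch only: keep the top prompts each generation, feed their failing test cases back
# to the generator, and ask for a fresh population that fixes those failures.

def evolve_prompts(task, test_cases, generate_prompts, evaluate,
                   generations=3, population=20, survivors=3):
    prompts = generate_prompts(task, n=population)
    for _ in range(generations):
        scored = evaluate(prompts, test_cases)        # [(prompt, score, failed_cases), ...]
        scored.sort(key=lambda t: t[1], reverse=True)
        winners = scored[:survivors]
        feedback = [(p, failed) for p, _, failed in winners]
        # New population: the winners, plus fresh prompts conditioned on what they got wrong.
        prompts = [p for p, _, _ in winners] + generate_prompts(
            task, n=population - survivors, seed_prompts=feedback
        )
    final = evaluate(prompts, test_cases)
    return max(final, key=lambda t: t[1])[0]
```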

71:50 (Speaker D) But I do think that sort of like evolution of this could lead to some

really exceptional prompts. But what I found was even on the first

run I was seeing it outperform myself. For example, there was a

classifier we were using GPT-4 with logit bias for, because it

was such a hard challenge and we were getting some like 90% accuracy.

71:50 (Speaker D) I had it generate these prompts with GPT-4, but then I had it run them

using GPT 3.5 and it got 96%.
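The logit bias trick Matt refers to can be sketched like this: bias the token ids for the two allowed answers so strongly that the model can only emit one of them, and cap the output at one token. The OpenAI API does expose a `logit_bias` parameter; the model names, bias value and single-token assumption here are illustrative.

```python
# Sketch only: force a true/false answer by biasing just those token ids.
# Assumes each label's first token is distinctive; bias value and models are illustrative.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

logit_bias = {}
for label in (" true", " false"):
    tok = enc.encode(label)[0]       # first token of each allowed answer
    logit_bias[str(tok)] = 100       # +100 effectively forces these tokens

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Classify the statement as true or false."},
        {"role": "user", "content": "New York is in America"},
    ],
    logit_bias=logit_bias,
    max_tokens=1,
    temperature=0,
)
print(resp.choices[0].message.content)   # " true" or " false"
```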

72:19 (Speaker A) We've talked about this pattern before, where you can outsource kind of the hard work to GPT-4, but then once you get really good at prompting, GPT-3.5 is actually very decent at many things, and it's way faster, cheaper, and has a 16K context now that you can use. And so we've seen this pattern with many folks: if you don't need the full power of GPT-4 (HumanEval for coding, et cetera), you can go far with GPT-3.5, especially as you're getting better prompts. And now, Matt, you have like a recursive prompt-crafter helper here. And my next question for you is, have you used anything else? You mentioned GPT-3.5, where you run the prompts. Have you tried them on different models, like Claude maybe, or the open source Llama ones?

73:07 (Speaker D) I actually haven't just because I wanted to see if this worked. It

was sort of just an interesting thing for me and my time is really

focused on OthersideAI and the personal assistant, but it wouldn't be hard

to get Claude in. I suspect Claude prompts would perform better on

Claude. OpenAI prompts would perform better on OpenAI, just because the models respond to prompting very differently.

73:18 (Speaker D) Claude is sort of like a more emotional thinker. OpenAI is more of

like a logical thinker. It's a very sort of simple, not perfect

analogy, but I suspect you'd want to sort of like stick within the.

73:36 (Speaker A) Ecosystems, maybe. Not to mention Inflection's Pi, which is like a

whole different beast.

73:41 (Speaker D) Yeah, that's an interesting one.

73:44 (Speaker A) We discussed Pi a couple of times and I've seen some reactions, but I don't think we'll get into it now; maybe at the end of this, if we have time. Matt, one question I will have for you on this, and then I think we'll move on: where can folks find more of this work? Is it open source? What are you looking for in contributions, if you are looking? And yeah, just give us a wrap-up of this project.

74:07 (Speaker D) Yeah, so you can find it on GitHub. It's called gpt-prompt-engineer. Currently there are two notebooks. It's all done in Jupyter notebook format, so it's pretty easy to edit. One is for the classification

system, the other is for the generation system.

74:20 (Speaker D) We're honestly sort of like at a point where it works well, so it's

like, what do you build around it? One thing that's missing is the

classification version only supports true and false labels, but it's

not hard to use tiktoken to allow it to support arbitrary labels like happy, sad, angry, whatever. That's probably like a 20-minute add that, if somebody goes in and does it,

opens up a whole new set of use cases. The evolution idea that I

mentioned before, right? Taking the best prompts and then saying,

here's where it went wrong on these test cases, and then throwing it

back to GPT and having it generate more and rerunning it, that's

interesting.
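The arbitrary-label extension mentioned above (the "20-minute add") could look roughly like this: use tiktoken to grab the first token of each label, bias those tokens, and map the one-token reply back to a label. A sketch only; labels that happen to share a first token would need more care, and the label set here is invented.

```python
# Sketch only: extend the true/false logit-bias trick to an arbitrary label set.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
labels = ["happy", "sad", "angry"]            # illustrative label set

first_tokens = {enc.encode(" " + label)[0]: label for label in labels}
logit_bias = {str(tok): 100 for tok in first_tokens}   # pass this with max_tokens=1

def decode_label(one_token_reply: str) -> str:
    # Map the single emitted word fragment back to the label it starts.
    fragment = one_token_reply.strip().lower()
    return next((label for label in labels if label.startswith(fragment)), "unknown")
```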

74:45 (Speaker D) The ability to use Claude would be awesome if anybody wants to add

that. I could even see it evaluating each prompt on each model,

right? Because right now we only generate with GPT-4. We only evaluate with GPT-3.5.

75:19 (Speaker D) But imagine if you generate half of them with GPT-4, you generate half of them with Claude, and then you evaluate each prompt on GPT-4, GPT-3.5 and Claude.

75:27 (Speaker D) And you can see sort of the latency success rates for each along with

scores. I think all that would be super interesting. Also sort of

like just open to ideas.

75:40 (Speaker D) I'm not really sort of supporting this at all. So if anybody wants to

kind of take it and run with it, I am all for that. Also sort of just

like a shameless plug right now or thing that we're looking for just

because I have an audience here.

We at OthersideAI and HyperWrite are really looking for somebody to help on backend, hopefully with security expertise. And then

also if anybody is experienced in training machine learning models, I

would love some help there because we're doing a lot of LLM training.

75:55 (Speaker A) So, just a quick thing to add: now with the prompt engineering that's automated, the results of this would likely generate a great dataset that you can keep and continue fine-tuning on, especially as GPT-4 fine-tuning is coming soon. So Matt, definitely store everything you generate, with the ELO score and everything, from every gpt-prompt-engineer run. Maybe there's going to be a path forward to actually fine-tuning a prompting model, which could be... exactly. Well, yeah, exactly.

76:28 (Speaker D) Imagine taking a prompt and taking one that has a slightly higher

score and fine tuning a model to take the initial prompt and then

sort of output the one that has a higher score and you can do that

evolutionarily, and continue to get better prompts in theory.
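That idea could be bootstrapped from the stored ELO scores along these lines: pair each prompt with a higher-scoring prompt for the same task and write them out as chat-format fine-tuning records. The scores and prompts below are invented for illustration; the record layout follows OpenAI's chat fine-tuning JSONL format.

```python
# Sketch only: turn ELO-scored prompts into (weaker prompt -> stronger prompt) pairs
# and write chat-format JSONL for fine-tuning. scored_prompts is illustrative data.
import json

scored_prompts = [
    ("Classify the statement.", 980),
    ("Decide if the statement is factually correct. Answer true or false.", 1050),
    ("You are a strict fact checker. Reply with exactly one word: true or false.", 1120),
]

scored_prompts.sort(key=lambda pair: pair[1])
with open("prompt_improver.jsonl", "w") as f:
    for (weak, _), (strong, _) in zip(scored_prompts, scored_prompts[1:]):
        record = {"messages": [
            {"role": "system", "content": "Rewrite the prompt so it performs better."},
            {"role": "user", "content": weak},
            {"role": "assistant", "content": strong},
        ]}
        f.write(json.dumps(record) + "\n")
```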

76:40 (Speaker A) Awesome. So folks, if you want to work in a cool place like HyperWrite, hit Matt up, and also check out gpt-prompt-engineer on GitHub. Thanks

for coming. Feel free to stay and kind of continue commenting and

talking with us as we go through a bunch of other updates that we

have.

76:57 (Speaker A) Just a quick check with Nisten, who promised me to follow Twitter and see if anything new comes up, breaking news, as we talk. I haven't seen anything besides the xAI space.

77:04 (Speaker A) I will draw people's attention to the last pinned tweet, from Dr. Jim Fan, that talks about the context length dip. Matt, you also touched on this context length dip. It's basically a paper, I think.

77:22 (Speaker A) From Stanford, I'm not sure, that figured out that even longer context windows have a dip in the middle, which means that at the beginning of the prompt and at the end of the prompt the model pays more attention to what you actually asked it or to the details that you provide, and in the middle there's like a dip.

77:39 (Speaker A) And this was also released this week. However, the one thing I said previously I will repeat here: Claude, and some folks who know about context windows way more than me say this, Claude is actually really good at this, without the dip.
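One way to check for that dip yourself is a simple "needle in a haystack" probe: bury a known fact at different depths inside filler context and see where the model stops retrieving it. A sketch, with `ask_model` as a hypothetical helper for whichever model you want to test.

```python
# Sketch only: place a known fact at several depths in filler text and check recall.
# ask_model(context, question) is a hypothetical helper around the LLM being tested.

def build_context(needle: str, depth: float, filler: str, n_sentences: int = 400) -> str:
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def recall_by_depth(ask_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    needle = "The secret code is 7421."
    filler = "The sky was a uniform shade of grey that day."
    results = {}
    for depth in depths:
        context = build_context(needle, depth, filler)
        answer = ask_model(context, "What is the secret code?")
        results[depth] = "7421" in answer
    return results   # a "lost in the middle" dip shows up as failures near depth 0.5
```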

77:54 (Speaker D) Yeah. It's an interesting paper, but I feel

like it's sort of saying like, hey, if you train on marketing copy,

then it's going to be worse at coding, obviously. Right.

78:03 (Speaker D) We do a lot of long context stuff at OthersideAI. That's actually what

I'm focused on right now, training really long context massive

models. And if you train it on data where there's context in the

middle that matters, it is going to be good at that.

78:16 (Speaker A) Interesting. So what you're saying, I think I've seen this kind of

opinion before as well. It's just the outcome of the data that was

fed in and for blog posts and other places, people want to hook your

attention in the beginning and then kind of finish strong. Basically

you're saying that this is potentially an outcome of that and not

necessarily the tech behind it.

78:38 (Speaker D) Yeah, I believe so. I mean, who knows, maybe I'm wrong, but from my experience, right, why I gave that analogy before is: if you train it to do one thing and then you're asking it to do another,

it's not going to do that other thing as well. And I'm guessing the

data set that they sort of did this evaluation on was something that

didn't have a ton of information at all. Part of the reason that so

few of the language model companies have super long context length

models and why it was such a big deal that Anthropic did is because a

lot of the challenge in training them isn't actually in training

them, it's in the data.

79:08 (Speaker D) Obviously, inference becomes a challenge. It's the cost and the

overhead there. But the data to sort of do this is really sparse.

79:10 (Speaker D) It's not very available. Right. So that's I think part of it right

there's not just like a sort of standard data set that has super long

context length, that has information in the middle.

79:25 (Speaker D) We do, actually; we've been building one at OthersideAI, and that's sort of given me some of the ideas that I'm spouting here. But my guess is that with Anthropic, part of the reason theirs works is because they focused on the data. The data is really important.

79:38 (Speaker A) Right.

79:39 (Speaker D) I will say, with the model, it's just fine tuning.

79:41 (Speaker A) Yeah. I will say, when I got access to Claude's window, I did a bunch of tests with my Twitter data. I just pasted a bunch of JSON with Twitter numbers, Twitter ID numbers. And the smaller model, the non-100K one, gave me back results that actually didn't invent those numbers.

79:57 (Speaker A) The 100K model lost in the middle and started inventing those

numbers. I literally saw this difference between the longer context one and the previous one, and I thought it's because it loses some context in the middle. And I need to retry this on the new ones, because with the new ones they claim this doesn't happen.

80:01 (Speaker A) I want to go to Al, and yeah, one of you I think raised your hand first

to talk about the context length dip and that paper if you have read

this, if you have thoughts and if you have noticed this as well.

80:29 (Speaker F) I just had a quick question for Matt about the differences that he

found in prompting between, say, Claude and GPT-4. I noticed, like,

the prompts aren't really reusable and maybe you could speak to that

in the general case.

80:42 (Speaker A) Yeah, let's end with maybe this question and move on to other updates

as we have. Go ahead, Matt.

80:48 (Speaker D) Yeah, sure. So it's like talking to two people with two different

personalities, right? They're both people, but they respond differently to the different ways you're prompting them, if you will. Claude is sort of more emotional, I guess, where OpenAI

is sort of more logical.

81:03 (Speaker D) And it's hard to sort of pin that down to any one thing, and it's

hard to give you sort of like techniques based on that because,

again, every use case is very different, but very clearly you prompt them differently. I think also, talking about the idea of fine tuning a prompting model, what would be very interesting is fine tuning a model that takes an OpenAI prompt and converts it to the idealized

version of a Claude prompt and vice versa. I mean, I think that could

be very powerful because there are ways to sort of intuit your way

there.

81:29 (Speaker D) It's just hard to sort of distill into a set of rules. One thing I

found actually quite interesting with Claude 2 is that it is

insanely resistant to sort of like jailbreak attacks. So I was able

to get it to do it.

81:44 (Speaker D) Turns out the stupidest method worked. It was sort of like modifying

that DAN prompt that's been going around Reddit, but the more

nuanced sort of like complex methods that typically work with OpenAI

they didn't. So I think the model is just qualitatively different.

81:56 (Speaker D) I think it's going to take some time to fully explore it and

understand why and how. Still super early days.

82:06 (Speaker A) I love the fact that all of us are getting an intuition about different models and how to approach them, right? And, like, Swyx was here before; this is like a specialization of what I think he talked about as an AI engineer. We're starting to understand the differences between those, down to the fine little things that you can say.

82:11 (Speaker A) And I think it will be very interesting if you have a model that's trained to actually convert them, or translate them between the models to work the same. I have an idea for how not to get locked into the GPT-4 ecosystem with the functions. I have an idea of wrapping the GPT-4 API package with something.

82:47 (Speaker A) It will actually kind of print the functions into the context, because Claude now has a huge context window, and then try to see whether or not Claude is able, without additional tech, without additional changes to the API, to replicate the outputs of how GPT with functions would do. And that's an idea I'll be testing, hopefully, and will talk about next week.
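The experiment described here could be sketched roughly like this: render the GPT-style function schemas into the prompt of a model that has no native function calling, and parse a JSON "call" back out of its reply. The schema and reply convention below are illustrative, not any provider's official API.

```python
# Sketch only: emulate function calling by printing the function schemas into the context
# and asking the model to reply with JSON. Schema and reply convention are illustrative.
import json

functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}]

def build_prompt(user_message: str) -> str:
    return (
        "You can call one of these functions by replying with JSON of the form "
        '{"function": "<name>", "arguments": {...}} and nothing else.\n\n'
        f"Functions:\n{json.dumps(functions, indent=2)}\n\n"
        f"User: {user_message}"
    )

def parse_call(reply: str):
    try:
        call = json.loads(reply)
        return call["function"], call.get("arguments", {})
    except (json.JSONDecodeError, KeyError):
        return None, None   # the model answered in prose instead of calling a function
```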

83:08 (Speaker A) Thanks, Matt.

83:10 (Speaker C) There has been a thing today, maybe yesterday, but anyway, today there has been a model released that generates prompts. By the way, by giving it the data, you generate the prompt. I've written about it today on Twitter. It is so powerful, it is such a cool method that you can

take whatever you have, like, I don't know, scientific papers and

generate instructions for them.

83:32 (Speaker C) Now you can fine tune a model that generates scientific papers. You got jokes? Now you can train a model that becomes funny.

83:35 (Speaker C) You can generate the instruction, convert whatever you want into instructions. It's amazing, and it's out today. One more thing about the dip in the middle thing.

83:51 (Speaker C) I don't know why it happens. I have no idea how OpenAI trained their models. But if you think about it, many instructions are a paragraph, and before the paragraph you tell the model, please summarize the following; or, on the contrary, a paragraph and at the end, what was that, or something.

84:10 (Speaker C) So it makes a lot of sense that a model pays a lot of attention to

the beginning and the end, because of this. And on the same note, it's

very easy to fix. So I wouldn't just point fingers.

84:21 (Speaker C) It's good that they pointed it out, but I think it's like, I don't know, a couple of minutes of training; OpenAI could, like, fine tune for a minute and fix it.

84:28 (Speaker A) I just want to ask Yam: the pin that I just tweeted, sorry, the tweet that I just pinned on top, was this the one that you talked about, the instruction generation and the prompt generation?

84:38 (Speaker C) Yeah.

84:39 (Speaker A) Awesome. So folks, definitely feel free to check this out. I haven't

seen this. You want to give a couple more words about that one.

84:44 (Speaker A) It looks like you wrote, like, a very deep dive. What's the model

like, 11B, 3B?

84:54 (Speaker C) Sure. There are two models; you put into the models whatever you want. Okay, let's go back. You've got a data set of something, emails from your company, for example, and you want a model that will help you write emails.

85:01 (Speaker C) Okay, you can start thinking about how to train this model, or you can use this and generate a text that basically says, help me write the following email to this person about something something, together with the actual email. And all of a sudden, you have a dataset to train a model, or to few-shot or whatever, that is extremely tuned to this. So I think it's a very cool technique.

85:40 (Speaker C) It's very powerful, has a lot of potential. And the trick, in simple words, is training the model what not to say. That's the missing piece here, the trick that they added.

85:51 (Speaker C) They took instructions and outputs that do not fit, just a different random output from the data, and trained with a different loss: the model should not say this, because this input with that instruction does not result in this output. That's it.

86:11 (Speaker C) That's the trick. And it works perfectly, and it's really cool.
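In data terms, the construction Yam describes might be sketched like this: for each existing document, ask a model to write the instruction that would have produced it, and also keep mismatched instruction/output pairs as negatives for the "what not to say" signal. `generate_instruction` is a hypothetical helper; the custom loss itself is not shown here.

```python
# Sketch only: build (instruction, output) pairs from existing data, plus mismatched
# negatives. generate_instruction() is a hypothetical helper wrapping an LLM call.
import random

def build_dataset(documents, generate_instruction):
    examples = []
    for doc in documents:
        instruction = generate_instruction(doc)    # e.g. "Write an email asking for ..."
        examples.append({"instruction": instruction, "output": doc, "label": 1})
        # Pair the same instruction with a random other document as a negative example.
        wrong = random.choice([d for d in documents if d is not doc])
        examples.append({"instruction": instruction, "output": wrong, "label": 0})
    return examples
```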

86:17 (Speaker A) Awesome. I have some folks who want to come up and ask questions. I

think we're almost there in terms of the updates. I will just briefly

run through some updates.

86:18 (Speaker A) I don't even have time to go and look for the threads, but if you're not following llama.cpp, follow Georgi Gerganov; he's one of the greats that we have in this space. I think he is single-handedly responsible for so many folks trying to get a MacBook, because it's incredible how much performance they've been able to squeeze out of Llama and comparable models.

86:49 (Speaker A) And many people just, like, quantize their models, basically make

them smaller to run on this GGML platform that they have. The recent

news that I have from over there, there's like two pieces of news.

Last week, for those of us who were here last week, we talked about

CFG.

86:58 (Speaker A) I forgot something. I forgot the guidance scale. And we talked about

the CFG parameter moving from diffusion models that we know.

87:17 (Speaker A) Like, in stable diffusion, you can define how close to your prompt

should the model generate the image. Somebody, I think in a discussion somewhere, said, hey, can we have this CFG control for our LLM generation? CFG is the classifier-free guidance scale, something like that.

87:37 (Speaker A) And they did it. Georgi Gerganov added this to llama.cpp. And so now you can actually pass a CFG control and tune it. It's almost like a running fine tune, to an extent: you can tell the model to be closer to, or farther away from, the prompt that you have. Contrast this with the stuff that we have on the GPT-4 API, which is temperature.

88:01 (Speaker A) And I think, Matt, you mentioned something about logit bias, right? Where you can ask it not to say certain things. Contrasting that with CFG, it's like a different beast; we now have a different control. And so GGML just merged this into their platform.
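For intuition, classifier-free guidance on an LLM mixes the next-token logits of a prompt-conditioned pass with those of an unconditional (or negative-prompt) pass. A numpy sketch of the idea, not llama.cpp's actual implementation:

```python
# Sketch only: CFG for text generation mixes conditional and unconditional logits.
# scale = 1.0 is ordinary sampling; larger values push harder toward the prompt.
import numpy as np

def cfg_logits(cond_logits: np.ndarray, uncond_logits: np.ndarray, scale: float) -> np.ndarray:
    return uncond_logits + scale * (cond_logits - uncond_logits)

def sample(logits: np.ndarray, temperature: float = 0.8) -> int:
    z = logits / temperature
    probs = np.exp(z - z.max())        # softmax with overflow protection
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```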

88:18 (Speaker A) Definitely worth checking out. And the second thing is, I need to

find the tweet. Yesterday, Georgi was like, oh yeah, by the way, here's the 48% inference speed improvement that somebody just merged in.

88:30 (Speaker A) Have you guys go play and try this. For the 33 billion parameter Llama model, somebody just merged in a 50% increase in inference speed, just like that. And I find this incredible, because GGML already runs so much stuff on Raspberry Pis or whatever, on iPhones, and now somebody's like, oh yeah, here's a 50% increase in inference speed.

88:41 (Speaker A) And then I think Nisten was here before; he was talking about how GGML runs on the iPhone, because iPhones, even from three years ago, have the same Neural Engine chip as, like, the latest Macs or some such, and that this performance boost on GGML also applies to iPhones as well. So,

incredible stuff. And as we hear every week, we keep seeing leaps,

incredible leaps in speed and performance.

89:15 (Speaker A) Definitely worth checking out GGML and the fine folks that work on this stuff. GGML contributors, folks who use llama.cpp, feel free to hop up and raise your hand and give us more updates on that front.

89:28 (Speaker A) Junaid, you're a regular at the spaces, but sometimes a guest as well. Other

than that, I think we'll move on to some more updates and then we

just have questions. No? Cool.

89:41 (Speaker A) So the next update that I have is from the diffusion side, which we sometimes cover. We don't cover it often, but we do cover it from time to time. So, two things from Stability and Stable Diffusion.

89:46 (Speaker A) We talked about SDXL, the new XL model that can generate 1024x1024 images. We talked last week about the 0.9 weights dropping.

90:01 (Speaker A) SDXL 1.0 is now available in the Stable Diffusion Discord. If you've played with Midjourney before and then looked at Stable Diffusion, it was like, it's not that great.

90:05 (Speaker A) Stable Diffusion SDXL 1.0 is really impressive. And besides being really impressive, they plan to release it open source. So we're going to see a bunch of folks fine-tune LoRAs and specific versions for specific things.

90:16 (Speaker A) And I think it's, like, incredible. If you want to play with those models and you haven't yet, go to the Stable Diffusion Discord, hit up that bot, and then let us know how incredibly different it is. And we're waiting for the SDXL 1.0 weights to drop.

90:47 (Speaker A) And I will mention this every day until the year mark: it's been less than a year since Stable Diffusion.

90:57 (Speaker A) It's been less than a year. I remember, I think it was August 2022 when

they actually dropped the full open source model. Less than a year.

91:12 (Speaker A) And we've seen just such incredible progress. So, like Matt said

before, it's really hard to keep up, but it's also really hard to

internalize just how far we've come, with these incredible leaps and changes every week. And again, just to plug this ThursdAI space.

91:21 (Speaker A) This is why we're here every ThursdAI, talking about anything and everything that's changed and updated. And the other thing that I want to... I see Art in the audience.

91:28 (Speaker A) Art, if you played with SDXL, feel free to raise your hand to come up. The other thing that they released, I don't know if you guys are

familiar with ClipDrop. So Stability bought ClipDrop as a

company and started implementing that interface compared to their

Dream Studio interface.

91:49 (Speaker A) So ClipDrop is a way simpler interface, and today they released something called Stable Doodle. Stable Doodle is, I don't know if folks in the audience remember this meme, how to draw an owl.

91:51 (Speaker A) Step one, draw a circle. Step two, draw some eyes. And step three is

like, draw the rest of the f*****g owl.

92:06 (Speaker A) And then you have, like, a beautiful owl painting at the end of this.

This is now the go-to test for how these doodle models work. And I

pinned my attempt at this, but definitely check out ClipDrop Doodle

thing.

It's really fun to play with. So those are, like, the updates from

the diffusion world.

92:10 (Speaker D) Hey, real quick. I was just looking at the repository for Comfy UI,

and then I saw that I don't know how to say his name. Scousekip is in

here. So I just wanted to come on and say, like, hey, this is

incredible.

92:24 (Speaker D) This is what we've been talking about for months now, right? This

node based character codex, if you will, of like there's just

infinite possibilities. I just want to listen, but thanks for bringing me up.

92:36 (Speaker D) This is really cool, man. I was just... thanks for bringing up ComfyUI.

92:42 (Speaker A) I feel guilty at not being up to date on every single possible thing. I know it's impossible. I really try, and ComfyUI has been on my list to try, but then Claude was released and Code Interpreter was released. ComfyUI seems like the thing we want, man.

92:42 (Speaker A) I think Stability, when they tried to bring up Dream Studio, talked about, like, a node-based thing where you can pipe models into other models, you can add filters, et cetera. ComfyUI, for folks who

have tested it out, it looks like that's it. And I definitely want to

agree with Art.

93:16 (Speaker A) It's something to watch out for and maybe try, because Automatic1111, even though it's, like, super advanced and has been there from the beginning, since Stable Diffusion launched, it's just like a s**t show of a UX.

Just like horrible, horrible. I'm sorry, guys.

93:30 (Speaker A) I've built a web UI before, like Automatic. It's really hard to get Gradio to behave the way you want. It's really hard to maintain a good UX product with many, many people contributing and many, many things changing under your feet.

93:45 (Speaker A) So it's really not their fault, but it's a s**t show to get started

with. And Comfy UI seems like a fresh, clean start. So definitely if

you're playing with this, test this out and let us know.

93:55 (Speaker A) Maxi, you have your hand raised and you've played with SDXL. Give us

some of your thoughts.

94:01 (Speaker I) Yeah, I have played through the website, in Dream Studio. Lately I'm working with a company that makes toys for kids. They want to start incorporating AI. And one of the concerns while working with them is like, okay, we want to generate images for kids, not something that is probably going to freak them out. And there are two things that diffusion models have been lacking.

94:27 (Speaker I) One is the ability to paint complicated or intricate shapes, like hands. SDXL is not better at it.

94:40 (Speaker I) Another one is this concept of what is called concept bleeding, which is that diffusion models tend to mix objects that are similar in shape or form; it's not good at that either. Now, I was reading the paper from Stability, or the report. They claim they are outperforming Midjourney in five of seven categories. Now, Midjourney 5.1, right?

95:12 (Speaker A) Just to make sure: Midjourney has since released a new version as well, because we're all moving at the same pace, but yeah, they compared to Midjourney 5.1. Yeah.

95:20 (Speaker I) Well, this is an internal report released by Stability. It's a paper, so it might have some credibility, I don't know. I like the results. It's very close to Midjourney, but I think it is still one or two steps behind, in my opinion.

95:36 (Speaker I) What is different is what you have mentioned, Alex. Once they release the weights and we can see LoRAs for this, I'm expecting to see the results we can get, because probably that is what is going to position this model a step above Midjourney. But not yet. This is my opinion.

95:58 (Speaker A) Yeah, definitely. And thanks for that. And I love folks coming up and sharing their opinion about these things.

96:05 (Speaker A) Thanks, Maxi. Or I guess I now know your real name, but I'm not sure if I can use it, if I should.

96:10 (Speaker I) Yeah, totally, totally, you can have it, in my view. I'm Juan, Spanish, living in Mexico, and I like these things.

96:17 (Speaker A) We appreciate you coming up here. On the topic of UIs that we've mentioned: somebody, or some folks, released Pinokio. They call it the AI browser. And I want to highlight this because I want to give you practical tips. Junaid, I think, is coming in with some breaking news.

96:28 (Speaker A) I don't know if Junaid wants to come up, or can, but if you can, feel free to come up and tell us; there's some news from Bard. Until we talk about Bard, on the topic of UIs for those things: you guys know we're mostly focused on the LLM side and the engineer side, less on the diffusion side, but we sometimes have love for both. This is a tool that you can download and not deal with the terminal, not deal with a bunch of stuff; it unifies all of them.

97:08 (Speaker A) It's really nice. Check out the Pinokio AI browser. I think it's open

source.

97:12 (Speaker A) You download this once, it's cross platform, Mac, PC, et cetera, and then you're able to download llama.cpp, and you're able to also download Stable Diffusion. And then fairly quickly, without knowing how to code, without going through the terminal, without installing packages (folks here know that installing packages is a whole pain we all share and all hate), without doing any of that, that's the promise they have, you are able to pipe Llama outputs into Stable Diffusion.

97:38 (Speaker A) So Yam previously mentioned the model that can do this, and Yam and Matt were talking about a method of generating prompts for LLMs, but we also know that there are models to actually generate prompts for diffusion, trained and fine-tuned on different ways to generate diffusion prompts. Right? And this Pinokio browser is actually allowing you to run an LLM and then pipe the output into a Stable Diffusion model and see the output of that. I think it's incredible that this exists and is downloadable.
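The pipeline being promised is simple to sketch with off-the-shelf libraries (llama-cpp-python and diffusers); the model path and model id below are placeholders, and this is not how Pinokio is implemented internally.

```python
# Sketch only: have a local Llama write an image prompt, then hand it to Stable Diffusion.
# Model file and model id are placeholders.
import torch
from llama_cpp import Llama
from diffusers import StableDiffusionPipeline

llm = Llama(model_path="path/to/local-llama-model.bin")   # placeholder local weights
completion = llm(
    "Write one vivid Stable Diffusion prompt for a watercolor fox in a misty forest:",
    max_tokens=80,
)
image_prompt = completion["choices"][0]["text"].strip()

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe(image_prompt).images[0].save("fox.png")
```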

98:07 (Speaker A) I haven't tried this yet. If you in the audience, or somebody on stage, have tried Pinokio, please raise your hand. I want to bring you up to talk about Pinokio and your experience with it.

98:19 (Speaker A) And if we haven't, I want to bring this to our attention so that next

week we're able to talk about this. This is added to my list of things, like ComfyUI, that I haven't tried yet.

98:29 (Speaker A) Anybody used Pinokio yet? No? Cool. I wanted to get Cocktail Peanut, the guy who wrote Pinokio.

98:36 (Speaker A) If you're in the audience, feel free to raise your hand. I don't

think you are, but feel free to follow the thread. He goes fairly

deep.

98:44 (Speaker A) And feel free to use and try Pinokio by next week, and then come up next week and talk about the differences between this and running Automatic1111. All right, folks, thanks everyone for coming to another ThursdAI space.

98:58 (Speaker A) Hope this has been helpful for a bunch of you. We tried a few new things here. We tried to give updates, but also to deep dive into a conversation with Matt, and it looks from the reactions here that maybe this is worth putting down on paper and sending out as an email, for those of you who maybe want to sign up for this and don't have the time to listen to two-hour spaces. So I'll definitely try, at least, to do that.

99:19 (Speaker A) I want to thank a few folks on stage who have joined consistently and provided a lot of signal. Yam, follow Yam, he has great insights into models and training and different things. Al in the audience, thanks, as always, for coming up.

99:33 (Speaker A) Junaid is running the Denver meetup, and if you're in the Denver

area, feel free to join us next week. Thanks for coming. Haven't seen

you in a while, buddy.

99:45 (Speaker A) Juan, sorry. Yeah, Juan, great to have you. Maxi and Lentos have recently been joining us.

99:51 (Speaker A) It's been great. We have some more folks in the audience who are regulars, and we invite you to also be regulars and come up and talk on ThursdAI. I will say this one thing: tag me in anything that's new.

100:01 (Speaker A) I would love that. And help promote the message to other folks. If you did like the space, this also really helps more folks get to the bottom of this.

100:01 (Speaker A) For those folks whose questions I didn't get to, I apologize. I'm trying to keep this as a balance between a high signal thing and letting everybody ask questions as well.

100:22 (Speaker A) Last thing I'll say is about myself: I do a little bit of consulting. I stay up to date so you don't have to. That's my tagline.

100:29 (Speaker A) If you're at a company that needs consultancy from somebody who's up to date on everything, I try to be that guy. Feel free to tap me in the DMs. And yeah, ThursdAI folks, keep tagging us in everything that's new; we're going to try to cover it next week. With that,

100:34 (Speaker A) I thank all of you. Thanks for coming. Thanks for giving us two and a

half hours of your attention.

100:34 (Speaker A) I really appreciate it. Attention is scarce and very important, and I

really thank everybody who gave us, like, two and a half hours. Thank

you, folks.

101:00 (Speaker A) Hey, Alex, we really appreciate you.

101:04 (Speaker B) Thanks, Alex.

101:05 (Speaker H) Thanks for doing a good space and keeping us on track, actually.

101:09 (Speaker A) Yeah, thank you.

101:10 (Speaker D) Yeah, alex definitely want to kind of.

101:13 (Speaker A) Give our thanks to you as well.

101:15 (Speaker E) For curating an awesome space.

101:17 (Speaker D) I think I'm definitely not the only one that gets a lot of good

signal out of this. And I know a lot of hard work goes into keeping

yourself up to.

101:27 (Speaker A) Date so that you can share it.

101:28 (Speaker E) With all of us.

101:29 (Speaker D) So just on my own behalf, thank you. And I'm sure that is echoed by.

101:34 (Speaker E) A lot of people on stage and in the audience.

101:36 (Speaker A) Humble man, thank you. I appreciate you. Thank you, folks. Have a nice Thursday, and see you next week.



This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Fri, 14 Jul 2023 02:33:41 GMT