GenAI Weekly News Update 2024-06-10
News Update
Research Update
Big week for model launches: Luma AI launched Dream Machine while Stability AI introduced Stable Diffusion 3. OpenAI's annualized revenue reached $3.4 billion, and Mistral AI landed a new round of funding.
AI Model and Product Launches
Luma AI launched Sora-competitor Dream Machine
Luma AI has launched its new AI video-generation model, Dream Machine, designed to create high-quality, realistic videos from text and images. The model uses a scalable multimodal transformer architecture trained directly on videos, enabling it to generate physically accurate, consistent, and action-packed scenes.
Dream Machine is accessible to the public for free during its beta phase. The model can generate up to 120 frames in 120 seconds (roughly one frame per second, or about a five-second clip at 24 fps), making it relatively fast for its capabilities. Early users have been animating static art and memes with it, with notable creations such as an animated Girl with a Pearl Earring and a "living" Doge going viral on Twitter, signaling a promising start for the model.
Stability AI launched Stable Diffusion 3
Stable Diffusion 3, the latest text-to-image model from Stability AI, has been announced with significant advancements and is currently available for early preview. Here are some key highlights:
Advanced architecture: SD3 introduces a new diffusion transformer architecture. The new design improves training speed, sampling efficiency, and overall output quality, and includes notable improvements in rendering text within images, a historically challenging aspect of image-synthesis models.
Performance and community reaction: The model family ranges from 800 million to 8 billion parameters, letting users pick the version that best fits their needs for scalability and image quality. However, early testers found that the model has significant issues rendering human figures, particularly people lying down, which has sparked discussion within the community.
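For readers who want to try the early preview, here is a minimal sketch of generating an image with SD3 through Hugging Face diffusers (this assumes the StableDiffusion3Pipeline integration and access to the gated weights; the model ID and settings shown are assumptions that may differ from your setup):

```python
# Minimal sketch: text-to-image with SD3 via Hugging Face diffusers.
# Assumes the StableDiffusion3Pipeline integration and gated-weights access;
# the model ID and defaults below may differ from your environment.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed model ID
    torch_dtype=torch.float16,
)
pipe.to("cuda")

image = pipe(
    "a street sign that reads 'Stable Diffusion 3'",  # exercises in-image text rendering
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_sample.png")
```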
Abacus AI, in partnership with Yann LeCun's team, announced a new LLM benchmark, LiveBench AI
The benchmark contains 18 diverse tasks across 6 categories and features new questions every month to limit potential contamination from existing datasets.
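To make that structure concrete, here is a small sketch of how a category-averaged leaderboard score could be computed from per-task results (the column names, scores, and the average-of-category-averages scheme are illustrative assumptions, not LiveBench's published code):

```python
# Hypothetical per-task results: one row per (model, category, task) score.
import pandas as pd

results = pd.DataFrame({
    "model":    ["gpt-4o"] * 3 + ["claude-3-opus"] * 3,
    "category": ["reasoning", "coding", "data_analysis"] * 2,
    "score":    [61.0, 55.2, 52.3, 58.1, 51.7, 56.9],
})

# Average tasks within each category first, then average the category means,
# so every category counts equally regardless of how many tasks it contains.
category_means = results.groupby(["model", "category"])["score"].mean()
overall = category_means.groupby("model").mean().sort_values(ascending=False)
print(overall)
```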
Leaderboard can be found here. Some highlights:
- GPT-4o ranks best overall, edging out GPT-4-Turbo. However, GPT-4-Turbo does much better at reasoning and coding than GPT-4o.
- Claude Opus excels at data analysis and language understanding
- Gemini doesn't score as well against Claude or GPT-4 here as it does on the LMSYS leaderboard.
- Qwen 72B is the best open-source model.
AI Company Update
OpenAI annualized revenue reaches $3.4B
OpenAI has achieved a significant milestone by reaching an annualized revenue of $3.4 billion. This marks a substantial increase from approximately $1 billion in summer 2023 and $1.6 billion in late 2023. The majority of this revenue, around $3.2 billion, comes from subscriptions to its chatbots, such as ChatGPT, and API fees, with additional contributions from enterprise partnerships and model sales.
OpenAI's rapid revenue growth can be attributed to the widespread adoption of its AI models across various industries and the increasing demand for AI-driven solutions. Despite facing challenges and competition, the company's focus on expanding its product offerings and enhancing the capabilities of its AI models has driven significant financial success.
Mistral AI lands huge $640M round at a valuation surpassing $6B
The funding round was led by General Catalyst and included participation from notable investors such as Andreessen Horowitz, Lightspeed, Bpifrance, BNP Paribas, Nvidia, Samsung, Salesforce, and IBM. This substantial investment reflects the company's rapid growth and its disruptive potential in the AI market, challenging established players like OpenAI and Meta.
The recent funding will be used to expand Mistral's compute capacity and team, scale its commercialization efforts internationally, and enhance its competitive edge in the global AI race. Mistral AI's commitment to open-source AI has garnered over 27 million downloads from public repositories, and its models are being used in various sectors, including finance, tech, and the public sector.
Research Update
Meta published a paper on Pixel Transformer
Meta published "An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels" this week and challenges the common practice in Vision Transformers of treating patches as tokens. Instead, it demonstrates that Transformers can directly use individual pixels as tokens to achieve high performance in tasks like object classification, self-supervised learning, and image generation. This approach questions the necessity of locality inductive bias in modern vision architectures, showing that despite higher computational costs, treating pixels as tokens can be highly effective.
Discovering Preference Optimization Algorithms with and for Large Language Models
The authors address the limitations of current preference-optimization techniques used to enhance the quality of LLM outputs. Traditional methods rely on human-designed loss functions, which limits the exploration of the space of possible objectives. The authors propose LLM-driven objective discovery to automatically find new preference-optimization algorithms. They introduce DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses, and demonstrate superior performance across various tasks compared to existing methods.
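To illustrate the kind of blended objective described, here is a minimal sketch of an adaptively gated mix of logistic and exponential losses (the gate direction, temperature, and exact functional form are assumptions for illustration, not the paper's published loss):

```python
# Sketch of an adaptively blended preference loss in the spirit of DiscoPOP.
# Illustrative: the gating direction and temperature are assumptions, not the
# exact discovered objective from the paper.
import torch
import torch.nn.functional as F

def blended_preference_loss(rho: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """rho: beta-scaled difference of the chosen vs. rejected
    policy/reference log-ratios, as in DPO."""
    logistic = -F.logsigmoid(rho)     # DPO-style logistic loss: log(1 + exp(-rho))
    exponential = torch.exp(-rho)     # exponential loss
    gate = torch.sigmoid(rho / tau)   # input-dependent mixing weight
    return (gate * logistic + (1.0 - gate) * exponential).mean()

# Example: a batch of log-ratio differences from a policy vs. reference model
rho = torch.tensor([0.3, -0.1, 1.2])
print(blended_preference_loss(rho))
```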