
Planet GenAI Blog.

GenAI Weekly News Update 2024-10-15


The weekly news highlights significant advancements in AI, including NVIDIA's release of the Llama-3.1-Nemotron-70B instruct model, which surpasses competitors in benchmarks, and Perplexity's new features for enhanced research capabilities. Mistral AI introduced edge models for privacy-focused applications, while Adobe MAX 2024 showcased AI enhancements in Photoshop and Illustrator. Additionally, HeyGen launched interactive AI avatars for Zoom, and Sequoia discussed the evolution of generative AI towards deeper reasoning. Andreessen Horowitz explored the concept of personalized AI companions, and recent research focused on improving long-context retrieval augmented generation in AI models.

AI Product Update

NVIDIA Open-Sources Llama-3.1-Nemotron-70B-Instruct

NVIDIA has unveiled Llama-3.1-Nemotron-70B-Instruct, an open-source large language model (LLM) built on Meta's Llama 3.1. The model, which features 70 billion parameters, has outperformed OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet on several AI benchmarks, including the Arena Hard benchmark. It is specifically designed to refine AI responses to align more closely with human preferences, focusing on factual accuracy and coherent problem-solving.

NVIDIA's new LLM uses a technique called SteerLM Regression Reward Modeling to improve the quality of generated responses. The approach defines a reward function that guides the LLM's training: a regression model scores responses, and those scores are used to refine the training data toward clearer answers. This improves data quality and sharpens the reward signal, ultimately allowing NVIDIA to generate responses closer to the user's requirements.
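As a rough illustration of the idea, the sketch below scores candidate responses with a scalar reward and keeps the highest-scoring one (best-of-n selection). The reward function here is a toy heuristic standing in for a trained regression reward model; this is not NVIDIA's actual implementation.

```python
# Illustrative sketch only: a regression-style reward model assigns a scalar
# score to each candidate response, and the best-scoring one is kept.

def regression_reward(prompt: str, response: str) -> float:
    """Toy stand-in for a trained regression reward model.

    A real SteerLM-style reward model predicts attribute scores
    (helpfulness, correctness, coherence, ...) with a regression head;
    here we fake a scalar score with simple heuristics for demonstration.
    """
    score = 0.0
    if response.strip():
        score += 1.0  # reward a non-empty answer
    score += min(len(response.split()), 50) / 50  # mildly reward detail
    return score

def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Pick the candidate the reward model scores highest."""
    return max(candidates, key=lambda r: regression_reward(prompt, r))

candidates = [
    "",
    "Three.",
    "The word 'strawberry' contains three occurrences of the letter 'r'.",
]
print(best_of_n("How many r's are in strawberry?", candidates))
```

In practice the same scalar-reward idea can drive data refinement or RLHF-style fine-tuning rather than just response selection.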

| Model | Arena Hard (95% CI) | AlpacaEval 2 LC (SE) | MT-Bench (GPT-4-Turbo) | Mean Response Length (# of characters for MT-Bench) |
| --- | --- | --- | --- | --- |
| Llama-3.1-Nemotron-70B-Instruct | 85.0 (-1.5, 1.5) | 57.6 (1.65) | 8.98 | 2199.8 |
| Llama-3.1-70B-Instruct | 55.7 (-2.9, 2.7) | 38.1 (0.90) | 8.22 | 1728.6 |
| Llama-3.1-405B-Instruct | 69.3 (-2.4, 2.2) | 39.3 (1.43) | 8.49 | 1664.7 |
| Claude-3-5-Sonnet-20240620 | 79.2 (-1.9, 1.7) | 52.4 (1.47) | 8.81 | 1619.9 |
| GPT-4o-2024-05-13 | 79.3 (-2.1, 2.0) | 57.5 (1.47) | 8.74 | 1752.2 |

Perplexity Introduces Spaces and Internal Knowledge Search

Perplexity has introduced two major features: Internal Knowledge Search and Spaces. Internal Knowledge Search allows Pro and Enterprise users to search across both public web content and internal files, providing a unified platform for comprehensive research. Spaces serve as customizable AI-powered collaboration hubs, enabling teams to organize research, connect files, and work with tailored AI assistants. These new features enhance productivity by combining internal and external data sources, ensuring privacy, and supporting collaboration. Enterprise customers like NVIDIA and Dell are already utilizing these tools to boost their workflow efficiency.

Mistral AI Introduces Edge Models: Ministral 3B and 8B for Privacy-First Applications

Mistral AI has unveiled two new edge AI models, Ministral 3B and Ministral 8B, designed for local and on-device applications. With up to 128k context length, these models focus on low latency and privacy-first inference, ideal for internet-less assistants, analytics, and robotics. Mistral aims to empower users from developers to enterprises by enhancing capabilities in knowledge, reasoning, and efficiency without cloud dependency.

Both models outperform most similarly sized models on the evaluation benchmarks.


Adobe MAX 2024: AI-Powered Enhancements Transform Photoshop and Illustrator

Adobe MAX 2024 revealed major updates for Photoshop and Illustrator, introducing AI-driven tools like Distraction Removal and Generative Workspace in Photoshop and Enhanced Image Trace in Illustrator. These features aim to boost creativity and streamline workflows for designers, enhancing efficiency and ease of use. Both applications now also support 3D design integration, expanding the creative possibilities for users.

  • Photoshop:
    • Distraction Removal: AI tool for removing unwanted objects.
    • Generative Workspace: Enhanced features for seamless AI-assisted editing.
  • Illustrator:
    • Objects on Path: New tool for arranging objects along a path.
    • Enhanced Image Trace: Improved precision for converting raster to vector.
  • Both Applications:
    • 3D Integration: New features for combining 2D and 3D designs.

HeyGen Introduces Interactive AI Avatars for Zoom Meetings

HeyGen has launched a feature allowing users to send their AI-powered interactive avatars to Zoom meetings. These avatars can participate in multiple meetings simultaneously, responding intelligently through OpenAI's Realtime voice integration. The avatars can be customized with specific personas, making them well suited for roles like online coaching, customer support, and repetitive sales calls. This innovation aims to improve productivity by enabling users to be virtually present in many places at once.

AI Company Update

Sequoia: Generative AI Enters the Agentic Reasoning Era

  • System 1 to System 2 Transition: Generative AI is evolving from immediate, pre-trained responses (System 1) to deeper, inference-time reasoning (System 2).
  • AlphaGo Comparison: This shift mirrors how AlphaGo used deliberate reasoning to achieve success in Go.
  • Rise of Agentic Applications: New capabilities lead to agent-based tools in sectors like legal, medical, and software development.
  • Expanded Productivity: With enhanced reasoning, AI's impact on productivity and task automation is significantly growing.
  • Industry Integration: Sequoia sees this as reshaping AI's integration into workflows across various domains.

Andreessen Horowitz Explores 'Exporting Your Brain' with AI Tools

Justine Moore from Andreessen Horowitz explores the concept of "exporting your brain" by using AI like ChatGPT to capture and synthesize thoughts, aiding in productivity, creativity, and communication. Moore shares how she uses AI tools to maintain a daily journal, draft content, and simulate conversations for better self-understanding. This approach not only enhances personal reflection but also lays the foundation for personalized AI companions that assist with complex problem-solving, content generation, and even interpersonal advice.

She also shares her vision for what's next:

  • Long-term AI Companions: AI companions tailored to each individual that can assist with day-to-day decision-making and creative tasks.
  • Expanding Capabilities: AI helping users beyond just productivity by understanding their preferences and behaviors for deeper personalization.

And her product vision:

  • Personalized Assistance: Envisioned AI products include personal assistants that know your workflow, preferences, and style.
  • AI for Life Integration: AI tools integrated into daily activities for everything from brainstorming new ideas to organizing work-life schedules effectively.

Some early products she mentioned:

  • Dot: A personal AI companion that serves as a "living history," with infinite memory of conversations, helping users recall details.
  • Limitless: An AI assistant with apps and a wearable pendant to capture everything a user says, sees, or hears, making all information searchable.
  • Delphi: Enables well-known figures to create "AI clones" of themselves for fans to interact with, scaling their presence and enabling conversations that would otherwise be impossible.

AI Research Update

Inference Scaling for Long-Context Retrieval Augmented Generation

Research Background

  • Long-context LLMs: Long-context large language models (LLMs) are designed to process extended input sequences, improving performance on various tasks by utilizing extensive context (e.g., Gemini 1.5 Pro with up to 2M tokens)
  • Challenges in RAG: Previous studies on retrieval augmented generation (RAG) have focused on increasing the quantity of retrieved knowledge, but limitations exist in effectively locating relevant information in ultra-long sequences
  • Performance Plateau: Retrieving beyond certain thresholds (e.g., top-10 documents) can lead to performance plateaus or declines due to increased noise in the context

Contribution

  • New Strategies: The paper introduces two strategies for scaling inference computation in RAG:
    • Demonstration-based RAG (DRAG): Utilizes multiple RAG examples as demonstrations to leverage long-context capabilities
    • Iterative Demonstration-based RAG (IterDRAG): Decomposes input queries into simpler sub-queries, allowing for iterative retrieval and generation, thus bridging the compositionality gap for multi-hop queries.
  • Effective Context Length: The effective context length is defined as the total number of input tokens across all iterations before generating the final answer. This is crucial for understanding how RAG performance scales with inference computation.
  • Performance Scaling: The paper demonstrates an almost linear relationship between RAG performance and the scale of effective context length through extensive experiments on benchmark QA datasets.
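To make the IterDRAG loop and the effective-context-length metric concrete, here is a minimal sketch: the query is decomposed into sub-queries, each iteration retrieves documents and generates an intermediate answer, and the effective context length accumulates the input tokens fed to the generator across all iterations. The decompose/retrieve/generate functions are toy stand-ins for the paper's LLM and retriever, and the whitespace tokenizer is a crude assumption.

```python
# Hedged sketch of the IterDRAG idea: interleave retrieval and generation
# over sub-queries, tracking the effective context length (total input
# tokens across all iterations). All components are toy stand-ins.

def count_tokens(text: str) -> int:
    return len(text.split())  # crude whitespace "tokenizer"

def iter_drag(query, decompose, retrieve, generate):
    """Return (final answer, effective context length)."""
    context = query
    effective_context_length = 0
    for sub_q in decompose(query):
        docs = retrieve(sub_q)                # retrieval for this hop
        context += "\n" + "\n".join(docs)
        effective_context_length += count_tokens(context)
        answer = generate(context, sub_q)     # intermediate answer
        context += f"\nQ: {sub_q}\nA: {answer}"
    effective_context_length += count_tokens(context)  # final call's input
    return generate(context, query), effective_context_length

# Toy components for demonstration only.
def toy_decompose(query):
    return ["sub-question 1", "sub-question 2"]

def toy_retrieve(sub_q):
    return [f"document about {sub_q}"]

def toy_generate(context, q):
    return f"answer to {q}"

final, ecl = iter_drag("original multi-hop question",
                       toy_decompose, toy_retrieve, toy_generate)
print(final, ecl)
```

Plain DRAG corresponds to the degenerate case of a single iteration with in-context demonstrations prepended; IterDRAG adds the decomposition loop, which is what lets the effective context length, and with it performance, keep scaling.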


Conclusion and Analysis

  • Inference Scaling Laws: The study derives inference scaling laws for RAG, indicating that performance improves with the expansion of effective context length under optimal configuration.
  • Optimal Configurations: The computation allocation model allows for predicting optimal inference parameters that maximize performance across different RAG tasks, providing practical guidance for optimal computation allocation.