
GenAI Weekly News Update 2024-07-01

Kyutai launched Moshi, an alternative to OpenAI's GPT-4o voice model. Runway Gen-3 Alpha text-to-video is now publicly available. Suno launched its iOS app. Hebbia announced a $130 million Series B. GraphRAG is now on GitHub. MInference 1.0 accelerates pre-filling for long-context LLMs.

AI Product Launch

French AI lab Kyutai launched Moshi, an alternative to OpenAI's GPT-4o voice model

While users are still waiting for OpenAI's GPT-4o voice assistant, French AI lab Kyutai, backed by a substantial $300 million in funding, has introduced Moshi. The model enables real-time voice interaction through joint pre-training on a combination of text and audio, and operates with a two-channel I/O system, simultaneously generating text tokens and audio codec tokens.

Moshi's fine-tuning involved 100,000 synthetic "oral-style" conversations that were converted to speech using text-to-speech (TTS) technology.

Given that it is only a 7B model, Moshi's capabilities seem fairly limited for now, but its latency is impressively low.
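
For a sense of what the two-channel I/O described above means in practice, here is a rough sketch of such a generation loop. This is not Kyutai's implementation; `model.step` and `codec.decode` are hypothetical placeholders standing in for the real model and audio codec.

```python
# Rough sketch (not Kyutai's code) of a full-duplex, two-channel loop:
# the model keeps listening to the user's audio stream and, at every frame,
# emits a text token and a frame of audio codec tokens in parallel.
# `model.step` and `codec.decode` are hypothetical placeholders.
def run_duplex_dialogue(model, codec, user_audio_frames, max_steps=500):
    text_tokens, reply_audio = [], []
    state = model.initial_state()
    for _, user_frame in zip(range(max_steps), user_audio_frames):
        # Input channel: the user's incoming audio frame.
        # Output channel: a text token plus audio codec tokens for this frame.
        text_token, codec_frame, state = model.step(user_frame, state)
        text_tokens.append(text_token)
        reply_audio.append(codec.decode(codec_frame))
    return text_tokens, reply_audio
```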

Runway Gen-3 Alpha Text-to-Video Now Publicly Available

Runway has opened the text-to-video feature of its latest Gen-3 alpha model to the public. The community has responded positively, praising the Gen-3 for its improved details, enhanced control, and coherence in video production. However, the pricing is notably high, set at 10 credits per second (with 1 credit approximately equaling $0.01), which is about twice the cost of the Gen-2 model.
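
At those rates the cost of a clip is easy to estimate; the quick calculation below uses only the figures quoted above (10 credits per second, roughly $0.01 per credit), not an official price sheet.

```python
# Back-of-the-envelope cost of a Gen-3 Alpha clip at the rates quoted above.
CREDITS_PER_SECOND = 10
USD_PER_CREDIT = 0.01  # approximate

def gen3_clip_cost_usd(seconds: float) -> float:
    return seconds * CREDITS_PER_SECOND * USD_PER_CREDIT

print(gen3_clip_cost_usd(10))  # a 10-second clip works out to roughly $1.00
```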

Suno Launched iOS app

Despite facing lawsuits from the RIAA, Suno has launched its iOS app. Using version 3.5, the app functions similarly to the website, where users enter a text prompt describing the desired song, and Suno generates it. The app also allows users to incorporate their own voice into the AI-generated music. Additionally, users can share songs, curate their prompts into a library, and discover music based on their mood.

Regarding pricing, free users on the app's basic plan receive 50 credits that reset daily. The $10-per-month Pro plan increases this to 2,500 daily credits, while a $30-per-month option offers 10,000 daily credits. Notably, Suno grants commercial rights to songs generated on paid plans but retains ownership of songs created on the free plan.

AI Company Update

Hebbia Announced $130 Million Series B

Hebbia, an AI startup that helps businesses answer complex, multi-step queries over their data, has announced a $130 million Series B funding round. The round was led by a16z, with participation from Index Ventures, Google Ventures, and Peter Thiel. Initially reported late last month, the funding values the company at approximately $700 million, according to Bloomberg. Crunchbase reports that Hebbia's revenue has grown 15x over the past 18 months.

Research Update

GraphRAG on GitHub

Microsoft introduced GraphRAG in February and has now open-sourced the project on GitHub, along with an easy-to-use API experience hosted on Azure that can be deployed code-free in just a few clicks. Experimental results show that GraphRAG, using community summaries at any level of the community hierarchy, outperforms naive RAG in comprehensiveness and diversity, with a win rate of approximately 70–80%.

Unlike regular RAG, GraphRAG can extract a rich knowledge graph from any collection of text documents. It creates a semantic structure of the data before any user queries by detecting "communities" of densely connected nodes in a hierarchical fashion, partitioning the graph at multiple levels from high-level themes to low-level topics. An LLM is used to summarize each of these communities, resulting in a hierarchical summary of the data. This provides an overview of a dataset without needing to know which questions to ask in advance.
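
The indexing pipeline can be sketched in a few lines. The snippet below is only an illustration of the idea, not Microsoft's implementation: `llm()` is a placeholder for any chat model, the `'A|relation|B'` triple format is an assumption, and NetworkX's Louvain communities stand in for whatever clustering the real pipeline uses.

```python
# Illustrative GraphRAG-style indexing: extract a knowledge graph from
# documents, detect communities recursively, and summarize each one.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def llm(prompt: str) -> str:
    """Placeholder for a call to any chat model."""
    raise NotImplementedError("plug in your LLM of choice here")

def build_knowledge_graph(documents):
    """Ask the LLM to extract 'A|relation|B' triples from each document."""
    graph = nx.Graph()
    for doc in documents:
        triples = llm(
            "Extract entity-relation-entity triples, one per line, "
            f"formatted as 'A|relation|B':\n{doc}"
        )
        for line in triples.splitlines():
            a, rel, b = (part.strip() for part in line.split("|"))
            graph.add_edge(a, b, relation=rel)
    return graph

def summarize_communities(graph, level=0, max_levels=2):
    """Recursively partition the graph and summarize each community."""
    summaries = []
    for nodes in louvain_communities(graph, seed=0):
        sub = graph.subgraph(nodes)
        facts = "\n".join(
            f"{a} -[{data['relation']}]-> {b}"
            for a, b, data in sub.edges(data=True)
        )
        summaries.append({
            "level": level,
            "summary": llm(f"Summarize these related facts:\n{facts}"),
        })
        if level + 1 < max_levels and len(nodes) > 3:
            summaries += summarize_communities(sub, level + 1, max_levels)
    return summaries
```

At answer time it is these community summaries, rather than raw text chunks, that GraphRAG draws on, which is what gives it the dataset-wide overview described above.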

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs

Microsoft researchers introduced MInference 1.0, a system designed to speed up the processing of long input sequences in LLMs. The pre-filling stage, which processes the input sequence, is computationally expensive and time-consuming due to the quadratic complexity of attention, causing significant delays before the first token is generated. MInference leverages dynamic sparse attention and identifies three recurring patterns in long-context attention matrices: A-shape, Vertical-Slash, and Block-Sparse. It determines the optimal pattern for each attention head offline and dynamically builds sparse indices based on the assigned pattern during inference. MInference reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
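
To make the Vertical-Slash pattern concrete, here is a toy NumPy sketch of the idea for a single attention head. It is not the MInference implementation, and the parameters `last_q`, `top_v`, and `top_s` are illustrative: the last few queries are used to estimate which key columns (vertical lines) and which diagonals (slashes) carry most of the attention mass, and attention is then kept only at those positions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vertical_slash_attention(q, k, v, last_q=16, top_v=32, top_s=32):
    """Toy single-head causal attention restricted to a vertical-slash mask."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    last_q = min(last_q, n)

    # 1) Cheap estimate: attention of only the last `last_q` queries.
    rows = np.arange(n - last_q, n)
    causal = np.arange(n)[None, :] <= rows[:, None]            # (last_q, n)
    est = softmax(np.where(causal, q[rows] @ k.T * scale, -np.inf), axis=-1)

    # 2) Vertical lines: key positions with the largest total attention mass.
    v_idx = np.argsort(est.sum(axis=0))[-top_v:]

    # 3) Slash lines: diagonals (offset = query_pos - key_pos) with most mass.
    offsets = rows[:, None] - np.arange(n)[None, :]            # (last_q, n)
    diag_mass = np.zeros(n)
    np.add.at(diag_mass, offsets[causal], est[causal])
    s_off = np.argsort(diag_mass)[-top_s:]

    # 4) Sparse causal mask: vertical + slash positions, plus the diagonal.
    qpos, kpos = np.arange(n)[:, None], np.arange(n)[None, :]
    mask = np.isin(kpos, v_idx) | np.isin(qpos - kpos, s_off) | (qpos == kpos)
    mask &= kpos <= qpos

    # 5) Attention restricted to the mask. A real kernel would skip the
    #    masked-out positions entirely; this toy computes and discards them.
    scores = np.where(mask, q @ k.T * scale, -np.inf)
    return softmax(scores, axis=-1) @ v

# Example with random toy inputs (sequence length 256, head dim 64).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 64)) for _ in range(3))
out = vertical_slash_attention(q, k, v)   # shape (256, 64)
```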