
GenAI Weekly News Update 2024-07-08



Significant funding rounds were announced for German defense AI firm Helsing, robotics startup Skild AI, smart city startup Hayden AI, and generative AI platform Fireworks AI. AMD plans to acquire Silo AI to bolster its enterprise AI solutions. FlashAttention-3 was published, aiming to accelerate H100 GPU performance. Additionally, Saining Xie's team released new research on scaling up 3D Gaussian Splatting training.

AI Company Update

German defence AI firm Helsing raised €450 million in Series C

The funding round was led by General Catalyst, known for backing the HR tech unicorn Factorial. Other participants included Elad Gil, Accel, Saab, Lightspeed, Plural, and Greenoaks. According to TechFundingNews, the post-money valuation reached $5.4 billion.

Robotics startup Skild AI raised $300M Series A

The funding round was led by Coatue, Lightspeed Venture Partners, SoftBank Group, and Jeff Bezos, valuing the Pittsburgh-based startup at $1.5 billion. Unlike typical robotics companies, this startup focuses on developing robot brains. The idea is that these brain models can be applied to various robots and tasks, rather than being limited to a single application. This strategy has garnered support from several high-profile investors.

Smart city startup Hayden AI raised $90 million

The funding round was led by TPG. The AI platform, which offers geospatial analytics, transit management, and technology services, primarily addresses problems for governments and businesses. While the platform utilizes cameras mounted on buses to detect illegal parking and moving violations, its capabilities extend beyond that. It employs geospatial data collection sensor systems to provide cities with insights to enhance traffic safety and accessibility. The platform can detect and predict traffic congestion, improve transportation networks, and more.

Fireworks AI receives $52 million Series B from NVIDIA and Sequoia

The investment round, led by Sequoia Capital with participation from NVIDIA, AMD, and MongoDB, values the company at $552 million. Founded by the former head of PyTorch at Meta, the company offers a generative AI platform as a service. It focuses on optimizing for rapid product iteration while minimizing the cost to serve.

AMD will Acquire Silo AI

AMD will acquire Silo AI, Europe's largest private AI lab, for approximately $665 million. This acquisition aims to enhance AMD's enterprise AI solutions globally by leveraging Silo AI's expertise in developing tailored AI models and platforms. The Silo AI team will join the AMD Artificial Intelligence Group, continuing under the leadership of co-founder Peter Sarlin. This move is part of AMD's broader strategy to expand its AI capabilities and support its global customer base with advanced AI solutions.

Research Update

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

The paper proposes a new class of sequence modeling layers called Test-Time Training (TTT) layers. A TTT layer replaces the hidden state of an RNN (i.e., the feature vector) with a machine learning model itself, either a linear model or a small neural network, whose weights are updated by a self-supervised learning step on each incoming token. It can serve as a drop-in replacement for the self-attention layer in a Transformer.

The new layer reduces the quadratic complexity of the attention layer to linear compared to Transformers, while offering better long-context performance than RNNs.
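To make the core idea concrete, here is a toy numpy sketch of a TTT-style layer with a linear hidden-state model: for each token, the "hidden state" (the weight matrix `W`) takes one gradient step on a self-supervised reconstruction loss, then produces the output. The corruption function, learning rate, and loss here are simplified assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def ttt_linear(tokens, lr=0.1):
    """Toy TTT-style layer: the hidden state is the weight matrix W of a
    linear model, updated online at test time. Simplified assumption: the
    self-supervised task reconstructs each token from a scaled-down view."""
    d = tokens.shape[1]
    W = np.zeros((d, d))                # hidden state = model weights
    outputs = []
    for x in tokens:
        x_tilde = 0.5 * x               # corrupted view of the token (assumption)
        # one gradient step on L(W) = 0.5 * ||W @ x_tilde - x||^2
        grad = np.outer(W @ x_tilde - x, x_tilde)
        W = W - lr * grad
        outputs.append(W @ x)           # output uses the freshly updated model
    return np.stack(outputs)
```

Because each token triggers one fixed-size update rather than an attention pass over all previous tokens, the cost grows linearly with sequence length, which is the complexity advantage described above.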

FlashAttention-3

One year after the release of FlashAttention 2, the research team has published the third version. Compared to version 2, version 3 focuses on the H100 GPU, increasing its utilization from 35% to 75% (achieving 740 TFLOPs/s in FP16). With FP8, it reaches nearly 1.2 PFLOPs/s and has 2.6 times lower numerical error than the baseline FP8 attention.

The three major improvements include leveraging the asynchrony of the Tensor Cores and TMA to (1) overlap computation and data movement through warp-specialization, (2) interleave block-wise matrix multiplication and softmax operations, and (3) implement block quantization and incoherent processing that takes advantage of hardware support for FP8 low precision.
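The H100-specific pipelining above cannot be reproduced in a few lines, but the block-wise online-softmax computation that all FlashAttention versions are built on can be sketched in numpy. This is a conceptual illustration, not the CUDA kernel: it processes K/V in blocks while maintaining a running row maximum and softmax denominator, so the full attention matrix is never materialized.

```python
import numpy as np

def blockwise_attention(Q, K, V, block=64):
    """Reference sketch of block-wise attention with an online softmax,
    the algorithmic core underlying FlashAttention-style kernels."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q, dtype=np.float64)
    m = np.full(n, -np.inf)             # running row-wise max of scores
    l = np.zeros(n)                     # running softmax denominator
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        S = (Q @ Kb.T) * scale          # scores for this K/V block only
        m_new = np.maximum(m, S.max(axis=1))
        p = np.exp(S - m_new[:, None])  # block-local stabilized exponentials
        corr = np.exp(m - m_new)        # rescale previous partial results
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]
```

FlashAttention-3's contribution is not this recurrence itself but executing it efficiently on Hopper: overlapping the `Q @ Kb.T` matmuls with data movement and interleaving them with the softmax bookkeeping via warp specialization.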

On Scaling Up 3D Gaussian Splatting Training

Image from Saining Xie's X post

The team has developed a solution for 3DGS training on multiple GPUs. Previously, 3DGS training was limited to a single GPU. The paper introduces Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. It uses sparse all-to-all communication to transfer Gaussians to pixel partitions and performs dynamic load balancing. Additionally, it supports batched training with multiple views. On the Rubble dataset, the system achieves a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared to a PSNR of 26.28 using 11.2 million Gaussians on a single GPU.
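The routing step can be illustrated with a toy sketch: split the image into per-worker pixel partitions and send each Gaussian only to the partitions its footprint overlaps, which is why the all-to-all exchange is sparse. This 1D version with circular footprints is a simplified assumption for illustration; Grendel itself operates on 3D Gaussians projected to screen-space tiles across real GPUs.

```python
import numpy as np

def route_gaussians(centers, radii, n_workers, width=1.0):
    """Toy sketch of Gaussian-to-pixel-partition routing. The x-axis
    [0, width) is split into n_workers equal strips; a Gaussian is routed
    to every strip its interval [c - r, c + r] overlaps."""
    bounds = np.linspace(0.0, width, n_workers + 1)
    assignment = [[] for _ in range(n_workers)]
    for i, (c, r) in enumerate(zip(centers, radii)):
        lo, hi = c - r, c + r
        for w in range(n_workers):
            if lo < bounds[w + 1] and hi > bounds[w]:   # interval overlap test
                assignment[w].append(i)
    return assignment
```

Dynamic load balancing then amounts to shifting the partition boundaries so each worker's list carries roughly equal rendering cost, since Gaussians cluster unevenly across the image.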
