Tesla Dojo - The D1 Chip
What is Tesla Dojo?
Tesla’s Dojo, a processor designed for AI training, is now in mass production and will soon be put to use. It will be the first chip in the world to use TSMC’s System on Wafer (SoW) packaging method.
SoW allows many chips to be placed on a single 300 mm wafer, including logic chips, SoIC-compliant packages, HBM, and other dies. This makes it well suited to high-performance computing and AI workloads.
SoW grew out of TSMC’s earlier InFO_SoW, which itself came from integrated fan-out (InFO) technology.
Tesla’s Dojo D1 chip was the first to use InFO_SoW. TSMC says that its 1.6-nanometer A16 process, compared with its N2P process, delivers 1.1 times the chip density, an 8-10% speed gain at the same voltage, and 15-20% lower power at the same speed.
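To put those ratios in perspective, here is a minimal sketch that applies TSMC’s quoted A16-over-N2P factors to a purely hypothetical baseline; the baseline frequency and power below are assumptions for illustration, and only the scaling factors come from TSMC’s claim.

```c
#include <stdio.h>

/* Illustrative only: the baseline numbers below are assumptions, not
 * published figures. Only the scaling factors (1.1x density, 8-10% speed,
 * 15-20% power) come from TSMC's A16-vs-N2P claim quoted above. */
int main(void) {
    const double base_density  = 1.0;    /* normalized N2P transistor density */
    const double base_freq_ghz = 2.0;    /* assumed clock at a fixed voltage  */
    const double base_power_w  = 100.0;  /* assumed power at a fixed speed    */

    printf("A16 density (normalized):  %.2f\n", base_density * 1.10);
    printf("A16 speed at same voltage: %.2f - %.2f GHz\n",
           base_freq_ghz * 1.08, base_freq_ghz * 1.10);
    printf("A16 power at same speed:   %.1f - %.1f W\n",
           base_power_w * 0.80, base_power_w * 0.85);
    return 0;
}
```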
Tesla Dojo is a supercomputer designed by Tesla for a very specific purpose: improving its Full Self-Driving (FSD) advanced driver-assistance system. It went into production in July 2023, and its goal is to efficiently process the millions of terabytes of video data captured from real-life driving situations by Tesla’s fleet of more than 4 million cars.
The Unique Architecture of Tesla Dojo
The architecture of Tesla Dojo is what sets it apart. It is built around a custom chip called the D1, designed to be highly efficient for machine learning workloads. Each D1 chip has a dedicated CPU, local memory, and a communication interface, allowing each chip to operate as a full-fledged computer.
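As a rough mental model of that “each chip is a full computer” idea (not Tesla’s actual data structures), the sketch below describes a mesh of self-contained nodes, each bundling an ID, a slice of software-managed local memory, and links to its neighbours. All names and sizes are illustrative assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Purely illustrative model of a mesh of self-contained nodes; the field
 * names and sizes are assumptions, not Dojo's real layout. */
#define LOCAL_SRAM_BYTES (1u << 17)   /* assumed per-node scratchpad size */
#define MESH_DIM 4                    /* assumed 4x4 mesh for the example */

typedef struct node {
    uint32_t     id;                          /* position in the mesh          */
    uint8_t     *local_sram;                  /* software-managed local memory */
    struct node *north, *south, *east, *west; /* links to neighbouring nodes   */
} node_t;

/* Build a small 2D mesh where each node owns its own scratchpad. */
static node_t *build_mesh(void) {
    node_t *mesh = calloc(MESH_DIM * MESH_DIM, sizeof(node_t));
    for (int y = 0; y < MESH_DIM; y++) {
        for (int x = 0; x < MESH_DIM; x++) {
            node_t *n = &mesh[y * MESH_DIM + x];
            n->id = (uint32_t)(y * MESH_DIM + x);
            n->local_sram = malloc(LOCAL_SRAM_BYTES);
            n->north = (y > 0)            ? &mesh[(y - 1) * MESH_DIM + x] : NULL;
            n->south = (y < MESH_DIM - 1) ? &mesh[(y + 1) * MESH_DIM + x] : NULL;
            n->west  = (x > 0)            ? &mesh[y * MESH_DIM + (x - 1)] : NULL;
            n->east  = (x < MESH_DIM - 1) ? &mesh[y * MESH_DIM + (x + 1)] : NULL;
        }
    }
    return mesh;
}

int main(void) { node_t *mesh = build_mesh(); (void)mesh; return 0; }
```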
Scalability
Scalability is the defining goal of Dojo. Tesla has de-emphasized several mechanisms found in typical CPUs, such as cache coherency, virtual memory, and global lookup directories, because these mechanisms do not scale well. Instead, Dojo relies on very fast, highly distributed SRAM (static random-access memory) spread throughout the mesh.
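One way to see why a global lookup directory gets de-emphasized is a back-of-the-envelope model: a coherence directory needs roughly one sharer bit per core for every tracked cache line, so its size grows with both core count and cached data. The toy calculation below makes that trend concrete; the line size and per-core SRAM figure are assumptions, not published Dojo numbers.

```c
#include <stdio.h>

/* Toy model: directory-based coherence needs roughly one sharer bit per
 * core for every tracked cache line. The line size and per-core SRAM
 * capacity below are assumptions chosen only to illustrate the trend. */
int main(void) {
    const double line_bytes    = 64.0;    /* assumed cache line size        */
    const double sram_per_core = 1.25e6;  /* assumed bytes of SRAM per core */

    for (int cores = 64; cores <= 4096; cores *= 4) {
        double lines    = cores * sram_per_core / line_bytes;
        double dir_bits = lines * cores;          /* sharer-vector bits */
        printf("%5d cores -> ~%.0f MB of directory state\n",
               cores, dir_bits / 8.0 / 1e6);
    }
    return 0;
}
```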
High Throughput
At a high level, Dojo is an eight-wide core with four-way SMT, running at a conservative 2 GHz. It features a CPU-style pipeline, making it more tolerant of varied algorithms and branchy code than something like a GPU.
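As a quick sanity check on what “eight-wide at 2 GHz” buys, the snippet below computes the theoretical peak issue rate of one core; it only restates those two figures, and the four SMT threads share that peak rather than multiplying it.

```c
#include <stdio.h>

/* Back-of-envelope: peak instruction issue rate of one eight-wide core at
 * 2 GHz. This is a theoretical ceiling, not a sustained figure, and the
 * four SMT threads share it rather than multiplying it. */
int main(void) {
    const double issue_width = 8.0;   /* instructions per cycle (peak) */
    const double clock_hz    = 2.0e9; /* 2 GHz                         */
    const int    smt_threads = 4;

    double peak = issue_width * clock_hz;
    printf("Peak issue rate per core: %.0f billion instructions/s\n",
           peak / 1e9);
    printf("Fair share across %d SMT threads: %.0f billion/s per thread\n",
           smt_threads, peak / smt_threads / 1e9);
    return 0;
}
```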
Custom ISA
Dojo’s instruction set supports 64-bit scalar instructions and 64-byte SIMD instructions. It includes primitives for transferring data from local memory to remote memories, and it supports semaphore and barrier constraints. These are necessary to keep memory operations in line with the instructions running not just within a single D1 core, but across collections of D1 cores.
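Tesla has not published the details of these primitives, but the general pattern they enable, pairing a remote data transfer with a semaphore signal so the receiver only reads completed data, can be simulated on a host with POSIX threads. The sketch below is exactly that simulation, not Dojo code: a “producer core” writes into a “consumer core’s” local buffer and then posts a semaphore.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

/* Host-side simulation (not Dojo's ISA): a producer "core" writes into a
 * consumer "core's" local buffer, then posts a semaphore so the consumer
 * reads the data only after the transfer is complete. */
#define BUF_WORDS 16

static int   consumer_local[BUF_WORDS];  /* stands in for the consumer's SRAM */
static sem_t transfer_done;              /* signalled when the "DMA" finishes */

static void *producer(void *arg) {
    (void)arg;
    int staged[BUF_WORDS];
    for (int i = 0; i < BUF_WORDS; i++) staged[i] = i * i;
    memcpy(consumer_local, staged, sizeof(staged)); /* stand-in for a remote write */
    sem_post(&transfer_done);                       /* "transfer complete" signal  */
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    sem_wait(&transfer_done);  /* block until the producer signals completion */
    long sum = 0;
    for (int i = 0; i < BUF_WORDS; i++) sum += consumer_local[i];
    printf("consumer saw sum = %ld\n", sum);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&transfer_done, 0, 0);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&transfer_done);
    return 0;
}
```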
No Direct System Memory Access & Lack of Virtual Memory Support
Code running on Dojo cannot directly access system memory. Instead, applications are expected to work mainly out of a small pool of local SRAM. This local SRAM is managed by software and does not act as a cache; if data from main memory is needed, it has to be brought in using DMA operations. Dojo also lacks virtual memory support, which makes multitasking very difficult.
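The practical consequence is that software has to orchestrate its own data movement. A common pattern on scratchpad machines like this is double buffering: compute on one tile of local SRAM while the next tile is being fetched. The host-side sketch below illustrates the idea, with plain memcpy standing in for a DMA engine and all sizes chosen arbitrarily.

```c
#include <stdio.h>
#include <string.h>

/* Host-side illustration of the scratchpad pattern: "system memory" is a
 * large array, the "local SRAM" is two small tiles, and memcpy stands in
 * for the DMA engine. Sizes are arbitrary, chosen only for the example. */
#define TILE_WORDS   256
#define TOTAL_WORDS  (TILE_WORDS * 8)

static float system_memory[TOTAL_WORDS];  /* not touched by the "compute" code */
static float local_sram[2][TILE_WORDS];   /* software-managed double buffer    */

/* Stand-in for a DMA descriptor: copy one tile from system memory to SRAM. */
static void dma_fetch(int buf, int tile) {
    memcpy(local_sram[buf], &system_memory[tile * TILE_WORDS],
           sizeof(local_sram[buf]));
}

static float process_tile(const float *tile) {
    float sum = 0.0f;
    for (int i = 0; i < TILE_WORDS; i++) sum += tile[i];
    return sum;
}

int main(void) {
    for (int i = 0; i < TOTAL_WORDS; i++) system_memory[i] = 1.0f;

    float total = 0.0f;
    dma_fetch(0, 0);                            /* prefetch the first tile      */
    for (int t = 0; t < TOTAL_WORDS / TILE_WORDS; t++) {
        int cur = t & 1;
        if (t + 1 < TOTAL_WORDS / TILE_WORDS)
            dma_fetch(cur ^ 1, t + 1);          /* fetch next tile "in flight"  */
        total += process_tile(local_sram[cur]); /* compute on the current tile  */
    }
    printf("total = %.0f\n", total);
    return 0;
}
```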
This unique architecture allows Tesla to make trade-offs that more general architectures cannot, optimizing for its specific use case of training machine learning models.
Conclusion
Tesla Dojo is a fascinating piece of technology that showcases how custom hardware can significantly improve machine learning workloads. Its unique architecture is a testament to Tesla’s innovative approach to problem-solving.