Tesla Dojo - The D1 Chip
What is Tesla Dojo?
Tesla’s Dojo, a processor designed for AI training, is now in mass production and will soon be put to use. It will be the first chip in the world to use TSMC’s System on Wafer (SoW) packaging method.
SoW allows many chips to be placed on a single 300 mm wafer, including logic chips, SoIC-compliant packages, HBM, and other dies. This makes it well suited to high-performance computing and AI workloads.
SoW grew out of TSMC’s earlier InFO_SoW, which itself came from integrated fan-out (InFO) technology.
Tesla’s Dojo D1 chip was the first to use InFO_SoW. TSMC says that its 1.6-nanometer A16 process, compared with its N2P process, delivers 1.1 times the chip density, an 8-10% speed gain at the same voltage, and 15-20% lower power at the same speed.
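To put those ratios in perspective, here is a minimal sketch that applies TSMC’s quoted A16-over-N2P factors to a purely hypothetical baseline; the baseline frequency and power below are assumptions for illustration, and only the scaling factors come from TSMC’s claim.

```c
#include <stdio.h>

/* Illustrative only: the baseline numbers below are assumptions, not
 * published figures. Only the scaling factors (1.1x density, 8-10% speed,
 * 15-20% power) come from TSMC's A16-vs-N2P claim quoted above. */
int main(void) {
    const double base_density  = 1.0;    /* normalized N2P transistor density */
    const double base_freq_ghz = 2.0;    /* assumed clock at a fixed voltage  */
    const double base_power_w  = 100.0;  /* assumed power at a fixed speed    */

    printf("A16 density (normalized):  %.2f\n", base_density * 1.10);
    printf("A16 speed at same voltage: %.2f - %.2f GHz\n",
           base_freq_ghz * 1.08, base_freq_ghz * 1.10);
    printf("A16 power at same speed:   %.1f - %.1f W\n",
           base_power_w * 0.80, base_power_w * 0.85);
    return 0;
}
```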
Tesla Dojo is a supercomputer designed by Tesla for a very specific purpose: improving its Full Self-Driving (FSD) advanced driver-assistance system. It went into production in July 2023, and its goal is to efficiently process the millions of terabytes of video data captured from real-life driving situations by Tesla’s fleet of more than 4 million cars.
The Unique Architecture of Tesla Dojo
The architecture of Tesla Dojo is what sets it apart. It is built around a custom chip called the D1, designed to be highly efficient for machine learning workloads. Each D1 chip has a dedicated CPU, local memory, and a communication interface, allowing each chip to operate as a full-fledged computer.
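As a rough mental model of that “each chip is a full computer” idea (not Tesla’s actual data structures), the sketch below describes a mesh of self-contained nodes, each bundling an ID, a slice of software-managed local memory, and links to its neighbours. All names and sizes are illustrative assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Purely illustrative model of a mesh of self-contained nodes; the field
 * names and sizes are assumptions, not Dojo's real layout. */
#define LOCAL_SRAM_BYTES (1u << 17)   /* assumed per-node scratchpad size */
#define MESH_DIM 4                    /* assumed 4x4 mesh for the example */

typedef struct node {
    uint32_t     id;                          /* position in the mesh          */
    uint8_t     *local_sram;                  /* software-managed local memory */
    struct node *north, *south, *east, *west; /* links to neighbouring nodes   */
} node_t;

/* Build a small 2D mesh where each node owns its own scratchpad. */
static node_t *build_mesh(void) {
    node_t *mesh = calloc(MESH_DIM * MESH_DIM, sizeof(node_t));
    for (int y = 0; y < MESH_DIM; y++) {
        for (int x = 0; x < MESH_DIM; x++) {
            node_t *n = &mesh[y * MESH_DIM + x];
            n->id = (uint32_t)(y * MESH_DIM + x);
            n->local_sram = malloc(LOCAL_SRAM_BYTES);
            n->north = (y > 0)            ? &mesh[(y - 1) * MESH_DIM + x] : NULL;
            n->south = (y < MESH_DIM - 1) ? &mesh[(y + 1) * MESH_DIM + x] : NULL;
            n->west  = (x > 0)            ? &mesh[y * MESH_DIM + (x - 1)] : NULL;
            n->east  = (x < MESH_DIM - 1) ? &mesh[y * MESH_DIM + (x + 1)] : NULL;
        }
    }
    return mesh;
}

int main(void) { node_t *mesh = build_mesh(); (void)mesh; return 0; }
```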
Scalability
Scalability is the defining goal of Dojo. Tesla has de-emphasized several mechanisms found in typical CPUs, such as cache coherency, virtual memory, and global lookup directories, because these mechanisms do not scale well. Instead, Dojo relies on very fast, highly distributed SRAM (static random-access memory) spread throughout the mesh.
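One way to see why a global lookup directory gets de-emphasized is a back-of-the-envelope model: a coherence directory needs roughly one sharer bit per core for every tracked cache line, so its size grows with both core count and cached data. The toy calculation below makes that trend concrete; the line size and per-core SRAM figure are assumptions, not published Dojo numbers.

```c
#include <stdio.h>

/* Toy model: directory-based coherence needs roughly one sharer bit per
 * core for every tracked cache line. The line size and per-core SRAM
 * capacity below are assumptions chosen only to illustrate the trend. */
int main(void) {
    const double line_bytes    = 64.0;    /* assumed cache line size        */
    const double sram_per_core = 1.25e6;  /* assumed bytes of SRAM per core */

    for (int cores = 64; cores <= 4096; cores *= 4) {
        double lines    = cores * sram_per_core / line_bytes;
        double dir_bits = lines * cores;          /* sharer-vector bits */
        printf("%5d cores -> ~%.0f MB of directory state\n",
               cores, dir_bits / 8.0 / 1e6);
    }
    return 0;
}
```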
High Throughput
At a high level, Dojo is an eight-wide core with four-way SMT, running at a conservative 2 GHz. It features a CPU-style pipeline, making it more tolerant of varied algorithms and branchy code than something like a GPU.
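As a quick sanity check on what “eight-wide at 2 GHz” buys, the snippet below computes the theoretical peak issue rate of one core; it only restates those two figures, and the four SMT threads share that peak rather than multiplying it.

```c
#include <stdio.h>

/* Back-of-envelope: peak instruction issue rate of one eight-wide core at
 * 2 GHz. This is a theoretical ceiling, not a sustained figure, and the
 * four SMT threads share it rather than multiplying it. */
int main(void) {
    const double issue_width = 8.0;   /* instructions per cycle (peak) */
    const double clock_hz    = 2.0e9; /* 2 GHz                         */
    const int    smt_threads = 4;

    double peak = issue_width * clock_hz;
    printf("Peak issue rate per core: %.0f billion instructions/s\n",
           peak / 1e9);
    printf("Fair share across %d SMT threads: %.0f billion/s per thread\n",
           smt_threads, peak / smt_threads / 1e9);
    return 0;
}
```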
Custom ISA
Dojo’s instruction set supports 64-bit scalar instructions and 64-byte SIMD instructions. It includes primitives for transferring data from local memory to remote memories, and it supports semaphore and barrier constraints. These are necessary to keep memory operations in line with the instructions running not just within a single D1 core, but across collections of D1 cores.
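Tesla has not published the details of these primitives, but the general pattern they enable, pairing a remote data transfer with a semaphore signal so the receiver only reads completed data, can be simulated on a host with POSIX threads. The sketch below is exactly that simulation, not Dojo code: a “producer core” writes into a “consumer core’s” local buffer and then posts a semaphore.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

/* Host-side simulation (not Dojo's ISA): a producer "core" writes into a
 * consumer "core's" local buffer, then posts a semaphore so the consumer
 * reads the data only after the transfer is complete. */
#define BUF_WORDS 16

static int   consumer_local[BUF_WORDS];  /* stands in for the consumer's SRAM */
static sem_t transfer_done;              /* signalled when the "DMA" finishes */

static void *producer(void *arg) {
    (void)arg;
    int staged[BUF_WORDS];
    for (int i = 0; i < BUF_WORDS; i++) staged[i] = i * i;
    memcpy(consumer_local, staged, sizeof(staged)); /* stand-in for a remote write */
    sem_post(&transfer_done);                       /* "transfer complete" signal  */
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    sem_wait(&transfer_done);  /* block until the producer signals completion */
    long sum = 0;
    for (int i = 0; i < BUF_WORDS; i++) sum += consumer_local[i];
    printf("consumer saw sum = %ld\n", sum);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&transfer_done, 0, 0);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&transfer_done);
    return 0;
}
```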
No Direct System Memory Access & Lack of Virtual Memory Support
Code running on Dojo cannot directly access system memory. Instead, applications are expected to work mainly out of a small pool of local SRAM. This local SRAM is managed by software and does not act as a cache; if data from main memory is needed, it has to be brought in using DMA operations. Dojo also lacks virtual memory support, which makes multitasking very difficult.
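The practical consequence is that software has to orchestrate its own data movement. A common pattern on scratchpad machines like this is double buffering: compute on one tile of local SRAM while the next tile is being fetched. The host-side sketch below illustrates the idea, with plain memcpy standing in for a DMA engine and all sizes chosen arbitrarily.

```c
#include <stdio.h>
#include <string.h>

/* Host-side illustration of the scratchpad pattern: "system memory" is a
 * large array, the "local SRAM" is two small tiles, and memcpy stands in
 * for the DMA engine. Sizes are arbitrary, chosen only for the example. */
#define TILE_WORDS   256
#define TOTAL_WORDS  (TILE_WORDS * 8)

static float system_memory[TOTAL_WORDS];  /* not touched by the "compute" code */
static float local_sram[2][TILE_WORDS];   /* software-managed double buffer    */

/* Stand-in for a DMA descriptor: copy one tile from system memory to SRAM. */
static void dma_fetch(int buf, int tile) {
    memcpy(local_sram[buf], &system_memory[tile * TILE_WORDS],
           sizeof(local_sram[buf]));
}

static float process_tile(const float *tile) {
    float sum = 0.0f;
    for (int i = 0; i < TILE_WORDS; i++) sum += tile[i];
    return sum;
}

int main(void) {
    for (int i = 0; i < TOTAL_WORDS; i++) system_memory[i] = 1.0f;

    float total = 0.0f;
    dma_fetch(0, 0);                            /* prefetch the first tile      */
    for (int t = 0; t < TOTAL_WORDS / TILE_WORDS; t++) {
        int cur = t & 1;
        if (t + 1 < TOTAL_WORDS / TILE_WORDS)
            dma_fetch(cur ^ 1, t + 1);          /* fetch next tile "in flight"  */
        total += process_tile(local_sram[cur]); /* compute on the current tile  */
    }
    printf("total = %.0f\n", total);
    return 0;
}
```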
This unique architecture allows Tesla to make trade-offs that more general architectures cannot, optimizing for its specific use case of training machine learning models.
Conclusion
Tesla Dojo is a fascinating piece of technology that showcases how custom hardware can significantly improve machine learning workloads. Its unique architecture is a testament to Tesla’s innovative approach to problem-solving.