Quantifying the performance of the TPU, our
first machine learning chip
By Norm Jouppi, Distinguished Hardware Engineer
April 06, 2017
We've been using compute-intensive machine learning in our products for
the past 15 years. We use it so much that we even designed an
entirely new class of custom machine learning accelerator, the
Tensor Processing Unit (TPU).
Just how fast is the TPU, actually? Today, in conjunction with a
TPU talk for a National
Academy of Engineering meeting at the Computer History Museum in
Silicon Valley, we're releasing a study that
shares new details on these custom chips, which have been running
machine learning applications in our data centers since 2015. This
first generation of TPUs targeted inference (the use of an already
trained model, as opposed to the training phase of a model, which
has somewhat different characteristics), and here are some of the
results we’ve seen:
- On our
production AI workloads that utilize neural network inference,
the TPU is 15x to 30x faster than contemporary GPUs and CPUs.
- The TPU also
achieves much better energy efficiency than conventional chips,
with a 30x to 80x improvement in TOPS/Watt (tera-operations
[trillion or 10^12 operations] of computation per Watt of energy
consumed); the arithmetic behind this metric is sketched just
after this list.
- The neural
networks powering these applications require a surprisingly
small amount of code: just 100 to 1500 lines. The code is based on
TensorFlow, our popular open-source machine learning framework; a
minimal inference sketch also follows this list.
- More than 70
authors contributed to this report. It really does take a
village to design, verify, implement and deploy the hardware and
software of a system like this.
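
To make the TOPS/Watt metric concrete, here is a minimal
back-of-envelope sketch. Every number in it is an illustrative
placeholder, not a measurement from the study; only the formula
itself, operations per second divided by watts, is the point.

```python
# TOPS/Watt = (operations per second / 10^12) / watts consumed.
# All numbers below are hypothetical placeholders for illustration.

def tops_per_watt(ops_per_second: float, watts: float) -> float:
    """Tera-operations of computation per Watt of energy consumed."""
    return (ops_per_second / 1e12) / watts

# Hypothetical accelerator: 90 trillion ops/s at 40 W of busy power.
accelerator = tops_per_watt(90e12, 40.0)     # 2.25 TOPS/Watt
# Hypothetical conventional chip: 3 trillion ops/s at 100 W.
conventional = tops_per_watt(3e12, 100.0)    # 0.03 TOPS/Watt

print(f"improvement: {accelerator / conventional:.0f}x")  # 75x, within 30x-80x
```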
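
And to give a feel for why so few lines suffice, below is a hedged
sketch of what neural network inference code looks like in
TensorFlow. The two-layer model, its shapes, and the randomly
initialized weights are hypothetical stand-ins (a production system
would restore trained weights from a checkpoint); the TF 1.x API
shown is the one current as of this post.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API, current as of this post

# Hypothetical two-layer network; all shapes are illustrative.
x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
w1 = tf.Variable(tf.random_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random_normal([256, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))

hidden = tf.nn.relu(tf.matmul(x, w1) + b1)    # matrix multiply + activation
logits = tf.matmul(hidden, w2) + b2
probs = tf.nn.softmax(logits, name="output")  # per-class probabilities

with tf.Session() as sess:
    # A real deployment would restore trained weights instead.
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(8, 784).astype(np.float32)  # fake input batch
    print(sess.run(probs, feed_dict={x: batch}).shape)  # (8, 10)
```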
The need for TPUs really
emerged about six years ago, when we started using computationally
expensive deep learning models in more and more places throughout
our products. The computational expense of using these models had us
worried. If we considered a scenario where people use Google voice
search for just three minutes a day and we ran deep neural nets for
our speech recognition system on the processing units we were using,
we would have had to double the number of Google data centers!
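
As a hedged illustration of that back-of-envelope concern (every
input below is a made-up assumption chosen only to show the shape of
the math, not an actual Google figure), the sizing calculation looks
roughly like this:

```python
# Rough sizing of the voice-search scenario above. Every number is a
# hypothetical assumption for illustration only.

daily_users = 1e9                 # assumed people using voice search daily
audio_seconds = 3 * 60            # three minutes of speech per user per day
dnn_ops_per_audio_second = 1e10   # assumed DNN cost per second of audio
server_ops_per_second = 1e11      # assumed usable throughput per server

total_ops = daily_users * audio_seconds * dnn_ops_per_audio_second
servers = total_ops / (server_ops_per_second * 86_400)  # seconds per day
print(f"~{servers:,.0f} additional servers")  # ~208,333 under these assumptions
```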
TPUs allow us to make predictions very quickly, and enable products that
respond in fractions of a second. TPUs are behind every search
query; they power accurate vision models that underlie products like
Google Image Search, Google Photos and the Google Cloud Vision API;
they underpin the
improvements that Google Translate
rolled out last year; and they were instrumental in
DeepMind AlphaGo's victory over Lee Sedol,
the first instance of a computer defeating a world champion in the
ancient game of Go.
We’re committed to building the best infrastructure and sharing
those benefits with everyone. We look forward to sharing more
updates in the coming weeks and months.