Announcing TensorFlow Fold: Deep
Learning With Dynamic Computation Graphs
By Google's Moshe Looks, Marcello
Herreshoff and DeLesley Hutchins, Software Engineers
February 09, 2017
In much of
machine learning, data used for training and inference
undergoes a preprocessing step, where multiple inputs (such
as images) are scaled to the same dimensions and stacked
into batches. This lets high-performance deep learning
libraries like TensorFlow run the same computation graph
across all the inputs in the batch in parallel. Batching
exploits the SIMD capabilities of modern GPUs and
multi-core CPUs to speed up
execution. However, there are many problem domains where the
size and structure of the input data varies, such as
parse trees in natural language understanding, abstract
syntax trees in source code, DOM trees for web pages and
more. In these cases, the different inputs
have different computation graphs that don't naturally batch
together, resulting in poor processor, memory, and cache
utilization.
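To make the contrast concrete, here is a minimal NumPy sketch; all names, shapes, and the toy trees below are illustrative assumptions, not code from the post:

```python
import numpy as np

# Fixed-size inputs batch trivially: stack them into one tensor and run
# a single operation over the whole batch in parallel.
rng = np.random.default_rng(0)
images = [rng.standard_normal(64) for _ in range(32)]  # 32 inputs, same shape
batch = np.stack(images)                               # shape (32, 64)
weights = rng.standard_normal((64, 10))
logits = batch @ weights                               # one matmul serves all 32 inputs
print(logits.shape)                                    # (32, 10)

# Tree-structured inputs have no common shape to stack along: each parse
# tree implies its own sequence of operations, so naive code must build
# and run a separate computation graph per input.
tree_a = ("the", ("cat", "sat"))
tree_b = (("a", "dog"), "ran")  # a different shape entirely
# np.stack([tree_a, tree_b]) has no meaning here.
```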
Today we are releasing TensorFlow Fold
to address these challenges. TensorFlow Fold makes it easy
to implement deep-learning models that operate over data of
varying size and structure. Furthermore, TensorFlow Fold
brings the benefits of batching to such models, resulting in
a speedup of more than 10x on CPU, and more than 100x on GPU,
over alternative implementations. This is made possible by
dynamic batching, introduced in our paper Deep Learning
with Dynamic Computation Graphs.
The TensorFlow Fold library will initially build a separate
computation graph from each input.

[Animation] A recursive neural network run with dynamic
batching. Operations with the same color are batched
together, which lets TensorFlow run them faster. The Embed
operation converts words to vector representations. The
fully connected (FC) operation combines word vectors to form
vector representations of phrases. The output of the network
is a vector representation of an entire sentence. Although
only a single parse tree of a sentence is shown, the same
network can run, and batch together operations, over
multiple parse trees of arbitrary shapes and sizes.
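A network of this kind can be sketched, very roughly, as a recursion over a parse tree. This is a hypothetical NumPy toy, not TensorFlow Fold's API; `embed`, `fc`, `encode`, the vocabulary, and the random parameters are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# Toy vocabulary: each known word maps to a random embedding vector.
vocab = {w: rng.standard_normal(DIM) for w in ["the", "cat", "sat", "on", "mat"]}
W = rng.standard_normal((2 * DIM, DIM))  # FC weights combining two child vectors

def embed(word):
    # Embed: convert a word to its vector representation.
    return vocab[word]

def fc(left, right):
    # Fully connected (FC): combine two child vectors into a phrase vector.
    return np.tanh(np.concatenate([left, right]) @ W)

def encode(tree):
    # Recurse over the parse tree; leaves are words, internal nodes are pairs.
    if isinstance(tree, str):
        return embed(tree)
    left, right = tree
    return fc(encode(left), encode(right))

# A single parse tree of a sentence; its vector summarizes the whole sentence.
sentence = (("the", "cat"), ("sat", ("on", "mat")))
vec = encode(sentence)
print(vec.shape)  # (8,)
```

Run naively like this, every input tree triggers its own distinct chain of operations, which is exactly the batching problem described above.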
Because the individual inputs may have different sizes and
structures, the computation graphs may as well. Dynamic
batching then automatically combines these graphs to take
advantage of opportunities for batching, both within and
across inputs, and inserts additional instructions to move
data between the batched operations (see our paper
for technical details).
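A simplified sketch of the scheduling idea, continuing the toy model above; this is not Fold's actual implementation, and `encode_batched` and every other name here is hypothetical. Nodes at the same tree depth perform the same operation, so they can be grouped and executed as one batched matmul, both within a single tree and across trees:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
DIM = 8
vocab = defaultdict(lambda: rng.standard_normal(DIM))  # word -> embedding
W = rng.standard_normal((2 * DIM, DIM))                # shared FC weights

def depth(node):
    # Words sit at depth 0; an internal node is one above its deepest child.
    return 0 if isinstance(node, str) else 1 + max(depth(node[0]), depth(node[1]))

def subtrees(node):
    # Yield every node (word or pair) in the tree.
    yield node
    if not isinstance(node, str):
        yield from subtrees(node[0])
        yield from subtrees(node[1])

def encode_batched(trees):
    # Group every node from every input tree by depth: all nodes at one
    # depth perform the same operation, so each group runs as one batched op.
    by_depth = defaultdict(list)
    for t in trees:
        for node in subtrees(t):
            by_depth[depth(node)].append(node)

    results = {}
    for leaf in by_depth[0]:          # batch all Embed operations together
        results[leaf] = vocab[leaf]
    for d in range(1, max(by_depth) + 1):
        nodes = by_depth[d]           # batch all FC operations at this depth,
        inputs = np.stack([           # both within and across input trees
            np.concatenate([results[l], results[r]]) for (l, r) in nodes
        ])
        results.update(zip(nodes, np.tanh(inputs @ W)))  # one matmul per level
    return [results[t] for t in trees]

# Two parse trees of different shapes, encoded with shared batched ops.
vecs = encode_batched([(("the", "cat"), "sat"),
                       ("dogs", ("ran", ("very", "fast")))])
print([v.shape for v in vecs])  # [(8,), (8,)]
```

Instead of one matmul per tree node, this schedule issues one matmul per tree level across the whole batch, which is where the speedups come from; the real system additionally inserts the data-movement instructions mentioned above.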
To learn more, head over to our github site.
We hope that TensorFlow Fold will be useful for researchers
and practitioners implementing neural networks with dynamic
computation graphs in TensorFlow.
This work was done under the supervision of Peter Norvig.