AWS Enhances SageMaker ML Service
December 5, 2022
“Today, tens of thousands of customers of all sizes and across industries rely on Amazon SageMaker. AWS customers are building millions of models, training models with billions of parameters, and generating trillions of predictions every month. Many customers are using ML at a scale that was unheard of just a few years ago,” said Bratin Saha, vice president of Artificial Intelligence and Machine Learning at AWS. “The new Amazon SageMaker capabilities announced today make it even easier for teams to expedite the end-to-end development and deployment of ML models. From purpose-built governance tools to a next-generation notebook experience and streamlined model testing to enhanced support for geospatial data, we are building on Amazon SageMaker’s success to help customers take advantage of ML at scale.”
The cloud enabled access to ML for more users, but until a few years ago, the process of building, training, and deploying models remained painstaking and tedious, requiring continuous iteration by small teams of data scientists for weeks or months before a model was production-ready. Amazon SageMaker launched five years ago to address these challenges, and since then AWS has added more than 250 new features and capabilities to make it easier for customers to use ML across their businesses. Today, some customers employ hundreds of practitioners who use Amazon SageMaker to make predictions that help solve the toughest challenges around improving customer experience, optimizing business processes, and accelerating the development of new products and services. As ML adoption has increased, so have the types of data that customers want to use, as well as the levels of governance, automation, and quality assurance that customers need to support the responsible use of ML. Today's announcement builds on Amazon SageMaker's history of innovation in supporting practitioners of all skill levels, worldwide.
New ML governance capabilities in Amazon SageMaker

Amazon SageMaker offers new capabilities that help customers more easily scale governance across the ML model lifecycle. As the number of models and users within an organization increases, it becomes harder to set least-privilege access controls and establish governance processes to document model information (e.g., input data sets, training environment information, model-use description, and risk rating). Once models are deployed, customers also need to monitor for bias and feature drift to ensure they perform as expected.
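As an illustration of what documenting model information programmatically could look like, the minimal sketch below uses the boto3 SageMaker client's model card API (the capability referenced as Amazon SageMaker Model Cards later in this announcement); the nested content fields and names here are assumptions and should be checked against the AWS documentation.

    import json
    import boto3

    sm = boto3.client("sagemaker")

    # Illustrative model card content; the nested keys sketch the model card
    # JSON schema and should be verified against the current AWS documentation.
    card_content = {
        "model_overview": {"model_description": "Churn classifier for retail accounts"},
        "intended_uses": {"intended_uses": "Batch scoring of active accounts"},
    }

    # Register the card in Draft status so reviewers can add risk ratings,
    # training details, and evaluation results before approval.
    sm.create_model_card(
        ModelCardName="churn-classifier-card",  # hypothetical name
        ModelCardStatus="Draft",
        Content=json.dumps(card_content),
    )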
Next-generation Notebooks

Amazon SageMaker Studio Notebook gives practitioners a fully managed notebook experience, from data exploration to deployment. As teams grow in size and complexity, dozens of practitioners may need to collaboratively develop models using notebooks. AWS continues to offer the best notebook experience for users with the launch of three new features that help customers coordinate and automate their notebook code.
Automated validation of new models using real-time inference requests

Before deploying to production, practitioners test and validate every model to check performance and identify errors that could negatively impact the business. Typically, they use historical inference request data to test the performance of a new model, but this data sometimes fails to account for current, real-world inference requests. For example, historical data for an ML model to plan the fastest route might fail to account for an accident or a sudden road closure that significantly alters the flow of traffic. To address this issue, practitioners route a copy of the inference requests going to a production model to the new model they want to test. It can take weeks to build this testing infrastructure, mirror inference requests, and compare how models perform across key metrics (e.g., latency and throughput). While this provides practitioners with greater confidence in how the model will perform, the cost and complexity of implementing these solutions for hundreds or thousands of models makes it unscalable.
Amazon SageMaker Inference now provides a capability to make it easier for practitioners to compare the performance of new models against production models, using the same real-world inference request data in real time. Now, they can easily scale their testing to thousands of new models simultaneously, without building their own testing infrastructure. To start, a customer selects the production model they want to test against, and Amazon SageMaker Inference deploys the new model to a hosting environment with the exact same conditions. Amazon SageMaker routes a copy of the inference requests received by the production model to the new model and creates a dashboard to display performance differences across key metrics, so customers can see how each model differs in real time. Once the customer validates the new model’s performance and is confident it is free of potential errors, they can safely deploy it.
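A rough sketch of that flow using the boto3 SageMaker client is shown below; the create_inference_experiment parameters are approximations of the shadow testing API, every name (endpoint, variants, models, role) is hypothetical, and optional settings such as a schedule or data capture configuration are omitted.

    import boto3

    sm = boto3.client("sagemaker")

    # Start a shadow test: mirror a share of the production variant's live
    # traffic to a shadow variant hosted under the same endpoint conditions.
    sm.create_inference_experiment(
        Name="fastest-route-shadow-test",
        Type="ShadowMode",
        RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        EndpointName="routing-prod-endpoint",
        ModelVariants=[
            {
                "ModelName": "routing-model-v1",   # current production model
                "VariantName": "production-variant",
                "InfrastructureConfig": {
                    "InfrastructureType": "RealTimeInference",
                    "RealTimeInferenceConfig": {
                        "InstanceType": "ml.m5.xlarge",
                        "InstanceCount": 1,
                    },
                },
            },
            {
                "ModelName": "routing-model-v2",   # candidate model to validate
                "VariantName": "shadow-variant",
                "InfrastructureConfig": {
                    "InfrastructureType": "RealTimeInference",
                    "RealTimeInferenceConfig": {
                        "InstanceType": "ml.m5.xlarge",
                        "InstanceCount": 1,
                    },
                },
            },
        ],
        ShadowModeConfig={
            "SourceModelVariantName": "production-variant",
            "ShadowModelVariants": [
                {"ShadowModelVariantName": "shadow-variant", "SamplingPercentage": 50},
            ],
        },
    )

Once the comparison dashboard confirms the shadow variant performs as expected, it can be promoted in place of the production variant.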
New geospatial capabilities in Amazon SageMaker make it easier for customers to make predictions using satellite and location data

Today, most data captured has geospatial information (e.g., location coordinates, weather maps, and traffic data). However, only a small amount of it is used for ML purposes because geospatial datasets are difficult to work with and can often be petabytes in size, spanning entire cities or hundreds of acres of land. To start building a geospatial model, customers typically augment their proprietary data by procuring third-party data sources like satellite imagery or map data. Practitioners need to combine this data, prepare it for training, and then write code to divide datasets into manageable subsets due to the massive size of geospatial data. Once customers are ready to deploy their trained models, they must write more code to recombine multiple datasets to correlate the data and ML model predictions. To extract predictions from a finished model, practitioners then need to spend days using open source visualization tools to render them on a map. The entire process from data enrichment to visualization can take months, which makes it hard for customers to take advantage of geospatial data and generate timely ML predictions.
Amazon SageMaker now accelerates and simplifies generating geospatial ML predictions by enabling customers to enrich their datasets, train geospatial models, and visualize the results in hours instead of months. With just a few clicks or using an API, customers can use Amazon SageMaker to access a range of geospatial data sources from AWS (e.g., Amazon Location Service), open-source datasets (e.g., Amazon Open Data), or their own proprietary data, including from third-party providers (like Planet Labs). Once a practitioner has selected the datasets they want to use, they can take advantage of built-in operators to combine these datasets with their own proprietary data. To speed up model development, Amazon SageMaker provides access to pre-trained deep-learning models for use cases such as increasing crop yields with precision agriculture, monitoring areas after natural disasters, and improving urban planning. After training, the built-in visualization tool displays data on a map to uncover new predictions.
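For the API path, a minimal sketch using the boto3 sagemaker-geospatial client is shown below; it assumes a public Sentinel-2 raster data collection and the built-in land cover segmentation model, and the ARNs, coordinates, and nested parameter shapes are placeholders to verify against the service documentation.

    from datetime import datetime
    import boto3

    geo = boto3.client("sagemaker-geospatial", region_name="us-west-2")

    # Run an Earth observation job that applies the built-in land cover
    # segmentation model to Sentinel-2 imagery over an area of interest.
    # The collection ARN, role ARN, and coordinates are placeholders.
    geo.start_earth_observation_job(
        Name="land-cover-demo",
        ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerGeospatialRole",
        InputConfig={
            "RasterDataCollectionQuery": {
                "RasterDataCollectionArn": "arn:aws:sagemaker-geospatial:us-west-2:aws:raster-data-collection/public/sentinel-2-l2a",
                "AreaOfInterest": {
                    "AreaOfInterestGeometry": {
                        "PolygonGeometry": {
                            # One closed ring of (longitude, latitude) pairs.
                            "Coordinates": [[
                                [-122.5, 37.6], [-122.5, 37.9],
                                [-122.1, 37.9], [-122.1, 37.6],
                                [-122.5, 37.6],
                            ]]
                        }
                    }
                },
                "TimeRangeFilter": {
                    "StartTime": datetime(2022, 6, 1),
                    "EndTime": datetime(2022, 6, 30),
                },
            }
        },
        JobConfig={"LandCoverSegmentationConfig": {}},
    )

From there, the job output can be exported or explored with the built-in map visualization described above.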
Capitec Bank is South Africa's largest digital bank with over 10 million digital clients. “At Capitec, we have a wide range of data scientists across our product lines who build differing ML solutions,” said Dean Matter, ML engineer at Capitec Bank. “Our ML engineers manage a centralized modeling platform built on Amazon SageMaker to empower the development and deployment of all of these ML solutions. Without any built-in tools, tracking modeling efforts tends toward disjointed documentation and a lack of model visibility. With Amazon SageMaker Model Cards, we can track plenty of model metadata in a unified environment, and Amazon SageMaker Model Dashboard provides visibility into the performance of each model. In addition, Amazon SageMaker Role Manager simplifies access management for data scientists in our different product lines. Each of these contributes toward our model governance being sufficient to warrant the trust that our clients place in us as a financial services provider.”
EarthOptics is a soil-data-measurement and mapping company that leverages proprietary sensor technology and data analytics to precisely measure the health and structure of soil. “We wanted to use ML to help customers increase agricultural yields with cost-effective soil maps,” said Lars Dyrud, CEO of EarthOptics. “Amazon SageMaker’s geospatial ML capabilities allowed us to rapidly prototype algorithms with multiple data sources and reduce the amount of time between research and production API deployment to just a month. Thanks to Amazon SageMaker, we now have geospatial solutions for soil carbon sequestration deployed for farms and ranches across the U.S.”
Intuit is the global financial technology platform that powers prosperity for more than 100 million customers worldwide with TurboTax, Credit Karma, QuickBooks, and Mailchimp. “We’re unleashing the power of data to transform the world of consumer, self-employed, and small business finances on our platform,” said Brett Hollman, director of Engineering and Product Development at Intuit. “To further improve team efficiencies for getting AI-driven products to market with speed, we've worked closely with AWS in designing the new team-based collaboration capabilities of SageMaker Studio Notebooks. We’re excited to streamline communication and collaboration to enable our teams to scale ML development with Amazon SageMaker Studio.”
AWS introduced eight new capabilities for Amazon SageMaker, its end-to-end machine learning (ML) service. Developers, data scientists, and business analysts use Amazon SageMaker to build, train, and deploy ML models quickly and easily using its fully managed infrastructure, tools, and workflows. As customers continue to innovate using ML, they are creating more models than ever before and need advanced capabilities to efficiently manage model development, usage, and performance. Today’s announcement includes new Amazon SageMaker governance capabilities that provide visibility into model performance throughout the ML lifecycle. New Amazon SageMaker Studio Notebook capabilities provide an enhanced notebook experience that enables customers to inspect and address data-quality issues in just a few clicks, facilitate real-time collaboration across data science teams, and accelerate the process of going from experimentation to production by converting notebook code into automated jobs. Finally, new capabilities within Amazon SageMaker automate model validation and make it easier to work with geospatial data.
HERE Technologies is a leading location-data and technology platform that helps customers create custom maps and location experiences built on highly precise location data. “Our customers need real-time context as they make business decisions leveraging insights from spatial patterns and trends,” said Giovanni Lanfranchi, chief product and technology officer for HERE Technologies. “We rely on ML to automate the ingestion of location-based data from varied sources to enrich it with context and accelerate analysis. Amazon SageMaker’s new testing capabilities allowed us to more rigorously and proactively test ML models in production and avoid adverse customer impact and any potential outages because of an error in deployed models. This is critical, since our customers rely on us to provide timely insights based on real-time location data that changes every minute.”