Gremlin Intros Automatic Service Discovery

April 28, 2021

In modern environments that include microservices and Kubernetes, a single service is often distributed across many hosts, making visibility and targeting difficult.

Gremlin introduced Automatic Service Discovery at FailoverConf. The new feature automatically identifies the services running across distributed systems, enabling engineers to target them directly for more effective Chaos Engineering experiments.

“When we started Gremlin, our primary focus was on the underlying infrastructure, helping customers answer questions like, 'Can we handle server crashes?' or 'Can this cluster deal with a 10X traffic spike?'” said Matthew Fornaciari, CTO and Co-Founder of Gremlin. “But the rise in popularity of microservices necessitates that services function as first-class citizens. The infrastructure layer is becoming more abstract, and engineers are increasingly thinking about their systems as a collection of services. We want to replicate that mental model in Gremlin and reduce the cognitive load necessary to create controlled chaos.”

Gremlin's Automatic Service Discovery works by identifying the services running on hosts where the Gremlin agent is installed, and then surfacing the operational data behind those services, such as process names, container images, and where each service is deployed. This makes it easier for engineers to run targeted chaos experiments against a service regardless of how it is hosted, whether it is distributed across hosts, containers, or even multiple cloud providers.

“End customers won’t care about the ephemeral workloads and API calls happening behind the UI; they just want applications that function and perform as expected,” said Jason English, Principal Analyst at Intellyx. “Before DevOps teams can shift left and engineer resiliency into a system with early performance testing, chaos experiments, and telemetry, they need to shift right and discover exactly which services are contributing to that customer experience in production.”

Gremlin has also built a new way to track reliability progress, enabling SREs and DevOps teams to click into a particular service and view the full history of experiments run against it over time. The service owner can also add links to remediation runbooks and to any associated dashboards for deeper observability. Having all of this information in a single view gives engineers a better understanding of the reliability of their services.
