Chaos Mesh + SkyWalking: Better Observability for Chaos Engineering
This tutorial demonstrates how to combine SkyWalking and Chaos Mesh to better observe the effects of chaos experiments on applications’ service performance.
Chaos Mesh is an open-source cloud-native chaos engineering platform. You can use Chaos Mesh to conveniently inject failures and simulate abnormalities that might occur in reality, so you can identify potential problems in your system. Chaos Mesh also offers a Chaos Dashboard, which allows you to monitor the status of a chaos experiment. However, this dashboard does not show how the failures in an experiment affect the service performance of applications, which makes it harder to test systems further and find potential problems.
Apache SkyWalking is an open-source application performance monitor (APM), specially designed to monitor, track, and diagnose cloud-native, container-based distributed systems. It collects events that occur and then displays them on its dashboard, allowing you to observe directly the type and number of events that have occurred in your system and how different events impact the service performance.
When you use SkyWalking and Chaos Mesh together during chaos experiments, you can observe how different failures impact the service performance.
This tutorial shows you how to configure SkyWalking and Chaos Mesh. You’ll also learn how to use the two systems together to monitor events and observe in real time how chaos experiments impact applications’ service performance.
Prerequisites
Before you start to use SkyWalking and Chaos Mesh, you have to:
- Set up a SkyWalking cluster according to the SkyWalking configuration guide.
- Deploy Chaos Mesh using Helm.
- Install JMeter or other Java testing tools (to increase service loads).
- Configure SkyWalking and Chaos Mesh according to this guide if you just want to run a demo.
Step 1: Access the SkyWalking Cluster
After you install the SkyWalking cluster, you can access its user interface. However, no service is running at this point, so before you start monitoring, you have to add one and set the agents.
In this tutorial, we take Spring Boot, a lightweight microservice framework, as an example to build a simplified demo environment.
1. Create a SkyWalking demo in Spring Boot by referring to this document.
2. Execute the command kubectl apply -f demo-deployment.yaml -n skywalking to deploy the demo.
After you finish deployment, you can observe the real-time monitoring results at the SkyWalking UI.
Note: Spring Boot and SkyWalking have the same default port number: 8080. Be careful when you configure port forwarding; otherwise, you may have port conflicts. For example, you can forward Spring Boot’s port 8080 to local port 8079 by using a command like kubectl port-forward svc/spring-boot-skywalking-demo 8079:8080 -n skywalking to avoid conflicts.
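Before you set up port forwarding, it can help to check which local ports are already taken. The following is a minimal sketch, not part of SkyWalking or Chaos Mesh: the helper names port_in_use and pick_local_port and the candidate port list are assumptions made for illustration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def pick_local_port(candidates=(8080, 8079, 8078)) -> int:
    """Return the first candidate port that nothing is listening on (hypothetical helper)."""
    for port in candidates:
        if not port_in_use(port):
            return port
    raise RuntimeError("all candidate ports are busy")
```

If the SkyWalking UI is already forwarded to 8080, pick_local_port() would skip it and return the next free candidate, which you can then use as the local side of kubectl port-forward.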
Step 2: Deploy SkyWalking Kubernetes Event Exporter
SkyWalking Kubernetes Event Exporter is able to watch, filter, and send Kubernetes events into the SkyWalking backend. SkyWalking then associates the events with the system metrics and displays an overview of when and how the metrics are affected by the events.
If you want to deploy SkyWalking Kubernetes Event Exporter with a single command, refer to this document to create configuration files in YAML format, and then customize the parameters in the filters and exporters. Now, you can use the command kubectl apply to deploy SkyWalking Kubernetes Event Exporter.
Step 3: Use JMeter To Increase Service Loads
To better observe the change in service performance, you need to increase the service loads on Spring Boot. In this tutorial, we use JMeter, a widely adopted Java testing tool, to increase the service loads.
Perform a stress test on localhost:8079 using JMeter, adding five threads to continuously increase the service loads.
The user interface of Apache JMeter
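If JMeter is not at hand, a rough stand-in for the five-thread load can be sketched in Python. This is not JMeter and not what the tutorial itself uses: the function names http_get and run_load, the 60-second default duration, and the request path are all assumptions made for illustration.

```python
import http.client
import threading
import time
import urllib.request


def http_get(url: str, timeout: float = 5.0) -> bool:
    """Send one GET request and report whether it returned HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection errors, timeouts, and non-2xx responses count as failures.
        return False


def run_load(send, threads: int = 5, duration_s: float = 60.0):
    """Call send() in a loop from `threads` workers; return (ok, failed) counts."""
    deadline = time.monotonic() + duration_s
    lock = threading.Lock()
    counts = {"ok": 0, "failed": 0}

    def worker():
        while time.monotonic() < deadline:
            key = "ok" if send() else "failed"
            with lock:
                counts[key] += 1

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counts["ok"], counts["failed"]
```

A call such as run_load(lambda: http_get("http://localhost:8079/"), threads=5) would keep five workers busy for one minute against the forwarded port. The "/" path is a guess; use whichever endpoint your demo service exposes.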
Open the SkyWalking Dashboard. You can see that the access rate is 100%, and that the service loads reach about 5,300 calls per minute (CPM).
SkyWalking Dashboard
Step 4: Inject Failures via Chaos Mesh and Observe Results
After you finish the three steps above, you can use the Chaos Dashboard to simulate stress scenarios and observe the change in service performance during chaos experiments.
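The stress scenarios in this step are configured in the Chaos Dashboard, but the same kind of experiment can also be written as a Chaos Mesh StressChaos resource. The sketch below builds such a manifest as JSON, which kubectl apply -f accepts. The function name stress_chaos, the app label, the worker counts, the experiment names, and the five-minute duration are assumptions, not values taken from the tutorial.

```python
import json


def stress_chaos(name, cpu_load, memory_size,
                 namespace="skywalking",
                 app_label="spring-boot-skywalking-demo",  # assumed pod label
                 duration="5m"):                          # assumed duration
    """Build a StressChaos manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "StressChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "mode": "all",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
            "stressors": {
                # load is a percentage per worker; one worker is an assumption.
                "cpu": {"workers": 1, "load": cpu_load},
                "memory": {"workers": 1, "size": memory_size},
            },
            "duration": duration,
        },
    }


# The two conditions described in this step: 10% and 100% CPU load, 128 MB memory.
low_cpu = stress_chaos("cpu-10-mem-128", 10, "128MB")
full_cpu = stress_chaos("cpu-100-mem-128", 100, "128MB")
print(json.dumps(low_cpu, indent=2))
```

Because the CPU load applies per stress worker, the number of workers needed to saturate a pod depends on how many cores it can use, so treat the values above as a starting point.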
CPU Load: 10%; Memory Load: 128 MB
The first chaos experiment simulates low CPU usage. To display when a chaos experiment starts and ends, click the switching button on the right side of the dashboard. To identify whether the experiment is Applied to the system or Recovered from the system, move your cursor onto the short, green line.
During the time period between the two short, green lines, the service load decreases to 4,929 CPM, but returns to normal after the chaos experiment ends.
The service load variation under the first chaos condition
The service load variation under the second chaos condition
CPU Load: 100%; Memory Load: 128 MB
When the CPU usage is at 100%, the service load decreases to only 40% of what it would be if no chaos experiments were taking place.
The service load variation under the third chaos condition
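To put the reported figures side by side, the short calculation below works through them. It assumes the baseline of about 5,300 CPM from Step 3; the 100% CPU figure in CPM is derived from the stated 40% rather than read off the dashboard, and the function names are illustrative.

```python
BASELINE_CPM = 5300   # approximate load before any chaos experiment (Step 3)
LOW_CPU_CPM = 4929    # load observed during the 10% CPU experiment


def per_second(cpm: float) -> float:
    """Convert calls per minute to calls per second."""
    return cpm / 60


def relative_change(baseline: float, observed: float) -> float:
    """Fractional change relative to the baseline (negative means a drop)."""
    return (observed - baseline) / baseline


low_cpu_change = relative_change(BASELINE_CPM, LOW_CPU_CPM)
# 40% of the baseline, as stated for the 100% CPU experiment.
full_cpu_cpm = BASELINE_CPM * 40 / 100

print(f"baseline: {per_second(BASELINE_CPM):.1f} calls/s")
print(f"10% CPU load: {low_cpu_change:.1%} change")
print(f"100% CPU load: about {full_cpu_cpm:.0f} CPM")
```

With these inputs, the 10% CPU experiment costs roughly 7% of throughput, while the 100% CPU experiment leaves about 2,120 CPM.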
Summary
Because the Linux process scheduler does not allow a single process to occupy the CPU all the time, the deployed Spring Boot demo can still handle 40% of the access requests even in the extreme case of a full CPU load.
By combining SkyWalking and Chaos Mesh, you can clearly observe when and to what extent chaos experiments affect application service performance. This combination of tools lets you observe the service performance in various extreme conditions, thus boosting your confidence in your services.