ETCD cluster benchmark for OpenShift 4!
In this post, we walk through a set of benchmark tests against the ETCD cluster of an OpenShift 4 installation whose root disk is on a SAN.
Prerequisites
- OCPv4.10.X
Using the etcdctl check perf tool:
Step 1. Testing the ETCD
Before running any ETCD benchmark tests, collect oc must-gather logs.
export ETCD_POD=$(oc -n openshift-etcd get pods -l app=etcd -o name | head -1)
- load = small
oc exec -n openshift-etcd -it -c etcd $ETCD_POD -- etcdctl check perf --load='s' --auto-compact=true --auto-defrag=true --command-timeout=10m
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
Compacting with revision 8758230
Compacted with revision 8758230
Defragmenting "https://10.92.117.52:2379"
Defragmented "https://10.92.117.52:2379"
Defragmenting "https://10.92.117.53:2379"
Defragmented "https://10.92.117.53:2379"
Defragmenting "https://10.92.117.54:2379"
Defragmented "https://10.92.117.54:2379"
PASS: Throughput is 151 writes/s
PASS: Slowest request took 0.035363s
PASS: Stddev is 0.001403s
PASS
- load = medium
oc exec -n openshift-etcd -it -c etcd $ETCD_POD -- etcdctl check perf --load='m' --auto-compact=true --auto-defrag=true --command-timeout=10m
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
Compacting with revision 8819152
Compacted with revision 8819152
Defragmenting "https://10.92.117.52:2379"
Defragmented "https://10.92.117.52:2379"
Defragmenting "https://10.92.117.53:2379"
Defragmented "https://10.92.117.53:2379"
Defragmenting "https://10.92.117.54:2379"
Defragmented "https://10.92.117.54:2379"
PASS: Throughput is 999 writes/s
PASS: Slowest request took 0.039985s
PASS: Stddev is 0.002749s
PASS
- load = large
oc exec -n openshift-etcd -it -c etcd $ETCD_POD -- etcdctl check perf --load='l' --auto-compact=true --auto-defrag=true --command-timeout=10m
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
Compacting with revision 9235285
Compacted with revision 9235285
Defragmenting "https://10.92.117.52:2379"
Defragmented "https://10.92.117.52:2379"
Defragmenting "https://10.92.117.53:2379"
Defragmented "https://10.92.117.53:2379"
Defragmenting "https://10.92.117.54:2379"
Defragmented "https://10.92.117.54:2379"
FAIL: Throughput too low: 6918 writes/s
PASS: Slowest request took 0.132319s
PASS: Stddev is 0.004785s
FAIL
- load = extralarge
oc exec -n openshift-etcd -it -c etcd $ETCD_POD -- etcdctl check perf --load='xl' --auto-compact=true --auto-defrag=true --command-timeout=10m
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
Compacting with revision 8747581
Compacted with revision 8747581
Defragmenting "https://10.92.117.52:2379"
Defragmented "https://10.92.117.52:2379"
Defragmenting "https://10.92.117.53:2379"
Defragmented "https://10.92.117.53:2379"
Defragmenting "https://10.92.117.54:2379"
Defragmented "https://10.92.117.54:2379"
FAIL: Throughput too low: 10821 writes/s
PASS: Slowest request took 0.048963s
PASS: Stddev is 0.005151s
FAIL
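When the four runs are saved to files, it can help to condense each one into a single line for comparison. The following is a minimal sketch, not part of etcdctl: `summarize_perf` is a hypothetical helper name, and it assumes the output has the same line layout as the runs shown above.

```shell
# Hypothetical helper: condense `etcdctl check perf` output into one line.
# Reads the output on stdin; $1 is a label for the load size (s, m, l, xl).
summarize_perf() {
  # `oc exec -it` may add carriage returns to each line, so strip them first.
  tr -d '\r' | awk -v load="$1" '
    /Throughput/      { for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) tput = $i }
    /Slowest request/ { slow = $NF }
    /Stddev/          { sd = $NF }
    /^(PASS|FAIL)$/   { verdict = $0 }
    END {
      printf "load=%s throughput=%s slowest=%s stddev=%s verdict=%s\n",
             load, tput, slow, sd, verdict
    }
  '
}

# Example with the result lines of the large-load run shown above.
summarize_perf l <<'EOF'
FAIL: Throughput too low: 6918 writes/s
PASS: Slowest request took 0.132319s
PASS: Stddev is 0.004785s
FAIL
EOF
```

For the example input this prints `load=l throughput=6918 slowest=0.132319s stddev=0.004785s verdict=FAIL`.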
Step 2. Benchmark thresholds set by the upstream community
etcdctl check perf reports FAIL for a measure when it crosses the limit below:
Measure | Fails when |
---|---|
Throughput | < 150 * 0.9 writes/s (s) ; < 1000 * 0.9 writes/s (m) ; < 8000 * 0.9 writes/s (l) ; < 15000 * 0.9 writes/s (xl) |
Slowest request | > 500 ms |
Standard deviation | > 100 ms |
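To see why the large and extra-large runs in Step 1 failed, the throughput rule from the table can be worked through with integer arithmetic. This is an illustrative sketch only: the function names `perf_limit` and `throughput_verdict` are hypothetical, and the numbers fed in are the throughputs measured in Step 1.

```shell
# Nominal write rate (writes/s) per load size, taken from the table above.
perf_limit() {
  case "$1" in
    s)  echo 150 ;;
    m)  echo 1000 ;;
    l)  echo 8000 ;;
    xl) echo 15000 ;;
    *)  echo "unknown load: $1" >&2; return 1 ;;
  esac
}

# Print PASS or FAIL for a measured throughput ($2, integer writes/s)
# under the given load size ($1).
throughput_verdict() {
  tv_limit=$(perf_limit "$1") || return 1
  tv_min=$(( tv_limit * 9 / 10 ))   # 90% of the nominal rate
  if [ "$2" -lt "$tv_min" ]; then
    echo "$1: FAIL ($2 < $tv_min)"
  else
    echo "$1: PASS ($2 >= $tv_min)"
  fi
}

throughput_verdict s 151      # minimum is 135
throughput_verdict m 999      # minimum is 900
throughput_verdict l 6918     # minimum is 7200
throughput_verdict xl 10821   # minimum is 13500
```

The small and medium loads clear their minimums (135 and 900 writes/s), while 6918 and 10821 writes/s fall short of 7200 and 13500 writes/s, which matches the verdicts in Step 1.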
Step 3. Testing the ETCD with kube-burner
In this step we extend the benchmark from Step 1 by loading the ETCD cluster on OCPv4.X with kube-burner.
Before running any ETCD benchmark tests, collect oc must-gather logs.
Run the must-gather through etcd.sh:
alias etcdcheck='podman run --privileged --volume /$(pwd):/test quay.io/peterducai/openshift-etcd-suite:latest etcd '
etcdcheck /test/<path to must-gather>
Install kube-burner and download the cluster-perf-ci workloads:
curl -L -O https://github.com/cloud-bulldozer/kube-burner/releases/download/v0.15.5/kube-burner-0.15.5-Linux-x86_64.tar.gz
tar xvfz kube-burner-0.15.5-Linux-x86_64.tar.gz
sudo mv kube-burner /usr/local/bin/
curl -L -O https://github.com/cloud-bulldozer/cluster-perf-ci/archive/refs/heads/master.zip
unzip master.zip
cd cluster-perf-ci-master/
At this point we need to tune the workload file configmap-scale.yml as described below:
---
global:
  writeToFile: false
  requestTimeout: 15s
  indexerConfig:
    enabled: false
    esServers: ["https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com"]
    defaultIndex: ripsaw-kube-burner
    type: elastic
jobs:
  - name: stage-1
    namespace: stage-1
    jobIterations: 1
    qps: 200
    burst: 200
    namespacedIterations: false
    podWait: false
    verifyObjects: false
    objects:
      - objectTemplate: "templates/configmap-scale/configmap.yml"
        #replicas: 20000
        replicas: 50000
        inputVars:
          # Data length is in bytes, 2000000 = 2MiB
          #data_length: 10000
          data_length: 10000
  - name: delete-stage-1
    waitForDeletion: true
    jobType: delete
    objects:
      - kind: Namespace
        labelSelector: {kube-burner-job: stage-1}
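Before launching the job, it is worth estimating how much data this workload pushes into ETCD. The sketch below is rough arithmetic, not a kube-burner feature: `workload_estimate` is a hypothetical helper, it assumes each ConfigMap carries about data_length bytes of payload (per the comment in the workload file), and it treats qps as a client-side cap on requests per second, so the duration is a lower bound.

```shell
# Hypothetical helper: rough size and duration estimate for the workload.
# $1 = replicas, $2 = data_length in bytes, $3 = qps
workload_estimate() {
  we_total=$(( $1 * $2 ))             # total payload in bytes
  we_mib=$(( we_total / 1024 / 1024 )) # truncated to whole MiB
  we_secs=$(( $1 / $3 ))              # fastest possible creation time
  echo "payload: ${we_total} bytes (~${we_mib} MiB), at least ${we_secs}s to create"
}

# Values from the workload file above.
workload_estimate 50000 10000 200
```

With the values above this prints `payload: 500000000 bytes (~476 MiB), at least 250s to create`. Object metadata and ETCD revision history add to that figure, so treat it as a floor rather than the expected database growth.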
Run kube-burner in a dedicated terminal:
oc project default
oc create sa kubeburner
oc adm policy add-cluster-role-to-user cluster-admin -z kubeburner
export TOKEN=$(oc sa get-token kubeburner)
kube-burner init -c configmap-scale.yml -t ${TOKEN} --uuid $(uuidgen)
Open a new terminal and run the following command to watch the number of ConfigMaps in the stage-1 namespace:
while true; do oc get configmap -n stage-1 --no-headers | wc -l; sleep 5; done
Once those steps are finished and all the data has been collected, collect oc must-gather logs again.
Run the must-gather through etcd.sh:
alias etcdcheck='podman run --privileged --volume /$(pwd):/test quay.io/peterducai/openshift-etcd-suite:latest etcd '
etcdcheck /test/<path to must-gather>
For more information on etcd.sh, see the openshift-etcd-suite project.
Run the fio_suite:
podman run --privileged --volume /$(pwd):/test quay.io/peterducai/openshift-etcd-suite:latest fio