Resize Machines
Steps to resize a Machine on an OpenShift cluster.
Important Note
- All steps described here follow a safe way to resize a Machine in OCP 4.x.
- This is not official documentation; these steps were tested on versions 4.9 and 4.10.
Overview of steps:
- Gather cluster information
- Set target Machines to resize
- Set the new size
- Graceful Power off
- Change Machine size
- Power on
- Patch Machine Object spec
Supported/documented platforms:
- AWS
- Azure
Gather cluster information
Check the provider
Make sure you are running the steps for the correct Cloud Provider:
oc get infrastructures \
-o jsonpath='{.items[*].status.platformStatus.type}'
Example output
AWS
Azure
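The provider check can also be scripted so that later steps refuse to run on a platform this guide does not cover. The sketch below is a minimal example; the function name is_supported_platform is hypothetical and not part of oc.

```shell
# Return 0 when the platform is one of the two covered by this guide.
# is_supported_platform is a hypothetical helper name.
is_supported_platform() {
  case "$1" in
    AWS|Azure) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a live cluster (not executed here):
#   platform=$(oc get infrastructures \
#     -o jsonpath='{.items[*].status.platformStatus.type}')
#   is_supported_platform "${platform}" || echo "unsupported platform: ${platform}"
```

Keeping the check in a function makes it easy to reuse as a guard before each provider-specific command.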
Check the cluster version
oc get clusterversion
Check all the nodes are Ready
Make sure that every node in the group to be resized has Status=Ready.
Note
All steps described here were performed on master nodes.
oc get nodes \
-l kubernetes.io/os=linux,node-role.kubernetes.io/master=
Sample output:
NAME STATUS ROLES AGE VERSION
mrbaz01-2754r-master-0 Ready master 5h57m v1.22.0-rc.0+8719299
mrbaz01-2754r-master-1 Ready master 5h57m v1.22.0-rc.0+8719299
mrbaz01-2754r-master-2 Ready master 5h56m v1.22.0-rc.0+8719299
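Instead of reading the STATUS column by eye, the node listing can be filtered for anything that is not exactly Ready. This is a rough sketch that assumes the column layout shown in the sample output above; not_ready_nodes is a hypothetical helper name.

```shell
# Print the name of every node whose STATUS column is not exactly "Ready".
# Reads "oc get nodes --no-headers" style lines on stdin.
# Cordoned nodes (Ready,SchedulingDisabled) are reported as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage against a live cluster (not executed here):
#   oc get nodes --no-headers \
#     -l kubernetes.io/os=linux,node-role.kubernetes.io/master= \
#     | not_ready_nodes
```

An empty result means every node in the group is Ready and schedulable.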
Check all the machines are Running
Make sure that every Machine in the group to be resized has Phase=Running.
Notes:
- Sample steps filtering the group of nodes: master
oc get machines \
-n openshift-machine-api \
-l machine.openshift.io/cluster-api-machine-role=master
NAME PHASE TYPE REGION ZONE AGE
mrbaz01-2754r-master-0 Running Standard_D4s_v3 eastus 1 6h1m
mrbaz01-2754r-master-1 Running Standard_D4s_v3 eastus 3 6h1m
mrbaz01-2754r-master-2 Running Standard_D4s_v3 eastus 2 6h1m
Gather Machine Information
Gather Cloud provider information from Machine object.
Choose the Cloud Provider
AWS:
oc get machines \
-n openshift-machine-api \
-l machine.openshift.io/cluster-api-machine-role=master \
-o json \
| jq -r '.items[] | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceId: " + .status.providerStatus.instanceId,
"instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Azure:
oc get machines \
-n openshift-machine-api \
-l machine.openshift.io/cluster-api-machine-role=master \
-o json \
| jq -r '.items[] | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceId: " + .status.providerStatus.vmId,
"instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Sample output
AWS: N/A
Azure:
node_name: mrbaz01-2754r-master-0
machine_name: mrbaz01-2754r-master-0
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-0
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3
node_name: mrbaz01-2754r-master-1
machine_name: mrbaz01-2754r-master-1
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-1
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3
node_name: mrbaz01-2754r-master-2
machine_name: mrbaz01-2754r-master-2
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-2
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3
General steps to resize each machine
Tip
Repeat the steps below for each machine you want to resize.
Set the machine_name variable value.
Warning
The variable machine_name must be set to match your environment and updated for each machine you resize.
machine_name=mrbaz01-2754r-master-0
Set the new Machine size
new_machine_type="<cloud_provider_size>"
Example by Cloud Provider
AWS: to check EC2 compatibility with OCP, please check this doc, then set:
new_machine_type="m5.xlarge"
Azure: to list the VM sizes available for a specific VM, run the following (resource_group is set in "Collect Machine info" below):
az vm list-vm-resize-options \
--resource-group ${resource_group} \
--name ${machine_name} \
--output table
Then set the desired value:
new_machine_type="Standard_D8s_v3"
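On Azure, the chosen size can be checked against the resize options before anything is powered off. The sketch below assumes the options are printed one size name per line; size_available is a hypothetical helper, and the --query expression in the usage comment is an assumption about the shape of the az output.

```shell
# Succeed when the requested size appears, as an exact line, in the list
# of size names read from stdin. size_available is a hypothetical helper.
size_available() {
  grep -Fqx -- "$1"
}

# Usage on Azure (not executed here; the --query expression is an assumption):
#   az vm list-vm-resize-options \
#     --resource-group "${resource_group}" \
#     --name "${machine_name}" \
#     --query '[].name' --output tsv \
#     | size_available "${new_machine_type}" \
#     || echo "size ${new_machine_type} is not offered for ${machine_name}"
```

Matching whole lines with fixed strings avoids accepting a partial name such as Standard_D8s when only Standard_D8s_v3 is offered.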
Collect Machine info
Attention
Do not change any of the steps described below; just run the ones that apply to your environment.
Discover the variable values based on ${machine_name}
Choose the Cloud Provider
AWS:
instanceId=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.providerStatus.instanceId}')
node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')
Azure:
resource_group=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.spec.providerSpec.value.resourceGroup}')
instanceId=${machine_name}
node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')
- Make sure all variables are set:
echo "[${instanceId}] [${node_name}] ${resource_group:-}"
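Because every later command depends on these variables, an empty value is easy to miss in the echo output. A stricter check could look like the sketch below; require_vars is a hypothetical helper, not something provided by oc.

```shell
# Fail when any of the named shell variables is unset or empty.
# require_vars is a hypothetical helper; pass variable NAMES, not values.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called ${name} (names are literals here).
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage:
#   AWS:   require_vars machine_name instanceId node_name
#   Azure: require_vars machine_name instanceId node_name resource_group
```

Running it right before the power-off steps stops the procedure early when a lookup returned nothing.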
Graceful Power off
- Cordon the node
oc adm cordon ${node_name}
- Drain the node
oc adm drain ${node_name} --ignore-daemonsets --grace-period=60
- Shutdown
oc debug node/${node_name} -- chroot /host shutdown -h 1
- Wait for the node to shut down
Attention
Wait until the node is Status=NotReady
oc get node ${node_name} -w
- Wait until the Instance/VM is in stopped state (by Cloud provider)
Choose the Cloud Provider
# describe-instance-status lists only running instances by default,
# so a "null" state means the instance is no longer running.
while true; do
st=$(aws ec2 describe-instance-status \
--instance-id ${instanceId} \
| jq -r '.InstanceStatuses[0].InstanceState.Name')
test "$st" = "null" && break
echo "state=$st; sleeping 15s"
sleep 15
done
echo "state=$st"
# Poll the PowerState status code until the VM reports stopped.
while true; do
st=$(az vm get-instance-view \
--resource-group ${resource_group} \
--name ${machine_name} \
--output json \
| jq -r '.instanceView.statuses[] | select(.code | startswith("PowerState")).code')
test "$st" = "PowerState/stopped" && break
echo "state=$st; sleeping 15s"
sleep 15
done
echo "state=$st"
- Make sure that the node is turned off
Choose the Cloud Provider
aws ec2 describe-instance-status \
--instance-id ${instanceId}
Expected result:
{
"InstanceStatuses": []
}
az vm get-instance-view \
--resource-group ${resource_group} \
--name ${machine_name} \
--output table
Expected result:
Name ResourceGroup Location ProvisioningState PowerState
---------------------- ---------------- ---------- ------------------- ------------
mrbaz01-2754r-master-0 mrbaz01-2754r-rg eastus Succeeded VM stopped
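The polling loops in this section keep running until the target state appears, with no upper bound. A bounded variant is sketched below, assuming the state can be produced by any command that prints it on stdout; wait_for_state and azure_power_state are hypothetical names.

```shell
# wait_for_state <target> <max_attempts> <sleep_seconds> <command...>
# Runs <command...> repeatedly until it prints <target>.
# Returns 0 on success and 1 when the attempts run out.
wait_for_state() {
  target=$1
  max_attempts=$2
  interval=$3
  shift 3
  attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    state=$("$@") || state=""
    echo "state=${state} (attempt ${attempt}/${max_attempts})"
    if [ "${state}" = "${target}" ]; then
      return 0
    fi
    sleep "${interval}"
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage on Azure (not executed here; azure_power_state is hypothetical):
#   azure_power_state() {
#     az vm get-instance-view --resource-group "${resource_group}" \
#       --name "${machine_name}" --output json \
#       | jq -r '.instanceView.statuses[] | select(.code | startswith("PowerState")).code'
#   }
#   wait_for_state "PowerState/stopped" 40 15 azure_power_state
```

With 40 attempts and a 15 second interval the wait gives up after roughly ten minutes, which makes a stuck shutdown visible instead of looping forever.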
Change instance Type
- Change the size
Choose the Cloud Provider
aws ec2 modify-instance-attribute \
--instance-id ${instanceId} \
--instance-type ${new_machine_type}
az vm resize \
--resource-group ${resource_group} \
--name ${machine_name} \
--size ${new_machine_type}
- Check the current [new] size
Choose the Cloud Provider
aws ec2 describe-instance-attribute \
--instance-id ${instanceId} \
--attribute instanceType
az vm get-instance-view \
--resource-group ${resource_group} \
--name ${machine_name} \
--output json \
| jq -r '.hardwareProfile.vmSize'
Power on
- Power on the VM
Choose the Cloud Provider
aws ec2 start-instances \
--instance-ids ${instanceId}
az vm start \
--resource-group ${resource_group} \
--name ${machine_name} \
--output table
- Wait until the Instance is in the running state (by Cloud Provider)
Choose the Cloud Provider
while true; do
st=$(aws ec2 describe-instance-status \
--instance-id ${instanceId} \
| jq -r '.InstanceStatuses[0].InstanceState.Name')
test "$st" = "running" && break
echo "state=$st; sleeping 15s"
sleep 15
done
echo "state=$st"
while true; do
st=$(az vm get-instance-view \
--resource-group ${resource_group} \
--name ${machine_name} \
--output json \
| jq -r '.instanceView.statuses[] | select(.code | startswith("PowerState")).code')
test "$st" = "PowerState/running" && break
echo "state=$st; sleeping 15s"
sleep 15
done
echo "state=$st"
- Wait for the node to be Ready (STATUS=Ready)
oc get node ${node_name} -w
- Wait for MAPI to reconcile and update the new machine size (TYPE)
oc get machine ${machine_name} \
-n openshift-machine-api
Sample output
AWS:
NAME PHASE TYPE REGION ZONE AGE
mrbg3-4glln-master-0 Running m5.xlarge us-east-1 us-east-1a 48m
Azure:
NAME PHASE TYPE REGION ZONE AGE
mrbaz01-2754r-master-0 Running Standard_D8s_v3 eastus 1 7h8m
- Make sure that no CSR is pending (there should not be any)
All certificates should already be issued and approved; this check only confirms that nothing went wrong in that step.
oc get csr
- Some operators may be degraded; review them:
oc get co
- Uncordon the node
oc adm uncordon ${node_name}
- Wait until all operators clear the degraded state
oc get co -w
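The CSR and cluster operator reviews above can be narrowed down to only the entries that need attention. The filters below are a sketch that assumes the default column layout of oc get csr and oc get co with --no-headers; pending_csrs and unhealthy_operators are hypothetical helper names.

```shell
# Print the names of CSRs whose last column (CONDITION) is "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Print operators that are not Available=True or not Degraded=False.
# Assumed columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE ...
unhealthy_operators() {
  awk '$3 != "True" || $5 != "False" { print $1 }'
}

# Usage against a live cluster (not executed here):
#   oc get csr --no-headers | pending_csrs
#   oc get co --no-headers | unhealthy_operators
```

Both filters print nothing when the cluster has settled, so an empty result is the signal to move on to the next step.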
- Review the Machine object attributes
Choose the Cloud Provider
AWS:
oc get machine ${machine_name} \
-n openshift-machine-api \
-o json \
| jq -r '. | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Azure:
oc get machine ${machine_name} \
-n openshift-machine-api \
-o json \
| jq -r '. | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Patch Machine API
Patch Machine Object:
Choose the Cloud Provider
AWS:
oc patch machine ${machine_name} \
-n openshift-machine-api \
--type=merge \
-p "{\"spec\":{\"providerSpec\":{\"value\":{\"instanceType\":\"${new_machine_type}\"}}}}"
Azure:
oc patch machine ${machine_name} \
-n openshift-machine-api \
--type=merge \
-p "{\"spec\":{\"providerSpec\":{\"value\":{\"vmSize\":\"${new_machine_type}\"}}}}"
- Verify that the Machine type was changed:
AWS:
oc get machines ${machine_name} \
-n openshift-machine-api \
-o json \
| jq -r '. | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Sample output:
node_name: ip-10-0-133-111.ec2.internal
machine_name: mrbg3-4glln-master-0
instanceTypeSpec: m5.xlarge
instanceTypeMeta: m5.xlarge
Azure:
oc get machines ${machine_name} \
-n openshift-machine-api \
-o json \
| jq -r '. | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Sample output:
node_name: mrbaz01-2754r-master-1
machine_name: mrbaz01-2754r-master-1
instanceTypeSpec: Standard_D8s_v3
instanceTypeMeta: Standard_D8s_v3
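The two patch commands differ only in the field name under providerSpec.value, so the payload can be generated from the platform name rather than escaped by hand. This is a sketch under that assumption; machine_patch_payload is a hypothetical helper.

```shell
# Print the merge-patch JSON for a given platform and machine size.
# machine_patch_payload is a hypothetical helper; only AWS and Azure,
# the two platforms covered in this guide, are handled.
machine_patch_payload() {
  case "$1" in
    AWS)   field="instanceType" ;;
    Azure) field="vmSize" ;;
    *) echo "unsupported platform: $1" >&2; return 1 ;;
  esac
  printf '{"spec":{"providerSpec":{"value":{"%s":"%s"}}}}\n' "${field}" "$2"
}

# Usage against a live cluster (not executed here):
#   oc patch machine "${machine_name}" \
#     -n openshift-machine-api \
#     --type=merge \
#     -p "$(machine_patch_payload Azure "${new_machine_type}")"
```

Building the JSON in one place avoids the nested backslash escaping, which is the easiest part of the command to mistype.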
Check services
- Check all cluster operators
oc get co
- Review Kube apiservers
oc get pod kube-apiserver-${node_name} \
-n openshift-kube-apiserver
- Review etcd cluster
Pods
oc get pod etcd-${node_name} \
-n openshift-etcd
Example output
NAME READY STATUS RESTARTS AGE
etcd-mrbaz01-2754r-master-1 4/4 Running 4 7h12m
Members
oc exec \
-n openshift-etcd \
etcd-${node_name} -- etcdctl member list -w table 2>/dev/null
Example output
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
| 612953730164bdff | started | mrbaz01-2754r-master-2 | https://10.0.0.6:2380 | https://10.0.0.6:2379 | false |
| 8bf6319e4243538c | started | mrbaz01-2754r-master-0 | https://10.0.0.7:2380 | https://10.0.0.7:2379 | false |
| de0c658dd1ee52b8 | started | mrbaz01-2754r-master-1 | https://10.0.0.8:2380 | https://10.0.0.8:2379 | false |
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
Endpoints healthy (HEALTH=true)
oc exec \
-n openshift-etcd \
etcd-${node_name} -- etcdctl endpoint health -w table 2>/dev/null
Example output
+-----------------------+--------+-------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------+--------+-------------+-------+
| https://10.0.0.8:2379 | true | 16.361971ms | |
| https://10.0.0.6:2379 | true | 16.523072ms | |
| https://10.0.0.7:2379 | true | 15.879969ms | |
+-----------------------+--------+-------------+-------+
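When the health table is long, the unhealthy rows can be picked out automatically. The sketch below parses the table format shown in the example output and assumes that layout stays the same; unhealthy_endpoints is a hypothetical helper.

```shell
# Print every endpoint whose HEALTH column is not "true".
# Reads "etcdctl endpoint health -w table" output on stdin.
unhealthy_endpoints() {
  awk -F'|' '
    # Data rows have at least five |-separated fields; skip the header row.
    NF >= 5 && $3 !~ /HEALTH/ {
      gsub(/ /, "", $2)
      gsub(/ /, "", $3)
      if ($3 != "true") print $2
    }
  '
}

# Usage against a live cluster (not executed here):
#   oc exec -n openshift-etcd etcd-${node_name} -- \
#     etcdctl endpoint health -w table 2>/dev/null | unhealthy_endpoints
```

The border lines contain no pipe characters, so they fall through the field-count test without any special handling.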
Repeat the steps for each machine
Repeat the section "General steps to resize each machine" for each remaining machine you want to resize.
Review all changes
- Review Nodes
oc get nodes \
-l kubernetes.io/os=linux,node-role.kubernetes.io/master=
- Gather current Machine summary
oc get machines \
-n openshift-machine-api \
-l machine.openshift.io/cluster-api-machine-role=master
- Review Machines attributes from all machines
Choose the Cloud Provider
AWS:
oc get machines \
-n openshift-machine-api \
-l machine.openshift.io/cluster-api-machine-role=master \
-o json \
| jq -r '.items[] | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Azure:
oc get machines \
-n openshift-machine-api \
-l machine.openshift.io/cluster-api-machine-role=master \
-o json \
| jq -r '.items[] | (
"node_name: " + .status.nodeRef.name,
"machine_name: " + .metadata.name,
"instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
"instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
"")'
Suggested Next Steps
- Create a kubectl plugin to handle all the steps covered here
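One small building block such a plugin might include is a check that the patched spec and the instance-type label agree for every machine. The sketch below parses the summary format printed by the jq commands in "Review all changes"; type_mismatches is a hypothetical helper name.

```shell
# Print the name of every machine whose instanceTypeSpec differs from
# its instanceTypeMeta. Reads the "key: value" summary lines on stdin.
type_mismatches() {
  awk '
    $1 == "machine_name:"     { name = $2 }
    $1 == "instanceTypeSpec:" { spec = $2 }
    $1 == "instanceTypeMeta:" { if ($2 != spec) print name }
  '
}

# Usage: pipe the output of the "Review Machines attributes" command
# for your provider into type_mismatches.
```

An empty result means every Machine object was patched to match the size reported by the provider.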