Resiliency
Starting with Container Storage Module 1.12, all deployments will use images from quay.io by default. New release images will be available on Docker Hub until CSM 1.14 (May 2025), and existing releases will remain on Docker Hub.
Prerequisite
- The CSM for Resiliency module only acts on pods with a specific label.
- This label must match the key and value set in the module’s configuration.
- On startup, CSM for Resiliency logs the label key and value it uses to monitor pods.
- Apply this label to the StatefulSet you want CSM for Resiliency to monitor.
labelSelector: {map[podmon.dellemc.com/driver:csi-<driver>]}
The above message indicates that the label key is podmon.dellemc.com/driver and the label value is csi-<driver>. To search for the pods that would be monitored, run:
kubectl get pods -A -l podmon.dellemc.com/driver=csi-<driver>
Note: Replace <driver> with the name of the respective driver.
Follow all the prerequisites of the respective driver before enabling this module.
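As an illustration of the labeling requirement above, here is a minimal sketch of a StatefulSet whose pods carry the monitored label. The names example-app and example-image are hypothetical placeholders, and the sketch assumes the label belongs in the pod template so that it lands on the pods themselves, which is what the kubectl query above selects on.

```yaml
# Illustrative sketch only; "example-app" and "example-image" are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-app
spec:
  serviceName: example-app
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
        # Label that CSM for Resiliency matches on. The key and value must
        # equal those reported in the labelSelector log message.
        podmon.dellemc.com/driver: csi-<driver>
    spec:
      containers:
        - name: example-app
          image: example-image   # hypothetical image name
```

After applying a manifest like this, the pods should appear in the output of the kubectl get pods query shown earlier.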
Storage Array Upgrades
- Disable CSM for Resiliency during storage array upgrades to prevent application pods from getting stuck in a Pending state, even if the upgrade is advertised as non-disruptive.
- If nodes lose connectivity with the array, Resiliency will delete the pods on affected nodes and attempt to move them to a healthy node.
- If all nodes are affected, the pods will be stuck in a Pending state.
Configure all the Helm chart parameters described below before installing the drivers.
Helm Chart Installation
Drivers that support Helm chart installation let you optionally install CSM for Resiliency through variables in the chart. The chart's values.yaml file contains a podmon block that, by default, looks similar to the following:
# Enable this feature only after contacting support for additional information
podmon:
  enabled: true
  controller:
    args:
      - "--csisock=unix:/var/run/csi/csi.sock"
      - "--labelvalue=csi-<driver>"
      - "--mode=controller"
      - "--skipArrayConnectionValidation=false"
      - "--driver-config-params=/<driver>-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"
  node:
    args:
      - "--csisock=unix:/var/lib/kubelet/plugins/<driver>.emc.dell.com/csi_sock"
      - "--labelvalue=csi-<driver>"
      - "--mode=node"
      - "--leaderelection=false"
      - "--driver-config-params=/<driver>-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"
Note: Replace <driver> with the name of the respective driver.
To install CSM for Resiliency with the driver:
- Enable CSM for Resiliency by setting podmon.enabled to true (enables both controller-podmon and node-podmon).
- If you need to change the registry, specify the podmon image to be used in images.podmon.
- Provide arguments for controller-podmon in podmon.controller.args (some arguments are required and differ from node-podmon). See “Podmon Arguments” below.
- Provide arguments for node-podmon in podmon.node.args (some arguments are required and differ from controller-podmon). See “Podmon Arguments” below.
Podmon Arguments
PowerMax Specific Recommendations
Here is a typical installation used for testing:
podmon:
  enabled: false
  controller:
    args:
      - "--csisock=unix:/var/run/csi/csi.sock"
      - "--labelvalue=csi-powermax"
      - "--arrayConnectivityPollRate=60"
      - "--driverPath=csi-powermax.dellemc.com"
      - "--mode=controller"
      - "--skipArrayConnectionValidation=false"
      - "--driver-config-params=/powermax-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"
  node:
    args:
      - "--csisock=unix:/var/lib/kubelet/plugins/powermax.emc.dell.com/csi_sock"
      - "--labelvalue=csi-powermax"
      - "--arrayConnectivityPollRate=60"
      - "--driverPath=csi-powermax.dellemc.com"
      - "--mode=node"
      - "--leaderelection=false"
      - "--driver-config-params=/powermax-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"
Dynamic parameters
CSM for Resiliency has configuration parameters that can be updated dynamically, such as the logging level and format. To change them, edit the Dell CSI Driver’s parameters ConfigMap. The ConfigMap can be queried using kubectl.
For example, the Dell PowerFlex CSI Driver ConfigMaps can be found using this command: kubectl get -n vxflexos configmap
The name of the ConfigMap to edit follows this pattern: vxflexos-config-params
To update or add parameters, use the kubectl edit command. For example: kubectl edit -n vxflexos configmap vxflexos-config-params
This is a list of parameters that can be adjusted for CSM for Resiliency:
Parameter | Type | Default | Description |
---|---|---|---|
PODMON_CONTROLLER_LOG_FORMAT | String | “text” | Logging format output for the controller podmon sidecar. Should be “text” or “json” |
PODMON_CONTROLLER_LOG_LEVEL | String | “debug” | Logging level for the controller podmon sidecar. Standard values: ‘info’, ‘error’, ‘warning’, ‘debug’, ‘trace’ |
PODMON_NODE_LOG_FORMAT | String | “text” | Logging format output for the node podmon sidecar. Should be “text” or “json” |
PODMON_NODE_LOG_LEVEL | String | “debug” | Logging level for the node podmon sidecar. Standard values: ‘info’, ‘error’, ‘warning’, ‘debug’, ‘trace’ |
PODMON_ARRAY_CONNECTIVITY_POLL_RATE | Integer (>0) | 15 | An interval in seconds to poll the underlying array |
PODMON_ARRAY_CONNECTIVITY_CONNECTION_LOSS_THRESHOLD | Integer (>0) | 3 | A value representing the number of failed connection poll intervals before marking the array connectivity as lost |
PODMON_SKIP_ARRAY_CONNECTION_VALIDATION | Boolean | false | Flag to disable the array connectivity check. Set to true for a NoSchedule or NoExecute taint due to a K8S control plane failure (kubelet failure) |
Here is an example of the parameters:
PODMON_CONTROLLER_LOG_FORMAT: "text"
PODMON_CONTROLLER_LOG_LEVEL: "info"
PODMON_NODE_LOG_FORMAT: "text"
PODMON_NODE_LOG_LEVEL: "info"
PODMON_ARRAY_CONNECTIVITY_POLL_RATE: 20
PODMON_ARRAY_CONNECTIVITY_CONNECTION_LOSS_THRESHOLD: 2
PODMON_SKIP_ARRAY_CONNECTION_VALIDATION: true
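To show where such parameters might live, here is a hedged sketch of the PowerFlex ConfigMap as it could appear in kubectl edit. The data key driver-config-params.yaml is an assumption inferred from the file name in the --driver-config-params argument; check the actual ConfigMap in your cluster before relying on this layout.

```yaml
# Sketch only. The data key name is assumed from the
# --driver-config-params=/<driver>-config-params/driver-config-params.yaml argument.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vxflexos-config-params
  namespace: vxflexos
data:
  driver-config-params.yaml: |
    PODMON_CONTROLLER_LOG_FORMAT: "text"
    PODMON_CONTROLLER_LOG_LEVEL: "info"
    PODMON_NODE_LOG_FORMAT: "text"
    PODMON_NODE_LOG_LEVEL: "info"
    # Poll the array every 20 seconds.
    PODMON_ARRAY_CONNECTIVITY_POLL_RATE: 20
    # Mark connectivity as lost after 2 failed poll intervals.
    PODMON_ARRAY_CONNECTIVITY_CONNECTION_LOSS_THRESHOLD: 2
    PODMON_SKIP_ARRAY_CONNECTION_VALIDATION: true
```

With these sample values, a poll rate of 20 seconds and a loss threshold of 2 suggest that array connectivity would be marked as lost after roughly 40 seconds of failed polls. Treat that figure as an estimate derived from the parameter descriptions, not a guaranteed timing.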