Resiliency

Dell Container Storage Modules (CSM) for Resiliency installation

Starting with Container Storage Module 1.12, all deployments will use images from quay.io by default. New release images will be available on Docker Hub until CSM 1.14 (May 2025), and existing releases will remain on Docker Hub.

CSM for Resiliency is installed as part of the Dell CSI driver installation.

For information on the PowerFlex CSI driver, see PowerFlex CSI Driver. For information on the Unity XT CSI driver, see Unity XT CSI Driver. For information on the PowerScale CSI driver, see PowerScale CSI Driver. For information on the PowerStore CSI driver, see PowerStore CSI Driver. For information on the PowerStore CSI driver, see PowerMax CSI Driver.

Prerequisite

  • The CSM for Resiliency module only acts on pods with a specific label.
  • This label must match the key and value set in the module’s configuration.
  • On startup, CSM for Resiliency logs the label key and value it uses to monitor pods.
  • Apply this label to the Statefulset you want monitored by CSM for Resiliency.
labelSelector: {map[podmon.dellemc.com/driver:csi-<driver>]}

The above message indicates the key is: podmon.dellemc.com/driver and the label value is csi-<driver>. To search for the pods that would be monitored, try this:

kubectl get pods -A -l podmon.dellemc.com/driver=csi-<driver>

Note: <driver> should be replaced with respective driver name

User must follow all the prerequisites of the respective drivers before enabling this module.

Storage Array Upgrades

  • Disable CSM for Resiliency during storage array upgrades to prevent application pods from getting stuck in a Pending state, even if the upgrade is advertised as non-disruptive.
  • If nodes lose connectivity with the array, Resiliency will delete the pods on affected nodes and attempt to move them to a healthy node.
  • If all nodes are affected, the pods will be stuck in a Pending state. Configure all the helm chart parameters described below before installing the drivers.

Helm Chart Installation

The drivers that support Helm chart installation allow CSM for Resiliency to be optionally installed by variables in the chart. There is a podmon block specified in the values.yaml file of the chart that will look similar to the text below by default:

# Enable this feature only after contact support for additional information
podmon:
  enabled: true
  controller:
    args:
      - "--csisock=unix:/var/run/csi/csi.sock"
      - "--labelvalue=csi-<driver>"
      - "--mode=controller"
      - "--skipArrayConnectionValidation=false"
      - "--driver-config-params=/<driver>-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"
  node:
    args:
      - "--csisock=unix:/var/lib/kubelet/plugins/<driver>.emc.dell.com/csi_sock"
      - "--labelvalue=csi-<driver>"
      - "--mode=node"
      - "--leaderelection=false"
      - "--driver-config-params=/<driver>-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"

Note: <driver> should be replaced with respective driver name

To install CSM for Resiliency with the driver:

  1. Enable CSM for Resiliency by setting podmon.enabled to true (enables both controller-podmon and node-podmon).
  2. If you need to change the registry, specify the podmon image to be used in images.podmon
  3. Provide arguments for controller-podmon in podmon.controller.args (some arguments are required and differ from node-podmon). See “Podmon Arguments” below.
  4. Provide arguments for node-podmon in podmon.node.args (some arguments are required and differ from controller-podmon). See “Podmon Arguments” below.

Podmon Arguments

Arguments

PowerFlex Specific Recommendations

PowerFlex supports a very robust array connection validation mechanism that can detect changes in connectivity in about two seconds and can detect whether I/O has occurred over a five-second sample. For that reason it is recommended to set “skipArrayConnectionValidation=false” (which is the default) and to set “arrayConnectivityPollRate=5” (5 seconds) and “arrayConnectivityConnectionLossThreshold=3” to 3 or more.

Here is a typical installation used for testing:

podmon:
  enabled: true
  controller:
    args:
      - "--csisock=unix:/var/run/csi/csi.sock"
      - "--labelvalue=csi-vxflexos"
      - "--mode=controller"
      - "--arrayConnectivityPollRate=5"
      - "--arrayConnectivityConnectionLossThreshold=3"
      - "--skipArrayConnectionValidation=false"
      - "--driver-config-params=/vxflexos-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"
  node:
    args:
      - "--csisock=unix:/var/lib/kubelet/plugins/vxflexos.emc.dell.com/csi_sock"
      - "--labelvalue=csi-vxflexos"
      - "--mode=node"
      - "--leaderelection=false"
      - "--driver-config-params=/vxflexos-config-params/driver-config-params.yaml"
      - "--driverPodLabelValue=dell-storage"
      - "--ignoreVolumelessPods=false"

Dynamic parameters

CSM for Resiliency has configuration parameters that can be updated dynamically, such as the logging level and format. This can be done by editing the Dell CSI Driver’s parameters ConfigMap. The ConfigMap can be queried using kubectl. For example, the Dell Powerflex CSI Driver ConfigMaps can be found using this command: kubectl get -n vxflexos configmap. The ConfigMap to edit will have this pattern: -config-params (e.g., vxflexos-config-params).

To update or add parameters, you can use the kubectl edit command. For example, kubectl edit -n vxflexos configmap vxflexos-config-params.

This is a list of parameters that can be adjusted for CSM for Resiliency:

Parameter Type Default Description
PODMON_CONTROLLER_LOG_FORMAT String “text” Logging format output for the controller podmon sidecar. Should be “text” or “json”
PODMON_CONTROLLER_LOG_LEVEL String “debug” Logging level for the controller podmon sidecar. Standard values: ‘info’, ’error’, ‘warning’, ‘debug’, ’trace’
PODMON_NODE_LOG_FORMAT String “text” Logging format output for the node podmon sidecar. Should be “text” or “json”
PODMON_NODE_LOG_LEVEL String “debug” Logging level for the node podmon sidecar. Standard values: ‘info’, ’error’, ‘warning’, ‘debug’, ’trace’
PODMON_ARRAY_CONNECTIVITY_POLL_RATE Integer (>0) 15 An interval in seconds to poll the underlying array
PODMON_ARRAY_CONNECTIVITY_CONNECTION_LOSS_THRESHOLD Integer (>0) 3 A value representing the number of failed connection poll intervals before marking the array connectivity as lost
PODMON_SKIP_ARRAY_CONNECTION_VALIDATION Boolean false Flag to disable the array connectivity check, set to true for NoSchedule or NoExecute taint due to K8S Control Plane failure (kubelet failure)

Here is an example of the parameters:

    PODMON_CONTROLLER_LOG_FORMAT: "text"
    PODMON_CONTROLLER_LOG_LEVEL: "info"
    PODMON_NODE_LOG_FORMAT: "text"
    PODMON_NODE_LOG_LEVEL: "info"
    PODMON_ARRAY_CONNECTIVITY_POLL_RATE: 20
    PODMON_ARRAY_CONNECTIVITY_CONNECTION_LOSS_THRESHOLD: 2
    PODMON_SKIP_ARRAY_CONNECTION_VALIDATION: true