Troubleshooting

Can Container Storage Module Operator manage existing drivers installed using Helm charts or the CSI Operator?

The Container Storage Module Operator is unable to manage any existing driver installed using Helm charts or the CSI Operator. If you already have installed one of the Dell CSI driver in your cluster and want to use the CSM operator based deployment, uninstall the driver and then redeploy the driver via Container Storage ModuleM Operator

Why do some of the Custom Resource fields show up as invalid or unsupported in the OperatorHub GUI?

The Container Storage Module Operator is not fully compliant with the OperatorHub React UI elements. Due to this, some of the Custom Resource fields may show up as invalid or unsupported in the OperatorHub GUI. To get around this problem, use kubectl/oc commands to get details about the Custom Resource(CR). This issue will be fixed in the upcoming releases of the Container Storage Module Operator.

How can I view detailed logs for the Container Storage Module Operator?

Detailed logs of the Container Storage Module Operator can be displayed using the following command:

kubectl logs <csm-operator-controller-podname> -n <namespace>

My Dell CSI Driver install failed. How do I fix it?

Describe the current state by issuing: kubectl describe csm <custom-resource-name> -n <namespace>

In the output refer to the status and events section. If status shows pods that are in the failed state, refer to the CSI Driver Troubleshooting guide.

Example:

Status:
	Controller Status:
	Available: 0
	Desired: 2
	Failed: 2
	Node Status:
	Available: 0
	Desired: 2
	Failed: 2
	State: Failed

Events Warning Updated 67s (x15 over 2m4s) csm (combined from similar events): at 1646848059520359167 Pod error details ControllerError: ErrImagePull= pull access denied for dellem/csi-isilon, repository does not exist or may require 'docker login': denied: requested access to the resource is denied, Daemonseterror: ErrImagePull= pull access denied for dellem/csi-isilon, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

The above event shows dellem/csi-isilon does not exist, to resolve this user can kubectl edit the csm and update to correct image.

To get details of driver installation: kubectl logs <dell-csm-operator-controller-manager-pod> -n dell-csm-operator.

Typical reasons for errors:

  • Incorrect driver version
  • Incorrect driver type
  • Incorrect driver Spec env, args for containers
  • Incorrect RBAC permissions

My CSM Replication install fails to validate replication prechecks with ’no such host'.

In replication environments that utilize more than one cluster, and utilize FQDNs to reference API endpoints, it is highly recommended that the DNS be configured to resolve requests involving the FQDN to the appropriate cluster.

If for some reason it is not possible to configure the DNS, the /etc/hosts file should be updated to map the FQDN to the appropriate IP. This change will need to be made to the /etc/hosts file on:

  • The bastion node(s) (or wherever repctl is used).
  • Either the CSM Operator Deployment or ClusterServiceVersion custom resource if using an Operator Lifecycle Manager (such as with an OperatorHub install).
  • Both dell-replication-controller-manager deployments.

To update the ClusterServiceVersion, execute the command below, replacing the fields for the remote cluster’s FQDN and IP.

kubectl patch clusterserviceversions.operators.coreos.com -n <operator-namespace> dell-csm-operator-certified.v1.3.0 \
--type=json -p='[{"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/hostAliases", "value": [{"ip":"<remote-IP>","hostnames":["<remote-FQDN>"]}]}]'

To update the dell-replication-controller-manager deployment, execute the command below, replacing the fields for the remote cluster’s FQDN and IP. Make sure to update the deployment on both the primary and disaster recovery clusters.

kubectl patch deployment -n dell-replication-controller dell-replication-controller-manager \
-p '{"spec":{"template":{"spec":{"hostAliases":[{"hostnames":["<remote-FQDN>"],"ip":"<remote-IP>"}]}}}}'

How to update resource limits for CSM Operator when it is deployed using Operator Hub

In certain environments where users have deployed CSM Operator using Operator hub, they have encountered issues related to Container Storage Module Operator pods reporting ‘OOM Killed’. This issue is attributed to the default resource requests and limits configured in the CSM Operator, which fail to meet the resource requirements of the user environments. In this case users can update the resource limits from Openshift web console by following the steps below:

  • Login into OpenShift web console
  • Navigate to Operators section in the left pane and expand it and click on ‘Installed Operators’
  • Select the Dell Container Storage Modules operator
  • Click on the YAML tab under the operator and you will see ClusterServiceVersion(CSV) file opened in an YAML editor
  • Update the resource limits in the opened YAML under the section spec.install.spec.deployments.spec.template.spec.containers.resources
  • Save the CSV and your changes should be applied

Here are some installation failures that might be encountered and how to mitigate them.

Symptoms Prevention, Resolution or Workaround
The kubectl logs isilon-controller-0 -n isilon -c driver logs shows the driver cannot authenticate Check your secret’s username and password for corresponding cluster
The kubectl logs isilon-controller-0 -n isilon -c driver logs shows the driver failed to connect to the Isilon because it couldn’t verify the certificates Check the isilon-certs- secret and ensure it is not empty and it has the valid certificates. Set isiInsecure: "true" for insecure connection. SSL validation is recommended in the production environment.
The kubectl logs isilon-controller-0 -n isilon -c driver logs shows the driver error: create volume failed, Access denied. create directory as requested This situation can happen when the user who created the base path is different from the user configured for the driver. Make sure the user used to deploy CSI-Driver must have enough rights on the base path (i.e. isiPath) to perform all operations.
Volume/filesystem is allowed to mount by any host in the network, though that host is not a part of the export of that particular volume under /ifs directory “Dell PowerScale: OneFS NFS Design Considerations and Best Practices”:
There is a default shared directory (ifs) of OneFS, which lets clients running Windows, UNIX, Linux, or Mac OS X access the same directories and files. It is recommended to disable the ifs shared directory in a production environment and create dedicated NFS exports and SMB shares for your workload.
Creating snapshot fails if the parameter IsiPath in volume snapshot class and related storage class is not the same. The driver uses the incorrect IsiPath parameter and tries to locate the source volume due to the inconsistency. Ensure IsiPath in VolumeSnapshotClass yaml and related storageClass yaml are the same.
While deleting a volume, if there are files or folders created on the volume that are owned by different users. If the Isilon credentials used are for a nonprivileged Isilon user, the delete volume action fails. It is due to the limitation in Linux permission control. To perform the delete volume action, the user account must be assigned a role that has the privilege ISI_PRIV_IFS_RESTORE. The user account must have the following set of privileges to ensure that all the CSI Isilon driver capabilities work properly:
* ISI_PRIV_LOGIN_PAPI
* ISI_PRIV_NFS
* ISI_PRIV_QUOTA
* ISI_PRIV_SNAPSHOT
* ISI_PRIV_IFS_RESTORE
* ISI_PRIV_NS_IFS_ACCESS
* ISI_PRIV_STATISTICS
In some cases, ISI_PRIV_BACKUP is also required, for example, when files owned by other users have mode bits set to 700.
If the hostname is mapped to loopback IP in /etc/hosts file, and pods are created using 1.3.0.1 release, after upgrade to driver version 1.4.0 or later there is a possibility of “localhost” as a stale entry in export Recommended setup: User should not map a hostname to loopback IP in /etc/hosts file
Driver node pod is in “CrashLoopBackOff” as “Node ID” generated is not with proper FQDN. This might be due to “dnsPolicy” implemented on the driver node pod which may differ with different networks.

This parameter is configurable in both helm and Operator installer and the user can try with different “dnsPolicy” according to the environment.
The kubectl logs isilon-controller-0 -n isilon -c driver logs shows the driver Authentication failed. Trying to re-authenticate when using Session-based authentication The issue has been resolved from OneFS 9.3 onwards, for OneFS versions prior to 9.3 for session-based authentication either smart connect can be created against a single node of Isilon or CSI Driver can be installed/pointed to a particular node of the Isilon else basic authentication can be used by setting isiAuthType in values.yaml to 0
When an attempt is made to create more than one ReadOnly PVC from the same volume snapshot, the second and subsequent requests result in PVCs in state Pending, with a warning another RO volume from this snapshot is already present. This is because the driver allows only one RO volume from a specific snapshot at any point in time. This is to allow faster creation(within a few seconds) of a RO PVC from a volume snapshot irrespective of the size of the volume snapshot. Wait for the deletion of the first RO PVC created from the same volume snapshot.
Driver install or upgrade fails because of an incompatible Kubernetes version, even though the version seems to be within the range of compatibility. For example: Error: UPGRADE FAILED: chart requires kubeVersion: >= 1.22.0 < 1.25.0 which is incompatible with Kubernetes V1.22.11-mirantis-1 If you are using an extended Kubernetes version, please see the helm Chart and use the alternate kubeVersion check that is provided in the comments. Please note that this is not meant to be used to enable the use of pre-release alpha and beta versions, which is not supported.
Standby controller pod is in crashloopbackoff state Scale down the replica count of the controller pod’s deployment to 1 using kubectl scale deployment <deployment_name> --replicas=1 -n <driver_namespace>
fsGroupPolicy may not work as expected without root privileges for NFS only https://github.com/kubernetes/examples/issues/260 To get the desired behavior set “RootClientEnabled” = “true” in the storage class parameter