Troubleshooting
Troubleshooting guide
| Symptoms | Prevention, Resolution or Workaround |
|---|---|
| Persistent volumes don't get created on the target cluster. | Run `kubectl describe` on one of the replication-controller pods and check whether an event says `Config update won't be applied because of invalid configmap/secrets. Please fix the invalid configuration`. If it does, ensure that you correctly populated the replication ConfigMap. You can check its current state by running `kubectl describe cm -n dell-replication-controller dell-replication-controller-config`. If the ConfigMap is empty, edit it yourself or use the `repctl cluster inject` command (see the ConfigMap check after this table). |
| Persistent volumes don't get created on the target cluster, and you don't see any events on the replication-controller pod. | Check the replication-controller logs by running `kubectl logs -n dell-replication-controller dell-replication-controller-manager-<generated-symbols>`. If you see `clusterId - <clusterID> not found` errors, verify that you specified the same clusterIDs in both your ConfigMap and your replication enabled StorageClass (see the comparison commands after this table). |
| You apply a replication action by manually editing the ReplicationGroup resource field `spec.action` and don't see any change of ReplicationGroup state after a while. | Check the events of the replication-controller pod. If they say `Cannot proceed with action <your-action>. [unsupported action]`, check the spelling of your action and consult the Replication Actions page. Alternatively, you can use `repctl` instead of manually editing ReplicationGroup resources (see the patch sketch after this table). |
| You execute a failover action using the `repctl failover` command and see `failover: error executing failover to source site`. | This means you tried to fail over to a cluster that is already marked as the source. If you still want to execute a failover for the RG, choose another cluster. |
| You've created a PersistentVolumeClaim using a replication enabled StorageClass but don't see any RGs created in the source cluster. | Check the annotations of the created PersistentVolumeClaim (see the annotation check after this table). If it doesn't have annotations that start with `replication.storage.dell.com`, wait a couple of minutes for them to be added and for the RG to be created. |
| When installing the common replication controller using Helm, you see an error that states `invalid ownership metadata` and `missing key "app.kubernetes.io/managed-by": must be set to "Helm"`. | This means the previous release wasn't fully deleted. Fix it either by deleting the entire manifest with `kubectl delete -f deploy/controller.yaml` or by manually deleting the conflicting resources (ClusterRoles, ClusterRoleBinding, etc.); see the cleanup sketch after this table. |
| PVs and/or PVCs are not being created on the source/target cluster, and the controller's logs show `no such host` errors. | Make sure cluster-1's API is reachable from cluster-2 and vice versa. If one of your clusters is an OpenShift cluster located in a private network and needs records in `/etc/hosts`, exec into the controller pod and modify `/etc/hosts` manually (see the `/etc/hosts` sketch after this table). |
| After upgrading to Replication v1.4.0, `kubectl get rg` returns the error `Unable to list "replication.storage.dell.com/v1alpha1, Resource=dellcsireplicationgroups"`. | This means `kubectl` still doesn't recognize the new version of the CRD `dellcsireplicationgroups.replication.storage.dell.com` after the upgrade. Running `kubectl get DellCSIReplicationGroup.v1.replication.storage.dell.com/<rg-id> -o yaml` will resolve the issue. |
| When adding or deleting PVs in an existing SYNC Replication Group in PowerStore, you may encounter the error `The operation is restricted as sync replication session for resource <Replication Group Name> is not paused`. | Pause the replication group, add or delete the PV, and then resume the replication group (RG). The commands for the pause and resume operations are `repctl --rg <rg-id> exec -a suspend` and `repctl --rg <rg-id> exec -a resume` (see the pause/resume sequence after this table). |
| When deleting the last volume from an existing SYNC Replication Group in PowerStore, you may encounter the error `failed to remove volume from volume group: The operation cannot be completed on metro or replicated volume group because volume group will become empty after last members are removed`. | Unassign the protection policy from the corresponding volume group in the PowerStore Manager UI. After that, you can successfully delete the last volume in that SYNC Replication Group. |
| When running CSI-PowerMax with Replication in a multi-cluster configuration, the driver on the target cluster fails, and the following error appears in the logs: `error="CSI reverseproxy service host or port not found, CSI reverseproxy not installed properly"`. | The reverseproxy service needs to be created manually on the target cluster. Follow the instructions here to create it. |
| When running CSI-PowerScale with Replication with encryption enabled, you get the error `SyncIQ policy failed to establish an encrypted connection`, and the Replication Groups and PVCs aren't created on the target cluster. | The encryption required flag in the SyncIQ settings is set to "yes" by default in OneFS 9.0+. To rectify this error, follow this article: https://www.dell.com/support/kbdoc/en-us/000215174/isilon-synciq-9-0-all-policies-fail-when-source-or-target-cluster-is-on-onefs-9-0-with-no-node-on-source-cluster-was-able-to-connect-to-target-cluster |
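
ConfigMap check: a minimal sketch for inspecting and populating the replication ConfigMap, assuming the controller's default `dell-replication-controller` namespace. The kubeconfig paths and cluster names passed to `repctl` are placeholders for your own environment.

```bash
# Show what is currently stored in the replication ConfigMap.
kubectl describe cm -n dell-replication-controller dell-replication-controller-config

# If it is empty, register both clusters and inject the configuration.
# Kubeconfig paths and cluster names below are placeholders.
repctl cluster add -f "/root/config-cluster-1","/root/config-cluster-2" -n "cluster-1","cluster-2"
repctl cluster inject
```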
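Comparison commands: a quick way to compare the clusterIDs on both sides. `<replication-enabled-sc>` is a placeholder for your StorageClass name; the IDs surfaced by both commands must match.

```bash
# clusterIDs referenced by the StorageClass parameters (e.g. remoteClusterID).
kubectl get sc <replication-enabled-sc> -o yaml | grep -i clusterid

# clusterIDs known to the replication controller.
kubectl get cm -n dell-replication-controller dell-replication-controller-config -o yaml
```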
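Patch sketch: if you do edit the RG by hand, a merge patch avoids typos from interactive editing. `FAILOVER_REMOTE` is only an example action name (consult the Replication Actions page for the exact set) and `<rg-id>` is a placeholder.

```bash
# Set spec.action on the RG in one shot; action names are case-sensitive.
kubectl patch rg <rg-id> --type merge -p '{"spec":{"action":"FAILOVER_REMOTE"}}'

# Watch the RG to confirm the action was picked up.
kubectl get rg <rg-id> -o yaml -w
```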
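Annotation check: `<pvc-name>` and `<namespace>` are placeholders.

```bash
# List the PVC's replication annotations; they appear once the claim
# has been processed, and the RG is created shortly afterwards.
kubectl get pvc <pvc-name> -n <namespace> -o yaml | grep replication.storage.dell.com
```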
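Cleanup sketch: list the cluster-scoped leftovers first, then delete only what conflicts. The grep pattern and resource names below are placeholders and depend on what your previous release created.

```bash
# Option 1: delete everything the old manifest created.
kubectl delete -f deploy/controller.yaml

# Option 2: find and delete only the conflicting cluster-scoped leftovers.
kubectl get clusterroles,clusterrolebindings | grep dell-replication
kubectl delete clusterrole <conflicting-clusterrole>
kubectl delete clusterrolebinding <conflicting-clusterrolebinding>
```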
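`/etc/hosts` sketch: the pod name, IP address, and hostname below are placeholders. Note that the change lives in the container filesystem, so it is lost when the pod restarts and must be reapplied.

```bash
# Append a host record inside the running controller pod.
kubectl exec -n dell-replication-controller \
  dell-replication-controller-manager-<generated-symbols> -- \
  sh -c 'echo "10.0.0.10 api.cluster-2.example.com" >> /etc/hosts'
```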
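Pause/resume sequence for SYNC groups, using the commands from the table; `<rg-id>` is a placeholder.

```bash
# Pause the sync replication session.
repctl --rg <rg-id> exec -a suspend

# ... add or delete the PVs while the session is paused ...

# Resume replication once the change is done.
repctl --rg <rg-id> exec -a resume
```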