In this article, I’m going to introduce and analyze a feature added to the vSphere 6.0 suite: VM Component Protection, or VMCP, a new mechanism for identifying hosts that have lost access to their storage. It was introduced in the High Availability (HA) service to address weaknesses in the HA detection mechanisms. VMCP allows vSphere to detect storage inaccessibility, specifically the Permanent Device Loss (PDL) and All Paths Down (APD) conditions, which are important concepts in the HA service. Let’s go through them together.
What is PDL or Permanent Device Loss?
Permanent means lasting, Device means device, and Loss means loss. Together they name an event in the HA service that we call PDL for short. When a PDL occurs, the storage array sends the host a SCSI sense code stating that the device is no longer available. The simplest example is a LUN that becomes corrupted and fails. Another example is an administrator who mistakenly or accidentally deletes the zoning for your WWN. In the PDL state, the storage array can still communicate with the vSphere hosts, and it uses SCSI sense messages to report the status of the device to them.
What is APD or All Paths Down?
If our vSphere hosts cannot communicate with the storage device and no PDL code has been sent to the host through SCSI sense data, the system is in the APD state. This state is completely different from PDL: in a PDL the host knows the status of the storage array from the SCSI sense codes, but in an APD the host has no information about the storage and cannot tell whether the disconnection is temporary or permanent. The device may come back, or it may never come back. When an APD occurs, the host keeps sending its I/O commands to the storage device until a timeout period, called the APD Timeout, is reached. Once the APD Timeout expires, the host stops sending virtual machine I/O to that storage. The only I/O it still sends is the traffic needed to detect that the storage has returned, which does not include virtual machine traffic in any way. This remaining traffic is mostly related to mounting NFS volumes created on that storage, while virtual machine traffic to the storage is cut off completely. By default, the APD Timeout is 140 seconds, but you can change this value with the Misc.APDTimeout parameter in the Advanced Settings of each host.
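The difference between the two states, and what happens to I/O during an APD, can be sketched in a few lines of Python. This is only an illustrative model, not VMware code: the function names `classify_storage_state` and `apd_io_behavior` and their arguments are hypothetical, and only the 140-second default comes from the text above.

```python
APD_TIMEOUT_DEFAULT_S = 140  # default APD Timeout mentioned in the article


def classify_storage_state(paths_responding: bool,
                           pdl_sense_code_received: bool) -> str:
    """Toy classification of a datastore condition as seen from one host."""
    if pdl_sense_code_received:
        # The array explicitly reported that the device is gone.
        return "PDL"
    if not paths_responding:
        # Nothing answers and no sense code explains why.
        return "APD"
    return "OK"


def apd_io_behavior(seconds_since_apd_start: float,
                    apd_timeout_s: int = APD_TIMEOUT_DEFAULT_S) -> str:
    """Describe what the host does with I/O at a given moment of an APD."""
    if seconds_since_apd_start < 0:
        raise ValueError("elapsed time cannot be negative")
    if seconds_since_apd_start < apd_timeout_s:
        return "still sending VM I/O"
    return "VM I/O stopped, only checking whether the storage has returned"
```

The point of the sketch is that PDL is decided by an explicit message from the array, while APD is decided by the absence of any answer, which is why it needs a timer.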
What is VMCP or VM Component Protection?
vSphere HA is now capable of detecting PDL and APD events and, if configured properly, responding to them appropriately. The first step in an HA setup is to enable VMCP alongside HA. Enabling VMCP means that you want the HA agent to protect your virtual machines against PDL and APD. Doing so is just a matter of ticking the checkbox of the same name on the relevant page, and there is no particular complexity to it.
Now let’s see how to configure the VMCP settings in HA, which is the most technical part: enabling HA and VMCP and adjusting the corresponding settings. In vCenter, you can follow the path below and enable VMCP:
- Cluster Settings > vSphere HA
- Host Hardware Monitoring
- VM Component Protection
- Protect Against Storage Connectivity Loss
After enabling VMCP and HA, the next step is to configure how vSphere responds to PDL and APD. Each of the settings below can be configured separately. You can view them by expanding the subsections of the Failure Conditions section, which is located in the same place where you enabled VMCP. Just click on VM Response.
Response for Datastore with Permanent Device Loss (PDL):
There are three possible responses to a PDL event. They are very simple, because a PDL is an unambiguous condition:
Disabled: No action is taken for the affected VMs.
Issue Events: No action is taken for the affected VMs, but the administrator is notified about the PDL event.
Power off and restart VMs: All affected VMs are powered off and restarted on hosts that can still communicate with the storage.
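The three PDL responses listed above can be modeled as a small enum and a function. This is a sketch for illustration only: `PdlResponse` and `handle_pdl` are hypothetical names, not part of any vSphere API, and the returned strings simply restate the descriptions above.

```python
from enum import Enum
from typing import List


class PdlResponse(Enum):
    DISABLED = "Disabled"
    ISSUE_EVENTS = "Issue Events"
    POWER_OFF_AND_RESTART = "Power off and restart VMs"


def handle_pdl(response: PdlResponse, affected_vms: List[str]) -> List[str]:
    """Return the actions taken for the affected VMs under a given setting."""
    if response is PdlResponse.DISABLED:
        return []  # nothing is done at all
    if response is PdlResponse.ISSUE_EVENTS:
        # VMs are left alone; only the administrator is informed.
        return ["notify administrator about the PDL event"]
    # POWER_OFF_AND_RESTART: every affected VM is moved to a healthy host.
    return [
        f"power off {vm} and restart it on a host that can reach the storage"
        for vm in affected_vms
    ]


if __name__ == "__main__":
    for action in handle_pdl(PdlResponse.POWER_OFF_AND_RESTART, ["vm-a", "vm-b"]):
        print(action)
```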
Response for Datastore with All Paths Down (APD):
For APD there are more options, and the reason is quite clear. As we said, an APD is an unknown condition: nothing explains why the storage is inaccessible, and it is unclear whether the loss is permanent or only a passing problem:
Disabled: No action is taken for the affected VMs.
Issue Events: No action is taken for the affected VMs, but the administrator is notified about the APD event.
Power off and restart VMs (conservative): vSphere HA does not power off and move the virtual machines until it is sure that there is another host that can restart them. The host reporting the APD communicates with the HA master and tells it how much capacity is needed to restart the VMs. If the HA master determines that another host has the capacity and resources needed, the problematic VMs are immediately powered off and restarted on a healthy host. If the host experiencing the APD cannot communicate with the vSphere HA master, no action is taken.
Power off and restart VMs (aggressive): In this case, vSphere HA powers off the affected machines even if it cannot confirm that a suitable host exists to restart them. The host on which the APD is reported keeps trying to communicate with the HA master in order to determine the resources needed to restart the VMs. If the HA master is not available, the state of the resources cannot be determined at all. In this scenario, the host powers off all affected VMs anyway, on the assumption that there is likely another healthy host that can restart them. However, if the required resources turn out not to be available, HA may not be able to restart all of the affected VMs. This happens when there is a communication problem between the host and the HA master.
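To make the difference between the conservative and aggressive options easier to see, here is a minimal decision sketch in Python. It is an assumption-laden model of the behavior described above, not vSphere HA code: `ApdResponse`, `decide_apd_action`, and the two boolean inputs are hypothetical names chosen for illustration.

```python
from enum import Enum


class ApdResponse(Enum):
    DISABLED = "disabled"
    ISSUE_EVENTS = "issue events"
    RESTART_CONSERVATIVE = "power off and restart (conservative)"
    RESTART_AGGRESSIVE = "power off and restart (aggressive)"


def decide_apd_action(response: ApdResponse,
                      master_reachable: bool,
                      capacity_confirmed: bool) -> str:
    """Model what the host with the APD does once it is allowed to act.

    capacity_confirmed means the HA master has confirmed that another
    host can restart the VMs; it is ignored when the master is unreachable.
    """
    confirmed = master_reachable and capacity_confirmed
    if response is ApdResponse.DISABLED:
        return "no action"
    if response is ApdResponse.ISSUE_EVENTS:
        return "notify administrator"
    if response is ApdResponse.RESTART_CONSERVATIVE:
        # Act only when a restart target is known to exist.
        return "power off VMs for restart" if confirmed else "no action"
    # Aggressive: power off regardless, and accept the risk.
    if confirmed:
        return "power off VMs for restart"
    return "power off VMs, restart not guaranteed"
```

The only inputs that change the outcome are whether the master can be reached and whether it has confirmed capacity. The conservative option refuses to act without that confirmation, while the aggressive option acts anyway.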
Delay for VM failover for APD:
When the APD Timeout of 140 seconds is reached, VMCP does not act immediately. It first waits for an additional delay, which by default is 3 minutes. In other words, VMCP waits 5 minutes and 20 seconds (140 seconds plus 180 seconds) before it starts restarting VMs. The sum of the APD Timeout and this delay is called the VMCP Timeout.
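The arithmetic behind the 5 minutes and 20 seconds is easy to check. The snippet below is just a worked calculation using the two default values from the text. The function names are hypothetical.

```python
APD_TIMEOUT_S = 140          # default APD Timeout
VM_FAILOVER_DELAY_S = 3 * 60  # default delay for VM failover for APD


def vmcp_timeout(apd_timeout_s: int = APD_TIMEOUT_S,
                 delay_s: int = VM_FAILOVER_DELAY_S) -> int:
    """Total wait, in seconds, before VMCP starts acting on the VMs."""
    return apd_timeout_s + delay_s


def format_min_sec(total_seconds: int) -> str:
    """Render a number of seconds as minutes and seconds."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds} s"


print(format_min_sec(vmcp_timeout()))  # prints "5 min 20 s"
```

If you change either value on your hosts or cluster, the same sum tells you how long affected VMs will stay where they are before HA reacts.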
Response for APD recovery after APD timeout:
This setting tells vSphere HA what to do if an APD clears after the APD Timeout has passed but before the failover delay has elapsed, a point at which the storage is available again and an HA failover may no longer be needed. It has two values: Disabled, which is the default and does nothing, and Reset VMs, which performs a hard reset of all the affected VMs on their host.
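The window in which this setting applies can be illustrated with a short timeline check. Again, this is a hedged sketch rather than actual HA logic: `RecoveryResponse` and `apd_recovery_action` are hypothetical names, the defaults are the 140-second timeout and 3-minute delay from the text, and the treatment of the exact boundary instants is my own assumption.

```python
from enum import Enum


class RecoveryResponse(Enum):
    DISABLED = "Disabled"
    RESET_VMS = "Reset VMs"


def apd_recovery_action(apd_cleared_at_s: float,
                        response: RecoveryResponse = RecoveryResponse.DISABLED,
                        apd_timeout_s: int = 140,
                        delay_s: int = 180) -> str:
    """Say what happens when an APD clears N seconds after it started."""
    if apd_cleared_at_s < apd_timeout_s:
        # Storage came back before the APD Timeout: setting not involved.
        return "not applicable: APD cleared before the APD Timeout"
    if apd_cleared_at_s < apd_timeout_s + delay_s:
        # Inside the window between the timeout and the end of the delay.
        if response is RecoveryResponse.RESET_VMS:
            return "hard reset the affected VMs"
        return "no action"
    # The delay has fully elapsed; the APD response setting takes over.
    return "not applicable: failover delay already elapsed"
```

For example, with the defaults, an APD that clears 200 seconds after it began falls inside the window, so the outcome depends entirely on which of the two values is selected.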