Wednesday, March 18, 2020

vmware-statsmonitor service times out on VCSA 6.7u2

In my homelab, I've made some changes, rolled back some changes, blown up a bunch of stuff, rebuilt servers, tried Terraform to see if I could make it work (I'd call it a 58% success), and tried Ansible (which I'm MUCH more familiar with, so it was a smashing success). All of that to say I've been mean to my infrastructure, so I decided it was time to give it a fresh start.

I used the iDRAC (Integrated Dell Remote Access Controller) to mount the vSphere (ESXi) install ISO and reinstalled ESXi on all four nodes from scratch. I also re-initialized all of my ... Once that completed, it was off to the races with vCenter and Update Manager.

This is probably a really old issue, but hopefully this helps someone :)

The VCSA I installed was older, so I backed it up to my Drobo and upgraded to 6.7u2. That's where the fun begins. Before that, though: if you're new to vSphere and vCenter, take my advice (especially if you're using a homelab to learn) and enable SSH when you install the VCSA. You'll thank me later.

On reboot, the PSC appeared to struggle (well, to be frank, it failed) to start all of the services. Most troubling, the vapi-endpoint service kept failing on startup. Here's where SSH comes in handy: I connected to the appliance and checked the status of the services. You COULD do the same thing in the VCSA management interface on port 5480, but not nearly as quickly as with SSH.

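Running "service-control --status" shows what is running and what is not. As you can see from the screenshot, many of the services were not running. The quickest way to restart them is also service-control: "service-control --start --all" starts everything, or "service-control --start <service-name>" starts a single service.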
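If you'd rather not eyeball that list every time, here's a minimal Python sketch (my own, not anything from VMware) that runs service-control --status and pulls out the services listed as stopped. It assumes the output groups service names under header lines ending in a colon, such as "Stopped:" and "Running:"; check that against your own appliance before trusting it. The function names parse_status and stopped_services are made up for this example.

```python
import subprocess
from typing import Dict, List


def parse_status(output: str) -> Dict[str, List[str]]:
    """Group service names under the state header that precedes them.

    ASSUMPTION: output looks roughly like
        Stopped:
         svc-a svc-b
        Running:
         svc-c
    """
    groups = {}  # type: Dict[str, List[str]]
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith(":"):
            # A header line such as "Stopped:" starts a new group.
            current = stripped[:-1]
            groups.setdefault(current, [])
        elif current is not None:
            # Service names are whitespace-separated under the header.
            groups[current].extend(stripped.split())
    return groups


def stopped_services() -> List[str]:
    """Return the stopped services; only works on the appliance itself."""
    result = subprocess.run(
        ["service-control", "--status"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_status(result.stdout).get("Stopped", [])
```

Each name from stopped_services() can then go to "service-control --start <name>", or you can skip the loop and just run "service-control --start --all".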

This VMware KB article is where I found help:

https://kb.vmware.com/s/article/68149

The article describes a different problem, but it seems the vmware-statsmonitor service was taking more than 60 seconds to start, which caused all kinds of downstream havoc. I changed the timeout values as the article suggests and rebooted. Three days later (I'M KIDDING), all services were running again and all was well.
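For reference, here's a rough Python sketch of the kind of edit involved: load a service's JSON config, raise a timeout value, and keep a .bak copy of the original. The path (/etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json) and the key name (StartTimeout) are my assumptions, not taken from the article, so check KB 68149 for the actual file and setting, and take a backup or snapshot before running anything like this. The helper name raise_timeout is hypothetical too.

```python
import json
import shutil
from pathlib import Path

# HYPOTHETICAL path and key name: confirm both against KB 68149 first.
CONFIG = Path("/etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json")
TIMEOUT_KEY = "StartTimeout"


def raise_timeout(path, key, seconds):
    """Set a top-level `key` in the JSON file at `path` to `seconds`.

    Copies the original to <name>.bak first and returns the old value.
    Raises KeyError if the key is missing, so a wrong guess at the key
    name fails loudly instead of silently adding a new setting.
    """
    data = json.loads(path.read_text())
    if key not in data:
        raise KeyError("{} not found in {}".format(key, path))
    old = data[key]
    # Keep an untouched copy next to the original before rewriting it.
    shutil.copy2(str(path), str(path.with_name(path.name + ".bak")))
    data[key] = seconds
    # Note: this rewrites the whole file with 4-space indentation.
    path.write_text(json.dumps(data, indent=4))
    return old
```

On the appliance that would look something like raise_timeout(CONFIG, TIMEOUT_KEY, 300) followed by a reboot, where 300 seconds is only an example value, not the number from the KB.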

Since changing the .json file mentioned in the article, everything seems to be OK. Now I can back up vCenter again and upgrade to 6.7u3... stay tuned...

/finis