Saturday, May 09, 2020

Wow

Has it really been 2 weeks since I updated this? Has anyone even noticed?

Homelab
I added a node to my homelab: a reasonably powerful PowerEdge R340. It has a single CPU socket, 64GB RAM, and 2x 1Gb/sec ethernet interfaces. I added an older Mellanox 10Gb NIC and a Dell EMC BOSS card with 2 mirrored 240GB SSDs. The machine also has a pair of 2TB 7.2k SATA drives.

I installed ESXi 6.7 on the machine and it's been great. Peppy performance, although because it's not the same as my HCI cluster nodes, it can't participate in vSAN (well, not THAT vSAN anyway). I literally spun up a FoH client on the machine, which proceeded to demonstrate the drawbacks of a Xeon E-2124. Friends don't let friends crush all four cores of a 4-core CPU.

I've decided that this class of machine is probably a little too lightweight as a "tech refresh" target for my HCI cluster, but makes a really nice management and TIG stack host.

Not much of a story but it's a slow news day.

Work Hacks
I have to say that working from home with zero airport visits for 75 consecutive days has my "honey do" list well sorted at home. For work, I've focused on optimizing the "sit in a chair with back-to-back Zoom calls with zero time in between for 14 hours a day for 5 consecutive days every week" routine (read that without spaces - you'll begin to understand the insanity). I've taken the following steps:

1. Bought and installed a standing desk. (Editor's note: WHAT TOOK ME SO LONG TO DO THIS?) I now stand for at least every other meeting. Huge difference.

2. Decided that I am only attending 25 minutes of a 30 minute call - and 50 minutes of an hour long call. I'm also ending regular calls that I host 5 or 10 minutes early depending on the length. If we're not finished, we will schedule a follow-up call. I strongly encourage you to consider this. It's helping tremendously.

3. Declining meta-meetings. Think about the words there, and you'll understand what these are. If someone must have a meeting ABOUT a meeting, I'm not coming. The exception here is preparatory meetings for important client interactions (like being added to an existing conversation thread that I had not previously been a part of) - BUT - those meetings should really only be 15 minutes or so. Give me the CliffsNotes backstory - I'll figure out the rest.

4. Take time - block time - and make it immutable. Calendars like mine mean that I'm either forced to work all weekend to produce whatever it is I agreed to during the week, or to work from 6am-8am and then 7pm-10pm on some days. Mine is a global role, so I'm frequently on calls at odd hours, which means that plan (early and late work) doesn't really work - and it's horrible for work/life balance. So - I've blocked an hour each day for focused work. And that hour is sacrosanct. I don't answer the phone, I decline invitations, and I will not "give up" that hour for anything or anyone. I may move the hour based on demands, but somewhere in my work day, I will have one hour of uninterrupted work time.

I would love to hear any work hacks you've come up with. Work hacks are part of a global weekly call I lead - we spend at least a few minutes talking work hacks and how to navigate this new reality we're all in.

/finis

Thursday, April 16, 2020

Technical debt? Or just bored?

Since the world has gone into complete lockdown, I found myself wanting to correct some technical debt in my lab (clean up cabling, etc.). However, because of the lockdown, Amazon took its sweet time delivering the cool Monoprice slim Cat6 cables. I finally got everything I needed, and installed it today. These cables are awesome!

Instead of having a bunch of heavy Cat6a cables poking out of a brush strip, I created a patch panel on each end of the rack and ran short patch cables to the switches. Can't do much about the DAC cables, but removing the Cat6 cables and using the patches really cleaned it up. I also replaced the 16-port secondary management switch with a small Ubiquiti 5-port PoE switch and put it in the back.

Take a look and see if you agree:

BEFORE

AFTER

/finis


Wednesday, March 18, 2020

vmware-statsmonitor service times out on VCSA 6.7u2

In my homelab, I've made some changes, rolled back some changes, blown up a bunch of stuff, rebuilt servers, tried Terraform to see if I could make it work (I would call it a 58% success), and tried Ansible (which I am MUCH more familiar with, so it was a smashing success)... All of that to say I've been mean to my infrastructure, so I decided it was time to give it a fresh start.

I used the iDRAC (Integrated Dell Remote Access Controller) to mount the vSphere install ISO and reloaded vSphere on all four nodes from scratch. I also re-initialized all of my disks. Once that completed, it was off to the races with vCenter and Update Manager.

This is probably a really old issue, but hopefully this helps someone :)

The VCSA I installed was older, so I backed it up to my Drobo and upgraded to 6.7u2. That's where the fun begins. Before that, though - if you're new to vSphere and vCenter, take my advice (especially if you're using a homelab to learn) and enable SSH when you install the VCSA. You'll thank me later.

On reboot, the PSC appeared to struggle (well, to be frank, it failed) to start all of the services. Most troubling was that the vapi-endpoint service continually failed upon start. Here's where SSH comes in handy. I connected to the appliance and checked the service status. You COULD do the same thing in the :5480 VCSA management console, but not nearly as quickly as with SSH.

Running "service-control --status" shows what is running and what is not. As you can see from the screenshot, many of the services were not running. The quickest way to restart is to use service-control.

This KB article is where I found help.

https://kb.vmware.com/s/article/68149

This describes a different problem, but it seems that the vmware-statsmonitor service was taking more than 60 seconds to start, which caused all kinds of downstream havoc. I changed the timeout values as suggested and rebooted. 3 days later (I'M KIDDING), all services were running again and all is well.

Once I changed the .json mentioned in the article, everything seems to be OK. Now I can back up vCenter again and upgrade to 6.7u3... stay tuned...

/finis

Saturday, February 08, 2020

Rethink the possible - SAP HANA at home

The first thing I did with my new lab was connect it to my old lab. (yawn - and well, the power bill at some point becomes a problem)

CAUTION: This was an experiment. It involves mostly commercially supported configurations. I'm specifically "detail light" in this post because I don't want any assumptions being made about suitability or supportability for production workloads. Nothing about my homelab is suitable for production supported workloads.

I'm involved in a pretty cool project for work around SAP HANA virtualized on VMware vSphere. As a VMUG Advantage member, I'm loving the fact that I can get non-production licenses for just about anything I can think of when it comes to building out virtual datacenters. While I am not licensed for HANA at home, I did have 30 days to mess around with it, and mess around I did.

I wish I had taken screenshots, but here's the hot take. I connected my homelab - with a 64GB HANA instance (and 3 application servers) - to a 64GB HANA instance I built in Walldorf, Germany. This was a mostly functional landscape with some bogus data I generated in my homelab. I used HANA System Replication to copy the data to Germany, and Dell EMC RecoverPoint for VMs to replicate the application servers. This copy (with my exceptional cable modem service) took almost a week to complete, but just before I shut down my HANA landscape in my homelab, I was able to literally push all of the functions of my tiny little application instance over to Germany.

VMware vSphere as my virtualization engine provided stable, consistent infrastructure. I was using vSphere 6.7 in my homelab and 6.5u3 in the Germany VxRail cluster. Of course, consideration had to be given to a variety of configuration details (VM hardware versions for the 6.7 VMs, etc.). Let me tell you how incredibly simple VMware NSX virtual networking made this - especially since the test spanned such a great distance.

This is a very similar scenario to the projects I'm on at work, just with much bigger landscapes. For me personally, it provided a proof point for how simple SAP HANA can be when it's built on top of intelligent virtualization. The great thing about this, to me, is that I proved several important concepts to myself that, before this, I had only read about in whitepapers.

I'm a very pragmatic technologist, and a "cloud to on-premises and back" landscape migration / replication / copy is hard. Configuration must be 100% correct across potentially hundreds of systems. Complex application integrations must remain intact on either side of the landscapes. Data integrity is, of course, sacrosanct. Intelligent virtualization helps customers with a consistent view, regardless of the "iron underneath". There is no better foundation than VMware vSphere, vSAN and NSX - especially on VxRail. My little experiment has given me a high degree of confidence that with the right tools, time and smart people, enormous projects at enormous scale can be accomplished. A proof point in my own mind on the work I do every day.

Interested in learning more? Find out more about VMUG Advantage here. Learn about SAP infrastructure for free here.

/finis

Monday, November 11, 2019

Homelab: Update


I promised an update when the rework was complete, and here it is.

First, the rack and stack. This took a bit of time, and a bit of patience as I sourced the best deals on my favorite auction site. My patience paid off, and here's the finished product. From top down:

Ubiquiti UniFi USW-XG (16 port 10Gb ethernet)
AC Infinity sensor based intelligent fan
Ubiquiti UniFi USW-16 (18 port 1Gb ethernet)
Leviton commercial power conditioner
Dell EMC PowerEdge R620 x4
2014 Mac Mini (home media server)
Drobo 5c (connected to Mac) with 5x 1.5TB SSD
Drobo 5n (clients, vCenter backups) with 5x 8TB HDD

All of this is connected to my Ubiquiti UniFi based system. The only component of the network that is not UniFi is a SonicWall TZ350 just before my Comcast Business CPE.

This home lab is surprisingly quiet and consumes (again, surprisingly) much less power than I originally planned for or had available when I built it.

It's all virtualized using vSphere (of course), managed by vCenter and vRealize Log Insight (VMUG Advantage is awesome by the way), and I recently installed a TIG stack (Telegraf, InfluxDB and Grafana) to see what kind of metrics I could get out of it. Questions / comments welcome.
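
If you're curious about the TIG piece, here's a minimal sketch of the Telegraf config I'd start from. The vSphere input and InfluxDB output are standard Telegraf plugins, but the hostnames, credentials and database name below are made-up placeholders for illustration - check the plugin docs for the options your Telegraf version supports:

    # /etc/telegraf/telegraf.conf (excerpt)
    [[inputs.vsphere]]
      ## placeholder vCenter URL and credentials - substitute your own
      vcenters = [ "https://vcenter.lab.local/sdk" ]
      username = "telegraf@vsphere.local"
      password = "changeme"
      ## homelab certificates are usually self-signed
      insecure_skip_verify = true

    [[outputs.influxdb]]
      urls = ["http://influxdb.lab.local:8086"]
      database = "homelab"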





/finis







Saturday, November 02, 2019

Homelab: connectivity

I've posted a little bit about my home lab, and have recently consolidated everything into a single rack. I'm still waiting on a few components for the rack to address recirculation and aesthetics, so I'll wait to publish pictures until that's complete.

I'm a big fan of Ubiquiti UniFi products, and they suit a lab's needs well. If you're looking for a managed family of simple layer 2 switches, go check them out. I'll publish links to the products as I describe them.

My lab is connected to my home network via the default VLAN. That's the only way into the lab. Everything else is isolated within the lab networks. My home network consists of a pretty robust firewall, and everything behind it is Ubiquiti.

I have a 1Gb fiber connection between the switch that serves my home and the lab "core" switch. The lab core is a UniFi Switch 16 XG (link) that offers (12) 1/10 Gb/sec SFP+ capable ports, and (4) 1/10 Gb/sec RJ45 ports. It is connected to a UniFi Switch 16 (link) and a UniFi Switch 8 (link).

The VLAN configuration simplifies everything in the network. Rather than worry about port assignments, VLAN-to-port tagging, etc., I decided to create my distributed virtual switches with the VLAN tag set in the Distributed Port Groups. This way, I maintain flexibility and simplicity. The only exception to this scheme is the connection to the NAS platform, which sits on the Default VLAN and is accessible from both the home network and the lab network.
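
If you'd rather script the port groups than click through vCenter, a minimal sketch with govc (VMware's open source vSphere CLI) looks something like the following. The switch name, port group names, VLAN IDs and credentials are made-up examples rather than my actual scheme, and it assumes govc is installed and pointed at your vCenter:

    # point govc at vCenter (homelab, so skipping certificate verification)
    export GOVC_URL='https://vcenter.lab.local'
    export GOVC_USERNAME='administrator@vsphere.local'
    export GOVC_PASSWORD='changeme'
    export GOVC_INSECURE=1

    # create distributed port groups that carry their VLAN tag themselves
    govc dvs.portgroup.add -dvs lab-dvs -vlan 20 pg-vmotion
    govc dvs.portgroup.add -dvs lab-dvs -vlan 30 pg-vsan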

The server connectivity is shown to the right.

I didn't have a 24 port switch, so I decided to separate management and provisioning. There's not really a need to do that for a small environment, but I could - so I did.

The vMotion and vSAN ports are separate, and the DVSes use separate VLANs for those connections. I could have used LACP on these ports, but in the interest of simplicity, these connections are separate 10Gb/sec links using SFP+ DACs.
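
On the host side, the vmkernel tagging for those networks can be checked or set from the ESXi shell as well. A rough sketch, assuming vmk1 and vmk2 as the interface names (yours may differ, and the underlying port groups have to exist first):

    # tag vmk1 for vMotion traffic
    esxcli network ip interface tag add -i vmk1 -t VMotion

    # add vmk2 to the vSAN network
    esxcli vsan network ip add -i vmk2

    # confirm what ended up where
    esxcli network ip interface tag get -i vmk1
    esxcli vsan network list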

Hopefully if you're building a server based home lab, you find this helpful. Comments / questions welcomed below.

/finis

Homelab: The quest for the circle of trust

NOTICE: This contains some advanced and potentially dangerous configuration steps. If you're at all uncertain on this, please don't do it. I cannot assume any responsibility for your system or information security. This worked for me, and may introduce serious risk to your own system. Know what you're doing, and how to undo it - or don't read this.

I would like to address an issue that has come up with Mac OS Catalina (10.15.x). Besides the rapid release of fixes, etc. associated with iOS 13 and Catalina, one other issue has arisen that I found a workaround for. It truly is a workaround, and appears to affect ONLY Chrome on Catalina.

NET::ERR_CERT_REVOKED

SSL certificates are a pain by any measure, and self-signed certificates aren't working anymore on Chrome / Catalina. SO, you can either get / create your own properly signed certificates (a massive pain), or follow the steps below.

The NET::ERR_CERT_REVOKED message can't be bypassed like some SSL errors that Chrome reports. In the case where you're on the internet or looking into a system that you're not completely familiar with, this is a good thing. However, in the case where you KNOW the system (home labs are a perfect example), this is a royal pain.

So, upon connecting to my lab post-upgrade (to Mac OS Catalina), I received this message on all of my "home" systems. Connecting via Safari worked, as did connecting via Firefox - so I knew it was (1) a certificate issue, and (2) Chrome. Here's the workaround:


1. Open the URL in Safari, e.g. https://192.168.1.200. You will receive the usual SSL message. Select "Show Details".

2. Here's a little-known Mac OS trick. Once you view the details of the offending certificate in Safari, you can drag the certificate to your desktop by clicking, holding, and dragging the certificate image. You'll then have the certificate on your desktop.

3. Open "Keychain Access" and drag the certificate from your desktop into your certificate store. Once it's there, expand the "Trust" section at the top and select "Always Trust". This will then allow you to connect via Chrome. PLEASE NOTE: If you are at all unsure about what you're doing here, please do not do it. This bypasses a VERY significant security feature of Mac OS and Chrome. I am only doing this because I trust these systems.
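
If you prefer the terminal, you can get to the same place from a shell. A minimal sketch, assuming the 192.168.1.200 host from step 1 and the default login keychain (the same warnings apply, and Mac OS will prompt for your password):

    # grab the self-signed certificate from the lab host
    openssl s_client -connect 192.168.1.200:443 -showcerts </dev/null \
      | openssl x509 -outform PEM > homelab.pem

    # add it to the login keychain and mark it trusted
    security add-trusted-cert -r trustRoot \
      -k ~/Library/Keychains/login.keychain-db homelab.pem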

I hope this works for you. I would also STRONGLY state that this process should NEVER be used on any SSL protected connection that you are not 100% responsible for, and definitely not for something outside of your own network and control.


/finis

Thursday, October 17, 2019

PowerEdge and vSphere. My home lab upgrade

So I just finished installing my new Dell EMC PowerEdge servers in my home lab. The difference one generation of server makes is astounding. The new machines are pretty stout and will serve well in the experiments and learning I want / need to do.

Home labs get budget racks
4x Dell EMC PowerEdge R620 (Sandy Bridge EP)
- 2x Intel Xeon E5-2670 2.6GHz eight-core CPU
- 128GB RAM each
- 1x Dell 400GB SAS SSD (vSAN Cache tier)
- 2x Dell 1.2TB SAS 10k (vSAN capacity tier)
- 2x Dell 600GB SAS 10k (local datastore)
- 2x Samsung 16GB SD-Card (boot)

Ubiquiti UniFi 16 port 1Gb switch (+ 2 1Gb SFP)
Ubiquiti UniFi 16 port 10Gb switch
Ubiquiti UniFi 8 port 1Gb switch
Spanning Tree enabled
Uplinked to my "home network" but isolated from it except for management (all workloads isolated but internet accessible)

The nodes are connected as follows:
- iDRAC is on a dedicated VLAN (16 port UniFi)
- eth4 is on VLAN 1 (16 port UniFi)
- eth3 is on a dedicated routed VLAN (8 port UniFi)
- eth5 is on a 10Gb SFP+ DAC for vMotion (closed VLAN)
- eth6 is on a 10Gb SFP+ DAC for vSAN (closed VLAN)

Configuring these machines was SO simple.

  1. Since I bought them used, I connected to the iDRAC first and downloaded the Enterprise license key. I then reset the iDRAC. This took a few minutes - but trust me - it's worth it to not have to slog through troubleshooting only to find out some obscure setting was in your way.
  2. Once that was finished, I connected a local keyboard and monitor to each server and set the static IP address, admin user, and a few other options. This can be done remotely, but it's kind of a pain to discover the iDRAC and have to reconnect. The 5 minutes it took was worth the "in person" visit to my basement. (If you'd rather script this, see the racadm sketch just after this list.)
  3. I then used Virtual Media to mount the Dell EMC Remote Update ISO. If you're not already aware of this gift - get aware. It's an ISO image (so it could be burned to DVD and run locally if you wanted to) that I mounted to the virtual CD and booted the server from. Think of it as a run-time, out-of-band Lifecycle Management tool for all of the devices in your compute node. It updates everything it finds to the versions on the ISO and restarts the system.

    You can find the ISO for your system here.
  4. I then proceeded to mount the vSphere image (the Dell EMC custom build ISO) and installed vSphere to the SD card.
Once all of that was finished, I configured my DVSes and vmkernel NICs and was ready to start playing.
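
For reference, the iDRAC work in steps 1 and 2 can also be scripted with Dell's racadm utility (via SSH to the iDRAC, or local racadm from an OS). A rough sketch, with a made-up address, and the caveat that racresetcfg wipes the iDRAC back to factory defaults - only run it if that's really what you want:

    # reset the iDRAC configuration to factory defaults (step 1)
    racadm racresetcfg

    # set a static IP / netmask / gateway on the iDRAC NIC (step 2)
    racadm setniccfg -s 192.168.10.21 255.255.255.0 192.168.10.1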

But wait... there's more...

Backstory: Every Dell EMC PowerEdge contains a Lifecycle Management utility in its pre-boot environment. This LCM process allows you to connect to Dell from any internet-accessible network and - just like the ISO in step #3 - it will analyze everything in your system and offer to update it. Since the ISO I downloaded in step #3 was from July, there had most certainly been updates issued by Dell EMC since then.

Anyone want to buy some R610s and a NetApp 10Gb switch?
So, I configured everything - including vSAN - and stuff is running beautifully. I then put Server #1 into Maintenance Mode (vSAN is configured for FTT=1) and proceeded to reboot into the LCM. Sure enough, it found several firmware items that were newer than what was installed, so I let it do its thing.
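
(Side note: if you do this from the host shell instead of vCenter, maintenance mode on a vSAN node takes a data handling mode. A rough sketch below - ensureObjectAccessibility is the "quick, still FTT-protected" option and evacuateAllData the slow-but-safe one; check esxcli's built-in help for the exact flags on your build.)

    # enter maintenance mode, keeping vSAN objects accessible rather than evacuating all data
    esxcli system maintenanceMode set -e true -m ensureObjectAccessibility

    # and back out again when the firmware work is done
    esxcli system maintenanceMode set -e false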

vSAN is magic. Period. It has come SO far in so short a period of time - I'm a HUGE fanboy. The LCM process on Server #1 took about 45 minutes. Tons of time before a vSAN rebuild starts (the default repair delay is 60 minutes). Except I'm an idiot. I got distracted - forgot the server was updating - and what do you know... 90 minutes or so after the LCM started, I realized it was finished and rebooted ESXi.

Before ESXi had completely booted, vSAN stopped rebuilding and re-synchronized the node with the other 3 members.

I'm really enjoying my time with VMware. I'm hoping (now that I have enough CPU and RAM) to start messing around with PKS and, later, OpenShift. I'll continue to update...


Proudly displayed on the wall in my "lab" because why not?


/finis