This morning we had an outage, and while I did what I could to describe it in text, I can’t seem to come to grips with how three unrelated issues occurring together created so many pages.

Why?

Our network is split into multiple physical segments, so unrelated incidents like these should not affect one another.

Basic Network Diagram

This diagram isn’t 100% accurate; it is a logical drawing showing only the basic network interconnections. The physical machines that host our virtualized servers are not shown, though there are 8 of them. The NFS servers aren’t depicted accurately either: in reality there are multiple active/active NFS controllers exporting 3 NFS mount points to our internal infrastructure over a storage network. Another NFS cluster, which handles the real email and web storage, isn’t depicted at all.

Every production machine on the ipHouse network, whether physical or virtual, has at least 2 Ethernet ports: one connected to external-facing services and one on an internal network where we do our storage and secure data transfers. Some systems have more, depending on what service the server provides.
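
To make the two-port rule concrete, here is a minimal Python sketch of how such an inventory could be checked. It is an illustration only, not our actual tooling: the host names, interface names, and role labels (`external`, `internal`, `backup`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    # Maps a network role (e.g. "external", "internal") to an interface name.
    interfaces: dict[str, str] = field(default_factory=dict)


# Assumption for this sketch: every production host needs both of these roles.
REQUIRED_ROLES = {"external", "internal"}


def missing_roles(host: Host) -> set[str]:
    """Return the required network roles this host has no interface for."""
    return REQUIRED_ROLES - set(host.interfaces)


# Hypothetical inventory; none of these are real machines.
hosts = [
    Host("web1", {"external": "eth0", "internal": "eth1"}),
    Host("mail1", {"external": "eth0", "internal": "eth1", "backup": "eth2"}),
    Host("db1", {"external": "eth0"}),
]

for host in hosts:
    gaps = missing_roles(host)
    if gaps:
        print(f"{host.name}: missing {sorted(gaps)}")
```

A host with extra ports, like the hypothetical `mail1`, passes the check; only a host lacking one of the two required roles is reported.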