1.Overview
Availability is a probabilistic measure of the fraction of time a system or network is functioning.
- It is generally expressed as a percentage; for example, 99.999% (referred to as "five nines" uptime) is carrier-grade availability.
- A network has high availability when downtime and repair times are minimal.
- For example, high-availability networks are down for minutes, whereas low-availability networks are down for hours.
- Unavailability is the fraction of time a system is not functioning (its downtime) and is generally expressed in minutes per year.
- Unavailability (minutes per year) = (1 - Availability)*365*24*60, with Availability expressed as a fraction (e.g. 0.99999)
- Unavailability (U) = MTTR/MTBF, as a fraction, with MTTR and MTBF in the same units
- The unavailability of a 99.999% available system is about 5.3 minutes per year.
- Availability is generally calculated from either failure rates or mean time between failures (MTBF).
- Availability calculations always assume a bi-directional system.
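The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a planning tool: the function names and the 4-hour MTTR / 400,000-hour MTBF figures are assumptions chosen for the example, and U = MTTR/MTBF is treated as a close approximation that holds when MTTR is much smaller than MTBF.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime in minutes per year for an availability given as a fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def unavailability_from_mttr(mttr_hours: float, mtbf_hours: float) -> float:
    """Unavailability as a fraction, U = MTTR / MTBF (both in hours)."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return mttr_hours / mtbf_hours


if __name__ == "__main__":
    # Five nines: roughly 5.3 minutes of downtime per year.
    print(f"{downtime_minutes_per_year(0.99999):.2f} min/year")

    # Illustrative (assumed) figures: 4 h to repair, 400,000 h between failures.
    u = unavailability_from_mttr(mttr_hours=4.0, mtbf_hours=400_000.0)
    print(f"U = {u:.2e}, availability = {(1.0 - u) * 100:.4f}%")
```

With the assumed figures, a 4-hour repair time against a 400,000-hour MTBF gives U = 1e-5, which is the five-nines case again.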
2.Circuit vs. Nodal Availability
Circuit and nodal availability measure different quantities. To make the distinction clear, this section uses unavailability (Unavailability = 1 - Availability).
- Circuit unavailability is a measure of the average downtime of a traffic demand / service.
- A circuit is unavailable only if traffic-affecting components that help transport the demand / service have failed.
- Circuit unavailability is calculated from the unavailabilities of the traffic-affecting components, taking into account which of those components are hardware protected.
- For example, the failure of both 10G line cards on a network element (NE) can cause a traffic outage.
- Nodal unavailability is a measure of the average downtime of a node.
- Each time there is a failure in a node, regardless of whether it is traffic affecting, an engineer is required to visit the node to fix the failure.
- Therefore, although nodal unavailability is based on calculated failure rates, it is still a direct measure of operational expenditure.
- Nodal unavailability is calculated by adding the unavailabilities of all components of a network element regardless of hardware protection, i.e. in series.
- For example, the failure of a protected switch card does not affect traffic but still requires a site visit to replace the card.
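The difference between the two calculations can be sketched as follows. This is a simplified model under stated assumptions, not a vendor method: the `Component` class, its fields, and all unavailability values are hypothetical; failures are assumed independent; a protected component is assumed to be a 1+1 pair whose service impact requires both units to fail (U squared); and unavailabilities in series are summed, which is the usual small-value approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one component type in a network element."""
    name: str
    unavailability: float     # per-unit unavailability, as a fraction
    traffic_affecting: bool   # does this component type carry the service?
    protected: bool = False   # True if deployed as a 1+1 pair (two units)


def nodal_unavailability(components):
    """Sum every physical unit in series, ignoring hardware protection."""
    total = 0.0
    for c in components:
        units = 2 if c.protected else 1  # both units of a pair can need a site visit
        total += units * c.unavailability
    return total


def circuit_unavailability(components):
    """Sum only traffic-affecting components; a protected pair fails only if both units fail."""
    total = 0.0
    for c in components:
        if not c.traffic_affecting:
            continue
        total += c.unavailability ** 2 if c.protected else c.unavailability
    return total


if __name__ == "__main__":
    # Illustrative values only.
    node = [
        Component("10G line card", 1e-4, traffic_affecting=True, protected=True),
        Component("switch card", 5e-5, traffic_affecting=True, protected=True),
        Component("amplifier", 3e-5, traffic_affecting=True),
        Component("fan unit", 2e-5, traffic_affecting=False),
    ]
    print(f"nodal   U = {nodal_unavailability(node):.3e}")
    print(f"circuit U = {circuit_unavailability(node):.3e}")
```

In this example the nodal figure is roughly ten times the circuit figure, because protected pairs contribute almost nothing to service downtime but every failed unit still counts toward site visits.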
3.Terms & Definitions
Failure rate
- Failure rate is usually measured as Failures in Time (FIT), where one FIT equals a single failure in one billion (10^9) hours of operation.
- FITs are calculated according to an industry standard (Telcordia SR-332).
MTBF (Mean time between failures)
- Average time between failures for a given component.
- Measured either in hours or years.
- MTBF is inversely proportional to FITs: MTBF (hours) = 10^9 / FIT.
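Because one FIT is one failure per 10^9 hours, the conversion between FIT and MTBF is a single division. The sketch below shows it in both directions; the function names and the 2,000 FIT figure are assumptions used only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years
FIT_HOURS = 1e9            # one FIT = one failure per 10^9 hours of operation


def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a failure rate given in FIT."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return FIT_HOURS / fit


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Failure rate in FIT for an MTBF given in hours."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return FIT_HOURS / mtbf_hours


def fit_to_mtbf_years(fit: float) -> float:
    """MTBF in years for a failure rate given in FIT."""
    return fit_to_mtbf_hours(fit) / HOURS_PER_YEAR


if __name__ == "__main__":
    fit = 2000.0  # illustrative value, not taken from any datasheet
    print(f"{fit:.0f} FIT -> {fit_to_mtbf_hours(fit):,.0f} h "
          f"({fit_to_mtbf_years(fit):.1f} years)")
```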
MTTR (Mean time to repair)
- Average time to repair a given failure.
- Measured in hours.
Availability
- Availability is usually quoted as a number of nines.
- For example, carrier grade is five nines, which is 99.999%.
- Availability is often easier to understand as unavailability in minutes per year.
- For example, an availability of 99.999% corresponds to an unavailability, or downtime, of about 5.3 minutes per year.
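The relationship between the number of nines and yearly downtime can be tabulated with a short script. This is a sketch that assumes a 365-day year and whole numbers of nines; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def nines_to_availability_percent(nines: int) -> float:
    """Availability in percent for a given number of nines (e.g. 5 -> 99.999)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 100.0 * (1.0 - 10.0 ** -nines)


def nines_to_downtime_minutes(nines: int) -> float:
    """Downtime in minutes per year for a given number of nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR * 10.0 ** -nines


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"{n} nines: {nines_to_availability_percent(n):.4f}% "
              f"-> {nines_to_downtime_minutes(n):.2f} min/year")
```

Each additional nine cuts the yearly downtime by a factor of ten, from about 526 minutes at three nines to about 5.3 minutes at five nines.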