

Why is a soak timer actually needed on optical devices?

 

We all know that during troubleshooting we look for detected faults, alarms, or performance parameters at the monitoring points and then correlate them with other factors to conclude the root cause. Fault detection is the process of determining that a fault exists. Fault detection capabilities are intended to detect all actual and potential hardware and software troubles so that the impact on service is minimized and consistent with the reliability and service objectives.

 

 

Now let's get to the topic at hand.

 

In accordance with GR-474-CORE, a time period known as "soak time" is incorporated in the definition of a signal failure to allow for momentary transmission loss, e.g., from single transient events, and to protect against false alarms.

 

For transport entities, the soaking interval is entered when a defect event is detected, and it is exited only when either the defect persists for the soak time interval and a bona fide failure is declared, or normal transmission returns within the soaking interval.

 

 

Also keep in mind that circuits do not use the soak timer, but ports do.

 

For example, the time period for DS3 signal failure entry/clearing is 2.5 ± 0.5 seconds and 10 ± 0.5 seconds (more at https://mapyourtech.com/entries/general/what-is-the-soak-time-period-to-detect-los-on-a-port- )
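
To make the mechanics concrete, here is a minimal sketch of the soaking behaviour in Python; the 2.5 s entry value is the DS3 example above, and the function and names are illustrative rather than anything from GR-474-CORE itself.

import time

SOAK_ENTRY_S = 2.5  # e.g., DS3 signal-failure entry soak (2.5 +/- 0.5 s, per above)

def declare_failure(defect_present, poll_interval=0.1):
    """Return True only if the defect persists for the whole soak interval.

    defect_present is a callable returning the current defect state; the
    soak is exited with no failure declared if the defect clears first.
    """
    start = time.monotonic()
    while time.monotonic() - start < SOAK_ENTRY_S:
        if not defect_present():   # normal transmission returned in time
            return False           # exit the soaking interval, no failure
        time.sleep(poll_interval)
    return True                    # defect soaked: declare a bona fide failure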

 

 

 

Do we still need 50 ms (milliseconds) restoration for running telecom services?

 

Discussing the 50 ms switching/restoration time for telecom circuits has always been exciting for engineers in every part of telecom services, including optical, voice, data, microwave, radio, etc. I had been seeking the answer since the start of my telecom career, and I believe engineers and telecom professionals still hear this term at some point and wonder: why ("WHY") is this? So I researched the available knowledge pools and, drawing on my experience, thought of putting it into words to enlighten some of my friends like me.

 

The 50 ms idea originated in the Automatic Protection Switching (APS) subsystems of early digital transmission systems. It was not actually based on any particular service requirement. The value persists not because of technical considerations that would settle it, but because it has roots in historical practices and past capabilities, and it has been a tool of certain marketing strategies.

 

Initially, digital transmission systems based on 1:N APS typically required about 20 ms for fault detection, 10 ms for signaling, and 10 ms for the tail-end transfer relay operation (20 + 10 + 10 = 40 ms), so the specification for APS switching times was reasonably set at 50 ms, allowing a 10 ms margin.

 

 

 

For information, early generations of DS1 channel banks (1970s era) also had a Carrier Group Alarm (CGA) threshold of about 230 ms. The CGA is a time threshold for the persistence of any alarm state on the transmission line side (such as loss of signal or loss of frame sync) after which all trunk channels would be busied out. But the requirement for 50 ms APS switching stayed in place, mainly because it was still technically quite feasible at no extra cost in the design of APS subsystems.

 

The apparent sanctity of 50 ms was further entrenched in the 1990s, at the start of the SONET era, by vendors who promoted only ring-based transport solutions and found it advantageous to insist on 50 ms as the requirement, effectively precluding distributed mesh restoration alternatives from equal consideration.

 

As a marketing strategy, the 50 ms issue served as the "mesh killer" of the 1990s, as most traditional telcos bought into it as a reference.

 

On the other hand, there was also real urgency in the early 1990s to deploy some kind of fast automated restoration method almost immediately. This led to the quick adoption of ring-based solutions, which had only incremental development requirements over 1+1 APS transmission systems. However, once rings were deployed, the effect was only to further reinforce the cultural assumption of 50 ms as the standard. Thus, as sometimes happens in engineering, what was initially a performance capability in one specific context (APS switching time) evolved into a perceived requirement in all other contexts.

But the "50 ms requirement" is undergoing serious challenges to its validity as a ubiquitous requirement, even being referred to as the "50 ms myth" by data-centric entrants to the field who see little actual need for such fast restoration from an IP services standpoint. Faster restoration is by itself always desirable as a goal, but restoration goals must be carefully set in light of corresponding costs that may be paid in terms of limiting the available choices of network architecture. In practice, insistence on "50 ms" means 1+1 dedicated APS or UPSR rings (to follow) are almost the only choices left for the operator to consider. But if something more like 200 ms is allowed, the entire scope of efficient shared-mesh architectures becomes available. So it is an issue of real importance as to whether there are any services that truly require 50 ms.

 

Sosnosky's original study found no applications that require 50 ms restoration. However, the 50 ms requirement was still being debated in 2001 when Schallenburg, understanding the potential costs involved to his company, undertook a series of experimental trials with varying interruption times and measured various service degradations on voice circuits, SNA, ATM, X.25, SS7, DS1, 56 kb/s data, NTC digital video, SONET OC-12 access services, and OC-48. He tested with controlled-duration outages and found that 200 ms outages would not jeopardize any of these services and that, except for SS7 signaling links, all other services would in fact withstand outages of two to five seconds.

Thus, the supposed requirement for 50 ms restoration seems to be more of a techno-cultural myth than a real requirement—there are quite practical reasons to consider 2 seconds as an alternate goal for network restoration. This avoids the regime of connection and session time-outs and IP/MPLS layer reactions but gives a green light to the full consideration of far more efficient mesh-based survivable architectures.

 

A study by Sosnosky provides a summary of the effects, based on a detailed technical analysis of various services and signal types. In this study, outages are classified by their duration, and it shows how the main effects/characteristics change across the different outage-time ranges.

 

 

Concluding Comment

 

Considering the state-of-the-art technologies evolving over time in all areas of telecommunications, switching speeds are now very fast, and even hold-up timers (HUT), hold-down timers, and hold-off timers play significant roles, holding back consequent actions and avoiding service unavailability. Yes, there will definitely be some packet loss in the services, which could be visible as some form of errors on the links or may sometimes increase latency, but as we know, this varies with the nature of the service: voice, data, live streaming, internet surfing, video buffering, etc. So we can say that in today's world, networks are quite resistant to brief outages, although this varies with the architecture of the network and the flow of services. Even 50 ms or 200 ms outages would not jeopardize services (data, video, voice), depending on the network architecture and the routing of services.

I would love to see readers comment on this for further discussion.

Reference
Mesh-Based Survivable Networks: Options and Strategies for Optical, MPLS, SONET, and ATM Networking, by Wayne D. Grover

 

 

 

What is the reason behind most EDFAs (Erbium-Doped Fiber Amplifiers) using only 980 nm and 1480 nm as their pump wavelengths and not any other wavelength?

The pump wavelength used is either 980 nm or 1480 nm, largely due to the availability of these laser sources. In the course of the explanation, let's look at the energy-level diagram of the Er3+ (erbium) ion, the absorption bands of Er3+, and the pump efficiency.

 

 

 

 

Figure: (a) energy levels of erbium ions, showing ground-state absorption (GSA) and excited-state absorption (ESA); (b) gain and attenuation spectra.

 

 

There are several states to which the erbium ions can be pumped using sources of different wavelengths, such as those operating at 1480, 980, and 800 nm. However, in practical optical systems, the pump wavelength must deliver high gain per unit of pump power.

 

The commonly available laser diodes operate at 800, 980, and 1480 nm, but the pump efficiency, which can exceed 1 dB/mW with low attenuation, depends on the pump wavelength.

 

The only pump-wavelength laser sources that give high pumping efficiency with lower attenuation are those operating at 980 and 1480 nm.

 

In practice, the 980 nm pumping source is commonly used due to its high gain coefficient (4 dB/mW). The difference in the effects of these two wavelength sources is mainly caused by the absorption and emission factors.
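
As a back-of-envelope sketch of what a gain coefficient like 4 dB/mW means, assuming the unsaturated small-signal regime where gain scales roughly linearly with pump power (real EDFAs saturate well below this, and the pump power value here is purely illustrative):

gain_coeff_db_per_mw = 4.0   # typical 980 nm figure quoted above
pump_power_mw = 5.0          # illustrative pump power, not from the source

small_signal_gain_db = gain_coeff_db_per_mw * pump_power_mw
print(f"~{small_signal_gain_db:.0f} dB small-signal gain")  # ~20 dB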

 

 


 

 

Reference: Optical Fiber Communications Systems, by Le Nguyen Binh

 

 

Quick note on Pre-FEC, Post-FEC, BER and Q relation.

Pre-FEC BER corresponding to Q.

  • BER and Q can be calculated from one another according to the following relationship: BER = ½ · erfc(Q/√2), equivalently Q = −Φ⁻¹(BER), where Φ⁻¹ is the inverse of the standard normal CDF
  • dBQ = 20·log10(Q)
  • Or, in Excel: dBQ = 20*LOG10(-NORMSINV(BER))
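
As a quick sketch, the same conversions in Python using scipy (the helper names are just illustrative):

from math import sqrt, log10
from scipy.special import erfc
from scipy.stats import norm

def q_from_ber(ber):
    # Q (linear) from pre-FEC BER: Q = -Phi^-1(BER)
    return -norm.ppf(ber)

def ber_from_q(q):
    # Pre-FEC BER from linear Q: BER = 0.5 * erfc(Q / sqrt(2))
    return 0.5 * erfc(q / sqrt(2))

def dbq(q):
    # Q expressed in dB: dBQ = 20 * log10(Q)
    return 20 * log10(q)

print(round(dbq(q_from_ber(3.8e-3)), 2))  # ~8.53 dBQ, the FEC limit quoted below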

Post-FEC BER/Q

  • Post-FEC BER of <1e-15 essentially means no post-FEC errors
  • This is equivalent to ~18 dBQ, which is about as high as can be measured

 

FEC Limit

  • This is the lowest Q (or highest BER) that can be corrected by the FEC
  • Beyond this, post-FEC errors will occur

e.g.

 FEC Limit: 8.53 dBQ, or a BER of 3.8e-3
 FEC Limit: 5.23 dBQ, or a BER of 3.4e-2 (only ~97% of the bits need to be correct!)
 
 

Pre-FEC calculation example

Assume:

219499456 : bit errors (corrected)

0 : uncorrectable words

6.4577198E-6 : reported pre-FEC BER

Assume the time at this instant of performance was 12:05:04, which means 304 seconds since the last time interval.

Assume the FEC setting was STANDARD FEC, so the rate used for a 100G transponder is 1.1181 * 10^11 bits/s.

General formula to calculate the pre-FEC BER:

PRE_FEC BER = TotalErrors / (secsFromLast * rate)

TotalErrors = bitErrorsCorrected + (9 * uncorrectableWords)

So, substituting the values:

219499456 / (304 * 1.1181 * 10^11) = 6.4577198E-6
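
The same arithmetic as a small Python sketch (the variable names are illustrative, not from any particular NMS or CLI):

bit_errors_corrected = 219_499_456  # corrected bit errors in the interval
uncorrectable_words = 0             # uncorrectable FEC words in the interval
secs_from_last = 304                # seconds since the last PM interval
rate = 1.1181e11                    # 100G transponder line rate, standard FEC (bits/s)

total_errors = bit_errors_corrected + 9 * uncorrectable_words
pre_fec_ber = total_errors / (secs_from_last * rate)

print(f"{pre_fec_ber:.7e}")  # ~6.4577e-06, matching the reported value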
       

 

 

Quick note on Physical Coding Sublayer (PCS) errors in optical systems

What is PCS?

The Physical Coding Sublayer (PCS) is a networking protocol sublayer in the Ethernet standards. It resides at the top of the physical layer (PHY) and provides an interface between the Physical Medium Attachment (PMA) sublayer and the media-independent interface (MII). The PCS is responsible for coding and decoding the data streams flowing to and from the MAC layer, scrambling and descrambling them, block and symbol redistribution, alignment marker insertion and removal, and lane block synchronisation. Currently, most optical client ports support PCS lanes to enable high data rates.

Where does the PCS layer actually lie?

 


 

How to troubleshoot PCS error issues on optical network links?

If you see PCS errors on an interface, they may cause the link to flap, or you may see errors on the client interfaces of optical or router ports. The PCS block also reports signal fail/signal degrade based on pre-set thresholds.

 

  • Sometimes you may see bit errors or errored blocks in the performance counters of the interface. Bit errors can also be converted to PCS errors. PCS errors are generally due to degradation or failure of physical components, such as a problem in the physical interface mapper, damaged or attenuated fiber, an issue on a patch panel or ODF, or a faulty or damaged optical pluggable. The higher the rate, the greater the complexity of the internal mapper/components, so performance becomes more sensitive to optical path perturbations.
  • PCS errors are also visible on the interfaces if there is some activity involving manual fiber pulls, device reboots, optics replacement, etc. During link bring-up, bring-down, or flapping situations, it is expected to see PCS errors increase for a short interval of time; this is because of the initial synchronization or skew/deskew process between the two Ethernet endpoints. PCS errors are always counted in the incoming direction on the receiving node.
  • Other reasons for PCS errors include damaged or bad fiber and faulty optical pluggables (SFP/XFP/CFP, etc.).
  • Low receive power on the interface can also result in this kind of error, so it is always recommended to troubleshoot or investigate the physical fiber as well as the physical port on the devices (router/optical client ports).

For PCS-lane-based modules like SR4, LR4, LR10, or other multi-lane pluggables, it is recommended to check errors on the individual lanes of the pluggable. If only a few lanes show issues, it is better to suspect the connector or the optical XFP/CFP.

There is also a limit on the maximum difference in receive/transmit power between any two lanes. If the difference is greater than the threshold, it may also result in issues; a quick check for this is sketched after the tables below.

Max difference in receive power between any two lanes

100GBASE-LR4  5.5 dB
200GBASE-FR4  4.1 dB
200GBASE-LR4  4.2 dB
400GBASE-FR8  4.1 dB
400GBASE-LR8  4.5 dB

Max difference in transmit power between any two lanes

100GBASE-LR4  5 dB
200GBASE-FR4  3.6 dB
200GBASE-LR4  4 dB
400GBASE-FR8  4 dB
400GBASE-LR8  4.5 dB
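
A minimal sketch of such a check in Python, using the receive-power limits tabulated above (the helper and data-structure names are illustrative):

RX_SPREAD_LIMIT_DB = {
    "100GBASE-LR4": 5.5,
    "200GBASE-FR4": 4.1,
    "200GBASE-LR4": 4.2,
    "400GBASE-FR8": 4.1,
    "400GBASE-LR8": 4.5,
}

def rx_spread_ok(module, lane_rx_dbm):
    # True if the max-min spread of per-lane Rx power is within the limit
    spread = max(lane_rx_dbm) - min(lane_rx_dbm)
    return spread <= RX_SPREAD_LIMIT_DB[module]

# Example: four lane readings from a 100GBASE-LR4 port
print(rx_spread_ok("100GBASE-LR4", [-2.1, -3.0, -2.6, -8.4]))  # False: 6.3 dB spread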

e.g.

Consider the case of using an LR4 CFP as the optical transceiver; each of the 4 wavelengths used on the link carries 5 PCS lanes. If the 5 PCS lanes carried on one wavelength are in error, this may indicate that the errors are specific to that wavelength, so the areas of investigation should include the individual transmitters and receivers within the CFP.

If a 10-lane CFP (SR10 or LR10) is being used, then each wavelength (in the case of the LR10) or fibre (in the case of the SR10) carries two PCS lanes. In this case, if two PCS lanes within the same CAUI lane are found to contain errors or defects, then as well as investigating the CAUI, note that those two PCS lanes are also carried on the same wavelength or fibre, so once again the optical components should be investigated at both ends of the link. In the case of an SR10-based link, the multi-fibre cable should also be checked, as it is possible that one of the individual fibres has been damaged within the cable.
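
For reference, the lane-to-carrier arithmetic behind these two examples, as a tiny hypothetical helper:

def pcs_lanes_per_carrier(total_pcs_lanes, carriers):
    # How many PCS lanes ride on each wavelength/fibre
    return total_pcs_lanes // carriers

print(pcs_lanes_per_carrier(20, 4))   # LR4: 5 PCS lanes per wavelength
print(pcs_lanes_per_carrier(20, 10))  # LR10/SR10: 2 PCS lanes per wavelength/fibre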

 

CAUI: (chip-to-) 100 Gb/s Attachment Unit Interface

CFP: Centum Form-factor Pluggable

Reference: https://testing100g.net/troubleshooting-100g-links-with-pcs-lanes/

 
