Network-as-Code for Optical Automation
Optical configuration moves from a CLI session on a single node into a Git repository, where every wavelength, gain target and FEC setting is reviewed, tested in a pipeline, and reconciled against streaming telemetry. This is what GitOps looks like when the device under management is a ROADM.
1. Introduction
A wavelength turn-up on a closed optical network still looks much as it did a decade ago: an engineer opens a craft session to a ROADM, types a sequence of vendor commands, watches a power reading settle, and saves the running configuration to a file that exists nowhere a colleague can find it. The change is real, the documentation is a memory, and the only rollback is whatever config was captured before someone started typing. That model does not survive contact with a network of a few hundred nodes and several vendors, and it is the gap that Network-as-Code (NaC) closes.
Network-as-Code applies the discipline that software teams built for application delivery to network configuration. The desired state of the network lives in a Git repository as declarative data, changes arrive as pull requests that are reviewed and tested by an automated pipeline, and a delivery system pushes the approved state to devices and then checks that the live network matches. The optical layer is, against intuition, one of the better-prepared domains for this, because optical equipment was dragged into model-driven management early by the hyperscale operators who refused to run proprietary element managers at cloud scale.
This article works through the parts that make NaC real on an optical network: the data-modelling language and protocols that let a pipeline talk to a ROADM, the structure of a continuous-integration and continuous-deployment pipeline built around a candidate datastore, the formulas that quantify what the discipline buys you, and where the approach reaches its limits — because a Git commit cannot change the physics of a fibre span. It also covers the skills shift underneath, since the engineers who can write a YANG payload and reason about chromatic dispersion in the same afternoon are the ones operators are now competing to keep. For the protocol groundwork that sits beneath everything here, the NETCONF and YANG basics guide is the companion reference.
2. Fundamentals: GitOps meets the optical layer
GitOps rests on four ideas, and each one maps cleanly onto an optical network once you accept that a wavelength is just another piece of state. The system is declarative: you describe the configuration you want, not the steps to reach it. That description is versioned and immutable in Git, so the repository is the single source of truth for what the network should be. Changes are pulled and applied automatically by software agents rather than typed by hand. And the running state is continuously reconciled against the declared state, so drift is detected and either flagged or corrected. Industry treatments of Network-as-Code put these GitOps principles at the centre — configurations version-controlled in Git, changes proposed through pull requests, then deployed and synchronised so the network's actual state matches the declared state.
The data model is the contract
None of this works without a way to describe optical configuration that both a pipeline and a device agree on. That contract is YANG, the data-modelling language standardised by the IETF — version 1.0 in RFC 6020 and the current YANG 1.1 in RFC 7950. A YANG model defines, in a tree of typed nodes, what is configurable and observable on a device: that an optical channel has a frequency in MHz, a target output power in dBm, an operational mode, and an administrative state. A pipeline renders intent into data that conforms to that model, and the device validates incoming data against the same model. Get the structure wrong and the device rejects it before anything is applied.
Two model families dominate the optical layer, and they answer different questions. OpenConfig is a set of vendor-neutral models maintained by a working group of network operators; its terminal-device model describes coherent transponders and its optical-amplifier model describes line amplifiers, so automation written against OpenConfig runs across any transponder that implements it without device-specific code. OpenROADM is an operator-led interoperability specification with device, network and service models; its device model describes shelves, circuit-packs and ports, its interface models describe the optical layers an engineer already recognises — OTS, OMS, OCH, OTU, ODU — and its service model drives end-to-end wavelength provisioning. OpenConfig aims for one model across all gear; OpenROADM aims for ROADMs and transponders from different vendors to interoperate on one line system. The fuller comparison of these and the rest of the management stack sits in the network management protocols and APIs introduction.
The protocols that carry it
A model needs a transport, and there are three current ones plus a legacy pair worth naming for contrast. NETCONF (RFC 6241), which runs over SSH on TCP port 830, is the workhorse for configuration because of one feature that matters more than any other for automation: the candidate datastore. Rather than editing the running configuration directly as a CLI does, NETCONF stages changes in a separate candidate datastore, validates them, and commits atomically — which removes the partial-configuration failures that make CLI automation dangerous. RESTCONF (RFC 8040) exposes the same YANG-defined data as HTTP resources with create, read, update and delete semantics, which suits web tooling and simple integrations. gNMI, defined by OpenConfig over gRPC, handles both configuration (Get and Set) and high-rate streaming telemetry (Subscribe), and it is displacing SNMP for monitoring. The datastore concepts these protocols share were later formalised in the Network Management Datastore Architecture (RFC 8342).
The candidate datastore is the single feature that makes optical NaC safe rather than reckless. A pipeline can load a full configuration into candidate, ask the device to validate referential integrity server-side, and only then commit — and with a confirmed commit, the device automatically reverts unless a second confirmation arrives within a timeout. SNMP cannot do this; it can poll a counter but cannot safely commit a transactional configuration. That is why NETCONF and gNMI, not SNMP, sit on the southbound side of every serious optical pipeline.
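The candidate workflow can be made concrete by building the four RPCs it implies. The sketch below uses only the Python standard library and the operation names from RFC 6241 (edit-config, validate, commit with confirmed and confirm-timeout); the `urn:example:optical` payload is a hypothetical placeholder rather than a real device model, and a device has to advertise the candidate, validate and confirmed-commit capabilities before any of this applies.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"  # NETCONF base namespace (RFC 6241)
ET.register_namespace("nc", NC)


def _el(tag: str) -> ET.Element:
    """Create an element in the NETCONF base namespace."""
    return ET.Element(f"{{{NC}}}{tag}")


def rpc(message_id: int, operation: ET.Element) -> str:
    """Wrap one operation in an <rpc> envelope and serialise it."""
    envelope = _el("rpc")
    envelope.set("message-id", str(message_id))
    envelope.append(operation)
    return ET.tostring(envelope, encoding="unicode")


def edit_candidate(config: ET.Element) -> ET.Element:
    """<edit-config> aimed at the candidate datastore, never at running."""
    op = _el("edit-config")
    target = ET.SubElement(op, f"{{{NC}}}target")
    ET.SubElement(target, f"{{{NC}}}candidate")
    ET.SubElement(op, f"{{{NC}}}config").append(config)
    return op


def validate_candidate() -> ET.Element:
    """Ask the server to validate the candidate before anything goes live."""
    op = _el("validate")
    source = ET.SubElement(op, f"{{{NC}}}source")
    ET.SubElement(source, f"{{{NC}}}candidate")
    return op


def commit(confirm_timeout_s: Optional[int] = None) -> ET.Element:
    """Plain <commit>, or a confirmed commit when a timeout is given."""
    op = _el("commit")
    if confirm_timeout_s is not None:
        ET.SubElement(op, f"{{{NC}}}confirmed")
        ET.SubElement(op, f"{{{NC}}}confirm-timeout").text = str(confirm_timeout_s)
    return op


# Hypothetical payload: namespace and leaf are placeholders, not a real model.
channel = ET.Element("{urn:example:optical}channel")
ET.SubElement(channel, "{urn:example:optical}frequency").text = "193100000"

sequence = [
    rpc(1, edit_candidate(channel)),       # stage the change
    rpc(2, validate_candidate()),          # server-side validation
    rpc(3, commit(confirm_timeout_s=300)), # live, but reverts unless confirmed
    rpc(4, commit()),                      # the confirming commit
]
```

In practice a NETCONF client library would send these over the SSH session and check each reply; building the XML by hand here is only meant to show what travels on the wire.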
YANG is the contract, OpenConfig and OpenROADM are the two model families that fill it for optics, and NETCONF, RESTCONF and gNMI carry it. The candidate datastore is what turns "push a config" from a gamble into a transaction.
3. Pipeline architecture
The architecture of an optical NaC system is a loop, not a line. Intent enters at a Git repository, flows through validation and delivery to the devices, and observed state returns through telemetry to be compared against intent — and any divergence pushes the loop around again. The diagram below shows the five stages and where each protocol lives.
Stage 1 — the repository as source of truth
Everything the network should be lives here as declarative data, usually a simplified YAML or JSON layer that renders into model-conformant payloads. A per-site variables file holds the things that differ — frequencies, power targets, node names — while shared modules hold the structure that does not. The repository carries the full change history: who changed a launch power, when, against which ticket, and what the value was before. That audit trail is not a side effect; for many operators it is the reason the project gets funded, because the version-control change log documents which changes were made and by whom for compliance.
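A minimal sketch of that rendering step follows, assuming a per-site variables structure invented for illustration. The leaf names (frequency, target-output-power, operational-mode) follow the general shape of OpenConfig's optical-channel configuration, but the exact paths, module prefixes and encoding should be checked against the model revision the device implements.

```python
import json

# Hypothetical per-site variables, as a YAML or JSON site file might load them.
SITE_VARS = {
    "node": "roadm-site-a",
    "channels": [
        {"name": "och-1", "frequency_mhz": 193_100_000, "power_dbm": -1.5, "mode": 1},
        {"name": "och-2", "frequency_mhz": 193_200_000, "power_dbm": -1.5, "mode": 1},
    ],
}


def render_channel(channel: dict) -> dict:
    """Shared structure: one component entry per optical channel."""
    return {
        "name": channel["name"],
        "optical-channel": {
            "config": {
                "frequency": channel["frequency_mhz"],          # MHz
                "target-output-power": channel["power_dbm"],    # dBm
                "operational-mode": channel["mode"],
            }
        },
    }


def render_site(site_vars: dict) -> dict:
    """Combine the per-site values with the shared structure."""
    components = [render_channel(c) for c in site_vars["channels"]]
    return {"components": {"component": components}}


payload = render_site(SITE_VARS)
print(json.dumps(payload, indent=2))
```

Keeping the site file this small is a deliberate choice: a reviewer reading a pull request sees only the values that changed, not the structure around them.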
Stage 2 — continuous integration as the gate
CI is where a proposed change is checked before it can touch the network, and the gate has layers. The first is schema validation: does the rendered payload conform to the device's YANG model? Tools such as pyang and yanglint answer this without a device in the loop. The second is semantic and policy validation — the rules a schema cannot express. Is the frequency on the configured grid? Is the launch power inside the span's budget? Is the change inside an approved maintenance window? The third, where infrastructure allows, is a dry run against a staging twin or a device's candidate datastore in test-only mode, which surfaces device-level rejections before merge. Continuous integration here checks data representation correctness, schema validation, semantic validation and policy compliance, and may test against pre-production environments.
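The second layer, the rules a schema cannot express, is ordinary code. The sketch below shows two such policy checks and a gate that collects their failures; the power range, the maintenance window and the `Change` structure are all hypothetical values chosen for illustration, not figures from any real span design.

```python
from dataclasses import dataclass
from datetime import datetime, time

# Hypothetical policy values; a real pipeline would read them per span and per site.
LAUNCH_POWER_RANGE_DBM = (-5.0, 2.0)
MAINTENANCE_WINDOW = (time(1, 0), time(5, 0))


@dataclass(frozen=True)
class Change:
    channel: str
    launch_power_dbm: float
    apply_at: datetime


def check_power(change: Change) -> list:
    """Fail if the launch power falls outside the span's allowed range."""
    low, high = LAUNCH_POWER_RANGE_DBM
    if low <= change.launch_power_dbm <= high:
        return []
    return [f"{change.channel}: launch power {change.launch_power_dbm} dBm "
            f"outside {low}..{high} dBm"]


def check_window(change: Change) -> list:
    """Fail if the change is scheduled outside the maintenance window."""
    start, end = MAINTENANCE_WINDOW
    if start <= change.apply_at.time() < end:
        return []
    return [f"{change.channel}: {change.apply_at:%H:%M} is outside the "
            f"{start:%H:%M}-{end:%H:%M} window"]


CHECKS = (check_power, check_window)


def run_gate(change: Change) -> list:
    """Run every policy check; an empty list means the gate passes."""
    return [message for check in CHECKS for message in check(change)]


ok = Change("och-1", -1.5, datetime(2026, 1, 10, 2, 30))
bad = Change("och-1", 4.0, datetime(2026, 1, 10, 14, 0))
```

Returning every failure instead of stopping at the first keeps the merge-request feedback complete in one pipeline run.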
An engineer edits a site file to add a 400G wavelength and fat-fingers the frequency to 193,105,000 MHz (193.105 THz), which is 5 GHz off the 100 GHz grid. The schema check passes, because 193,105,000 is a valid frequency value. The semantic policy check does not: check_grid.py computes that the value is not the ITU-T G.694.1 anchor of 193.1 THz plus a whole multiple of the configured grid spacing, and fails the merge request with a diff that says exactly which line is wrong and what the nearest valid frequency is. The change never reaches a device. On a CLI, that same typo would have been applied, and the first sign of trouble would have been a channel that would not come up.
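The grid rule itself is short enough to sketch. The version below is a hypothetical stand-in for check_grid.py, not the script a production pipeline would run; it assumes frequencies are held as integers in MHz and takes the 193.1 THz anchor from ITU-T G.694.1, so a 100 GHz grid is a spacing of 100,000 MHz.

```python
from typing import Optional

ANCHOR_MHZ = 193_100_000  # 193.1 THz, the ITU-T G.694.1 anchor frequency


def nearest_grid_mhz(freq_mhz: int, spacing_mhz: int) -> int:
    """Closest frequency of the form anchor + n * spacing."""
    steps = round((freq_mhz - ANCHOR_MHZ) / spacing_mhz)
    return ANCHOR_MHZ + steps * spacing_mhz


def check_grid(freq_mhz: int, spacing_mhz: int = 100_000) -> Optional[str]:
    """Return None when the frequency is on the grid, else an error message."""
    if (freq_mhz - ANCHOR_MHZ) % spacing_mhz == 0:
        return None
    nearest = nearest_grid_mhz(freq_mhz, spacing_mhz)
    return (f"{freq_mhz} MHz is not on the {spacing_mhz} MHz grid; "
            f"nearest valid frequency is {nearest} MHz")


# The off-grid value from the example: 193.105 THz on a 100 GHz grid.
error = check_grid(193_105_000)
```

Working in integer MHz avoids the floating-point residue that a comparison in THz would introduce.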
The YANG intent the pipeline renders and the CI stages that test it are shown below — the first is the per-port configuration, the second is the pipeline definition that gates it.
Stage 3 — continuous deployment as the reconciler
Once a change merges, a delivery system makes the network match. In a pull-style GitOps model — the one cloud-native tools such as ArgoCD and Flux popularised — an agent watches the repository, compares desired state to running state, and applies only the delta. On optical gear the same logic runs through an orchestrator or domain controller that speaks NETCONF or gNMI southbound. The reconciler does not blindly replay the whole configuration; it computes what differs and changes only that, which is both faster and safer than a full re-push. Crucially, it uses the device's own transaction features — confirmed commit and rollback — so a delivery that goes wrong does not strand the node.
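Computing the delta is the core of that step. A minimal sketch, assuming desired and running state are both available as nested dictionaries (the paths and values here are made up for illustration):

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Turn a nested dict into {"/path/to/leaf": value}."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def compute_delta(desired: dict, running: dict) -> dict:
    """Leaves to set (missing or different) and leaves to delete (undeclared)."""
    want, have = flatten(desired), flatten(running)
    to_set = {p: v for p, v in want.items() if p not in have or have[p] != v}
    to_delete = sorted(p for p in have if p not in want)
    return {"set": to_set, "delete": to_delete}


# Hypothetical amplifier state: gain has drifted and one leaf is undeclared.
desired = {"amp-1": {"target-gain": 18.0, "tilt": -0.5}}
running = {"amp-1": {"target-gain": 19.5, "tilt": -0.5, "legacy-flag": 1}}

delta = compute_delta(desired, running)
```

Only `/amp-1/target-gain` ends up in the set list, so the reconciler touches one leaf and leaves the unchanged tilt alone. Whether undeclared leaves are deleted or merely reported is a policy decision this sketch leaves open.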
Stage 4 — the devices as servers
Each optical network element is a NETCONF or gNMI server running its own datastore: a coherent transponder exposing frequency, modulation, FEC and power; a ROADM exposing wavelength cross-connects; an amplifier exposing target gain and tilt; an open line system exposing span and OMS power profiles. The reconciler is a client to all of them. Because the models are vendor-neutral where OpenConfig or OpenROADM is implemented, one reconciler can drive a multi-vendor line system — which is the entire point of the open-line-system model covered in multi-vendor coherent wavelengths over open line systems.
Stage 5 — telemetry closes the loop
Configuration without verification is hope. The fifth stage is streaming telemetry over gNMI Subscribe, pushing OSNR, pre- and post-FEC bit error rate, optical power, amplifier gain and temperature at a cadence from sub-second to seconds. OpenConfig made streaming telemetry an explicit priority and aimed to replace SNMP's data-pull model with a push-based framework built on YANG and gNMI. The observed state is then compared against the desired state in Git. If a wavelength that should be up is reporting a pre-FEC BER above threshold, that is drift, and drift raises an alarm or triggers reconciliation. This is also where NaC connects to the wider optical automation story: the telemetry stream that verifies a deployment is the same stream that feeds analytics and closed-loop control.
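The comparison of observed against desired state can be sketched as a small function over one telemetry sample. The thresholds and field names below are assumptions for illustration; a real reconciler would take its limits from the declared state in Git and its samples from the gNMI stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredChannel:
    name: str
    enabled: bool
    max_pre_fec_ber: float  # hypothetical threshold
    min_osnr_db: float      # hypothetical threshold


@dataclass(frozen=True)
class TelemetrySample:
    pre_fec_ber: float
    osnr_db: float


def drift_reasons(desired: DesiredChannel, sample: TelemetrySample) -> list:
    """List every way the observed sample departs from the declared state."""
    if not desired.enabled:
        return []  # a channel that should be down is not judged on signal quality
    reasons = []
    if sample.pre_fec_ber > desired.max_pre_fec_ber:
        reasons.append(f"{desired.name}: pre-FEC BER {sample.pre_fec_ber:.1e} "
                       f"above {desired.max_pre_fec_ber:.1e}")
    if sample.osnr_db < desired.min_osnr_db:
        reasons.append(f"{desired.name}: OSNR {sample.osnr_db} dB "
                       f"below {desired.min_osnr_db} dB")
    return reasons


channel = DesiredChannel("och-1", enabled=True, max_pre_fec_ber=1e-3, min_osnr_db=18.0)
healthy = TelemetrySample(pre_fec_ber=2e-4, osnr_db=22.5)
degraded = TelemetrySample(pre_fec_ber=5e-3, osnr_db=16.0)
```

An empty list means no drift; a non-empty one is what raises the alarm or triggers reconciliation.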
Where a pipeline meets a hierarchical controller
A pipeline rarely talks to every ROADM directly. On a multi-domain or multi-layer network it targets a control hierarchy, and the reference model for that hierarchy is the IETF's Abstraction and Control of TE Networks framework (RFC 8453). ACTN defines three tiers: a Customer Network Controller (CNC) that expresses what a service needs, a Multi-Domain Service Coordinator (MDSC) that maps that intent across domains and abstracts the topology, and one or more Provisioning Network Controllers (PNCs) that drive the actual devices in each domain. The two interfaces that matter are the CNC-MDSC Interface (CMI) and the MDSC-PNC Interface (MPI); the southbound interface down to devices is deliberately out of ACTN's scope, which is exactly why the framework can sit above NETCONF, gNMI or T-API without caring which one a PNC speaks.
For packet-optical integration the hierarchy splits cleanly: a packet PNC controls the routers hosting the ZR pluggables, an optical PNC controls the line system, and an MDSC coordinates the two — the structure behind the two-controller model operators run today. A NaC pipeline can enter at either end. It can declare service intent at the top, where the MDSC decomposes it into per-domain work, or it can render device intent at the bottom for a PNC to apply. That choice is an abstraction trade: top-of-hierarchy intent is portable across vendors but hides the device detail an optical engineer often needs; bottom-of-hierarchy intent is precise but couples the pipeline to the model each device exposes. Either way, whether a wavelength will actually close at the required margin still has to be computed before commit — the path-feasibility work covered in in-house multi-vendor optical link planning.
4. The transactional change path
The single most important behaviour to understand is what happens between "merge" and "this wavelength is permanently up." It is a transaction with a safety net, and it runs through the candidate datastore rather than the running configuration. The flow below traces it, including the paths a change takes when something fails.
The confirmed commit is the mechanism that makes unattended optical deployment defensible. The reconciler issues commit with a confirm timeout — say 300 seconds. The configuration goes live immediately, but the device starts a timer; unless a second, plain commit arrives before the timer expires, the device reverts to the prior configuration on its own. The reconciler uses the window to read telemetry: if OSNR, BER and power are in spec, it confirms; if telemetry shows a fault, it does nothing and lets the timer revert the change. The terminal session below shows the sequence on a single ROADM.
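The reconciler's side of that window reduces to one decision: send the confirming commit, or stay silent and let the timer revert. The sketch below is one possible policy, not a prescribed one. It requires a run of consecutive in-spec samples so that readings taken while power is still settling do not confirm or condemn the change on their own; the limits and the `required_good` count are hypothetical.

```python
from typing import Iterable, Mapping

# Hypothetical acceptance limits as (low, high) per telemetry leaf.
LIMITS = {
    "osnr_db": (18.0, float("inf")),
    "pre_fec_ber": (0.0, 1e-3),
    "rx_power_dbm": (-18.0, 0.0),
}


def in_spec(sample: Mapping[str, float]) -> bool:
    """True when every monitored leaf sits inside its limits."""
    return all(low <= sample[name] <= high for name, (low, high) in LIMITS.items())


def decide(samples: Iterable[Mapping[str, float]], required_good: int = 3) -> str:
    """Return "confirm" after enough consecutive good samples, else "let-revert".

    `samples` stands for the readings collected before the confirm timeout;
    running out of samples means the window closed without enough evidence.
    """
    streak = 0
    for sample in samples:
        streak = streak + 1 if in_spec(sample) else 0
        if streak >= required_good:
            return "confirm"
    return "let-revert"  # do nothing; the device timer restores the old config


good = {"osnr_db": 22.0, "pre_fec_ber": 1e-4, "rx_power_dbm": -8.0}
bad = {"osnr_db": 12.0, "pre_fec_ber": 1e-4, "rx_power_dbm": -8.0}
```

Here `decide([bad, good, good, good])` confirms once the channel has settled, while `decide([good, good, bad, good])` does not, because the streak is broken before it reaches three.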
A confirmed commit protects against a configuration that fails fast — a wavelength that will not lock, a power reading wildly out of range. It does not protect against a degradation that develops slowly. A launch power 1 dB too high may pass every check inside a 300-second window and only push a downstream span into nonlinear penalty hours later, as the channel count fills. Telemetry-driven reconciliation catches that eventually, but the rollback window is the wrong tool for slow-onset physical impairments. Those need the longer-horizon analytics that watch parameter drift over days, not the commit timer.
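A longer-horizon check can be as simple as a trend line over daily readings. The sketch below fits a least-squares slope to one value per day and flags a channel whose metric is moving faster than a chosen rate; the 0.05 dB/day threshold and the sample data are assumptions for illustration, and a production analytics system would do considerably more than this.

```python
def slope_per_day(readings: list) -> float:
    """Least-squares slope of evenly spaced daily readings (units per day)."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two daily readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_drifting(readings: list, max_abs_slope: float = 0.05) -> bool:
    """True when the metric changes faster than the allowed rate per day."""
    return abs(slope_per_day(readings)) > max_abs_slope


# Hypothetical daily mean OSNR in dB: every value is individually in spec,
# but the channel is losing 0.2 dB per day.
osnr_by_day = [21.0, 20.8, 20.6, 20.4, 20.2]
stable_by_day = [21.0, 21.0, 21.0, 21.0]
```

Every reading in `osnr_by_day` would pass a 300-second commit check; only the slope shows the problem.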
5. Design, performance and trade-offs
The case for NaC is usually made in operational terms — fewer errors, faster changes, an audit trail — but those terms have numbers behind them, and the numbers favour the approach for reasons that are mechanical, not promotional.
What the discipline removes
Manual optical provisioning carries a per-device time cost and a per-change error risk that compound across a network. Industry treatments of optical automation put manual per-device configuration in the order of tens of minutes with a high error rate, against a pipeline that applies a validated change in a fraction of that time with errors caught before deployment. The mechanism is simple: a human typing into a CLI has no schema check, no policy gate, and no transaction, so every keystroke is an opportunity for an error that only surfaces after it is applied. The pipeline moves every one of those checks to before the change touches a device.
Streaming telemetry versus polling
The monitoring half of the loop has measurable advantages of its own. A 2026 study in the Journal of Optical Communications and Networking benchmarked OpenConfig- and OpenROADM-driven control of a multi-vendor IP-over-DWDM and transponder network and compared gNMI streaming telemetry against NETCONF. Under controlled benchmark conditions, gNMI-based telemetry reduced controller CPU load by 52% and improved throughput efficiency by 38% relative to NETCONF, while also lowering telemetry latency. The same work measured average end-to-end service creation and deletion at roughly 198 and 76 seconds respectively. These are single-study, controlled-condition figures, not a universal guarantee — but the direction matches the mechanism, because a subscription that pushes only changed values avoids the repeated full-object reads that polling forces.
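The mechanism behind that direction can be shown with a toy count of values transferred. This is an illustration of the argument only, with made-up numbers; it does not reproduce the study's benchmark or its figures.

```python
def polled_values(leaf_count: int, poll_cycles: int) -> int:
    """Polling re-reads every leaf on every cycle, changed or not."""
    return leaf_count * poll_cycles


def on_change_values(leaf_count: int, changes_per_cycle: list) -> int:
    """An on-change subscription sends one initial sync, then only the changes."""
    return leaf_count + sum(changes_per_cycle)


# Hypothetical node: 200 monitored leaves, 60 cycles, 3 leaves changing per cycle.
leaves = 200
changes = [3] * 60

polled = polled_values(leaves, len(changes))
pushed = on_change_values(leaves, changes)
```

With these assumed numbers polling moves 12,000 values against 380 for the subscription. Sampled subscriptions of fast-moving leaves such as optical power narrow that gap, which is one reason real-world gains depend heavily on the mix of leaves being monitored.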
The trade-offs you accept
NaC is not free of cost. The first is up-front engineering: in virtual-machine or on-premises environments, the pull-style reconciliation that cloud-native GitOps assumes is harder to achieve, and teams often start with a push-style pipeline that needs custom scripts before it earns its keep. The second is model coverage: where a device exposes a feature only through a vendor-native model and not through OpenConfig or OpenROADM, the pipeline either drops to vendor-specific code for that feature or does without it — and pre-standard hardware behaviour is exactly where this gap appears. The third is the discipline tax: a team that keeps making out-of-band CLI changes alongside the pipeline destroys the single-source-of-truth guarantee, and the reconciler will fight those changes as drift. The payoff is real, but it is conditional on the organisation actually committing to the model.
The pipeline's value is mechanical: it moves error detection before deployment and replaces polling with subscription. The costs are organisational: up-front engineering, gaps where models do not cover a feature, and the requirement that nobody bypasses the pipeline with a craft session.
6. Practical deployment
Most operators reach NaC in phases rather than in one cutover, and the phasing is dictated by what can be made safe first. A workable order is: put existing configurations into Git read-only to establish the source of truth and the audit trail; add CI validation so changes are checked even while they are still applied by hand; introduce push-style deployment for low-risk change classes such as telemetry subscriptions; then extend to wavelength provisioning once the rollback path is trusted. Each phase delivers value on its own, which matters when the full pull-style loop is a multi-quarter effort.
Bootstrap and the device's own control loops
Two things happen below the pipeline that it does not do itself, and an engineer who blurs them debugs in the wrong place. The first is bootstrap. Before a device can accept declarative intent it has to be on the network with a base configuration and a management address, and zero-touch provisioning (ZTP) handles that: a device powers on, reaches a provisioning server, and pulls a templated base configuration with nobody on a console. ZTP gets the device to the starting line; the pipeline owns every change after it.
The second is the embedded control loop. A current disaggregated open line system does not simply accept a target gain and a channel list and hope — it runs local control loops that perform automated end-to-end turn-up, hold launch power against an embedded amplified-spontaneous-emission or channel reference, and run automated connection verification to check patch loss at each site before declaring a span healthy. The pipeline declares the desired optical state; the device's own loops drive the physical layer to that state and report back through telemetry. That division of labour is the right one — a CI job on a build server has no business closing a power-control loop that needs millisecond reaction at the node. When a wavelength will not come up, the diagnostic question is which layer owns the failure: a rejected payload is the pipeline's, while a patch-loss alarm or a turn-up that never converges is the device's.
Vendor landscape
The open interfaces that make NaC possible are now table stakes across the optical vendor set. Ribbon's Apollo optical transport family exposes NETCONF/YANG interfaces with OpenROADM API interworking so a customer-selected third-party optical domain controller can drive it, and its Muse SDN domain orchestration provides the automation, planning and real-time control layer above the line system; recent Apollo and Muse deployments at regional and national operators show the model in production. Ciena, Cisco (with Acacia coherent optics), Nokia and Adtran all ship optical platforms with model-driven management, and Adtran demonstrated multi-vendor ZR+ interoperability at OFC 2026. On the device-software side, IP Infusion's OcNOS provides native gNMI with dial-out subscriptions, sub-second granularity and OpenConfig YANG support for IP-over-DWDM. The common thread is that the southbound interface is increasingly OpenConfig or OpenROADM over NETCONF or gNMI, regardless of whose hardware sits underneath — which is precisely what lets one pipeline manage a mixed estate. The economics that drive this convergence sit in the IP-over-DWDM basics and the fuller IP-over-DWDM architecture walkthrough.
Troubleshooting quick reference
| Symptom | Likely cause | First check |
|---|---|---|
| Merge request fails schema gate | Payload does not conform to device YANG model | Run pyang/yanglint locally against the device model revision in use |
| Change passes CI but device rejects commit | Semantic constraint the offline schema check missed | Re-run the dry run into the candidate datastore against the real model |
| Reconciler reports persistent drift | Out-of-band CLI change on the device | Diff running config against Git desired state; remove the manual change |
| Confirmed commit keeps auto-reverting | Telemetry never reaches in-spec before timeout | Read OSNR/BER directly; the change may be physically infeasible, not a pipeline fault |
| Telemetry stream gaps | gNMI subscription dropped or sample interval too aggressive | Check dial-out session state and sample-interval against device limits |
A reconciler flags drift on an amplifier: declared gain 18 dB, running gain 19.5 dB. The instinct is to reconcile back to 18. The right move is to read the change log first — and the log shows a night-shift engineer raised the gain by hand to recover a degraded downstream span during an emergency. Reconciling to 18 would have re-broken the span. The fix is to bring the emergency change into Git as the new desired state with its justification, not to let the pipeline silently overwrite a deliberate human action. NaC does not remove judgement; it forces every change, including the emergency one, to end up in the source of truth.
7. Future outlook
Three trajectories are worth watching. The first is consolidation onto gNMI and OpenConfig for the monitoring half of the loop; the optical layer is already the most telemetry-standardised domain, and streaming is steadily displacing the polling model. The second is the tightening of the loop into intent-based, closed-loop control, where the gap between observed drift and automatic correction shrinks from a human-in-the-loop alarm to a policy-bounded automatic reconciliation. The third is the convergence of skills: the engineer who can author a YANG payload, reason about a CI pipeline, and still predict whether a launch power will push a span nonlinear is a different professional from the optical specialist or the software engineer alone, and that combined profile is what operators are now short of.
A fourth trajectory is moving fastest in 2026: agent-driven operations. Where closed-loop reconciliation runs fixed logic (observe drift, apply a known correction), an agentic layer puts a reasoning model between the telemetry and the action, able to diagnose a fault it was not explicitly programmed for and propose a remediation. The plumbing is standardising on open protocols: the Model Context Protocol (MCP), which originated at Anthropic, for connecting an agent to tools and data, and the Agent-to-Agent protocol (A2A) for coordination between agents, both now hosted under the Linux Foundation, MCP through its Agentic AI Foundation. Early enterprise integrations already chain agents that detect an issue, map its service impact, deploy a network function and validate the fix. The mechanism is powerful and the boundary is the one that has bounded every layer above it: an agent that commits a launch-power change still cannot out-argue fibre nonlinearity, and handing commit authority to an autonomous agent raises a trust problem the industry is answering with cryptographically signed agent identities and hard kill switches. More autonomy raises the stakes on the validation gates and the rollback path described earlier; it does not remove the need for them.
For an engineer planning where to invest, the durable skills are the data-modelling languages and protocols here — YANG, NETCONF, gNMI — paired with version control and CI/CD fluency, on top of the optical-layer physics that no abstraction removes. The standards to track are the IETF NETCONF working group's revisions and the OpenConfig and OpenROADM model releases, because those define what a pipeline can express.
8. Reference
Protocol and model quick reference
| Interface | Standard / body | Transport | Primary role |
|---|---|---|---|
| NETCONF | RFC 6241, IETF | SSH, TCP 830 | Transactional configuration with candidate datastore |
| RESTCONF | RFC 8040, IETF | HTTPS | CRUD over YANG-defined resources |
| gNMI | OpenConfig, over gRPC | gRPC over HTTP/2 | Configuration and streaming telemetry |
| YANG 1.1 | RFC 7950, IETF | n/a (modelling language) | Defines configurable and observable structure |
| OpenConfig | OpenConfig working group | NETCONF / RESTCONF / gNMI | Vendor-neutral models (terminal-device, optical-amplifier) |
| OpenROADM | OpenROADM MSA | NETCONF, TCP 830 | Multi-vendor device, network and service models |
| T-API | ONF Transport-API | NETCONF / RESTCONF | Multi-vendor transport-domain abstraction |
| ACTN | RFC 8453, IETF | CMI / MPI (above the SBI) | Hierarchical multi-domain control (CNC / MDSC / PNC) |
Essential relations
Standards and references
- IETF, RFC 7950 — The YANG 1.1 Data Modeling Language, Internet Engineering Task Force.
- IETF, RFC 6241 — Network Configuration Protocol (NETCONF), Internet Engineering Task Force.
- IETF, RFC 8040 — RESTCONF Protocol, Internet Engineering Task Force.
- IETF, RFC 8342 — Network Management Datastore Architecture (NMDA), Internet Engineering Task Force.
- OpenConfig, Vendor-neutral data models and gNMI/gNOI specifications, OpenConfig working group.
- OpenROADM Multi-Source Agreement, OpenROADM Device and Network Models, OpenROADM MSA.
- IETF, RFC 8453 — Framework for Abstraction and Control of TE Networks (ACTN), Internet Engineering Task Force.
- Kamalzadeh et al., Experimenting with OpenROADM and OpenConfig to control multi-vendor IPoDWDM and Xponder-based transport networks, Journal of Optical Communications and Networking.
Sanjay Yadav, "Optical Network Communications: An Engineer's Perspective" — Bridge the Gap Between Theory and Practice in Optical Networking.
Glossary
| Term | Meaning |
|---|---|
| Network-as-Code (NaC) | Managing network configuration as version-controlled declarative code with automated pipelines |
| GitOps | Operational model using a Git repository as the single source of truth, with automated reconciliation |
| YANG | Yet Another Next Generation — the IETF data-modelling language for network configuration and state |
| NETCONF | Network Configuration Protocol — transactional config over SSH with a candidate datastore |
| RESTCONF | HTTP-based protocol exposing YANG data as REST resources |
| gNMI | gRPC Network Management Interface — configuration and streaming telemetry over gRPC |
| Candidate datastore | A staging configuration store edited and validated before an atomic commit to running |
| Confirmed commit | A commit that auto-reverts unless a second confirmation arrives within a timeout |
| Reconciliation | Comparing observed running state to declared desired state and resolving any difference |
| Drift | Divergence between the network's running state and the desired state declared in Git |
| OpenConfig | Operator-led vendor-neutral YANG models and gNMI-based management framework |
| OpenROADM | Operator-led interoperability specification with device, network and service YANG models |
| T-API | ONF Transport-API — northbound abstraction for a transport domain controller |
| ACTN | Abstraction and Control of TE Networks — IETF three-tier control hierarchy (CNC, MDSC, PNC) |
| MDSC | Multi-Domain Service Coordinator — the ACTN tier that maps service intent across domains |
| PNC | Provisioning Network Controller — the ACTN tier that drives devices in one domain |
| ZTP | Zero-Touch Provisioning — a device fetches a templated base configuration on first boot, no console session |
| MCP | Model Context Protocol — open protocol (originated at Anthropic) connecting AI agents to tools and data |
| Agentic operations | Network operations driven by reasoning AI agents that diagnose and remediate, coordinated over open agent protocols |
Optical Communications & Network Automation Expert | Author of 3 Books for Optical Engineers | Founder, MapYourTech
Optical networking engineer with nearly two decades of experience across DWDM, OTN, coherent optics, submarine systems, and cloud infrastructure. Founder of MapYourTech.