AI in Networking

Written by Eric Sommers | 12.01.2025

What an explosion we’ve seen with Artificial Intelligence (AI)! My initial scope for this article was to explain how AI works for network management without getting too deep into the details, but while researching, I found myself frequently looking up terms and acronyms, only to fall down another rabbit hole of more terms and acronyms. AI is, naturally, a highly complex topic that can mean different things to different people. It’s the most complex reading I’ve done in a while, and I suspect other people share the sentiment. In addition to the sheer number of things referenced as AI, the OEMs are introducing new AI offerings in an equally confusing and, in many cases, cannibalistic manner. If you’re a network manager, OEMs are coming to you with monitoring and automation. If you’re a CIO or CTO, every single organizational unit within your company is coming to you about different AI solutions. Customer service automations, sales and marketing personalization, sales coaching, deep analysis of unstructured data, content creation, chatbots, supply chain management, document processing, financial operations, and the list continues to grow. Yet while vendors are clamoring for your AI dollars, there is significant uncertainty in the space. We currently have 7 major companies pumping trillions of dollars into each other while all simultaneously competing against one another. OpenAI announced a loss of $11.5 billion in its last quarter of 2025, which also resulted in a $3.1 billion write-off for Microsoft as a 27% owner of OpenAI. Additionally, MIT in The State of AI in Business 2025 found that 95% of generative AI (GenAI) pilots were failing. 95%!?!? That’s hundreds of billions invested in enterprise GenAI pilots that are yielding no results.

So, why the push now? Because some companies are realizing massive returns and gaining a competitive advantage. Image recognition applications are exploding, helping do everything from identifying cancer to detecting unhealthy leaves in crops using drones to mapping the most efficient and cost-effective conduit layout for any-sized building, and the list goes on and on. Complex problems that may have once taken decades of experience to understand can not be solved in a split second, by anyone. The secret to a successful AI implementation is to have a clear goal of what you want to achieve. What problem is being addressed? What defines success? Is the goal cost reduction, improved efficiency, or increased revenue? Do the potential results justify the investment in creating? In many areas of your business, the value proposition will be clear. AI has tremendous upside. As I continue to learn about it, the possibilities seem to be restricted only by our understanding (or acceptance) of its capabilities.

What about your network? YES, AI, or more specifically, Machine Learning (ML) and Large Language Models (LLM) will be important technology in your network. But, “network” can also mean a lot of different things to different people. Are you looking for AI help to improve your security posture or increase performance in highly complex multi-vendor data center environments? Do you have $150M in Opex tied to network engineering, or $200K? What about your access layer? What real problem is being addressed?

If you look at the cloud management/monitoring tools launched by Cisco, none of them have proven to solve problems in your access layer. Cisco has launched 5 different subscription-based software packages (Cisco ONE, Smart Licensing, Smart License Using Policy, DNA, and now Catalyst Center), yet none have proven to justify the added cost in the access layer.

As I continue to dig into Cisco’s AI offerings, it's clear there is significant overlap and too many options to have confidence in just one or even a few. Cisco AI Networking, Cisco Intelligent Packet Flow, Cisco IQ, AI Assistant, AI Canvas, VAST, Hyperfabric for AI, NVIDIA Spectrum-X, Agentic AI, Deep Network Model, AI agents spawning AI agents, AI-augmented access control, Umbrella, Duo, Splunk, the ServiceNow collab, Foundation AI, Instant Attack Verification, XDR Forensics, Cisco Secure AI Factory, Outshift, AI PODs, Webex AI Agent, AI Solutions for Contact Center, FlexPod, FlashStack, ThousandEyes, Catalyst Center, Open-source AI, and the list continues to grow, and at the bottom of each article….

“Many of the products and features mentioned are still in development and will be made available as they are finalized, subject to ongoing evolution in development and innovation. The timeline for their release is subject to change.”

“Products and features described in this release that are not currently available remain in varying stages of development and will be offered on a when-and-if-available basis. The delivery timeline of any future products and features is subject to change at the discretion of Cisco and its partners.”

I’ll try to define what each of the above solutions is in another article, but it's already overwhelming, and new products are introduced almost daily.

If you don’t want to be the first group on the AI bus, Edgeium is here to help you support your access layer network hardware with perpetual licensing and no recurring software subscriptions.

What is AI?

To understand what AI solutions are capable of, let's outline what AI is. For the most part, AI for your network is machine learning (ML), telemetry, and a chat window. Like ChatGPT, the primary interface for AI in networking is also a chat window. That’s what language models do. They apply statistical values (weights and biases) to determine what word (or words) are “most likely” to come next. Add telemetry data, and the model can now answer questions about your specific network. AI in networking will continue to learn, self-optimize, and possibly predict and even rectify service degradations or interruptions before they occur as most outages are preceded by a series of corresponding events, which of course can be modeled.

I’m going to unpack in this article what AI is in a similar path my research took. So, let's start with these:

Machine Learning (ML) - The science of developing algorithms and statistical models to correlate data. (example: Self-driving cars can “see” their surroundings and differentiate between objects like cars and people.)

Telemetry – The automated collection, transmission, and processing of data collected from network systems. (example: monitoring software)

Neural Networks – A computational model inspired by how our brain is interconnected. Neural networks consist of interconnected processing units called nodes that are organized into three layers: Input Layer, Hidden Layers, and an Output Layer. These networks consist of multiple layers of interconnected nodes or "neurons" that work together to process complex information. Each node can also be referred to as a parameter or weight. Each layer in the network performs a specific mathematical operation on the input data, and the output of one layer serves as the input for the next layer. (example: your brain!)

Natural Language Processing - A type of AI that allows computers to interpret, understand, comprehend, and generate human language. (example: voice-activated assistants like Amazon’s Alexa, Apple’s Siri, etc.

Language Model(LM) - A type of machine learning that uses Natural Language Processing to receive text as input and outputs natural language in return. The base job of a language model is to calculate the probability of a specific word given specific input. “Paris is a city in ____." ~with “France” being the most probable answer.

3Blue1Brown is a phenomenal resource for understanding complex mathematical functions. The following example is from Large Language Models explained briefly.

LMs are sophisticated mathematical functions that predict the next word in a sequence of text. It does this by assigning probabilities to all possible next words, training itself on historical data and context usage.

Take the phrase "Paris is a city in ________." Our brains, after years of training, immediately think of France, although the answer could be Texas as well:-) Statistically speaking, though, "France" is the most probable answer.

The model then repeats itself over and over again until complete.

Parameters or Weights - A parameter is a number inside a model that can be adjusted to make it more or less accurate. Internal weights and values that an LM learns during training. The parameters capture patterns in language, such as grammar, context, meaning, and the relationship(s) between words.

Think of parameters as dials that can be tuned to make the model more accurate and efficient. Parameters are the source of how and why an LM behaves as it does. Changing the parameters changes the probability the model assigns to predicting the next word. No human assigns these values. The values are initially completely random and are refined during the model's training.

Training - The process of teaching an LM to understand and generate human text through two main phases: pre-training and fine-tuning.

Pre-Training - The model begins to produce weights by feeding random text into it and comparing it against massive datasets of historical data, mainly from the internet.

Back-Propagation - An algorithm that works backwards tweaking the values again to make the predictions even more probable.

Machine Reasoning – Uses acquired “knowledge” to work through numerous possible outcomes to determine the most optimal.

Large Language Model – Neural networks where “large” refers to the number of parameters an AI model has. It's estimated that OpenAI’s GPT-4 has upwards of 170 TRILLION parameters!!

By completing training for trillions upon trillions of examples, the model not only becomes more accurate at predicting on the training data, but also begins to make predictions on text it's never seen before.

When you start to conceptualize using 170 trillion mini programs to make decisions at machine speed, it’s clear we’re in the middle of an information technology revolution. The science fiction we grew up on now is trending towards becoming non-fiction! "Go to destination" is all George Jetson needed to tell his flying transportation. No licensing or flight training required! Nearly every discipline that today requires significant specialization, education, and training will be replaced by LLMs.

The largest LLM is estimated to have up to 170 trillion parameters which are interconnected with the complexity, and simplicity, of the human brain.

For a human to read the amount of text that was used to train GPT-3, it would take them 2,600 years reading non-stop 24-7. And the scale of computation in training LLMs is extraordinary. If you could perform 1 billion additions and multiplications every second, it would take more than 100 million years to do all of the operations involved in training the largest LLM. This technology is revolutionary.

And with some additional programming, Transfer Learning and Deep Learning emerged.

Transfer Learning - a machine learning technique where a model pre-trained on one task can be reused and adapted for a second, related task. If you built an app that could tell you if a picture of an animal is a dog, that same model can be used to identify cats, rabbits, etc, by simply changing some of the parameters.

Deep Learning - a more advanced ML composed of millions of software components that perform micro-mathematical operations on small data units to solve a large problem. Deep Learning models are designed to recognize patterns, classify data, and make predictions by learning from the data rather than explicit programming. (example: image classification by processing the individual pixels that make the image. This can also be live video images like a facial recognition system.)

AI In Networking

So, how does predicting human text apply to networking? Cisco claims to have deployed more than 50 million networks in the past 20 years. Deployment guides, datasheets, community forums, Cisco U, etc. will all be used to train Cisco’s Deep Learning Network LLM. I have no doubt that ML and LLMs will transform networking, however I do question the timeline. According to Cisco, the timeline is NOW. Cisco actually has marketing collateral that is inferring loss if you’re not an early adopter, or Pacesetter, of these solutions. Again, AI in networking isn’t a single product, and the network isn't a singular object. If you’re not in the 13% of organizations Cisco identifies as "Pacesetters," I recommend giving these solutions a few years to formalize. Is your production environment the best place to iron out the wrinkles? There are currently more than 30 different AI offerings from Cisco? Most of them have significant overlap. None of them is cost-justified for your access-layer network hardware. At least not today.

What Next?

Well, not entirely sure. LLMs only know what they have been trained on and what they can derive from that particular dataset. And yes, “hallucinations” are legitimate GenAI issues that describe when an LLM provides an inaccurate answer. It can and will be wrong if its training data is inaccurate, as is some of the data from Reddit, community forums, etc that most current LLMs were trained on. Training LLMs on objective factual data will minimize, if not eliminate, these hallucinations. If the training data is built upon objective rules, principles, laws, and guidelines, the concern of hallucinations goes away.

Cisco is currently presenting AgenticOps as a transformative approach to IT operations. You can even allegedly purchase “plug-and-play” AI solutions. The issue is that networks aren’t the same. Acquisitions and divestitures, available telco providers, an unusual scanning solution that required a unique tweak, etc. We’re moving from a GenAI world to Agentic AI, but “agent” and “AI” are used casually and are sometimes misleading. OEMs are labeling anything they can as “agents” whether it is one or not. The “agency” refers to the agent's ability to reason, adapt, and learn on its own. An agent is software that can interact with LLMs, and even other agents, potentially enlisting other agents, it can choose which tools it needs, all while continuously learning, to complete a task. There is now an open agent community that allows agents to be shared. While the possibilities are endless, isn’t the risk as well? We continue to have major outages (CloudFare) and security breaches that put our network operations at risk. While you cannot avoid storing confidential data in the cloud for these services, you can avoid tying your access layer to these risks. There will be bad-acting agents.

One final note about our future in networking. In addition to these software subscriptions and AI offerings being unproven and costly today, being forced to adopt these tools might ultimately reduce or replace the need for human capital in networking. Edgeium can help you leverage perpetual licensing for at least the next 5-10 years for your access switches and anything plugged into them.

Subscribe or follow me on LinkedIn for additional content regarding AI. https://www.linkedin.com/in/ericsommers/

View full post