diff --git a/website/src/pages/ai-observability.tsx b/website/src/pages/ai-observability.tsx index 36d11682a2..c563440e35 100644 --- a/website/src/pages/ai-observability.tsx +++ b/website/src/pages/ai-observability.tsx @@ -1193,6 +1193,12 @@ const result = await generateText({
+ AIOps (AI Operations) is the practice of running AI applications in + production across their full lifecycle, from development and + evaluation through deployment and monitoring. This encompasses{" "} + LLM applications and agents, RAG + systems, and{" "} + + traditional machine learning models + + . +
+ ++ Historically, "AIOps" referred to using AI for IT operations + (automated log analysis, anomaly detection, incident management). + Today, the term has evolved to also describe the{" "} + operations for AI: the practices and platforms + needed to build, deploy, and maintain AI applications in production. + As organizations adopt LLMs, agents, and ML models at scale, AIOps + provides a unified framework to manage all of these workloads. +
+ ++ MLflow is the most adopted open-source + AIOps platform, providing a unified stack for both LLMOps ( + tracing,{" "} + + evaluation + + ,{" "} + + prompt management + + , AI Gateway) and traditional + ML operations ( + + experiment tracking + + ,{" "} + + model registry + + ). +
+ ++ AI applications, whether LLM-powered agents or traditional ML + models, introduce operational challenges that standard DevOps can't + address: +
+ ++ Problem: Teams use separate tools for ML + experiment tracking, LLM tracing, evaluation, and deployment, + creating tool sprawl and fragmented workflows. +
++ Solution: A unified AIOps platform manages all + AI workloads (ML models, LLM apps, and agents) under a single + framework. +
++ Problem: AI outputs are non-deterministic and + can degrade silently, making it hard to maintain quality across + thousands of daily requests. +
++ Solution: Automated evaluation with LLM judges + and continuous monitoring catch regressions before they reach + users. +
++ Problem: Without systematic tracking of + parameters, data, models, and prompts, AI experiments and + deployments become impossible to reproduce. +
++ Solution: Experiment tracking and model + registries capture every artifact, enabling full reproducibility + across ML and LLM workloads. +
++ Problem: AI workloads consume expensive compute + (GPU training) and incur API costs (LLM tokens) that can spiral + without visibility. +
++ Solution: AIOps platforms track resource usage + across all AI workloads, helping teams optimize costs and + allocate resources effectively. +
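The cost-visibility point above can be sketched with a small token-cost estimator. The per-1K-token prices and the model name below are hypothetical placeholders, not real provider pricing:

```python
# Hypothetical per-1K-token prices; real prices vary by provider and model.
PRICES = {"gpt-4o": {"input": 0.0025, "output": 0.01}}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single LLM request."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# 1,200 input tokens + 300 output tokens under the assumed prices
print(round(request_cost("gpt-4o", 1200, 300), 6))
```

Aggregating such per-request estimates across all workloads is what gives an AIOps platform its cost dashboard.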
++ Modern AIOps is the operational discipline for all AI applications. + It unifies the practices previously split across MLOps (for + traditional ML) and LLMOps (for LLM + applications) into a single framework that covers: +
+ ++ The key insight behind modern AIOps is that organizations rarely + build with just one type of AI. Most teams have a mix of traditional + ML models, LLM-powered features, and increasingly autonomous agents. + AIOps provides a unified platform to operationalize all of these, + preventing tool sprawl and ensuring consistent practices across all + their AI work. +
+ ++ AIOps is closely related to{" "} + AI observability (the + monitoring and understanding subset) and{" "} + LLMOps (the LLM-specific subset). AIOps + is the broadest term, encompassing both and adding experiment + management, model versioning, and unified deployment. +
+ ++ A comprehensive AIOps platform combines capabilities for both + LLMOps/AgentOps and traditional ML workloads: +
+ ++ MLflow provides a complete, open-source + AIOps platform. Here's how teams use MLflow across different AI + workloads: +
+ ++ LLM Tracing +
+ +
+ {tokens.map((line, i) => (
+
+ {line.map((token, key) => (
+
+ ))}
+
+ ))}
+
+ )}
+ + Evaluation with LLM Judges +
+ +
+ {tokens.map((line, i) => (
+
+ {line.map((token, key) => (
+
+ ))}
+
+ ))}
+
+ )}
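The judge-based evaluation pattern can be sketched in plain Python. The `keyword_judge` function below is a deterministic stand-in for a real LLM judge (which `mlflow.genai.evaluate` would wire up for you), and the dataset is invented for illustration:

```python
def keyword_judge(question: str, answer: str) -> float:
    """Stand-in for an LLM judge: 1.0 if the answer mentions the topic."""
    topic = question.split()[-1].rstrip("?").lower()
    return 1.0 if topic in answer.lower() else 0.0

# Invented evaluation dataset
dataset = [
    {"question": "What is MLflow?",
     "answer": "MLflow is an open-source AIOps platform."},
    {"question": "What is tracing?",
     "answer": "It records each step of an LLM app."},
]

# Score every example, then aggregate into a single quality metric
scores = [keyword_judge(r["question"], r["answer"]) for r in dataset]
print(sum(scores) / len(scores))
```

In production the hand-rolled judge is replaced by an LLM scorer, but the shape is the same: score each example, aggregate, and alert on regressions.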
+ + ML Experiment Tracking +
+ +
+ {tokens.map((line, i) => (
+
+ {line.map((token, key) => (
+
+ ))}
+
+ ))}
+
+ )}
+ + MLflow provides unified visibility across all AI operations: LLM + tracing, evaluation, and experiment tracking +
++ + MLflow + {" "} + is the largest open-source AI platform, with over 30 million + monthly downloads. Backed by the Linux Foundation and licensed + under Apache 2.0, it provides a complete AIOps stack with no + vendor lock-in.{" "} + Get started → +
++ When choosing an AIOps platform, the decision between open source + and proprietary SaaS tools has significant long-term implications + for your team, infrastructure, and data ownership. +
+ ++ + Open Source (MLflow): + {" "} + With MLflow, you maintain complete control over your AIOps + infrastructure and data. Deploy on your own infrastructure or use + managed versions on Databricks, AWS, or other platforms. There are + no per-seat fees, no usage limits, and no vendor lock-in. MLflow + supports any AI framework, from scikit-learn and PyTorch to OpenAI + and LangChain, under a single platform. +
+ ++ Proprietary SaaS Tools: Commercial AIOps platforms + offer convenience but at the cost of flexibility and control. They + typically charge per seat or per usage volume, which can become + expensive at scale. Your data is sent to their servers, raising + privacy and compliance concerns. Most proprietary tools specialize + in either ML or LLM workloads, not both, leading to tool sprawl. +
+ ++ Why Teams Choose Open Source: Organizations + building production AI applications increasingly choose MLflow + because it offers production-ready AIOps for both ML and LLM + workloads without giving up control of their data, cost + predictability, or flexibility. The Apache 2.0 license and Linux + Foundation backing ensure MLflow remains truly open and + community-driven. +
+ ++ Unified operations for all AI workloads: LLM applications, + agents, and traditional ML models under one platform. +
+diff --git a/website/src/pages/llm-evaluation.tsx b/website/src/pages/llm-evaluation.tsx index 87200191b8..f06f479727 100644 --- a/website/src/pages/llm-evaluation.tsx +++ b/website/src/pages/llm-evaluation.tsx @@ -1791,6 +1791,12 @@ results = mlflow.genai.evaluate(
- LLMOps is closely related to AIOps (the broader discipline of - running all AI applications in production) and{" "} + LLMOps is closely related to AIOps (the + broader discipline of running all AI applications in production) and{" "} AI observability (the monitoring and debugging subset). LLMOps specifically targets LLM-powered applications, while AIOps also covers traditional ML @@ -1042,6 +1042,9 @@ export default function LLMOps() { Prompt Registry Documentation +