Monitors use LLM judges to passively score production traffic, surfacing trends and issues in your LLM applications. For example, you can monitor your application’s responses for correctness or helpfulness, or monitor user input to identify trends in what users ask your agents about. Monitors automatically store all scoring results in Weave’s database, so you can analyze historical trends and patterns. You can monitor text, images, and audio in your application’s input and output.

Monitors require no code changes to your application: set them up using the W&B Weave UI. If you need to actively intervene in your application’s behavior based on scores, use guardrails instead.

Enable preset signals

Signals are preset classifier monitors that automatically score production traces for common quality issues and error categories. Each signal uses a benchmarked LLM prompt to assign a binary label (true/false), along with a confidence score and reasoning. Signals require no prompt engineering or scorer configuration: enable them from the Monitors page to start classifying traces immediately. Signals use a W&B Inference model to score traces, so no external API keys are required.

Available signals

Weave provides 13 preset signals organized into two groups.

Quality signals

Quality signals evaluate successful root-level traces for output quality and safety issues.
| Signal | What it detects |
|---|---|
| Hallucination | Fabricated facts or claims that contradict the provided input context |
| Low quality | Responses with poor format, insufficient effort, or incomplete content |
| User frustration | Signs of user frustration such as repeated questions, negative sentiment, or complaints |
| Jailbreaking | Prompt injection and jailbreak attempts that try to bypass safety guidelines |
| NSFW | Explicit, violent, or otherwise inappropriate content in inputs or outputs |
| Lazy | Low-effort responses such as excessive brevity, refusals to help, or deferred work |
| Forgetful | Failure to use context from earlier in the conversation, ignoring previously stated facts or instructions |

Error signals

Error signals categorize failed traces by root cause to help you identify and resolve infrastructure and application issues.
| Signal | What it detects |
|---|---|
| Network Error | DNS failures, timeouts, connection resets, and other connectivity issues |
| Ratelimited | HTTP 429 responses, quota exhaustion, and throttling from upstream APIs |
| Request Too Large | Requests exceeding size or token limits, such as context window exceeded |
| Bad Request | Client-side errors where the server rejected the request (4xx except 429) |
| Bad Response | Invalid, unexpected, or unusable responses from remote services (5xx) |
| Bug | Flaws in application code such as KeyError, TypeError, or logic errors |
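As a rough illustration of how these categories partition failures, a rule-of-thumb classifier might look like the sketch below. Note that this is purely illustrative: Weave's actual error signals classify failed traces with an LLM judge, not hand-written rules.

```python
# Illustrative only: a hypothetical rule-based mapping from an HTTP status
# code or a Python exception to the error-signal categories above.
# Weave's real signals use an LLM judge instead of rules like these.

def classify_error(status_code=None, exception=None):
    if exception is not None and isinstance(exception, (KeyError, TypeError)):
        return "Bug"
    if exception is not None and isinstance(exception, (TimeoutError, ConnectionError)):
        return "Network Error"
    if status_code == 429:
        return "Ratelimited"
    if status_code == 413:
        return "Request Too Large"
    if status_code is not None and 400 <= status_code < 500:
        return "Bad Request"
    if status_code is not None and 500 <= status_code < 600:
        return "Bad Response"
    return "Bug"  # default: treat unexplained failures as application flaws
```

An LLM judge handles the long tail that rules like these miss, such as a 200 response whose body reveals quota exhaustion.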

Enable signals from the Monitors page

To enable signals:
  1. Open the W&B UI and then open your Weave project.
  2. From the Weave side-nav, select Monitors.
  3. At the top of the Monitors page, a row of suggested signal cards appears. Each card shows the signal name, a description, and an Enable button.
  4. To enable a single signal, select the Enable button on the signal card. The signal begins scoring new traces immediately.
  5. To enable multiple signals at once, select the Add signals button. This opens a drawer that lists all available signals grouped by category (Quality and Error). Select the signals you want to enable, then select Apply.
After enabling signals, Weave scores incoming traces and stores the results as feedback on each Call object. View signal results in the Traces tab by selecting a trace and reviewing the feedback panel.

Manage active signals

To view or remove active signals:
  1. From the Monitors page, select the Manage signals button (gear icon). This opens a drawer listing all currently active signals grouped by category.
  2. Hover over a signal and select the Remove button (trash icon) to disable the signal.
Removing a signal stops scoring new traces. Existing scores from the signal are preserved.

How signals work

Each signal uses an LLM-as-a-judge approach to classify traces:
  1. Trace selection: Quality signals evaluate successful root-level traces. Error signals evaluate failed traces. Child spans and intermediate calls are not scored.
  2. Prompt construction: Weave constructs a prompt that includes the trace metadata, inputs, outputs, exception details (if any), and the operation’s source code. The signal’s classifier prompt is appended with instructions for the specific issue to detect.
  3. LLM scoring: A W&B Inference model evaluates the trace and returns a structured JSON response with:
    • A binary classification (whether the issue was detected)
    • A confidence score (0.0 to 1.0)
    • A reason citing specific evidence from the trace
  4. Result storage: Results are stored as feedback on the Call object and are queryable from the Traces tab.
When multiple signals from the same group (Quality or Error) are active, Weave batches the signals into a single LLM call for efficiency. The model evaluates all active classifiers in one pass and returns results for each.
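A batched judge response for the Quality group might parse like the sketch below. The exact schema is an assumption for illustration; the documentation only specifies that each signal returns a classification, a confidence score, and a reason.

```python
import json

# Hypothetical batched judge output: one entry per active signal, each with
# the three fields described above (classification, confidence, reason).
raw = """
{
  "hallucination": {"detected": false, "confidence": 0.92,
                    "reason": "All claims appear in the input context."},
  "lazy": {"detected": true, "confidence": 0.81,
           "reason": "The response defers the work to the user."}
}
"""

results = json.loads(raw)

# Collect signals that fired with high confidence.
flagged = [name for name, r in results.items()
           if r["detected"] and r["confidence"] >= 0.8]
print(flagged)  # -> ['lazy']
```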

Signals compared to custom monitors

|  | Signals | Custom monitors |
|---|---|---|
| Configuration | One-click enable, no prompt writing | Full control over scoring prompt, model, and parameters |
| Scope | Preset quality and error classifiers | Any evaluation criteria you define |
| Trace selection | Automatic (successful root traces for quality, failed traces for errors) | Configurable operations, filters, and sampling rate |
| Model | W&B Inference (preset) | Any commercial or W&B Inference model |
| Use case | Quick production monitoring with proven classifiers | Custom evaluation criteria specific to your application |
Use signals to get started with production monitoring quickly, then create custom monitors for evaluation criteria specific to your application.

How to create a monitor in Weave

To create a monitor in Weave:
  1. Open the W&B UI and then open your Weave project.
  2. From the Weave side-nav, select Monitors and then select the + New Monitor button. This opens the Create new monitor modal dialog.
  3. In the Create new monitor menu, configure the following fields:
    • Name: Must start with a letter or number. Can contain letters, numbers, hyphens, and underscores.
    • Description (Optional): Explain what the monitor does.
    • Active monitor toggle: Turn the monitor on or off.
    • Calls to monitor:
      • Operations: Choose one or more @weave.ops to monitor. You must log at least one trace that uses the op before it appears in the list of available ops.
      • Filter (Optional): Narrow down which calls are eligible (for example, by max_tokens or top_p).
      • Sampling rate: The percentage of calls to score (0% to 100%).
        A lower sampling rate reduces costs, since each scoring call has an associated cost.
    • LLM-as-a-judge configuration:
      • Scorer name: Must start with a letter or number. Can contain letters, numbers, hyphens, and underscores.
      • Score Audio: Filters the available LLM models to display only audio-enabled models, and opens the Media Scoring JSON Paths field.
      • Score Images: Filters the available LLM models to display only image-enabled models, and opens the Media Scoring JSON Paths field.
      • Judge model: Select the model to score your ops. The menu contains commercial LLM models you have configured in your W&B account, as well as W&B Inference models. Audio-enabled models have an Audio Input label beside their names. For the selected model, configure the following settings:
        • Configuration name: A name for this model configuration.
        • System prompt: Defines the judging model’s role and persona, for example, “You are an impartial AI judge.”
        • Response format: The format of the judge’s response, such as json_object or plain text.
        • Scoring prompt: The evaluation task used to score your ops. You can reference prompt variables from your ops in your scoring prompts. For example, “Evaluate whether {output} is accurate based on {ground_truth}.”
      • Media Scoring JSON Paths: Specify JSONPath expressions (RFC 9535) to extract media from your trace data. If no paths are specified, all scorable media from user messages will be included. This field appears when you enable Score Audio or Score Images.
  4. Once you have configured the monitor’s fields, click Create monitor. This adds the monitor to your Weave project. When your code starts generating traces, you can review the scores in the Traces tab by selecting the monitor’s name and reviewing the data in the resulting panel.
You can also compare and visualize the monitor’s trace data in the Weave UI, or download it in various formats (such as CSV and JSON) using the download button in the Traces tab. Weave automatically stores all scorer results in the Call object’s feedback field.
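Scoring-prompt variables like {output} and {ground_truth} are filled in from the monitored call's inputs and outputs. The sketch below illustrates the substitution with Python's built-in str.format; Weave's internal templating may differ.

```python
# Sketch of how a scoring prompt's {variables} map onto a monitored call.
# Weave performs this substitution internally when it invokes the judge.
scoring_prompt = (
    "Evaluate whether {output} is accurate based on {ground_truth}."
)

# Hypothetical inputs/output captured from one traced call.
call_inputs = {"ground_truth": "The Earth revolves around the Sun."}
call_output = "The Sun revolves around the Earth."

rendered = scoring_prompt.format(output=call_output, **call_inputs)
print(rendered)
```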

Example: Create a truthfulness monitor

The following example creates a monitor that evaluates the truthfulness of generated statements.
  1. Define a function that generates statements. Some statements are truthful, others are not:
import weave
import random
import openai

weave.init("my-team/my-weave-project")

client = openai.OpenAI()

@weave.op()
def generate_statement(ground_truth: str) -> str:
    # Half of the time, ask the model for an intentionally incorrect
    # statement; otherwise return the ground truth verbatim.
    if random.random() < 0.5:
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {
                    "role": "user",
                    "content": f"Generate a statement that is incorrect based on this fact: {ground_truth}"
                }
            ]
        )
        return response.choices[0].message.content
    else:
        return ground_truth

generate_statement("The Earth revolves around the Sun.")
  2. Run the function at least once to log a trace in your project. This makes the op available for monitoring in the W&B UI.
  3. Open your Weave project in the W&B UI and select Monitors from the side-nav. Then select New Monitor.
  4. In the Create new monitor menu, configure the fields using the following values:
    • Name: truthfulness-monitor
    • Description: Evaluates the truthfulness of generated statements.
    • Active monitor: Toggle on.
    • Operations: Select generate_statement.
    • Sampling rate: Set to 100% to score every call.
    • Scorer name: truthfulness-scorer
    • Judge model: o3-mini-2025-01-31
    • System prompt: You are an impartial AI judge. Your task is to evaluate the truthfulness of statements.
    • Response format: json_object
    • Scoring prompt:
      Evaluate whether the output statement is accurate based on the input statement.
      
      This is the input statement: {ground_truth}
      
      This is the output statement: {output}
      
      The response should be a JSON object with the following fields:
      - is_true: a boolean stating whether the output statement is true or false based on the input statement.
      - reasoning: your reasoning as to why the statement is true or false.
      
  5. Click Create monitor. This adds the monitor to your Weave project.
  6. In your script, invoke your function using statements of varying degrees of truthfulness to test the scoring function:
generate_statement("The Earth revolves around the Sun.")
generate_statement("Water freezes at 0 degrees Celsius.")
generate_statement("The Great Wall of China was built over several centuries.")
  7. After running the script using several different statements, open the W&B UI and navigate to the Traces tab. Select any LLMAsAJudgeScorer.score trace to see the results.
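Because the scoring prompt requests a JSON object with is_true and reasoning fields, each score stored by the monitor can be consumed programmatically. The payload below is illustrative, showing how such a result might be parsed:

```python
import json

# Illustrative judge response following the schema requested in the
# truthfulness monitor's scoring prompt (is_true + reasoning).
judge_response = """
{
  "is_true": false,
  "reasoning": "The output claims the Sun revolves around the Earth, which contradicts the input statement."
}
"""

score = json.loads(judge_response)
if not score["is_true"]:
    print("Flagged as untruthful:", score["reasoning"])
```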