Add Trace Span Pruning Processor #45617
Open
portertech wants to merge 64 commits into open-telemetry:main from portertech:trace-span-pruning
+8,689
−0
Signed-off-by: Sean Porter <portertech@gmail.com>
Span Pruning: Outlier Detection (IQR + MAD) and Preservation
Summary
This PR introduces the spanpruningprocessor, a new trace processor that dramatically reduces trace storage costs while preserving observability value. It intelligently identifies and aggregates repetitive leaf spans within traces, replacing groups of similar operations with single summary spans that capture the full statistical picture.
The Problem
Modern distributed systems generate enormous volumes of trace data. A significant portion consists of repetitive, similar spans—think N+1 database queries, batch HTTP calls, or fan-out operations. Storing every individual span is expensive and often provides diminishing analytical value beyond the first few instances.
Current solutions are inadequate:
The Solution
The Span Pruning Processor identifies duplicate or similar leaf spans within a single trace, groups them, and replaces each group with a single aggregated summary span. When leaf spans are aggregated, the processor also recursively aggregates their parent spans if all children of those parents are being aggregated.
Leaf spans are spans that are not referenced as a parent by any other span in the trace. They typically represent the last actions in an execution call stack (e.g., individual database queries, HTTP calls to external services).
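The leaf-span definition above can be sketched in a few lines. This is an illustration only, not the processor's implementation; the `Span` class and `find_leaf_spans` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, minimal span record used only for this sketch."""
    span_id: str
    parent_id: Optional[str]
    name: str


def find_leaf_spans(spans: List[Span]) -> List[Span]:
    """Return spans whose span_id is never referenced as a parent_id."""
    parent_ids = {s.parent_id for s in spans if s.parent_id is not None}
    return [s for s in spans if s.span_id not in parent_ids]


trace = [
    Span("a", None, "GET /orders"),   # root
    Span("b", "a", "load_orders"),    # parent of c and d
    Span("c", "b", "SELECT"),         # leaf
    Span("d", "b", "SELECT"),         # leaf
]
leaf_ids = [s.span_id for s in find_leaf_spans(trace)]
```

Here only `c` and `d` qualify as leaves, because `a` and `b` are each referenced as a parent by another span.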
Spans are grouped by:
group_by_attributes
Parent spans are eligible for aggregation when all of their children are aggregated, they share the same name, kind, and status code, and they are not root spans.
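A rough sketch of the parent eligibility rule, assuming spans are plain dictionaries with `id`, `parent_id`, `name`, `kind`, and `status` keys. These field names and the `eligible_parents` and `parent_group_key` helpers are hypothetical, and the processor's actual logic may apply further limits that are not modelled here.

```python
from collections import defaultdict


def eligible_parents(spans, aggregated_ids):
    """Return ids of non-root spans whose children are all in aggregated_ids."""
    children = defaultdict(list)
    for s in spans:
        if s["parent_id"] is not None:
            children[s["parent_id"]].append(s["id"])

    result = []
    for s in spans:
        kids = children.get(s["id"], [])
        is_root = s["parent_id"] is None
        if kids and not is_root and all(k in aggregated_ids for k in kids):
            result.append(s["id"])
    return result


def parent_group_key(span):
    """Eligible parents are only merged when name, kind, and status agree."""
    return (span["name"], span["kind"], span["status"])


def mk(span_id, parent_id, name):
    return {"id": span_id, "parent_id": parent_id, "name": name,
            "kind": "internal", "status": "ok"}


spans = [
    mk("r", None, "GET /orders"),
    mk("h1", "r", "handler"), mk("h2", "r", "handler"),
    mk("q1", "h1", "SELECT"), mk("q2", "h2", "SELECT"), mk("q3", "h2", "SELECT"),
    mk("w", "r", "worker"), mk("y", "w", "log"),
]
candidates = eligible_parents(spans, aggregated_ids={"q1", "q2", "q3"})
```

In this example `h1` and `h2` are candidates and share a group key, `r` is excluded as the root, and `w` is excluded because its child `y` was not aggregated.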
Optionally, the processor can detect duration outliers using statistical methods (IQR or MAD) and either annotate summary spans with outlier correlations or preserve outlier spans as individual spans for debugging while still aggregating normal spans.
This processor is useful for reducing trace data volume while preserving meaningful information about repeated operations.
Use Cases
Configuration
Configuration Options
- group_by_attributes (supports glob patterns such as db.*)
- min_spans_to_aggregate
- max_parent_depth
- aggregation_attribute_prefix
- aggregation_histogram_buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
- enable_attribute_loss_analysis
- attribute_loss_exemplar_sample_rate (when enable_attribute_loss_analysis is true)
- enable_bytes_metrics
- enable_outlier_analysis
- outlier_analysis.method
- outlier_analysis.iqr_multiplier
- outlier_analysis.mad_multiplier
- outlier_analysis.min_group_size
- outlier_analysis.correlation_min_occurrence
- outlier_analysis.correlation_max_normal_occurrence
- outlier_analysis.max_correlated_attributes
- outlier_analysis.preserve_outliers
- outlier_analysis.max_preserved_outliers
- outlier_analysis.preserve_only_with_correlation
Glob Pattern Support
The group_by_attributes field supports glob patterns for matching attribute keys:
- db.* matches db.operation, db.name, db.statement, etc.
- http.request.* matches http.request.method, http.request.header.content-type, etc.
- rpc.* matches rpc.method, rpc.service, rpc.system, etc.
- db.operation matches db.operation
When multiple attributes match a pattern, they are all included in the grouping key (sorted alphabetically for consistency).
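The pattern matching and alphabetical ordering described above can be approximated as follows. This sketch uses Python's `fnmatch.fnmatchcase` as a stand-in for the processor's glob matcher, whose exact semantics may differ; `grouping_attributes` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def grouping_attributes(attributes, patterns):
    """Select attributes whose keys match any pattern, sorted by key.

    Sorting makes the resulting key independent of attribute order.
    """
    matched = {
        key: value
        for key, value in attributes.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }
    return tuple(sorted(matched.items()))


attrs = {
    "db.statement": "SELECT 1",
    "db.operation": "SELECT",
    "http.request.method": "GET",
    "net.peer.name": "db01",
}
wildcard_key = grouping_attributes(attrs, ["db.*"])
exact_key = grouping_attributes(attrs, ["db.operation"])
```

With `db.*` both `db.operation` and `db.statement` contribute to the key, in that order; with the exact pattern `db.operation` only that attribute does.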
Summary Span
When spans are aggregated, the summary span includes:
Properties
SELECT)
What Gets Aggregated Away
When spans are aggregated into a summary span, the following data from non-template spans is lost:
To understand attribute loss, set enable_attribute_loss_analysis: true, which adds diverse_attributes and missing_attributes to summary spans.
Aggregation Attributes
The following attributes are added to the summary span (shown with default aggregation_attribute_prefix: "aggregation."):
- <prefix>is_summary: true to identify summary spans
- <prefix>span_count
- <prefix>duration_min_ns
- <prefix>duration_max_ns
- <prefix>duration_avg_ns
- <prefix>duration_total_ns
- <prefix>histogram_bucket_bounds_s
- <prefix>histogram_bucket_counts
Optional Outlier Analysis Attributes
When enable_outlier_analysis: true, the following additional attributes are added:
- <prefix>duration_median_ns
- <prefix>outlier_correlated_attributes (format: key=value(outlier%/normal%), ...)
Histogram Buckets
The histogram provides a latency distribution of the aggregated spans. The buckets are cumulative, meaning each bucket count includes all spans with duration less than or equal to the bucket boundary.
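A minimal sketch of cumulative bucket counting, assuming durations and boundaries are both expressed in seconds. The trailing overflow count (all spans, including those above the largest boundary) is an assumption inferred from this description's own example, which lists four counts for three boundaries; `cumulative_bucket_counts` is a hypothetical helper.

```python
def cumulative_bucket_counts(durations_s, bounds_s):
    """Count spans with duration <= each boundary, plus a final overflow count."""
    counts = [sum(1 for d in durations_s if d <= bound) for bound in bounds_s]
    counts.append(len(durations_s))  # assumed overflow bucket: every span
    return counts


bounds = [0.01, 0.05, 0.1]                         # 10ms, 50ms, 100ms
durations = [0.005, 0.015, 0.025, 0.075, 0.150]    # 5ms .. 150ms
counts = cumulative_bucket_counts(durations, bounds)
```

Because the counts are cumulative, they never decrease from one bucket to the next, and the last entry always equals the number of aggregated spans.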
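Example with buckets [10ms, 50ms, 100ms] and 5 spans with durations [5ms, 15ms, 25ms, 75ms, 150ms]:
- histogram_bucket_bounds_s: [0.01, 0.05, 0.1]
- histogram_bucket_counts: [1, 3, 4, 5]
The counts list has one more entry than the bounds list: the final entry counts all spans, including the 150ms span that exceeds the largest boundary.
Outlier Analysis (Optional)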
When enable_outlier_analysis: true, the processor detects duration outliers and identifies attributes that correlate with slow spans.
Detection Methods
The processor supports two statistical methods for outlier detection:
- IQR: threshold = Q3 + (multiplier × IQR)
- MAD: threshold = median + (multiplier × MAD × 1.4826)
When to use each:
How It Works
IQR (Interquartile Range) Method:
MAD (Median Absolute Deviation) Method:
Note: The 1.4826 scale factor makes MAD comparable to standard deviation for normal distributions.
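The two threshold formulas can be tried out with Python's standard `statistics` module. This is a sketch under stated assumptions: the quartile interpolation method (`inclusive` here) and the multipliers used in the example (1.5 and 3.0, common conventions) are not taken from the processor and may differ from its defaults.

```python
import statistics

MAD_SCALE = 1.4826  # makes MAD comparable to standard deviation for normal data


def iqr_threshold(durations, multiplier):
    """threshold = Q3 + (multiplier * IQR), with IQR = Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    return q3 + multiplier * (q3 - q1)


def mad_threshold(durations, multiplier):
    """threshold = median + (multiplier * MAD * 1.4826)."""
    median = statistics.median(durations)
    mad = statistics.median([abs(d - median) for d in durations])
    return median + multiplier * mad * MAD_SCALE


durations_ms = [10, 11, 12, 13, 14, 15, 16, 200]
iqr_outliers = [d for d in durations_ms if d > iqr_threshold(durations_ms, 1.5)]
mad_outliers = [d for d in durations_ms if d > mad_threshold(durations_ms, 3.0)]
```

For this sample both methods flag only the 200 ms span. Both thresholds are built from medians and quartiles rather than the mean, so a single extreme duration does not shift them much.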
Attribute Correlation (same for both methods):
Configuration Example
Example Output
Interpretation:
cache_hit=false, while 0% of normal spans did
This helps identify root causes of latency issues:
When to Use
Performance Impact
min_group_size: 7 or higher to skip analysis on small groups
Preserving Outlier Spans (Optional)
When outlier_analysis.preserve_outliers: true, detected outlier spans are kept as individual spans instead of being aggregated. This provides:
Configuration
Configuration Options
- preserve_outliers
- max_preserved_outliers
- preserve_only_with_correlation
Example Output
Before (10 similar SELECT spans, 2 are outliers):
After (with preserve_outliers: true, max_preserved_outliers: 2):
Summary Span Attributes (When Preserving Outliers)
- <prefix>preserved_outlier_count
- <prefix>preserved_outlier_span_ids
Preserved Outlier Span Attributes
- <prefix>is_preserved_outlier
- <prefix>summary_span_id
Behavior Notes
min_spans_to_aggregate), the entire group is left unchanged
Pipeline Placement
This processor is designed to work best when placed after processors that ensure complete traces are available:
Or with tail sampling:
Example
Basic Example
A trace with repeated database queries (some failing):
Before Processing:
After Processing (with min_spans_to_aggregate: 2):
Note: Spans with different status codes are grouped separately, preserving error information.
Recursive Parent Aggregation Example
When spans are aggregated, the processor also checks if their parent spans can be aggregated. Parent spans are eligible for aggregation when:
Before Processing (with min_spans_to_aggregate: 2, group_by_attributes: ["db.op"]):
After Processing:
Why each span was handled this way:
Limitations
Telemetry
The processor emits the following metrics to help monitor its operation:
Counters
- otelcol_processor_spanpruning_spans_received
- otelcol_processor_spanpruning_spans_pruned
- otelcol_processor_spanpruning_aggregations_created
- otelcol_processor_spanpruning_traces_processed
- otelcol_processor_spanpruning_outliers_detected (with enable_outlier_analysis: true)
- otelcol_processor_spanpruning_outliers_preserved (with preserve_outliers: true)
- otelcol_processor_spanpruning_outliers_correlations_detected
- otelcol_processor_spanpruning_bytes_received (with enable_bytes_metrics: true)
- otelcol_processor_spanpruning_bytes_emitted (with enable_bytes_metrics: true)
Histograms
- otelcol_processor_spanpruning_aggregation_group_size
- otelcol_processor_spanpruning_processing_duration
Optional Attribute Loss Metrics
When enable_attribute_loss_analysis: true, the processor also emits metrics about attribute loss during aggregation. These metrics help you understand how much information is being lost when spans are grouped together.
To correlate these metrics back to traces, a configurable fraction of these metric recordings can include trace exemplars via attribute_loss_exemplar_sample_rate. Sampling is applied per aggregation group, and the exemplar context is taken from the slowest span in the group.
Histograms (Optional)
- otelcol_processor_spanpruning_leaf_attribute_diversity_loss
- otelcol_processor_spanpruning_leaf_attribute_loss
- otelcol_processor_spanpruning_parent_attribute_diversity_loss
- otelcol_processor_spanpruning_parent_attribute_loss
Attribute loss analysis is disabled by default (enable_attribute_loss_analysis: false) to reduce overhead. When enabled, the processor:
- Adds <prefix>diverse_attributes and <prefix>missing_attributes summary attributes to aggregated spans
These metrics can be used to:
- (spans_received vs spans_pruned)
- processing_duration
- aggregation_group_size