In modern cloud-native architectures, a single user request can traverse dozens of services: API Gateway → Lambda → SQS → Lambda → DynamoDB → S3 → another Lambda → RDS. When that request fails or performs poorly, traditional monitoring tools show you that something broke, but not where or why across this distributed system. For teams managing cloud-native architectures, observability is critical.
This is where Application Performance Monitoring (APM) with distributed tracing becomes essential. Distributed tracing gives you end-to-end visibility into request flows, allowing you to identify bottlenecks, debug failures, and optimize performance across your entire microservices or serverless architecture. See our real metrics page for production monitoring examples.
In this comprehensive guide, we'll walk through setting up distributed tracing with Lumigo - a serverless-first APM platform that excels at automatically instrumenting AWS Lambda functions, capturing async event flows, and providing deep context without code changes. We'll cover everything from initial setup to production best practices, using real-world examples from production deployments.

1. Introduction: Why Distributed Tracing Matters
What is APM and Distributed Tracing?
Application Performance Monitoring (APM) is the practice of tracking application performance metrics, errors, and user experience in real-time. Distributed tracing is a specific APM technique that follows a request as it flows through multiple services, creating a "trace" that shows the complete journey.
Think of distributed tracing like a package tracking system: when you ship a package, you get a tracking number that shows every stop along the way - picked up, sorted, in transit, delivered. Distributed tracing does the same for requests: it shows you every service the request touched, how long it spent in each, and where it encountered errors or slowdowns.
The Challenge with Modern Cloud-Native Systems
Traditional monitoring tools were built for monolithic applications running on servers you control. Modern architectures present unique challenges:
- Serverless Functions: Lambda functions are ephemeral, stateless, and scale to zero. Traditional agents don't work here. For AWS Lambda optimization, see our cost optimization case study.
- Async Event Flows: SQS → Lambda → SNS → Lambda → DynamoDB. How do you trace a request that spans multiple async invocations?
- Cold Starts: Lambda cold starts can add 1-5 seconds to response times. You need visibility into initialization time.
- No Servers to Instrument: You can't install agents on Lambda execution environments.
- High Cardinality: With thousands of functions and millions of invocations, you need intelligent sampling. AIOps platforms help manage this complexity.
Why Lumigo?
Lumigo was built specifically for serverless and cloud-native architectures. Here's what makes it unique:
- Zero-Code Instrumentation: Automatically traces AWS Lambda, API Gateway, SQS, SNS, DynamoDB, S3, and more without code changes
- Async Event Tracing: Automatically connects related invocations across async boundaries (SQS → Lambda → SNS)
- Deep Context: Captures request/response payloads, environment variables, and full stack traces
- Minimal Overhead: <1% performance impact, intelligent sampling to control costs
- Serverless-First: Designed for Lambda, containers, and microservices from the ground up
Problems Lumigo Solves
Here are specific scenarios where Lumigo's distributed tracing shines:
Cold Start Debugging
Lambda cold starts are a major performance concern. Lumigo automatically identifies cold starts in traces and shows you initialization time, helping you optimize your functions and identify which ones need provisioned concurrency.
Async Event Flow Tracking
When a user uploads a file that triggers S3 → Lambda → SQS → Lambda → DynamoDB, traditional tools show you disconnected invocations. Lumigo automatically connects these into a single trace, showing the complete flow.
Latency Bottleneck Identification
A request takes 2 seconds, but where is the time spent? Lumigo's waterfall charts show you exactly: 200ms in API Gateway, 1500ms in Lambda (including 800ms in a DynamoDB query), 300ms in another Lambda. You can immediately see the bottleneck.
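The arithmetic behind that reading can be sketched in a few lines of Python. The durations are the ones quoted above; the segment labels are illustrative, and this is only a rough model of what a waterfall view summarizes, not Lumigo's own calculation.

```python
# Durations (ms) taken from the example above; the labels are illustrative.
segments = {
    "API Gateway": 200,
    "Lambda (first)": 1500,
    "Lambda (second)": 300,
}
# Time spent inside the first Lambda on its DynamoDB query.
dynamodb_query_ms = 800

total_ms = sum(segments.values())
slowest = max(segments, key=segments.get)

for name, ms in segments.items():
    print(f"{name:<16} {ms:>5} ms  {ms / total_ms:6.1%}")

# Share of the whole request attributable to the nested DynamoDB query.
dynamodb_share = dynamodb_query_ms / total_ms
print(f"Slowest segment: {slowest}; DynamoDB query share: {dynamodb_share:.0%}")
```

The point of the sketch is that the slowest top-level segment is only the first clue; the nested DynamoDB query accounts for more time than the other two segments combined.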
Error Root Cause Analysis
An error occurs, but which service caused it? Lumigo shows you the complete error chain across services, with full stack traces and payloads, making debugging dramatically faster.
2. Core Concepts: Understanding Distributed Tracing
Spans, Transactions, and Trace IDs
Distributed tracing is built on three fundamental concepts:
Spans
A span represents a single operation within a trace. Each span has:
- Name: What operation was performed (e.g., "DynamoDB.Query", "Lambda.Invoke")
- Start/End Time: When the operation began and completed
- Duration: How long it took
- Tags/Metadata: Additional context (HTTP status, error messages, resource names)
- Parent Span ID: Which span this operation is part of
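As a rough illustration, the fields listed above map naturally onto a small record type. The class and field names below are assumptions chosen for readability; they are not Lumigo's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    """Illustrative span record; field names are assumptions, not Lumigo's schema."""
    span_id: str
    trace_id: str
    name: str                               # e.g. "DynamoDB.Query"
    start_ms: int                           # start time, epoch milliseconds
    end_ms: int                             # end time, epoch milliseconds
    parent_span_id: Optional[str] = None    # None marks the root span
    tags: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        # Duration is derived from the timestamps rather than stored separately.
        return self.end_ms - self.start_ms


query = Span(
    span_id="s2",
    trace_id="t1",
    name="DynamoDB.Query",
    start_ms=1_000,
    end_ms=1_085,
    parent_span_id="s1",
    tags={"table": "Users"},
)
print(query.duration_ms)
```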
Traces
A trace is a collection of spans that represent a complete request flow. All spans in a trace share the same trace ID, which allows you to correlate operations across services.
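To make the correlation step concrete, here is a minimal sketch that groups flat span records by trace ID and then rebuilds the parent/child tree for one trace. The record layout and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat span records, as a collector might receive them out of order.
spans = [
    {"trace_id": "t1", "span_id": "c", "parent": "b", "name": "DynamoDB.Put"},
    {"trace_id": "t1", "span_id": "a", "parent": None, "name": "API Gateway"},
    {"trace_id": "t2", "span_id": "x", "parent": None, "name": "Lambda: other"},
    {"trace_id": "t1", "span_id": "b", "parent": "a", "name": "Lambda: createOrder"},
]


def group_by_trace(records):
    """Bucket spans by their shared trace ID."""
    traces = defaultdict(list)
    for record in records:
        traces[record["trace_id"]].append(record)
    return dict(traces)


def render_tree(trace_spans):
    """Return indented span names, following parent links from the root."""
    children = defaultdict(list)
    for span in trace_spans:
        children[span["parent"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


traces = group_by_trace(spans)
tree = render_tree(traces["t1"])
print("\n".join(tree))
```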
Transactions
A transaction (in Lumigo's terminology) is a high-level operation that represents a user-facing action, like "Process Payment" or "Upload File". A transaction contains multiple traces and spans.
How Microservices/Serverless Need Specialized Tracing
Traditional APM tools assume:
- Long-lived processes where you can install agents
- Synchronous request/response patterns
- Stable network connections between services
- Centralized logging and metrics
Serverless and microservices break these assumptions:
- Ephemeral Execution: Lambda execution environments are short-lived and can be frozen or reclaimed between invocations, so you can't install persistent agents.
- Async Boundaries: SQS, SNS, EventBridge create async boundaries where trace context must be propagated through message attributes.
- High Concurrency: Thousands of concurrent invocations require efficient, low-overhead instrumentation.
- Multi-Cloud: Requests might span AWS, GCP, and Azure services.
Lumigo solves this by:
- Using Lambda Layers for zero-code instrumentation (no code changes needed)
- Automatically extracting trace context from AWS service metadata (X-Ray, CloudWatch Logs)
- Propagating trace context through SQS message attributes, SNS metadata, and EventBridge detail
- Using intelligent sampling to handle high-volume workloads cost-effectively
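The sampling point deserves a concrete picture. One common technique (not necessarily what Lumigo does internally) is to derive the keep/drop decision from a hash of the trace ID, so every service that sees the same trace reaches the same decision without coordination. The function name and rate below are illustrative.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Decide from the trace ID alone, so every service agrees on the outcome."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision is a pure function of the trace ID, a trace is either kept in full or dropped in full, which avoids traces with missing spans.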
Lumigo Architecture Overview
Lumigo's architecture consists of three main components:
The Tracer
The Lumigo tracer is deployed as a Lambda Layer (or container sidecar) that automatically instruments your functions. It:
- Captures function invocations, AWS SDK calls, and HTTP requests
- Extracts trace context from incoming requests
- Propagates trace context to downstream services
- Sends trace data to the Lumigo collector
The Collector
Lumigo's collector runs as an AWS service that:
- Receives trace data from tracers
- Correlates spans into complete traces
- Stores traces for querying and analysis
- Applies sampling rules and retention policies
The Dashboard
The Lumigo web dashboard provides:
- Transaction maps showing request flows
- Waterfall charts for latency analysis
- Error tracking and alerting
- Service dependency graphs
- Payload inspection and debugging tools
3. Setting Up Lumigo
Account Setup
Getting started with Lumigo is straightforward:
- Sign up at lumigo.io (free trial available)
- Connect your AWS account using CloudFormation or Terraform
- Lumigo will create necessary IAM roles and Lambda layers automatically
- Get your Lumigo token from the dashboard (you'll need this for configuration)
Installing the Lumigo CLI
The Lumigo CLI makes it easy to wrap and deploy functions. Install it:
# Using npm
npm install -g @lumigo/cli
# Using pip
pip install lumigo-cli
# Verify installation
lumigo --version
Connecting AWS Lambda Functions
There are three ways to instrument Lambda functions with Lumigo:
Method 1: Lambda Layer (Recommended)
Add the Lumigo layer to your Lambda function. This requires zero code changes:
# Using AWS CLI
# Note: --layers and --environment replace the function's existing layers and variables
aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:XXX \
  --environment 'Variables={LUMIGO_TRACER_TOKEN=your-token-here,LUMIGO_DEBUG=false}'
Method 2: Serverless Framework
If you're using the Serverless Framework, add Lumigo as a plugin:
# serverless.yml
plugins:
  - serverless-lumigo
provider:
  environment:
    LUMIGO_TRACER_TOKEN: ${env:LUMIGO_TRACER_TOKEN}
  layers:
    - arn:aws:lambda:${self:provider.region}:114300393969:layer:lumigo-node:XXX
functions:
  myFunction:
    handler: src/handler.myFunction
# Plugin settings live under the custom section
custom:
  lumigo:
    token: ${env:LUMIGO_TRACER_TOKEN}
Method 3: Terraform
resource "aws_lambda_function" "my_function" {
function_name = "my-function"
handler = "index.handler"
runtime = "nodejs18.x"
layers = [
"arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:XXX"
]
environment {
variables = {
LUMIGO_TRACER_TOKEN = var.lumigo_token
LUMIGO_DEBUG = "false"
}
}
}
4. Configuration Examples by Runtime
Node.js Setup
For Node.js Lambda functions, Lumigo automatically instruments:
- AWS SDK v2 and v3 calls
- HTTP/HTTPS requests
- Database connections (MongoDB, PostgreSQL, MySQL)
- Async/await and Promise chains
Basic Setup (Zero Code Changes)
// No code changes needed! Just add the layer and environment variable.
// Your existing handler works as-is:
exports.handler = async (event, context) => {
const dynamodb = new AWS.DynamoDB.DocumentClient();
const result = await dynamodb.get({
TableName: 'Users',
Key: { userId: event.userId }
}).promise();
return result.Item;
};
Manual Instrumentation (Advanced)
For custom spans or additional context, you can manually instrument:
const lumigo = require('@lumigo/tracer')({
token: process.env.LUMIGO_TRACER_TOKEN
});
exports.handler = lumigo.trace(async (event, context) => {
// Create a custom span for a specific operation
const span = lumigo.createSpan('custom-operation', {
'custom.tag': 'value',
'operation.type': 'data-processing'
});
try {
// Your business logic
const result = await processData(event);
span.setTag('result.size', result.length);
return result;
} catch (error) {
span.setTag('error', true);
span.setTag('error.message', error.message);
throw error;
} finally {
span.finish();
}
});
Python Setup
Python functions are automatically instrumented for:
- boto3 (AWS SDK)
- requests, urllib3 (HTTP libraries)
- SQLAlchemy, psycopg2 (database libraries)
Basic Setup
# No code changes needed with Lambda Layer
# Your existing handler:
import boto3
import json
def handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Users')
    response = table.get_item(
        Key={'userId': event['userId']}
    )
    return {
        'statusCode': 200,
        'body': json.dumps(response['Item'])
    }
Manual Instrumentation
import os
from lumigo_tracer import lumigo_tracer
@lumigo_tracer(token=os.environ.get('LUMIGO_TRACER_TOKEN'))
def handler(event, context):
    # Create custom span
    with lumigo_tracer.span('custom-operation') as span:
        span.set_tag('operation.type', 'data-processing')
        try:
            result = process_data(event)
            span.set_tag('result.size', len(result))
            return result
        except Exception as e:
            span.set_tag('error', True)
            span.set_tag('error.message', str(e))
            raise
Java Setup
For Java Lambda functions, add the Lumigo dependency:
<!-- pom.xml -->
<dependency>
  <groupId>io.lumigo</groupId>
  <artifactId>lumigo-java-tracer</artifactId>
  <version>1.0.0</version>
</dependency>
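// Handler.java
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import io.lumigo.Lumigo;
public class Handler implements RequestHandler<Map<String, Object>, String> {
    static {
        Lumigo.init(System.getenv("LUMIGO_TRACER_TOKEN"));
    }
    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // Your code - automatically instrumented
        DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
        Table table = dynamoDB.getTable("Users");
        Item item = table.getItem("userId", event.get("userId"));
        return item.toJSON();
    }
}
Container Setup (Docker/Kubernetes)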
For containerized applications, you can use Lumigo's auto-instrumentation or sidecar pattern:
Auto-Instrumentation (Node.js Container)
# Dockerfile
FROM node:18-alpine
# Install Lumigo tracer
RUN npm install -g @lumigo/tracer
# Set environment variables
ENV LUMIGO_TRACER_TOKEN=your-token-here
ENV LUMIGO_DEBUG=false
# Wrap your application
CMD ["lumigo", "node", "app.js"]
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest
          env:
            - name: LUMIGO_TRACER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: lumigo-secrets
                  key: token
            - name: LUMIGO_DEBUG
              value: "false"
5. Distributed Tracing Setup
How Lumigo Injects Trace Context
Lumigo automatically propagates trace context across service boundaries. Here's how it works:
HTTP Requests
When your Lambda makes an HTTP request, Lumigo automatically adds trace headers:
// Automatically added headers:
X-Trace-Id: abc123xyz
X-Span-Id: def456uvw
X-Parent-Span-Id: ghi789rst
Downstream services that are also instrumented with Lumigo will pick up these headers and continue the trace.
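To show the mechanics of header-based propagation, here is a small sketch of the inject and extract steps. It reuses the header names shown above; the function names and ID format are assumptions, and a real tracer does this inside its HTTP client hooks rather than in your code.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"
SPAN_HEADER = "X-Span-Id"
PARENT_HEADER = "X-Parent-Span-Id"


def new_id() -> str:
    """Generate a random 16-character hex identifier."""
    return uuid.uuid4().hex[:16]


def inject(headers: dict, trace_id: str, span_id: str, parent_span_id: str = "") -> dict:
    """Return a copy of the outgoing headers with trace context added."""
    out = dict(headers)
    out[TRACE_HEADER] = trace_id
    out[SPAN_HEADER] = span_id
    if parent_span_id:
        out[PARENT_HEADER] = parent_span_id
    return out


def extract(headers: dict) -> dict:
    """Read trace context from incoming headers (header names are case-insensitive)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    trace_id = lowered.get(TRACE_HEADER.lower())
    if trace_id is None:
        # No upstream context: this service starts a new trace.
        return {"trace_id": new_id(), "parent_span_id": None}
    # The caller's span becomes the parent of the span we are about to create.
    return {"trace_id": trace_id, "parent_span_id": lowered.get(SPAN_HEADER.lower())}


outgoing = inject({"Content-Type": "application/json"}, "abc123xyz", "def456uvw")
incoming = extract(outgoing)
print(incoming)
```

The key design point is that the caller's span ID travels with the request and becomes the parent ID on the receiving side, which is what lets the backend stitch the two services into one tree.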
AWS Service Integration
Lumigo automatically extracts trace context from:
- API Gateway: From request headers and X-Ray integration
- SQS: From message attributes
- SNS: From message metadata
- EventBridge: From event detail
- Step Functions: From execution context
Capturing Asynchronous Flows
One of Lumigo's key strengths is automatically connecting async invocations. Here's an example:
SQS Integration
When a Lambda sends a message to SQS, Lumigo automatically adds trace context to message attributes:
// Your code (no changes needed)
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
await sqs.sendMessage({
QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789/my-queue',
MessageBody: JSON.stringify(data)
}).promise();
// Lumigo automatically adds:
// MessageAttributes: {
// 'lumigo-trace-id': { StringValue: 'abc123', DataType: 'String' },
// 'lumigo-span-id': { StringValue: 'def456', DataType: 'String' }
// }
When the consumer Lambda processes the message, Lumigo extracts the trace context and continues the trace.
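On the consumer side, the extraction step might look roughly like the sketch below. It assumes the attribute names shown in the comment above and the standard shape of an SQS event delivered to Lambda, where each record carries a messageAttributes map with stringValue entries. The helper name is hypothetical, and with the tracer installed you would not write this yourself.

```python
import json


def extract_trace_context(record: dict) -> dict:
    """Pull trace context out of one SQS record in a Lambda event."""
    attributes = record.get("messageAttributes", {})

    def read(name):
        attribute = attributes.get(name)
        return attribute.get("stringValue") if attribute else None

    return {
        "trace_id": read("lumigo-trace-id"),
        "parent_span_id": read("lumigo-span-id"),
    }


# A trimmed-down SQS event as Lambda would deliver it to the consumer.
event = {
    "Records": [
        {
            "messageId": "msg-1",
            "body": json.dumps({"orderId": "o-42"}),
            "messageAttributes": {
                "lumigo-trace-id": {"stringValue": "abc123", "dataType": "String"},
                "lumigo-span-id": {"stringValue": "def456", "dataType": "String"},
            },
        }
    ]
}

for record in event["Records"]:
    context = extract_trace_context(record)
    print(record["messageId"], context)
```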
Automatic vs Manual Instrumentation
Lumigo provides both automatic and manual instrumentation:
Automatic Instrumentation (Default)
With just the Lambda layer and environment variable, Lumigo automatically traces:
- Function invocations (entry/exit, duration, errors)
- AWS SDK calls (DynamoDB, S3, SQS, SNS, etc.)
- HTTP requests (fetch, axios, requests library)
- Database queries (if using supported libraries)
Manual Instrumentation (When Needed)
Use manual instrumentation for:
- Custom business logic spans
- Third-party APIs not automatically instrumented
- Adding custom tags/metadata
- Marking specific operations for alerting
// Node.js example
const lumigo = require('@lumigo/tracer');
exports.handler = lumigo.trace(async (event) => {
// Automatic: AWS SDK calls are traced
const dynamodb = new AWS.DynamoDB.DocumentClient();
await dynamodb.get({...}).promise();
// Manual: Custom span for business logic
const span = lumigo.createSpan('process-payment', {
'payment.amount': event.amount,
'payment.currency': 'USD'
});
try {
const result = await processPayment(event);
span.setTag('payment.status', 'success');
return result;
} catch (error) {
span.setTag('payment.status', 'failed');
span.setTag('error', error.message);
throw error;
} finally {
span.finish();
}
});
Environment Variables and Configuration
Key environment variables for Lumigo:
# Required
LUMIGO_TRACER_TOKEN=your-token-here
# Optional - Debugging
LUMIGO_DEBUG=true # Enable debug logging
LUMIGO_LOG_LEVEL=INFO # DEBUG, INFO, WARN, ERROR
# Optional - Sampling
LUMIGO_SAMPLE_RATE=1.0 # 1.0 = 100%, 0.1 = 10%
# Optional - Data Redaction
LUMIGO_REDACT_ALL=false # Redact all payloads
LUMIGO_REDACT_REGEX=.*password.* # Redact fields matching regex
# Optional - Performance
LUMIGO_MAX_ENTRY_SIZE=10000 # Max payload size (bytes)
LUMIGO_SKIP_HTTP_ENDPOINTS=health # Skip tracing for specific endpoints
IAM Permissions
Lumigo requires minimal IAM permissions. The CloudFormation template creates a role with:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:*:*:*"
},
{
"Effect": "Allow",
"Action": [
"xray:PutTraceSegments",
"xray:PutTelemetryRecords"
],
"Resource": "*"
}
]
}
6. Visualizing Traces
Understanding Lumigo's Transaction Map
Lumigo's transaction map provides a visual representation of your request flow. Here's how to read it:

Key Features
- Service Icons: Visual representation of each service (Lambda, DynamoDB, API Gateway, etc.)
- Duration Bars: Horizontal bars showing time spent in each service
- Color Coding: Green (fast), Yellow (slow), Red (error)
- Cold Start Indicator: Special marker showing Lambda cold starts
- Click to Expand: Click any service to see detailed span information
Latency Waterfall Charts
Waterfall charts show the sequential timing of operations, making it easy to identify bottlenecks:

Payload Inspection
Lumigo captures request and response payloads automatically, which is invaluable for debugging:
Benefits
- See Exact Inputs: What data was sent to each service
- See Exact Outputs: What each service returned
- Debug Errors: Full error messages and stack traces
- Performance Analysis: Identify large payloads causing slowdowns
Data Redaction
For security and compliance, redact sensitive data:
# Redact specific fields
LUMIGO_REDACT_REGEX=.*password.*|.*token.*|.*secret.*
# Redact all payloads (only keep metadata)
LUMIGO_REDACT_ALL=true
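To illustrate what a key-matching pattern like the one above does to a payload, here is a minimal sketch of recursive redaction. The mask string, the case-insensitive matching, and the sample payload are assumptions for the example; Lumigo's actual matching rules may differ.

```python
import re

# Same pattern as the LUMIGO_REDACT_REGEX example above.
# Case-insensitive matching is an assumption made for this sketch.
PATTERN = re.compile(r".*password.*|.*token.*|.*secret.*", re.IGNORECASE)
MASK = "****"


def redact(value):
    """Recursively mask values whose key matches PATTERN."""
    if isinstance(value, dict):
        return {
            key: MASK if PATTERN.fullmatch(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


payload = {
    "userId": "u-1",
    "password": "hunter2",
    "session": {"authToken": "abc", "ttl": 300},
    "items": [{"sku": "A1", "clientSecret": "xyz"}],
}
print(redact(payload))
```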
// Programmatic redaction (Node.js)
const lumigo = require('@lumigo/tracer');
lumigo.redact(['password', 'ssn', 'creditCard']);
Error and Timeout Detection
Lumigo automatically detects and highlights:
- Errors: Exceptions, HTTP error status codes, AWS service errors
- Timeouts: Lambda timeouts, API Gateway timeouts, service timeouts
- Cold Starts: Lambda initialization time
- Throttling: DynamoDB throttling, API rate limiting
Each error includes:
- Full stack trace
- Error message and type
- Request payload that caused the error
- Service context (Lambda name, region, version)
Correlating Traces with Logs and Metrics
Lumigo integrates with CloudWatch Logs and X-Ray for comprehensive observability:
- CloudWatch Logs: Click "View Logs" in a trace to see related log entries
- X-Ray Integration: Traces appear in AWS X-Ray console
- Metrics: Aggregate trace data into metrics (p50, p95, p99 latencies)
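As a rough sketch of how trace durations roll up into percentile metrics, the standard library's statistics.quantiles can compute p50, p95, and p99 from a list of durations. The sample values below are made up for illustration.

```python
import statistics

# Hypothetical trace durations in milliseconds for one endpoint.
durations_ms = [120, 95, 110, 480, 105, 130, 98, 1250, 115, 102,
                125, 99, 140, 101, 97, 108, 112, 350, 104, 118]

# quantiles(n=100) returns the 99 cut points p1..p99.
cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"p50={p50:.1f} ms  p95={p95:.1f} ms  p99={p99:.1f} ms")
```

The gap between p50 and p99 is usually more informative than the average: a handful of slow traces barely moves the median but dominates the tail.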
7. Real-World Example: E-Commerce Order Processing
Let's walk through a complete example: an e-commerce order processing system. This will show how Lumigo captures a real-world distributed flow.
Architecture
Implementation
Here's the code for the main order processing Lambda:
// createOrder.js
const { randomUUID } = require('crypto');
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
const sqs = new AWS.SQS();
exports.handler = async (event) => {
  const orderId = randomUUID();
  const userId = event.requestContext.authorizer.userId;
  // API Gateway proxy integrations deliver the body as a JSON string
  const { items } = JSON.parse(event.body);
  // Create order in DynamoDB
  await dynamodb.put({
    TableName: 'Orders',
    Item: {
      orderId,
      userId,
      items,
      status: 'pending',
      createdAt: new Date().toISOString()
    }
  }).promise();
  // Send to processing queue
  await sqs.sendMessage({
    QueueUrl: process.env.ORDER_QUEUE_URL,
    MessageBody: JSON.stringify({ orderId, userId })
  }).promise();
  return {
    statusCode: 200,
    body: JSON.stringify({ orderId, status: 'created' })
  };
};
// processOrder.js
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();
const dynamodb = new AWS.DynamoDB.DocumentClient();
const sns = new AWS.SNS();
exports.handler = async (event) => {
  const { orderId, userId } = JSON.parse(event.Records[0].body);
  // Validate payment
  const paymentResult = await lambda.invoke({
    FunctionName: 'validatePayment',
    Payload: JSON.stringify({ orderId, userId })
  }).promise();
  if (JSON.parse(paymentResult.Payload).status !== 'success') {
    throw new Error('Payment validation failed');
  }
  // Load the order written by createOrder; the queue message only carries IDs
  const { Item: order } = await dynamodb.get({
    TableName: 'Orders',
    Key: { orderId }
  }).promise();
  // Update inventory
  for (const item of order.items) {
    await dynamodb.update({
      TableName: 'Inventory',
      Key: { productId: item.productId },
      UpdateExpression: 'SET quantity = quantity - :qty',
      ExpressionAttributeValues: { ':qty': item.quantity }
    }).promise();
  }
  // Publish to SNS
  await sns.publish({
    TopicArn: process.env.ORDER_PROCESSED_TOPIC,
    Message: JSON.stringify({ orderId, status: 'processed' })
  }).promise();
  return { statusCode: 200 };
};
Sample Trace Output
Here's what Lumigo captures for this flow (simplified JSON):
{
"traceId": "abc123xyz",
"transactionName": "Process Order",
"duration": 1250,
"spans": [
{
"name": "API Gateway",
"service": "apigateway",
"duration": 45,
"tags": {
"http.method": "POST",
"http.path": "/orders",
"http.status_code": 200
}
},
{
"name": "Lambda: createOrder",
"service": "lambda",
"duration": 320,
"coldStart": false,
"spans": [
{
"name": "DynamoDB.Put",
"service": "dynamodb",
"duration": 85,
"tags": {
"table": "Orders",
"operation": "PutItem"
}
},
{
"name": "SQS.SendMessage",
"service": "sqs",
"duration": 45,
"tags": {
"queue": "order-queue"
}
}
]
},
{
"name": "Lambda: processOrder",
"service": "lambda",
"duration": 885,
"coldStart": true,
"coldStartDuration": 1200,
"spans": [
{
"name": "Lambda.Invoke: validatePayment",
"service": "lambda",
"duration": 450,
"spans": [
{
"name": "HTTP Request",
"service": "external",
"duration": 380,
"tags": {
"http.url": "https://api.stripe.com/v1/charges",
"http.status_code": 200
}
}
]
},
{
"name": "DynamoDB.Update",
"service": "dynamodb",
"duration": 280,
"tags": {
"table": "Inventory"
}
},
{
"name": "SNS.Publish",
"service": "sns",
"duration": 155,
"tags": {
"topic": "order-processed"
}
}
]
}
],
"errors": [],
"metadata": {
"region": "us-east-1",
"accountId": "123456789012"
}
}
What This Trace Reveals
From this trace, we can immediately see:
- Cold Start Impact: processOrder had a 1200ms cold start, adding significant latency
- External API Bottleneck: Stripe API call took 380ms (43% of processOrder time)
- DynamoDB Performance: Inventory updates took 280ms (could be optimized with batch writes)
- Total Latency: 1250ms end-to-end (acceptable but could be improved)
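The percentages above can be reproduced by walking the nested spans. The sketch below copies only the processOrder branch of the sample trace and prints each span's share of the function's duration; the flatten helper is a hypothetical name used for illustration.

```python
# A trimmed copy of the processOrder span from the sample trace above.
process_order = {
    "name": "Lambda: processOrder",
    "duration": 885,
    "spans": [
        {
            "name": "Lambda.Invoke: validatePayment",
            "duration": 450,
            "spans": [{"name": "HTTP Request", "duration": 380}],
        },
        {"name": "DynamoDB.Update", "duration": 280},
        {"name": "SNS.Publish", "duration": 155},
    ],
}


def flatten(span, depth=0):
    """Yield (depth, name, duration) for a span and all of its descendants."""
    yield depth, span["name"], span["duration"]
    for child in span.get("spans", []):
        yield from flatten(child, depth + 1)


root_ms = process_order["duration"]
rows = list(flatten(process_order))
for depth, name, duration in rows:
    print(f"{'  ' * depth}{name}: {duration} ms ({duration / root_ms:.0%})")
```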
8. Best Practices
Minimize Noise and Control Sampling
High-traffic applications can generate millions of traces. Use sampling to control costs and focus on what matters:
# Sample 10% of traces (cuts trace volume by roughly 90%)
LUMIGO_SAMPLE_RATE=0.1
// Sample 100% of errors, 10% of successful requests
// (Requires custom logic in your code)
const lumigo = require('@lumigo/tracer');
exports.handler = lumigo.trace(async (event) => {
  const shouldTrace = event.isError || Math.random() < 0.1;
  if (shouldTrace) {
    lumigo.setTraceSampled(true);
  }
  // Your code...
});
Secure Sensitive Data
Always redact sensitive information:
# Environment variables
LUMIGO_REDACT_REGEX=.*password.*|.*token.*|.*secret.*|.*api[_-]?key.*
LUMIGO_REDACT_ALL=false # Only redact matching fields
// Programmatic (Node.js)
const lumigo = require('@lumigo/tracer');
lumigo.redact([
  'password',
  'creditCard',
  'ssn',
  'apiKey',
  'authorization'
]);
Integrate with CI/CD
Add Lumigo to your deployment pipeline:
# GitHub Actions example
name: Deploy Lambda with Lumigo
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Deploy with Lumigo
        run: |
          aws lambda update-function-configuration \
            --function-name my-function \
            --layers arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:XXX \
            --environment "Variables={LUMIGO_TRACER_TOKEN=${{ secrets.LUMIGO_TOKEN }},LUMIGO_DEBUG=false}"
Use Lumigo Alerts for SLOs
Set up alerts based on trace data:
- Error Rate: Alert if error rate exceeds 1%
- Latency: Alert if p95 latency exceeds 500ms
- Cold Starts: Alert if cold start rate exceeds 10%
- Timeout Rate: Alert if timeout rate exceeds 0.5%
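Expressed as code, these thresholds amount to a simple comparison over a metrics window. The sketch below is only an illustration of the logic; the metric names and the sample window are assumptions, and in practice you would configure the alerts in the Lumigo dashboard rather than evaluate them yourself.

```python
# Thresholds mirror the list above; the metric names are illustrative.
THRESHOLDS = {
    "error_rate": 0.01,        # 1%
    "p95_latency_ms": 500,
    "cold_start_rate": 0.10,   # 10%
    "timeout_rate": 0.005,     # 0.5%
}


def breached(metrics: dict) -> list:
    """Return the names of metrics that exceed their threshold, sorted."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    )


window = {
    "error_rate": 0.004,
    "p95_latency_ms": 620,
    "cold_start_rate": 0.12,
    "timeout_rate": 0.001,
}
print(breached(window))
```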
Monitor Latency Hotspots
Use Lumigo's service map to identify slow services:
- Sort services by average latency
- Identify services with high p95/p99 latencies
- Compare latency across different time periods
- Set up alerts for latency degradation
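The first step in that list, ranking services by average latency, can be sketched as follows. The service names echo the order-processing example, but the duration samples are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (service, duration_ms) samples pulled from several traces.
samples = [
    ("createOrder", 320), ("createOrder", 290), ("createOrder", 350),
    ("processOrder", 885), ("processOrder", 910),
    ("validatePayment", 450), ("validatePayment", 430), ("validatePayment", 470),
]

by_service = defaultdict(list)
for service, duration in samples:
    by_service[service].append(duration)

# Slowest services first.
ranking = sorted(
    ((service, mean(values)) for service, values in by_service.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for service, average in ranking:
    print(f"{service:<16} {average:7.1f} ms avg over {len(by_service[service])} calls")
```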
9. Troubleshooting & Common Pitfalls
Missing Spans
Problem: Some operations aren't showing up in traces.
Common Causes:
- Function not instrumented (missing layer or environment variable)
- Operation not automatically instrumented (custom code, unsupported library)
- Sampling rate too low (operation was sampled out)
- Trace context not propagated (async boundary, external service)
Solutions:
- Verify Lambda layer is attached:
aws lambda get-function --function-name my-function
- Check environment variables:
aws lambda get-function-configuration --function-name my-function
- Increase sample rate temporarily:
LUMIGO_SAMPLE_RATE=1.0
- Add manual instrumentation for unsupported operations
Cold Start Confusion
Problem: Traces show high latency, but it's unclear if it's cold start or actual execution time.
Solution: Lumigo automatically marks cold starts. Look for the cold start indicator in the trace. If cold starts are frequent, consider:
- Provisioned concurrency for critical functions
- Optimizing initialization code (trim dependencies and lazy-load heavy modules only on the code paths that need them)
- Using Lambda SnapStart (Java functions)
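For the initialization advice in particular, one way to keep heavy imports out of the cold-start path is to load them on first use and cache the result. The sketch below uses the standard library's decimal module as a stand-in for a genuinely heavy dependency; the handler shape and event fields are hypothetical.

```python
import importlib

_heavy = None  # cached module; stays loaded while the execution environment is warm


def get_heavy():
    """Import the heavy dependency on first use instead of at module load."""
    global _heavy
    if _heavy is None:
        # "decimal" stands in for a genuinely heavy library here.
        _heavy = importlib.import_module("decimal")
    return _heavy


def handler(event, context):
    if event.get("needs_heavy"):
        decimal = get_heavy()
        return str(decimal.Decimal(event["amount"]) * 2)
    # Fast path: never pays the import cost.
    return "skipped"


print(handler({"needs_heavy": False}, None))
print(handler({"needs_heavy": True, "amount": "1.5"}, None))
```

The trade-off is that the first invocation needing the dependency pays the import cost inside the handler instead of during initialization, so this helps most when only some code paths use the heavy module.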
Improper Environment Variables
Problem: Traces not appearing, or errors in logs.
Common Issues:
- Missing LUMIGO_TRACER_TOKEN
- Invalid token (expired, wrong account)
- Token stored in wrong format (extra quotes, whitespace)
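The formatting problems in that list are easy to check mechanically. The helper below is a hypothetical sketch that flags a missing value, stray whitespace, and wrapping quotes; it cannot tell whether a token is expired or belongs to the wrong account, and the sample token values are made up.

```python
def token_problems(raw):
    """Return a list of formatting problems found in a token value."""
    if raw is None or raw == "":
        return ["token is missing or empty"]
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    stripped = raw.strip()
    if len(stripped) >= 2 and stripped[0] == stripped[-1] and stripped[0] in "'\"":
        problems.append("value is wrapped in quotes")
    return problems


for candidate in ["t_abc123", ' "t_abc123"', "t_abc123\n", ""]:
    print(repr(candidate), "->", token_problems(candidate) or "looks fine")
```

Inside a function you could pass it os.environ.get("LUMIGO_TRACER_TOKEN") and log the result at startup.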
Verification:
# Check environment variables
aws lambda get-function-configuration \
--function-name my-function \
--query 'Environment.Variables'
# Test token validity
curl -H "Authorization: Bearer $LUMIGO_TRACER_TOKEN" \
  https://api.lumigo.io/v1/traces
Network/Permission Issues
Problem: Tracer can't send data to Lumigo collector.
Check:
- VPC configuration (functions in VPC need NAT Gateway for internet access)
- Security groups (allow outbound HTTPS to Lumigo endpoints)
- IAM permissions (CloudWatch Logs, X-Ray permissions)
10. Conclusion & Next Steps
Distributed tracing with Lumigo transforms how you understand and optimize your serverless and microservices architectures. By automatically instrumenting your functions and connecting async flows, Lumigo gives you the visibility you need to:
- Identify performance bottlenecks across your entire stack
- Debug errors quickly with full context
- Optimize cold starts and reduce latency
- Understand service dependencies and interactions
- Meet SLOs with confidence
Recommended Next Steps
- Instrument All Functions: Add Lumigo to all your Lambda functions, not just the critical ones
- Set Up Dashboards: Create custom dashboards for your key metrics (error rate, latency, throughput)
- Configure Alerts: Set up alerts for error rates, latency spikes, and cold start increases
- Optimize Hotspots: Use trace data to identify and fix the top 5 performance bottlenecks
- Integrate with Logging: Connect Lumigo traces with CloudWatch Logs for complete debugging context
- Share with Team: Train your team on reading traces and using Lumigo for debugging
Start with instrumenting your most critical functions, then expand coverage as you see the value. The zero-code instrumentation makes it easy to get started, and the deep insights will quickly become indispensable for your operations.