
APM with Lumigo: Distributed Tracing Setup

In modern cloud-native architectures, a single user request can traverse dozens of services: API Gateway → Lambda → SQS → Lambda → DynamoDB → S3 → another Lambda → RDS. When that request fails or performs poorly, traditional monitoring tools show you that something broke, but not where or why across this distributed system. For teams managing cloud-native architectures, observability is critical.

This is where Application Performance Monitoring (APM) with distributed tracing becomes essential. Distributed tracing gives you end-to-end visibility into request flows, allowing you to identify bottlenecks, debug failures, and optimize performance across your entire microservices or serverless architecture. See our real metrics page for production monitoring examples.

In this comprehensive guide, we'll walk through setting up distributed tracing with Lumigo - a serverless-first APM platform that excels at automatically instrumenting AWS Lambda functions, capturing async event flows, and providing deep context without code changes. We'll cover everything from initial setup to production best practices, using real-world examples from production deployments.

What You'll Learn: How distributed tracing works, Lumigo's unique serverless-first approach, step-by-step integration for Node.js, Python, Java, and containers, visualizing traces, troubleshooting common issues, and production optimization strategies.
Key results: 57% faster API response · 0 code changes · 100% auto-instrumentation

Figure: Lumigo APM dashboard providing end-to-end visibility into distributed request flows (service map, request flows, latency metrics, and error rates across microservices).

1. Introduction: Why Distributed Tracing Matters

What is APM and Distributed Tracing?

Application Performance Monitoring (APM) is the practice of tracking application performance metrics, errors, and user experience in real-time. Distributed tracing is a specific APM technique that follows a request as it flows through multiple services, creating a "trace" that shows the complete journey.

Think of distributed tracing like a package tracking system: when you ship a package, you get a tracking number that shows every stop along the way - picked up, sorted, in transit, delivered. Distributed tracing does the same for requests: it shows you every service the request touched, how long it spent in each, and where it encountered errors or slowdowns.

The Challenge with Modern Cloud-Native Systems

Traditional monitoring tools were built for monolithic applications running on servers you control. Modern architectures present unique challenges:

Why Lumigo?

Lumigo was built specifically for serverless and cloud-native architectures. Here's what makes it unique:

💡 Real-World Impact: In a production deployment, we used Lumigo to identify that 8 API endpoints were spending 40% of their time in DynamoDB queries. By optimizing those queries and adding caching, we reduced average API response time by 57% - from 420ms to 180ms.

Problems Lumigo Solves

Here are specific scenarios where Lumigo's distributed tracing shines:

Cold Start Debugging

Lambda cold starts are a major performance concern. Lumigo automatically identifies cold starts in traces and shows you initialization time, helping you optimize your functions and identify which ones need provisioned concurrency.

Async Event Flow Tracking

When a user uploads a file that triggers S3 → Lambda → SQS → Lambda → DynamoDB, traditional tools show you disconnected invocations. Lumigo automatically connects these into a single trace, showing the complete flow.

Latency Bottleneck Identification

A request takes 2 seconds, but where is the time spent? Lumigo's waterfall charts show you exactly: 200ms in API Gateway, 1500ms in Lambda (including 800ms in a DynamoDB query), 300ms in another Lambda. You can immediately see the bottleneck.

Error Root Cause Analysis

An error occurs, but which service caused it? Lumigo shows you the complete error chain across services, with full stack traces and payloads, making debugging dramatically faster.

2. Core Concepts: Understanding Distributed Tracing

Spans, Transactions, and Trace IDs

Distributed tracing is built on three fundamental concepts:

Spans

A span represents a single operation within a trace. Each span has:

Traces

A trace is a collection of spans that represent a complete request flow. All spans in a trace share the same trace ID, which allows you to correlate operations across services.

Transactions

A transaction (in Lumigo's terminology) is a high-level operation that represents a user-facing action, like "Process Payment" or "Upload File". A transaction contains multiple traces and spans.

Example Trace Structure:

Trace ID: abc123xyz
├─ Span: API Gateway (50ms)
│  └─ Span: Lambda Function "processOrder" (420ms)
│     ├─ Span: DynamoDB Query "getUser" (180ms)
│     ├─ Span: Lambda Invoke "validatePayment" (150ms)
│     │  └─ Span: External API Call "stripe.com" (120ms)
│     └─ Span: DynamoDB Put "saveOrder" (90ms)
└─ Span: Response (5ms)

Total Duration: 475ms
Bottleneck: DynamoDB Query (180ms = 38% of total time)
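To make the example trace concrete, here is a minimal Python sketch that stores the same spans as a flat list linked by parent IDs (roughly how a collector would receive them) and reports the slowest leaf operation. The `Span` class, its field names, and the span IDs are illustrative assumptions, not Lumigo's data model. The total treats the 50ms API Gateway figure as gateway overhead, which is how the example's 475ms adds up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None for top-level spans
    name: str
    duration_ms: int


# Flat span list for the example trace (IDs are made up for illustration).
SPANS = [
    Span("s1", None, "API Gateway", 50),
    Span("s2", "s1", 'Lambda "processOrder"', 420),
    Span("s3", "s2", 'DynamoDB Query "getUser"', 180),
    Span("s4", "s2", 'Lambda Invoke "validatePayment"', 150),
    Span("s5", "s4", 'External API Call "stripe.com"', 120),
    Span("s6", "s2", 'DynamoDB Put "saveOrder"', 90),
    Span("s7", None, "Response", 5),
]


def leaf_spans(spans):
    """Spans that no other span names as its parent."""
    parents = {s.parent_id for s in spans}
    return [s for s in spans if s.span_id not in parents]


def bottleneck(spans, total_ms):
    """Return the slowest leaf span and its share of the total, in percent."""
    slowest = max(leaf_spans(spans), key=lambda s: s.duration_ms)
    return slowest, round(100 * slowest.duration_ms / total_ms)


TOTAL_MS = 50 + 420 + 5  # gateway overhead + Lambda + response
slowest, share = bottleneck(SPANS, TOTAL_MS)
print(f"{slowest.name}: {slowest.duration_ms}ms ({share}% of {TOTAL_MS}ms)")
```

Looking only at leaf spans avoids naming the Lambda itself as the bottleneck, since its 420ms is mostly time spent waiting on its children.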

How Microservices/Serverless Need Specialized Tracing

Traditional APM tools assume:

Serverless and microservices break these assumptions:

Lumigo solves this by:

Lumigo Architecture Overview

Lumigo's architecture consists of three main components:

Lumigo Architecture:

┌─────────────────────────────────────────────────┐
│                Your Application                 │
│    ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│    │ Lambda 1 │  │ Lambda 2 │  │ Lambda 3 │     │
│    └────┬─────┘  └────┬─────┘  └────┬─────┘     │
│         │             │             │           │
│         └─────────────┴─────────────┘           │
│                       │                         │
│            ┌──────────▼──────────┐              │
│            │    Lumigo Tracer    │              │
│            │   (Lambda Layer)    │              │
│            └──────────┬──────────┘              │
└───────────────────────┼─────────────────────────┘
                        │
            ┌───────────▼───────────┐
            │   Lumigo Collector    │
            │     (AWS Service)     │
            └───────────┬───────────┘
                        │
            ┌───────────▼───────────┐
            │   Lumigo Dashboard    │
            │       (Web UI)        │
            └───────────────────────┘

Data Flow:
1. Tracer captures spans from your functions
2. Collector aggregates and processes traces
3. Dashboard visualizes and allows querying

The Tracer

The Lumigo tracer is deployed as a Lambda Layer (or container sidecar) that automatically instruments your functions. It:

The Collector

Lumigo's collector runs as an AWS service that:

The Dashboard

The Lumigo web dashboard provides:

3. Setting Up Lumigo

Account Setup

Getting started with Lumigo is straightforward:

  1. Sign up at lumigo.io (free trial available)
  2. Connect your AWS account using CloudFormation or Terraform
  3. Lumigo will create necessary IAM roles and Lambda layers automatically
  4. Get your Lumigo token from the dashboard (you'll need this for configuration)

Installing the Lumigo CLI

The Lumigo CLI makes it easy to wrap and deploy functions. Install it:

# Using npm (the CLI is published as the lumigo-cli package)
npm install -g lumigo-cli

# Verify installation
lumigo-cli --version

Connecting AWS Lambda Functions

There are three ways to instrument Lambda functions with Lumigo:

Method 1: Lambda Layer (Recommended)

Add the Lumigo layer to your Lambda function. This requires zero code changes:

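# Using AWS CLI
# Note: --layers and --environment replace the function's existing layer
# list and variables, so include any values you want to keep.
aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:XXX \
  --environment '{"Variables":{"LUMIGO_TRACER_TOKEN":"your-token-here","LUMIGO_DEBUG":"false"}}'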

Method 2: Serverless Framework

If you're using the Serverless Framework, add Lumigo as a plugin:

# serverless.yml
plugins:
  - serverless-lumigo

provider:
  environment:
    LUMIGO_TRACER_TOKEN: ${env:LUMIGO_TRACER_TOKEN}
  layers:
    - arn:aws:lambda:${self:provider.region}:114300393969:layer:lumigo-node:XXX

functions:
  myFunction:
    handler: src/handler.myFunction
    lumigo:
      token: ${env:LUMIGO_TRACER_TOKEN}

Method 3: Terraform

resource "aws_lambda_function" "my_function" {
 function_name = "my-function"
 handler = "index.handler"
 runtime = "nodejs18.x"
 
 layers = [
 "arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:XXX"
 ]
 
 environment {
 variables = {
 LUMIGO_TRACER_TOKEN = var.lumigo_token
 LUMIGO_DEBUG = "false"
 }
 }
}

4. Configuration Examples by Runtime

Node.js Setup

For Node.js Lambda functions, Lumigo automatically instruments:

Basic Setup (Zero Code Changes)

// No code changes needed! Just add the layer and environment variable.

// Your existing handler works as-is:
const AWS = require('aws-sdk');

exports.handler = async (event, context) => {
  const dynamodb = new AWS.DynamoDB.DocumentClient();
  const result = await dynamodb.get({
    TableName: 'Users',
    Key: { userId: event.userId }
  }).promise();

  return result.Item;
};

Manual Instrumentation (Advanced)

To attach additional context to a trace, wrap the handler with the tracer and add execution tags (key/value pairs you can search and filter on in Lumigo):

const lumigo = require('@lumigo/tracer')({
  token: process.env.LUMIGO_TRACER_TOKEN
});

const myHandler = async (event, context) => {
  // Tag the invocation so it is easy to find later
  lumigo.addExecutionTag('operation.type', 'data-processing');

  try {
    // Your business logic
    const result = await processData(event);

    lumigo.addExecutionTag('result.size', String(result.length));
    return result;
  } catch (error) {
    lumigo.addExecutionTag('error.message', error.message);
    throw error; // rethrow so the invocation is recorded as failed
  }
};

exports.handler = lumigo.trace(myHandler);

Python Setup

Python functions are automatically instrumented for:

Basic Setup

# No code changes needed with Lambda Layer

# Your existing handler:
import boto3
import json

def handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Users')

    response = table.get_item(
        Key={'userId': event['userId']}
    )

    return {
        'statusCode': 200,
        'body': json.dumps(response['Item'])
    }

Manual Instrumentation

import os

from lumigo_tracer import lumigo_tracer, add_execution_tag

@lumigo_tracer(token=os.environ.get('LUMIGO_TRACER_TOKEN'))
def handler(event, context):
    # Tag the invocation with searchable context
    add_execution_tag('operation.type', 'data-processing')

    try:
        result = process_data(event)
        add_execution_tag('result.size', str(len(result)))
        return result
    except Exception as e:
        add_execution_tag('error.message', str(e))
        raise

Java Setup

For Java Lambda functions, add the Lumigo dependency:

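// pom.xml
<dependency>
  <groupId>io.lumigo</groupId>
  <artifactId>lumigo-java-tracer</artifactId>
  <version>1.0.0</version>
</dependency>

// Handler.java
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import io.lumigo.Lumigo;

public class Handler implements RequestHandler<Map<String, Object>, String> {
    static {
        Lumigo.init(System.getenv("LUMIGO_TRACER_TOKEN"));
    }

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // Your code - automatically instrumented
        DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
        Table table = dynamoDB.getTable("Users");

        Item item = table.getItem("userId", event.get("userId"));
        return item.toJSON();
    }
}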

Container Setup (Docker/Kubernetes)

For containerized applications, you can use Lumigo's auto-instrumentation or sidecar pattern:

Auto-Instrumentation (Node.js Container)

# Dockerfile
FROM node:18-alpine

# Install Lumigo tracer
RUN npm install -g @lumigo/tracer

# Set environment variables
ENV LUMIGO_TRACER_TOKEN=your-token-here
ENV LUMIGO_DEBUG=false

# Wrap your application
CMD ["lumigo", "node", "app.js"]

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:              # required for apps/v1 Deployments
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest
          env:
            - name: LUMIGO_TRACER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: lumigo-secrets
                  key: token
            - name: LUMIGO_DEBUG
              value: "false"

5. Distributed Tracing Setup

How Lumigo Injects Trace Context

Lumigo automatically propagates trace context across service boundaries. Here's how it works:

HTTP Requests

When your Lambda makes an HTTP request, Lumigo automatically adds trace headers:

// Automatically added headers:
X-Trace-Id: abc123xyz
X-Span-Id: def456uvw
X-Parent-Span-Id: ghi789rst

Downstream services that are also instrumented with Lumigo will pick up these headers and continue the trace.
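As a rough illustration of what header-based propagation involves, the Python sketch below injects trace context into outgoing headers and extracts it on the receiving side. It reuses the header names shown above; the `inject` and `extract` helpers and the ID format are hypothetical and are not Lumigo's implementation.

```python
import secrets

TRACE_HEADER = "X-Trace-Id"
SPAN_HEADER = "X-Span-Id"
PARENT_HEADER = "X-Parent-Span-Id"


def new_id():
    """Random 16-character hex ID (format chosen for illustration)."""
    return secrets.token_hex(8)


def inject(headers, trace_id, span_id, parent_span_id=None):
    """Return a copy of the outgoing headers with trace context added."""
    out = dict(headers)
    out[TRACE_HEADER] = trace_id
    out[SPAN_HEADER] = span_id
    if parent_span_id:
        out[PARENT_HEADER] = parent_span_id
    return out


def extract(headers):
    """Read trace context from incoming headers (case-insensitive).

    The caller's span becomes this service's parent span. If no trace ID
    is present, a new trace is started.
    """
    lower = {k.lower(): v for k, v in headers.items()}
    trace_id = lower.get(TRACE_HEADER.lower()) or new_id()
    parent = lower.get(SPAN_HEADER.lower())
    return {"trace_id": trace_id, "span_id": new_id(), "parent_span_id": parent}


outgoing = inject({"Content-Type": "application/json"},
                  "abc123xyz", "def456uvw", "ghi789rst")
ctx = extract(outgoing)
print(ctx["trace_id"], ctx["parent_span_id"])
```

The key design point is that the trace ID is copied unchanged at every hop, while each service mints its own span ID and records the caller's span as its parent.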

AWS Service Integration

Lumigo automatically extracts trace context from:

Capturing Asynchronous Flows

One of Lumigo's key strengths is automatically connecting async invocations. Here's an example:

Async Flow Example:

User Request
    │
    ▼
API Gateway
    │
    ▼
Lambda: processUpload (sends to SQS)
    │
    ├─► SQS Queue
    │       │
    │       ▼
    │   Lambda: processImage (sends to SNS)
    │       │
    │       ├─► SNS Topic
    │       │       │
    │       │       ├─► Lambda: sendNotification
    │       │       └─► Lambda: updateDatabase
    │       │
    │       └─► S3: Save processed image

Lumigo automatically connects all these into a single trace!

SQS Integration

When a Lambda sends a message to SQS, Lumigo automatically adds trace context to message attributes:

// Your code (no changes needed)
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

await sqs.sendMessage({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789/my-queue',
  MessageBody: JSON.stringify(data)
}).promise();

// Lumigo automatically adds:
// MessageAttributes: {
//   'lumigo-trace-id': { StringValue: 'abc123', DataType: 'String' },
//   'lumigo-span-id': { StringValue: 'def456', DataType: 'String' }
// }

When the consumer Lambda processes the message, Lumigo extracts the trace context and continues the trace.
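For intuition, here is a small Python sketch of the consumer side: reading trace context back out of the SQS records a Lambda receives. The attribute names follow the example above and are assumptions about the wire format; the helper name is hypothetical. Note that Lambda delivers SQS message attributes in camelCase (`messageAttributes`, `stringValue`), unlike the SDK's `MessageAttributes`/`StringValue` on the sending side.

```python
def trace_context_from_sqs_record(record):
    """Extract trace context from one SQS record of a Lambda event.

    Returns None values when the producer did not attach trace attributes.
    """
    attrs = record.get("messageAttributes", {})

    def value(name):
        attr = attrs.get(name)
        return attr.get("stringValue") if attr else None

    return {
        "trace_id": value("lumigo-trace-id"),
        "parent_span_id": value("lumigo-span-id"),
    }


def handler(event, context=None):
    """Return the trace context found on each record in the batch."""
    return [trace_context_from_sqs_record(r) for r in event.get("Records", [])]


sample_event = {
    "Records": [
        {
            "messageId": "1",
            "body": '{"orderId": "o-1"}',
            "messageAttributes": {
                "lumigo-trace-id": {"stringValue": "abc123", "dataType": "String"},
                "lumigo-span-id": {"stringValue": "def456", "dataType": "String"},
            },
        },
        {"messageId": "2", "body": "{}", "messageAttributes": {}},
    ]
}

contexts = handler(sample_event)
print(contexts)
```

Because one SQS batch can contain messages from different producers, the context is read per record rather than once per invocation.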

Automatic vs Manual Instrumentation

Lumigo provides both automatic and manual instrumentation:

Automatic Instrumentation (Default)

With just the Lambda layer and environment variable, Lumigo automatically traces:

Manual Instrumentation (When Needed)

Use manual instrumentation for:

// Node.js example
const AWS = require('aws-sdk');
const lumigo = require('@lumigo/tracer')({
  token: process.env.LUMIGO_TRACER_TOKEN
});

const dynamodb = new AWS.DynamoDB.DocumentClient();

exports.handler = lumigo.trace(async (event) => {
  // Automatic: AWS SDK calls are traced
  await dynamodb.get({...}).promise();

  // Manual: execution tags add business context to the trace
  lumigo.addExecutionTag('payment.amount', String(event.amount));
  lumigo.addExecutionTag('payment.currency', 'USD');

  try {
    const result = await processPayment(event);
    lumigo.addExecutionTag('payment.status', 'success');
    return result;
  } catch (error) {
    lumigo.addExecutionTag('payment.status', 'failed');
    lumigo.addExecutionTag('error.message', error.message);
    throw error;
  }
});

Environment Variables and Configuration

Key environment variables for Lumigo:

# Required
LUMIGO_TRACER_TOKEN=your-token-here

# Optional - Debugging
LUMIGO_DEBUG=true # Enable debug logging
LUMIGO_LOG_LEVEL=INFO # DEBUG, INFO, WARN, ERROR

# Optional - Sampling
LUMIGO_SAMPLE_RATE=1.0 # 1.0 = 100%, 0.1 = 10%

# Optional - Data Redaction
LUMIGO_REDACT_ALL=false # Redact all payloads
LUMIGO_REDACT_REGEX=.*password.* # Redact fields matching regex

# Optional - Performance
LUMIGO_MAX_ENTRY_SIZE=10000 # Max payload size (bytes)
LUMIGO_SKIP_HTTP_ENDPOINTS=health # Skip tracing for specific endpoints
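Misconfigured variables tend to fail silently, so it can help to validate them once at startup. The Python sketch below parses a few of the variables listed above into a typed object and rejects out-of-range values. The `TracerConfig` class and `load_config` function are hypothetical helpers for your own code, not part of any Lumigo SDK, and the defaults simply mirror the values shown above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TracerConfig:
    token: str
    debug: bool
    sample_rate: float
    max_entry_size: int


def load_config(env=os.environ):
    """Parse and validate tracer settings from an environment mapping."""
    token = env.get("LUMIGO_TRACER_TOKEN", "").strip()
    if not token:
        raise ValueError("LUMIGO_TRACER_TOKEN is required")

    sample_rate = float(env.get("LUMIGO_SAMPLE_RATE", "1.0"))
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("LUMIGO_SAMPLE_RATE must be between 0.0 and 1.0")

    return TracerConfig(
        token=token,
        debug=env.get("LUMIGO_DEBUG", "false").lower() == "true",
        sample_rate=sample_rate,
        max_entry_size=int(env.get("LUMIGO_MAX_ENTRY_SIZE", "10000")),
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config({"LUMIGO_TRACER_TOKEN": "t_example", "LUMIGO_SAMPLE_RATE": "0.1"})
print(config)
```

Failing fast on a missing token or an invalid sample rate surfaces the problem at deploy time instead of as missing traces later.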

IAM Permissions

Lumigo requires minimal IAM permissions. The CloudFormation template creates a role with:

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "logs:CreateLogGroup",
 "logs:CreateLogStream",
 "logs:PutLogEvents"
 ],
 "Resource": "arn:aws:logs:*:*:*"
 },
 {
 "Effect": "Allow",
 "Action": [
 "xray:PutTraceSegments",
 "xray:PutTelemetryRecords"
 ],
 "Resource": "*"
 }
 ]
}
💡 Security Best Practice: Use AWS Secrets Manager or Parameter Store for the LUMIGO_TRACER_TOKEN instead of hardcoding it in environment variables. This allows rotation and better security.

6. Visualizing Traces

Understanding Lumigo's Transaction Map

Lumigo's transaction map provides a visual representation of your request flow. Here's how to read it:

Figure: Lumigo transaction map showing the complete request flow from API Gateway through Lambda functions, DynamoDB queries, and external API calls, with service interactions and a latency breakdown.

Transaction Map Layout:

┌─────────────────────────────────────────────────────┐
│ Transaction: Process Order (Total: 475ms)           │
├─────────────────────────────────────────────────────┤
│                                                     │
│ API Gateway [50ms]                                  │
│     │                                               │
│     ▼                                               │
│ Lambda: processOrder [420ms]                        │
│     ├─ DynamoDB: getUser [180ms] ⚠️                 │
│     ├─ Lambda: validatePayment [150ms]              │
│     │    └─ External: stripe [120ms]                │
│     └─ DynamoDB: saveOrder [90ms]                   │
│                                                     │
│ Legend:                                             │
│   ⚠️ = Slow (>150ms)                                │
│   ❌ = Error                                        │
│   🔵 = Cold Start                                   │
└─────────────────────────────────────────────────────┘

Key Features

Latency Waterfall Charts

Waterfall charts show the sequential timing of operations, making it easy to identify bottlenecks:

Figure: Lumigo latency waterfall chart showing sequential operation timing (API Gateway, Lambda functions, DynamoDB queries, and external API calls as duration bars) and identifying performance bottlenecks.

Waterfall Chart Example:

Time (ms)     0       100     200     300     400     500
              │       │       │       │       │       │
API Gateway   ████
Lambda Start      ████████████████████████████████
DynamoDB          ████████████████ (180ms - bottleneck!)
Lambda Call                       ████████████
Stripe API                          ████████
DynamoDB                                    ████████
Response                                            █

Total: 475ms
Bottleneck: DynamoDB getUser (180ms = 38%)

Payload Inspection

Lumigo captures request and response payloads automatically, which is invaluable for debugging:

Benefits

Data Redaction

For security and compliance, redact sensitive data:

# Redact specific fields
LUMIGO_REDACT_REGEX=.*password.*|.*token.*|.*secret.*

# Redact all payloads (only keep metadata)
LUMIGO_REDACT_ALL=true

# Programmatic redaction (Node.js)
const lumigo = require('@lumigo/tracer');
lumigo.redact(['password', 'ssn', 'creditCard']);
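To show what key-based redaction does to a payload, here is a minimal Python sketch that masks every field whose name matches a regex, including fields nested in objects and lists. It uses the same pattern as the environment variable example above; the `redact` function and the `****` placeholder are illustrative assumptions and do not describe how the Lumigo tracer implements redaction.

```python
import re

REDACTED = "****"


def redact(payload, pattern):
    """Return a copy of payload with values masked for keys matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)

    def walk(node):
        if isinstance(node, dict):
            return {
                key: REDACTED if regex.fullmatch(str(key)) else walk(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node  # scalars are kept as-is

    return walk(payload)


event = {
    "user": {"email": "a@example.com", "password": "hunter2", "apiToken": "t-123"},
    "items": [{"sku": "A1", "secretNote": "gift"}],
}

clean = redact(event, r".*password.*|.*token.*|.*secret.*")
print(clean)
```

Matching on key names rather than values is cheap and predictable, but it will not catch secrets stored under innocuous names, so treat it as one layer of protection rather than a guarantee.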

Error and Timeout Detection

Lumigo automatically detects and highlights:

Each error includes:

Correlating Traces with Logs and Metrics

Lumigo integrates with CloudWatch Logs and X-Ray for comprehensive observability:

7. Real-World Example: E-Commerce Order Processing

Let's walk through a complete example: an e-commerce order processing system. This will show how Lumigo captures a real-world distributed flow.

Architecture

Order Processing Flow:

User → API Gateway → Lambda: createOrder
  │
  ├─► DynamoDB: Orders (write)
  ├─► SQS: order-queue (send message)
  └─► Response: Order ID

SQS → Lambda: processOrder
  │
  ├─► Lambda: validatePayment (invoke)
  │     └─► External API: Stripe
  ├─► DynamoDB: Inventory (check/update)
  ├─► SNS: order-processed (publish)
  └─► S3: order-receipt (upload)

SNS → Lambda: sendConfirmationEmail
  └─► SES: Send email

Implementation

Here's the code for the two main Lambdas, createOrder and processOrder:

// createOrder.js
const { randomUUID } = require('crypto');
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
const sqs = new AWS.SQS();

// One simple way to generate unique order IDs
const generateOrderId = () => randomUUID();

exports.handler = async (event) => {
  const orderId = generateOrderId();
  const userId = event.requestContext.authorizer.userId;
  // API Gateway proxy integrations deliver the body as a JSON string
  const { items } = JSON.parse(event.body);

  // Create order in DynamoDB
  await dynamodb.put({
    TableName: 'Orders',
    Item: {
      orderId,
      userId,
      items,
      status: 'pending',
      createdAt: new Date().toISOString()
    }
  }).promise();

  // Send to processing queue
  await sqs.sendMessage({
    QueueUrl: process.env.ORDER_QUEUE_URL,
    MessageBody: JSON.stringify({ orderId, userId })
  }).promise();

  return {
    statusCode: 200,
    body: JSON.stringify({ orderId, status: 'created' })
  };
};

// processOrder.js
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();
const dynamodb = new AWS.DynamoDB.DocumentClient();
const sns = new AWS.SNS();

exports.handler = async (event) => {
  // Assumes a batch size of 1; loop over event.Records for larger batches
  const { orderId, userId } = JSON.parse(event.Records[0].body);

  // Validate payment
  const paymentResult = await lambda.invoke({
    FunctionName: 'validatePayment',
    Payload: JSON.stringify({ orderId, userId })
  }).promise();

  if (JSON.parse(paymentResult.Payload).status !== 'success') {
    throw new Error('Payment validation failed');
  }

  // Load the order written by createOrder to get its line items
  const { Item: order } = await dynamodb.get({
    TableName: 'Orders',
    Key: { orderId }
  }).promise();

  // Update inventory
  for (const item of order.items) {
    await dynamodb.update({
      TableName: 'Inventory',
      Key: { productId: item.productId },
      UpdateExpression: 'SET quantity = quantity - :qty',
      ExpressionAttributeValues: { ':qty': item.quantity }
    }).promise();
  }

  // Publish to SNS
  await sns.publish({
    TopicArn: process.env.ORDER_PROCESSED_TOPIC,
    Message: JSON.stringify({ orderId, status: 'processed' })
  }).promise();

  return { statusCode: 200 };
};

Sample Trace Output

Here's what Lumigo captures for this flow (simplified JSON):

{
 "traceId": "abc123xyz",
 "transactionName": "Process Order",
 "duration": 1250,
 "spans": [
 {
 "name": "API Gateway",
 "service": "apigateway",
 "duration": 45,
 "tags": {
 "http.method": "POST",
 "http.path": "/orders",
 "http.status_code": 200
 }
 },
 {
 "name": "Lambda: createOrder",
 "service": "lambda",
 "duration": 320,
 "coldStart": false,
 "spans": [
 {
 "name": "DynamoDB.Put",
 "service": "dynamodb",
 "duration": 85,
 "tags": {
 "table": "Orders",
 "operation": "PutItem"
 }
 },
 {
 "name": "SQS.SendMessage",
 "service": "sqs",
 "duration": 45,
 "tags": {
 "queue": "order-queue"
 }
 }
 ]
 },
 {
 "name": "Lambda: processOrder",
 "service": "lambda",
 "duration": 885,
 "coldStart": true,
 "coldStartDuration": 1200,
 "spans": [
 {
 "name": "Lambda.Invoke: validatePayment",
 "service": "lambda",
 "duration": 450,
 "spans": [
 {
 "name": "HTTP Request",
 "service": "external",
 "duration": 380,
 "tags": {
 "http.url": "https://api.stripe.com/v1/charges",
 "http.status_code": 200
 }
 }
 ]
 },
 {
 "name": "DynamoDB.Update",
 "service": "dynamodb",
 "duration": 280,
 "tags": {
 "table": "Inventory"
 }
 },
 {
 "name": "SNS.Publish",
 "service": "sns",
 "duration": 155,
 "tags": {
 "topic": "order-processed"
 }
 }
 ]
 }
 ],
 "errors": [],
 "metadata": {
 "region": "us-east-1",
 "accountId": "123456789012"
 }
}

What This Trace Reveals

From this trace, we can immediately see:
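One way to pull findings like these out of a trace is a short script. The Python sketch below walks a trimmed copy of the sample trace (tags omitted), finds the slowest leaf operation, and lists the functions that hit a cold start. The field names come from the simplified JSON above, which is itself an illustration rather than Lumigo's export format.

```python
# Trimmed copy of the sample trace: names, durations, and cold start flags only.
TRACE = {
    "duration": 1250,
    "spans": [
        {"name": "API Gateway", "duration": 45},
        {"name": "Lambda: createOrder", "duration": 320, "coldStart": False,
         "spans": [
             {"name": "DynamoDB.Put", "duration": 85},
             {"name": "SQS.SendMessage", "duration": 45},
         ]},
        {"name": "Lambda: processOrder", "duration": 885, "coldStart": True,
         "coldStartDuration": 1200,
         "spans": [
             {"name": "Lambda.Invoke: validatePayment", "duration": 450,
              "spans": [{"name": "HTTP Request", "duration": 380}]},
             {"name": "DynamoDB.Update", "duration": 280},
             {"name": "SNS.Publish", "duration": 155},
         ]},
    ],
}


def walk(spans):
    """Yield every span in the tree, depth first."""
    for span in spans:
        yield span
        yield from walk(span.get("spans", []))


all_spans = list(walk(TRACE["spans"]))
leaves = [s for s in all_spans if not s.get("spans")]
slowest = max(leaves, key=lambda s: s["duration"])
share = round(100 * slowest["duration"] / TRACE["duration"])
cold_starts = [s["name"] for s in all_spans if s.get("coldStart")]

print(f"Slowest operation: {slowest['name']} ({slowest['duration']}ms, {share}%)")
print(f"Cold starts: {cold_starts}")
```

For this trace the script points at the external HTTP request made by validatePayment (380ms) and flags processOrder as a cold start, which are the two places to look first.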

8. Best Practices

Minimize Noise and Control Sampling

High-traffic applications can generate millions of traces. Use sampling to control costs and focus on what matters:

# Sample 10% of traces (reduce costs by 90%)
LUMIGO_SAMPLE_RATE=0.1

# Sample 100% of errors, 10% of successful requests
# (Requires custom logic in your code)
const lumigo = require('@lumigo/tracer');

exports.handler = lumigo.trace(async (event) => {
 const shouldTrace = event.isError || Math.random() < 0.1;
 
 if (shouldTrace) {
 lumigo.setTraceSampled(true);
 }
 
 // Your code...
});
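The sampling rule described above (keep every error, keep a fraction of successful requests) can be sketched independently of any tracer. The Python function below is a hypothetical helper, not a Lumigo API. It hashes the trace ID instead of calling a random number generator, so every service that sees the same trace ID makes the same keep-or-drop decision.

```python
import hashlib


def keep_trace(trace_id, is_error, sample_rate):
    """Keep every error; keep a deterministic fraction of successful traces."""
    if is_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Roughly 10% of successful traces are kept at a 0.1 sample rate.
kept = sum(keep_trace(f"trace-{i}", False, 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 successful traces")
print(keep_trace("trace-42", True, 0.0))  # errors are always kept
```

Because the decision needs to know whether the request failed, it has to be made at the end of the request (tail-based sampling), which means span data must be buffered until then.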

Secure Sensitive Data

Always redact sensitive information:

# Environment variables
LUMIGO_REDACT_REGEX=.*password.*|.*token.*|.*secret.*|.*api[_-]?key.*
LUMIGO_REDACT_ALL=false # Only redact matching fields

# Programmatic (Node.js)
const lumigo = require('@lumigo/tracer');
lumigo.redact([
 'password',
 'creditCard',
 'ssn',
 'apiKey',
 'authorization'
]);

Integrate with CI/CD

Add Lumigo to your deployment pipeline:

# GitHub Actions example
name: Deploy Lambda with Lumigo

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Deploy with Lumigo
        run: |
          aws lambda update-function-configuration \
            --function-name my-function \
            --layers arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:XXX \
            --environment "Variables={LUMIGO_TRACER_TOKEN=${{ secrets.LUMIGO_TOKEN }},LUMIGO_DEBUG=false}"

Use Lumigo Alerts for SLOs

Set up alerts based on trace data:
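As an example of the kind of check an SLO alert encodes, the Python sketch below computes p95 latency and error rate over a window of traces and reports which thresholds were breached. The trace record shape, the `slo_breaches` helper, and the thresholds are all assumptions made for illustration; in practice the alert rules would be configured in your monitoring tool rather than in code.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slo_breaches(traces, p95_limit_ms, max_error_rate):
    """Return a list of human-readable SLO breaches for a window of traces."""
    p95 = percentile([t["duration_ms"] for t in traces], 95)
    error_rate = sum(1 for t in traces if t["error"]) / len(traces)

    breaches = []
    if p95 > p95_limit_ms:
        breaches.append(f"p95 latency {p95}ms exceeds {p95_limit_ms}ms")
    if error_rate > max_error_rate:
        breaches.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%}")
    return breaches


# 18 fast requests, 2 slow ones, 1 of which failed.
window = (
    [{"duration_ms": 180, "error": False}] * 18
    + [{"duration_ms": 900, "error": False}, {"duration_ms": 900, "error": True}]
)

for breach in slo_breaches(window, p95_limit_ms=500, max_error_rate=0.01):
    print(breach)
```

Alerting on a percentile rather than the average keeps a handful of very slow requests from being hidden by many fast ones.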

Monitor Latency Hotspots

Use Lumigo's service map to identify slow services:

9. Troubleshooting & Common Pitfalls

Missing Spans

Problem: Some operations aren't showing up in traces.

Common Causes:

Solutions:

Cold Start Confusion

Problem: Traces show high latency, but it's unclear if it's cold start or actual execution time.

Solution: Lumigo automatically marks cold starts. Look for the cold start indicator in the trace. If cold starts are frequent, consider:

Improper Environment Variables

Problem: Traces not appearing, or errors in logs.

Common Issues:

Verification:

# Check environment variables
aws lambda get-function-configuration \
 --function-name my-function \
 --query 'Environment.Variables'

# Test token validity
curl -H "Authorization: Bearer $LUMIGO_TRACER_TOKEN" \
 https://api.lumigo.io/v1/traces
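The configuration check can also be automated. The Python sketch below takes a dict shaped like the JSON that `aws lambda get-function-configuration` prints (with its `Environment.Variables` and `Layers[].Arn` fields) and lists common problems. The `check_lumigo_config` helper is hypothetical, and the layer test assumes the layer ARN contains `:layer:lumigo-`, as in the ARNs used throughout this guide.

```python
def check_lumigo_config(config):
    """Return a list of problems found in a Lambda function configuration."""
    problems = []

    variables = config.get("Environment", {}).get("Variables", {})
    token = variables.get("LUMIGO_TRACER_TOKEN", "")
    if not token:
        problems.append("LUMIGO_TRACER_TOKEN is not set")
    elif token != token.strip():
        problems.append("LUMIGO_TRACER_TOKEN has leading or trailing whitespace")

    layer_arns = [layer.get("Arn", "") for layer in config.get("Layers", [])]
    if not any(":layer:lumigo-" in arn for arn in layer_arns):
        problems.append("no Lumigo layer attached")

    return problems


good = {
    "FunctionName": "my-function",
    "Environment": {"Variables": {"LUMIGO_TRACER_TOKEN": "t_example"}},
    "Layers": [{"Arn": "arn:aws:lambda:us-east-1:114300393969:layer:lumigo-node:1"}],
}
bad = {"FunctionName": "my-function", "Environment": {"Variables": {}}}

print(check_lumigo_config(good))
print(check_lumigo_config(bad))
```

Running a check like this across all functions after a deploy catches the functions that were missed or had their variables overwritten.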

Network/Permission Issues

Problem: Tracer can't send data to Lumigo collector.

Check:

10. Conclusion & Next Steps

Distributed tracing with Lumigo transforms how you understand and optimize your serverless and microservices architectures. By automatically instrumenting your functions and connecting async flows, Lumigo gives you the visibility you need to:

Recommended Next Steps

  1. Instrument All Functions: Add Lumigo to all your Lambda functions, not just the critical ones
  2. Set Up Dashboards: Create custom dashboards for your key metrics (error rate, latency, throughput)
  3. Configure Alerts: Set up alerts for error rates, latency spikes, and cold start increases
  4. Optimize Hotspots: Use trace data to identify and fix the top 5 performance bottlenecks
  5. Integrate with Logging: Connect Lumigo traces with CloudWatch Logs for complete debugging context
  6. Share with Team: Train your team on reading traces and using Lumigo for debugging
Real-World Impact: In a production deployment, implementing Lumigo distributed tracing helped us identify 8 slow API endpoints, optimize DynamoDB queries, reduce cold starts by 60%, and achieve a 57% reduction in average API response time - from 420ms to 180ms. The investment in observability paid for itself within the first month through reduced debugging time and improved user experience.

Start with instrumenting your most critical functions, then expand coverage as you see the value. The zero-code instrumentation makes it easy to get started, and the deep insights will quickly become indispensable for your operations.
