Connecting the dots with OpenTelemetry — Part-II

Mohit Shukla
Engineering at Bureau
May 25, 2023

Co-Authored by Nisarg Tike

Objective

This post explains how we instrument AWS Lambda functions with OpenTelemetry.

Background

In our previous blog post, we introduced OpenTelemetry and discussed its implementation at Bureau. Today, we are going to delve deeper into our infrastructure and identify a significant blind spot that needs addressing.

As an identity verification provider, we sometimes rely on third-party sources such as PAN and driving license databases. However, these services are not always reliable, which can affect the accuracy of our own service. To improve our offering and give our customers better error feedback, we need more intelligence from our systems about how these services are performing.

At present, almost all of our applications are instrumented with OTEL, and we use the resulting traces for observability, alerting, and performance analysis. However, we had no visibility into what happens when a service calls downstream systems such as Lambda functions, which in turn call third parties. This left a gap in our understanding of the performance and downtime of those third parties. Hence, we decided to instrument the Lambdas with OTEL and use the spans from the services through to the Lambdas to understand the system better.

Introduction

An AWS Lambda function runs independently in its own container as a serverless unit. Once execution finishes and the response is returned, Lambda freezes the execution environment. As a result, asynchronous processes running on the side are frozen or killed, which makes exporting tracing data from the function unreliable.

To solve this, we can use AWS Lambda layers, which let developers package and reuse shared code and dependencies, such as libraries and custom runtimes, across multiple functions. The layer subscribes to the Lambda function’s Telemetry API, which gives it access to trace data even after the function execution has completed. Additionally, AWS maintains its own distribution of the OpenTelemetry Collector, ADOT (AWS Distro for OpenTelemetry), which can be used as a Lambda layer to collect and process telemetry data.

OTEL Lambda Layer

How does it work?

  • The ADOT layer runs the OTEL collector inside the Lambda environment, listening for HTTP on port 4318. Spans are pushed to the ADOT collector on this endpoint using the Lambda extensions Telemetry API.
  • The ADOT collector gathers trace data from the Lambda layers and forwards it to a central OTEL collector service. That central collector is hosted internally on AWS ECS and receives data over gRPC on port 4317. The collected data is then processed and exported to services such as New Relic and Honeycomb for further analysis and visualization.

Steps

  1. Add the following instrumentation code. The `main` function can be replaced as follows:
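// The Lambda entry point.
import (
	"context"
	"os"

	otelExporter "github.com/Bureau-Inc/overwatch-common/otel"
	"github.com/aws/aws-lambda-go/lambda"
	"go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda"
)

// AWSLambdaFunctionName is the environment variable in which Lambda exposes the function name.
var AWSLambdaFunctionName = "AWS_LAMBDA_FUNCTION_NAME"

func main() {
	exporterConfig := otelExporter.OtelExporterConfig{
		SvcName:  os.Getenv(AWSLambdaFunctionName),
		Endpoint: os.Getenv("OTEL_EXPORTER_ENDPOINT"),
		UseHTTP:  true,
	}

	// Install the OTel exporter. lambda.Start blocks while the function serves
	// invocations, so the deferred cleanup is only a safeguard.
	cleanup := otelExporter.InstallOtelExporter(context.Background(), exporterConfig)
	defer cleanup()
	lambda.Start(otellambda.InstrumentHandler(Handler))
}

// The common otelExporter package.
import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

type OtelExporterConfig struct {
	SvcName  string
	ApiKey   string
	Endpoint string
	UseHTTP  bool
}

// InstallOtelExporter registers a global tracer provider that exports over
// OTLP/HTTP and returns a function that shuts the provider down.
func InstallOtelExporter(ctx context.Context, cfg OtelExporterConfig) func() {
	opts := []otlptracehttp.Option{
		otlptracehttp.WithEndpoint(cfg.Endpoint),
	}

	// UseHTTP selects plain HTTP (no TLS) for the exporter connection.
	if cfg.UseHTTP {
		opts = append(opts, otlptracehttp.WithInsecure())
	}

	client := otlptracehttp.NewClient(opts...)

	exporter, err := otlptrace.New(ctx, client)
	if err != nil {
		log.Fatalf("creating OTLP trace exporter: %v", err)
	}

	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		// newResource is defined elsewhere in the common package (not shown here).
		sdktrace.WithResource(newResource(cfg.SvcName)),
	)

	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
	otel.SetTracerProvider(tracerProvider)

	return func() {
		if err := tracerProvider.Shutdown(ctx); err != nil {
			log.Fatalf("stopping tracer provider: %v", err)
		}
	}
}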

We use our common otelExporter package here; the official OpenTelemetry exporter packages can be used instead.

2. The following snippet reads the trace and span IDs from the request headers and attaches them to the context in the Handler function.

import (
	"context"
	"log"

	"go.opentelemetry.io/otel/trace"
)

type (
	LambdaRequest struct {
		Body              string              `json:"body"`
		MultiValueHeaders map[string][]string `json:"multiValueHeaders"`
	}

	LambdaResponse struct {
		StatusCode        int                 `json:"statusCode"`
		MultiValueHeaders map[string][]string `json:"multiValueHeaders"`
		Body              string              `json:"body"`
	}
)

func Handler(ctx context.Context, req LambdaRequest) (LambdaResponse, error) {
	...
	traceId := getTraceId(req)
	spanId := getSpanId(req)
	spanContext, err := constructNewSpanContext(traceId, spanId)
	if err != nil {
		log.Printf("Error constructNewSpanContext %s", err.Error())
	} else {
		// Attach the caller's span context only when both IDs parsed cleanly.
		ctx = trace.ContextWithSpanContext(ctx, spanContext)
	}
	...
}

// getTraceId returns the first x-bureau-trace-id header value, or "" if absent.
func getTraceId(req LambdaRequest) string {
	if req.MultiValueHeaders != nil {
		if val, ok := req.MultiValueHeaders["x-bureau-trace-id"]; ok {
			if len(val) > 0 {
				return val[0]
			}
		}
	}
	return ""
}

// getSpanId returns the first x-bureau-span-id header value, or "" if absent.
func getSpanId(req LambdaRequest) string {
	if req.MultiValueHeaders != nil {
		if val, ok := req.MultiValueHeaders["x-bureau-span-id"]; ok {
			if len(val) > 0 {
				return val[0]
			}
		}
	}
	return ""
}

// constructNewSpanContext builds a sampled span context from hex-encoded IDs.
func constructNewSpanContext(traceId, spanId string) (spanContext trace.SpanContext, err error) {
	traceID, err := trace.TraceIDFromHex(traceId)
	if err != nil {
		return spanContext, err
	}
	spanID, err := trace.SpanIDFromHex(spanId)
	if err != nil {
		return spanContext, err
	}
	spanContextConfig := trace.SpanContextConfig{
		TraceID:    traceID,
		SpanID:     spanID,
		TraceFlags: trace.FlagsSampled, // same value as the original literal 01
		Remote:     false,
	}
	spanContext = trace.NewSpanContext(spanContextConfig)
	return spanContext, nil
}

Adding Span Attributes

When manually instrumenting an application with OpenTelemetry, it’s important to include default HTTP span attributes in order to provide additional contextual information for each span. These attributes can later be used for analysis, visualization, querying, and data filtering.

import (
	"context"
	"fmt"
	"net/http"
	"strings"

	otelExporter "github.com/Bureau-Inc/overwatch-common/otel"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

type (
	HttpContext struct {
		Client      http.Client
		URL         string
		Method      string
		ReqBody     []byte
		ReqHeaders  map[string]string
		RespBody    []byte
		StatusCode  int
		RespHeaders http.Header
	}
)


// NewClientSpan starts a span of kind Client for an outgoing call.
func NewClientSpan(ctx context.Context, spanName string) (context.Context, trace.Span) {
	sctx, span := startSpan(ctx, spanName, trace.SpanKindClient)
	return sctx, span
}

// SpanEndWithError records err on the span, marks it as failed, and ends it.
func SpanEndWithError(span trace.Span, err error) {
	span.RecordError(err)
	span.SetStatus(codes.Error, err.Error())
	span.End()
}

func startSpan(ctx context.Context, spanName string, kind trace.SpanKind) (context.Context, trace.Span) {
	return otel.GetTracerProvider().Tracer("").Start(ctx, spanName, trace.WithSpanKind(kind))
}

func (r *HttpContext) TriggerRequest(ctx context.Context) error {
	bctx, span := otelExporter.NewClientSpan(ctx, "calling-http-endpoint")
	defer span.End()

	// Build the request with the span's context so the call is tied to this span.
	httpRequest, err := http.NewRequestWithContext(
		bctx,
		strings.ToUpper(r.Method),
		r.URL,
		strings.NewReader(string(r.ReqBody)),
	)
	// Check the error before touching httpRequest, which is nil on failure.
	if err != nil {
		otelExporter.SpanEndWithError(span, err)
		...
	}

	span.SetAttributes(
		attribute.String("http.url", r.URL),
		attribute.String("http.flavor", httpRequest.Proto),
		attribute.String("http.host", httpRequest.Host),
		attribute.String("http.method", httpRequest.Method),
		attribute.String("http.request_content_length", fmt.Sprint(httpRequest.ContentLength)),
		attribute.String("http.scheme", httpRequest.URL.Scheme),
		attribute.String("http.target", httpRequest.URL.Path),
		attribute.String("http.user_agent", httpRequest.UserAgent()),
	)
	...
	statusCode := "200" // placeholder; use the status code of the actual response
	span.SetAttributes(attribute.String("http.status_code", statusCode))
	...
}

Remember to end the span whenever an error is encountered. In the case of an HTTP error from the third party, we can also add an attribute for the HTTP status code.

if err, ok := err.(net.Error); ok && err.Timeout() {
	log.ERROR("Gateway timeout!", tag.NewAnyError(err))
	eventContext.ResponseCode = http.StatusGatewayTimeout
	span.SetAttributes(attribute.String("http.status_code", fmt.Sprint(http.StatusGatewayTimeout)))
	otelExporter.SpanEndWithError(span, err)
	return gce.GatewayTimeoutAPIError()
}

We can use the below “collector.yaml” for the otel collector lambda layer.

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  logging:
    loglevel: debug
  otlp:
    endpoint: otel-collector-service:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, otlp]

Update the environment variables for the Lambda to enable the OTEL collector.

AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-instrument
OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/otel/collector.yaml
OTEL_EXPORTER_ENDPOINT: 0.0.0.0:4318
OTEL_PROPAGATORS: tracecontext

Command to update the function:

aws lambda update-function-configuration --function-name lambda-name --runtime provided.al2 \
--layers arn:aws:lambda:ap-south-1:901920570463:layer:aws-otel-collector-amd64-ver-0-68-0:1 \
--environment "Variables={OPENTELEMETRY_COLLECTOR_CONFIG_FILE=/var/task/otel/collector.yaml,AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument,OTEL_EXPORTER_ENDPOINT=0.0.0.0:4318,OTEL_PROPAGATORS=tracecontext}"

In action:

[Figure: span for HTTP requests]

Summary

In this article, we discussed how AWS Lambda can be instrumented with OpenTelemetry to gain more visibility into system behavior and improve the accuracy of services that rely on third-party sources. We walked through instrumenting Golang Lambda functions with OpenTelemetry and using the ADOT Lambda layer to collect each function’s tracing data, with steps and code samples for setting up ADOT to collect and export telemetry data to a central collector service for further processing.

Thank you for reading! I hope you found this article informative and helpful. If you have any feedback or questions, please don’t hesitate to contact me.

Special thanks to Shekh Ataul, nandeeshwar kodihalli, Abhinav Mishra, and Parth Shrivastava from the Bureau team, and to Jessica Kerr and Martin Thwaites from Honeycomb.
