Building Event-Driven Architectures with AWS SQS and Lambda
Learn how to decouple microservices and build resilient, scalable systems using Amazon SQS as a message broker with Lambda consumers.
Why Event-Driven?
In traditional request-response architectures, services are tightly coupled. When Service A calls Service B synchronously, a failure in B cascades to A. Event-driven architectures break this chain by introducing a message broker between producers and consumers.
Amazon SQS (Simple Queue Service) is one of the simplest and most reliable ways to achieve this on AWS.
The Architecture
┌──────────┐       ┌─────┐       ┌──────────┐
│ Producer │──────▶│ SQS │──────▶│  Lambda  │
│ Service  │       │Queue│       │ Consumer │
└──────────┘       └─────┘       └──────────┘
The producer publishes messages to an SQS queue. Lambda polls the queue and processes messages in batches. If processing fails, messages return to the queue (or move to a dead-letter queue after max retries).
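To make the flow concrete, here is a sketch of what might travel through the queue. The event shape and field names below are illustrative assumptions for this article's order example, not part of any AWS API:

```typescript
// Hypothetical event shape for the order example (an assumption, not an AWS type).
interface OrderPlacedEvent {
  orderId: string;
  customerId: string;
  placedAt: string; // ISO-8601 timestamp
}

// SQS message bodies are plain strings, so producers typically
// JSON-serialize the event before sending it.
function toMessageBody(event: OrderPlacedEvent): string {
  return JSON.stringify(event);
}

const body = toMessageBody({
  orderId: "order-123",
  customerId: "cust-9",
  placedAt: "2024-01-01T00:00:00Z",
});
```

On the producer side this string becomes the message body of an SDK SendMessage call; the consumer recovers the event with JSON.parse(record.body), as the handler later in this article does.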
Setting Up the Queue
Using the AWS CDK with TypeScript:
import { Duration } from "aws-cdk-lib";
import * as sqs from "aws-cdk-lib/aws-sqs";

const orderQueue = new sqs.Queue(this, "OrderQueue", {
  visibilityTimeout: Duration.seconds(60),
  retentionPeriod: Duration.days(7),
  deadLetterQueue: {
    maxReceiveCount: 3,
    queue: new sqs.Queue(this, "OrderDLQ", {
      retentionPeriod: Duration.days(14),
    }),
  },
});
Key settings:
- visibilityTimeout should be longer than your Lambda function's timeout (AWS recommends at least six times it for SQS event sources) so a message isn't redelivered while an invocation is still processing it
- deadLetterQueue catches messages that fail repeatedly so they don't block the queue
- retentionPeriod controls how long unprocessed messages stay in the queue
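The relationship between the two timeouts is worth making concrete. AWS's guidance for SQS event sources is a visibility timeout of at least six times the function timeout; the function timeout below is an assumed example, chosen so the result matches the queue configured above:

```typescript
// Assumed example: the consumer Lambda has a 10-second timeout.
const functionTimeoutSeconds = 10;

// AWS's guidance for SQS event sources: set the queue's visibility
// timeout to at least six times the function timeout, so a batch is
// not redelivered while an invocation is still running.
const minVisibilityTimeoutSeconds = functionTimeoutSeconds * 6;

console.log(minVisibilityTimeoutSeconds); // 60, matching the queue above
```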
The Lambda Consumer
import { SQSEvent, SQSRecord, SQSBatchResponse } from "aws-lambda";

export async function handler(event: SQSEvent): Promise<SQSBatchResponse> {
  const failedIds: string[] = [];

  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (error) {
      console.error(`Failed to process ${record.messageId}`, error);
      failedIds.push(record.messageId);
    }
  }

  // Partial batch failure reporting: only the listed messages are retried
  return {
    batchItemFailures: failedIds.map((id) => ({
      itemIdentifier: id,
    })),
  };
}
async function processRecord(record: SQSRecord): Promise<void> {
  const body = JSON.parse(record.body);
  // Process the order...
  console.log("Processing order:", body.orderId);
}
Partial batch failure reporting is critical. Without it, if one message in a batch of 10 fails, the whole batch returns to the queue and all 10 are retried. With batchItemFailures (and reportBatchItemFailures enabled on the event source), only the failed messages retry.
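To see the arithmetic, here is a pure sketch, independent of any AWS type, of how per-message outcomes map onto the response the handler returns:

```typescript
// Sketch: map per-message outcomes to a partial-batch response.
type Outcome = { messageId: string; ok: boolean };

function toBatchResponse(outcomes: Outcome[]) {
  return {
    batchItemFailures: outcomes
      .filter((o) => !o.ok)
      .map((o) => ({ itemIdentifier: o.messageId })),
  };
}

// One failure out of three: only "b" goes back to the queue.
const response = toBatchResponse([
  { messageId: "a", ok: true },
  { messageId: "b", ok: false },
  { messageId: "c", ok: true },
]);
```

An empty batchItemFailures array tells Lambda the whole batch succeeded, which is why the handler above can return it unconditionally.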
Wiring It Together
import * as lambdaEventSources from "aws-cdk-lib/aws-lambda-event-sources";
consumerFn.addEventSource(
  new lambdaEventSources.SqsEventSource(orderQueue, {
    batchSize: 10,
    maxBatchingWindow: Duration.seconds(5),
    reportBatchItemFailures: true,
  })
);
Monitoring and Alerting
Set CloudWatch alarms on these SQS metrics:
- ApproximateNumberOfMessagesVisible — messages waiting to be processed. A growing number means your consumer can't keep up
- ApproximateAgeOfOldestMessage — how long the oldest message has been waiting; a sudden spike indicates a processing bottleneck
- ApproximateNumberOfMessagesVisible on the DLQ — any message here failed 3+ times (the maxReceiveCount above) and needs investigation. Don't alarm on NumberOfMessagesReceived here: it only fires when a consumer polls the queue, and a DLQ typically has none
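In the CDK, these alarms can hang directly off the queue constructs; a minimal sketch, where the thresholds are illustrative choices and dlq is assumed to be a variable referencing the dead-letter queue created earlier:

```typescript
import { Duration } from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";

// Alarm when the oldest message has waited more than 5 minutes.
// (Threshold is an illustrative choice, not a recommendation.)
new cloudwatch.Alarm(this, "OldestMessageAlarm", {
  metric: orderQueue.metricApproximateAgeOfOldestMessage({
    period: Duration.minutes(1),
  }),
  threshold: 300, // seconds
  evaluationPeriods: 3,
});

// Alarm on any message landing in the DLQ.
// Assumes `dlq` holds a reference to the OrderDLQ queue.
new cloudwatch.Alarm(this, "DlqAlarm", {
  metric: dlq.metricApproximateNumberOfMessagesVisible(),
  threshold: 1,
  evaluationPeriods: 1,
});
```

Wire the alarms to an SNS topic or your paging tool of choice via alarm actions.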
Best Practices
- Make consumers idempotent. SQS guarantees at-least-once delivery, so your handler may process the same message twice.
- Set Lambda concurrency limits. Without them, a sudden flood of messages spawns hundreds of Lambda instances and may overwhelm downstream databases.
- Use message deduplication for FIFO queues when exactly-once processing matters.
- Keep messages small. Store large payloads in S3 and pass the S3 key in the SQS message.
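The idempotency point deserves a concrete shape. In production you would record processed message IDs in a durable store (for example, DynamoDB with a conditional write), since Lambda instances are ephemeral; the in-memory version below is only a sketch of the control flow:

```typescript
// In-memory sketch only: a real consumer needs a durable store
// (e.g. DynamoDB conditional put), because Lambda instances come and go.
const processed = new Set<string>();

function processOnce(messageId: string, work: () => void): boolean {
  if (processed.has(messageId)) {
    return false; // duplicate delivery: skip the work, report success to SQS
  }
  work();
  processed.add(messageId);
  return true;
}

let calls = 0;
const first = processOnce("msg-1", () => { calls++; });
const second = processOnce("msg-1", () => { calls++; }); // at-least-once redelivery
```

The key behavior: the duplicate delivery is acknowledged without re-running the side effect, so at-least-once delivery becomes effectively exactly-once processing.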
Conclusion
SQS + Lambda is one of the most battle-tested patterns on AWS. It gives you decoupling, automatic retry, and horizontal scaling with almost zero operational overhead. Start with a Standard queue for most use cases, and reach for FIFO only when message ordering is a hard requirement.