Building a Text-to-Speech Converter using AWS Lambda and Polly.

This time, I challenged myself to build a web application entirely using AWS services. While the idea of connecting multiple services together felt slightly overwhelming at first, the process turned out to be not only educational but also incredibly fun! 😄

The goal? To create a Text-to-Speech converter, a web app where users can input any text and receive spoken audio in return, powered by Amazon Polly.

This wasn’t just about writing a Lambda function; it was about understanding how AWS services like IAM, API Gateway, and Polly come together to deliver a real-world cloud solution.

What I Built:

A simple yet powerful Text-to-Speech web app that:

Accepts user input via a web page
Sends the text to a Lambda function
Uses Amazon Polly to convert text into speech
Streams back the audio as a playable response

AWS Services used:

IAM
AWS Lambda
Amazon Polly
API Gateway
S3

Implementation steps.

Setting up the IAM Role.

→ The first step was to create an IAM role using the “Create role” action, so that the Lambda function could access other AWS services securely.

→ Since Lambda, which is itself an AWS service, would be using this role, I set the trusted entity type to “AWS service” and the use case to “Lambda”.

→ Next, I attached the required permissions: AmazonPollyFullAccess (to let the function interact with the Polly service) and AWSLambdaBasicExecutionRole (which lets the function write its logs to CloudWatch).

→ After reviewing the permissions and settings, I gave the role a meaningful name to easily identify it later and created it successfully.
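
For reference, the console steps above could also be scripted. The following is only a minimal sketch, assuming boto3 is available and that `iam_client` is a `boto3.client("iam")` instance; the function names and the role name passed in are hypothetical and not part of the original setup.

```python
import json

# Managed policies named in the steps above.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonPollyFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def build_trust_policy():
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_role(iam_client, role_name):
    """Create the role and attach both managed policies.

    iam_client is assumed to be boto3.client("iam"); role_name is a
    hypothetical name chosen by the caller.
    """
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    for arn in POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

The trust policy is what the "trusted entity type" choice in the console produces: it names `lambda.amazonaws.com` as the only principal allowed to assume the role.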

Creating the Lambda function.

→ Once the IAM role was ready, I moved on to building the core component of the project: the AWS Lambda function.

→ Since this project involved writing the code from scratch, I chose “Author from scratch”, which starts the function off with a Hello World boilerplate. I then filled in the basic information, such as the function name and the runtime (Python 3.9), and selected the existing IAM role created in the previous step.

→ Finally, I added this Python code with the Polly integration:

```
import base64

import boto3

polly_client = boto3.client('polly')


def lambda_handler(event, context):
    # queryStringParameters is None when the request carries no query string,
    # so fall back to an empty dict and reject requests without any text.
    params = event.get('queryStringParameters') or {}
    text = params.get('text')
    if not text:
        return {
            'statusCode': 400,
            'headers': {'Access-Control-Allow-Origin': '*'},
            'body': 'Missing required query parameter: text'
        }

    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat='mp3',
        # VoiceId is a required parameter of Polly's synthesize_speech call.
        # It selects the voice used to convert the text to speech.
        # For example, Raveena is a female voice with an Indian English accent.
        VoiceId='Raveena'
    )

    audio = response['AudioStream'].read()

    # Convert the binary audio from Polly into Base64 text.
    # Base64 represents binary data using only text characters.
    # The Lambda response is serialized as JSON, which cannot carry raw bytes;
    # 'isBase64Encoded': True tells API Gateway to decode the body again.
    encoded_audio = base64.b64encode(audio).decode('utf-8')

    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'audio/mpeg',
            'Access-Control-Allow-Origin': '*'
        },
        'body': encoded_audio,
        'isBase64Encoded': True
    }
```

Testing the Code.

→ Before moving on to API Gateway, I needed to test the Python code to make sure the Polly integration was working correctly. I clicked “Test” in the Lambda console and configured a new test event with a name and the API Gateway AWS Proxy template.

→ In the event JSON, I passed a sample query string parameter with the text I wanted to convert.
```
{
  "resource": "/speech",
  "path": "/speech",
  "httpMethod": "GET",
  "queryStringParameters": {
    "text": "Testing AWS Lambda and Polly!"
  },
  "headers": {},
  "multiValueHeaders": {},
  "requestContext": {},
  "body": null,
  "isBase64Encoded": false
}
```
→ When I ran the test, the function executed successfully and returned a long base64-encoded string.
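
One way to check such a base64 string locally is to decode it and write the bytes to an .mp3 file. This is a minimal sketch, assuming the test result has been copied out as the JSON response shown by the Lambda console; the function names and the `speech.mp3` output path are hypothetical.

```python
import base64
import json


def extract_audio_bytes(lambda_response):
    """Return the raw audio bytes from a proxy-style Lambda response.

    Accepts either the response dict or its JSON text.
    """
    if isinstance(lambda_response, str):
        lambda_response = json.loads(lambda_response)
    body = lambda_response["body"]
    if lambda_response.get("isBase64Encoded"):
        return base64.b64decode(body)
    # A body that is not flagged as Base64 is treated as plain text.
    return body.encode("utf-8")


def save_audio(lambda_response, path="speech.mp3"):
    """Decode the response body and write it to path; returns bytes written."""
    audio = extract_audio_bytes(lambda_response)
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Calling `save_audio(copied_response)` should leave a file that any audio player can open, provided the body really is MP3 data.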

→ To confirm that this encoded string was valid audio, I copied the response and used Base64 Guru to decode and play it. The result? It played exactly as expected.

Creating the API Gateway Endpoint.

→ After confirming that the Lambda function worked as expected, I proceeded to expose it via API Gateway so that it could be triggered through a browser-based frontend.

→ I created an HTTP API and configured it with an appropriate name and IP address type.

→ I then added a new route with the GET method and defined the path as /speech. For the integration target, I selected the Lambda function I created earlier. This essentially allowed the frontend to send a GET request with text and receive audio in return.
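
The API, integration, and route described above could be created programmatically along these lines. This is a rough sketch rather than the setup actually used: it assumes `apigw_client` is a `boto3.client("apigatewayv2")` instance, that `lambda_arn` is the ARN of the function, and that payload format version 2.0 is acceptable. The API name is hypothetical, and the permission that lets API Gateway invoke the function (which the console adds for you) is not covered here.

```python
def create_speech_route(apigw_client, lambda_arn, api_name="text-to-speech-api"):
    """Create an HTTP API with a GET /speech route backed by a Lambda function.

    apigw_client is assumed to be boto3.client("apigatewayv2");
    api_name is a hypothetical default.
    """
    api = apigw_client.create_api(Name=api_name, ProtocolType="HTTP")
    api_id = api["ApiId"]

    # AWS_PROXY passes the whole request to Lambda as the event.
    integration = apigw_client.create_integration(
        ApiId=api_id,
        IntegrationType="AWS_PROXY",
        IntegrationUri=lambda_arn,
        PayloadFormatVersion="2.0",
    )

    # The route key combines the HTTP method and the path.
    apigw_client.create_route(
        ApiId=api_id,
        RouteKey="GET /speech",
        Target="integrations/" + integration["IntegrationId"],
    )
    return api_id
```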

→ Finally, I created a stage named dev, enabled Auto Deploy so I wouldn’t need to manually push changes, and noted the invoke URL, which I’d use in the frontend code.
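
To illustrate how a client would use the invoke URL, here is a small Python sketch that builds the request URL and saves the returned audio. It assumes the invoke URL already includes the dev stage and that API Gateway returns the decoded MP3 bytes; the function names and output path are hypothetical, and the URL in the usage note is a placeholder, not a real endpoint.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_speech_url(invoke_url, text):
    """Append the /speech route and a URL-encoded text query parameter."""
    return invoke_url.rstrip("/") + "/speech?" + urlencode({"text": text})


def fetch_speech(invoke_url, text, path="speech.mp3"):
    """Request the audio for text and write it to path; returns bytes written."""
    with urlopen(build_speech_url(invoke_url, text)) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `build_speech_url("https://example.invalid/dev", "Hello world!")` yields `https://example.invalid/dev/speech?text=Hello+world%21`. Encoding the text matters because spaces and punctuation are not valid in a raw query string.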