This time, I challenged myself to build a web application entirely using AWS services. While the idea of connecting multiple services together felt slightly overwhelming at first, the process turned out to be not only educational but also incredibly fun! 😄

The goal? To create a Text-to-Speech converter, a web app where users can input any text and receive spoken audio in return, powered by Amazon Polly.

This wasn’t just about writing a Lambda function; it was about understanding how AWS services like IAM, API Gateway, and Polly come together to deliver a real-world cloud solution.

What I Built:

A simple yet powerful Text-to-Speech web app that:

  1. Accepts user input via a web page
  2. Sends the text to a Lambda function
  3. Uses Amazon Polly to convert text into speech
  4. Streams back the audio as a playable response

AWS Services used:

  1. IAM
  2. AWS Lambda
  3. Amazon Polly
  4. API Gateway
  5. S3

Implementation steps:

  1. Setting up the IAM Role.

    → The first step was to create an IAM role, using the “Create role” action, that would allow the Lambda function to access other AWS services securely.

    Searching IAM.png

    IAM Create role action.png

    → Since the role would be used by Lambda, which is an AWS service, I set the trusted entity type to “AWS service” and the use case to “Lambda”.

    IAM Trusted Entity type.png

    IAM Use case.png

    → Next, I attached the required permissions: AmazonPollyFullAccess (to let the function interact with the Polly service) and AWSLambdaBasicExecutionRole (which lets the function write its logs to CloudWatch).

    Polly permission.png

    Lambda permission.png

    → After reviewing the permissions and settings, I gave the role a meaningful name to easily identify it later and created it successfully.

    Naming the IAM Role.png

    Role Creation confirmation.png
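
    → The same role can also be described in code. The block below is only a minimal sketch of the console steps above, not what I actually ran: the function name create_lambda_role and the role name you pass in are hypothetical, and calling it assumes boto3 is installed and AWS credentials with IAM permissions are configured.

```python
import json

# Trust policy matching the console choice: trusted entity "AWS service", use case "Lambda".
LAMBDA_TRUST_POLICY = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'lambda.amazonaws.com'},
        'Action': 'sts:AssumeRole',
    }],
}

# The two AWS managed policies attached in this step.
MANAGED_POLICY_ARNS = [
    'arn:aws:iam::aws:policy/AmazonPollyFullAccess',
    'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole',
]


def create_lambda_role(role_name):
    """Create the role and attach both policies (hypothetical helper)."""
    import boto3  # imported here so the constants above load without the AWS SDK

    iam = boto3.client('iam')
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role['Role']['Arn']
```

    Keeping the trust policy as plain data makes it easy to see that only the Lambda service is allowed to assume the role.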

  2. Creating the Lambda function.

    → Once the IAM role was ready, I moved on to building the core component of the project: the AWS Lambda function.

    Searching for AWS Lambda.png

    Create a function action.png

    → Since this project involved writing the code from scratch, I chose the “Author from scratch” option, which starts the function off with a Hello World boilerplate. I then filled in the basic information, such as the function name and the runtime (Python 3.9), and selected the existing IAM role created in the previous step.

    Lambda function basic information.png

    Existing role.png

    → Finally, I added this Python code with the Polly integration:

    import base64
    import json

    import boto3

    # Create the client once, outside the handler, so warm invocations can reuse it.
    polly_client = boto3.client('polly')

    def lambda_handler(event, context):

        # queryStringParameters is None when the request has no query string,
        # so fall back to an empty dict instead of raising an error.
        params = event.get('queryStringParameters') or {}
        text = params.get('text')

        if not text:
            return {
                'statusCode': 400,
                'headers': {
                    'Content-Type': 'application/json',
                    'Access-Control-Allow-Origin': '*'
                },
                'body': json.dumps({'error': 'Missing "text" query parameter'})
            }

        response = polly_client.synthesize_speech(
            Text=text,
            OutputFormat='mp3',

            # VoiceId is a required parameter of Polly's synthesize_speech API call.
            # It tells Polly which voice to use when converting text to speech.
            # For example, Raveena corresponds to a female voice with an Indian English accent.
            VoiceId='Raveena'
        )

        audio = response['AudioStream'].read()

        # This converts the binary audio data from Polly to text in Base64 format.
        # Base64 is a way to represent binary data using only text characters.
        # It is needed because the Lambda response is JSON, which can't carry raw binary.
        encoded_audio = base64.b64encode(audio).decode('utf-8')

        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'audio/mpeg',
                'Access-Control-Allow-Origin': '*'
            },
            'body': encoded_audio,
            # Tells API Gateway that the body is Base64-encoded binary data.
            'isBase64Encoded': True
        }
    
  3. Testing the Code.

    → Before moving on to API Gateway, I needed to test the Python code to make sure the Polly integration was working correctly. I clicked “Test” in the Lambda console and configured a new test event with a name and the API Gateway AWS Proxy template.

    Test action in Lambda console.png

    Test event template.png

    → In the event JSON, I passed a sample query string parameter with the text I wanted to convert.

    {
      "resource": "/speech",
      "path": "/speech",
      "httpMethod": "GET",
      "queryStringParameters": {
        "text": "Testing AWS Lambda and Polly!"
      },
      "headers": {},
      "multiValueHeaders": {},
      "requestContext": {},
      "body": null,
      "isBase64Encoded": false
    }
    
    

    → When I ran the test, the function executed successfully and returned a long base64-encoded string.

    Test Execution successful.png
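
    → The same event shape can also be exercised locally, without calling AWS at all. The block below is a minimal sketch and not the console test I ran: FakePollyClient and handle are hypothetical names, and the fake client returns placeholder bytes rather than real audio, so it only checks the request and response plumbing.

```python
import base64
import io


class FakePollyClient:
    """Hypothetical stand-in for boto3.client('polly'); returns fixed bytes."""

    def synthesize_speech(self, Text, OutputFormat, VoiceId):
        # Real Polly returns a streaming body; BytesIO offers the same read() call.
        return {'AudioStream': io.BytesIO(b'fake-mp3-bytes:' + Text.encode('utf-8'))}


def handle(event, polly_client):
    """Same flow as the Lambda handler, with the client passed in for testing."""
    text = event['queryStringParameters']['text']
    response = polly_client.synthesize_speech(
        Text=text, OutputFormat='mp3', VoiceId='Raveena'
    )
    audio = response['AudioStream'].read()
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'audio/mpeg', 'Access-Control-Allow-Origin': '*'},
        'body': base64.b64encode(audio).decode('utf-8'),
        'isBase64Encoded': True,
    }


# Trimmed-down version of the test event used in the Lambda console.
event = {
    'path': '/speech',
    'httpMethod': 'GET',
    'queryStringParameters': {'text': 'Testing AWS Lambda and Polly!'},
}

result = handle(event, FakePollyClient())
decoded = base64.b64decode(result['body'])
print(result['statusCode'], len(decoded))
```

    Passing the client in as an argument is a design choice made only for this sketch; it lets the fake replace the real Polly client without any patching.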

    → To confirm that this encoded string was valid audio, I copied the response and used Base64 Guru to decode and play the audio. The result? It played exactly as expected.

    Base64 Guru.png
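
    → The decoding could also be done locally instead of in an online tool. This is a minimal sketch, assuming the body field of the Lambda response has been copied into a string: save_base64_audio is a hypothetical helper, and the sample string below is placeholder data, not real Polly output.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_audio(encoded_audio, output_path):
    """Decode a Base64 string and write the bytes to a file (hypothetical helper)."""
    audio_bytes = base64.b64decode(encoded_audio)
    Path(output_path).write_bytes(audio_bytes)
    return len(audio_bytes)


# Placeholder standing in for the 'body' value copied from the Lambda test result.
sample_body = base64.b64encode(b'ID3 demo bytes').decode('utf-8')

output_file = Path(tempfile.gettempdir()) / 'speech_test.mp3'
bytes_written = save_base64_audio(sample_body, output_file)
print(f'Wrote {bytes_written} bytes to {output_file}')
```

    With a real response body, the resulting .mp3 file should open in any audio player.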

  4. Creating the API Gateway Endpoint.

    → After confirming that the Lambda function worked as expected, I proceeded to expose it via API Gateway so that it could be triggered through a browser-based frontend.

    API Gateway Search.png

    → I created an HTTP API and configured it with an appropriate name and IP address type.

    HTTP API.png

    API name and IP Address type.png

    → I then added a new route with the GET method and defined the path as /speech. For the integration target, I selected the Lambda function I created earlier. This essentially allowed the frontend to send a GET request with text and receive audio in return.

    API Gateway Integration.png

    API Gateway Route configuration.png

    → Finally, I created a stage named dev, enabled Auto Deploy so I wouldn’t need to manually push changes, and noted the invoke URL, which I’d use in the frontend code.

    API Gateway Invoke URL.png

    API Gateway Stage configuration.png
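
    → Before wiring up a frontend, the endpoint can be checked from a small script. This is a minimal sketch under a few assumptions: the invoke URL below is a made-up placeholder, build_speech_url and fetch_speech are hypothetical helpers, and fetch_speech assumes API Gateway has already decoded the Base64 body back into raw MP3 bytes.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical invoke URL; replace it with the one shown for your dev stage.
INVOKE_URL = 'https://example123.execute-api.us-east-1.amazonaws.com/dev'


def build_speech_url(invoke_url, text):
    """Append the /speech route and a percent-encoded text query parameter."""
    query = urlencode({'text': text}, quote_via=quote)
    return f"{invoke_url.rstrip('/')}/speech?{query}"


def fetch_speech(invoke_url, text, output_path):
    """Request the audio and save it; needs network access and a deployed API."""
    with urlopen(build_speech_url(invoke_url, text), timeout=30) as response:
        audio = response.read()
    with open(output_path, 'wb') as audio_file:
        audio_file.write(audio)
    return len(audio)


print(build_speech_url(INVOKE_URL, 'Hello from Polly!'))
```

    Percent-encoding the text matters because spaces and punctuation are not valid in a raw query string; only build_speech_url runs here, since fetch_speech needs the real endpoint.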