
To add realtime audio transcriptions to a Dyte meeting, you can use the Google Cloud Speech-to-Text and Cloud Translation APIs.

These Google services are paid; a Google Cloud account is required to proceed.


This integration is Web-only at the moment.

Integration Steps

1. Setup Google Cloud Credentials

You must have a GCP (Google Cloud Platform) project and a service account to use Google transcriptions. Make sure the service account has access to the Speech-to-Text and Cloud Translation APIs.

Once done, download the keys for the service account.

2. Setup a Server

Set up a server to forward the audio data from the client to Google Cloud. You should not expose your GCP credentials on the client side, so you need a server that forwards the audio data to Google Cloud on the client's behalf.

For this, we have provided a sample in NodeJS for you to check out: [dyte-io/google-transcription](https://github.com/dyte-io/google-transcription). Currently, we only have NodeJS samples; if you're working with a different backend, feel free to port this code.

To use this sample, clone the repository using the following command.

git clone https://github.com/dyte-io/google-transcription.git

2.1 Environment Setup

cp .env.example .env

Edit the .env file with your GCP service account credentials and save it.

Note: PRIVATE_KEY should be on a single line. Copy the value from the service account's JSON key file as is.
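If the key you pasted contains literal line breaks, a small helper like the one below can collapse it onto a single line by escaping the newlines. This is a sketch, not part of the sample; the `toSingleLine` name is our own.

```javascript
// Escape real newlines in a PEM private key so it fits on one .env line.
// The service account JSON usually stores the key with "\n" escapes already;
// this is only needed if you pasted a key containing literal line breaks.
function toSingleLine(privateKey) {
  return privateKey.replace(/\r?\n/g, '\\n');
}

const multiLineKey = '-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n';
console.log(toSingleLine(multiLineKey));
```

Paste the resulting single-line value as PRIVATE_KEY in your .env file.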

2.2 Run the server

npm install

This automatically installs @google-cloud/speech and @google-cloud/translate.

npm run dev

The HTTP endpoint where this server is accessible will be referred to as backend_url for the remainder of this guide.

3. Frontend Setup

3.1 Installation

npm install @dytesdk/google-transcription

Source available at [dyte-io/google-transcription](https://github.com/dyte-io/google-transcription).

3.2 Integrate

Next, find the place in your codebase where you initialize a Dyte meeting.

Once you have found it and have a hold of the meeting object, add the following code to the file to import the SDK.

import DyteGoogleSpeechRecognition from '@dytesdk/google-transcription';

Add the following code just after the point where you have access to the meeting object.

const speech = new DyteGoogleSpeechRecognition({
  meeting, // Dyte meeting object from DyteClient.init
  target: 'hi', // Language that the current user wants to see
  source: 'en-US', // Language that the current user would speak in
  baseUrl: '<backend_url>', // Backend URL from step 2.2
});

speech.on('transcription', async (data) => {
  // ... do something with the transcription
});

speech.transcribe();
Here you set up GoogleSpeechRecognition with the current user's preferences and activate recognition just afterward using speech.transcribe(). Every new transcription is then received through speech.on('transcription', aJsCallbackFunction).
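As a sketch of what the callback might do, the helper below turns a list of transcriptions into caption lines for display. The `name` and `transcript` field names are assumptions about the payload shape, not a documented contract; check the actual data your handler receives.

```javascript
// Sketch: format a batch of transcriptions into caption lines.
// Field names (`name`, `transcript`) are assumed, not guaranteed by the SDK.
function formatCaptions(transcriptions) {
  return transcriptions
    .filter((t) => t.transcript && t.transcript.trim().length > 0) // drop empties
    .map((t) => `${t.name || 'Unknown'}: ${t.transcript.trim()}`)
    .join('\n');
}

const caption = formatCaptions([
  { name: 'Asha', transcript: 'Hello everyone' },
  { name: 'Ravi', transcript: '' }, // empty results are skipped
]);
console.log(caption); // "Asha: Hello everyone"
```

Inside the 'transcription' handler you could pass the accumulated transcriptions through such a helper and write the result into a captions element in your UI.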

To see the supported languages, please refer to Google Cloud's Speech-to-Text language support documentation.

With this, you will now receive live transcriptions. Feel free to render them in your UI as needed.

If you need a complete sample of this guide, please refer to the dyte-io/google-transcription repository.