To add the realtime audio transcriptions in a Dyte meeting you can use Google Cloud Speech-to-Text and Cloud translation APIs.
These Google services are paid, a Google Cloud account is required to proceed.
This integration is Web only at the moment
Integration Steps
1. Setup Google Cloud Credentials​
You must have a project & a service account with GCP (Google Cloud Platform) to use Google transcriptions. Make sure that service account allows Google Speech-to-Text and Google Translation API.
Once done, download the keys for the service account.
2. Setup a Server​
Setup a server to forward the Audio Data from client to Google Cloud. You don't want to put your GCP credentials on client side and therefore need a server which forwards audio data to Google Cloud
For this, we have provided a sample in NodeJS for you to checkout (dyte-io/google-transcription)[https://github.com/dyte-io/google-transcription/tree/main/server]. Please find it here. Currently, we only have NodeJS samples; if you're working on a different backend, feel free to port this code.
To use this sample, please clone the repository using the following command.
git clone git@github.com:dyte-io/google-transcription.git
2.1 Environment Setup​
cp .env.example .env
Edit the .env
file as per your GCP service account credentials and Save it.
Note: PRIVATE_KEY should be in a single line. Try picking the value from the service account's key's JSON file as is.
2.2 Run the server​
npm install
This would automatically install @google-cloud/speech, and @google-cloud/translate.
npm run dev
The HTTP endpoint where this server is accessible will now be called backend_url
for remaining section of the guide
Frontend Setup​
3.1 Installation​
npm install @dytesdk/google-transcription
Source available at (dyte-io/google-transcription)(https://github.com/dyte-io/google-transcription/tree/main/client)
3.2 Integrate​
The second step is to look for the place in your codebase where you are initiating a Dyte meeting.
Once you have found the place and got a hold of the meeting object, add the following code to the file to import the SDK.
import DyteGoogleSpeechRecognition from '@dytesdk/google-transcription';
Add the following code just after the point where you have access to the meeting object.
const speech = new DyteGoogleSpeechRecognition({
meeting, // Dyte meeting object from DyteClient.init
target: 'hi', // Language that the current user wants to see
source: 'en-US', // Language that the current user would speak in
baseUrl: <backend-url>, // Backend URL from step 2.2
});
speech.on('transcription', async (data) => {
// ... do something with transcription
});
speech.transcribe();
Here you are setting up the GoogleSpeechRecognition with the values that the current user would prefer and activating the recognition just afterward using speech.transcribe(). Then we listen to every new transcription using speech.on('transcription', aJsCallbackFunction)
To see the support languages, please refer to
- https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages
- https://cloud.google.com/translate/docs/languages
With this, you would now be able to receive the live transcriptions. Feel free to put them in UI as per your need.
If you need a sample of this guide, please refer to https://github.com/dyte-io/google-transcription/blob/main/client/demo/index.ts