LingoX API Documentation

This document describes how to use the LingoX real-time translation API: its REST endpoints and its WebSocket streaming protocol.


REST API Endpoints

GET /api/v1/languages

Fetches the list of supported languages, along with the default source and target languages used to configure the UI.

Success Response (200)

```
{
  "supported_languages": [
    ["en-US", "English"],
    ["vi", "Vietnamese"],
    ...
  ],
  "default_source": "en-US",
  "default_target": "vi"
}
```

Error Response (503)

```
{
  "error": "Service temporarily unavailable."
}
```
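A minimal Python sketch of calling this endpoint with only the standard library. The base URL is a placeholder, and `parse_languages` and `fetch_languages` are hypothetical helper names, not part of LingoX. The sketch converts the `[code, name]` pairs into a dictionary and surfaces a 503 as an exception.

```python
import json
import urllib.error
import urllib.request

# Hypothetical placeholder; replace with your LingoX host.
BASE_URL = "https://your-domain.com"


def parse_languages(payload: dict) -> tuple[dict[str, str], str, str]:
    """Convert the [code, name] pairs into a {code: name} mapping."""
    languages = {code: name for code, name in payload["supported_languages"]}
    return languages, payload["default_source"], payload["default_target"]


def fetch_languages(base_url: str = BASE_URL) -> tuple[dict[str, str], str, str]:
    """GET /api/v1/languages; raise RuntimeError on a 503 response."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/languages", timeout=10) as resp:
            return parse_languages(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 503:
            raise RuntimeError("LingoX language service is unavailable (503)") from err
        raise
```

Keeping the parsing separate from the network call makes it easy to populate a language picker from a cached response as well.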

Languages Table

Language Code	Language Name

POST /api/v1/session/create

Creates a session (a room) for a Multi-Device Mode conversation. Sessions expire after 15 minutes (900 seconds) of inactivity.

Request Body

```
{
  "language": "en-US",
  "translate_language": "vi"
}
```

Success Response (200)

```
{
  "conversation_id": "a-unique-uuid-string",
  "join_url": "http://your-host/join/a-unique-uuid-string"
}
```

Error Response (503)

```
{
  "error": "Redis service is unavailable."
}
```
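A sketch of creating a session from Python using only the standard library. `build_create_session_request` and `create_session` are hypothetical helper names; the request body and response fields are the ones documented above. Non-2xx responses such as the 503 are raised by `urllib` as `urllib.error.HTTPError`.

```python
import json
import urllib.request


def build_create_session_request(
    base_url: str, language: str, translate_language: str
) -> urllib.request.Request:
    """Build the POST /api/v1/session/create request with a JSON body."""
    body = json.dumps(
        {"language": language, "translate_language": translate_language}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/session/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(base_url: str, language: str, translate_language: str) -> tuple[str, str]:
    """Create a session and return (conversation_id, join_url).

    A 503 (Redis unavailable) propagates as urllib.error.HTTPError.
    """
    request = build_create_session_request(base_url, language, translate_language)
    with urllib.request.urlopen(request, timeout=10) as resp:
        data = json.load(resp)
    return data["conversation_id"], data["join_url"]
```

Because sessions expire after 15 minutes of inactivity, it makes sense to create one only when the initiator is ready to share the `join_url`.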

GET /api/v1/session/{conversation_id}

Retrieves details for an existing multi-device session, primarily to check the initiator's language settings.

Success Response (200)

```
{
  "user_a_lang": "en-US",
  "user_a_translate": "vi"
}
```

Error Response (404)

```
{
  "error": "Session not found."
}
```

Error Response (503)

```
{
  "error": "Redis service is unavailable."
}
```
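A sketch of looking up a session, mapping the documented 404 to `None` and letting other errors (such as the 503) propagate. `session_url` and `get_session` are hypothetical helper names; the ID is percent-encoded as a single path segment as a precaution.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def session_url(base_url: str, conversation_id: str) -> str:
    """Build GET /api/v1/session/{conversation_id}, escaping the ID as one path segment."""
    return f"{base_url}/api/v1/session/{urllib.parse.quote(conversation_id, safe='')}"


def get_session(base_url: str, conversation_id: str) -> Optional[dict]:
    """Return the initiator's settings, or None if the session is not found (404).

    Other errors, such as 503, propagate as urllib.error.HTTPError.
    """
    try:
        with urllib.request.urlopen(session_url(base_url, conversation_id), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

A joiner could call this before connecting: a `None` result means the ID is unknown (for example, because the session expired), while a dictionary gives `user_a_lang` and `user_a_translate`.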

WebSocket Endpoints & Data Formats

Flow 1: Personal & Single-Device Modes

These modes use a single, stateful WebSocket connection. The client sends a configuration message to set the languages and can send a new one at any time to change them.

Endpoint

wss://your-domain.com/api/v1/ws/single

Sequence of Events

1. Client establishes a WebSocket connection.
2. Client sends a JSON `config` message to set the source and target languages.
3. Client begins streaming binary audio data.
4. Server streams back JSON messages containing transcription and translation results.
5. To switch languages, the client sends a new `config` message and then resumes streaming audio.
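The sequence above can be sketched in Python without committing to a particular WebSocket library. `send_text` and `send_bytes` are hypothetical callables standing in for however your client sends text and binary frames, and `config_message` and `run_single_device_session` are likewise hypothetical names. Each segment is a language pair followed by its audio chunks, so switching languages mid-stream is just another `config` message.

```python
import json
from typing import Callable, Iterable


def config_message(source_lang: str, target_lang: str) -> str:
    """Serialize the Flow 1 `config` message as a JSON text frame."""
    return json.dumps({"type": "config", "source_lang": source_lang, "target_lang": target_lang})


def run_single_device_session(
    send_text: Callable[[str], None],
    send_bytes: Callable[[bytes], None],
    segments: Iterable[tuple[str, str, Iterable[bytes]]],
) -> None:
    """For each (source, target, chunks) segment: send config, then stream its audio."""
    for source_lang, target_lang, chunks in segments:
        send_text(config_message(source_lang, target_lang))
        for chunk in chunks:
            send_bytes(chunk)
```

With a synchronous client whose send method treats `str` as a text frame and `bytes` as a binary frame, both callables can simply wrap that one method.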

Client-to-Server Messages

Language Configuration (JSON): Sent once at the start of the connection and again whenever the languages need to change.
```
{
  "type": "config",
  "source_lang": "en-US",
  "target_lang": "vi"
}
```
Audio Data (Binary): Raw audio chunks from the microphone. The audio should be in a format compatible with Deepgram's real-time API (typically PCM, WAV, or similar formats).
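This document does not fix a sample rate or encoding, so the sketch below assumes 16-bit mono PCM at 16 kHz (an assumption, not a LingoX requirement; match whatever your deployment's Deepgram setup expects). It splits a raw buffer into 100 ms chunks, each of which would be sent as one binary frame. `chunk_pcm` is a hypothetical helper.

```python
from typing import Iterator

# Assumed audio format (not specified by LingoX): 16-bit mono PCM at 16 kHz.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each covering chunk_ms milliseconds.

    The final chunk may be shorter if the buffer does not divide evenly.
    """
    chunk_size = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]
```

At these settings a 100 ms chunk is 3,200 bytes; smaller chunks reduce latency at the cost of more frames, and 100 ms is an arbitrary choice here.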

Server-to-Client Messages (JSON)

```
{
  "is_final": false,
  "original": "The transcribed text from the source language.",
  "translation": "The translated text in the target language."
}
```

`is_final: false`: An interim, non-final result. The UI should display it for responsiveness but expect it to change.
`is_final: true`: A final, complete utterance. This is the definitive transcript and translation for a sentence or phrase.
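One way a client might apply these semantics is to keep a list of committed final results plus a single interim slot that each new interim result overwrites. `TranscriptView` is a hypothetical sketch, not part of any LingoX SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptView:
    """Committed (final) results plus at most one pending interim result."""
    finals: list[tuple[str, str]] = field(default_factory=list)
    interim: Optional[tuple[str, str]] = None

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        pair = (msg["original"], msg["translation"])
        if msg["is_final"]:
            # A final result is definitive: commit it and clear the interim slot.
            self.finals.append(pair)
            self.interim = None
        else:
            # Interim results are provisional: each one replaces the previous one.
            self.interim = pair
```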

Flow 2: Multi-Device Mode

This mode uses a REST endpoint to create a session and then a unique WebSocket endpoint for each participant in that session.

Endpoint

wss://your-domain.com/api/v1/ws/{conversation_id}?language={user_language_code}

Example: wss://localhost:5566/api/v1/ws/xyz-123?language=en-US

Connection Errors

Close code 1008 (Invalid session ID): The server closes the connection with this code when the `conversation_id` does not exist, has expired, or Redis is unavailable.

Query Parameters

language: The user's source language code (e.g., "en-US", "vi"). Always pass it explicitly; if it is omitted, the server defaults to "en-US".
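A sketch of building the Multi-Device URL and recognizing the documented close code. `multi_device_ws_url` and `is_invalid_session` are hypothetical helper names. Since 1008 means the session is missing, expired, or Redis is down, a client would typically re-check or recreate the session rather than reconnect with the same ID in a loop; that policy is a suggestion, not documented behavior.

```python
import urllib.parse

INVALID_SESSION_CLOSE_CODE = 1008  # documented close code for an invalid session ID


def multi_device_ws_url(base_ws_url: str, conversation_id: str, language: str) -> str:
    """Build {base}/api/v1/ws/{conversation_id}?language={language}."""
    path_id = urllib.parse.quote(conversation_id, safe="")
    query = urllib.parse.urlencode({"language": language})
    return f"{base_ws_url}/api/v1/ws/{path_id}?{query}"


def is_invalid_session(close_code: int) -> bool:
    """True if the server closed the socket because the session ID is invalid."""
    return close_code == INVALID_SESSION_CLOSE_CODE
```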

Sequence of Events

1. Client A (Initiator) calls `POST /api/v1/session/create` to get a `conversation_id`.
2. Client A connects to the WebSocket endpoint using the `conversation_id`, passing their chosen language as the `language` query parameter.
3. Client B (Joiner) obtains the `conversation_id` (e.g., via a QR code or link) and connects to the same WebSocket endpoint with their own chosen language.
4. The server sends system messages to notify clients when users join or leave.
5. When a client sends audio, the server sends interim transcripts back to the speaker only and the final, translated transcript to both participants.
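A joiner (Client B) who receives the `join_url` returned by `POST /api/v1/session/create`, for example by scanning a QR code, needs to recover the `conversation_id` from it. The sketch below assumes the `/join/{conversation_id}` path shape shown in that response example; `conversation_id_from_join_url` is a hypothetical helper.

```python
import urllib.parse


def conversation_id_from_join_url(join_url: str) -> str:
    """Extract the ID from a URL shaped like http://host/join/{conversation_id}."""
    path = urllib.parse.urlparse(join_url).path.rstrip("/")
    _, sep, conversation_id = path.rpartition("/join/")
    # Reject URLs without a /join/ segment or with extra path components after the ID.
    if not sep or not conversation_id or "/" in conversation_id:
        raise ValueError(f"not a LingoX join URL: {join_url!r}")
    return urllib.parse.unquote(conversation_id)
```

The resulting ID then goes into the WebSocket path together with Client B's own language.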

Client-to-Server Messages

Audio Data (Binary): Raw audio chunks from the microphone. The audio should be in a format compatible with Deepgram's real-time API (typically PCM, WAV, or similar formats).

Server-to-Client Messages (JSON)

System Events:

```
// Notifies clients of connection status changes
{
  "type": "user_joined" | "user_left",
  "client_count": 2
}

// Informs a client of their partner's language upon connection
{
  "type": "partner_language",
  "lang": "vi"
}
```
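A sketch of tracking room state from these system events. Transcription messages carry no `type` field, so the handler reports whether it consumed a message and leaves everything else to the transcript handling. `RoomState` is a hypothetical name.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoomState:
    """Hypothetical client-side view of a Multi-Device room."""
    client_count: int = 0
    partner_lang: Optional[str] = None

    def handle_system(self, raw: str) -> bool:
        """Apply a system event; return False if the message is not one."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind in ("user_joined", "user_left"):
            self.client_count = msg["client_count"]
        elif kind == "partner_language":
            self.partner_lang = msg["lang"]
        else:
            return False  # e.g. a transcription message, which has no "type" field
        return True
```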

Transcription/Translation:

```
{
  "is_mine": true,
  "is_interim": false,
  "original": "The transcribed text.",
  "translation": "The translated text (or the original text if no translation is needed)."
}
```

`is_mine`: `true` if the message originated from this client; `false` otherwise. This lets the UI position the message correctly (e.g., on the right or left side of a chat window).
`is_interim`: `true` for real-time, partial transcripts, which are sent only to the speaker (`is_mine: true`) for a responsive feel; `false` for the final, complete utterance, which is sent to all clients in the room.
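A sketch of a chat model built on these two flags. Note that this flow uses `is_interim`, whereas the single-device flow uses `is_final` with the opposite meaning. Final messages are appended on the right or left depending on `is_mine`, and interim text, which only ever arrives for the speaker's own audio, is kept in one replaceable slot. `ChatView` is a hypothetical name.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatView:
    """Finals from both participants plus one interim line for this client."""
    messages: list[tuple[str, str, str]] = field(default_factory=list)  # (side, original, translation)
    my_interim: Optional[str] = None

    def handle_transcript(self, raw: str) -> None:
        msg = json.loads(raw)
        # Own messages go on the right, the partner's on the left.
        side = "right" if msg["is_mine"] else "left"
        if msg["is_interim"]:
            # Interim results are only sent to the speaker, so this is our own text.
            self.my_interim = msg["original"]
        else:
            self.messages.append((side, msg["original"], msg["translation"]))
            if msg["is_mine"]:
                self.my_interim = None  # our utterance is now final
```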