speechmatics.models

Data models and message types used by the library.

class speechmatics.models._TranscriptionConfig(language=None, **kwargs)[source]

Base model for defining transcription parameters.

additional_vocab: dict = None

Additional vocabulary that is not part of the standard language.

asdict() → Dict[Any, Any][source]

Returns the model as a dict, excluding None values recursively.

diarization: str = None

Indicates type of diarization to use, if any.

domain: str = None

Optionally request a language pack optimized for a specific domain, e.g. ‘finance’.

enable_entities: bool = None

Indicates if inverse text normalization entity output is enabled.

language: str = 'en'

ISO 639-1 language code, e.g. en.

operating_point: str = None

Specifies which acoustic model to use.

output_locale: str = None

RFC-5646 language code for transcript output, e.g. en-AU.

punctuation_overrides: dict = None

Permitted punctuation marks for advanced punctuation.
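
For example, asdict() drops any field left at None, so only explicitly set parameters are serialized. A minimal sketch using the TranscriptionConfig subclass documented below (the operating point value shown is an assumption):

    from speechmatics.models import TranscriptionConfig

    config = TranscriptionConfig(language="en", operating_point="enhanced")
    print(config.asdict())
    # Fields left at None (diarization, domain, ...) are excluded, so this
    # prints e.g.: {'language': 'en', 'operating_point': 'enhanced'}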

class speechmatics.models.AudioSettings(encoding: Optional[str] = None, sample_rate: int = 44100, chunk_size: int = 4096)[source]

Real-time: Defines audio parameters.

chunk_size: int = 4096

Size of each audio chunk in bytes.

encoding: str = None

Encoding format when raw audio is used. Allowed values are pcm_f32le, pcm_s16le and mulaw.

sample_rate: int = 44100

Sampling rate in hertz.
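
A minimal sketch for streaming raw audio; the 16 kHz sample rate is illustrative:

    from speechmatics.models import AudioSettings

    # Raw 16-bit little-endian PCM at 16 kHz, sent in 4 KiB chunks.
    settings = AudioSettings(
        encoding="pcm_s16le",
        sample_rate=16000,
        chunk_size=4096,
    )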

class speechmatics.models.BatchSpeakerDiarizationConfig(speaker_sensitivity: Optional[int] = None)[source]

Batch mode: Speaker diarization config.

speaker_sensitivity: int = None

The sensitivity of the speaker detection.

class speechmatics.models.BatchTranscriptionConfig(language=None, **kwargs)[source]

Batch: Defines transcription parameters for batch requests. The .as_config() method returns this configuration wrapped in a Speechmatics JSON config.

channel_diarization_labels: List[str] = None

Add your own speaker or channel labels to the transcript.

fetch_data: speechmatics.models.FetchData = None

Optional configuration for fetching file for transcription.

notification_config: speechmatics.models.NotificationConfig = None

Optional configuration for callback notification.

speaker_diarization_config: speechmatics.models.BatchSpeakerDiarizationConfig = None

Configuration for speaker diarization.

srt_overrides: speechmatics.models.SRTOverrides = None

Optional configuration for SRT output.
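
A sketch of a batch configuration with nested speaker diarization settings; the diarization value and the sensitivity value are assumptions:

    from speechmatics.models import (
        BatchSpeakerDiarizationConfig,
        BatchTranscriptionConfig,
    )

    config = BatchTranscriptionConfig(
        language="en",
        diarization="speaker",  # assumption: "speaker" enables speaker diarization
        speaker_diarization_config=BatchSpeakerDiarizationConfig(speaker_sensitivity=1),
    )
    json_config = config.as_config()  # wrapped into a Speechmatics JSON config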

class speechmatics.models.ClientMessageType(value)[source]

Real-time: Defines various messages sent from client to server.

AddAudio = 'AddAudio'

Adds more audio data to the recognition job. The server confirms receipt by sending a ServerMessageType.AudioAdded message.

EndOfStream = 'EndOfStream'

Indicates that the client has no more audio to send.

SetRecognitionConfig = 'SetRecognitionConfig'

Allows the client to re-configure the recognition session.

StartRecognition = 'StartRecognition'

Initiates a recognition job based on configuration set previously.
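
A sketch of how these values might appear on the wire, assuming a JSON envelope whose "message" field carries the enum value (the exact schema is defined by the Speechmatics real-time API):

    import json

    from speechmatics.models import ClientMessageType, TranscriptionConfig

    start = json.dumps({
        "message": ClientMessageType.StartRecognition.value,
        # assumption: the audio_format/transcription_config keys mirror the API schema
        "audio_format": {"type": "raw", "encoding": "pcm_s16le", "sample_rate": 16000},
        "transcription_config": TranscriptionConfig(language="en").asdict(),
    })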

class speechmatics.models.ConnectionSettings(url: str, message_buffer_size: int = 512, ssl_context: ssl.SSLContext = <factory>, semaphore_timeout_seconds: float = 120, ping_timeout_seconds: float = 60, auth_token: typing.Optional[str] = None, generate_temp_token: typing.Optional[bool] = False)[source]

Defines connection parameters.

auth_token: str = None

Auth token used to authenticate the customer. This token is only applicable to RT-SaaS.

generate_temp_token: Optional[bool] = False

Automatically generate a temporary token for authentication. Non-enterprise customers must set this to True. Enterprise customers should set this to False.

message_buffer_size: int = 512

Message buffer size in bytes.

ping_timeout_seconds: float = 60

Ping-pong timeout in seconds.

semaphore_timeout_seconds: float = 120

Semaphore timeout in seconds.

ssl_context: ssl.SSLContext

SSL context.

url: str

Websocket server endpoint.
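
A minimal sketch; the endpoint URL is a placeholder, not a guaranteed value:

    from speechmatics.models import ConnectionSettings

    settings = ConnectionSettings(
        url="wss://eu2.rt.speechmatics.com/v2",  # placeholder endpoint
        auth_token="YOUR_API_KEY",
        generate_temp_token=True,  # True for non-enterprise customers
    )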

class speechmatics.models.FetchData(url: str, auth_headers: Optional[str] = None)[source]

Batch: Optional configuration for fetching file for transcription.

auth_headers: str = None

A list of additional headers to be added to the input fetch request when using HTTP or HTTPS. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token.

url: str

URL to fetch.
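
A minimal sketch; the header string format shown is an assumption:

    from speechmatics.models import FetchData

    fetch = FetchData(
        url="https://example.com/audio/meeting.wav",
        auth_headers="Authorization: Bearer YOUR_TOKEN",  # assumption: header passed as a string
    )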

class speechmatics.models.NotificationConfig(url: str, contents: Optional[str] = None, method: str = 'post', auth_headers: Optional[str] = None)[source]

Batch: Optional configuration for callback notification.

auth_headers: str = None

A list of additional headers to be added to the notification request when using HTTP or HTTPS. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token.

contents: str = None

Specifies a list of items to be attached to the notification message. When multiple items are requested, they are included as named file attachments.

method: str = 'post'

The HTTP(S) method to be used. Only post and put are supported.

url: str

URL for notification. The id and status query parameters will be added.
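
A sketch of a callback configuration; the contents value is an assumption:

    from speechmatics.models import NotificationConfig

    notification = NotificationConfig(
        url="https://example.com/callbacks/speechmatics",
        method="post",          # only post and put are supported
        contents="transcript",  # assumption: a valid attachment name
    )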

class speechmatics.models.RTSpeakerDiarizationConfig(max_speakers: Optional[int] = None)[source]

Real-time mode: Speaker diarization config.

max_speakers: int = None

This enforces the maximum number of speakers allowed in a single audio stream.

class speechmatics.models.SRTOverrides(max_line_length: int = 37, max_lines: int = 2)[source]

Batch: Optional configuration for SRT output.

max_line_length: int = 37

Maximum number of characters per subtitle line, including white space.

max_lines: int = 2

Maximum number of lines in a subtitle section.
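
For example, to produce tighter subtitles than the defaults:

    from speechmatics.models import SRTOverrides

    overrides = SRTOverrides(max_line_length=32, max_lines=1)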

class speechmatics.models.ServerMessageType(value)[source]

Real-time: Defines various message types sent from server to client.

AddPartialTranscript = 'AddPartialTranscript'

Indicates a partial transcript, which is an incomplete transcript that is immediately produced and may change as more context becomes available.

AddTranscript = 'AddTranscript'

Indicates the final transcript of a part of the audio.

AudioAdded = 'AudioAdded'

Server response to ClientMessageType.AddAudio, indicating that audio has been added successfully.

EndOfTranscript = 'EndOfTranscript'

Server response to ClientMessageType.EndOfStream, after the server has finished sending all AddTranscript messages.

Error = 'Error'

Indicates a generic error message.

Info = 'Info'

Indicates a generic info message.

RecognitionStarted = 'RecognitionStarted'

Server response to ClientMessageType.StartRecognition, acknowledging that a recognition session has started.

Warning = 'Warning'

Indicates a generic warning message.
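
A dispatch sketch, assuming each server message is a JSON object whose "message" field carries one of the values above:

    import json

    from speechmatics.models import ServerMessageType

    def handle_server_message(raw: str) -> None:
        msg = json.loads(raw)
        msg_type = msg.get("message")
        if msg_type == ServerMessageType.AddPartialTranscript.value:
            pass  # provisional text; may still change
        elif msg_type == ServerMessageType.AddTranscript.value:
            pass  # final text for this part of the audio
        elif msg_type == ServerMessageType.EndOfTranscript.value:
            pass  # no more transcripts will be sent
        elif msg_type == ServerMessageType.Error.value:
            raise RuntimeError(msg)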

class speechmatics.models.TranscriptionConfig(language=None, **kwargs)[source]

Real-time: Defines transcription parameters.

enable_partials: bool = None

Indicates if partial transcription, where words are produced immediately, is enabled.

max_delay: float = None

Maximum acceptable delay in seconds.

max_delay_mode: str = None

Determines whether the threshold specified in max_delay can be exceeded when a potential entity is detected. In flexible mode, max_delay can be overridden until the end of a detected entity; in fixed mode, max_delay is enforced and any potential entity that would not complete within that threshold is ignored.

speaker_change_sensitivity: float = None

Sensitivity level for speaker change.

speaker_diarization_config: speechmatics.models.RTSpeakerDiarizationConfig = None

Configuration for speaker diarization.
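
A sketch of a real-time configuration combining partials, a delay bound, and speaker diarization; the lower-case mode name and the diarization value are assumptions:

    from speechmatics.models import RTSpeakerDiarizationConfig, TranscriptionConfig

    config = TranscriptionConfig(
        language="en",
        enable_partials=True,
        max_delay=2.0,              # seconds
        max_delay_mode="flexible",  # assumption: lower-case mode names
        diarization="speaker",      # assumption: "speaker" selects speaker diarization
        speaker_diarization_config=RTSpeakerDiarizationConfig(max_speakers=4),
    )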