speechmatics.models

Data models and message types used by the library.

class speechmatics.models._TranscriptionConfig(language=None, **kwargs)[source]

Base model for defining transcription parameters.

additional_vocab: dict = None

Additional vocabulary that is not part of the standard language.

asdict() Dict[Any, Any][source]

Returns model as a dict while excluding None values recursively.

diarization: str = None

Indicates type of diarization to use, if any.

domain: str = None

Optionally request a language pack optimized for a specific domain, e.g. ‘finance’.

enable_entities: bool = None

Indicates if inverse text normalization entity output is enabled.

language: str = 'en'

ISO 639-1 language code, e.g. en.

operating_point: str = None

Specifies which acoustic model to use.

output_locale: str = None

RFC-5646 language code for the transcript output, e.g. en-AU.

punctuation_overrides: dict = None

Permitted punctuation marks for advanced punctuation.
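
The recursive None-exclusion performed by asdict() can be sketched in a few lines. The drop_nones helper and ExampleConfig dataclass below are hypothetical stand-ins for illustration, not part of the library:

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional

@dataclass
class ExampleConfig:
    # Hypothetical stand-in for a transcription config dataclass.
    language: str = "en"
    domain: Optional[str] = None
    punctuation_overrides: Optional[dict] = None

def drop_nones(value: Any) -> Any:
    """Recursively remove None values, mirroring what asdict() is described to do."""
    if isinstance(value, dict):
        return {k: drop_nones(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nones(v) for v in value]
    return value

config = ExampleConfig(punctuation_overrides={"permitted_marks": [".", ","]})
print(drop_nones(asdict(config)))
# {'language': 'en', 'punctuation_overrides': {'permitted_marks': ['.', ',']}}
```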

class speechmatics.models.AudioSettings(encoding: Optional[str] = None, sample_rate: int = 44100, chunk_size: int = 4096)[source]

Real-time: Defines audio parameters.

chunk_size: int = 4096

Chunk size.

encoding: str = None

Encoding format when raw audio is used. Allowed values are pcm_f32le, pcm_s16le and mulaw.

sample_rate: int = 44100

Sampling rate in hertz.
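
As a sketch of how these parameters are typically used, raw audio can be read and sent in chunk_size byte pieces. The reader loop below is illustrative only, not the library's own client code:

```python
import io

# AudioSettings defaults mirrored here for illustration.
SAMPLE_RATE = 44100   # hertz
CHUNK_SIZE = 4096     # bytes per chunk of audio sent to the server

def iter_chunks(stream, chunk_size: int = CHUNK_SIZE):
    """Yield fixed-size byte chunks until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# 10000 bytes of pcm_s16le silence split into 4096 + 4096 + 1808.
audio = io.BytesIO(b"\x00" * 10000)
sizes = [len(c) for c in iter_chunks(audio)]
print(sizes)  # [4096, 4096, 1808]
```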

class speechmatics.models.BatchConnectionSettings(url: str, message_buffer_size: int = 512, ssl_context: ssl.SSLContext = <factory>, semaphore_timeout_seconds: float = 120, ping_timeout_seconds: float = 60, auth_token: Union[str, NoneType] = None, generate_temp_token: Union[bool, NoneType] = False)[source]
class speechmatics.models.BatchLanguageIdentificationConfig(expected_languages: Optional[List[str]] = None)[source]

Batch mode: Language identification config.

expected_languages: List[str] = None

Expected languages for language identification.

class speechmatics.models.BatchSpeakerDiarizationConfig(speaker_sensitivity: Optional[float] = None)[source]

Batch mode: Speaker diarization config.

speaker_sensitivity: float = None

The sensitivity of the speaker detection. This is a number between 0 and 1, where 0 means least sensitive and 1 means most sensitive.

class speechmatics.models.BatchTranscriptionConfig(language=None, **kwargs)[source]

Batch: Defines transcription parameters for batch requests. The .as_config() method will return it wrapped into a Speechmatics JSON config.

channel_diarization_labels: List[str] = None

Add your own speaker or channel labels to the transcript.

fetch_data: speechmatics.models.FetchData = None

Optional configuration for fetching file for transcription.

language_identification_config: speechmatics.models.BatchLanguageIdentificationConfig = None

Optional configuration for language identification.

notification_config: speechmatics.models.NotificationConfig = None

Optional configuration for callback notification.

sentiment_analysis_config: Optional[speechmatics.models.SentimentAnalysisConfig] = None

Optional configuration for sentiment analysis of the transcript

speaker_diarization_config: speechmatics.models.BatchSpeakerDiarizationConfig = None

Optional configuration for speaker diarization.

srt_overrides: speechmatics.models.SRTOverrides = None

Optional configuration for SRT output.

summarization_config: speechmatics.models.SummarizationConfig = None

Optional configuration for transcript summarization.

topic_detection_config: Optional[speechmatics.models.TopicDetectionConfig] = None

Optional configuration for detecting topics of the transcript

translation_config: speechmatics.models.TranslationConfig = None

Optional configuration for translation.
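
A sketch of the shape that .as_config() conceptually produces: the transcription parameters wrapped in a top-level Speechmatics job config. The exact key names below are assumptions based on the attribute descriptions above, not taken from the library:

```python
import json

# Hypothetical batch job config: transcription parameters nested under
# a top-level "transcription_config" key with a job "type".
transcription_config = {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {"speaker_sensitivity": 0.6},
}
job_config = {
    "type": "transcription",
    "transcription_config": transcription_config,
}
print(json.dumps(job_config, indent=2))
```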

class speechmatics.models.BatchTranslationConfig(target_languages: Optional[List[str]] = None)[source]

Batch mode: Translation config.

class speechmatics.models.ClientMessageType(value)[source]

Real-time: Defines various messages sent from client to server.

AddAudio = 'AddAudio'

Adds more audio data to the recognition job. The server confirms receipt by sending a ServerMessageType.AudioAdded message.

EndOfStream = 'EndOfStream'

Indicates that the client has no more audio to send.

SetRecognitionConfig = 'SetRecognitionConfig'

Allows the client to re-configure the recognition session.

StartRecognition = 'StartRecognition'

Initiates a recognition job based on configuration set previously.

class speechmatics.models.ConnectionSettings(url: str, message_buffer_size: int = 512, ssl_context: ssl.SSLContext = <factory>, semaphore_timeout_seconds: float = 120, ping_timeout_seconds: float = 60, auth_token: typing.Optional[str] = None, generate_temp_token: typing.Optional[bool] = False)[source]

Defines connection parameters.

auth_token: Optional[str] = None

Auth token used to authenticate the customer.

generate_temp_token: Optional[bool] = False

Automatically generate a temporary token for authentication. Non-enterprise customers must set this to True. Enterprise customers should set this to False.

message_buffer_size: int = 512

Message buffer size in bytes.

ping_timeout_seconds: float = 60

Ping-pong timeout in seconds.

semaphore_timeout_seconds: float = 120

Semaphore timeout in seconds.

ssl_context: ssl.SSLContext

SSL context.

url: str

Websocket server endpoint.
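
The `<factory>` default for ssl_context is typically produced with ssl.create_default_context(). The dataclass below is a hypothetical mirror of ConnectionSettings for illustration, and the endpoint URL is only an example:

```python
import ssl
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExampleConnectionSettings:
    # Hypothetical mirror of ConnectionSettings, showing how the
    # ssl_context <factory> default can be supplied.
    url: str
    message_buffer_size: int = 512
    ssl_context: ssl.SSLContext = field(default_factory=ssl.create_default_context)
    semaphore_timeout_seconds: float = 120
    ping_timeout_seconds: float = 60
    auth_token: Optional[str] = None
    generate_temp_token: Optional[bool] = False

settings = ExampleConnectionSettings(url="wss://example.com/v2")
print(settings.ssl_context.verify_mode == ssl.CERT_REQUIRED)  # True
```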

class speechmatics.models.FetchData(url: str, auth_headers: Optional[str] = None)[source]

Batch: Optional configuration for fetching file for transcription.

auth_headers: str = None

A list of additional headers to be added to the input fetch request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token.

url: str

URL to fetch.

class speechmatics.models.NotificationConfig(url: str, contents: Optional[List[str]] = None, method: str = 'post', auth_headers: Optional[List[str]] = None)[source]

Batch: Optional configuration for callback notification.

auth_headers: List[str] = None

A list of additional headers to be added to the notification request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token.

contents: List[str] = None

Specifies a list of items to be attached to the notification message. When multiple items are requested, they are included as named file attachments.

method: str = 'post'

The HTTP(S) method to be used. Only post and put are supported.

url: str

URL for notification. The id and status query parameters will be added.
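
The appended id and status query parameters can be illustrated as follows. The notification_url helper and the status value are hypothetical, for illustration only:

```python
from urllib.parse import urlencode, urlparse

def notification_url(base_url: str, job_id: str, status: str) -> str:
    """Append id and status query parameters to a callback URL,
    as described above. Hypothetical helper, not the library's code."""
    separator = "&" if urlparse(base_url).query else "?"
    return base_url + separator + urlencode({"id": job_id, "status": status})

url = notification_url("https://example.com/callback", "abc123", "done")
print(url)  # https://example.com/callback?id=abc123&status=done
```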

class speechmatics.models.RTConnectionSettings(url: str, message_buffer_size: int = 512, ssl_context: ssl.SSLContext = <factory>, semaphore_timeout_seconds: float = 120, ping_timeout_seconds: float = 60, auth_token: Union[str, NoneType] = None, generate_temp_token: Union[bool, NoneType] = False)[source]
class speechmatics.models.RTSpeakerDiarizationConfig(max_speakers: Optional[int] = None)[source]

Real-time mode: Speaker diarization config.

max_speakers: int = None

This enforces the maximum number of speakers allowed in a single audio stream.

class speechmatics.models.RTTranslationConfig(target_languages: Optional[List[str]] = None, enable_partials: bool = False)[source]

Real-time mode: Translation config.

enable_partials: bool = False

Indicates if partial translation, where sentences are produced immediately, is enabled.

class speechmatics.models.SRTOverrides(max_line_length: int = 37, max_lines: int = 2)[source]

Batch: Optional configuration for SRT output.

max_line_length: int = 37

Maximum number of characters per subtitle line, including whitespace.

max_lines: int = 2

Maximum number of lines in a subtitle section.
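
The effect of these limits can be sketched with standard-library text wrapping. This is illustrative only, not the library's own subtitle segmentation logic:

```python
import textwrap

MAX_LINE_LENGTH = 37  # characters per subtitle line, including whitespace
MAX_LINES = 2         # lines per subtitle section

def wrap_subtitle(text: str) -> list:
    """Wrap caption text under the SRT limits above (hypothetical helper)."""
    lines = textwrap.wrap(text, width=MAX_LINE_LENGTH)
    return lines[:MAX_LINES]

lines = wrap_subtitle("The quick brown fox jumps over the lazy dog near the river")
print(lines)
```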

class speechmatics.models.SentimentAnalysisConfig[source]

Sentiment Analysis config.

class speechmatics.models.ServerMessageType(value)[source]

Real-time: Defines various message types sent from server to client.

AddPartialTranscript = 'AddPartialTranscript'

Indicates a partial transcript, which is an incomplete transcript that is immediately produced and may change as more context becomes available.

AddPartialTranslation = 'AddPartialTranslation'

Indicates a partial translation, which is an incomplete translation that is immediately produced and may change as more context becomes available.

AddTranscript = 'AddTranscript'

Indicates the final transcript of a part of the audio.

AddTranslation = 'AddTranslation'

Indicates the final translation of a part of the audio.

AudioAdded = 'AudioAdded'

Server response to ClientMessageType.AddAudio, indicating that audio has been added successfully.

EndOfTranscript = 'EndOfTranscript'

Server response to ClientMessageType.EndOfStream, after the server has finished sending all AddTranscript messages.

Error = 'Error'

Indicates a generic error message.

Info = 'Info'

Indicates a generic info message.

RecognitionStarted = 'RecognitionStarted'

Server response to ClientMessageType.StartRecognition, acknowledging that a recognition session has started.

Warning = 'Warning'

Indicates a generic warning message.
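
Clients typically dispatch on the "message" field of each server payload using these values. The enum subset and the payload shape below are assumptions for illustration, not the library's own handler:

```python
from enum import Enum

class ExampleServerMessageType(str, Enum):
    # Hypothetical subset of ServerMessageType, for illustration only.
    RecognitionStarted = "RecognitionStarted"
    AddTranscript = "AddTranscript"
    EndOfTranscript = "EndOfTranscript"
    Error = "Error"

def handle(message: dict) -> str:
    """Dispatch on the 'message' field of a server payload (sketch)."""
    msg_type = ExampleServerMessageType(message["message"])
    if msg_type is ExampleServerMessageType.AddTranscript:
        return message["metadata"]["transcript"]
    if msg_type is ExampleServerMessageType.Error:
        raise RuntimeError(message.get("reason", "unknown error"))
    return ""

print(handle({"message": "AddTranscript",
              "metadata": {"transcript": "hello world"}}))
# hello world
```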

class speechmatics.models.SummarizationConfig(content_type: Literal['informative', 'conversational', 'auto'] = 'auto', summary_length: Literal['brief', 'detailed'] = 'brief', summary_type: Literal['paragraphs', 'bullets'] = 'bullets')[source]

Defines summarization parameters.

content_type: Literal['informative', 'conversational', 'auto'] = 'auto'

Optional summarization content_type parameter.

summary_length: Literal['brief', 'detailed'] = 'brief'

Optional summarization summary_length parameter.

summary_type: Literal['paragraphs', 'bullets'] = 'bullets'

Optional summarization summary_type parameter.

class speechmatics.models.TopicDetectionConfig(topics: Optional[List[str]] = None)[source]

Defines topic detection parameters.

topics: List[str] = None

Optional list of topics for topic detection.

class speechmatics.models.TranscriptionConfig(language=None, **kwargs)[source]

Real-time: Defines transcription parameters. The .as_config() method removes translation_config and returns it wrapped into a Speechmatics JSON config.

ctrl: dict = None

Internal Speechmatics flag used to send special commands to the engine.

enable_partials: bool = None

Indicates if partials are enabled for both transcripts and translations, where words are produced immediately.

enable_transcription_partials: bool = None

Indicates if partial transcripts, where words are produced immediately, are enabled.

enable_translation_partials: bool = None

Indicates if partial translation, where words are produced immediately, is enabled.

max_delay: float = None

Maximum acceptable delay.

max_delay_mode: str = None

Determines whether the threshold specified in max_delay can be exceeded when a potential entity is detected. With flexible, max_delay can be overridden until the end of a detected entity; with fixed, max_delay is enforced and any potential entity that would not complete within the threshold is ignored.

speaker_change_sensitivity: float = None

Sensitivity level for speaker change.

speaker_diarization_config: speechmatics.models.RTSpeakerDiarizationConfig = None

Configuration for speaker diarization.

streaming_mode: bool = None

Indicates whether the engine runs in streaming mode or in regular real-time mode.

translation_config: speechmatics.models.TranslationConfig = None

Optional configuration for translation.
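
The behaviour described above, where .as_config() drops translation_config before the remaining parameters are sent in StartRecognition, can be sketched as a dict filter. The key names below are illustrative assumptions, not the exact schema:

```python
# Hypothetical real-time config with a translation_config attached.
rt_config = {
    "language": "en",
    "enable_partials": True,
    "max_delay": 2.0,
    "translation_config": {"target_languages": ["de"], "enable_partials": True},
}

# Drop translation_config, then wrap the rest for StartRecognition.
transcription_config = {
    k: v for k, v in rt_config.items() if k != "translation_config"
}
start_recognition = {
    "message": "StartRecognition",
    "transcription_config": transcription_config,
}
print(sorted(start_recognition["transcription_config"]))
# ['enable_partials', 'language', 'max_delay']
```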

class speechmatics.models.TranslationConfig(target_languages: Optional[List[str]] = None)[source]

Translation config.

target_languages: List[str] = None

Target languages for which translation should be produced.

class speechmatics.models.UsageMode(value)[source]

An enumeration.