About SmartSpeech


The English version of the documentation is under development.

You can read the Russian version.

SmartSpeech is a speech recognition and synthesis service for Salute virtual assistants. With this service, you can:

  • Convert speech to text and text to speech in real time.
  • Apply deep learning across the speech database.
  • Use all of the service's features for phone communications.

Use cases:

  • Speech synthesis for chats, guides, and product descriptions. Users of an app or website can both read and listen to the content.
  • Voice input for text. Customers can use voice input when typing is difficult. Integrate speech recognition into chats, search, or navigation.
  • Interactive menu and auto attendant. Use speech synthesis and recognition to build an interactive voice response (IVR) system or an auto attendant and streamline call center operations.
  • Telemarketing. SmartSpeech technologies help to replace human operators, which can make your telemarketing more efficient.

When using SmartSpeech, keep the load on the server to no more than 10 simultaneous streams at any given time.
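One way to stay within the 10-stream limit on the client side is to guard every request with a semaphore. The sketch below is a minimal illustration, not part of the SmartSpeech API: `recognize_stream`, `run_all`, and the `call-N` identifiers are hypothetical names, and the recognition call is simulated with a short sleep. Replace the stub with your actual HTTP or gRPC call.

```python
import asyncio

MAX_STREAMS = 10  # the limit on simultaneous streams stated above


async def recognize_stream(audio_id: str) -> str:
    """Hypothetical stand-in for a real SmartSpeech request."""
    await asyncio.sleep(0.01)  # simulates network and processing time
    return f"transcript for {audio_id}"


async def run_all(audio_ids, limit: int = MAX_STREAMS):
    """Process all items while keeping at most `limit` streams open."""
    semaphore = asyncio.Semaphore(limit)
    active = 0  # streams currently open
    peak = 0    # highest number of streams open at the same time

    async def worker(audio_id: str) -> str:
        nonlocal active, peak
        async with semaphore:  # waits here while `limit` streams are busy
            active += 1
            peak = max(peak, active)
            try:
                return await recognize_stream(audio_id)
            finally:
                active -= 1

    # gather returns results in the same order as audio_ids
    results = await asyncio.gather(*(worker(a) for a in audio_ids))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all([f"call-{i}" for i in range(25)]))
    print(len(results), "results, peak concurrency:", peak)
```

The semaphore only limits streams opened by a single process. If several processes or hosts share one account, they would need a shared limit, which this sketch does not cover.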


To use the service, you must be registered as a sole trader or a legal entity and pass moderation.

To start using the service, read the Terms of Use section.


Speech synthesis includes:

  • API for speech synthesis
  • How to create audio from text
  • Speech synthesis tagging


Speech recognition includes:

  • API for speech recognition:
    • Synchronous recognition (HTTP)
    • Stream recognition (gRPC)
    • Asynchronous recognition (HTTP and gRPC)
  • How to get text from audio
  • How to improve recognition results

SmartSpeech team contacts

Please send your questions to: SmartSpeech@sberbank.ru.