Adjusting Speech
SSML stands for Speech Synthesis Markup Language. A markup language is a computer language used to annotate text documents to describe their structure, presentation, and semantics. SSML elements are used to adjust the voice, style, prosody, volume, and other characteristics of a script.
Document Structure
An SSML document is constructed using SSML elements, also known as tags. These elements enable customization of various aspects of speech such as tone, style, pitch, prosody, volume, and others.
Here's an example of SSML in action, to demonstrate the basic structure and syntax:
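As a minimal sketch, the Python snippet below assembles a short script and checks that it is well-formed XML. The <speak> wrapper follows the W3C SSML convention and is an assumption here; this guide does not state which root element, if any, the editor expects.

```python
import xml.etree.ElementTree as ET

# Assumption: the script is wrapped in a <speak> root element, as in the
# W3C SSML specification. This guide does not confirm the wrapper.
ssml = (
    "<speak>"
    "Welcome to the demo."
    '<break time="500ms"/>'
    '<prosody pitch="10%">This sentence is pitched slightly higher.</prosody>'
    "</speak>"
)

# ET.fromstring raises ParseError if a tag is unclosed or an attribute
# value is unquoted, so this catches basic syntax mistakes early.
root = ET.fromstring(ssml)
tags = [child.tag for child in root]
print(tags)
```

Each element is an opening tag with optional attributes, the text it applies to, and a closing tag. Elements with no content, such as the break, can be written in the self-closing form shown.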

Supported Voices
Speech controls are currently limited to voices indicated by the circular Pipio logo. We will be adding support for additional voices in the near future.

<break>
Add a break/pause
Attribute
Description
Required
time: The absolute duration of a pause in seconds (such as 2s) or milliseconds (such as 500ms). Valid values range from 0 to 5000 milliseconds. If you set a value greater than the supported maximum, the service will use 5000ms. If the time attribute is set, the strength attribute is ignored.
Break Examples
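The sketch below shows one way to generate break elements from Python. The helper name break_tag is hypothetical, and the self-closing form is assumed to be accepted. The helper clamps values to the 0 to 5000 ms range described above; clamping negative values to 0 is this sketch's own choice, not documented behavior.

```python
def break_tag(milliseconds: int) -> str:
    """Return a <break> element for a pause of the given length.

    Hypothetical helper: values are clamped to 0-5000 ms, the valid
    range stated in this guide.
    """
    clamped = max(0, min(milliseconds, 5000))
    return f'<break time="{clamped}ms"/>'


# Half-second pause between sentences, then a longer two-second pause.
script = (
    "First point."
    + break_tag(500)
    + "Second point."
    + break_tag(2000)
    + "Closing line."
)
print(script)

# The same two-second pause written in seconds instead of milliseconds.
print('<break time="2s"/>')
```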


<prosody>
Customizes the pitch and speaking rate of the text contained by the element. Currently, only the rate and pitch attributes are supported.
<pitch>
The pitch attribute of <prosody> sets the baseline pitch for the contained text.
Attribute
Description
A number followed by Hz, which adjusts the pitch by that many hertz. For example, 20Hz.
A percentage, e.g. 10%, +15.2%, or -8%. The "+" sign is optional. To lower the pitch, the "-" sign is required.
Pitch Examples
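The following sketch wraps text in a prosody element using each of the pitch formats listed above. The helper with_pitch is a hypothetical name. It escapes the text so that characters such as & do not break the markup.

```python
from xml.sax.saxutils import escape


def with_pitch(text: str, pitch: str) -> str:
    """Wrap text in <prosody> with a pitch value such as "20Hz" or "-8%".

    Hypothetical helper: the pitch string is passed through unchanged.
    """
    return f'<prosody pitch="{pitch}">{escape(text)}</prosody>'


examples = [
    with_pitch("Adjusted by twenty hertz.", "20Hz"),
    with_pitch("Raised by fifteen point two percent.", "+15.2%"),
    with_pitch("Lowered by eight percent.", "-8%"),
]
for line in examples:
    print(line)
```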


<rate>
The rate attribute of <prosody> sets the change in the speaking rate for the contained text.
Attribute
Description
A percentage, e.g. -50% or +200%.
A value of 100% means no change in speaking rate.
A value of 200% means a speaking rate twice the default rate.
A value of -50% means a speaking rate of half the default rate.
The default rate for a voice depends on the language and dialect and on the personality of the voice.
Rate Example
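As a rough illustration, the snippet below applies the three rate values explained above to separate sentences. The helper with_rate is a hypothetical name, and the sketch only formats the markup; how the result sounds depends on the selected voice.

```python
from xml.sax.saxutils import escape


def with_rate(text: str, rate: str) -> str:
    """Wrap text in <prosody> with a rate value such as "200%" or "-50%".

    Hypothetical helper: the rate string is passed through unchanged.
    """
    return f'<prosody rate="{rate}">{escape(text)}</prosody>'


script = "".join(
    [
        with_rate("This is the default speed.", "100%"),
        with_rate("This is read twice as fast.", "200%"),
        with_rate("This is read at half speed.", "-50%"),
    ]
)
print(script)
```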
