ATTENTION! June 29, 2025: Google has fixed the speechSynthesis bug in the Google Chrome browser.
speechSynthesis and CoolTTS now work with Google voices again! (Chrome Version 138.0.7204.50)
Introduction
Summary: CoolTTS combines TTS and SSML using pure JavaScript.
Terms: TTS = Text-to-speech
SSML = Speech Synthesis Markup Language
speechSynthesis = The JavaScript interface built into many browsers for text-to-speech
Description: speechSynthesis has been part of many browsers for years.
The original W3 specification
said that speechSynthesis should work with SSML. However, to this day, no browser has built
SSML support into the speechSynthesis interface. "CoolTTS JavaScript TTS Player" attempts to fix this
problem by providing simple JavaScript functions for TTS and SSML, including the SSML
<break> tag
to add pauses to the speech. CoolTTS also supports the SSML <mark> tag.
The other problem with speechSynthesis is that many browsers (and voices) fail to implement some of
the basic features of the speechSynthesis interface, or implement them with annoying bugs.
"CoolTTS JavaScript TTS Player" attempts to fix these problems as well. Sadly, there isn't much you
can do about some of the speechSynthesis limitations of browsers. In addition,
browser updates often introduce new bugs.
Features
JavaScript speechSynthesis (without CoolTTS)
| Feature | Microsoft Local Voices | Microsoft Online Voices (Edge for Desktop) | Google Voices (Chrome for Desktop) | iOS Voices | Android Voices |
|---|---|---|---|---|---|
| Avg Time between utterances | 1100ms | 900ms | 800ms | 500ms | 500ms |
| Avg Time between sentences | 1100ms | 800ms | 550ms | 500ms | 500ms |
| Word Boundary Event | ✔ | ✔ | | ✔ | |
| Sentence Boundary Event | ✔ | | | | |
| Pause Event | ✔ | | | ✔ | |
| Resume Event | ✔ | | | ✔ | |
| speechSynthesis.paused | ✔ | | | ✔ | |
| SSML | ❌ | ❌ | ❌ | ❌ | ❌ |
JavaScript speechSynthesis With CoolTTS
| Feature | Microsoft Local Voices | Microsoft Online Voices (Edge for Desktop) | Google Voices (Chrome for Desktop) | iOS Voices | Android Voices |
|---|---|---|---|---|---|
| Avg Time between utterances | 1100ms | 900ms | 800ms | 500ms | 500ms |
| Avg Time between sentences | 1100ms | 800ms | 550ms | 500ms | 500ms |
| Word Boundary Event | ✔ | ✔ | | ✔ | |
| Sentence Boundary Event | ✔ | ✔ | ✔ | ✔ | ✔ |
| Pause Event | ✔ | ✔ | ✔ | ✔ | ✔ |
| Resume Event | ✔ | ✔ | ✔ | ✔ | ✔ |
| cooltts.paused | ✔ | ✔ | ✔ | ✔ | ✔ |
| SSML | ✔ | ✔ | ✔ | ✔ | ✔ |
Pricing
You may use this website for free to test CoolTTS. Make sure that you test in different
browsers to see how speechSynthesis and CoolTTS offer different voices and work differently in
each browser.
To use CoolTTS on your own website, for a limited time, you can download the JavaScript file for the price of a donation.
The suggested donation is $19.99 USD.
Download
Please thoroughly test CoolTTS using this web page in different browsers before downloading cooltts.js. Make sure
that you understand the limitations and differences of speechSynthesis in different browsers
and with different voices. CoolTTS uses ONLY the free voices that are built into the browser. It will not work with the
voices that are available through subscription TTS services.
Please make a donation to reveal the download link.
For CoolTTS support, please leave a comment below.
How To Use
Upload cooltts.js to your server, in the same folder as your HTML file.
Paste the following code into your HTML file to load cooltts.js:
In your HTML file, add a cooltts class to every element that you want a CoolTTS player to appear above.
Example:
<div class="cooltts">
The player controls will not appear above the elements until most of the page is loaded.
So if the web page has a lot of external scripts, advertisements or images then it may take
a while for the player controls to appear. (*Note: On iOS devices speechSynthesis can stop
working on web pages with external resources such as Google Ads. It is best not to have
speechSynthesis or CoolTTS on websites with external resources like Google Ads.)
You can also send a string or an element directly to cooltts.play(). However, because of
browser security policies, it will probably not start speaking unless it is invoked by a user
gesture or interaction (such as a button click).
To send a string: cooltts.play("Hello world!");
To send an element: cooltts.play(document.getElementById("speech_div"));
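Because play() usually needs a user gesture, one way to call it is from inside a click handler. This is a minimal sketch: the helper name bindPlayButton and the button id "speak_button" are hypothetical and not part of CoolTTS. The player is passed in as a parameter (the global cooltts object in a real page) so the wiring can be checked without a browser.

```javascript
// Hypothetical helper (not part of CoolTTS): calls player.play() from inside
// a click handler, so the call happens during a user gesture.
function bindPlayButton(button, player, getSource) {
  button.addEventListener("click", function () {
    // getSource returns a string or an element; play() accepts either.
    player.play(getSource());
  });
}

// Possible use in a page (assumed element ids):
// bindPlayButton(document.getElementById("speak_button"), cooltts,
//   function () { return document.getElementById("speech_div"); });
```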
Other CoolTTS controls:
cooltts.stop();
cooltts.pause();
cooltts.resume();
cooltts.rewind();
cooltts.fastforward();
CoolTTS also dispatches custom cooltts events that your web page can listen for with an event listener:
document.addEventListener('cooltts', function(event) {console.log(event);}, false);
See the eventListener section for more information about the 'cooltts' event.
<break>
The break tag inserts a pause into the text-to-speech output. It can have one of two attributes:
strength or time.
strength can be none, x-weak, weak, medium, strong or x-strong.
time can be in seconds or milliseconds. Examples: time="250ms" or time="3s"
W3 Specification
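Either attribute can be turned into a single delay before the next utterance. The sketch below shows one way to do that; breakToMs is a hypothetical helper, the milliseconds chosen for each strength value are assumptions (not CoolTTS's values), and the sketch gives time priority when both attributes are present.

```javascript
// Sketch: turn <break> attributes into a pause length in milliseconds.
// ASSUMPTION: the strength table below is illustrative only; CoolTTS's own
// values are not documented on this page.
const BREAK_STRENGTH_MS = {
  "none": 0,
  "x-weak": 100,
  "weak": 250,
  "medium": 500,
  "strong": 1000,
  "x-strong": 2000,
};

function breakToMs(attrs) {
  // time wins over strength when both are given.
  if (attrs.time !== undefined) {
    const match = /^\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*$/.exec(attrs.time);
    if (!match) return null; // not a valid time value such as "250ms" or "3s"
    const value = parseFloat(match[1]);
    return match[2] === "s" ? value * 1000 : value;
  }
  if (attrs.strength !== undefined) {
    return Object.prototype.hasOwnProperty.call(BREAK_STRENGTH_MS, attrs.strength)
      ? BREAK_STRENGTH_MS[attrs.strength]
      : null;
  }
  // No attributes: use the assumed "medium" pause.
  return BREAK_STRENGTH_MS.medium;
}
```

The returned number could then be passed to setTimeout before the next utterance is started.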
Example:
<audio>
The audio tag can be used to play an audio file during text-to-speech. When the
audio tag is reached, text-to-speech pauses while the audio file plays. When the
audio file ends, text-to-speech should resume. Text between the opening and
closing audio tags is spoken if the audio file fails to play for some reason.
W3 Specification
Example of a possibly working audio file:
Example of missing audio file:
Captions
CoolTTS displays captions of the text-to-speech if the variable cooltts.captions is set to true. Alternatively, you can add the cooltts_captions class to any element to display captions.
Example:
<emphasis>
The emphasis element requests that the contained text be spoken with emphasis or stress.
The optional level attribute indicates the strength of emphasis to be applied.
Defined values are "strong", "moderate", "none" and "reduced". The default level is "moderate".
W3 Specification
eventListener
CoolTTS dispatches custom cooltts events that you can listen for with an EventListener:
document.addEventListener('cooltts', function(event) {console.log(event);}, false);
The JavaScript speechSynthesis interface built into most browsers only dispatches events on
SpeechSynthesisUtterance objects (except for the "voiceschanged" event).
A web developer has to add event listeners to each utterance
that is sent to the speechSynthesis.speak() queue. Each utterance dispatches an event when it starts
and ends, but no event is dispatched when the entire queue of utterances begins or ends.
CoolTTS tries to solve that issue. Also, many of the better-quality voices in most browsers do not
dispatch many of the utterance events.
statechange: CoolTTS sends an event with event.detail.type=="statechange"
when the TTS player changes state. The value of event.detail.state can be:
started, playing, paused, resumed, rewind, fastforward, ended, stopped. The variable cooltts.state can
also be checked at any time to see the current state of text-to-speech.
JavaScript Code:
<script>
document.addEventListener('cooltts', function(event) {
  if (event.detail.type == "statechange") {
    console.log(event);
    console.log("state: " + event.detail.state, // started, playing, paused, resumed, rewind, fastforward, ended, stopped
      "node:", event.detail.node, // the text node
      event.target, // the player controls
      event.detail.element); // the element being played
  }
}, false);
</script>
start or end: Whenever a speechSynthesis utterance starts or ends, the SpeechSynthesisUtterance
dispatches a SpeechSynthesisEvent. CoolTTS also passes that event along inside a "cooltts" event.
W3 Specification
JavaScript Code:
<script>
document.addEventListener('cooltts', function(event) {
  if (event.detail.type == "SpeechSynthesisEvent") {
    console.log(event.detail.event); // SpeechSynthesisEvent properties
    console.log("SpeechSynthesisEvent:"
      + " type:" + event.detail.event.type // start or end
      + ", sentence:" + event.detail.sentence
      + ", sentenceIndex:" + event.detail.sentenceIndex
      + ", sentenceLength:" + event.detail.sentenceLength,
      "node:", event.detail.node, // the text node
      event.target, // the player controls
      event.detail.element); // the element being played
  }
}, false);
</script>
boundary: If you use the robotic-sounding local Microsoft voices in Windows 11
in a browser (David, Mark and Zira), then the utterances dispatch "boundary" events for "sentence" and
"word" boundaries. In Google Chrome, Google voices do not dispatch a "boundary" event. In Microsoft Edge,
Microsoft Online (Natural) voices dispatch "boundary" events for "word" boundaries only. CoolTTS
has no way to create a "boundary" event for voices that don't support it. However, CoolTTS does
provide a pseudo sentence "boundary" event for voices like Google voices and Microsoft Online (Natural) voices:
it divides the text into sentences and speaks each sentence as its own utterance. The "start" and "end" events that
CoolTTS dispatches for each utterance provide the variables event.detail.sentenceIndex and event.detail.sentenceLength.
For voices that support the word "boundary" event, you can listen for the 'cooltts' event:
JavaScript Code:
<script>
document.addEventListener('cooltts', function(event) {
  if (event.detail.type == "SpeechSynthesisEvent" && event.detail.event.type == "boundary") {
    // Not all voices dispatch a boundary event
    console.log(event.detail.event); // SpeechSynthesisEvent properties
    console.log("name: " + event.detail.event.name // "word" or "sentence"
      + ", charIndex: " + event.detail.event.charIndex
      + ", charLength: " + event.detail.event.charLength
      + ", sentence: " + event.detail.sentence
      + ", sentenceIndex: " + event.detail.sentenceIndex
      + ", sentenceLength: " + event.detail.sentenceLength,
      "node:", event.detail.node, // the text node
      event.detail.element); // the element being played
    // To calculate the word position in the text node:
    // charIndex is counted from the start of the sentence, so add the sentence's offset.
    var word_start = event.detail.sentenceIndex + event.detail.event.charIndex;
    var word_end = word_start + event.detail.event.charLength;
  }
}, false);
</script>
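The word position formula above can be used to find, and then highlight, the word being spoken. This sketch works on plain strings (in a page, nodeText would be event.detail.node.textContent). The spokenWord helper is hypothetical, and the fallback for a missing or zero charLength is an assumption for voices that do not report it.

```javascript
// Hypothetical helper: find the spoken word from a "boundary" event's numbers.
// nodeText: the text node's content; sentenceIndex: the sentence's offset in
// the node; charIndex/charLength: from the SpeechSynthesisEvent.
function spokenWord(nodeText, sentenceIndex, charIndex, charLength) {
  const start = sentenceIndex + charIndex;
  let end;
  if (charLength > 0) {
    end = start + charLength;
  } else {
    // Fallback (assumption): read up to the next whitespace character.
    const match = /^\S+/.exec(nodeText.slice(start));
    end = start + (match ? match[0].length : 0);
  }
  return { start: start, end: end, word: nodeText.slice(start, end) };
}

// Inside the 'cooltts' listener above, for example:
// var w = spokenWord(event.detail.node.textContent, event.detail.sentenceIndex,
//   event.detail.event.charIndex, event.detail.event.charLength);
```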
Boundary event example:
Smiley face mouth speech movements
pause: JavaScript speechSynthesis has a "pause" event; however, for most of the better-quality
voices the "pause" event is never dispatched. Also, speechSynthesis.paused
is often "false" in most browsers for most voices even when speechSynthesis is paused.
CoolTTS fixes this problem by dispatching a "statechange" event with the state "paused" whenever speechSynthesis is paused. You can also check
the variable cooltts.paused. If it is "true", then speechSynthesis is paused.
JavaScript Code:
<script>
document.addEventListener('cooltts', function(event) {
  if (event.detail.type == "statechange" && event.detail.state == "paused") {
    console.log(event);
    console.log("node:", event.detail.node, // the text node
      event.target, // the player controls
      event.detail.element); // the element being played
  }
}, false);
</script>
resume: JavaScript speechSynthesis has a "resume" event; however, for most of the better-quality
voices the "resume" event is never dispatched. Also, speechSynthesis.paused
is often "false" in most browsers for most voices even when speechSynthesis is paused.
CoolTTS fixes this problem by dispatching a "statechange" event with the state "resumed" whenever speechSynthesis is resumed. You can also check
the variable cooltts.paused. If it is "true", then speechSynthesis is paused.
JavaScript Code:
<script>
document.addEventListener('cooltts', function(event) {
  if (event.detail.type == "statechange" && event.detail.state == "resumed") {
    console.log(event);
    console.log("node:", event.detail.node, // the text node
      event.target, // the player controls
      event.detail.element); // the element being played
  }
}, false);
</script>
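With cooltts.paused and cooltts.state available, a single button can toggle between pausing and resuming. A minimal sketch: togglePause and the "pause_button" id are hypothetical, and the player is passed in (the global cooltts object in a page) so the logic can be checked with a stand-in object.

```javascript
// Hypothetical helper: one button that pauses or resumes CoolTTS.
// It reads player.state and player.paused, which are described above.
function togglePause(player) {
  // Nothing to pause or resume once speech has stopped or ended.
  if (player.state === "stopped" || player.state === "ended") return null;
  if (player.paused) {
    player.resume();
    return "resume";
  }
  player.pause();
  return "pause";
}

// Possible use (assumed button id):
// document.getElementById("pause_button").addEventListener("click", function () {
//   togglePause(cooltts);
// });
```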
error: JavaScript speechSynthesis dispatches an "error" event when there is an error. It also
dispatches an "error" event, with the error "canceled" or "interrupted", when
speechSynthesis is stopped. CoolTTS passes the error event along as well.
W3 Specification
JavaScript Code:
<script>
document.addEventListener('cooltts', function(event) {
  if (event.detail.type == "SpeechSynthesisErrorEvent") {
    console.log(event.detail.event); // The SpeechSynthesisErrorEvent properties
    console.log("error: " + event.detail.event.error, // The type of error
      "node:", event.detail.node, // the text node
      event.detail.element); // the element being played
  }
}, false);
</script>
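Because stopping speech produces "canceled" or "interrupted" errors, a listener that reports errors to the user will usually want to ignore those two codes. A minimal sketch: isRealSpeechError is a hypothetical helper name, while the two error codes come from the SpeechSynthesisErrorEvent definition.

```javascript
// Error codes that appear when speech is stopped on purpose.
const EXPECTED_SPEECH_ERRORS = new Set(["canceled", "interrupted"]);

// Hypothetical helper: true only for errors worth reporting.
function isRealSpeechError(errorCode) {
  return !EXPECTED_SPEECH_ERRORS.has(errorCode);
}

// Possible use in the 'cooltts' listener:
// if (event.detail.type == "SpeechSynthesisErrorEvent"
//     && isRealSpeechError(event.detail.event.error)) {
//   console.error("Speech failed:", event.detail.event.error);
// }
```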
mark:
The W3 Specification
for the JavaScript speechSynthesis interface says that a mark event should be fired when a mark tag
is reached, and SpeechSynthesisUtterance has an "onmark" event handler. However, the event is apparently never fired,
presumably because none of the browsers ever integrated SSML into the speechSynthesis interface. CoolTTS
attempts to fix that by providing its own "mark" event. See <mark> for how to use it.
Hidden elements
CoolTTS will speak elements that are not visible to the user such as elements with CSS display:none
or visibility:hidden. If you do not want CoolTTS to speak these elements then you can add
class="cooltts_skip" to the element.
Example:
The element for the player below has CSS display:none;
Example:
The element for the player below has CSS visibility:hidden;
xml:lang
xml:lang is a defined attribute for the speak, lang, desc, p, s, token, and w elements.
It accepts a two-letter language code and an optional two-letter country code. CoolTTS will add
the value of the xml:lang attribute to the utterance being spoken by the speechSynthesis interface.
This may change the voice that is being used. However, if a Microsoft Online Multilingual voice is selected then the
same voice will likely be used.
W3 Specification
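Before a language value is assigned to an utterance, it can help to normalize its case. The sketch below accepts the two-letter language code and optional two-letter country code described above; normalizeLang, applyLang and the acceptance of "_" as a separator are assumptions, and the utterance can be any object with a lang property (a SpeechSynthesisUtterance in a page).

```javascript
// Hypothetical helpers: normalize "en-us" or "EN_GB" to "en-US" / "en-GB".
function normalizeLang(value) {
  const match = /^([a-zA-Z]{2})(?:[-_]([a-zA-Z]{2}))?$/.exec(String(value).trim());
  if (!match) return null; // not "ll" or "ll-CC"
  const language = match[1].toLowerCase();
  return match[2] ? language + "-" + match[2].toUpperCase() : language;
}

function applyLang(utterance, value) {
  const lang = normalizeLang(value);
  // Setting utterance.lang may make the browser choose a different voice
  // (but not on iOS, where a voice has to be chosen explicitly).
  if (lang !== null) utterance.lang = lang;
  return utterance;
}
```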
Example:
<mark>
The mark tag places a named mark in the text so that something can happen when the speech reaches it.
To react to it, add an event listener for the mark. Each mark should have a name attribute, such as:
<mark name="my_mark">
According to
https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/mark_event, browsers are
supposed to fire a built-in "mark" event on speechSynthesis utterances. However, it appears
that the event is never fired because browsers do not correctly parse SSML. Therefore, CoolTTS
parses mark tags itself and reports them through the "cooltts" event.
Sadly, with JavaScript speechSynthesis in the popular browsers,
there is a slight pause when a mark tag is reached, because one utterance ends and another begins, and
the popular browsers (Google Chrome and Microsoft Edge) and quality voices have a slight pause between utterances.
So if the mark tag is in the middle of a sentence, it will sound strange.
You can listen for a mark event by adding an event listener for the "cooltts" event:
document.addEventListener('cooltts', function(event) {console.log(event);}, false);
Mark event properties:
event.target: the player controls of the event
event.detail.type: "mark"
event.detail.name: the name given in the name attribute of the mark tag
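With those properties, one listener can route each mark to its own callback. A sketch with a hypothetical createMarkListener helper; only event.detail.type and event.detail.name come from the property list above.

```javascript
// Hypothetical helper: route CoolTTS "mark" events to callbacks by mark name.
// handlers maps a mark's name attribute to a function.
function createMarkListener(handlers) {
  return function (event) {
    if (!event.detail || event.detail.type !== "mark") return;
    const name = event.detail.name;
    // hasOwnProperty avoids calling inherited methods such as "toString".
    if (Object.prototype.hasOwnProperty.call(handlers, name)) {
      handlers[name](event);
    }
  };
}

// Possible use in a page:
// document.addEventListener('cooltts', createMarkListener({
//   my_mark: function (event) { console.log("Reached my_mark", event.target); },
// }), false);
```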
<p>
A p element represents a paragraph. An s element represents a sentence. Both elements
can have a lang attribute.
<s>
An s element represents a sentence. s elements can have a lang attribute. If you
use an <s> tag in a standard HTML document, the browser will render it as strikethrough
text. To prevent that, you may want to add style="text-decoration: inherit;" to the tag.
<phoneme>
The phoneme element provides a phonemic/phonetic pronunciation for the contained text.
Unfortunately, there seems to be no good method for doing phoneme pronunciations with the JavaScript
speechSynthesis interface in today's browsers. It would always sound jerky and inaccurate. Therefore,
the phoneme tag will probably never be part of CoolTTS.
W3 specification
<prosody>
The prosody element permits control of the pitch, speaking rate and volume of the
speech output. All of the attributes are optional. Because of limitations in how browsers implement the
speechSynthesis interface, it is impossible to completely follow the
W3 specification for prosody.
prosody attributes:
pitch
Adjust the pitch of the voice. The value can be a number from 0 to 2, where 1 is the default value.
(Note: Google voices only go as low as 0.1. Also note: Microsoft Natural voices cannot
change the pitch at all.) Or the value can be: "x-low", "low", "medium", "high", "x-high", or "default".
rate
Adjust the rate or speed of the speech. The value can range from 0.1 to 2, where 1 is the default value.
(Note: Microsoft local voices or non-Natural voices can have a value from 0.1 to 10.)
Or rate can be a percentage from 10% (slowest) to 200% (fastest); 100% is the default rate. Or rate can be:
"x-slow", "slow", "medium", "fast", "x-fast", or "default".
volume
Adjust the volume of the speech. speechSynthesis in browsers only allows for a volume from
0 (lowest) to 1 (highest). The default value is 1.
Or the value for volume can be:
"silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default".
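These attribute ranges map onto the pitch, rate and volume properties of a SpeechSynthesisUtterance. A sketch of that conversion, with hypothetical function names; the numbers chosen for the named levels are assumptions (not CoolTTS's values), and percentages are accepted only for rate, as described above.

```javascript
// ASSUMPTION: illustrative numbers for the named levels.
const PITCH_LEVELS = { "x-low": 0.5, "low": 0.75, "medium": 1, "high": 1.25, "x-high": 1.5, "default": 1 };
const RATE_LEVELS = { "x-slow": 0.5, "slow": 0.75, "medium": 1, "fast": 1.25, "x-fast": 1.5, "default": 1 };
const VOLUME_LEVELS = { "silent": 0, "x-soft": 0.2, "soft": 0.4, "medium": 0.6, "loud": 0.8, "x-loud": 1, "default": 1 };

function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// A named level, a plain number, or (if allowed) a percentage; NaN if invalid.
function toNumber(levels, raw, allowPercent) {
  const text = String(raw).trim();
  if (Object.prototype.hasOwnProperty.call(levels, text)) return levels[text];
  if (text.endsWith("%")) return allowPercent ? parseFloat(text) / 100 : NaN;
  return parseFloat(text);
}

// maxRate is 2 for most voices; Microsoft local voices allow up to 10.
function prosodyToSettings(attrs, maxRate = 2) {
  const settings = {};
  if (attrs.pitch !== undefined) {
    const pitch = toNumber(PITCH_LEVELS, attrs.pitch, false);
    if (!Number.isNaN(pitch)) settings.pitch = clamp(pitch, 0, 2);
  }
  if (attrs.rate !== undefined) {
    const rate = toNumber(RATE_LEVELS, attrs.rate, true);
    if (!Number.isNaN(rate)) settings.rate = clamp(rate, 0.1, maxRate);
  }
  if (attrs.volume !== undefined) {
    const volume = toNumber(VOLUME_LEVELS, attrs.volume, false);
    if (!Number.isNaN(volume)) settings.volume = clamp(volume, 0, 1);
  }
  return settings;
}

// In a page: Object.assign(new SpeechSynthesisUtterance(text), prosodyToSettings({ rate: "150%" }));
```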
<say-as>
The say-as element allows the author to indicate information on the type of text construct contained
within the element and to help specify the level of detail for rendering the contained text.
W3 Specification
The interpret-as attribute supports the following values:
verbatim or spell-out
digits
date
cardinal
ordinal
time
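One simple way to approximate some interpret-as values with speechSynthesis is to rewrite the text before it is spoken. A sketch for spell-out/verbatim and digits only; sayAs is a hypothetical name and not CoolTTS's implementation, and whether a voice reads single characters as letter names varies by voice.

```javascript
// Sketch: rewrite text so a voice reads it the way interpret-as asks.
function sayAs(text, interpretAs) {
  switch (interpretAs) {
    case "verbatim":
    case "spell-out": {
      // Put a space between characters so they are read one by one.
      return Array.from(text.replace(/\s+/g, "")).join(" ");
    }
    case "digits": {
      // Read each digit on its own: "1234" becomes "1 2 3 4".
      const digits = text.match(/\d/g);
      return digits ? digits.join(" ") : text;
    }
    default:
      // date, cardinal, ordinal and time need locale-aware rules; unchanged here.
      return text;
  }
}
```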
Skip element
Add class="cooltts_skip" to any element that you do not want CoolTTS to speak. This is useful for hidden
elements (CSS display:none or visibility:hidden), which CoolTTS speaks by default.
Example:
This is a feature of CoolTTS, not SSML. You can do something similar in SSML using the sub tag and
alias="".
Example:
<sub>
The sub element is employed to indicate that the text in the alias attribute value substitutes
the contained text for pronunciation. This allows a document to contain both a spoken and written form.
The REQUIRED alias attribute specifies the string to be spoken instead of the enclosed string.
W3 Specification
Note: If you use the <sub> tag in an HTML document, the browser will treat it as the HTML subscript
tag, which can have the undesirable effect of rendering the text smaller and lower
than the surrounding text. To prevent that, you may want to add
style="font-size: inherit; vertical-align: inherit;" to the element.
Example:
<voice>
The voice element allows you to attempt to change the voice by any combination of name,
gender, age or language attributes.
W3 Specification
name
For the name attribute, you can use any voice name that is available for TTS in the browser
being used. CoolTTS supports partial matching and the | (OR) operator. For example, if you
put name="Brian|Male" and the user is using the Microsoft Edge browser, then it will probably choose
the voice "Microsoft Brian Online (Natural) - English (United States)". If the user is using the Google Chrome
browser, then it will choose a voice with the word "Male" in its name, most likely "Google UK English Male".
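A sketch of how such matching could work, using a hypothetical findVoiceByName helper; the voices can be any objects with a name property (SpeechSynthesisVoice objects in a page). Trying alternatives left to right and comparing case-sensitively are assumptions chosen so that the "Brian|Male" example behaves as described: compared case-insensitively, "Male" would also match "Female".

```javascript
// Sketch of partial name matching with the | (OR) operator.
// The first alternative found inside any voice name wins.
function findVoiceByName(voices, pattern) {
  const alternatives = pattern.split("|").map(function (s) { return s.trim(); }).filter(Boolean);
  for (const alternative of alternatives) {
    const voice = voices.find(function (v) { return v.name.includes(alternative); });
    if (voice) return voice;
  }
  return null;
}

// In a page: findVoiceByName(speechSynthesis.getVoices(), "Brian|Male")
// (getVoices() can return an empty list until the "voiceschanged" event fires).
```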
gender
Options for the gender attribute are "male", "female", "neutral", or the empty string "".
Unfortunately, the JavaScript speechSynthesis interface in modern browsers has no property for gender.
Google Chrome's "Google" voices mostly offer only a female voice for each language.
The two exceptions are Spanish and UK English, which both have male and female voices. CoolTTS will attempt to
make the other languages and dialects sound "male" by changing their pitch if gender="male" is included in the voice tag.
Microsoft Online Natural voices cannot change pitch, so a male or female voice has to be chosen instead.
But Microsoft does not label its voices as male or female for each language; each voice only has a personal name.
Instead of maintaining a long list of female and male names for the Microsoft Edge browser, CoolTTS will
choose one of the Multilingual male or female voices.
age
The JavaScript speechSynthesis interface has no property for age. In Google Chrome, the pitch of the
voice can be raised to try to mimic a child. The pitch can also be lowered in Chrome, and the
rate can be slowed a bit, to try to mimic an older person. Microsoft Online Natural voices cannot
change pitch, so in the Microsoft Edge browser the only way to mimic an older person is to slow
down the rate a little. Microsoft Edge currently includes at least three child voices: Maisie (en-GB),
Ana (en-US), and Eloise (fr-FR). CoolTTS will pick one of those voices for an age of 12 or under.
language
The language attribute accepts a two-letter language code and an optional two-letter country code, such as "en-US".
CoolTTS will add the value of the language attribute to the utterance being spoken by the speechSynthesis interface.
This may change the voice that is being used. However, if a Microsoft Multilingual voice (Microsoft Edge) is selected then the
same voice will likely be used.
Browser Limitations of JavaScript speechSynthesis
Desktop and Laptop Computers
It seems that only two browsers have put real effort into the JavaScript speechSynthesis interface:
Google Chrome and Microsoft Edge. Edge has put a little more effort into the programming
and has a nice selection of voices. Other Chromium browsers (Opera, Brave, Vivaldi) and Firefox have not put much
effort into speechSynthesis. They MIGHT have a few older, robotic-sounding voices available that come with
the operating system. In Windows, they might have older Microsoft voices such as Microsoft David, Mark and Zira.
Please do not expect CoolTTS to work well with these browsers. If users want a good-sounding text-to-speech
interface for free, then they need to use either Google Chrome or Microsoft Edge. Note that JavaScript speechSynthesis
in every popular browser can only play one utterance at a time. So if a new utterance is started, even in a different
tab on a different website, then the first utterance stops playing.
Events:
Microsoft local voices are available in most Windows browsers. Microsoft local voices are usually
low quality, but they dispatch more events than other voices, including Google voices. They dispatch "pause",
"resume", and word and sentence "boundary" events.
Google Chrome
Events:
Google Chrome has a few high-quality Google voices in various languages. But none of the Google voices
fire a "boundary" event. Utterances using
Google voices also never fire a "pause" or "resume" event. CoolTTS attempts to solve this problem by dispatching
"paused" and "resumed" events as well as many other events. When utterances using Google voices
are paused, speechSynthesis.paused still stays false. CoolTTS provides a more accurate cooltts.paused
variable.
Pause Bug:
Google voices don't resume if speechSynthesis.pause() is invoked in the middle of an utterance
and then resume() is invoked more than 15 seconds later.
Google voices also sometimes ignore pause() if it is invoked near the start or end of an utterance.
CoolTTS attempts to solve these issues by using its own queue instead of the speechSynthesis.speak() queue and
by pausing and resuming based on sentences rather than in the middle of words.
Time limit Bug:
Another issue with Google voices is that they will stop and get stuck after about 14 seconds of text to speech.
CoolTTS attempts to solve this problem by calling speechSynthesis.pause() and speechSynthesis.resume() every 12
seconds. However, there is a slight stutter when those commands are called. If Chrome gets stuck then calling
speechSynthesis.speak() with a new utterance does nothing unless speechSynthesis.cancel() is called first.
Blank Utterance Bug:
If an utterance is blank and uses a Google voice, Chrome never fires the "start" event on the utterance. Instead,
it goes straight to the "end" event for the utterance.
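Since CoolTTS speaks text sentence by sentence, one way to avoid this bug is to drop empty sentences before creating utterances. A sketch with a hypothetical splitSentences helper that also records each sentence's offset and length, in the spirit of event.detail.sentenceIndex and sentenceLength; the simple punctuation-based sentence rule is an assumption and does not handle abbreviations such as "Dr.".

```javascript
// Sketch: split text into sentences with their offsets, skipping blanks.
function splitSentences(text) {
  const sentences = [];
  const re = /[^.!?]+[.!?]*/g;
  let match;
  while ((match = re.exec(text)) !== null) {
    const raw = match[0];
    const leading = raw.length - raw.trimStart().length;
    const sentence = raw.trim();
    if (sentence === "") continue; // blank: never create an utterance for it
    sentences.push({
      sentence: sentence,
      sentenceIndex: match.index + leading, // offset in the original text
      sentenceLength: sentence.length,
    });
  }
  return sentences;
}
```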
Pitch:
You are able to change the pitch, rate and volume on Google voices.
New Lines:
A nice feature of Google voices is that they ignore new lines (\n) in the middle of sentences
and speak them as the same sentence. New lines in HTML are generally not visible but are treated as a space, so it
makes sense that Google voices deal with them this way. However, if the new line starts with a capital letter then
Google voices treat them as a different sentence. This also makes sense.
Microsoft Edge
Events:
Microsoft Edge browser has dozens of high quality Online Natural voices in many languages. The Microsoft Online
Natural voices fire a "boundary" event for "word" boundaries, but not for "sentence" boundaries.
Utterances using Microsoft Natural voices also never fire a "pause" or "resume" event. But CoolTTS attempts to solve
this problem by dispatching a "paused" and "resumed" event as well as many other events.
When utterances using Microsoft Online Natural voices
are paused, speechSynthesis.paused still stays false. CoolTTS provides a more accurate cooltts.paused
variable.
Pitch:
Another issue with Microsoft Online Natural voices is that the pitch never changes even when setting
the pitch variable on the utterance. Setting the volume and rate variables does work though.
New Lines:
Both Microsoft Local voices and Microsoft Online voices treat new lines (\n) as a new sentence. HTML usually
doesn't display new lines but treats them as spaces or white space. When Microsoft voices treat new lines as
new sentences then it can cause unwanted pausing in the middle of a sentence. CoolTTS attempts to solve this
potential problem.
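A sketch of one way to avoid that unwanted pausing before text is handed to the voice. The helper name is hypothetical and this is not CoolTTS's implementation; treating every line break as ordinary white space is an assumption that matches how HTML displays text, although it also joins lines that were meant as separate sentences but lack end punctuation.

```javascript
// Collapse every whitespace run that contains a line break into one space,
// so voices that treat "\n" as a sentence break do not pause mid-sentence.
function collapseNewlines(text) {
  return text.replace(/\s*\n\s*/g, " ");
}
```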
Mozilla Firefox
Mozilla Firefox might have some older, robotic-sounding voices that come with the operating system.
It will probably be using Microsoft local voices in Windows. English might be the only language available. The JavaScript
speechSynthesis interface mostly works in Firefox; however, if there is more than one
utterance in the speechSynthesis.speak() queue, then it ignores speechSynthesis.pause(). Therefore, the
<break> tag doesn't work. To compensate for that issue, CoolTTS sets the cooltts.use_cooltts_queue
variable to true, and then the <break> tag works.
That causes CoolTTS to use its own queue instead of the speechSynthesis.speak() queue.
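The idea of a script-managed queue can be sketched as follows: only one utterance is handed to speechSynthesis at a time, and a <break> is a timer between utterances, so pauses never depend on pause() being honored. This is an illustration, not CoolTTS's queue; playQueue, speakOne and wait are hypothetical names, and the two helpers are parameters so the logic can run outside a browser.

```javascript
// Sketch: play items one at a time; { text } is spoken, { breakMs } waits.
async function playQueue(items, speakOne, wait) {
  for (const item of items) {
    if (typeof item.breakMs === "number") {
      await wait(item.breakMs); // a <break> between utterances
    } else {
      await speakOne(item.text); // resolves when the utterance has ended
    }
  }
}

// Possible browser versions of the two helpers (not executed here):
// function speakOne(text) {
//   return new Promise(function (resolve, reject) {
//     const utterance = new SpeechSynthesisUtterance(text);
//     utterance.onend = resolve;
//     utterance.onerror = reject; // stopping speech also rejects here
//     speechSynthesis.speak(utterance);
//   });
// }
// function wait(ms) {
//   return new Promise(function (resolve) { setTimeout(resolve, ms); });
// }
```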
Mobile Devices
Mobile device browsers on iPhones, iPads and Android devices are not very good at speechSynthesis either.
The mobile browsers usually don't have the same quality voices as their desktop browser counterparts.
Volume:
Users of mobile devices often turn their media volume all the way down at some point while
browsing the Internet. Then, if they press play on a web page with speechSynthesis, they don't hear any audio. They
usually end up pausing or stopping the speechSynthesis and then try to press the volume up button on the side of
the device. But with the media paused, the volume up button only changes the ringer volume. The user has to
figure out how to press the play button and then press the volume up button on the side of the device WHILE the
media is playing. So it can be difficult for some users of mobile browsers to figure out how to hear the audio
from TTS speechSynthesis. There is no JavaScript method for detecting the volume level of a mobile device.
iOS:
Mute: If an Apple mobile device is soft muted (muted with the button on the side of the device) then JavaScript
speechSynthesis will be silent. There is no indication to the user that their device is muted. There is also
no JavaScript method for detecting if a device is muted. So a user may have a difficult time figuring out how
to hear speechSynthesis on an iOS device like an iPhone or iPad.
Voices: It seems that every browser for iOS is just a skin on top of a Safari engine. So speechSynthesis on iOS
will probably run the same in every browser on iPhone/iPad. Starting around iOS 16, Apple installed a
lot of voices for JavaScript speechSynthesis. But half of them seem to be some kind of joke.
They sound robotic and play strange sound effects or instruments along with the voice.
("Sandy", "Shelley", "Grandma", "Grandpa", "Eddy", "Reed", "Anna", "Rocko", "Flo", "Bahh", "Albert", "Jester", "Organ", "Cellos", "Zarvox", "Superstar", "Bells", "Trinoids", "Kathy", "Boing", "Whisper", "Good News", "Wobble", "Bad News", "Bubbles", "Junior", "Ralph")
CoolTTS filters these voices out of its voice options. What is left are some OK-quality voices for some
languages; however, the same voices often seem to be listed several times.
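A sketch of that kind of filtering, with a hypothetical usableVoices helper. The names are copied from the list above; treating voices with the same name and lang as duplicates, and comparing only the part of the name before " (" in case some iOS versions append extra text, are assumptions, since this page does not say how CoolTTS identifies them.

```javascript
// Novelty voice names listed above.
const NOVELTY_VOICE_NAMES = new Set([
  "Sandy", "Shelley", "Grandma", "Grandpa", "Eddy", "Reed", "Anna", "Rocko", "Flo",
  "Bahh", "Albert", "Jester", "Organ", "Cellos", "Zarvox", "Superstar", "Bells",
  "Trinoids", "Kathy", "Boing", "Whisper", "Good News", "Wobble", "Bad News",
  "Bubbles", "Junior", "Ralph",
]);

// Drop novelty voices and repeated entries (same name and lang).
function usableVoices(voices) {
  const seen = new Set();
  return voices.filter(function (voice) {
    const baseName = voice.name.split(" (")[0];
    if (NOVELTY_VOICE_NAMES.has(baseName)) return false;
    const key = voice.name + "|" + voice.lang;
    if (seen.has(key)) return false; // duplicate entry
    seen.add(key);
    return true;
  });
}

// In a page: usableVoices(speechSynthesis.getVoices())
```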
External resources/Ads: Another issue with iOS devices is that they seem not to fire any events
on a SpeechSynthesis utterance (except for an error event if the utterance is canceled) if the web page is
large with a lot of content. In that case, iOS will probably play only one sentence with
CoolTTS and then pause. External resources and advertisements like Google Ads often
interfere with JavaScript speechSynthesis so that it does not work properly. So for iOS devices, it is best to use JavaScript
speechSynthesis on a web page without ads and external resources.
lang: Changing the "lang" attribute for an utterance in iOS does not change the voice like it does with
other browsers. In iOS you have to choose a voice to change the language.
Events:
iOS voices dispatch "pause" and "resume" events. iOS voices also dispatch "boundary" events,
but only for word boundaries, not sentence boundaries. speechSynthesis.paused correctly reports "true" when
speechSynthesis.pause() is invoked.
New Lines:
iOS voices seem to ignore new lines (\n) in the middle of sentences
and speak them as the same sentence. New lines in HTML are generally not visible but are treated as a space, so it
makes sense that iOS voices deal with them this way. However, even if the new line starts with a capital letter,
iOS voices still treat the lines as the same sentence. This may make two different sentences sound like one. CoolTTS
attempts to solve this issue.
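One way to handle this, sketched with a hypothetical separateLineSentences helper: it is not CoolTTS's code, and its capital-letter rule is an assumption that copies the Google voice behavior described earlier, so it misfires when a line break happens to fall right before a proper noun.

```javascript
// If a line does not end with . ! or ? and the next line starts with a
// capital letter, insert a period so the lines are read as two sentences.
// Every other line break becomes a single space.
function separateLineSentences(text) {
  return text
    .replace(/([^\s.!?])[^\S\n]*\n\s*(?=[A-Z])/g, "$1. ")
    .replace(/\s*\n\s*/g, " ");
}
```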
Android:
Events:
Android does not dispatch "pause" or "resume" events. speechSynthesis.paused always reports "false" even after
speechSynthesis.pause() is invoked. Android voices also do not dispatch "boundary" events.
Pause Bug:
On some Android devices speechSynthesis.resume() does not work after speechSynthesis.pause() and the speechSynthesis
utterance remains paused indefinitely. CoolTTS attempts to solve this issue.
New Lines:
Android voices seem to treat new lines (\n) as a new sentence. HTML usually
doesn't display new lines but treats them as spaces or white space. When Android voices treat new lines as
new sentences then it can cause unwanted pausing in the middle of a sentence. CoolTTS attempts to solve this
potential problem.
Future Development
If there is enough interest for this project then I will continue to work on it, fixing bugs
and possibly adding new features.
There are no plans to make this script work with subscription TTS services. Those services have their own APIs
for voices and SSML processing, and they can get expensive. The
point of this project is to use the JavaScript speechSynthesis interface built into many modern browsers and to
provide a way to use it with SSML.
For support questions, please leave a comment below.
History
6/3/2025 - Version 1.1 - Improved applying settings changes while playing or paused.
5/31/2025 - Version 1.0 - CoolTTS JavaScript TTS Player created.
Last updated on June 30, 2025
Created on January 21, 2025