Tag Archives: Speech

Text-To-Speech in PowerShell

If you’re like me, you’ve used the Narrator, built into Windows, to read something to you that you’ve written. If you’re not, then you may find this whole thing, strange. But yes, for me, I’ll occasionally do this to ensure I didn’t miss a word or two, or that I didn’t use the wrong tense. Or maybe, I just wanted to hear how something sounds when it isn’t me saying it out loud, and instead, it’s one of my computer’s robot voices. So yeah, I do that sometimes.

I’m growing tired of opening Narrator, so I’ve decided I should make a PowerShell function to do the text-to-speech for me. My biggest complaint is that the Narrator tells me way too much information before it actually starts doing what I expect it to do, and yes I do understand the Narrator wasn’t included in the operating system for someone such as myself. Still, sometimes, depending on something’s importance, I prefer to have what I’ve written, read to me, to help ensure my accuracy. It’s a final safety (I’d-prefer-to-not-sound-like-an-idiot for this piece of writing) check.

I’ve had some exposure with System.Speech. But, what was that for? Ah yes, I remember. It was for the Convert-TMNatoAlphabet function. I have copied it here from its previous home on the Microsoft TechNet Gallery. This function would do exactly as the examples below indicate. It takes a string made up of numbers and letters and writes it out in the Nato phonetic alphabet. It’s an old function, so I’ve included an old image. You can’t tell in the image examples, but at some point, I added a Speak parameter. If that was included, as -Speak, not only would it display the result on the screen, but it would say them out loud, as well.

The end goal now is to write a function that I’ll keep around to read things back to me without opening the Narrator. I provide some text, and PowerShell provides me a robot voice to help me recognize if I’ve missed any words while having something read to me. I think I’ll pause here for now and go write the code. Yeah, that’s right, that hasn’t actually been done yet. So for you, back in split second, and for me it’ll be some time longer.

This got a touch more confusing than I had hoped it would, but it’s nothing you won’t be able to comprehend. We need to remember that we’re operating in a PowerShell world now — not a Windows PowerShell world. That means that we’re going to assume we’re using something greater than version 5.1, but that 5.1 is available to us (since I’m running Windows 10). We’re going to use powershell.exe (5.1 and earlier) from pwsh.exe (6.0 and later). Let’s take a look at the below-completed function in order to understand what it’s doing and then let’s test it out. One final note before we do that, however. While you may have no interest in having anything spoken by your computer, the concepts in this example may prove to be helpful if you ever need to use Windows PowerShell, from PowerShell.

Function Convert-TextToSpeech {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory)]
        [string]$Text,
        [Parameter()]
        [ValidateSet(1,2,3,4,5,6,7,8,9,10)]
        [int]$SpeechSpeed = 5
    ) # End Param.

    Begin {
        Function ConvertTextToSpeech {
            [CmdletBinding()]Param (
                [Parameter()]$Text,
                [Parameter()]$SpeechSpeed
            ) # End Param.
            Add-Type -AssemblyName System.Speech
            $VoiceEngine = New-Object System.Speech.Synthesis.SpeechSynthesizer
            $VoiceEngine.Rate = $SpeechSpeed - 2
            $VoiceEngine.Speak($Text)
        } # End Function: ConvertTextToSpeech.
    } # End Begin.

    Process {
        $Session = New-PSSession -Name WinPSCompatSession -UseWindowsPowerShell
        Invoke-Command -Session $Session -ScriptBlock ${Function:ConvertTextToSpeech} -ArgumentList $Text,$SpeechSpeed
    } # End Process.

    End {
        Remove-PSSession -Name WinPSCompatSession
    } # End End.
} # End Function: Convert-TextToSpeech.

The entire code example is within a single function, which is both expected and preferred. Copy, paste, run the code, and the Convert-TextToSpeech function becomes available to be invoked within your current PowerShell session. This function does several things. The first thing to notice is that the function includes a parameter named Text, which expects a value is included for that parameter when the function is invoked. This is the text to be read out loud. It also includes a parameter named SpeechSpeed that accepts a value from 1 through 10. It includes a default value of 5, which is what we’re going to use for the normal/standard/default speed. Therefore, anything less than 5 is slower than normal and anything greater than 5 would be faster.

In the Begin block, the function creates a nested function. It uses the same name as our main function; however, it removes the dash from in between the words “Convert” and “Text,” making it a completely different function. We could have used the same name since these functions would be scoped differently, however, I thought that would probably end up being even more confusing than this already may be.

Notice that the nested function includes the identical Text parameter as our parent function. The parameter value that comes in when the parent function is invoked will be passed along to our nested function’s Text parameter. We will see that shortly. It also includes a SpeechSpeed parameter, as well. This will also be passed from the parent function to this child function. There are not as many rules about what these parameters will accept in the child function. That’s because the validation features and variable type enforcements are already requirements in the parent function. Therefore, they don’t need to be in place for this nested function, too.

That nested function’s purpose is to load what’s necessary to actually do the text-to-speech conversion before doing it. This is code that cannot be executed using .NET Core (cross-platform). It can only be executed by the .NET Framework (Windows). This is why we need Windows PowerShell. If I later determine that’s not entirely true, I’ll be sure to update this post. While we’re looking at this nested function though, do notice where we set the Rate. This is the SpeechSpeed, which I’m beginning to figure out, isn’t easy to say out loud. Meh. As it was mentioned earlier, the values the function allows for are 1 through 10 where 1 is the slowest and 10 is the fastest. You’ll notice that in the nested function we subtract two from that value. That’s because three is actually the normal speed.

The Process block only includes two lines, however, they are both quite important. The first line creates a new PowerShell Remoting session. The remoting session will not be to a remote or different computer, but instead will stay on the same computer and make use of Windows PowerShell, or powershell.exe. It will create a new “remote” Windows PowerShell session where it can run powershell.exe and make use of the .NET Framework. The nested function will be “taken” into the remote session using Invoke-Command along with the two parameter values passed into the parent function. After the text is read in the “remote” Windows PowerShell session, it will return to this function where the code will progress into the End block. From here, our function will remove the PS Remoting session and the function invocation will end.

In order to experience the below example, copy the above function, paste it into the ConsoleHost or VSCode and run it so that the function is added to the session/memory. Then, invoke the function by running your own example, or even the one below. The one below will start out slowly and gradually get faster. I’ve included the audio of the function being invoked beneath the example code. Watch your use of quotes/double quotes in your text. It could break things.

1..10 | ForEach-Object {Convert-TextToSpeech -Text "This is testing $_" -SpeechSpeed $_}

Again, you may have no interest in a function that can read text out loud, but one day you may have a need to run something inside Windows PowerShell from (just) PowerShell.

Edit: If you have a large block of text, you might rather store it in a text file and use a Get-Content command to read in the file’s contents, before it’s used as the parameter value for the Text parameter. Something like the below example.

Convert-TextToSpeech -Text (Get-Content .\Desktop\file.txt)