Introduction
PowerShell scripting has never been my favourite area to work in. Coming from a background of C# and C++, I've always found PowerShell to be a bit hacky, not very rigorous and quite time-consuming to write and test. Recently, I needed to multithread some long-running PowerShell scripts, and the results I got, as well as the processes and frameworks I used to achieve them, have completely changed my opinion of PowerShell.
I was able to get a process that previously took up to one hour to complete in less than 2 minutes. What I found was a system that is intuitive, robust and scales incredibly well. I never thought I'd see these kinds of results using PowerShell, but I have been very happily surprised.
Background
I want to present a series of examples to demonstrate the main features of multithreaded PowerShell. Some experience of PowerShell, as well as an understanding of multithreaded programming in general, is assumed. All scripts were developed using PowerShell 3. I have not tested them in any other version.
When I first started out on this work, I came up with four possible approaches to multithreading my PowerShell scripts.
PowerShell Background Jobs
From MSDN - Cmdlets can perform their action internally or as a Windows PowerShell background job. When a cmdlet runs as a background job, the work is done asynchronously in its own thread separate from the pipeline thread that the cmdlet is using. From the user perspective, when a cmdlet runs as a background job, the command prompt returns immediately even if the job takes an extended amount of time to complete, and the user can continue without interruption while the job runs.
PowerShell jobs are quite a high-level construct. As such, there is limited control at the low level and limited ability to manage multiple threads and have them share variables, etc.
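As a rough illustration of what working with background jobs looks like, here is a minimal sketch using Start-Job. It is not taken from the scripts in this article; the squaring workload and the 200 ms sleep are stand-ins I made up for the example. Note that Start-Job runs each script block in a separate PowerShell process, so values have to be passed in with -ArgumentList and come back serialized.

```powershell
# Start three background jobs; each runs in its own PowerShell process.
$jobs = 1..3 | ForEach-Object {
    Start-Job -ScriptBlock {
        param([int]$n)
        Start-Sleep -Milliseconds 200   # stand-in for real work (hypothetical)
        $n * $n
    } -ArgumentList $_
}

# Block until every job has finished, then collect the output.
$results = $jobs | Wait-Job | Receive-Job

# Jobs stay registered in the session until they are removed.
$jobs | Remove-Job

$results | Sort-Object
```

There is no direct way here to hand the jobs a live object that they all update, which is the kind of limitation that pushed me towards runspaces.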
PowerShell Workflows
PowerShell workflows are a new concept in PowerShell based on the Windows Workflow Foundation engine. They support parallel processing out of the box. Again, like Jobs, they are quite high level and the amount of control they give us is limited.
PowerShell Runspaces
.NET provides the System.Management.Automation.Runspaces namespace, which gives us access to a set of classes designed to create, manipulate and orchestrate a pool of PowerShell processes. This forms the basis of multithreading your PowerShell scripts.
.NET Task Parallel Library
I had the idea to try to directly leverage the TPL from within PowerShell and effectively tackle the problem in exactly the same way as one would if writing multithreaded code in .NET, e.g., instantiating Task objects, etc.
Background jobs and workflows didn't provide me with enough control, so I quickly dismissed them. My preference was to use the TPL, but I quickly found that things didn't quite work. Although we can write .NET code directly from within PowerShell, that doesn't mean we should try to follow the same patterns in both. The two are markedly different, and at the thread level I found that trying to instantiate and manipulate threads from within a PowerShell script was a recipe for disaster. That left me using the System.Management.Automation.Runspaces namespace, and the results were quite pleasing.
Examples
A First Simple Example
This is our first demonstration of a multi threaded Powershell script. We create 50 local text files by downloading a file from the web. We do it first sequentially and then in parallel and compare the results.
The sequential code should be self-explanatory. When executing the process in parallel, the first step is to create a RunspacePool, which hosts one or more Runspaces. A Runspace is an independent operating environment in which a PowerShell process can run. For the purposes of our example, we can think of it as a thread. The RunspacePool will allow many PowerShell processes to run concurrently and acts like our thread pool. We instantiate an instance of the PowerShell class, define what command(s) it will run and then execute it asynchronously using BeginInvoke.
We also introduce the idea of SessionState, which allows us to share variables and more across all our Runspaces in the RunspacePool.
Try tweaking the value of $numThreads to see how using more or fewer threads affects performance on your system.
cls
$scriptPath = split-path -parent $MyInvocation.MyCommand.Definition
$folderLocation = [System.IO.Path]::Combine($scriptPath, "PowerShellMultiThreading_SimpleExample")
if (Test-Path $folderLocation)
{
Remove-Item $folderLocation -Recurse -Force
}
New-Item -Path $folderLocation -ItemType directory -Force > $null
# This script block will download a file from the web and create a local version
$ScriptBlock = {
Param (
[string]$fileName,
[string]$url
)
$contents = Invoke-WebRequest $url -UseBasicParsing
Set-Content $fileName $myString # use a common variable
Add-Content $fileName $contents # add the text download from the www
}
####################### Run the process sequentially ############################
Write-Host "First lets create the 50 text files by running the process sequentially"
$startTime = Get-Date
$myString = "this is not session state"
1..50 | % {
$fileName = "test$_.txt"
$fileName = [System.IO.Path]::Combine($folderLocation, $fileName)
Invoke-Command -ScriptBlock $ScriptBlock -ArgumentList $fileName,
"http://www.textfiles.com/100/adventur.txt"
}
$endTime = Get-Date
$totalSeconds = "{0:N4}" -f ($endTime-$startTime).TotalSeconds
Write-Host "All files created in $totalSeconds seconds"
####################### Run the process in parallel ############################
Write-Host ""
$numThreads = 5
Write-Host "Now lets try creating 50 files by running up $numThreads background threads"
Remove-Item $folderLocation -Recurse -Force
New-Item -Path $folderLocation -ItemType directory -Force > $null
# Create session state
$myString = "this is session state!"
$sessionState = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$sessionState.Variables.Add((New-Object -TypeName System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList "myString", $myString, "example string"))
# Create runspace pool consisting of $numThreads runspaces
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $numThreads, $sessionState, $Host)
$RunspacePool.Open()
$startTime = Get-Date
$Jobs = @()
1..50 | % {
$fileName = "test$_.txt"
$fileName = [System.IO.Path]::Combine($folderLocation, $fileName)
$Job = [powershell]::Create().AddScript($ScriptBlock).AddParameter("fileName",
$fileName).AddParameter("url", "http://www.textfiles.com/100/adventur.txt")
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
RunNum = $_
Job = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false) #Jobs.Result is a collection
$endTime = Get-Date
$totalSeconds = "{0:N4}" -f ($endTime-$startTime).TotalSeconds
Write-Host "All files created in $totalSeconds seconds"
The output I get looks like this. The parallel run is roughly 1.7 times as fast using 5 threads.
First lets create the 50 text files by running the process sequentially
All files created in 9.0655 seconds
Now lets try creating 50 files by running up 5 background threads
Waiting......All files created in 5.4018 seconds
Locking Example
Locking is a standard technique in multithreaded programming that ensures only one thread at a time can access a shared resource (like a variable). This keeps results consistent and stops threads from interfering with one another.
In this example, we spin up 100 background processes that all try to update the same text file. Without any locking, many threads will fail because they cannot get exclusive access to the file in question. With locking, we ensure only one thread can update the file at a time.
I am using the LockObject PowerShell module by David Wyatt to implement the locking.
cls
$scriptPath = split-path -parent $MyInvocation.MyCommand.Definition
$folderLocation = [System.IO.Path]::Combine($scriptPath, "PowerShellMultiThreading_LockExample")
if (Test-Path $folderLocation)
{
Remove-Item $folderLocation -Recurse -Force
}
New-Item -Path $folderLocation -ItemType directory -Force > $null
$summaryFile = [System.IO.Path]::Combine($folderLocation, "summaryfile.txt")
# Create session state, load in the locking script and create a shared variable $summaryFile
$sessionstate = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$sessionstate.ImportPSModule("$scriptPath\LockObject.psm1")
# Arguments are: variable name, value, description
$sessionstate.Variables.Add((New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList "summaryFile", $summaryFile, $null))
$runspacepool = [runspacefactory]::CreateRunspacePool(1, 100, $sessionstate, $Host)
$runspacepool.Open()
$ScriptBlock_NoLocking = {
Param (
[int]$RunNumber = 0
)
try
{
Add-Content $summaryFile $RunNumber -ErrorAction stop
}
Catch [System.Exception]
{
Write-Host $_.Exception.ToString()
}
}
$ScriptBlock_Locking = {
Param (
[int]$RunNumber = 0
)
try
{
lock ($summaryFile) {
Add-Content $summaryFile $RunNumber -ErrorAction stop
}
}
Catch [System.Exception]
{
Write-Host $_.Exception.ToString()
}
}
Write-Host "Try to update summaryFile with no locking - we are going to see a lot of exceptions"
New-Item -Path $summaryFile -ItemType file -Force > $null
$Jobs = @()
1..100 | % {
$Job = [powershell]::Create().AddScript($ScriptBlock_NoLocking).AddArgument($_)
$Job.RunspacePool = $runspacepool
$Jobs += New-Object PSObject -Property @{
RunNum = $_
Job = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
$contents = Get-Content $summaryFile
$numEntries = $contents.count
Write-Host ""
Write-Host "Update complete: summaryFile contains $numEntries entries, should contain 100"
Write-Host ""
Write-Host ""
Write-Host "Try to update summaryFile with locking - should work correctly and summaryFile should be updated with all 100 entries"
New-Item -Path $summaryFile -ItemType file -Force > $null
$Jobs = @()
1..100 | % {
$Job = [powershell]::Create().AddScript($ScriptBlock_Locking).AddArgument($_)
$Job.RunspacePool = $runspacepool
$Jobs += New-Object PSObject -Property @{
RunNum = $_
Job = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
$contents = Get-Content $summaryFile
$numEntries = $contents.count
Write-Host ""
Write-Host "Update complete: summaryFile contains $numEntries entries"
If you run this, you should see some exceptions thrown during the first pass, and the text file will end up with fewer than the 100 entries it should contain. With locking, there should be no errors and the file will contain all 100 entries.
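If you do not have the LockObject module to hand, the same idea can be sketched with the .NET System.Threading.Monitor class. This is an illustration under my own assumptions, not a description of how the module is implemented: the $shared hashtable, its Total key, the pool size of 4 and the loop count are all made up for the example.

```powershell
# Shared counter held in a synchronized hashtable. Its SyncRoot is the lock object.
# ($shared and the 'Total' key are hypothetical names for this sketch.)
$shared = [hashtable]::Synchronized(@{ Total = 0 })

$sessionState = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$sessionState.Variables.Add((New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList "shared", $shared, "shared counter"))

$pool = [RunspaceFactory]::CreateRunspacePool(1, 4, $sessionState, $Host)
$pool.Open()

$work = {
    for ($i = 0; $i -lt 100; $i++) {
        # Take the lock, do the read-modify-write, and always release the lock.
        [System.Threading.Monitor]::Enter($shared.SyncRoot)
        try {
            $shared.Total = $shared.Total + 1
        }
        finally {
            [System.Threading.Monitor]::Exit($shared.SyncRoot)
        }
    }
}

# Four jobs, each adding 100 to the shared counter.
$jobs = 1..4 | ForEach-Object {
    $ps = [powershell]::Create().AddScript($work)
    $ps.RunspacePool = $pool
    New-Object PSObject -Property @{ Job = $ps; Result = $ps.BeginInvoke() }
}

foreach ($job in $jobs) {
    $null = $job.Job.EndInvoke($job.Result)   # wait for the job to finish
    $job.Job.Dispose()
}
$pool.Close()

Write-Host "Total is $($shared.Total), expected 400"
```

The try/finally matters: if the protected code throws and the lock is never released, every other thread waiting on it will hang.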
Sharing Variables Across Threads
In this example, we instantiate an array and add it to SessionState, then we spin up two threads. The first takes the array and adds a random letter to it every second. While that process is running, we create another thread which outputs the values in the array every 1.5 seconds. We can see the second thread outputs values that were added to the array in the first thread, which shows that the array is indeed shared across both threads.
Note: I have taken no measures to ensure the process is thread safe here.
cls
# create an array and add it to session state
$arrayList = New-Object System.Collections.ArrayList
$arrayList.AddRange(@("a", "b", "c")) # starting values are illustrative; the originals are missing from the source
$sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
# Arguments are: variable name, value, description
$sessionstate.Variables.Add((New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList "arrayList", $arrayList, $null))
$runspacepool = [runspacefactory]::CreateRunspacePool(1, 2, $sessionstate, $Host)
$runspacepool.Open()
$ps1 = [powershell]::Create()
$ps1.RunspacePool = $runspacepool
$ps1.AddScript({
for ($i = 1; $i -le 15; $i++)
{
$letter = Get-Random -InputObject (97..122) | % {[char]$_} # a random lowercase letter
$null = $arrayList.Add($letter)
start-sleep -s 1
}
}) > $null
# on the first thread start a process that adds values to $arrayList every second
$handle1 = $ps1.BeginInvoke()
# now on the second thread, output the value of $arrayList every 1.5 seconds
$ps2 = [powershell]::Create()
$ps2.RunspacePool = $runspacepool
$ps2.AddScript({
Write-Host "ArrayList contents is "
foreach ($i in $arrayList)
{
Write-Host $i -NoNewline
Write-Host " " -NoNewline
}
Write-Host ""
}) > $null
1..10 | % {
$handle2 = $ps2.BeginInvoke()
if ($handle2.AsyncWaitHandle.WaitOne())
{
$ps2.EndInvoke($handle2)
}
Start-Sleep -Milliseconds 1500 # -Seconds takes a whole number in PowerShell 3, so 1.5 would be rounded
}
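The example above is not thread safe. One way to tighten it up, sketched here under my own assumptions, is to wrap the list with ArrayList.Synchronized so that individual Add calls are serialized. The variable name $list, the pool size of 4 and the item counts are invented for this illustration.

```powershell
# Wrap a new ArrayList in a thread-safe wrapper before sharing it.
$list = [System.Collections.ArrayList]::Synchronized((New-Object System.Collections.ArrayList))

$sessionState = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$sessionState.Variables.Add((New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList "list", $list, "shared list"))

$pool = [RunspaceFactory]::CreateRunspacePool(1, 4, $sessionState, $Host)
$pool.Open()

# Each writer adds 250 items to the shared list.
$writer = {
    param([int]$id)
    for ($i = 0; $i -lt 250; $i++) {
        $null = $list.Add("$id-$i")
    }
}

$jobs = 1..4 | ForEach-Object {
    $ps = [powershell]::Create().AddScript($writer).AddArgument($_)
    $ps.RunspacePool = $pool
    New-Object PSObject -Property @{ Job = $ps; Result = $ps.BeginInvoke() }
}

foreach ($job in $jobs) {
    $null = $job.Job.EndInvoke($job.Result)   # wait for the writer to finish
    $job.Job.Dispose()
}
$pool.Close()

Write-Host "List contains $($list.Count) items, expected 1000"
```

The wrapper only protects single operations such as Add. Enumerating the list while another thread is still writing to it is still unsafe unless the reader holds a lock on $list.SyncRoot for the duration of the loop.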
Return Data From a Background Thread
In this example, we will queue 20 jobs on a pool of up to 5 background threads. Each job does the simple task of concatenating two strings and returns a custom object containing the original strings and the concatenated result. By calling EndInvoke, these custom objects are then returned to the calling script.
cls
$numThreads = 5
$ScriptBlock = {
Param (
[string]$string1,
[string]$string2
)
$concatenatedString = "$string1$string2"
return New-Object PSObject -Property @{
String1 = $string1
String2 = $string2
Concatenated_Result = $concatenatedString
}
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $numThreads)
$RunspacePool.Open()
$Jobs = @()
1..20 | % {
$letter = Get-Random -InputObject (65..90) | % {[char]$_} # a random uppercase letter
$Job = [powershell]::Create().AddScript($ScriptBlock).AddParameter("string1", "value").AddParameter("string2", $letter)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
RunNum = $_
Job = $Job
Result = $Job.BeginInvoke()
}
}
# EndInvoke returns the objects from the background threads
ForEach ($Job in $Jobs)
{
$Job.Job.EndInvoke($Job.Result)
}
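None of the examples in this article release the PowerShell instances or the RunspacePool once the work is done. A minimal sketch of collecting results and then cleaning up might look like the following. The doubling workload and the pool size of 3 are placeholders of my own, not part of the original scripts.

```powershell
$pool = [RunspaceFactory]::CreateRunspacePool(1, 3)
$pool.Open()

# Queue six small jobs; each simply doubles its input (placeholder workload).
$jobs = 1..6 | ForEach-Object {
    $ps = [powershell]::Create().AddScript({ param([int]$n) $n * 2 }).AddArgument($_)
    $ps.RunspacePool = $pool
    New-Object PSObject -Property @{ RunNum = $_; Job = $ps; Result = $ps.BeginInvoke() }
}

$results = foreach ($job in $jobs) {
    try {
        # EndInvoke blocks until this job has finished and returns its output.
        $job.Job.EndInvoke($job.Result)
        if ($job.Job.HadErrors) {
            Write-Warning "Run $($job.RunNum) reported errors"
        }
    }
    finally {
        # Release the PowerShell instance whether or not the job succeeded.
        $job.Job.Dispose()
    }
}

# Shut the pool down once every job has been collected.
$pool.Close()
$pool.Dispose()

$results
```

Because the jobs are collected in the order they were queued, the results come back in that same order even if the jobs finished in a different one.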
Wrapping Up
I hope that gave you some ideas about how you can multithread some of your existing PowerShell processes. Any feedback is welcome and if the tip helped you, then please leave a vote.