PowerShell Tokenizer more Accurate than AST in Certain Scenarios

Feb 21, 2019 · AST, PowerShell

As many of you know, I've been working on some module building tools. One of the things I needed was to retrieve a list of the PowerShell modules that each function requires (a list of dependencies). This seemed simple enough with PowerShell's AST (Abstract Syntax Tree), as shown in the following example.

$File = 'U:\GitHub\PowerShell\MrToolkit\Public\Find-MrModuleUpdate.ps1'
$AST = [System.Management.Automation.Language.Parser]::ParseFile($File, [ref]$null, [ref]$null)
$AST.ScriptRequirements.RequiredModules.Name
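
To try this without a file on disk, here's a minimal sketch that parses an in-memory string with ParseInput instead of ParseFile. The script text, the Test-Example function, and the module names in it are made up for illustration and aren't from my toolkit.

```powershell
# Hypothetical script text; the function and module names are for illustration only.
$Script = @'
#Requires -Modules PackageManagement, PowerShellGet
function Test-Example { Find-Module -Name Pester }
'@

# ParseInput parses a string; the tokens and parse errors are discarded here.
$AST = [System.Management.Automation.Language.Parser]::ParseInput($Script, [ref]$null, [ref]$null)

# Only modules declared in the #Requires statement are reported.
$RequiredModules = $AST.ScriptRequirements.RequiredModules.Name
$RequiredModules
```

Notice that only the declared modules come back. Nothing here looks at the commands the function actually calls.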

The modules retrieved by the AST are simply the ones specified in a function's Requires statement. What if someone forgot to add a required module to the Requires statement? How could this be validated?

Light bulb moment: I'll retrieve a list of all the commands used in a PowerShell function using the AST and then determine the module they exist in using Get-Command. Sounds simple enough, right? Well, not so fast.

While I've written functions on top of the functionality shown in this blog article, I wanted to keep this as simple as possible and eliminate those functions as the source of problems.

First, I've set a variable named File to the path of a function of mine named Start-MrAutoStoppedService which is contained in a PS1 file by the same name. It can be found in my PowerShell GitHub repo.

$File = 'U:\GitHub\PowerShell\MrToolkit\Public\Start-MrAutoStoppedService.ps1'

Now I'll retrieve a list of all the commands used in the specified function with the AST.

$AST = [System.Management.Automation.Language.Parser]::ParseFile($File, [ref]$null, [ref]$null)
$AST.FindAll({$args[0].GetType().Name -like 'CommandAst'}, $true) |
ForEach-Object {
    $_.CommandElements[0].Value
}

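The results include State, which the AST thinks is a command, but it's actually part of a WMI filter. Because the filter is written as a script block, the parser treats State as the first element of a command. The following excerpt from the function shows where it comes from.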

PROCESS {
    $Params.ComputerName = $ComputerName

    Invoke-Command @Params {
        $Services = Get-WmiObject -Class Win32_Service -Filter {
            State != 'Running' and StartMode = 'Auto'
        }

        foreach ($Service in $Services.Name) {
            Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\$Service" |
            Where-Object {$_.Start -eq 2 -and $_.DelayedAutoStart -ne 1} |
            Select-Object -Property @{label='ServiceName';expression={$_.PSChildName}} |
            Start-Service @Using:RemoteParams
        }
    }
}
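
The false positive can be reproduced in isolation. The following is a minimal sketch that parses just the Get-WmiObject call from an in-memory string. It uses GetCommandName() rather than CommandElements[0].Value, which is my substitution for this sketch. Nothing is executed, so it doesn't matter whether Get-WmiObject exists on the machine.

```powershell
# Only the Get-WmiObject call from the function above, as a string to parse.
$Script = @'
Get-WmiObject -Class Win32_Service -Filter {
    State != 'Running' and StartMode = 'Auto'
}
'@

$AST = [System.Management.Automation.Language.Parser]::ParseInput($Script, [ref]$null, [ref]$null)

# Find every CommandAst, including those nested inside script blocks.
$CommandNames = $AST.FindAll(
    { $args[0] -is [System.Management.Automation.Language.CommandAst] },
    $true
) | ForEach-Object { $_.GetCommandName() }

$CommandNames
```

Both Get-WmiObject and State show up because the contents of the filter script block are parsed as PowerShell code, not as a WQL string.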

Using the tokenizer instead of the AST returns more accurate results, excluding State, as shown in the following example.

$Token = $null
$null = [System.Management.Automation.Language.Parser]::ParseFile($File, [ref]$Token, [ref]$null)
Write-Output ($Token | Where-Object {$_.TokenFlags -eq 'CommandName'}).Value

I'll eventually clean this up and turn it into a function. For now, the following example shows the basic functionality: it retrieves a list of the modules a function requires based on the commands used within it, instead of relying on someone to remember to add them to the Requires statement.

$Token = $null
$null = [System.Management.Automation.Language.Parser]::ParseFile($File, [ref]$Token, [ref]$null)
Write-Output ($Token | Where-Object {$_.TokenFlags -eq 'CommandName'}).Value |
Get-Command | Select-Object -ExpandProperty Source -Unique
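
To get back to the original question of validating the Requires statement, here's a rough sketch that compares the modules found through the tokenizer with the ones that are declared. The script text and the variable names Declared, Used, and Missing are assumptions for illustration. This isn't the finished function, and it can only resolve commands that Get-Command can find on the machine where it runs.

```powershell
# Hypothetical script text that declares one module but uses commands from two.
$Script = @'
#Requires -Modules Microsoft.PowerShell.Management
Get-ChildItem -Path . | ConvertTo-Json
'@

$Token = $null
$AST = [System.Management.Automation.Language.Parser]::ParseInput($Script, [ref]$Token, [ref]$null)

# Modules declared in the #Requires statement.
$Declared = @($AST.ScriptRequirements.RequiredModules.Name)

# Modules that own the commands actually used, based on the tokenizer.
$Used = @(
    ($Token | Where-Object { $_.TokenFlags -eq 'CommandName' }).Value |
        Get-Command -ErrorAction SilentlyContinue |
        Where-Object Source |
        Select-Object -ExpandProperty Source -Unique
)

# Modules that are used but not declared.
$Missing = @($Used | Where-Object { $_ -notin $Declared })
$Missing
```

On a typical installation I'd expect this to report Microsoft.PowerShell.Utility, since that's where ConvertTo-Json lives and it isn't listed in the Requires statement.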

Maybe I'm missing something as far as the AST goes and maybe there's a way to retrieve an accurate list using it? Please post your questions, comments, and/or suggestions as a comment to this blog article.