Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gotchas. CC BY-SA 4.0/Apache License 2.0

CharsetEncoder#canEncode() equivalent for PowerShell

Context

You want to test whether a codepoint is valid in a specific character encoding.

Problem

.NET has no direct equivalent of Java's CharsetEncoder#canEncode().

Solution

To test whether a character is valid in an encoding, you can round-trip it: encode the character to bytes, decode the bytes back, and check that the result equals the original, like:

function Test-Character {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $Character,
    [Parameter(ValueFromPipeline=$false,Mandatory=$true)]
    $Encoding
  )
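  begin {
    # On PowerShell (Core), code page encodings such as 932 come from a separate provider that must be registered first.
    if ($PSVersionTable.PSEdition -eq "Core") {
      [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    }
    $TestEncoding = [System.Text.Encoding]::GetEncoding($Encoding)
  }
  process {
    # Encode to bytes and decode back; an unencodable character is substituted during encoding, so the round trip no longer matches.
    [String]::new($TestEncoding.GetChars($TestEncoding.GetBytes($Character))).Equals($Character)
  }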
}

For example, Test-Character -Character "あ" -Encoding 932 returns $True, while Test-Character -Character "♩" -Encoding 932 returns $False. This function is suitable for testing whether a character that is valid in Unicode is also valid in another encoding.
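As an alternative to the round trip above, .NET can be asked to throw instead of silently substituting a character. The following is only a minimal sketch under that assumption: the name Test-CharacterStrict is hypothetical (it is not part of the original function set), and it relies on the Encoding.GetEncoding overload that accepts an EncoderFallback and a DecoderFallback.

```powershell
# Hypothetical variant of Test-Character: ask the encoder to throw on
# unencodable input instead of comparing a round-tripped string.
function Test-CharacterStrict {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $Character,
    [Parameter(Mandatory=$true)]
    $Encoding
  )
  begin {
    if ($PSVersionTable.PSEdition -eq "Core") {
      [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    }
    # ExceptionFallback makes GetBytes() throw EncoderFallbackException
    # when a character has no representation in the target encoding.
    $StrictEncoding = [System.Text.Encoding]::GetEncoding(
      $Encoding,
      [System.Text.EncoderFallback]::ExceptionFallback,
      [System.Text.DecoderFallback]::ExceptionFallback)
  }
  process {
    try {
      [void] $StrictEncoding.GetBytes($Character)
      $true
    } catch {
      # PowerShell may wrap the .NET exception, so inspect the base exception.
      if ($_.Exception.GetBaseException() -is [System.Text.EncoderFallbackException]) {
        $false
      } else {
        throw
      }
    }
  }
}
```

Under these assumptions, Test-CharacterStrict -Character "あ" -Encoding 932 should return $True and Test-CharacterStrict -Character "♩" -Encoding 932 should return $False, the same as Test-Character. The round-trip version stays closer to the original approach; this one just makes the failure explicit.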

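You can also test code points given as hexadecimal strings. This uses Convert-CodePoint, a helper function that must already be defined in your session, like: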

function Test-CodePoint {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $CodePoint,
    [Parameter(ValueFromPipeline=$false,Mandatory=$true)]
    $Encoding
  )
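  begin {
    # Register the code page provider only on PowerShell (Core), as Test-Character does.
    if ($PSVersionTable.PSEdition -eq "Core") {
      [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    }
  }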
  process {
    $UnicodeCodePoint = (Convert-CodePoint -CodePoint $CodePoint -From $Encoding -To "utf-16BE")
    $ReverseCodePoint = (Convert-CodePoint -CodePoint $UnicodeCodePoint -From "utf-16BE" -To $Encoding)
    $CodePoint.Equals($ReverseCodePoint)
  }
}

For example, Test-CodePoint -CodePoint "84BE" -Encoding 932 returns $True, while Test-CodePoint -CodePoint "84BF" -Encoding 932 returns $False. This function is suitable for testing whether a code point is valid in an encoding.
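If Convert-CodePoint is not available, the same idea can be sketched directly on bytes. This is only an illustration under stated assumptions: the name Test-CodePointBytes is hypothetical, the code point is assumed to be a hexadecimal string with an even number of digits, and the check is again a round trip (bytes to string and back to bytes).

```powershell
# Hypothetical byte-level variant of Test-CodePoint that does not need Convert-CodePoint.
function Test-CodePointBytes {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $CodePoint,
    [Parameter(Mandatory=$true)]
    $Encoding
  )
  begin {
    if ($PSVersionTable.PSEdition -eq "Core") {
      [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    }
    $TestEncoding = [System.Text.Encoding]::GetEncoding($Encoding)
  }
  process {
    # Split the hex string into bytes, e.g. "84BE" -> 0x84, 0xBE.
    # Assumption: an even number of hex digits.
    $Bytes = [byte[]] @(for ($i = 0; $i -lt $CodePoint.Length; $i += 2) {
      [Convert]::ToByte($CodePoint.Substring($i, 2), 16)
    })
    # Decode the bytes, encode the result again, and compare byte sequences.
    # An unassigned code point decodes to a substitute character, so the
    # re-encoded bytes differ from the input.
    $RoundTripped = $TestEncoding.GetBytes($TestEncoding.GetString($Bytes))
    [BitConverter]::ToString($RoundTripped) -eq [BitConverter]::ToString($Bytes)
  }
}
```

Called as Test-CodePointBytes -CodePoint "84BE" -Encoding 932, this should agree with Test-CodePoint for the examples above. One caveat of any round-trip check: a code point that decodes to a character whose preferred encoding is a different byte sequence would also be reported as $False, even though the code point itself is assigned.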