Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gotchas. CC BY-SA 4.0/Apache License 2.0

How to extract non-MS932 (Shift_JIS) compliant characters from a string

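function Get-NonMS932CompliantCharacter {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $TargetString
  )
  begin {
    # Look up the encodings once instead of on every loop iteration.
    $UTF32 = [Text.Encoding]::UTF32
    $MS932 = [Text.Encoding]::GetEncoding(932)
  }
  process {
    # UTF-32 uses exactly 4 bytes per code point, so stepping by 4 visits
    # each character once, including ones stored as surrogate pairs.
    $TargetStringBytes = $UTF32.GetBytes($TargetString)
    for ($i = 0; $i -lt $TargetStringBytes.Length; $i += 4) {
      $TargetChar = $UTF32.GetString($TargetStringBytes, $i, 4)
      # Round-trip the character through MS932 (code page 932).
      $MS932Bytes = $MS932.GetBytes($TargetChar)
      $MS932Char = $MS932.GetString($MS932Bytes, 0, $MS932Bytes.Length)
      # A character that does not survive the round trip is not MS932 compliant.
      if ($TargetChar -ne $MS932Char) {
        $TargetChar
      }
    }
  }
}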

Example:

PS > "あえうえお①𩸽X𠀋か㐂" | Get-NonMS932CompliantCharacter
𩸽
𠀋
㐂
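
If you only need to know whether a string can be encoded in MS932 at all, rather than which characters are the problem, a strict encoder may be enough. The following is a minimal sketch, not part of the function above: `Test-MS932Encodable` is a hypothetical name, and it assumes code page 932 is available in your PowerShell session. It asks .NET to throw on unmappable characters instead of substituting a replacement. I have not verified that it agrees with the round-trip comparison for every character.

```powershell
# Hypothetical helper (an assumption for illustration, not from the function above).
# Returns $true when the whole string can be encoded in code page 932,
# and $false when at least one character cannot.
function Test-MS932Encodable {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $TargetString
  )
  begin {
    # Request an MS932 encoding that throws on unmappable characters
    # instead of silently replacing them.
    $StrictMS932 = [Text.Encoding]::GetEncoding(
      932,
      [Text.EncoderFallback]::ExceptionFallback,
      [Text.DecoderFallback]::ExceptionFallback)
  }
  process {
    try {
      # The bytes themselves are not needed, only whether encoding succeeds.
      [void] $StrictMS932.GetBytes($TargetString)
      $true
    } catch {
      # Any failure here is treated as "not encodable".
      $false
    }
  }
}
```

It can be used in a pipeline the same way, for example `"あいうえお" | Test-MS932Encodable`.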