Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gotchas. CC BY-SA 4.0/Apache License 2.0

How to convert from a code point (U+xxxx) to a code point in another character encoding

function Convert-CodePoint {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $CodePoint,
    [Parameter(ValueFromPipeline=$false,Mandatory=$true)]
    $From,
    [Parameter(ValueFromPipeline=$false,Mandatory=$true)]
    $To
  )
  begin {
    [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    $FromEncoding = [System.Text.Encoding]::GetEncoding($From)
    $ToEncoding = [System.Text.Encoding]::GetEncoding($To)
    $HexNumber = [System.Globalization.NumberStyles]::HexNumber
  }
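  process {
    # Split the hex string into two-character chunks and parse each chunk as one byte.
    Select-String -InputObject $CodePoint -Pattern ".{2}" -AllMatches `
        | ForEach-Object { $_.Matches } `
        | ForEach-Object { [Byte]::Parse($_.Value, $HexNumber) } `
        | Set-Variable InputCharBytes
    # Decode the bytes with the source encoding, re-encode the text with the
    # target encoding, and format each resulting byte as two uppercase hex digits.
    $ToEncoding.GetBytes($FromEncoding.GetString($InputCharBytes)) `
        | ForEach-Object { $_.ToString("X2") } `
        | Set-Variable OutputCharCodePoints
    # Join the hex digits into a single string such as "82A0".
    $OutputCharCodePoints -join ""
  }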
}

For example, Convert-CodePoint -CodePoint "3042" -From "utf-16BE" -To 932 returns “82A0”, the Shift-JIS code of the character “あ” (U+3042). Here 932 is the Windows code page number for Shift-JIS.
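The same conversion can be traced step by step without the function. This is only a minimal sketch of the decode-then-encode idea for the “あ” example, and it assumes a PowerShell version where System.Text.CodePagesEncodingProvider is available; the variable names are illustrative.

```powershell
# Make code page encodings such as 932 (Shift-JIS) available.
[System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)

# "3042" split into bytes: 0x30, 0x42.
$InputBytes = [byte[]](0x30, 0x42)

# Decode the bytes as UTF-16BE to get the actual character.
$Text = [System.Text.Encoding]::GetEncoding("utf-16BE").GetString($InputBytes)

# Encode that character in code page 932 and format the bytes as hex.
$OutputBytes = [System.Text.Encoding]::GetEncoding(932).GetBytes($Text)
$OutputHex = ($OutputBytes | ForEach-Object { $_.ToString("X2") }) -join ""
$OutputHex   # 82A0
```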

Note: For code points outside the BMP, pass the UTF-16 surrogate pair (e.g. D867DE3D) instead of the Unicode code point itself (U+29E3D), because the input is interpreted as a UTF-16BE byte sequence.
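If you only know the code point, the surrogate pair can be computed rather than looked up. The following is a rough sketch, not part of the function above: ConvertTo-Utf16BEHex is a hypothetical helper name, and it relies on [char]::ConvertFromUtf32 and the built-in big-endian UTF-16 encoding.

```powershell
# Hypothetical helper: turn a code point such as "29E3D" or "U+29E3D"
# into the UTF-16BE hex string that can be used as -CodePoint.
function ConvertTo-Utf16BEHex {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $CodePoint
  )
  process {
    # Drop an optional "U+" prefix and parse the rest as a hexadecimal number.
    $Hex = $CodePoint -replace '^[Uu]\+', ''
    $Value = [Convert]::ToInt32($Hex, 16)
    # ConvertFromUtf32 returns one char inside the BMP, a surrogate pair outside it.
    $Text = [char]::ConvertFromUtf32($Value)
    # BigEndianUnicode is UTF-16BE; format each byte as two uppercase hex digits.
    $Bytes = [System.Text.Encoding]::BigEndianUnicode.GetBytes($Text)
    ($Bytes | ForEach-Object { $_.ToString("X2") }) -join ""
  }
}

ConvertTo-Utf16BEHex "U+29E3D"   # D867DE3D
ConvertTo-Utf16BEHex "3042"      # 3042
```

The result can then be passed on as the -CodePoint value with -From "utf-16BE".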

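Note: The U+xxxx notation writes the most significant byte first (big-endian order), so specify "utf-16BE" for -From. Plain "utf-16" is little-endian in .NET and would read the bytes in swapped order.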
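The difference in byte order is easy to check. A small sketch, with illustrative variable names, that encodes “あ” (U+3042) both ways:

```powershell
# Encode U+3042 with both byte orders and show the bytes as hex.
$Text = [string][char]0x3042
$BigEndianBytes = [System.Text.Encoding]::GetEncoding("utf-16BE").GetBytes($Text)
$LittleEndianBytes = [System.Text.Encoding]::GetEncoding("utf-16").GetBytes($Text)

$BigEndianHex = ($BigEndianBytes | ForEach-Object { $_.ToString("X2") }) -join ""
$LittleEndianHex = ($LittleEndianBytes | ForEach-Object { $_.ToString("X2") }) -join ""

$BigEndianHex      # 3042 (matches the U+3042 notation)
$LittleEndianHex   # 4230 (same character, bytes swapped)
```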