charset
日本では、平成29年に閣議決定された「世界最先端IT国家創造宣言・官民データ活用推進基本計画」を受けて、「デジタル・ガバメント推進方針」というものが定められており、各府省ではその内容を具体化した「デジタル・ガバメント中長期計画」というものを定…
変換対照の文字は、文化庁 常用漢字表*1で康煕字典体が示されているものを対照とした。常用漢字表のPDFの内容をテキストファイルへダンプし、以下のスクリプトで常用漢字とカッコ書きの康煕字典体とのペアを抽出した*2。 > Get-Content .\常用漢字表.txt | W…
Problem You can set locale of the current thread in PowerShell like below: [System.Threading.Thread]::CurrentThread.CurrentCulture = [System.Globalization.CultureInfo]("en-US") But error messages of PowerShell interpreter shows in current …
Problem You have to call [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance) when you want to use additional character encodings in PowerShell Core (Linux). On the other hand, calling [System.Text.En…
Context In PowerShell, You can call REST API with Invoke-RestMethod like: Invoke-RestMethod -Headers $headers -Method Get -Timeout 10 -Uri "https://api.github.com/users/octocat/orgs" Problem The result of Invoke-RestMethod causes mojibake …
Problem When run the script in next URL with PropList.txt on unicode.org, result file did not contain character properties for ordinary characters like 'x', 'y', or 'z'. http://satob.hatenablog.com/entry/2017/11/21/002957 Reason PropList.t…
Context You want to make a list of pair of unicode codepoint and its character property, like below: 00009,Cc 00020,Zs 00021,Po 00024,Sc ... Solution with PowerShell You can make the list from ftp://ftp.unicode.org/Public/UNIDATA/PropList.…
Tested in Windows 7 (Japanese). Import-Csv does not have -Encoding option in PowerShell 2.0. There are no option for UTF-32BE in PowerShell 2.0. (note: PowerShell ISE can handle UTF-32BE Files) Import-Csv does not support Unknown and Strin…
Context: You use PowerShell 2.0 (Windows 7 or Windows Server 2008 R2). You want to read CSV file that contain non-ASCII characters. Problem: In PowerShell 2.0, Import-Csv cmdlet doesn’t have -Encoding option. Solution: If you want to read …
Acceptable parameters for -Encoding option are different for Get-Content, Set-Content, Export-Csv, Import-Csv, and Out-File. # cmdlet Default ASCII UTF-7 UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE Byte Default OEM String Unknown 1 Get-Conte…
Context: You want to write the result of ConvertTo-Csv in UTF-8 encoding without BOM. e.g. You need a file that can be read by a Java program (Java File API cannot handle BOM in UTF-8 encoded files). UTF-8 in PowerShell, e.g. ConvertTo-Csv…
Context: Japanese character encoding Shift_JIS (シフトJIS) and Microsoft Codepage 932 (a.k.a. MS932, Windows-31J in IANA) are slightly different. For example, full-width cent sign (¢) is 0x8191 in both Shift_JIS and MS932, but it is mappe…
(In English: How to get information about a Japanese character from 戸籍統一文字情報 site (managed by The Ministry of Justice (Japan)) with PowerShell) 問題: ある漢字に関する情報(読みや、子の名に使える文字か等)を調べたければ、法務省の戸…
Context: You want to test whether a codepoint is valid in a specific character encoding. Problem In .NET, there are no equivalent functions to CharsetEncoder#canEncode() in Java. Solution If you want to test whether a character is valid in…
Problem: If you want to use character encodings other than the default registered encodings, you have to call [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance). But even if you call that method, th…
function Convert-CodePoint { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $CodePoint, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $From, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $To ) begin { …
function ConvertFrom-CodePoint { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $CodePoint, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $From ) begin { [System.Text.Encoding]::RegisterProvider([System.Text.C…