Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gochas. CC BY-SA 4.0/Apache License 2.0

charset

各府省のデジタル・ガバメント中長期計画におけるJIS X 0213の採用状況

日本では、平成29年に閣議決定された「世界最先端IT国家創造宣言・官民データ活用推進基本計画」を受けて、「デジタル・ガバメント推進方針」というものが定められており、各府省ではその内容を具体化した「デジタル・ガバメント中長期計画」というものを定…

康煕字典体から常用漢字へ変換するコマンドレット

変換対照の文字は、文化庁 常用漢字表*1で康煕字典体が示されているものを対照とした。常用漢字表のPDFの内容をテキストファイルへダンプし、以下のスクリプトで常用漢字とカッコ書きの康煕字典体とのペアを抽出した*2。 > Get-Content .\常用漢字表.txt | W…

How to change locale of error messages of PowerShell interpreter

Problem You can set locale of the current thread in PowerShell like below: [System.Threading.Thread]::CurrentThread.CurrentCulture = [System.Globalization.CultureInfo]("en-US") But error messages of PowerShell interpreter shows in current …

How to use additional character encodings in PowerShell Core (Linux) and Desktop (Windows)

Problem You have to call [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance) when you want to use additional character encodings in PowerShell Core (Linux). On the other hand, calling [System.Text.En…

Invoke-RestMethod to GitLab API causes mojibake

Context In PowerShell, You can call REST API with Invoke-RestMethod like: Invoke-RestMethod -Headers $headers -Method Get -Timeout 10 -Uri "https://api.github.com/users/octocat/orgs" Problem The result of Invoke-RestMethod causes mojibake …

There are no properties for ordinary characters in PropList.txt

Problem When run the script in next URL with PropList.txt on unicode.org, result file did not contain character properties for ordinary characters like 'x', 'y', or 'z'. http://satob.hatenablog.com/entry/2017/11/21/002957 Reason PropList.t…

Get CodePoint-Property Pair from Scripts.txt on Unicode.org

Context You want to make a list of pair of unicode codepoint and its character property, like below: 00009,Cc 00020,Zs 00021,Po 00024,Sc ... Solution with PowerShell You can make the list from ftp://ftp.unicode.org/Public/UNIDATA/PropList.…

Supported character encodings in Get-Content and Import-Csv (in PowerShell 2.0/4.0)

Tested in Windows 7 (Japanese). Import-Csv does not have -Encoding option in PowerShell 2.0. There are no option for UTF-32BE in PowerShell 2.0. (note: PowerShell ISE can handle UTF-32BE Files) Import-Csv does not support Unknown and Strin…

You cannot use -Encoding option with Import-Csv in PowerShell 2.0

Context: You use PowerShell 2.0 (Windows 7 or Windows Server 2008 R2). You want to read CSV file that contain non-ASCII characters. Problem: In PowerShell 2.0, Import-Csv cmdlet doesn’t have -Encoding option. Solution: If you want to read …

Difference of acceptable parameters for -Encoding option

Acceptable parameters for -Encoding option are different for Get-Content, Set-Content, Export-Csv, Import-Csv, and Out-File. # cmdlet Default ASCII UTF-7 UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE Byte Default OEM String Unknown 1 Get-Conte…

How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

Context: You want to write the result of ConvertTo-Csv in UTF-8 encoding without BOM. e.g. You need a file that can be read by a Java program (Java File API cannot handle BOM in UTF-8 encoded files). UTF-8 in PowerShell, e.g. ConvertTo-Csv…

.NET cannot distinguish Shift_JIS from MS932(Windows-31J)

Context: Japanese character encoding Shift_JIS (シフトJIS) and Microsoft Codepage 932 (a.k.a. MS932, Windows-31J in IANA) are slightly different. For example, full-width cent sign (¢) is 0x8191 in both Shift_JIS and MS932, but it is mappe…

PowerShellで法務省 戸籍統一文字情報のページからあるコードポイントの文字の情報を取得する

(In English: How to get information about a Japanese character from 戸籍統一文字情報 site (managed by The Ministry of Justice (Japan)) with PowerShell) 問題: ある漢字に関する情報(読みや、子の名に使える文字か等)を調べたければ、法務省の戸…

CharsetEncoder#canEncode() equivalent for PowerShell

Context: You want to test whether a codepoint is valid in a specific character encoding. Problem In .NET, there are no equivalent functions to CharsetEncoder#canEncode() in Java. Solution If you want to test whether a character is valid in…

System.Text.Encoding.GetEncodings() does not show all available encodings after call RegisterProvider()

Problem: If you want to use character encodings other than the default registered encodings, you have to call [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance). But even if you call that method, th…

How to convert from a code point (U+xxxx) to a code point in another character encoding

function Convert-CodePoint { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $CodePoint, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $From, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $To ) begin { …

How to convert from a code point (U+xxxx) to a character

function ConvertFrom-CodePoint { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $CodePoint, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $From ) begin { [System.Text.Encoding]::RegisterProvider([System.Text.C…