Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gochas. CC BY-SA 4.0/Apache License 2.0

charset

Supported character encodings in Get-Content and Import-Csv (in PowerShell 2.0/4.0)

Tested in Windows 7 (Japanese). Import-Csv does not have -Encoding option in PowerShell 2.0. There are no option for UTF-32BE in PowerShell 2.0. (note: PowerShell ISE can handle UTF-32BE Files) Import-Csv does not support Unknown and Strin…

You cannot use -Encoding option with Import-Csv in PowerShell 2.0

Context: You use PowerShell 2.0 (Windows 7 or Windows Server 2008 R2). You want to read CSV file that contain non-ASCII characters. Problem: In PowerShell 2.0, Import-Csv cmdlet doesn’t have -Encoding option. Solution: If you want to read …

Difference of acceptable parameters for -Encoding option

Acceptable parameters for -Encoding option are different for Get-Content, Set-Content, Export-Csv, Import-Csv, and Out-File. # cmdlet Default ASCII UTF-7 UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE Byte Default OEM String Unknown 1 Get-Conte…

How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

Context: You want to write the result of ConvertTo-Csv in UTF-8 encoding without BOM. e.g. You need a file that can be read by a Java program (Java File API cannot handle BOM in UTF-8 encoded files). UTF-8 in PowerShell, e.g. ConvertTo-Csv…

.NET cannot distinguish Shift_JIS from MS932(Windows-31J)

Context: Japanese character encoding Shift_JIS (シフトJIS) and Microsoft Codepage 932 (a.k.a. MS932, Windows-31J in IANA) are slightly different. For example, full-width cent sign (¢) is 0x8191 in both Shift_JIS and MS932, but it is mappe…

PowerShellで法務省 戸籍統一文字情報のページからあるコードポイントの文字の情報を取得する

(In English: How to get information about a Japanese character from 戸籍統一文字情報 site (managed by The Ministry of Justice (Japan)) with PowerShell) 問題: ある漢字に関する情報(読みや、子の名に使える文字か等)を調べたければ、法務省の戸…

CharsetEncoder#canEncode() equivalent for PowerShell

Context: You want to test whether a codepoint is valid in a specific character encoding. Problem In .NET, there are no equivalent functions to CharsetEncoder#canEncode() in Java. Solution If you want to test whether a character is valid in…

System.Text.Encoding.GetEncodings() does not show all available encodings after call RegisterProvider()

Problem: If you want to use character encodings other than the default registered encodings, you have to call [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance). But even if you call that method, th…

How to convert from a code point (U+xxxx) to a code point in another character encoding

function Convert-CodePoint { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $CodePoint, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $From, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $To ) begin { …

How to convert from a code point (U+xxxx) to a character

function ConvertFrom-CodePoint { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $CodePoint, [Parameter(ValueFromPipeline=$false,Mandatory=$true)] $From ) begin { [System.Text.Encoding]::RegisterProvider([System.Text.C…