Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, tips & gotchas. CC BY-SA 4.0/Apache License 2.0

Unicode

A cmdlet to convert Kangxi dictionary forms (康煕字典体) to Jōyō kanji

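The characters targeted for conversion are those for which the Jōyō Kanji table (常用漢字表) published by the Agency for Cultural Affairs*1 shows a Kangxi dictionary form. I dumped the contents of the Jōyō Kanji table PDF to a text file and extracted the pairs of each Jōyō kanji and its parenthesized Kangxi form with the following script*2. > Get-Content .\常用漢字表.txt | W…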
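
A minimal Java sketch of the conversion step, not the post's PowerShell cmdlet: it looks up each code point in a Kangxi-to-Jōyō table and replaces it when a pair exists. ASSUMPTION: the table below holds only three sample pairs (國→国, 學→学, 體→体) for illustration; a real table would hold every pair extracted from the Jōyō Kanji table as described above. Unicode escapes are used so the source compiles regardless of the editor's encoding.

```java
import java.util.Map;

// Minimal sketch: replace Kangxi dictionary forms with Jōyō kanji, one code point at a time.
// ASSUMPTION: PAIRS is a tiny illustrative sample, not the full list from the Jōyō Kanji table.
class KangxiToJoyo {
    static final Map<Integer, Integer> PAIRS = Map.of(
        0x570B, 0x56FD, // 國 -> 国
        0x5B78, 0x5B66, // 學 -> 学
        0x9AD4, 0x4F53  // 體 -> 体
    );

    static String convert(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        // Iterate by code point so characters outside the BMP pass through unchanged.
        input.codePoints()
             .map(cp -> PAIRS.getOrDefault(cp, cp))
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // 國學 -> 国学
        System.out.println(convert("\u570B\u5B78").equals("\u56FD\u5B66")); // true
    }
}
```

Characters with no entry in the table, including ASCII and surrogate pairs, are copied through as-is.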

How to separate a string into codepoint-wise characters with PowerShell

Context: You have a Unicode string that contains non-ASCII characters as well as ASCII characters. You want to separate that string into characters. Problem: If you split the string with the code below: $TemporaryArray = $InputString -split…
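
A minimal Java sketch of the same idea (the post itself uses PowerShell; this is not its code): `String.codePoints()` yields one int per code point, so a surrogate pair such as U+20BB7 (𠮷) stays in one piece. This splits by code point, not by grapheme, so a base character followed by a combining mark still comes out as two items.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch: split a string into one String per Unicode code point.
class CodePointSplit {
    static List<String> split(String input) {
        return input.codePoints()
                    .mapToObj(cp -> new String(Character.toChars(cp))) // 1 or 2 UTF-16 units
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "a\uD842\uDFB7b";          // "a", U+20BB7 (a surrogate pair), "b"
        System.out.println(s.length());      // 4 UTF-16 code units
        System.out.println(split(s).size()); // 3 code points
    }
}
```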

How to extract non-MS932 (Shift_JIS) compliant characters from string

function Get-NonMS932CompliantCharacter { Param( [Parameter(ValueFromPipeline=$true,Mandatory=$true)] [string] $TargetString ) process { $TargetStringBytes = [Text.Encoding]::UTF32.GetBytes($TargetString); for ($i=0; $i -lt $TargetStringBy…
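
For comparison, a minimal Java sketch of the same check, not the post's PowerShell function: it asks a `CharsetEncoder` for MS932 (Java's charset name `windows-31j`) whether each code point can be encoded, and collects the ones that cannot. It assumes the runtime includes the extended charsets (the `jdk.charsets` module), which a full JDK does.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: list the characters in a string that MS932 (windows-31j) cannot encode.
class NonMs932 {
    static List<String> nonMs932Characters(String input) {
        CharsetEncoder encoder = Charset.forName("windows-31j").newEncoder();
        List<String> result = new ArrayList<>();
        // Check whole code points so a surrogate pair is tested (and reported) as one character.
        input.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                result.add(ch);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> bad = nonMs932Characters("a\u3042\uD842\uDFB7"); // "a", "あ", U+20BB7
        System.out.println(bad.size()); // 1: only U+20BB7 is outside MS932
    }
}
```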

tr equivalent in PowerShell (Unicode surrogate pair-aware)

There is no straightforward tr equivalent in Windows, so I made a cmdlet that you can use like the tr command. This tr cmdlet is aware of Unicode characters, including surrogate pairs. function tr { Param( [Parameter(ValueFromPipeline=$true,Ma…
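
A minimal Java sketch of the same idea, translating code point by code point so that a surrogate pair is replaced as a unit. It is a simplification, not the post's cmdlet and not full tr: both sets must contain the same number of code points, and ranges such as `a-z` and options such as `-d` are not supported.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of tr-style translation on code points (not UTF-16 chars).
// SIMPLIFICATION: from/to must have equal code point counts; no ranges, no options.
class Tr {
    static String tr(String input, String from, String to) {
        int[] src = from.codePoints().toArray();
        int[] dst = to.codePoints().toArray();
        if (src.length != dst.length) {
            throw new IllegalArgumentException("from and to must have the same number of code points");
        }
        Map<Integer, Integer> table = new HashMap<>();
        for (int i = 0; i < src.length; i++) {
            table.put(src[i], dst[i]); // in this sketch, a repeated code point in from: last mapping wins
        }
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .map(cp -> table.getOrDefault(cp, cp))
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(tr("abc", "abc", "xyz")); // xyz
    }
}
```

For example, `tr("abc\uD842\uDFB7", "a\uD842\uDFB7", "A\u3042")` replaces "a" with "A" and U+20BB7 with "あ", leaving "bc" untouched.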

Counting characters (graphemes) with java.text.BreakIterator

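When a business application handles characters outside Shift_JIS and Microsoft code page 932, such as those in JIS X 0213, it must count characters correctly so that input strings are guaranteed to fit on certificates and similar documents. JIS X 0213 contains characters that are represented by multiple code points…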
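
A minimal sketch of counting with `BreakIterator.getCharacterInstance()`. The sample string is か followed by U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK, the two-code-point sequence JIS X 0213 uses for か゚: `length()` and `codePointCount()` both report 2, while the break iterator counts one user-perceived character.

```java
import java.text.BreakIterator;

// Minimal sketch: count user-perceived characters with java.text.BreakIterator.
class GraphemeCount {
    static int countGraphemes(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        // Each next() moves to the following character boundary; DONE marks the end.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String ka = "\u304B\u309A"; // か + combining semi-voiced sound mark
        System.out.println(ka.length());                   // 2 (UTF-16 code units)
        System.out.println(ka.codePointCount(0, ka.length())); // 2 (code points)
        System.out.println(countGraphemes(ka));            // 1 (grapheme)
    }
}
```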

There are no properties for ordinary characters in PropList.txt

Problem: When I ran the script at the following URL against PropList.txt from unicode.org, the result file did not contain character properties for ordinary characters like 'x', 'y', or 'z'. http://satob.hatenablog.com/entry/2017/11/21/002957 Reason: PropList.t…

Get CodePoint-Property Pair from Scripts.txt on Unicode.org

Context: You want to make a list of pairs of a Unicode code point and its character property, like the one below: 00009,Cc 00020,Zs 00021,Po 00024,Sc ... Solution with PowerShell: You can make the list from ftp://ftp.unicode.org/Public/UNIDATA/PropList.…
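
A minimal Java sketch of the parsing step, not the post's PowerShell solution: it strips the `#` comment from each data line, expands `start..end` ranges, and emits each code point as five hex digits followed by the value in the second field. On PropList.txt that value is a property name such as `White_Space`; the `Cc`/`Zs` general category abbreviations in the list above appear only in the comment part of PropList.txt lines, so this sketch does not reproduce them. It assumes well-formed input; the sample lines follow the PropList.txt format.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: turn UCD data lines ("start..end ; Value # comment") into
// "codepoint,value" pairs, with the code point as five uppercase hex digits.
// ASSUMPTION: the value comes from the second field, not from the comment.
class UcdPairs {
    static List<String> toPairs(List<String> lines) {
        List<String> pairs = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue; // comment-only or blank line
            }
            String[] fields = data.split(";");
            String range = fields[0].trim();
            String value = fields[1].trim();
            int dots = range.indexOf("..");
            int start = Integer.parseInt(dots >= 0 ? range.substring(0, dots) : range, 16);
            int end = dots >= 0 ? Integer.parseInt(range.substring(dots + 2), 16) : start;
            for (int cp = start; cp <= end; cp++) {
                pairs.add(String.format("%05X,%s", cp, value));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "# PropList.txt-style sample",
            "0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>",
            "0020          ; White_Space # Zs       SPACE");
        // prints 00009,White_Space through 0000D,White_Space, then 00020,White_Space
        toPairs(sample).forEach(System.out::println);
    }
}
```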