Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gotchas. CC BY-SA 4.0/Apache License 2.0

CharsetEncoder#canEncode() equivalent for PowerShell

Context:

You want to test whether a codepoint is valid in a specific character encoding.

Problem

.NET has no direct equivalent of Java's CharsetEncoder#canEncode().

Solution

To test whether a character is valid in an encoding, you can round-trip it through that encoding (string to bytes and back) and check that the result is unchanged:

function Test-Character {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $Character,
    [Parameter(ValueFromPipeline=$false,Mandatory=$true)]
    $Encoding
  )
  begin {
    # PowerShell Core needs the code page provider registered to use legacy encodings such as 932
    if ($PSVersionTable.PSEdition -eq "Core") {
      [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    }
    $TestEncoding = [System.Text.Encoding]::GetEncoding($Encoding)
  }
  process {
    # Encode to bytes and decode back; an unencodable character does not survive the round trip
    [String]::new($TestEncoding.GetChars($TestEncoding.GetBytes($Character))).Equals($Character)
  }
}

For example, Test-Character -Character "あ" -Encoding 932 returns $True, and Test-Character -Character "♩" -Encoding 932 returns $False. This function is suited to testing whether a character that is valid in Unicode can also be represented in another encoding.

You can also test code points directly with Convert-CodePoint (a helper function that is not defined in this post):

function Test-CodePoint {
  Param(
    [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
    [string] $CodePoint,
    [Parameter(ValueFromPipeline=$false,Mandatory=$true)]
    $Encoding
  )
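  begin {
    # Same guard as in Test-Character: only PowerShell Core needs the provider registered
    if ($PSVersionTable.PSEdition -eq "Core") {
      [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    }
  }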
  process {
    $UnicodeCodePoint = (Convert-CodePoint -CodePoint $CodePoint -From $Encoding -To "utf-16BE")
    $ReverseCodePoint = (Convert-CodePoint -CodePoint $UnicodeCodePoint -From "utf-16BE" -To $Encoding)
    $CodePoint.Equals($ReverseCodePoint)
  }
}

For example, Test-CodePoint -CodePoint "84BE" -Encoding 932 returns $True, and Test-CodePoint -CodePoint "84BF" -Encoding 932 returns $False. This function is suited to testing whether a code point is valid in an encoding.

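Non-kanji characters that can be used in a child's name

The Ministry of Justice publishes a list of "kanji that can be used in a child's name", but there is no comparable list of "non-kanji characters that can be used in a child's name".
Ordinary hiragana and katakana such as 「あ」 and 「ア」 can of course be used. However, the Japanese character set standards JIS X 0208 and JIS X 0213 also contain hiragana and katakana that are rarely used in daily life, as well as characters that are neither kana nor kanji.
So, for the characters where it is unclear whether they are allowed in a child's name, I checked whether they can actually be used.

The check uses the Ministry of Justice's Koseki Unified Character Information (戸籍統一文字情報) page. Search for a character on that page and open its individual page; if the 文字区分 (character classification) field says 子の名に使える文字 (character usable in a child's name), the character can be used in a child's name.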

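# | Character | Code point | Koseki unified character | Usable in a child's name | Koseki unified character number
1 | ゎ | U+308E | Yes | Yes | 905250
2 | ゔ | U+3094 | No | No |
3 | ゕ | U+3095 | No | No |
4 | ゖ | U+3096 | No | No |
5 | ゝ | U+309D | Yes | Yes | 901890
6 | ゞ | U+309E | Yes | Yes | 901900
7 | ヮ | U+30EE | Yes | Yes | 906160
8 | ヴ | U+30F4 | Yes | Yes | 906220
9 | ヵ | U+30F5 | Yes | Yes | 906230
10 | ヶ | U+30F6 | Yes | Yes | 906240
11 | ヷ | U+30F7 | No | No |
12 | ヸ | U+30F8 | No | No |
13 | ヹ | U+30F9 | No | No |
14 | ヺ | U+30FA | No | No |
15 | ヽ | U+30FD | Yes | No | 901870
16 | ヾ | U+30FE | Yes | No | 901880
17 | 〃 | U+3003 | Yes | No | 901910
18 | 仝 | U+4EDD | Yes | No | 901920
19 | 々 | U+3005 | Yes | Yes | 901930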

Meanings of :owner, :repo, ... in the GitHub API documentation

In the GitHub API documentation (e.g. https://developer.github.com/v3/repos/ ), API calls are shown in a form like GET /repos/:owner/:repo/contributors. The meanings of the placeholder variables such as :owner and :repo are listed below:

# | Variable | Meaning
1 | :owner | username, or name of organization in GitHub Enterprise
2 | :repo | name of repository
3 | :number | sequence number of pull request or issue (the sequence is shared by pull requests and issues)
4 | :id | ID of comment, etc.
5 | :sha | SHA-1 hash of a commit
6 | :ref | path to a reference (heads/BranchA for a branch named BranchA)
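Filling in these placeholders can be sketched as a small substitution step. This is only an illustration: fillRoute is a hypothetical helper name, the values "User", "Repository" and "heads/BranchA" are sample data, and the second template is an assumed example of a route that takes :ref.

```javascript
// Hypothetical helper: replace ":name" placeholders in a documented route
// with concrete values taken from `params`.
function fillRoute(template, params) {
  return template.replace(/:([A-Za-z_]+)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(params, name)) {
      throw new Error("missing value for " + match);
    }
    // Encode each path segment separately so that a :ref such as
    // "heads/BranchA" keeps its slash.
    return String(params[name]).split("/").map(encodeURIComponent).join("/");
  });
}

console.log(fillRoute("/repos/:owner/:repo/contributors", {
  owner: "User",
  repo: "Repository",
}));
console.log(fillRoute("/repos/:owner/:repo/git/refs/:ref", {
  owner: "User",
  repo: "Repository",
  ref: "heads/BranchA",
}));
```

A missing placeholder value raises an error instead of silently leaving :name in the path.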

How to convert a string to a number with AngularJS

There is more than one way to convert a string to a number in JavaScript.

Context:

  • Use AngularJS to bind text boxes to variables.
  • Sum up the input values of several text boxes.
  • Omit ng-required, ng-pattern, ng-minlength, ng-maxlength, etc., wherever possible.
  • Use {{~}} to print the summed value.
  • Values larger than a 32-bit integer should be handled.
  • The strings “45a”, “0xAB”, “-0xDE”, “2e4”, and “Infinity” should be treated as illegal values (NaN) or 0.

Comparison:

There is no single neat way to do it …

  • parseInt(value, 10) … Converts “45a” to 45 and “2e4” to 2. With radix 10, “0xAB” and “-0xDE” become 0 and -0; without the radix argument, parseInt(value) converts them to 171 and -222. Cannot be called directly in an AngularJS {{ ~ }} expression, because expressions are evaluated against the scope and cannot see global functions.
  • parseFloat(value) … Converts “45a” to 45, “2e4” to 20000, and “Infinity” to Infinity. Cannot be called directly in an AngularJS {{ ~ }} expression, for the same reason.
  • Number(value) … Converts “0xAB” to 171, “2e4” to 20000, and “Infinity” to Infinity. Cannot be called directly in an AngularJS {{ ~ }} expression, for the same reason.
  • value|0, value>>0, value<<0, … Converts “0xAB” to 171 and “2e4” to 20000. Cannot handle values of 2^31 (= 2147483648) or larger.
  • +value … Converts “0xAB” to 171, “2e4” to 20000, and “Infinity” to Infinity. {{ (+a) + (+b) | number }} is treated as string concatenation on AngularJS 1.2.1 (no problem on AngularJS 1.4.8).
  • value*1 … Converts “0xAB” to 171, “2e4” to 20000, and “Infinity” to Infinity. {{ undefined*1 }} becomes NaN on AngularJS 1.4.8 (no problem on JavaScript console…)
  • value-0 … Converts “0xAB” to 171, “2e4” to 20000, and “Infinity” to Infinity.
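The conversions in this list can be checked outside AngularJS with plain JavaScript. The following is a minimal sketch; inputs, conversions, and compare are names made up for this illustration, and it says nothing about how AngularJS {{ ~ }} expressions evaluate the same operators.

```javascript
// Sample inputs from the context list, plus one value above the 32-bit range.
const inputs = ["45a", "0xAB", "-0xDE", "2e4", "Infinity", "4294967296"];

// Each entry maps a label to one of the conversions being compared.
const conversions = {
  "parseInt(v, 10)": (v) => parseInt(v, 10),
  "parseFloat(v)": (v) => parseFloat(v),
  "Number(v)": (v) => Number(v),
  "v|0": (v) => v | 0,
  "+v": (v) => +v,
  "v*1": (v) => v * 1,
  "v-0": (v) => v - 0,
};

// Build a table of results: one row per conversion, one column per input.
function compare(values) {
  const table = {};
  for (const [label, convert] of Object.entries(conversions)) {
    table[label] = {};
    for (const value of values) {
      table[label][value] = convert(value);
    }
  }
  return table;
}

console.table(compare(inputs));
```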

A comparison of these conversions on some values: http://jsfiddle.net/satob/fu1sjtmd/

To treat “0xAB”, “-0xDE”, “2e4”, and “Infinity” as illegal values, you should use ng-pattern to reject input containing non-digit characters, like this: http://jsfiddle.net/satob/8oeLfLrz/
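Where ng-pattern is not an option, the same check could be done in code before converting. A minimal sketch, assuming that only an optional minus sign followed by decimal digits counts as legal input (the post does not define the accepted format); toStrictNumber is a hypothetical helper name:

```javascript
// Hypothetical helper: convert only plain decimal integers, otherwise NaN.
function toStrictNumber(value) {
  const text = String(value).trim();
  // Assumption: legal input is an optional "-" followed by decimal digits only.
  if (!/^-?\d+$/.test(text)) {
    return NaN;
  }
  // Number() is not limited to the 32-bit range, unlike value|0.
  return Number(text);
}

for (const sample of ["45a", "0xAB", "-0xDE", "2e4", "Infinity", "4294967296"]) {
  console.log(sample, "->", toStrictNumber(sample));
}
```

To call such a function from a {{ ~ }} expression it would have to be exposed on the scope, for example as $scope.toStrictNumber.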

The GitHub Enterprise API URL differs from github.com's

Problem:

I tried to access the GitHub Enterprise API with a URL like the one shown below, based on examples from some websites, but the response was a 404 error.

https://x.x.x.x/repos/Project/Repository/git/refs/heads

Reason:

The GitHub Enterprise API URL differs from github.com’s. The github.com API URL looks like:

https://api.github.com/repos/User/Repository/git/refs/heads

while the GitHub Enterprise API URL looks like:

https://x.x.x.x/api/v3/repos/Project/Repository/git/refs/heads

Solution:

After fixing the URL, the request succeeded.
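The difference between the two URL forms can be captured in one place. A minimal sketch based only on the two URLs shown above; apiBaseUrl and apiUrl are hypothetical helper names, and "x.x.x.x" stands for the Enterprise host as in the post:

```javascript
// Hypothetical helper: choose the API base URL for a given host.
// github.com uses the api.github.com host; an Enterprise host uses /api/v3.
function apiBaseUrl(host) {
  if (host === "github.com") {
    return "https://api.github.com";
  }
  return "https://" + host + "/api/v3";
}

// Hypothetical helper: join the base URL and an API path such as "/repos/...".
function apiUrl(host, path) {
  return apiBaseUrl(host) + path;
}

console.log(apiUrl("github.com", "/repos/User/Repository/git/refs/heads"));
console.log(apiUrl("x.x.x.x", "/repos/Project/Repository/git/refs/heads"));
```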