Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gotchas. CC BY-SA 4.0/Apache License 2.0

You should not pass the result of Get-ChildItem into Get-Content (and the like) directly

Context:

You can pass the result of Get-ChildItem into Get-Content directly:

PS /home/satob/tmp> Get-ChildItem | Where-Object { $_.Name -like "*.csv" } | ForEach-Object { Get-Content $_ }          
"a","x"
"b","2"
...

Problem:

You cannot pass the result of Get-ChildItem into Get-Content directly when Get-ChildItem lists a directory other than the current directory:

PS /home/satob> Get-ChildItem /tmp | Where-Object { $_.Name -like "*.csv" } | ForEach-Object { Get-Content $_ }         
Get-Content : Cannot find path '/home/satob/foobar.csv' because it does not exist.
At line:1 char:80
+ ... -Object { $_.Name -like "*.csv" } | ForEach-Object { Get-Content $_ }
+                                                          ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (/home/satob/foobar.csv:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

This is easy to miss, because the problem does not occur when you test with files in the current directory.

Reason:

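When a Get-ChildItem result is passed to Get-Content as an argument, it is converted to a string. As the error above shows, that string can be just the file name, which Get-Content then resolves relative to the current directory.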

Note: The problem does not occur, even without FullName, when you pass a wildcard to Get-ChildItem.

PS /home/satob> Get-ChildItem tmp/*.csv | ForEach-Object { Get-Content $_ }                                             
"a","x"
"b","2"
...

Solution:

You should specify the FullName property explicitly:

PS /home/satob> Get-ChildItem tmp/ | Where-Object { $_.Name -like "*.csv" } | ForEach-Object { Get-Content $_.FullName }
"a","x"
"b","2"
...

You can also use the FullName property with wildcards:

PS /home/satob> Get-ChildItem tmp/*.csv | ForEach-Object { Get-Content $_.FullName }                                    
"a","x"
"b","2"
...
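
As an alternative to ForEach-Object with FullName, the file objects can be piped to Get-Content directly. The following is only a minimal sketch, not part of the original test: the scratch directory and file names are hypothetical and are created just for the demonstration. It relies on Get-Content binding the PSPath property of each piped file object to its -LiteralPath parameter, so it should not depend on the current directory.

```powershell
# Hypothetical scratch directory and files, created only for this demonstration.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("gci-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.csv') -Value '"a","x"'
Set-Content -Path (Join-Path $dir 'b.txt') -Value 'not a csv'

# Explicit FullName, as in the solution above.
$viaFullName = Get-ChildItem $dir | Where-Object { $_.Name -like '*.csv' } |
    ForEach-Object { Get-Content $_.FullName }

# Piping the file objects themselves: the PSPath property of each object
# is bound to -LiteralPath, so the full path is used here as well.
$viaPipeline = Get-ChildItem $dir | Where-Object { $_.Name -like '*.csv' } | Get-Content

# Clean up the scratch directory.
Remove-Item -Recurse -Force $dir
```

Both $viaFullName and $viaPipeline should hold the single line of a.csv, regardless of the directory the commands are run from.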

Note: You should not use the Name property, with or without wildcards. It contains only the file name, which is resolved relative to the current directory:

PS /home/satob> Get-ChildItem tmp/ | Where-Object { $_.Name -like "*.csv" } | ForEach-Object { Get-Content $_.Name }    
Get-Content : Cannot find path '/home/satob/a.csv' because it does not exist.
At line:1 char:80
+ ... ct { $_.Name -like "*.csv" } | ForEach-Object { Get-Content $_.Name }
+                                                     ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (/home/satob/a.csv:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
PS /home/satob> Get-ChildItem tmp/*.csv | ForEach-Object { Get-Content $_.Name }                                        
Get-Content : Cannot find path '/home/satob/a.csv' because it does not exist.
At line:1 char:44
+ Get-ChildItem tmp/*.csv | ForEach-Object { Get-Content $_.Name }
+                                            ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (/home/satob/a.csv:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

You cannot use -Encoding option with Import-Csv in PowerShell 2.0

Context:

Problem:

In PowerShell 2.0, the Import-Csv cmdlet doesn’t have the -Encoding option.

Solution:

If no cell in the CSV file contains a line break (CRLF), you can use Get-Content -Encoding with ConvertFrom-Csv:

PS > Get-Content -Encoding UTF8 ./foobar.csv | ConvertFrom-Csv

If some cells contain line breaks (CRLF), convert the CSV file to UTF-16 with BOM first, and then read it with Import-Csv (Import-Csv recognizes UTF-16 automatically, even in PowerShell 2.0):

PS > Get-Content -Encoding UTF8 ./foobar.csv | Out-File -Encoding Unicode -FilePath "./foobar-utf16.csv"
PS > Import-Csv ./foobar-utf16.csv
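
The first approach above (Get-Content -Encoding with ConvertFrom-Csv) can be checked with a small round trip. This is only a sketch under assumptions: the temporary file name, the column names, and the sample row are made up for the demonstration, and the file is written through .NET so that it is UTF-8 without relying on any cmdlet default.

```powershell
# Hypothetical sample file: one header line and one row with a non-ASCII cell.
$path = Join-Path ([System.IO.Path]::GetTempPath()) ("import-demo-" + [guid]::NewGuid() + ".csv")
$hiragana = [string][char]0x3042   # HIRAGANA LETTER A, built from its code point
$lines = [string[]]@('"name","value"', ('"' + $hiragana + '","2"'))
[System.IO.File]::WriteAllLines($path, $lines, (New-Object System.Text.UTF8Encoding $false))

# Decode as UTF-8 with Get-Content, then parse the lines with ConvertFrom-Csv.
$rows = @(Get-Content -Encoding UTF8 $path | ConvertFrom-Csv)

$rows[0].name    # the non-ASCII cell, decoded correctly
$rows[0].value   # 2

# Clean up the sample file.
Remove-Item $path
```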

Difference of acceptable parameters for -Encoding option

The values accepted by the -Encoding option differ among Get-Content, Set-Content, Export-Csv, Import-Csv, and Out-File.

#  cmdlet       Default  ASCII  UTF-7  UTF-8  UTF-16LE  UTF-16BE          UTF-32LE  UTF-32BE        Byte  Default  OEM  String  Unknown
1  Get-Content  Ascii    Ascii  UTF7   UTF8   Unicode   BigEndianUnicode  UTF32     BigEndianUTF32  Byte  Default  Oem  String  Unknown
2  Set-Content  Ascii    Ascii  UTF7   UTF8   Unicode   BigEndianUnicode  UTF32     BigEndianUTF32  Byte  Default  Oem  String  Unknown
3  Export-Csv   ASCII    ASCII  UTF7   UTF8   Unicode   BigEndianUnicode  UTF32     -               -     Default  OEM  -       -
4  Import-Csv   ASCII    ASCII  UTF7   UTF8   Unicode   BigEndianUnicode  UTF32     -               -     Default  OEM  -       -
5  Out-File     default  ascii  utf7   utf8   unicode   bigendianunicode  utf32     -               -     default  oem  string  unknown

PS > Get-Content -Encoding
Ascii             BigEndianUTF32    Default           String            Unknown           UTF7
BigEndianUnicode  Byte              Oem               Unicode           UTF32             UTF8
PS > Set-Content -Encoding
Ascii             BigEndianUTF32    Default           String            Unknown           UTF7
BigEndianUnicode  Byte              Oem               Unicode           UTF32             UTF8
PS > Import-Csv -Encoding
ASCII             Default           Unicode           UTF7
BigEndianUnicode  OEM               UTF32             UTF8
PS > Export-Csv -Encoding
ASCII             Default           Unicode           UTF7
BigEndianUnicode  OEM               UTF32             UTF8
PS > Out-File -Encoding
ascii             default           string            unknown           utf7
bigendianunicode  oem               unicode           utf32             utf8

Functional limitation of The Nu Html Checker (v.Nu)

Problem:

The Nu Html Checker (v.Nu) is a useful HTML validator. It can be used not only from a web browser, but also from the command line. However, this validator has some functional limitations (in both the web interface and the CLI version).

  • v.Nu cannot handle non-Unicode files. For example, you will get errors like the following for Shift_JIS files:
~/download/dist$ java -jar vnu.jar sjis.html 
"file:/home/satob/download/dist/sjis.html":11.40-11.40: error: Malformed byte sequence: “83”.
"file:/home/satob/download/dist/sjis.html":11.42-11.42: error: Malformed byte sequence: “81”.
"file:/home/satob/download/dist/sjis.html":11.44-11.44: error: Malformed byte sequence: “83”.
  • v.Nu treats an HTML 4.01 doctype as an obsolete doctype and reports a warning:
~/download/dist$ java -jar vnu.jar html4.html
"file:/home/satob/download/dist/index.html":1.1-3.44: info warning: Obsolete doctype. Expected “<!DOCTYPE html>”.
  • Partial HTML (often used for Single Page Applications) is reported as an error:
~/download/dist$ java -jar vnu.jar partial.html 
"file:/home/satob/download/dist/partial.html":1.1-1.19: error: Start tag seen without seeing a doctype first. Expected “<!DOCTYPE html>”.
  • Custom attributes (and tags) for JavaScript frameworks are treated as invalid attributes:
~/download/dist$ java -jar vnu.jar angular.html 
"file:/home/satob/download/dist/angular.html":9.1-9.15: error: Attribute “ng-app” not allowed on element “div” at this point.
"file:/home/satob/download/dist/angular.html":12.10-12.44: error: Attribute “ng-model” not allowed on element “input” at this point.
"file:/home/satob/download/dist/angular.html":13.1-13.18: error: Attribute “ng-bind” not allowed on element “p” at this point.

In other words, v.Nu can handle only complete, plain HTML5 documents encoded in UTF-8.

Reason:

This seems to be by design.

Solution:

v.Nu has no options for accepting HTML whose validity depends on context, like the cases above. There is no workaround for this, so use another HTML validator in these cases.

For example, the Eclipse built-in HTML validator can handle non-Unicode files, HTML 4.01, and partial HTML, and it has options for non-standard tags and attributes.

How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

Context:

  • You want to write the result of ConvertTo-Csv in UTF-8 encoding without a BOM, e.g. because you need a file that can be read by a Java program (the Java File API cannot handle a BOM in UTF-8 encoded files).
  • Writing UTF-8 from PowerShell, e.g. with ConvertTo-Csv | Out-File -Encoding utf8 or Export-Csv -Encoding UTF8, prepends a BOM to the file.
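
To check whether a given file actually starts with a UTF-8 BOM, its first three bytes can be compared with EF BB BF. The following is only a minimal sketch: Test-Utf8Bom is a hypothetical helper name (not a built-in cmdlet), and the two sample files are written through .NET with and without a BOM just to exercise the check.

```powershell
# Test-Utf8Bom is a hypothetical helper, not a built-in cmdlet.
# It returns $true when the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom([string]$Path) {
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)
}

# Hypothetical sample files in the temporary directory.
$dir = [System.IO.Path]::GetTempPath()
$withBom    = Join-Path $dir ("bom-"   + [guid]::NewGuid() + ".csv")
$withoutBom = Join-Path $dir ("nobom-" + [guid]::NewGuid() + ".csv")

# UTF8Encoding($true) emits a BOM, UTF8Encoding($false) does not.
[System.IO.File]::WriteAllText($withBom,    '"a","x"', (New-Object System.Text.UTF8Encoding $true))
[System.IO.File]::WriteAllText($withoutBom, '"a","x"', (New-Object System.Text.UTF8Encoding $false))

$bomResult   = Test-Utf8Bom $withBom      # True
$noBomResult = Test-Utf8Bom $withoutBom   # False

# Clean up the sample files.
Remove-Item $withBom, $withoutBom
```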

Problem:

There are some solutions that do not work as expected.

  • WriteAllLines() with the result of ConvertTo-Csv as the 2nd parameter returns an error like this:
PS > "1" | ConvertTo-Csv | Set-Variable tmp
PS > [System.IO.File]::WriteAllLines("/tmp/foobar.csv", $tmp, $UTF8woBomEncoding)                                   
Cannot find an overload for "WriteAllLines" and the argument count: "3".
At line:1 char:1
+ [System.IO.File]::WriteAllLines("/tmp/foobar.csv", $tmp, $UTF8woBomEn ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodException
    + FullyQualifiedErrorId : MethodCountCouldNotFindBest
  • WriteAllLines() with @() in the 2nd parameter also returns an error like this:
PS > "1" | ConvertTo-Csv | Set-Variable tmp
PS > [System.IO.File]::WriteAllLines("/tmp/foobar.csv", @($tmp), $UTF8woBomEncoding)                         
Cannot find an overload for "WriteAllLines" and the argument count: "3".
At line:1 char:1
+ [System.IO.File]::WriteAllLines("/tmp/foobar.csv", @($tmp), $UTF8woBo ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodException
    + FullyQualifiedErrorId : MethodCountCouldNotFindBest
  • Converting each line with GetBytes() and writing it with Set-Content -Encoding Byte produces a file without line breaks:
PS > "1" | ConvertTo-Csv | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "./foobar.csv"
PS > Get-Content ./foobar.csv                                                                           
#TYPE System.String"Length""1"
  • WriteAllText() also produces a file without line breaks:
PS > "1" | ConvertTo-Csv | Set-Variable tmp
PS > $UTF8woBomEncoding = New-Object System.Text.UTF8Encoding $False
PS > [System.IO.File]::WriteAllText("/tmp/foobar.csv", $tmp, $UTF8woBomEncoding)
PS > Get-Content /tmp/foobar.csv
#TYPE System.String"Length""1"

Reason:

  • WriteAllLines() expects string[] or IEnumerable[string]. By contrast, the type of the result of ConvertTo-Csv or @() is Object[].
PS > [System.IO.File]::WriteAllLines                                                                    

OverloadDefinitions                                                                                                    
-------------------                                                                                                    
static void WriteAllLines(string path, System.Collections.Generic.IEnumerable[string] contents)                        
static void WriteAllLines(string path, System.Collections.Generic.IEnumerable[string] contents, System.Text.Encoding encoding)

PS > "1" | ConvertTo-Csv | Set-Variable tmp
PS > Get-Member -InputObject $tmp

   TypeName: System.Object[]
  • The strings returned by ConvertTo-Csv do not end with line breaks.
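
Both points can be checked directly in a session. This is a small sketch that only inspects types and the last string, using the same "1" | ConvertTo-Csv input as above; the variable names other than $tmp are made up for the illustration.

```powershell
# Collect the output of ConvertTo-Csv in a variable, as in the examples above.
$tmp = "1" | ConvertTo-Csv

# The pipeline output is collected into a generic object array ...
$rawType = $tmp.GetType().FullName                 # System.Object[]

# ... while an explicit cast yields the string array that WriteAllLines() accepts.
$castType = ([string[]]$tmp).GetType().FullName    # System.String[]

# The individual strings carry no trailing line break.
$endsWithNewline = $tmp[-1].EndsWith("`n")         # False

$rawType
$castType
$endsWithNewline
```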

Solution:

You should cast the result of ConvertTo-Csv to [String[]], and use WriteAllLines() like this:

PS > "1" | ConvertTo-Csv | Set-Variable tmp
PS > $UTF8woBomEncoding = New-Object System.Text.UTF8Encoding $False
PS > [System.IO.File]::WriteAllLines("/tmp/foobar.csv", [String[]]$tmp, $UTF8woBomEncoding)
PS > Get-Content /tmp/foobar.csv
#TYPE System.String
"Length"
"1"

Added on 2017/05/23:

Or, with % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte or with WriteAllText(), you can insert Out-String into the pipeline so that a line break is added at the end of each line explicitly:

PS > "1" | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "./foobar.csv"
PS > Get-Content ./foobar.csv
#TYPE System.String
"Length"
"1"
PS > $UTF8woBomEncoding = New-Object System.Text.UTF8Encoding $False
PS > "1" | ConvertTo-Csv | Out-String | Set-Variable tmp
PS > [System.IO.File]::WriteAllText("/tmp/foobar.csv", $tmp, $UTF8woBomEncoding)
PS > Get-Content /tmp/foobar.csv
#TYPE System.String
"Length"
"1"