Get-ChildItem vs Dir in PowerShell

Batch as usual

Recently I needed to modify a huge number of files across a network share. After installing the Serviio media streaming service I noticed that it would crash "randomly". After checking the logs it was clear that ffmpeg was crashing when trying to open subtitle files that were not in the encoding indicated in the Serviio console. I needed a quick way to bring all subtitle files to a common encoding, so I decided to convert them all to UTF-8.

Since I use a Windows environment I went directly to PowerShell and wrote this:

 Get-ChildItem -path .\ -filter *.srt -file  | ForEach-Object { (get-content $_.FullName) | Out-File $_.FullName -encoding utf8 }

This worked almost right, except for files with "[" or "]" (among other symbols) in the name or path, because PowerShell treats those characters as wildcards. To solve it, I just added the "-LiteralPath" parameter, which tells PowerShell not to interpret any wildcards in the path and to use it exactly as it is.

 Get-ChildItem -path .\ -filter *.srt -file  | ForEach-Object { (get-content -LiteralPath $_.FullName) | Out-File -LiteralPath $_.FullName -encoding utf8 }
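To see why the brackets matter, here is a minimal sketch that creates a scratch file with brackets in its name and checks it both ways. The folder name, file name and variable names are made up for this demo; nothing here comes from the original media share.

```powershell
# Create a scratch folder holding one file whose name contains brackets.
# All names below are hypothetical and only exist for this demo.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("srt-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demoDir | Out-Null
$file = Join-Path $demoDir 'Movie [2010].srt'
[System.IO.File]::WriteAllText($file, "1`r`n00:00:01,000 --> 00:00:02,000`r`nHello`r`n")

# -Path reads [2010] as a wildcard set (one character out of 2, 0, 1),
# so the pattern does not match the real file name.
$foundWithPath = Test-Path -Path $file

# -LiteralPath uses the string exactly as written.
$foundWithLiteralPath = Test-Path -LiteralPath $file

"Found with -Path:        $foundWithPath"
"Found with -LiteralPath: $foundWithLiteralPath"

# Clean up the scratch folder.
Remove-Item -LiteralPath $demoDir -Recurse -Force
```

The same wildcard handling applies to Get-Content and Out-File, which is why both need -LiteralPath in the one-liner.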

And done! All subtitle files on my media server are now in UTF-8 and Serviio works without crashing.

Too slow?

Performance, however, was a concern. I noticed this was a bit slower than it should be, considering a fast network, the small size of the subtitle files (under 100 KB on average) and how simple the process is. This single-line script has only 2 parts:

(get the files)  then for each file (convert it)

I started to dig a little into Get-ChildItem and found that there have been complaints about its performance for some time, although it is much better now than in previous versions. Anyway, I tried a different way to do that same first part of getting the files and compared it against Get-ChildItem.

Using "cmd /c" I executed "dir /s /b <pattern>" and ran some tests, both locally and over the network. See the image below for an example measuring the search for .exe files on another drive.

Both over the network and locally, the "dir" version was faster. Of course, it grabs less information than Get-ChildItem, which actually creates an object around each file returned.
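A comparison like this can be reproduced with Measure-Command. The sketch below is only an approximation of that kind of test, not the exact commands behind the numbers above: the root folder is a placeholder, -Recurse is added so that Get-ChildItem searches subfolders the way "dir /s" does, and "/a-d" is added so that dir skips directories the way -File does. It needs Windows, since it calls cmd.

```powershell
# Folder to search. C:\Windows is only a placeholder for this sketch.
$root = 'C:\Windows'

# Get-ChildItem builds a FileInfo object for every match.
$gciTime = Measure-Command {
    Get-ChildItem -LiteralPath $root -Filter *.exe -File -Recurse -ErrorAction SilentlyContinue
}

# dir /s /b only emits full path strings; /a-d leaves directories out.
$dirTime = Measure-Command {
    cmd /c dir /s /b /a-d "$root\*.exe" 2>$null
}

'Get-ChildItem: {0:N0} ms' -f $gciTime.TotalMilliseconds
'cmd /c dir:    {0:N0} ms' -f $dirTime.TotalMilliseconds
```

Whichever command runs first also warms the file system cache for the other one, so it is worth running each a few times and swapping the order before trusting the numbers.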

For a final test I then changed the original script to: 

cmd /c dir /s /b *.srt  | foreach { (Get-Content -LiteralPath $_) | Out-File -LiteralPath $_ -Encoding UTF8 }

It works just like the original script, only a bit faster. (Note that "dir /s" also searches subfolders, which Get-ChildItem only does with -Recurse.) Since the bulk of the execution time goes to converting the files' encoding, the jump in speed is not that big in this particular case. However, when I need to search and filter files in the order of thousands, I no longer use Get-ChildItem.

Hope this is useful for you all.

