Merge multiple files by bytes on Windows
Merging multiple files on Linux is very easy as to run a simple cat command and redirect its output to the merged file.
Merging multiple files on Windows however, needs more complex operation
Assuming we would want to do the same as the following command on Linux:
cat * > merged
which is expanded to the following command by shell:
cat part1 part2 part3 > merged
We have the following ways to achieve this on Windows (not counting Cygwin / MingW64, as that’s pretty much cheating)
Using cmd
The copy command is cmd’s built-in instead of a system tool, to merge multiple files, use a command like:
copy /B part1 + part2 + part3 merged
in which:
/Bmeans copying bytewise, this is needed unless you want to merge them text-wise- The merged output must be the last
argument, thecopybuilt-in does not write to stdout - Each to-be-merged components shall be connected with
+as seperator between their arguments, as+itself is an argument cmddoes not expand globs, every part much be typed as arguments explicitly (even if it expands, that won’t be of much use anyway due to the+argument needed in between)
Note that above command could also be written as:
copy /B part1+part2+part3 merged
But I’d recommend against this pattern as it makes weak seperation between files to be mergd.
Using PowerShell < 6.0
The Get-Content is PowerShell’s built-in, to merge multiple files, use a command like:
Get-Content -Encoding Byte -ReadCount 0x100000 part* | Set-Content -Encoding Byte merged
in which:
-Encodingdeclares how to decode/encode files,Bytehere means do not decode/encode bytes in file and keep them as-is-ReadCountdeclares how many words to read at once, forByteencoding this means how many bytes to read at once, the default is1and that’s insane for byte-reading. I’ve set it to0x100000which means 1MiB for each read
Note that the above command is very slow, slower than cmd, as bytes need to be parsed into native objects into PowerShell, this runs a few MiBs above 10 MiB/s, be patient.
Using PowerShell >= 6.0
PowerShell 6.0 brings -AsByteStream argument to Get-Content built-in, so the above command could be rewritten as:
Get-Content -AsByteStream -ReadCount 0x100000 part* | Set-Content -AsByteStream merged
This is more efficient than the < 6.0 version and should run faster, but still much much slower than cmd
Note that PowerShell >= 6.0 also drops -Encoding Byte for Get-Content and Set-Content, it’s not possible to compare the performance between these versions.