Converting Windows And Unix Text Files
Ever had the need to work on a text file (e.g. *.txt) under both *nix (Linux/Unix) and Windows? Ever edited a text file that looked perfectly fine in Windows but turned out crap, with a whole lot of CTRL-M (^M) symbols at the end of each line? Ever saved a good text file in Unix and found that all the text had been scrunched into a single line in Windows? I experienced all of the above when I first started out with Unix. The key, I suppose, is understanding why the heck it is happening and the ways around it.
Text file formats differ slightly between the Unix and Windows Operating Systems (OS). Under Windows, a line ends with two ASCII characters: a carriage return (CR) followed by a line feed (LF). On Unix, the end of line is represented with a single line feed (LF). The effect is that your text file gets mangled in translation. While the characters are still perfectly valid, the text editors (e.g. vi for Unix and Notepad for Windows) interpret the combination differently: vi shows the extra CR as ^M, and Notepad does not break the line on a lone LF. There are many ways to work around this difference. Here’s a list of the most handy workarounds to know.
There are two ways of sharing files between Unix and Windows clients. The first is to back up the file to some form of removable media (e.g. floppy disk/CD/portable drive) and physically mount the removable media on the target client. The alternative is to transfer the files over the network using a File Transfer Protocol (FTP) client.
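Before reaching for a workaround, it can help to check which line endings a file actually contains. The following is a minimal Python sketch, not part of the original workflow; the function name describe_line_endings is made up for illustration. It reads raw bytes so that nothing gets translated on the way in.

```python
def describe_line_endings(data: bytes) -> dict:
    """Count each line-ending style found in the raw bytes of a file."""
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so that lone LF and lone CR are counted on their own.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


if __name__ == "__main__":
    import sys

    # Usage: python check_endings.py somefile.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:  # binary mode: no newline translation
            print(describe_line_endings(handle.read()))
```

A file that reports only "crlf" hits is in Windows format, only "lf" hits means Unix format, and a mix usually means the file has been edited on both sides.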
Transferring Files Via FTP
This is probably the most convenient means of file sharing when two clients are connected to the same network. It is important to note that there are two transfer modes, ASCII and Binary. Which mode to use depends on the type of file that you’re trying to transfer. Normally, binary is the way to go, especially when transferring executable files, compressed files, formatted documents, and basically anything else that you would like to receive on the other end ‘as is’ without modifications. However, this is not the case for text files. When working with text files via FTP, it is best to use the ASCII mode of transfer. In this mode the line endings are converted in transit, so the file arrives with the end-of-line convention of the target operating system. Case in point: our differing new line representation. Remember this and you’ll save yourself a lot of grief modifying the files to look the way they should.
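If you script your transfers, the same ASCII-versus-binary choice has to be made in code. Here is a rough Python sketch using the standard ftplib module, where storlines sends a file in ASCII mode and storbinary sends it in binary mode. The TEXT_EXTENSIONS set, the helper names and the host in the usage comment are assumptions made for illustration only.

```python
import os
from ftplib import FTP

# Hypothetical list of extensions to treat as plain text; adjust to your own files.
TEXT_EXTENSIONS = {".txt", ".csv", ".sh", ".html"}


def is_text_file(path: str) -> bool:
    """Guess from the extension whether a file should travel in ASCII mode."""
    return os.path.splitext(path)[1].lower() in TEXT_EXTENSIONS


def upload(ftp: FTP, path: str) -> str:
    """Upload one file, using ASCII mode for text and binary mode for the rest."""
    name = os.path.basename(path)
    with open(path, "rb") as handle:  # ftplib expects a binary file object in both cases
        if is_text_file(path):
            ftp.storlines("STOR " + name, handle)  # ASCII transfer
            return "ascii"
        ftp.storbinary("STOR " + name, handle)  # binary transfer, bytes sent as is
        return "binary"


# Example usage (host and credentials are placeholders):
#   ftp = FTP("ftp.example.com")
#   ftp.login("user", "password")
#   upload(ftp, "notes.txt")
#   ftp.quit()
```

Picking the mode by extension is only a convenience. A tar or zip archive full of text files still has to go in binary mode, and the text inside it keeps whatever line endings it had.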
And The Other Thing…
With FTP, the change is automatic if you used ASCII as your transfer mode. But what if you accidentally transferred the file via binary transfer, or you’re using some form of removable media? The good news is that you’re not screwed. There are ways to salvage the files and make them readable again.
Unix –> Windows (Wordpad)
This one’s pretty standard. Opening the text file with Notepad, Windows’ default text editor, will give you one solid block of text, because Notepad does not treat a lone LF as a line break. In such a situation, Wordpad, a barebones word processor which comes standard with any Windows install, is your friend.
- Open the file in Wordpad.
- Select File » Save as from the menu.
- Change the “Save as type” to Text Document.
- A prompt will appear warning that “You are about to save the document in a Text-Only format, which will remove all formatting. Are you sure you want to do this?“. Say yes.
Congratulations, you have a clean text file again. No hassle, no pain.
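If there are too many files to click through Wordpad one at a time, the same conversion can be scripted. Below is a minimal Python sketch (the function name unix_to_windows is made up for this example) that turns each bare LF into CR+LF while leaving lines that already end in CR+LF untouched, so running it twice does no harm.

```python
import re


def unix_to_windows(data: bytes) -> bytes:
    """Turn every bare LF into CRLF, leaving existing CRLF pairs alone."""
    # The lookbehind (?<!\r) skips any LF that already has a CR in front of it.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


if __name__ == "__main__":
    sample = b"first line\nsecond line\n"
    print(unix_to_windows(sample))
```

Read and write the file in binary mode ("rb" and "wb") when using it, otherwise Python on Windows will translate the newlines a second time.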
Windows –> Unix
With Unix, you have a host of options available to you. These range from command line utilities to vi text editors, awk scripts to perl scripts. Take your pick. The bad news is that they don’t always work for every ‘Nix system. The following are the two most consistent options that I usually use. If one doesn’t hack it, you can always try the next.
- vi editor
Since vi is the de facto text editor on a ‘Nix system, there’s little worry about support across different ‘Nix systems. If you’re using ‘Nix, chances are you’ll have the standard vi editor installed. And since viewing or editing the file is when you actually find out that something’s wrong, it’s also the best time to fix it.
- Edit the file using vi.
- Key in the following command to find and replace all instances of ^M.
:%s/^M//g
To get the ‘^M’ character, press ‘CTRL+v’ followed by ‘Enter’ or ‘Return’.
- Save the file.
- awk script one liners
The beauty of using awk is that the conversion happens from the command line, without the need for editing, using the following one liner:
awk '{ sub("\r$", ""); print }' input-file > output-file
Where input-file is the scrambled text file, and output-file is the cleaned text file. Similarly, you can also convert Unix files to Windows files using the following awk one liner:
awk 'sub("$", "\r")' input-file > output-file
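On a box where awk misbehaves but Python happens to be installed, the Windows to Unix direction can be sketched as follows. This is an illustrative equivalent rather than a drop-in replacement for the one liner above, and the function name windows_to_unix is an assumption of this example. It strips a CR only when it sits at the end of a line, like the "\r$" pattern does.

```python
def windows_to_unix(src_path: str, dst_path: str) -> int:
    """Copy src to dst, dropping the CR at the end of each line.

    Returns the number of lines that were changed.
    """
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # iterating a binary file splits on LF only
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
                changed += 1
            elif line.endswith(b"\r"):  # last line with a CR but no LF
                line = line[:-1]
                changed += 1
            dst.write(line)
    return changed


if __name__ == "__main__":
    import sys

    # Usage: python win2unix.py input-file output-file
    if len(sys.argv) == 3:
        print(windows_to_unix(sys.argv[1], sys.argv[2]), "line(s) converted")
```

As with the awk version, write to a separate output file and only replace the original once you have checked the result.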
Well, that’s about it. Good luck with those pesky conversion nuances. Oh by the way, Mac OS to Windows and Mac OS to ‘Nix text files are two whole different ball games.

felixleong Said,
July 27, 2006 @ 6:50 pm
Isn’t there the unix2dos and dos2unix commands in Linux as well?
gbyeow Said,
July 27, 2006 @ 11:29 pm
Yes. But that’s specific to SUN Solaris and certain flavours of Linux (and possibly only if you installed it in the first place). It doesn’t exist for HP or DECALPHA. Unless of course the servers here are lying to me. I’m just listing the common ones that will work on all platforms. No sense putting up a horde of options that no one can use.
Simonsays Said,
July 28, 2006 @ 10:48 am
Very, very good article. Not enough developers know about the line terminator issue. I hope many of them will flock to this page :)
Two more points:
1. Modern free *nix GUI desktop environments provide great tools like KWrite and gEdit that can handle the cross platform conversion with no sweat, assuming that you’re not batch processing a bunch of files from another universe. Of course, not all server environments allow the hefty installation of GNOME/KDE/Fluxbox, but if you have a Knoppix LiveCD and a compatible workstation/notebook (that the LiveCD drivers support) nearby, it helps (just borrow one from the marketing fellas/accounts dept). Also, the de facto text editor I try to use on Win32 platforms for reading *nix text is WordPad. It handles all the line feeds and carriage returns way better than Notepad.
2. There’s a larger iceberg under the surface though, in regards to cross-platform text conversion, way past the line terminator issue, and it’s called ‘internationalization’. We developers should really be attempting to go on the utf-8 bandwagon and leave 7-bit ASCII (and the 8-bit code pages that Windows labels ANSI) behind. The reason is that the world is a much smaller place, virtually, and maybe one day someone from another language may need to read our documentation/vice-versa, and his/her system settings are set to his/her default language. utf-8 would be a no-brainer in that situation. BTW utf-8 is a superset of 7-bit ASCII, so plain ASCII files are already valid utf-8. The IETF recommended a utf-8 standard for FTP (google RFC 2640) so if we have a fairly recent FTP client that plays by the rules, the encoding can be preserved. I’m not sure about utf-8 support in various *nix shells though.
Heck, I gotta practice what I preach! utf-8, here we come!
Simonsays Said,
July 28, 2006 @ 10:59 am
Shoot. I didn’t read the Wordpad part of your article. Sorry. :P stoooopid moi…
Anyway, there is a “Save As Type: Unicode Text Document” option in WordPad’s “File>Save As…” option. There’s also a “Save in this format by default checkbox” which sounds good for me, I guess :)
tcc Said,
August 1, 2006 @ 9:34 am
I’ve read almost every post on the web about the line termination issue, but I haven’t found the answer to the following variation:
I am putting together text files in Windows (I happen to be outputting using VB), and these files need to be transferred to a Unix box. I need to make them Unix-ready BEFORE I send them (they’re being FTP’ed in a TAR file in binary mode along with some other things).
So how do I rip out all of the Ctrl-M/Chr(13)’s before I send the text file? I thought about just writing out the file as a block of text with Chr(10) inserted where I want a line feed in Unix, but I’m concerned that I’ll still end up with a Ctrl-M at the end resulting when VB closes the file.
gbyeow Said,
August 1, 2006 @ 1:18 pm
Hi tcc,
When it comes to text files, most languages adhere to a WYWIWYG approach: What You Write Is What You Get. VB follows the same rule and will not append any new characters to the end of the file when it closes it. The only exception to this rule is when you use a built-in function which automatically appends a newline character when it is done. The newline character that is used is OS dependent.
Doing something of this form will give you the result you desire:
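Imports System.IO   ' both Imports go at the top of the source file
Imports System.Text

' Create the file if it does not exist yet, then release the handle.
Dim fs As New FileStream("c:\somefile.txt", FileMode.Append, FileAccess.Write, FileShare.Write)
fs.Close()
' Append with Write (not WriteLine) so that no OS-specific newline is added.
Dim sw As New StreamWriter("c:\somefile.txt", True, Encoding.ASCII)
Dim somestring As String
somestring = "Line 1." & vbLf
sw.Write(somestring)
somestring = "Line 2" & vbLf
sw.Write(somestring)
sw.Close()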
The text file will come out with the linefeed character at the end of each line. VB does not automatically add the ^M character when it closes the file (unless as I mentioned earlier, you’re using one of those built in functions that does so). Transporting it to Unix will give you the correct readable text format.
Hope this helps and good luck.
Sooth Said,
August 7, 2006 @ 3:44 pm
Windows uses linefeeds and carriage returns while unix just uses line feeds.
If you have perl, you can do something like this
W->U
perl -pe 's/\r//g' outfile.txt
U->W
perl -pe 's/\n/\n\r/g' outfile.txt
Simonsays Said,
August 9, 2006 @ 10:24 am
Using VBScript, I’ve typed this in a .vbs file and double-clicked it. I got a sometext_test.txt that I FTP’d using a Cygwin Midnight Commander installation to a RH9 Linux installation. I opened the file in gEdit and vim and the file opened perfectly.
Dim text_test
Set text_test = CreateObject("Scripting.FileSystemObject")
Set text_test_file = text_test.CreateTextFile("sometext_test.txt",True)
text_test_file.WriteLine("This is the first line of text.")
text_test_file.WriteLine("This is the second line of text.")
text_test_file.Close
gbyeow Said,
August 9, 2006 @ 1:21 pm
I’m guessing you FTPed it using ASCII and without tarring it first. It won’t look so pretty if you did prior to beaming it up.
Boredworkers.com » FRREEee….ddDDOOM! (IE7, Firefox2.0, Bleach101) Said,
November 3, 2006 @ 6:07 pm
[…] After some issues with Windows/Unix linefeed characters inside one of our demo archives, all the source files are finally checked in to McCabe TRUEchange and I’m free from the overbearing deadline. Or am I? […]
Rajesh Said,
November 13, 2007 @ 4:17 pm
Very useful information. Thanks a lot.
G: Most welcome.
leorick Said,
November 25, 2008 @ 2:45 pm
Or using Notepad++, under menu “Format” > “Convert to Windows/Unix/MAC”
dc Said,
May 27, 2009 @ 12:15 am
I’m trying to write a script that will convert a linux html file into windows format i.e. with (cr)(lf) at the end of each line rather than just a (lf).
I have been trying to use the below script as posted above:
U->W
perl -pe 's/\n/\n\r/g' outfile.txt
For a start I think it should be:
perl -pe 's/\n/\r\n/g' outfile.txt
But that still doesn’t work.
Why is this so difficult? It seems like a trivial task!!!!
G: It works for me…
perl -pe 's/\n/\r\n/g' inputfile.txt > outputfile.txt
Doing this on a unix system. What are you using?
brrman Said,
January 19, 2011 @ 4:29 am
Thank you for the Notepad++ tip. Got the app and it worked for my Windows -> Unix conversion.