Text Files and Binary Files

A text file contains only textual information such as letters, digits and special symbols. What is actually stored in a text file is the ASCII codes of these characters. A good example of a text file is the source code of any C program, say textfile1.txt.
 
As against this, a binary file is merely a collection of bytes. This collection might be a compiled version of a C program (say textfile1.exe), or music data stored in a wave file or a picture stored in a graphic file. A very easy way to find out whether a file is a text file or a binary file is to open that file in Turbo C/C++. If on opening the file you can make out what is displayed then it is a text file, otherwise it is a binary file.
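The same "can you make out what is displayed" test can be roughly approximated in code. The sketch below is only a heuristic, and the function name looks_like_text( ) is a hypothetical helper, not a library function. It assumes plain ASCII text, so a file in another encoding would be reported as binary.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte in the file is a printable
   ASCII character or common whitespace, 0 if any other byte is found,
   and -1 if the file cannot be opened. */
int looks_like_text(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: read the bytes untranslated */
    int ch;
    int result = 1;

    if (fp == NULL)
        return -1;

    while ((ch = fgetc(fp)) != EOF)
    {
        int printable = (ch >= 32 && ch <= 126);
        int whitespace = (ch == '\n' || ch == '\r' || ch == '\t');

        if (!printable && !whitespace)
        {
            result = 0;             /* found a byte that is not plain text */
            break;
        }
    }

    fclose(fp);
    return result;
}
```

A call such as looks_like_text("textfile1.txt") would be expected to return 1 for a C source file, whereas a compiled program would normally contain bytes outside the printable range and yield 0.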
 
As mentioned while explaining the file-copy program, the program cannot copy binary files successfully. We can improve the same program to make it capable of copying text as well as binary files as shown below.

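#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	FILE *fs, *ft;
	int ch;

	/* "rb": open the source for reading in binary mode */
	fs = fopen("pr1.exe", "rb");
	if (fs == NULL)
	{
		puts("Cannot open source file");
		exit(EXIT_FAILURE);
	}

	/* "wb": open the target for writing in binary mode */
	ft = fopen("newpr1.exe", "wb");
	if (ft == NULL)
	{
		puts("Cannot open target file");
		fclose(fs);
		exit(EXIT_FAILURE);
	}

	/* copy one byte at a time until fgetc() reports end of file */
	while (1)
	{
		ch = fgetc(fs);
		if (ch == EOF)
			break;
		else
			fputc(ch, ft);
	}

	fclose(fs);
	fclose(ft);

	getchar();	/* pause until Enter is pressed */
	return 0;
}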

Using this program we can comfortably copy text as well as binary files. Note that here we have opened the source and target files in “rb” and “wb” modes respectively. While opening the file in text mode we can use either “r” or “rt”, but since text mode is the default mode we usually drop the ‘t’.
 
From the programming angle there are three main areas where text and binary mode files are different. These are:
(a) Handling of newlines
(b) Representation of end of file
(c) Storage of numbers
 
Let us explore these three differences.

Text versus Binary Mode: Newlines

We have already seen that, in text mode on DOS and Windows, a newline character is converted into the carriage return-linefeed combination before being written to the disk. Likewise, the carriage return-linefeed combination on the disk is converted back into a newline when the file is read by a C program. However, if a file is opened in binary mode, as opposed to text mode, these conversions will not take place. On Unix-like systems a newline is stored as a single linefeed, so the two modes behave alike there.
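The newline conversion can be observed by writing the same two characters in each mode and counting the bytes that land on disk. This is a minimal sketch. The helper name size_after_writing_line( ) and the file name used with it are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes the two characters 'A' and '\n' to `path`
   using the given fopen mode ("w" or "wb"), then returns the size of the
   resulting file in bytes, or -1 on failure. */
long size_after_writing_line(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long size = 0;

    if (fp == NULL)
        return -1;
    fputc('A', fp);
    fputc('\n', fp);
    fclose(fp);

    /* Reopen in binary mode so the bytes on disk are counted untranslated. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        size++;
    fclose(fp);

    return size;
}
```

With mode "wb" the function returns 2 everywhere. With mode "w" it would be expected to return 3 on DOS and Windows, where the newline becomes a carriage return-linefeed pair, and 2 on systems that do not translate newlines.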
 

Text versus Binary Mode: End of File

The second difference between text and binary modes is in the way the end-of-file is detected. In text mode on DOS and Windows compilers, a special character, whose ASCII value is 26 (Ctrl-Z), is treated as the end-of-file marker; some programs insert it after the last character in the file. If this character is detected at any point in the file, the read function returns the EOF signal to the program.
 
As against this, no such special character marks the end of a file opened in binary mode. In binary mode the end of file is determined from the file size, in bytes, recorded in the directory entry of the file.
 
There is a moral to be derived from the end of file marker of text mode files. If a file stores numbers in binary mode, it is important that binary mode only be used for reading the numbers back, since one of the numbers we store might well be the number 26 (hexadecimal 1A). If this number is detected while we are reading the file by opening it in text mode, reading would be terminated prematurely at that point.
Thus the two modes are not compatible. See to it that the file that has been written in text mode is read back only in text mode. Similarly, the file that has been written in binary mode must be read back only in binary mode.
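The effect of a stray byte 26 can be checked with a small experiment. This is a sketch under the assumption that the compiler's text mode treats Ctrl-Z as end of file, which is the case for DOS and Windows compilers but not on Unix-like systems. The helper name chars_read_past_ctrl_z( ) is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes the three bytes 'A', 26, 'B' in binary mode,
   then reopens the file in `read_mode` ("r" or "rb") and returns how many
   characters fgetc() delivers before EOF, or -1 on failure. */
long chars_read_past_ctrl_z(const char *path, const char *read_mode)
{
    FILE *fp = fopen(path, "wb");
    long count = 0;

    if (fp == NULL)
        return -1;
    fputc('A', fp);
    fputc(26, fp);      /* the byte that text mode may take for end of file */
    fputc('B', fp);
    fclose(fp);

    fp = fopen(path, read_mode);
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);

    return count;
}
```

Reading with "rb" always yields all 3 bytes. Reading with "r" would be expected to stop after the first character on a compiler that honours the Ctrl-Z marker, and to yield 3 where text mode has no such marker.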
 

Text versus Binary Mode: Storage of Numbers

In text mode, numbers are stored in a disk file using the fprintf( ) function. It is important to understand how numerical data is stored on the disk by fprintf( ). Text and characters are stored one character per byte, as we would expect. Are numbers stored as they are in memory, two bytes for an integer (on a 16-bit compiler such as Turbo C), four bytes for a float, and so on? No.
 
Numbers are stored as strings of characters. Thus, 1234, even though it occupies two bytes in memory on a 16-bit compiler (four on most current compilers), when transferred to the disk using fprintf( ), would occupy four bytes, one byte per character. Similarly, the floating-point number 1234.56, written with two decimal places, would occupy 7 bytes on disk. Thus, numbers with more digits would require more disk space.
 
Hence, if a large amount of numerical data is to be stored in a disk file, using text mode may turn out to be inefficient. The solution is to open the file in binary mode and use those functions (fread( ) and fwrite( ), which are discussed later) which store the numbers in binary format. This means each number would occupy the same number of bytes on disk as it occupies in memory.
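The difference in disk usage can be measured directly. The following sketch writes one integer with fprintf( ) and one with fwrite( ) and reports the resulting file sizes. The helper names and file names are assumptions made for illustration, and the binary size depends on sizeof(int) for the compiler in use.

```c
#include <stdio.h>

/* Counts the bytes in a file by reading it in binary mode; -1 on failure. */
static long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size = 0;

    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        size++;
    fclose(fp);
    return size;
}

/* Hypothetical helper: stores n as text with fprintf() and returns the
   file size, which equals the number of characters printed. */
long text_size_of_int(const char *path, int n)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    fprintf(fp, "%d", n);
    fclose(fp);
    return file_size(path);
}

/* Hypothetical helper: stores n in its in-memory form with fwrite() and
   returns the file size, which equals sizeof(int). */
long binary_size_of_int(const char *path, int n)
{
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    fwrite(&n, sizeof n, 1, fp);
    fclose(fp);
    return file_size(path);
}
```

For the value 1234, text_size_of_int( ) returns 4, one byte per digit, while binary_size_of_int( ) returns sizeof(int) whatever the value. A larger number takes more bytes as text but the same number of bytes in binary form.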

No © Loop and Break