The range of a char variable is typically either -128 to 127 or 0 to 255, but this is not guaranteed and can vary depending on the specific programming language, compiler, and underlying system architecture. For portability, it is best to use signed char and unsigned char explicitly. The uncertainty surrounding the char type's range arises because its primary function is to store a single character, with its numerical interpretation being secondary and implementation-defined.
The different types of character variables
To fully understand the range of a char, it's important to distinguish between the three character types in languages like C and C++:
- signed char
- unsigned char
- char (plain char)
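All three types occupy exactly one byte (sizeof yields 1 for each by definition), and a byte is guaranteed to be at least 8 bits wide. They are also three distinct types, even where two of them share the same representation. The key difference lies in whether the most significant bit is treated as a sign bit, which determines their numeric range.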
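The size guarantee and the fact that the three are separate types can be checked directly. The following is a minimal sketch assuming a C11 compiler; `CHAR_KIND`, `enum char_kind`, and `all_char_types_are_one_byte` are hypothetical names introduced for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* A byte is at least 8 bits wide on every conforming implementation. */
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* Hypothetical labels for the three character types. */
enum char_kind { KIND_PLAIN, KIND_SIGNED, KIND_UNSIGNED };

/* _Generic accepts all three associations only because char, signed char,
   and unsigned char are distinct types, even though plain char shares its
   range with one of the other two. */
#define CHAR_KIND(x) _Generic((x),        \
    char:          KIND_PLAIN,            \
    signed char:   KIND_SIGNED,           \
    unsigned char: KIND_UNSIGNED)

/* Hypothetical helper: returns 1 if each character type occupies one byte. */
int all_char_types_are_one_byte(void)
{
    return sizeof(char) == 1
        && sizeof(signed char) == 1
        && sizeof(unsigned char) == 1;
}
```

Note that `CHAR_KIND` needs a variable or a cast expression as its argument: a character constant such as `'a'` has type int in C, so it would not match any of the three associations.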
The unsigned char type
An unsigned char uses all of its bits to represent non-negative integer values. On a system where one byte is 8 bits (the standard on virtually all modern systems), the range is calculated as follows:
- **Total values:** $2^8 = 256$
- Range: 0 to $2^8 - 1$, which is 0 to 255.
This type is often used for representing raw byte-oriented data, such as image pixel values or network packets, where a negative value is not needed.
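The same calculation can be written against `<limits.h>` so that it does not hard-code 8 bits. This is a sketch under the assumption that unsigned long is wider than one byte; the function names are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */

/* Assumption: unsigned long is wider than a byte, so the shifts below
   stay within range. */
static_assert(sizeof(unsigned long) > 1, "unsigned long must exceed one byte");

/* Number of distinct values an unsigned char can hold: 2^CHAR_BIT. */
unsigned long unsigned_char_value_count(void)
{
    return 1UL << CHAR_BIT;
}

/* Largest unsigned char value derived from the bit width: 2^CHAR_BIT - 1.
   unsigned char has no padding bits, so this matches UCHAR_MAX. */
unsigned long unsigned_char_max_from_bits(void)
{
    return (1UL << CHAR_BIT) - 1UL;
}

/* Adding 1 to the maximum wraps around to 0, because conversion to an
   unsigned type is performed modulo 2^CHAR_BIT. */
unsigned char wrap_increment(unsigned char v)
{
    return (unsigned char)(v + 1u);
}
```

On an 8-bit-byte system, `unsigned_char_value_count()` yields 256 and `unsigned_char_max_from_bits()` yields 255, agreeing with `UCHAR_MAX`.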
The signed char type
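A signed char uses its most significant bit as a sign bit, so half of its bit patterns represent negative values. In two's complement, which virtually all modern systems use, that bit is not a separate flag attached to a magnitude; it contributes a weight of -128 while the remaining seven bits contribute their usual positive weights. In a standard 8-bit, two's complement system, the range is: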
- **Total values:** $2^8 = 256$
- Range: -128 to 127.
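The 256 bit patterns split evenly: 128 of them represent negative values (-128 to -1) and 128 represent non-negative values (0 to 127). Because zero occupies one of the non-negative slots, the largest positive value is 127 while the most negative value is -128.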
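To make the 8-bit two's complement interpretation concrete, the sketch below decodes a bit pattern by giving bit 7 a weight of -128 and bits 0 through 6 their usual positive weights. `twos_complement_8bit` is a hypothetical helper, and the code assumes the 8-bit case discussed here rather than querying the platform.

```c
#include <assert.h>

/* Interprets an 8-bit pattern (0x00..0xFF) as a two's complement number.
   Bit 7 contributes -128; bits 0..6 contribute 0..127. */
int twos_complement_8bit(unsigned bits)
{
    assert(bits <= 0xFFu);                    /* caller passes one byte's worth of bits */
    int low7 = (int)(bits & 0x7Fu);           /* value of the low seven bits: 0..127 */
    int high = (bits & 0x80u) ? -128 : 0;     /* weight of the top bit */
    return high + low7;
}
```

With this decoding, `0x7F` is 127, `0x80` is -128, and `0xFF` is -1, which accounts for the asymmetric range.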
The ambiguity of the plain char type
The range of a plain char is determined by the compiler implementation. This was done for historical reasons, allowing the compiler to use whichever representation was most efficient on the target architecture.
- Signed char behavior: On many common architectures (like x86), a char defaults to being signed. In this case, its range is the same as a signed char: -128 to 127.
- Unsigned char behavior: On other architectures (like ARM), a char may default to being unsigned. Its range would then be the same as an unsigned char: 0 to 255.
This implementation-defined behavior is a major source of non-portability and subtle bugs if a developer makes assumptions about the range of a plain char.
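Code can query which choice the implementation made instead of assuming one. The following sketch relies on `CHAR_MIN` from `<limits.h>`; the function names are hypothetical, and the second function assumes 8-bit bytes and the usual wrap-around result when an out-of-range value is converted to a signed type.

```c
#include <limits.h>   /* CHAR_MIN */

/* Returns 1 if plain char is signed on this implementation, 0 if unsigned.
   CHAR_MIN is 0 exactly when plain char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Illustrates a typical bug: a byte value of 0xFF stored in a plain char
   compares equal to 0xFF only where plain char is unsigned. */
int byte_ff_survives_round_trip(void)
{
    char c = (char)0xFF;   /* implementation-defined result if char is signed */
    int promoted = c;      /* typically -1 when signed, 255 when unsigned */
    return promoted == 0xFF;
}
```

A program that compares a plain char against a constant above 127 will therefore behave differently depending on the platform, which is the kind of subtle bug described above.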
Character representation and the character set
While the char type stores an integer, its primary purpose is to represent a character from a character set, such as ASCII. The integer value corresponds to a specific character's encoding.
- ASCII standard: The original ASCII standard defines 128 characters, with numerical values from 0 to 127. Both signed char and unsigned char represent these values identically.
- Extended ASCII: "Extended" character sets use values from 128 to 255. When dealing with these characters, the signedness of a char becomes critical. An unsigned char can represent all 256 values, while a signed char will interpret values above 127 as negative numbers.
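One place where this matters in practice is `<ctype.h>`: its classification functions require an argument that is representable as an unsigned char (or equal to EOF), so a negative plain char must be converted first. The sketch below shows the usual cast; `count_upper` and `byte_value` are hypothetical helpers, and the example assumes 8-bit bytes.

```c
#include <ctype.h>    /* isupper */
#include <stddef.h>   /* size_t */

/* Counts uppercase letters in a string without passing negative values
   to isupper, even when the string contains bytes above 127. */
size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* The cast maps a possibly negative char to the range 0..UCHAR_MAX. */
        if (isupper((unsigned char)*s))
            ++n;
    }
    return n;
}

/* Returns the byte's value as a non-negative int, regardless of whether
   plain char is signed on this platform. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

For example, `byte_value` applied to a char holding the byte `0xE9` yields 233 on both signed-char and unsigned-char platforms, whereas reading the char directly would yield -23 where plain char is signed.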
Portability and safety considerations
Because of the implementation-defined nature of char, it is considered a best practice for writing portable code to always specify the signedness when performing arithmetic operations or when the numeric range is important.
- For numeric data: If you intend to use a single-byte integer, explicitly declare it as either signed char or unsigned char to ensure consistent behavior across different systems.
- For byte-level operations: When dealing with raw, binary data where the bit pattern is the priority and not a signed number, unsigned char is the correct and safer choice.
- For text data: When dealing with standard ASCII characters (0-127), a plain char is generally safe. However, when working with Unicode or extended character sets, more advanced, multi-byte character types like wchar_t or language-specific string types should be used.
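As an illustration of the byte-level guideline, here is a minimal sketch that treats a buffer as unsigned bytes. `checksum8` and `high_nibble` are hypothetical helpers, not part of any standard library, and the masks assume 8-bit bytes.

```c
#include <stddef.h>   /* size_t */

/* Sums the bytes of a buffer and keeps the low 8 bits of the total.
   Reading through unsigned char means each byte contributes 0..255. */
unsigned char checksum8(const void *data, size_t len)
{
    const unsigned char *p = data;   /* view raw memory as unsigned bytes */
    unsigned sum = 0;                /* unsigned arithmetic wraps, never overflows */
    for (size_t i = 0; i < len; ++i)
        sum += p[i];
    return (unsigned char)(sum & 0xFFu);
}

/* Extracts the upper four bits of a byte. With an unsigned operand the
   shift cannot drag copies of a sign bit into the result. */
unsigned high_nibble(unsigned char b)
{
    return (b >> 4) & 0x0Fu;
}
```

Had the buffer been read through a plain or signed char pointer, bytes above 127 could contribute negative values to the sum on some platforms, so the same input would not be guaranteed to produce the same checksum everywhere.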
Conclusion: A fundamental but complex type
The char variable, while seemingly simple, highlights a fundamental complexity in low-level programming: the abstraction between a character and its underlying integer representation. Its range is not a single, fixed value, but rather one of two possibilities that are left to the implementation. By understanding the distinction between char, signed char, and unsigned char, developers can write robust, predictable, and portable code that avoids the pitfalls of implementation-defined behavior.