C Character Set



 

Backstory

 

In C, a character is a 1-byte value that represents a letter, a digit, or a special character such as ! or @. It seems simple, but it has a long history of competing standards such as EBCDIC and ASCII. Read on...

 

In the early days, IBM developed an encoding system called Extended Binary Coded Decimal Interchange Code (EBCDIC). EBCDIC is an 8-bit code, so it can represent up to 256 different characters. A few important features of EBCDIC are:

 

  • Each character fits in 8 bits.
  • Characters of the same type are not grouped together; for example, the letters are split across non-contiguous ranges of codes.
  • Different versions of EBCDIC are not compatible with each other.

 

ASCII followed in 1963, published by the American Standards Association (ASA). ASCII was simpler and accommodated fewer characters than EBCDIC: it defines 128 characters and needs 7 bits to represent a single character.

 

Another Conflict

Most computers were using 8-bit bytes, while ASCII requires only 7 bits (i.e., 2^7 = 128 characters), which left one extra bit to spare. Soon, several organizations developed their own conventions for the character codes [128, 255]. IBM developed the OEM character set, which included peculiar characters like |, Ã, Æ etc. IBM then varied the codes [128, 255] from country to country. For example, character code 130 displayed é in Europe, but the Hebrew letter gimel (ג) in Israel. If this appears to be a small issue, wait until Asian languages come into the picture with thousands of characters! In these difficult times, a standard was slowly making its way...

 

Unicode Era

Instead of mapping a character directly to a bit pattern, Unicode assigns each character an abstract number called a code point; how a code point is stored in memory is decided separately by an encoding. This separation lets Unicode accommodate far more characters than any single-byte scheme: its code space has room for 1,114,112 code points. This article doesn't discuss the implementations of Unicode, but here are the key points to note:

 

  • Unicode is just a standard. UTF-8, UTF-16, etc. are the actual encodings.
  • Popular myth: Unicode text takes 2 bytes (16 bits) per character, so at most 2^16 (65,536) characters can be represented. This is false. In UTF-8, some characters are stored in 1 byte, some in 2 or 3 bytes, and some require 4 bytes. (The original UTF-8 design allowed sequences of up to 6 bytes; the current standard limits them to 4.)
  • Representing a character is not as simple as writing its code in binary: UTF-8 spreads the bits of a code point over a variable number of bytes.
  • UTF-8 is a superset of ASCII, i.e., characters with ASCII codes [0, 127] are represented by a single byte with the same value.

 

Introduction of C Character Set

The C language has two main character sets.

 

  • Source Character Set: This is the set of characters that can be used to write source code. In the first translation step, before preprocessing begins, the characters of the source file are mapped to the Source Character Set (SCS). E.g.: A, B, Tab, Space, \n, etc.

 

  • Execution Character Set: This is the set of characters that can be interpreted by the running program. After the preprocessing phase, the characters in character constants and string literals are converted to the Execution Character Set (ECS). E.g.: A, B, \a, etc.

 

Basic Character Set

The Source and Execution Character Sets share a common subset of characters, called the Basic Character Set. Its members are discussed below:

 

Alphabets: Both uppercase and lowercase letters. The ASCII codes of uppercase letters are in the range [65, 90], whereas the ASCII codes of lowercase letters are in the range [97, 122]. E.g.: A, B, a, b, etc.

  • In ASCII, an uppercase letter and its lowercase counterpart differ by just one bit (the bit with value 32, i.e., 0x20).
  • Utility functions (declared in <ctype.h>): isalpha, islower, and isupper check whether a character is a letter, a lowercase letter, or an uppercase letter, respectively. tolower and toupper convert a letter to the corresponding case.

 

Digits: The digits from 0 to 9 (inclusive). The ASCII codes of digits are in the range [48, 57]. E.g.: 0, 1, 2, etc.

Utility functions: isdigit checks whether a character is a digit. isalnum checks whether a character is alphanumeric, i.e., a letter or a digit.

Punctuation/Special Characters: The default C locale classifies the following 32 characters as punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~. Their ASCII codes fall in the ranges [33, 47], [58, 64], [91, 96], and [123, 126].

 

Utility functions: ispunct checks whether a character is a punctuation character.

R4R.co.in Team
The content on R4R is created by expert teams.