Sunday 29 June 2014

Behavior of built-in functions in C/C++

GCC normally generates special code to handle certain built-in functions more efficiently; for instance, calls to "alloca" may become single instructions that adjust the stack directly, and calls to "memcpy" may become inline copy loops. The resulting code is often both smaller and faster, but since the function calls no longer appear as such, you cannot set a breakpoint on those calls, nor can you change the behavior of the functions by linking with a different library.
--- GCC doc
As the passage above says, GCC may replace a call to a built-in function with different code, so the same source-level call can end up referring to a different library function. Two examples follow.

Ex:1
#include<stdio.h>

int main()
{
        int a=10;
        printf("Value of a:%d\n",a);
}
If you look at the object file for the above code (the listing below is in the format printed by nm), it contains two symbols:
00000000 T main
         U printf
 

Now consider the following code.
Ex:2
#include<stdio.h>

int main()
{
        printf("Hello World!\n");
}

The object file for this code also contains two symbols:
00000000 T main
         U puts

Both examples call printf in the source, yet the object code references two different functions, printf and puts.
GCC keeps the call to printf when the format string contains a format specifier such as %d. It can replace the call with puts when the format string is a plain string with no format specifiers that ends in a newline, and the return value of printf is not used. The trailing newline matters because puts appends one itself; without it, the call stays a printf.

GCC also provides an option to disable this behavior if required.

If Ex:2 is compiled with the -fno-builtin option, the object file contains these symbols:
00000000 T main
         U printf

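With -fno-builtin the compiler stops treating any function as a built-in. But what if built-in optimization is still wanted for just one function? (For the opposite case, disabling a single built-in, GCC accepts -fno-builtin-function, for example -fno-builtin-printf.)
Here is how to do it:
  • Define a macro with the same name as the function
  • Make the macro expand to the function name prefixed with __builtin_
Example: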

#include<stdio.h>
#define printf __builtin_printf
int main()
{
        int a=10;
        printf("Hello World!\n");
        printf("Value of a:%d\n",a);
}
This way the compiler always uses the built-in version of printf, even when the code is compiled with -fno-builtin.


Thursday 5 June 2014

A few points to keep in mind while using the printf function

The printf function lets a C/C++ programmer write a formatted string to the standard output (stdout).
C++ introduced stream classes such as ostream, iostream and fstream for writing output to different channels, but many programmers still prefer printf in places because they find its interface for formatted output easier (this varies from programmer to programmer). It does have some problems, though, when the code is not compiled with -Wformat or -Wall. They are described below.

The declaration of printf is int printf(const char* format,...);
Here format is a constant string that may contain format specifiers. For each format specifier in format, printf expects a corresponding argument to print.

Per this declaration, the first argument of printf must be a const char*, and the remaining arguments can be anything. Without format warnings enabled, the compiler type-checks only the first argument and does not check whether the remaining ones match the format string. A mismatch between a format specifier and its argument is undefined behavior, so the outputs shown below are what one particular build happened to print, not guaranteed results. Consider the following examples, which can cause problems at run time.

* Problem when you forget to pass the value for a format specifier

int a=10,b=20;

printf("value of a = %d, value of b = %d",a); // b is not passed, yet the compiler accepts the call

Output:
value of a = 10, value of b = -1074196968

Here the value of b was expected to be 20, but a garbage value is printed because b was never passed, and the compiler does not report an error for the mistake.

* Problem when you use the wrong format specifier

int a=16450,b=1651;

printf("value of a = %c, value of b = %d",a,b); // a is formatted with the %c format specifier


Output:
value of a = B, value of b = 1651

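Here printf takes the lowest byte of a, which is 66 (16450 is 0x4042 and the low byte 0x42 is 66), and prints the ASCII character B. Unlike the other cases in this post, this behavior is well defined: %c takes an int argument and converts it to unsigned char before printing. The result is just not what was intended.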

* Problem when you pass a data type wider than 4 bytes and print it with the wrong format specifier


long long int x=0x4100000042;

int a=10,b=20;

printf("value of x=%d, value of a = %d, value of b = %d",x,a,b);


Output:
value of x=66, value of a = 65, value of b = 10

x is a long long int but is printed with the %d format specifier, so only its lower 4 bytes are printed. That much might be expected, but a and b should have printed as 10 and 20, and instead 65 and 10 were printed.
Where is the problem?

All the arguments passed to printf are pushed onto the stack, and printf reads them back through a va_list. (This explanation assumes a platform that passes variadic arguments on the stack and has a 4-byte int, such as 32-bit x86; where arguments travel in registers the result can differ.) Suppose an 8-byte long long int is passed to printf but the format specifier is %d, which prints a 4-byte int. printf then takes only 4 bytes from the stack and prints them.
In the above example the values are pushed as

MSB <<   [0x00000014 | 0x0000000a | 0x00000041 | 0x00000042]  <<   LSB
          <--- b ---> <--- a --->   <---------- x ---------->


  • For "x=%d" it takes the first 4 bytes (from the LSB end) because the format specifier is "%d". Those 4 bytes are 0x00000042 = 66, so it prints "value of x=66".
  • For "a = %d" it takes the next 4 bytes because the format specifier is "%d". Those 4 bytes are 0x00000041 = 65, so it prints "value of a = 65".
  • For "b = %d" it takes the next 4 bytes because the format specifier is "%d". Those 4 bytes are 0x0000000a = 10, so it prints "value of b = 10".

No format specifiers remain at this point, so printing stops, but one value on the stack has still not been printed. If one more format specifier is added, that remaining value is printed as well:

long long int x=0x4100000042;

int a=10,b=20;

printf("value of x=%d, value of a = %d, value of b = %d, extra data = %d",x,a,b);


Output:
value of x=66, value of a = 65, value of b = 10, extra data = 20
NOTE: In this example no argument corresponds to "extra data = %d"; the 20 that appears there is the value of b.

Another example:
(some compilers reject this example)

#include<cstdio>

class C
{
private:
        int x,y;
public:
        C(int a, int b):x(a),y(b)
        {
        }
};

int main()
{
        C obj(4,7);
        printf("x : %d, y : %d\n",obj); // note: two format specifiers, but only one argument is passed
}

Output:
x : 4, y : 7
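NOTE: Even though x and y are private members of class C, printf prints their values. Access control applies to the names obj.x and obj.y, and this code never uses those names. It passes the whole object by value, so the bytes of x and y are copied into the argument list, where the two %d specifiers read them. This is still a mismatch between the format string and the argument, so the output is not guaranteed.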