Robert Clair. Published by Addison-Wesley Professional. ISBN-10: 0-321-71138-6. ISBN-13: 978-0-321-71138-0.
Objective-C is an extension of C. Most of this book concentrates on what Objective-C adds to C, but in order to program in Objective-C, you have to know the basics of C. When you do such mundane things as add two numbers together, put a comment in your code, or use an if statement, you do them in exactly the same way in C and Objective-C. The non-object part of Objective-C isn't merely similar to C, or C-like; it is C. Objective-C 2.0 is currently based on the C99 standard for C.
This chapter begins a two-chapter review of C. The review isn't a complete description of C; it covers only the basic parts of the language. Topics like bit operators, the details of type conversion, Unicode characters, macros with arguments, and other arcana are not mentioned. It is intended as an aide-memoire for those whose knowledge of C is rusty, or as a quick reference for those who are adept at picking up a new language from context. The following chapter continues the review of C, and treats the topics of declaring variables, variable scope, and where in memory C puts variables. If you are an expert C/C++ programmer, you can probably skip this chapter. (However, a review never hurts. I learned some things in the course of writing the chapter.) If you are coming to Objective-C from a different C-like language, such as Java or C#, you should probably at least skim the material. If your only programming experience is with a scripting language, or if you are a complete beginner, you will probably find it helpful to read a book on C in parallel with this book.
Note - I recommend that everyone read Chapter 2, "More About C Variables." In my experience, many who should be familiar with the material it contains are not familiar with that material.
There are many books on C. The original Kernighan and Ritchie book, The C Programming Language, is still one of the best.1 It is the book most people use to learn C. For a language lawyer's view of C, or to explore some of the darker corners of the language, consult C: A Reference Manual by Harbison and Steele.2
Think for a moment about how you might go about learning a new natural language. The first thing to do is look at how the language is written: Which alphabet does it use (if it uses an alphabet at all; some languages use pictographs)? Does it read left to right, right to left, or top to bottom? Then you start learning some words. You need at least a small vocabulary to get started. As you build your vocabulary, you can start making the words into phrases, and then start combining your phrases into complete sentences. Finally, you can combine your sentences into complete paragraphs.
This review of C follows roughly the same progression. The first section looks at the structure of a C program, how C code is formatted, and the rules and conventions for naming various entities. The following sections cover variables and operators, which are roughly analogous to nouns and verbs in a natural language, and how they are combined into longer expressions and statements. The last major section covers control statements. Control statements allow a program to do more interesting things than execute statements in a linear sequence. The final section of the review covers the C preprocessor, which allows you to do some programmatic editing of source files before they are sent to the compiler, and the printf function, which is used for character output.
The Structure of a C Program
This chapter begins by looking at some structural aspects of a C program: the main routine, formatting issues, comments, names and naming conventions, and file types.
main Routine
All C programs have a main routine. After the OS loads a C program, the program begins executing with the first line of code in the main routine. The standard form of the main routine is as follows:
int main(int argc, const char * argv[])
{
return 0;
}
The key features are:
- The leading int on the first line indicates that main returns an integer value to the OS as a return code.
- The name main is required.
- The rest of the first line refers to command line arguments passed to the program from the OS. main receives argc number of arguments, stored as strings in the array argv. This part isn't important for the moment; just ignore it.
- All the executable code goes between a pair of curly brackets.
- The return 0; line indicates that a zero is passed back to the OS as a return code. In Unix systems (including Mac OS X and iOS), a return code of zero indicates "no error" and any other value means an error of some sort.
If you are not interested in processing command line arguments or returning an error code to the OS (for example, when doing the exercises in the next several chapters), you can use a simplified form of main:
int main( void )
{
}
The void indicates that this version of main takes no arguments. In the absence of an explicit return statement, main returns zero when execution reaches its closing bracket. (This rule, introduced in C99, applies only to main.)
Formatting
Statements are terminated by a semicolon. A whitespace character (blank, tab, or newline) is required to separate names and keywords. C ignores any additional whitespace: indenting and extra spaces have no effect on the compiled executable; they may be used freely to make your code more readable. A statement can extend over multiple lines; the following three statements are equivalent:
distance = rate*time;
distance = rate * time;
distance =
rate *
time;
Comments
Comments are notations for the programmer's edification. The compiler ignores them.
C supports two styles of comments:
- Anything following two forward slashes (//) and before the end of the line is a comment.
- Anything enclosed between /* and */ is also a comment. This type of comment may extend over multiple lines, and it can be used to temporarily "comment out" blocks of code during the development process.
This style of comment cannot be nested: the first */ ends the comment no matter how many /* precede it, so any text between that point and a second */ is treated as code, and the compiler reports an error. However, a // comment inside a /* */ comment is legal.
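The following function is a hypothetical illustration, not part of the original text, that gathers the legal comment styles in one place. Because the commented-out assignments are never compiled, the function returns the value from the one assignment the compiler actually sees.

```c
/* Hypothetical helper used only to show the comment styles. */
int CommentDemo(void)
{
    int x = 1;   // A two-slash comment runs to the end of the line.

    /* A slash-star comment can sit on a single line. */

    /*
       It can also span several lines, which makes it handy
       for "commenting out" code:
       x = 100;
    */

    /*
       // A two-slash comment inside a slash-star comment is legal.
       x = 200;
    */

    return x;    // Only the first assignment was compiled, so x is still 1.
}
```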
Variable and Function Names
Variable and function names in C consist of letters, numbers, and the underscore character ( _ ):
- The first character must be an underscore or a letter.
- Names are case sensitive: bandersnatch and Bandersnatch are different names.
- There cannot be any whitespace in the middle of a name.
Here are some legal names:
j
taxesForYear2010
bananas_per_bunch
bananasPerBunch
These names are not legal:
2010YearTaxes
rock&roll
bananas per bunch
Naming Conventions
As a kindness to yourself and anyone else who might have to read your code, you should use descriptive names for variables and functions. bpb is easy to type, but it might leave you pondering when you return to it a year later, whereas bananas_per_bunch is self-explanatory.
Many plain C programs use the convention of separating the words in long variable and function names with underscores:
apples_per_basket
Objective-C programmers usually use CamelCase names for variables. CamelCase names use capital letters to mark the beginnings of subsequent words in a name:
applesPerBasket
Names beginning with an underscore are traditionally used for variables and functions that are meant to be private, or for internal use:
_privateVariable
_leaveMeAlone
However, this is a convention; C has no enforcement mechanism to keep variables or functions private.
Files
The code for a plain C program is placed in one or more files that have a .c filename extension:
ACProgram.c
Note - By default, Mac OS X filenames are not case sensitive. The file system remembers the case you used to name a file, but it treats myfile.c, MYFILE.c, and MyFile.c as the same filename.
Code that uses the Objective-C objects (the material covered starting in Chapter 3, "An Introduction to Object-Oriented Programming") is placed in one or more files that have a .m filename extension:
AnObjectiveCProgram.m
Note - Because C is a proper subset of Objective-C, it's OK to put a plain C program in a .m file.
There are some naming conventions for files that define and implement Objective-C classes (discussed in Chapter 3), but C does not have any formal rules for the part of the name preceding the filename extension. It is silly, but not illegal, to name the file containing the code for an accounting program:
MyFlightToRio.m
C programs also use header files. Header files usually contain various definitions that are shared by many .c and .m files. Their contents are merged into other files by using a #include or #import preprocessor directive. (See Preprocessor, later in this chapter.) Header files have a .h filename extension as shown here:
AHeaderFile.h
Note - The topic is beyond the scope of this book, but it is possible to mix Objective-C and C++ code in the same program. The result is called Objective-C++. Objective-C++ code must be placed in a file with a .mm filename extension:
AnObjectiveCPlusPlusProgram.mm
For more information, see Using C++ With Objective-C.
Variables
A variable is a name for some bytes of memory in a program. When you assign a value to a variable, what you are really doing is storing that value in those bytes. Variables in a computer language are like the nouns in a natural language. They represent items or quantities in the problem space of your program.
C requires that you tell the compiler about any variables that you are going to use by declaring them. A variable declaration has the form:
variabletype name;
C allows multiple variables in a single declaration:
variabletype name1, name2, name3;
A variable declaration causes the compiler to reserve storage (memory) for that variable. The value of a variable is the contents of its memory location. The next chapter describes variable declarations in more detail. It covers where variable declarations are placed, where the variables are created in memory, and the lifetimes of different classes of variables.
Integer Types
C provides the following types to hold integers: char, short, int, long, and long long. Table 1.1 shows the size in bytes of the integer types on 32- and 64-bit Mac OS X executables. (32-bit and 64-bit executables are discussed in Appendix C, "32-Bit and 64-Bit.")
The char type is named char because it was originally intended to hold characters; but it is frequently used as an 8-bit integer type.
Table 1.1 The Sizes of Integer Types
Type | 32-bit | 64-bit |
char | 1 byte | 1 byte |
short | 2 bytes | 2 bytes |
int | 4 bytes | 4 bytes |
long | 4 bytes | 8 bytes |
long long | 8 bytes | 8 bytes |
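To check these sizes on your own machine, you can use the sizeof operator, which yields the size in bytes of a type. The function name PrintIntegerSizes is made up for this sketch, and the numbers it prints depend on the platform and on whether you build a 32- or 64-bit executable.

```c
#include <stdio.h>

/* Hypothetical helper: prints the size in bytes of each integer type
   for the executable being compiled. */
void PrintIntegerSizes(void)
{
    printf("char:      %zu\n", sizeof(char));
    printf("short:     %zu\n", sizeof(short));
    printf("int:       %zu\n", sizeof(int));
    printf("long:      %zu\n", sizeof(long));
    printf("long long: %zu\n", sizeof(long long));
}
```

C itself guarantees only that sizeof(char) is 1 and that each type in the list can hold at least the range of values of the one before it. The exact numbers in Table 1.1 are those of Mac OS X.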
An integer type can be declared to be unsigned:
unsigned char a;
unsigned short b;
unsigned int c;
unsigned long d;
unsigned long long e;
When used alone, unsigned is taken to mean unsigned int:
unsigned a;
An unsigned variable's bit pattern is always interpreted as a non-negative number. If you assign a negative quantity to an unsigned variable, C converts it by wrapping it around, and the result is a very large positive number. This is almost always a mistake.
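A minimal sketch of the problem, using a hypothetical function NegativeOneAsUnsigned: C converts the negative value modulo one more than the largest unsigned int, so -1 becomes UINT_MAX (4294967295 when int is 4 bytes).

```c
#include <limits.h>

/* Hypothetical helper: shows what happens when -1 is stored in an
   unsigned int. */
unsigned int NegativeOneAsUnsigned(void)
{
    unsigned int u = -1;   // wraps around to the largest unsigned int
    return u;
}
```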
Floating-Point Types
C's floating-point types are float, double, and long double. The sizes of the floating-point types are the same in both 32- and 64-bit executables:
float aFloat;
double aDouble;
long double aLongDouble;
Floating-point values are always signed.
Truth Values
Ordinary expressions are commonly used for truth values. Expressions that evaluate to zero are considered false, and expressions that evaluate to non-zero are considered true (see the following sidebar).
_Bool, bool, and BOOL - Early versions of C did not have a defined Boolean type. Ordinary expressions were (and still are) used for Boolean values (truth values). As noted in the text, an expression that evaluates to zero is considered false and one that evaluates to non-zero is considered true. A majority of C code is still written this way.
C99, the current standard for C, introduced a _Bool type. _Bool is an integer type with only two allowed values, 0 and 1. Assigning any non-zero value to a _Bool results in 1:
_Bool b = 35;   // b is now 1
If you include the file stdbool.h in your source code files, you can use bool as an alias for _Bool and the Boolean constants true and false. (true and false are just defined as 1 and 0, respectively.)
#include <stdbool.h>
bool b = true;
You will rarely see either _Bool or bool in Objective-C code, because Objective-C defines its own Boolean type, BOOL. BOOL is covered in Chapter 3.
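The sketch below, with the made-up helper names IsTrue and ToBool, shows both conventions side by side: an ordinary int used as a truth value, and the C99 _Bool type collapsing every non-zero value to 1. It uses an if statement, which is covered later in this chapter.

```c
#include <stdbool.h>

/* Hypothetical helper: returns 1 if value would be treated as true
   by an if statement, and 0 otherwise. */
int IsTrue(int value)
{
    if (value)       // any non-zero value counts as true
        return 1;
    return 0;
}

/* Hypothetical helper: converting to _Bool turns every non-zero
   value into 1 and leaves zero as 0. */
_Bool ToBool(int value)
{
    _Bool b = value;
    return b;
}
```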
Initialization
Variables can be initialized when they are declared:
int a = 9;
int b = 2*4;
float c = 3.14159;
char d = 'a';
A character enclosed in single quotes is a character constant. It is numerically equal to the encoding value of the character. Here, the variable d has the numeric value of 97, which is the ASCII value of the character a.
Pointers
A pointer is a variable whose value is a memory address. It "points" to a location in memory.
You declare a variable to be a pointer by preceding the variable name with an * in the declaration. The following code declares pointerVar to be a variable pointing to a location in memory that holds an integer:
int *pointerVar;
The unary & operator ("address of" operator) is used to get the address of a variable so it can be stored in a pointer variable. The following code sets the value of the pointer variable b to be the address of the integer variable a:
1 int a = 9;
2
3 int *b;
4
5 b = &a;
Now let's take a look at that example line by line:
- Line 1 declares a to be an int variable. The compiler reserves four bytes of storage for a and initializes them with a value of 9.
- Line 3 declares b to be a pointer to an int.
- Line 5 uses the & operator to get the address of a and then assigns a's address as the value of b.
Figure 1.1 illustrates the process. (Assume that the compiler has located a beginning at memory address 1048880.) The arrow in the figure shows the concept of pointing.
Figure 1.1 Pointer variables
The unary * operator (called the "contents of" or "dereferencing" operator) is used to set or retrieve the contents of a memory location by using a pointer variable that points to that location. One way to think of this is to consider the expression *pointerVar to be an alias, another name, for whatever memory location is stored in the contents of pointerVar. The expression *pointerVar can be used to either set or retrieve the contents of that memory location. In the following code, b is set to the address of a, so *b becomes an alias for a:
int a;
int c;
int *b;
a = 9;
b = &a;
c = *b;    // c is now 9
*b = 10;   // a is now 10
Pointers are used in C to reference dynamically allocated memory (Chapter 2). Pointers are also used to avoid copying large chunks of memory, such as arrays and structures (discussed later in this chapter), from one part of a program to another. For example, instead of passing a large structure to a function, you pass the function a pointer to the structure. The function then uses the pointer to access the structure. As you will see later in the book, Objective-C objects are always referenced by pointer.
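Here is a small sketch of why passing a pointer is useful. The hypothetical function SwapInts receives the addresses of two variables, so it can change the caller's variables through the * operator; if it received the values instead, it would only change its own copies.

```c
/* Hypothetical helper: exchanges the values of two int variables
   by working through pointers to them. */
void SwapInts(int *first, int *second)
{
    int temp = *first;   // retrieve the value first points to
    *first = *second;    // store through first
    *second = temp;      // store through second
}
```

Usage: after int a = 9, b = 10; SwapInts(&a, &b);, a is 10 and b is 9.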
Generic Pointers
A variable declared as a pointer to void is a generic pointer:
void *genericPointer;
A generic pointer may be set to the address of any variable type:
int a = 9;
void *genericPointer;
genericPointer = &a;
However, trying to obtain a value from a generic pointer is an error because the compiler has no way of knowing how to interpret the bytes at the address indicated by the generic pointer:
int a = 9;
int b;
void *genericPointer;
genericPointer = &a;
b = *genericPointer;   // WRONG - won't compile
To obtain a value through a void* pointer, you must cast it to a pointer to a known type:
int a = 9;
int b;
void *genericPointer;
genericPointer = &a;
b = *((int*) genericPointer);   // OK, b is now 9
The cast operator (int*) forces the compiler to consider genericPointer to be a pointer to an integer. (See Conversion and Casting, later in the chapter.)
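In practice, a void* often arrives as a function argument. This sketch assumes a hypothetical function, ReadInt, whose caller guarantees that the pointer really points to an int; the cast is only correct under that assumption.

```c
/* Hypothetical helper: the caller promises that p points to an int. */
int ReadInt(void *p)
{
    int *intPointer = (int *) p;   // cast to a pointer to a known type...
    return *intPointer;            // ...then dereference it
}
```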
C does not check to see that a pointer variable points to a valid area of memory. Incorrect use of pointers has probably caused more crashes than any other aspect of C programming.
Arrays
C arrays are declared by adding the number of elements in the array, enclosed in square brackets ([]), to the declaration, after the type and array name:
int a[100];
Individual elements of the array are accessed by placing the index of the element in [] after the array name:
a[6] = 9;
The index is zero-based. In the previous example, the legitimate indices run from 0 to 99. Access to C arrays is not bounds checked on either end. C will blithely let you do the following:
int a[100];
a[200] = 25;    // Out of bounds!
a[-100] = 30;   // Out of bounds!
Using an index outside of the array's bounds lets you trash memory belonging to other variables, resulting in either crashes or corrupted data. Taking advantage of the lack of checking is one of the pillars of mischievous malware.
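Because C won't check indices for you, careful code checks them itself. The helper SetElementChecked is a hypothetical sketch of one way to do that. The element count has to be passed along explicitly, since a pointer to the start of an array doesn't carry the array's length with it.

```c
#include <stddef.h>

/* Hypothetical helper: stores value in array[index] only if index is
   within bounds. Returns 1 on success, 0 if the index is out of range. */
int SetElementChecked(int *array, size_t count, size_t index, int value)
{
    if (index >= count)
        return 0;           // refuse to write outside the array
    array[index] = value;
    return 1;
}
```

Because index is a size_t, which is unsigned, a negative index such as -100 converts to a very large value and is rejected as well.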
The bracket notation is just a nicer syntax for pointer arithmetic. In most expressions, the name of an array, without the array brackets, is converted to a pointer to the beginning of the array. (The array name is not itself a pointer variable; you can't assign a new address to it.) These two lines are completely equivalent:
a[6] = 9;
*(a + 6) = 9;
When compiling an expression that uses pointer arithmetic, the compiler takes into account the size of the type the pointer points to. If a is an array of int, the expression *(a + 2) refers to the contents of the four bytes (one int's worth) of memory at an address eight bytes (two ints) beyond the beginning of the array a. However, if a is an array of char, the expression *(a + 2) refers to the contents of one byte (one char's worth) of memory at an address two bytes (two chars) beyond the beginning of the array a.
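The following sketch (all function names are hypothetical) confirms both points: a[6] and *(a + 6) name the same element, and adding 2 to a pointer moves it by two elements, which is 2 * sizeof(int) bytes for an int array but only 2 bytes for a char array. Casting to char* makes the subtraction count bytes instead of elements.

```c
#include <stddef.h>

int SixthViaPointer(void)
{
    int a[10] = {0};
    a[6] = 9;            // bracket notation stores the value
    return *(a + 6);     // pointer notation reads the same element back
}

ptrdiff_t IntStepInBytes(void)
{
    int a[10];
    return (char *)(a + 2) - (char *)a;   // 2 * sizeof(int) bytes
}

ptrdiff_t CharStepInBytes(void)
{
    char a[10];
    return (a + 2) - a;                   // 2 bytes, since sizeof(char) is 1
}
```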
Multidimensional Arrays
Multidimensional arrays are declared as follows:
int b[4][10];
Multidimensional arrays are stored linearly in memory by rows. Here, b[0][0] is the first element, b[0][1] is the second element, and b[1][0] is the eleventh element.
Using pointer notation:
b[i][j]
may be written as:
*(*(b + i) + j)
Here b + i points to row i (an array of 10 ints), so *(b + i) + j points to element j of that row.
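The hypothetical function RowMajorFormsAgree is a sketch that checks every element of a 4 x 10 array: the pointer form *(*(b + i) + j) must read the same value as b[i][j], and each element must sit (i*10 + j) ints past the start of the array, confirming the row-by-row layout.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if both checks pass for every element. */
int RowMajorFormsAgree(void)
{
    int b[4][10];

    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 10; j++)
            b[i][j] = i * 10 + j;            // a distinct value per element

    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 10; j++) {
            // Pointer notation reads the same element as bracket notation.
            if (*(*(b + i) + j) != b[i][j])
                return 0;
            // Row-major layout: [i][j] is (i*10 + j) ints past the start.
            ptrdiff_t offset = (char *)&b[i][j] - (char *)&b[0][0];
            if (offset != (ptrdiff_t)((i * 10 + j) * sizeof(int)))
                return 0;
        }
    }
    return 1;
}
```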
Strings
A C string is a one-dimensional array of bytes (type char) terminated by a zero byte. A constant C string is coded by placing the characters of the string between double quotes (""):
"A constant string"
When the compiler creates a constant string in memory, it automatically adds the zero byte at the end. But if you declare an array of char that will be used to hold a string, you must remember to include the zero byte when deciding how much space you need. The following line of code copies the five characters of the constant string "Hello" and its terminating zero byte to the array aString:
char aString[6] = "Hello";
As with any other array, arrays representing strings are not bounds checked. Overrunning string buffers used for program input is a favorite trick of hackers.
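A quick sketch of the difference between the space a string occupies and its length, using the sizeof operator and the standard library function strlen (declared in string.h). The helper names are hypothetical.

```c
#include <string.h>

size_t HelloArraySize(void)
{
    char aString[6] = "Hello";
    return sizeof(aString);       // 6: five characters plus the zero byte
}

size_t HelloStringLength(void)
{
    char aString[6] = "Hello";
    return strlen(aString);       // 5: strlen stops at the zero byte
}

int HelloEndsWithZero(void)
{
    char aString[6] = "Hello";
    return aString[5] == '\0';    // the sixth byte is the terminator
}
```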
A variable of type char* can be initialized with a constant string. You can set such a variable to point at a different string, but you can't use it to modify a constant string:
char *aString = "Hello";
aString = "World";
aString[4] = 'q';   // WRONG - crashes at run time
The first line points aString at the constant string "Hello". The second line changes aString to point at the constant string "World". The third line tries to modify a constant string, which C treats as undefined behavior; on Mac OS X it causes a crash, because constant strings are stored in a region of protected read-only memory.
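If you need a string you can change, one option is to initialize an array rather than a pointer: the array receives its own writable copy of the characters. A related habit is declaring a pointer to a constant string as const char *, so that the compiler rejects an attempt to modify the string instead of letting the program crash. The function names in this sketch are hypothetical.

```c
#include <string.h>

/* Returns 1 if the modified copy reads "Hellq". */
int ModifyArrayCopy(void)
{
    char aString[] = "Hello";   // the compiler sizes the array at 6 bytes
    aString[4] = 'q';           // OK: the array is an ordinary, writable copy
    return strcmp(aString, "Hellq") == 0;
}

const char *Greeting(void)
{
    const char *aString = "Hello";
    /* aString[4] = 'q';  would be rejected by the compiler because of const */
    return aString;             // constant strings last for the whole program
}
```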
Structures
A structure groups a collection of related variables so they may be referred to as a single entity. The following is an example of a structure declaration:
struct dailyTemperatures
{
float high;
float low;
int year;
int dayOfYear;
};
The individual variables in a structure are called member variables or just members for short. The name following the keyword struct is a structure tag. A structure tag identifies the structure. It can be used to declare variables typed to the structure:
struct dailyTemperatures today;
struct dailyTemperatures *todayPtr;
In the preceding example, today is a dailyTemperatures structure, whereas todayPtr is a pointer to a dailyTemperatures structure.
The dot operator (.) is used to access individual members of a structure from a structure variable. The pointer operator (->) is used to access structure members from a variable that is a pointer to a structure:
todayPtr = &today;
today.high = 68.0;
todayPtr->high = 68.0;
The last two statements do the same thing.
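As a sketch of the pointer-passing advice from the Pointers section, the hypothetical function TemperatureRange receives a pointer to a dailyTemperatures structure, so the structure isn't copied, and uses -> to read its members.

```c
struct dailyTemperatures
{
    float high;
    float low;
    int year;
    int dayOfYear;
};

/* Hypothetical helper: difference between the day's high and low. */
float TemperatureRange(struct dailyTemperatures *day)
{
    return day->high - day->low;   // -> reaches members through the pointer
}
```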
Structures can have other structures as members. The previous example could have been written like this:
struct hiLow
{
float high;
float low;
};
struct dailyTemperatures
{
struct hiLow tempExtremes;
int year;
int dayOfYear;
};
Setting the high temperature for today would then look like this:
struct dailyTemperatures today;
today.tempExtremes.high = 68.0;
Note - The compiler is free to insert padding into a structure to force structure members to be aligned on a particular boundary in memory. You shouldn't try to access structure members by calculating their offset from the beginning of the structure or do anything else that depends on the structure's binary layout.
typedef
The typedef declaration provides a means for creating aliases for variable types:
typedef float Temperature;
Temperature can now be used to declare variables, just as if it were one of the built-in types:
Temperature high, low;
typedefs just provide alternate names for variable types. Here, high and low are still floats. The term typedef is often used as a verb when talking about C code, as in "Temperature is typedef'd to float."
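typedef is also commonly applied to structures, which saves repeating the struct keyword. The sketch below reuses the hiLow structure from the Structures section with Temperature members; the alias HiLow and the function Midpoint are hypothetical names chosen for the example.

```c
typedef float Temperature;

/* HiLow is now another name for struct hiLow. */
typedef struct hiLow
{
    Temperature high;
    Temperature low;
} HiLow;

/* Hypothetical helper: the temperature halfway between the extremes. */
Temperature Midpoint(HiLow extremes)
{
    return (extremes.high + extremes.low) / 2;
}
```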
Enumeration Constants
An enum statement lets you define a set of integer constants:
enum woodwind { oboe, flute, clarinet, bassoon };
The result of the previous statement is that oboe, flute, clarinet, and bassoon are constants with values of 0, 1, 2, and 3, respectively.
If you don't like going in order from zero, you can assign the values of the constant yourself. Any constant without an assignment has a value one higher than the previous constant:
enum woodwind { oboe=100, flute=150, clarinet, bassoon=200 };
The preceding statement gives oboe, flute, clarinet, and bassoon the values 100, 150, 151, and 200, respectively.
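A minimal sketch confirming those values; WoodwindValue is a hypothetical helper that simply returns the integer behind an enumeration constant.

```c
enum woodwind { oboe=100, flute=150, clarinet, bassoon=200 };

/* Hypothetical helper: enumeration constants are plain integers. */
int WoodwindValue(enum woodwind instrument)
{
    return instrument;
}
```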
The name after the keyword enum is called an enumeration tag. Enumeration tags are optional. Enumeration tags can be used to declare variables:
enum woodwind soloist;
soloist = oboe;
Enumerations are useful for defining multiple constants, and for helping to make your code self-documenting, but they aren't distinct types and they don't receive much support from the compiler. The declaration enum woodwind soloist; shows your intent that soloist should be restricted to one of oboe, flute, clarinet, or bassoon, but unfortunately, the compiler does nothing to enforce the restriction. The compiler considers soloist to be an int and it lets you assign any integer value to soloist without generating a warning:
enum woodwind { oboe, flute, clarinet, bassoon };
enum woodwind soloist;
soloist = 5280;   // No warning from the compiler
Note - Enumeration constants occupy the same name space as variable names. You can't have a variable and enumeration constant with the same name.
Operators
Operators are like verbs. They cause things to happen to your variables.
Arithmetic Operators
C has the usual binary operators +, -, *, / for addition, subtraction, multiplication, and division, respectively.
Note - If both operands to the division operator (/) are integer types, C does integer division. Integer division discards the fractional part of the result (in C99, it truncates toward zero). The value of 7 / 3 is 2.
Remainder Operator
The remainder or modulus operator (%) calculates the remainder from an integer division. The result of the following expression is 1:
int a = 7;
int b = 3;
int c = a%b;
Both operands of the remainder operator must be integer types.
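The sketch below (hypothetical helpers Quotient and Remainder) shows integer division and the remainder operator together. For integer operands with b non-zero, C99 guarantees that (a / b) * b + a % b equals a.

```c
int Quotient(int a, int b)
{
    return a / b;   // integer division: 7 / 3 is 2
}

int Remainder(int a, int b)
{
    return a % b;   // remainder: 7 % 3 is 1
}
```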
Increment and Decrement Operators
C provides operators for incrementing and decrementing variables:
a++;
++a;
Both lines add 1 to the value of a. However, there is a difference between the two expressions if they are used as a part of a larger expression. The prefix version, ++a, increments the value of a before any other evaluation takes place. It is the incremented value that is used in the rest of the expression. The postfix version, a++, increments the value of a after other evaluations take place. The original value is used in the rest of the expression. This is illustrated by the following example:
int a = 9;
int b;
b = a++;
int c = 9;
int d;
d = ++c;
The postfix version of the operator increments the variable after its initial value has been used in the rest of the expression. After the code in this example has executed, the value of b is 9 and the value of a is 10. The prefix version of the operator increments the variable's value before it is used in the rest of the expression. In the example, the values of both c and d are 10.
The decrement operators a-- and --a behave in a similar manner.
Code that depends on the difference between the prefix and postfix versions of the operator is likely to be confusing to anyone but its creator.
Precedence
Is the following expression equal to 18 or 22?
2 * 7 + 4
The answer seems ambiguous because it depends on whether you do the addition first or the multiplication first. C resolves the ambiguity by making a rule that it does multiplication and division before it does addition and subtraction; so the value of the expression is 18. The technical way of saying this is that multiplication and division have higher precedence than addition and subtraction.
If you need to do the addition first, you can specify that by using parentheses:
2 * (7 + 4)
The compiler will respect your request and arrange to do the addition before the multiplication.
Note - C defines a complicated table of precedence for all its operators (see http://en.wikipedia.org/wiki/Order_of_operations). But specifying the exact order of evaluation that you want by using parentheses is much easier than trying to remember operator precedences.
Negation
The unary minus sign (-) changes an arithmetic value to its negative:
int a = 9;
int b;
b = -a;
Comparisons
C also provides operators for comparisons. The value of a comparison is a truth value. The following expressions evaluate to 1 if they are true and 0 if they are false:
a > b
a < b
a >= b
a <= b
a == b
a != b
Note - As in any language, testing floating-point values for exact equality is risky: because of rounding errors, two values that should be mathematically equal may differ slightly, so the comparison may give an unexpected result.
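One common workaround, shown here as a sketch rather than a universal rule, is to treat two values as equal when they differ by less than a tolerance suited to the problem. NearlyEqual is a hypothetical name, and choosing the tolerance is up to you.

```c
/* Hypothetical helper: returns 1 if x and y differ by less than
   tolerance, 0 otherwise. */
int NearlyEqual(double x, double y, double tolerance)
{
    double difference = x - y;
    if (difference < 0)
        difference = -difference;   // absolute value without needing math.h
    return difference < tolerance;
}
```

On typical IEEE 754 hardware, 0.1 + 0.2 == 0.3 evaluates to 0 (false), while NearlyEqual(0.1 + 0.2, 0.3, 1e-9) returns 1.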
Logical Operators
The logical operators for AND and OR have the following form:
expression1 && expression2
expression1 || expression2
C uses short circuit evaluation. Expressions are evaluated from left to right, and evaluation stops as soon as the truth value for the entire expression can be deduced. If expression1 in an AND expression evaluates to false, the value of the entire expression is false and expression2 is not evaluated. Similarly, if expression1 in an OR expression evaluates to true, the entire expression is true and expression2 is not evaluated. Short circuit evaluation has interesting consequences if the second expression has any side effects. In the following example, if b is greater than or equal to a, the function CheckSomething() is not called (if statements are covered later in this chapter):
if (b < a && CheckSomething())
{
...
}
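To see the consequence for side effects, the sketch below gives a hypothetical CheckSomething() a side effect: it counts how many times it runs. The equally hypothetical RunsOfCheck evaluates the if condition from the example and reports that count.

```c
/* Hypothetical counter, used to observe whether CheckSomething() ran. */
static int checkCount = 0;

int CheckSomething(void)
{
    checkCount++;   // side effect: record that the function was called
    return 1;
}

/* Returns how many times CheckSomething() ran while evaluating the test. */
int RunsOfCheck(int a, int b)
{
    checkCount = 0;
    if (b < a && CheckSomething())
    {
        // body omitted
    }
    return checkCount;
}
```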
Logical Negation
The unary exclamation point (!) is the logical negation operator. After the following line of code is executed, a has the value 0 if expression is true (non-zero), and the value 1 if expression is false (zero):
a = ! expression;
Assignment Operators
C provides the basic assignment operator:
a = b;
a is assigned the value of b. Of course, a must be something that is capable of being assigned to. Entities that you can assign to are called lvalues (because they can appear on the left side of the assignment operators). Here are some examples of lvalues:
float a;
float b[100];
float *c;
struct dailyTemperatures today;
struct dailyTemperatures *todayPtr;
c = &a;
todayPtr = &today;
a = 76;
b[0] = 76;
*c = 76;
today.high = 76;
todayPtr->high = 76;
Some things are not lvalues. You can't assign to an array name, the return value of a function, or any expression that does not refer to a memory location:
float a[100];
int x;
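a = 76;                  // WRONG - can't assign to an array name
x*x = 76;                // WRONG - x*x is not an lvalue
GetTodaysHigh() = 76;    // WRONG - can't assign to a function's return value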
Conversion and Casting
If the two sides of an assignment are of different variable types, the type of the right side is converted to the type of the left side. Conversions from shorter types to longer types generally don't present a problem, and neither do conversions from integer types to floating-point types, except that a very large integer may lose precision when converted to float. Going the other way, from a longer type to a shorter type, can cause loss of significant figures, truncation, or complete nonsense. For example:
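int a = 14;
float b;
b = a;    // OK, b is 14.0
float c = 12.5;
int d;
d = c;    // Truncation, d is 12
char e = 128;
int f;
f = e;    // If char is signed (as on Mac OS X), e holds -128, so f is -128
int g = 333;
char h;
h = g;    // Nonsense, h is 77 (333 - 256)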
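A sketch of two of these conversions as functions (the names are hypothetical). It uses unsigned char for the narrowing case because converting an out-of-range value to an unsigned type is well defined (with an 8-bit char the value wraps modulo 256), whereas the result for a signed char is implementation-defined.

```c
/* Hypothetical helper: float to int discards the fractional part. */
int FloatToInt(float value)
{
    int d = value;             // 12.5 becomes 12, -12.5 becomes -12
    return d;
}

/* Hypothetical helper: keeps the value modulo 256 when char is 8 bits. */
unsigned char IntToUnsignedChar(int value)
{
    unsigned char h = value;   // 333 becomes 333 - 256 = 77
    return h;
}
```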
You can force the compiler to convert the value of a variable to a different type by using a cast. In the last line of the following example, the (float) casts force the compiler to convert a and b to float and do a floating-point division:
int a = 6;
int b = 4;
float c, d;
c = a / b;                  // Integer division, c is 1.0
d = (float)a / (float)b;    // Floating-point division, d is 1.5
You can also cast a pointer to one type to a pointer to another type. Casting pointers can be a risky operation with the potential to trash your memory, but it is the only way to dereference a pointer passed to you typed as void*. Successfully casting a pointer requires that you understand what type of entity the pointer is "really" pointing to.
Other Assignment Operators
C also has shorthand operators that combine arithmetic and assignment:
a += b;
a -= b;
a *= b;
a /= b;
These are equivalent to the following, respectively:
a = a + b;
a = a - b;
a = a * b;
a = a / b;
Expressions and Statements
Expressions and statements in C are the equivalent of phrases and sentences in a natural language.
Expressions
The simplest expressions are just single constants or variables:
14
bananasPerBunch
Every expression has a value. The value of an expression that is a constant is just the constant itself: the value of 14 is 14. The value of a variable expression is whatever value the variable is holding: the value of bananasPerBunch is whatever value it was given when it was last set by initialization or assignment.
Expressions can be combined to create other expressions. The following are also expressions:
j + 14
a < b
distance = rate * time
The value of an arithmetic or logical expression is just whatever you would get by doing the arithmetic or logic. The value of an assignment expression is the value given to the variable that is the target of the assignment.
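Because an assignment expression has a value, it can appear inside a larger expression. The hypothetical functions below show a chained assignment and an assignment used as an operand, two cases where both the value and the side effect of the assignment matter.

```c
int ChainedAssignment(void)
{
    int a, b;
    a = b = 5;   // b = 5 has the value 5, which is then assigned to a
    return a + b;
}

int AssignInsideExpression(void)
{
    int rate = 3;
    int time = 4;
    int distance;
    int doubled = 2 * (distance = rate * time);   // distance is 12, doubled is 24
    return doubled;
}
```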
Function calls are also expressions:
SomeFunction()
The value of a function call expression is the return value of the function.
Evaluating Expressions
When the compiler encounters an expression, it creates binary code to evaluate the expression and find its value. For primitive expressions, there is nothing to do: Their values are just what they are. For more complicated expressions, the compiler generates binary code that performs the specified arithmetic, logic, function calls, and assignments.
Evaluating an expression can cause side effects. The most common side effects are the change in the value of a variable due to an assignment, or the execution of the code in a function due to a function call.
Expressions are used for their value in various control constructs to determine the flow of a program (see Program Flow). In other situations, expressions may be evaluated only for the side effects caused by evaluating them. Typically, the point of an assignment expression is that the assignment takes place. In a few situations, both the value and the side effect are important.
Statements
When you add a semicolon (;) to the end of an expression, it becomes a statement. This is similar to adding a period to a phrase to make a sentence in a natural language. A statement is the code equivalent of a complete thought. A statement is finished executing when all of the machine language instructions that result from the compilation of the statement have been executed, and all of the changes to any memory locations the statement affects have been completed.
Compound Statements
You can use a sequence of statements, enclosed by a pair of curly brackets, any place where you can use a single statement:
{
timeDelta = time2 - time1;
distanceDelta = distance2 - distance1;
averageSpeed = distanceDelta / timeDelta;
}
There is no semicolon after the closing bracket. A group like this is called a compound statement or a block. Compound statements are very commonly used with the control statements covered in the next sections of the chapter.
Note - The use of the word block as a synonym for compound statement is pervasive in the C literature and dates back to the beginnings of C. Unfortunately, Apple has adopted the name block for its addition of closures to C (see Chapter 16, "Blocks"). To avoid confusion, the rest of this book uses the slightly more awkward name compound statement.
Program Flow
The statements in a program are executed in sequence, except when directed to do otherwise by a for, while, do-while, if, switch, or goto statement or a function call.
- An if statement conditionally executes code depending on the truth value of an expression.
- The for, while, and do-while statements are used to form loops. In a loop, the same statement or group of statements is executed repeatedly until a condition is met.
- A switch statement chooses a set of statements to execute based on the arithmetic value of an integer expression.
- A goto statement is an unconditional jump to a labeled statement.
- A function call is a jump to the code in the function's body. When the function returns, the program executes from the point after the function call.
These control statements are covered in more detail in the following sections.
Note - As you read the next sections, remember that every place it says statement, you can use a compound statement.
if
An if statement conditionally executes code depending on the truth value of an expression. It has the following form:
if ( expression )
statement
If expression evaluates to true (non-zero), statement is executed; otherwise, execution continues with the next statement after the if statement. An if statement may be extended by adding an else section:
if ( expression )
statement1
else
statement2
If expression evaluates to true (non-zero), statement1 is executed; otherwise, statement2 is executed.
An if statement may also be extended by adding else if sections, as shown here:
if ( expression1 )
statement1
else if ( expression2 )
statement2
else if ( expression3 )
statement3
...
else
statementN
The expressions are evaluated in sequence. When an expression evaluates to non-zero, the corresponding statement is executed and execution continues with the next statement following the if statement. If the expressions are all false, the statement following the else clause is executed. (As with a simple if statement, the else clause is optional and may be omitted.)
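The following minimal sketch shows an if / else if / else chain. The function name signOf is hypothetical. The tests run in order, and only the branch for the first true test executes.

```c
/* Hypothetical helper: returns 1, -1, or 0 depending on the sign of n. */
int signOf( int n )
{
    int result;

    if ( n > 0 )
        result = 1;
    else if ( n < 0 )
        result = -1;
    else
        result = 0;   /* reached only when both tests above are false */

    return result;
}
```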
Conditional Expression
A conditional expression is made up of three sub-expressions and has the following form:
expression1 ? expression2 : expression3
When a conditional expression is evaluated, expression1 is evaluated for its truth value. If it is true, expression2 is evaluated and the value of the entire expression is the value of expression2. expression3 is not evaluated.
If expression1 evaluates to false, expression3 is evaluated and the value of the conditional expression is the value of expression3. expression2 is not evaluated.
A conditional expression is often used as a shorthand for a simple if statement. For example:
a = ( b > 0 ) ? c : d;
is equivalent to:
if ( b > 0 )
a = c;
else
a = d;
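Only one of the last two sub-expressions is evaluated, so a conditional expression can safely guard an operation that would otherwise be an error. In this minimal sketch (safeDivide is a hypothetical name), n / d is never computed when d is zero.

```c
/* Hypothetical helper: integer division that yields 0 instead of dividing
   by zero. When d is 0, the n / d branch is never evaluated. */
int safeDivide( int n, int d )
{
    return ( d != 0 ) ? n / d : 0;
}
```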
while
The while statement is used to form loops as follows:
while ( expression ) statement
When the while statement is executed, expression is evaluated. If it evaluates to true, statement is executed and the condition is evaluated again. This sequence is repeated until expression evaluates to false, at which point execution continues with the next statement after the while.
You will occasionally see this construction:
while ( 1 )
{
...
}
The preceding is an infinite loop from the while's point of view. Presumably, something in the body of the loop checks for a condition and breaks out of the loop when that condition is met.
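Here is a minimal sketch of that pattern. The loop condition is always true, and a test in the body uses break (covered later in this chapter) to leave the loop. The function name is hypothetical. The sketch assumes 1 <= n <= 2^30 so that p cannot overflow an int.

```c
/* Hypothetical helper: returns the smallest power of two that is >= n.
   Assumes 1 <= n <= 1073741824 (2^30) so that p never overflows. */
int firstPowerOfTwoAtLeast( int n )
{
    int p = 1;

    while ( 1 )
    {
        if ( p >= n )
            break;      /* the only way out of this loop */
        p *= 2;
    }

    return p;
}
```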
do-while
The do-while statement is similar to the while, with the difference that the test comes after the statement rather than before:
do statement while ( expression );
One consequence of this is that statement is always executed at least once, independent of the value of expression. Situations where the program logic dictates that a loop body be executed at least once, even if the condition is false, are uncommon. As a consequence, do-while statements are rarely encountered in practice.
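Counting the decimal digits of a number is one of the uncommon cases where a do-while fits naturally. Zero has one digit, so the body must run at least once. This is a minimal sketch with a hypothetical function name, and it assumes n is non-negative.

```c
/* Hypothetical helper: counts the decimal digits of a non-negative int.
   The do-while guarantees one pass, so 0 is reported as having one digit;
   a plain while ( n != 0 ) loop would return 0 for it. */
int countDigits( int n )
{
    int count = 0;

    do
    {
        count++;
        n /= 10;   /* drop the last digit */
    } while ( n != 0 );

    return count;
}
```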
for
The for statement is the most general looping construct. It has the following form:
for (expression1; expression2; expression3) statement
When a for statement is executed, the following sequence occurs:
1. expression1 is evaluated once before the loop begins.
2. expression2 is evaluated for its truth value.
3. If expression2 is true, statement is executed; otherwise, the loop ends and execution continues with the next statement after the loop.
4. expression3 is evaluated.
5. Steps 2, 3, and 4 are repeated until expression2 becomes false.
expression1 and expression3 are evaluated only for their side effects. Their values are discarded. They are typically used to initialize and increment a loop counter variable:
int j;
for ( j=0; j < 10; j++ )
{
}
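The step sequence above implies that a for loop can be rewritten as a while loop. The minimal sketch below computes the same sum both ways, with comments marking where expression1, expression2, and expression3 end up. Both function names are hypothetical.

```c
/* Sum of 0 + 1 + ... + (n - 1), written with a for statement. */
int sumBelowFor( int n )
{
    int j;
    int total = 0;

    for ( j = 0; j < n; j++ )
        total += j;

    return total;
}

/* The same loop rewritten as a while statement, following the steps above. */
int sumBelowWhile( int n )
{
    int total = 0;
    int j = 0;              /* step 1: expression1, evaluated once */

    while ( j < n )         /* step 2: expression2, tested before each pass */
    {
        total += j;         /* step 3: the loop body (statement) */
        j++;                /* step 4: expression3, after each pass */
    }

    return total;
}
```

The equivalence has one caveat. A continue statement in the body would skip the j++ in the while version, but in the for version expression3 still runs.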
Any of the expressions may be omitted (the semicolons must remain). If expression2 is omitted, the loop is an infinite loop, similar to while( 1 ):
for ( i=0; ; i++ )
{
...
}
Note - When you use a loop to iterate over the elements of an array, remember that array indices go from zero to one less than the number of elements in the array:
int j;
int a[25];
for (j=0; j < 25; j++ )
{
}
Writing the for statement in the preceding example as for (j=1; j <= 25; j++) is a common mistake.
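The following minimal sketch uses the correct bounds to total an array. The function name sumArray is hypothetical. The element count is passed in so that the same loop works for the 25-element array above or for any other size.

```c
/* Hypothetical helper: adds up the first count elements of a.
   Valid indices run from 0 to count - 1, so the test is j < count. */
int sumArray( const int a[], int count )
{
    int j;
    int total = 0;

    for ( j = 0; j < count; j++ )
        total += a[j];

    return total;
}
```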
break
The break statement is used to break out of a loop or a switch statement.
int j;
for (j=0; j < 100; j++ )
{
...
if ( someConditionMet ) break;
}
Execution continues with the next statement after the enclosing while, do, for, or switch statement. When used inside nested loops, break only breaks out of the innermost loop. Coding a break statement that is not enclosed by a loop or a switch causes a compiler error:
error: break statement not within loop or switch
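A typical use of break is to stop a search as soon as the answer is known. In this minimal sketch (the function name is hypothetical), execution resumes at the return statement right after the loop.

```c
/* Hypothetical helper: returns the index of the first negative element,
   or -1 if there is none. */
int indexOfFirstNegative( const int a[], int count )
{
    int j;
    int found = -1;

    for ( j = 0; j < count; j++ )
    {
        if ( a[j] < 0 )
        {
            found = j;
            break;   /* stop scanning; execution continues after the loop */
        }
    }

    return found;
}
```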
continue
continue is used inside a while, do, or for loop to abandon execution of the current loop iteration. For example:
int j;
for (j=0; j < 100; j++ )
{
...
if ( doneWithIteration ) continue;
...
}
When the continue statement is executed, control passes to the next iteration of the loop. In a while or do loop, the control expression is evaluated for the next iteration. In a for loop, the iteration (third) expression is evaluated and then the control (second) expression is evaluated. Coding a continue statement that is not enclosed by a loop causes a compiler error.
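This minimal sketch (with a hypothetical function name) uses continue to skip elements that should not be processed. Because the loop is a for loop, j++ still runs after each skipped iteration.

```c
/* Hypothetical helper: adds up only the odd elements of a. */
int sumOfOddValues( const int a[], int count )
{
    int j;
    int total = 0;

    for ( j = 0; j < count; j++ )
    {
        if ( a[j] % 2 == 0 )
            continue;   /* skip even values; j++ and j < count still run */
        total += a[j];
    }

    return total;
}
```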
Comma Expression
A comma expression consists of two or more expressions separated by commas:
expression1, expression2, ..., expressionN
The expressions are evaluated in order from left to right and the value of the entire expression is the value of the right-most sub-expression.
The principal use of the comma operator is to initialize and update multiple loop variables in a for statement. As the loop in the following example iterates, j goes from 0 to MAX-1 and k goes from MAX-1 to 0:
for ( j=0, k=MAX-1; j < MAX; j++, k--)
{
}
When a comma expression is used in a for loop, only the side effects of evaluating the sub-expressions (initializing and incrementing or decrementing j and k in the preceding example) are important. The value of the comma expression is discarded.
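Reversing an array in place is a natural fit for the two-index pattern. This minimal sketch uses a hypothetical function name. Unlike the loop above, it stops when j and k meet. Running j all the way to the end would swap every pair twice and leave the array unchanged.

```c
/* Hypothetical helper: reverses the first count elements of a in place.
   The comma operator initializes and updates both indices in one for. */
void reverseArray( int a[], int count )
{
    int j, k;

    for ( j = 0, k = count - 1; j < k; j++, k-- )
    {
        int temp = a[j];   /* swap a[j] and a[k] */
        a[j] = a[k];
        a[k] = temp;
    }
}
```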
switch
A switch branches to different statements based on the value of an integer expression. The form of a switch statement is shown here:
switch ( integer_expression )
{
case value1:
statement
break;
case value2:
statement
break;
...
default:
statement
break;
}
In a slight inconsistency with the rest of C, each case may have multiple statements without the requirement of a compound statement.
value1, value2, ... must be either integers, character constants, or constant expressions that evaluate to an integer. (In other words, they must be reducible to an integer at compile time.) Duplicate cases with the same integer are not allowed.
When a switch statement is executed, integer_expression is evaluated and the switch compares the result with the integer case labels. If a match is found, execution jumps to the statement after the matching case label. Execution continues in sequence until either a break statement or the end of the switch is encountered. A break causes the execution to jump out to the first statement following the switch.
A break statement is not required after a case. If it is omitted, execution falls through to the following case. If you see the break omitted in existing code, it can be either a mistake (it is an easy one to make) or intentional (if the coder wanted a case and the following case to execute the same code).
If integer_expression doesn't match any of the case labels, execution jumps to the statement following the optional default: label, if one is present. If there is no match and no default:, the switch does nothing, and execution continues with the first statement after the switch.
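The following minimal sketch uses intentional fall-through to let several case labels share one body, and a default: label for values that match no case. The function name daysInMonth and its arguments are hypothetical.

```c
/* Hypothetical helper: days in a month numbered 1 to 12; returns 0 for an
   invalid month number. */
int daysInMonth( int month, int isLeapYear )
{
    int days;

    switch ( month )
    {
        case 4:
        case 6:
        case 9:
        case 11:          /* intentional fall-through: these share one body */
            days = 30;
            break;
        case 2:
            days = isLeapYear ? 29 : 28;
            break;
        case 1: case 3: case 5: case 7: case 8: case 10: case 12:
            days = 31;
            break;
        default:          /* no case label matched */
            days = 0;
            break;
    }

    return days;
}
```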
goto
C provides a goto statement:
goto label;
When the goto is executed, control is unconditionally transferred to the statement marked with label:
label: statement
- Labels are not executable statements; they just mark a point in the code.
- The rules for naming labels are the same as the rules for naming variables and functions.
- Labels always end with a colon.
Using goto statements with abandon can lead to tangled, confusing code (often referred to as spaghetti code). The usual boilerplate advice is "Don't use goto statements." Despite this, goto statements are useful in certain situations, such as breaking out of nested loops (a break statement only breaks out of the innermost loop):
for ( i=0; i < MAX_I; i++ )
for ( j=0; j < MAX_J; j++ )
{
...
if ( finished ) goto moreStuff;
}
moreStuff: statement
Note - Whether to use goto statements is one of the longest running debates in computer science. For a summary of the debate, see http://david.tribble.com/text/goto.html.
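Here is the nested-loop case as a complete function, in a minimal sketch. The names ROWS, COLS, and findInGrid are hypothetical. A single goto leaves both loops at once, and the loop counters still hold the position of the match when control reaches the label.

```c
#define ROWS 3
#define COLS 4

/* Hypothetical helper: searches grid for target. On success, stores the
   position through row and col and returns 1; otherwise returns 0. */
int findInGrid( int grid[ROWS][COLS], int target, int *row, int *col )
{
    int i, j;

    for ( i = 0; i < ROWS; i++ )
        for ( j = 0; j < COLS; j++ )
        {
            if ( grid[i][j] == target )
                goto found;   /* leaves both loops at once */
        }

    return 0;                 /* both loops finished without a match */

found:
    *row = i;
    *col = j;
    return 1;
}
```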
Functions
Functions typically have the following form:
returnType functionName( arg1Type arg1, ..., argNType argN )
{
statements
}
An example of a simple function looks like this:
float salesTax( float purchasePrice, float taxRate )
{
float tax = purchasePrice * taxRate;
return tax;
}
A function is called by coding the function name followed by a parenthesized list of expressions, one for each of the function's arguments. Each expression's type should match (or be convertible to) the type declared for the corresponding function argument. The following example shows a simple function call:
float carPrice = 20000.00;
float stateTaxRate = 0.05;
float carSalesTax = salesTax( carPrice, stateTaxRate );
When the line with the function call is executed, control jumps to the first statement in the function body. Execution continues until a return statement is encountered or the end of the function is reached. Execution then returns to the calling context. The value of the function expression in the calling context is the value set by the return statement.
Note - Functions are not required to have any arguments or to return a value. Functions that do not return a value are typed void:
void FunctionThatReturnsNothing( int arg1 )
You may omit the return statement from a function that does not return a value.
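Functions that don't take any arguments are indicated by an empty argument list. In C, placing the keyword void in the parentheses states explicitly that the function takes no arguments (in a declaration, empty parentheses alone leave the arguments unspecified):
int FunctionWithNoArguments( void )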
Functions are sometimes executed solely for their side effects. This function prints out the sales tax, but changes nothing in the program's state:
void printSalesTax ( float purchasePrice, float taxRate )
{
float tax = purchasePrice * taxRate;
printf( "The sales tax is: %.2f\n", tax );
}
C functions are call by value. When a function is called, the expressions in the argument list of the calling statement are evaluated and their values are passed to the function. A function cannot directly change the value of any of the variables in the calling context. This function has no effect on anything in the calling context:
void salesTax( float purchasePrice, float taxRate, float carSalesTax)
{
carSalesTax = purchasePrice * taxRate;
return;
}
To change the value of a variable in the calling context, you must pass a pointer to the variable and use that pointer to manipulate the variable's value:
void salesTax( float purchasePrice, float taxRate, float *carSalesTax)
{
*carSalesTax = purchasePrice * taxRate;
return;
}
Note - The preceding example is still call by value. The value of a pointer to a variable in the calling context is passed to the function. The function then uses that pointer (which it doesn't alter) to set the value of the variable it points to.
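The two versions can be placed side by side to check the difference. In this minimal sketch they have hypothetical, distinct names (salesTaxIgnored and salesTaxThroughPointer) so that both can live in one file.

```c
/* Hypothetical helper: receives a copy of the caller's value, so the
   assignment changes only the local parameter. */
void salesTaxIgnored( float purchasePrice, float taxRate, float carSalesTax )
{
    carSalesTax = purchasePrice * taxRate;
    (void)carSalesTax;   /* silences "set but not used" warnings */
}

/* Hypothetical helper: receives a copy of a pointer and writes through it
   to the caller's variable. */
void salesTaxThroughPointer( float purchasePrice, float taxRate, float *carSalesTax )
{
    *carSalesTax = purchasePrice * taxRate;
}
```

Calling salesTaxIgnored( carPrice, stateTaxRate, carSalesTax ) leaves carSalesTax unchanged. Calling salesTaxThroughPointer( carPrice, stateTaxRate, &carSalesTax ) sets it.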
Declaring Functions
When you call a function, the compiler needs to know the types of the function's arguments and return value. It uses this information to set up the communication between the function and its caller. If the code for the function comes before the function call (in the source code file), you don't have to do anything else. If the function is coded after the function call or in a different file, you must declare the function before you use it.
A function declaration repeats the first line of the function, with a semicolon added at the end:
void printSalesTax ( float purchasePrice, float taxRate );
It is a common practice to put function declarations in a header file. The header file is then included (see the next section) in any file that uses the function.
Note - Forgetting to declare functions can lead to insidious errors. If you call a function that is coded in another file (or in the same file after the function call) and you don't declare the function, the compiler may accept the call with no more than a warning (depending on the compiler and its settings), and the linker will not complain. But the function can receive garbage for any floating-point argument and return garbage if the function's return type is floating-point.
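The following single-file minimal sketch (the function names half and quarter are hypothetical) shows a declaration at the top of the file. It lets quarter call half before the code for half appears.

```c
/* Declaration (prototype): gives the compiler half()'s argument and return
   types before any call to it appears in the file. */
float half( float x );

/* Calls half() even though half() is defined further down. */
float quarter( float x )
{
    return half( half( x ) );
}

/* Definition: its first line matches the declaration above. */
float half( float x )
{
    return x / 2.0f;
}
```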
Preprocessor
When C (and Objective-C) code files are compiled, they are first sent through an initial program, called the preprocessor, before being sent to the compiler proper. Lines that begin with a # character are directives to the preprocessor. Using preprocessor directives you can:
- Import the text of a file into one or more other files at a specified point.
- Create defined constants.
- Conditionally compile code (compile or omit statement blocks depending on a condition).
Including Files
The following line:
#include "HeaderFile.h"
causes the preprocessor to insert the text of the file HeaderFile.h into the file being compiled at the point of the #include line. The effect is the same as if you had used a text editor to copy and paste the text from HeaderFile.h into the file being compiled.
If the included filename is enclosed in quotations (""):
#include "HeaderFile.h"
the preprocessor will look for HeaderFile.h in the same directory as the file being compiled, then in a list of locations that you can supply as arguments to the compiler, and finally in a series of system locations.
If the included file is enclosed in angle brackets (<>):
#include <HeaderFile.h>
the preprocessor will look for the included file only in the standard system locations.
Note - In Objective-C, #include is superseded by #import, which produces the same result, except that it prevents the named file from being imported more than once. If the preprocessor encounters further #import directives for the same header file, they are ignored.
#define
#define is used for textual replacement. The most common use of #define is to define constants, such as:
#define MAX_VOLUME 11
The preprocessor will replace every occurrence of MAX_VOLUME in the file being compiled with an 11. A #define can be continued on multiple lines by placing a backslash (\) at the end of all but the last line in the definition.
Note - If you do this, the \ must be the last thing on the line. Following the \ with something else (such as a comment beginning with "//") results in an error.
A frequently used pattern is to place the #define in a header file, which is then included by various source files. You can then change the value of the constant in all the source files by changing the single definition in the header file. The traditional C naming convention for defined constants is to use all capital letters. A traditional Apple naming convention is to begin the constant name with a k and CamelCase the rest of the name:
#define kMaximumVolume 11
You will encounter both styles, sometimes in the same code.
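This minimal sketch shows a defined constant in use, plus a definition continued onto a second line with a backslash. The names SECONDS_PER_DAY and clampVolume are hypothetical. The parentheses around the SECONDS_PER_DAY text keep the multiplication grouped wherever the name is substituted.

```c
/* Defined constant: the preprocessor replaces MAX_VOLUME with 11 before
   the compiler proper sees the code. */
#define MAX_VOLUME 11

/* A definition continued onto a second line: the backslash must be the
   last character on the line it ends. */
#define SECONDS_PER_DAY ( 60 * 60 \
                          * 24 )

/* Hypothetical helper: limits a requested volume to the range 0..MAX_VOLUME. */
int clampVolume( int requested )
{
    if ( requested > MAX_VOLUME )   /* becomes: if ( requested > 11 ) */
        return MAX_VOLUME;
    if ( requested < 0 )
        return 0;
    return requested;
}
```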
Conditional Compilation
The preprocessor allows for conditional compilation:
#if condition
statements
#else
otherStatements
#endif
Here, condition must be a constant expression that can be evaluated for a truth value at compile time. If condition evaluates to true (non-zero), statements are compiled, but otherStatements are not. If condition is false, statements are skipped and otherStatements are compiled.
The #endif is required, but the #else and the alternative code are optional. A conditional compilation block can also begin with an #ifdef directive:
#ifdef name
statements
#endif
The behavior is the same as the previous example, except that the truth value of #ifdef is determined by whether name has been #define'd.
One use of #if is to easily remove and replace blocks of code during debugging:
#if 1
statements
#endif
By changing the 1 to a 0, statements can be temporarily left out for a test. They can then be replaced by changing the 0 back to a 1.
#if and #ifdef directives can be nested, as shown here:
#if 0
#if 1
statements
#endif
#endif
In the preceding example, the preprocessor ignores all the code, including the other preprocessor directives, between the #if 0 and its matching #endif. statements are not compiled.
If you need to disable and re-enable multiple statement blocks, you can code each block like this:
#if _DEBUG
statements
#endif
The defined constant _DEBUG can be added or removed in a header file or by using a -D flag to the compiler (for example, -D_DEBUG, which defines _DEBUG with the value 1).
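This minimal sketch combines both uses of #if. The function name is hypothetical. The logging line is compiled only when _DEBUG is defined with a non-zero value, and the #if 0 block is never compiled. An identifier that has not been defined evaluates to 0 in an #if, so by default the logging is left out.

```c
#include <stdio.h>

/* Hypothetical helper: returns a + b; logs the sum only in _DEBUG builds
   (for example, gcc -D_DEBUG -o MyCProgram MyCProgram.c). */
int addWithOptionalLogging( int a, int b )
{
    int sum = a + b;

#if _DEBUG
    printf( "addWithOptionalLogging: %d + %d = %d\n", a, b, sum );
#endif

#if 0
    sum = -1;   /* inside #if 0: never compiled, so it cannot change the result */
#endif

    return sum;
}
```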
printf
Input and output (I/O) are not a part of the C language. Character and binary I/O are handled by functions in the C standard I/O library.
Note - The standard I/O library is one of a set of libraries of functions that is provided with every C environment.
To use the functions in the standard I/O library, you must include the library's header file in your program:
#include <stdio.h>
The only function covered here is printf, which prints a formatted string to your terminal window (or to the Xcode console window if you are using Xcode). The printf function takes a variable number of arguments. The first argument to printf is a format string. Any remaining arguments are quantities that are printed out in a manner specified by the format string:
printf( formatString, argument1, argument2, ... argumentN );
The format string consists of ordinary characters and conversion specifiers:
- Ordinary characters (not %) in the format string are sent unchanged to the output.
- Conversion specifiers begin with a percent sign (%). The letter following the % indicates the type of argument the specifier expects.
- Each conversion specification consumes, in order, one of the arguments following the format string. The argument is converted to characters that represent the value of the argument and the characters are sent to the output.
The only conversion specifiers used in this book are %d, for char and int, %f for float and double, and %s for C strings. C strings are typed as char*.
Here is a simple example:
int myInt = 9;
float myFloat = 3.145926;
char* myString = "a C string";
printf( "This is an Integer: %d, a float: %f, and a string: %s.\n",
myInt, myFloat, myString );
Note - The \n is the newline character. It advances the output so that any subsequent output appears on the next line.
The result of the preceding example is:
This is an Integer: 9, a float: 3.145926, and a string: a C string.
If the number of arguments following the format string doesn't match the number of conversion specifications, printf ignores the excess arguments or prints garbage for the excess specifications.
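This minimal sketch pairs each conversion specifier with an argument of the matching type, in order. The function name printReport is hypothetical. It uses %.2f (two decimal places) for the double, and it returns printf's own return value, which is the number of characters written. A float argument passed to printf is promoted to double, which is why %f serves for both types.

```c
#include <stdio.h>

/* Hypothetical helper: %s consumes label (a char*), %d consumes count (an
   int), and %.2f consumes average (a double). Returns the number of
   characters printf wrote. */
int printReport( const char *label, int count, double average )
{
    return printf( "%s: %d items, average %.2f\n", label, count, average );
}
```

For example, printReport( "widgets", 3, 2.5 ) prints widgets: 3 items, average 2.50 followed by a newline, which is 31 characters.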
Note - This book uses printf only for logging and debugging non-object variables, not for the output of a polished program, so this section presents only a cursory look at format strings and conversion specifiers.
printf handles a large number of types and it provides very fine control over the appearance of the output. A complete discussion of the available types of conversion specifications and how to control the details of formatting is available via the Unix man command. To see them, type the following at a terminal window:
man 3 printf
Note - The Foundation framework provides NSLog, another logging function. It is similar to printf, but it adds the capability to print out object variables. It also adds the program name, the date, and the time in hours, minutes, seconds, and milliseconds to the output. This additional information can be visually distracting if all you want to know is the value of a variable or two, so this book uses printf in some cases where NSLog's additional capability is not required. NSLog is covered in Chapter 3.
Using gcc and gdb
When you write programs for Mac OS X or iOS, you should write and compile your programs using Xcode, Apple's Integrated Development Environment (IDE). You'll learn how to set up a simple Xcode Project in Chapter 4, "Your First Objective-C Program." However, for the simple C programs required in the exercises in this chapter and the next chapter, you may find it easier to write the programs in your favorite text editor and then compile and run them from a command line, using gcc, the GNU compiler. To do this, you will need:
- A terminal window. You can use the Terminal app (/Applications/Terminal) that comes with Mac OS X. If you are coming from another Unix environment and you are used to xterms, you may prefer to download and use iTerm, an OS X native terminal application that behaves similarly to an xterm (http://iterm.sourceforge.net/).
- A text editor. Mac OS X comes with both vi and emacs, or you can use a different editor if you have it.
- Command line tools. These may not be installed on your system. To check, type which gcc at the command prompt. If the response is /usr/bin/gcc, you are all set. However, if there is no response or the response is gcc: Command not found., you will have to install the command line tools from your install disk or from a downloaded Xcode disk image. (You can find a link to the current version of the developer tools on the Mac Dev Center web page, http://developer.apple.com/mac/). Start the install procedure, and when you get to the Custom Install stage, make sure that the box UNIX Dev Support is checked, as shown in Figure 1.2. Continue with the installation.
Figure 1.2 Installing the command line tools
You are now ready to compile. If your source code file is named MyCProgram.c, you can compile it by typing the following at the command prompt:
gcc -o MyCProgram MyCProgram.c
The -o flag allows you to give the compiler a name for your final executable. If the compiler complains that you have made a mistake or two, go back and fix them, then try again. When your program compiles successfully, you can run it by typing the executable name at the command prompt. Unless the current directory is in your shell's search path, prefix the name with ./ so the shell looks in the current directory:
./MyCProgram
If you want to debug your program using gdb, the GNU debugger, you must use the -g flag when you compile:
gcc -g -o MyCProgram MyCProgram.c
The -g flag causes gcc to attach debugging information for gdb to the final executable. To use gdb to debug a program, type gdb followed by the executable name:
gdb MyCProgram
Documentation for gdb is available at the GNU website, http://www.gnu.org/software/gdb/ or from Apple at http://developer.apple.com/mac/library/documentation/DeveloperTools/gdb/gdb/gdb_toc.html. In addition, there are many websites with instructions for using gdb. Search for "gdb tutorial".
Summary
This chapter has been a review of the basic parts of the C language. The review continues in Chapter 2, which covers the memory layout of a C program, declaring variables, variable scope and lifetimes, and dynamic allocation of memory. Chapter 3 begins the real business of this book: object-oriented programming and the object part of Objective-C.
Exercises
- Write a function that returns the average of two floating-point numbers. Write a small program to test your function and log the output. Next, put the function in a separate source file but "forget" to declare the function in the file that has your main routine. What happens? Now add the function declaration to the file with your main program and verify that the declaration fixes the problem.
- Write another averaging function, but this time try to pass the result back in one of the function's arguments. Your function should be declared like this:
void average( float a, float b, float average )
Write a small test program and verify that your function doesn't work. You can't affect a variable in the calling context by setting the value of a function parameter.
Now change the function and its call to pass a pointer to a variable in the calling context. Verify that the function can use the pointer to modify a variable in the calling context.
- Assume that you have a function, int FlipCoin(), that randomly returns a 1 to represent heads or a 0 to represent tails. Explain how the following code fragment works:
int flipResult;
if ( flipResult = FlipCoin() )
printf("Heads is represented by %d\n", flipResult );
else
printf("Tails is represented by %d\n", flipResult );
As you will see in Chapter 6, "Classes and Objects," an if condition similar to the one in the preceding example is used in the course of initializing an Objective-C object.
- An identity matrix is a square array of numbers with ones on the diagonal (the elements where the row number equals the column number) and zeros everywhere else. The 2x2 identity matrix looks like this:
1 0
0 1
Write a program that calculates and stores the 4x4 identity matrix. When your program is finished calculating the matrix, it should output the result as a nicely formatted square array.
- Fibonacci numbers (http://en.wikipedia.org/wiki/Fibonacci_number) are a numerical sequence that appears in many places in nature and in mathematics. The first two Fibonacci numbers are defined to be 0 and 1. The nth Fibonacci number is the sum of the previous two Fibonacci numbers:
$$F_n = F_{n-1} + F_{n-2}$$
Write a program that calculates and stores the first 20 Fibonacci numbers. After calculating the numbers, your program should output them, one on a line, along with their index. The output lines should be something like:
Fibonacci Number 2 is: 1
Use a #define to control the number of Fibonacci numbers your program produces, so that it can be easily changed.
- Rewrite your program from the previous exercise to use a while loop instead of a for loop.
- What if you are asked to calculate the first 75 Fibonacci numbers? If you are using ints to store the numbers, there is a problem. You will find that the 47th Fibonacci number is too big to fit in an int. How can you fix this?
- Judging by the number of tip calculators available in the iPhone App Store, a substantial fraction of the population has forgotten how to multiply. Help out those who can't multiply but can't afford an iPhone. Write a program that calculates a 15% tip on all the checks between $10 and $50. (For brevity, go by $0.50 increments.) Show both the check and the tip.
- Now make the tip calculator look more professional. Add a column for 20% tips (Objective-C programmers eat in classy joints). Place the proper headers on each column and use a pair of nested loops so that you can output a blank line after every $10 increment.
Using the conversion specification %.2f instead of %f will limit the check and tip output to two decimal places. Using %% in the format string will cause printf to output a single % character.
- Define a structure that holds a rectangle. Do this by defining a structure that holds the coordinates of a point and another structure that represents a size by holding a width and a height. Your rectangle structure should have a point that represents the lower-left corner of the rectangle and a size. (The Cocoa frameworks define structures like these, but make your own for now.)
- One of the basic tenets of efficient computer graphics is "Don't draw if you don't have to draw." Graphics programs commonly keep a bounding rectangle for each graphic object. When it is time to draw the graphic on the screen, the program compares the graphic's bounding rectangle with a rectangle representing the window. If there is no overlap between the rectangles, the program can skip trying to draw the graphic. Overall, this is usually a win; comparing rectangles is much cheaper than drawing graphics.
Write a function that takes two rectangle structure arguments. (Use the structures that you defined in the previous exercise.) Your function should return 1 if there is a non-zero overlap between the two rectangles, and 0 otherwise. Write a test program that creates some rectangles and verify that your function works.
Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language, Second Edition. (Englewood Cliffs: Prentice Hall, 1988).
Samuel P. Harbison and Guy L. Steele, C: A Reference Manual, Fifth Edition. (Upper Saddle River: Prentice Hall, 2002).