C for Java Programmers

Barbara Staudt Lerner
February 2001

Differences among Features Common to C and Java

Low-level Syntax

At the level of statements and declarations, C and Java are quite similar. The syntax that you have learned in Java will carry over (mostly, anyway) to C. C includes some constructs missing in Java, for which you will need to learn both the syntax and semantics.

Traditional C has no boolean data type (C99 added one through the <stdbool.h> header). Programmers often include statements like the following to mimic the boolean datatype:

#define TRUE 1
#define FALSE 0
typedef int bool;

In C, the data type char is an 8 bit value capable of representing an ASCII character. An instance of the char type can also be treated as an 8 bit integer. Therefore, you can do arithmetic on variables declared as char.

In C, you can declare any of the integer data types to be unsigned. This results in the lower bound for the type being 0. For example, on a machine with 32-bit integers, int ranges from -2^31 to 2^31 - 1. An unsigned int ranges from 0 to 2^32 - 1.
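
The exact bounds depend on the machine, so the sketch below reads them from the standard <limits.h> header instead of assuming 32 bits. The function names wrap_below_zero and wrap_above_max are hypothetical, chosen only for this illustration. It shows that unsigned arithmetic wraps around at both ends of the range:

```c
#include <limits.h>

/* Subtracting 1 from an unsigned 0 wraps around to the largest value. */
unsigned int wrap_below_zero(void) {
    unsigned int u = 0;
    u = u - 1;
    return u;               /* equals UINT_MAX */
}

/* Adding 1 to the largest unsigned value wraps around to 0. */
unsigned int wrap_above_max(void) {
    unsigned int u = UINT_MAX;
    u = u + 1;
    return u;               /* equals 0 */
}
```

The same header also defines INT_MIN and INT_MAX for the signed int type.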

In C, = is the assignment operator just as in Java. In both languages, = can also be used as an expression that returns the value being assigned. This allows the following convenient way of initializing two variables to the same value:

    a = b = 1;

1 is assigned to b. The assignment expression returns the value assigned and assigns this to a.

C allows the condition controlling an if-statement or while-statement to be an integer expression. If the integer expression evaluates to 0, this is treated the same as the boolean value false. A non-zero value is treated the same as true. This is bad programming style and should be avoided. It would be better to say:

    if (someInt != 0)

than to just say

    if (someInt)

although they are semantically equivalent.

Combining these last two paragraphs shows one of the most common syntactic errors in C programs:

    int a = 0, b = 1;
    if (a = b) {
      ...
    }

Assume that the programmer intended to say a == b as the condition, which is almost certainly the case. The programmer therefore intended that the body of the if-statement would be executed only if a and b had the same value. In Java, the code above would give a compilation error, but it does not in C since there is no boolean datatype. 1 is treated as true and the body of the if-statement is executed, which is not what the programmer intended. Be on the lookout for this simple error in your code!

In C, you must declare the size of an array when you declare the array. Memory is allocated for the array when the array is declared:

    int intArray[10];

Beware! C does not check array bounds like Java does. If you use a negative number as an array index, or an index that is greater than or equal to the size of the array, C will happily access some (seemingly random) piece of memory. If this appears on the left side of an assignment statement, it will happily change some (seemingly random) piece of memory. Always check array indices yourself if you are not absolutely certain that the value is in the correct range!
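
One way to follow this advice is to route every store through a small function that tests the index first. This is only a sketch: ARRAY_SIZE and store_checked are hypothetical names, and returning 0 for a rejected index is just one possible convention.

```c
#define ARRAY_SIZE 10       /* hypothetical constant for this sketch */

int intArray[ARRAY_SIZE];

/* Stores value at intArray[index] only if index is in range.
   Returns 1 on success and 0 if the index was rejected. */
int store_checked(int index, int value) {
    if (index < 0 || index >= ARRAY_SIZE) {
        return 0;           /* out of range: leave memory untouched */
    }
    intArray[index] = value;
    return 1;
}
```

A call such as store_checked (10, 5) is rejected, because the valid indices for an array of size 10 are 0 through 9.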

Input and Output

To do output in C, you use the function printf. printf takes a variable number of arguments. The first argument is a string. The string may have embedded within it zero or more control sequences. For each control sequence there must be an additional parameter to printf defining the value to use for that control sequence. The most common control sequences are the following:

    %c    A character
    %s    A string
    %d    An integer

A variant, fprintf, takes a file as an additional first argument; the output is written to that file instead of to standard output. Here is an example use of printf:

    char month [4];
    int year;
    strcpy (month, "Sep");  // Assigns a value to a string variable.
    year = 1998;
    printf ("The month is %s.  The year is %d.\n", month, year);

scanf is the input function in C. Its format is similar to printf. The first argument is a string, often consisting exclusively of control sequences and whitespace. For each control sequence, there must be an argument that is the address of a variable of the appropriate type. The memory must be allocated already. sscanf is a variant of scanf with an additional first argument representing a string to parse instead of reading from standard input. sscanf is typically used to convert a string to an integer. For example, if s contains "1998", then

    sscanf(s, "%d", &year);

will set year equal to the integer 1998.

There is also an sprintf function that places the formatted output in a string.
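
The following sketch wraps both directions in small functions. The names parse_year and format_date are hypothetical. It relies on two facts about the standard library: sscanf returns the number of values it converted, so the result can be checked, and snprintf (a C99 relative of sprintf, used here in its place) takes the size of the destination so it cannot write past the end of the buffer.

```c
#include <stdio.h>

/* Converts the string s to an integer stored in *year.
   Returns 1 if a number was read and 0 otherwise. */
int parse_year(const char *s, int *year) {
    return sscanf(s, "%d", year) == 1;
}

/* Writes "<month> <year>" into buf, which holds size characters.
   Returns the number of characters the full text needs,
   not counting the terminating null character. */
int format_date(char *buf, size_t size, const char *month, int year) {
    return snprintf(buf, size, "%s %d", month, year);
}
```

For example, parse_year ("1998", &year) sets year to 1998, while parse_year ("abc", &year) reports failure and leaves year unchanged.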

Features in C but not in Java

Constants and Macros

To declare a constant in C, you use #define (not final) as in:

    #define MAX_SIZE 10

#define is actually a much more powerful macro mechanism. Everywhere that the defined name appears within the scope, it is replaced by the definition, which could be a complex expression. This is often done to make a statement look like a function call but without having the runtime overhead of making a procedure call. For example, here is a macro defining minimum:

    #define min(a,b)  (((a) < (b)) ? (a) : (b))

Later in the code, the programmer can say:

    min (i, 4)

and the preprocessor expands it, before the compiler sees it, to:

    (((i) < (4)) ? (i) : (4))

Since this is done by the preprocessor, the expression is inlined rather than being executed as a function call.
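
Because the expansion is purely textual, an argument with a side effect may be evaluated more than once. The sketch below contrasts the macro from the text with an ordinary function. The names i_after_macro_min, min_int, and i_after_function_min are hypothetical, introduced only for this comparison.

```c
#define min(a,b)  (((a) < (b)) ? (a) : (b))

/* Returns the value of i after calling the macro with i++. */
int i_after_macro_min(void) {
    int i = 1;
    int m = min (i++, 4);   /* expands to (((i++) < (4)) ? (i++) : (4)) */
    (void) m;               /* m is 2; silence unused-variable warnings */
    return i;               /* i was incremented twice, so this is 3 */
}

/* An ordinary function evaluates each argument exactly once. */
static int min_int(int a, int b) {
    return a < b ? a : b;
}

/* Returns the value of i after calling the function with i++. */
int i_after_function_min(void) {
    int i = 1;
    int m = min_int (i++, 4);
    (void) m;               /* m is 1 */
    return i;               /* i was incremented once, so this is 2 */
}
```

When the arguments have no side effects, as in min (i, 4), the macro and the function give the same result.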

 

Compiler Directives

#define is one example of a compiler preprocessor command. This is a command that is executed by a preprocessor that scans the code prior to compilation. The preprocessor is run automatically when you run the compiler. Two other common directives in C are #ifdef and #ifndef. #ifdef takes a variable name for its condition. If that variable name is defined, it evaluates to true and its body is included in the source code that is compiled. #ifndef is similar but includes its body if the variable is not defined. Both may have #else clauses. They both end with the delimiter #endif.

    #ifdef HOST_SPARC
    #include <sys/time.h>
    #endif

This is how C programmers typically port programs between architectures. Architecture-dependent code is placed inside #ifdef statements. When the code is compiled, the appropriate variable is set for the architecture allowing the correct code to be compiled in. Unlike normal if-statements, these if-statements are evaluated at compilation time. The branch that is true at compilation time is compiled into the program. Branches that are false are not compiled in. The condition is not tested at runtime.
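
As a small sketch of both directives with an #else clause: HOST_SPARC is the name used above, while HOST_NAME, host_name, and max_size are hypothetical names for this illustration. With compilers such as gcc or cc, a name is usually defined on the command line with an option like -DHOST_SPARC.

```c
/* Exactly one of these two definitions survives preprocessing. */
#ifdef HOST_SPARC
#define HOST_NAME "SPARC"
#else
#define HOST_NAME "other"
#endif

/* Supply a default only if nothing defined MAX_SIZE earlier. */
#ifndef MAX_SIZE
#define MAX_SIZE 10
#endif

const char *host_name(void) {
    return HOST_NAME;
}

int max_size(void) {
    return MAX_SIZE;
}
```

If HOST_SPARC is not defined when this is compiled, host_name returns "other", and the "SPARC" branch never reaches the compiler at all.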

Life Outside of a Class

C does not have classes. Data types, variables, and functions are all declared outside of classes and are referred to simply by name. No . syntax is required to reach them. For example, if you want to implement a stack, you would define push and pop functions that take a stack as a parameter. Assuming you have created a typedef for Stack, you would write:

void push (Stack s, int i) {...}
int pop (Stack s) {...}

Instead of calling a method on an object, as in Java, you call a function and pass the "object" as one of the parameters:

Stack myStack;
push (myStack, 0);
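
The text leaves the definition of Stack open. One possible sketch, under the assumption that Stack is a typedef for a pointer to a struct (structs and pointers are described later in this document), is shown below. StackData, STACK_CAPACITY, and the choices to ignore a push on a full stack and to return 0 from an empty pop are all hypothetical.

```c
#define STACK_CAPACITY 16   /* hypothetical fixed capacity */

struct StackData {
    int items[STACK_CAPACITY];
    int count;              /* number of items currently stored */
};

/* Assumption: Stack is a pointer, so push and pop can modify the
   caller's stack even though parameters are passed by value. */
typedef struct StackData *Stack;

void push (Stack s, int i) {
    if (s->count < STACK_CAPACITY) {
        s->items[s->count++] = i;
    }                       /* a push on a full stack is ignored */
}

int pop (Stack s) {
    if (s->count == 0) {
        return 0;           /* arbitrary result for an empty stack */
    }
    return s->items[--s->count];
}
```

With these definitions, a program would declare a struct StackData variable with count set to 0 and pass its address as the Stack.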

The main program for a C program is called main, but it is declared externally to any class. Its signature is:

int main (int argc, char **argv);

The first parameter is the number of command-line arguments, including the program's name. The second parameter is an array of strings, each string containing one command-line argument. argv[0] is the program's name.

Struct Types

Data type declarations outside of classes are encapsulated inside a struct:

struct Date {
  char *month;
  int date;
  int year;
};
   
struct Date someDate;

Typically, when declaring a type one gives the type a name. Oddly enough, the declaration above creates a type named struct Date. To give it a simple type name, a slightly different syntax is required:

typedef struct {
  char *month;
  int date;
  int year;
} Date;
   
Date someDate;
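
Fields of a struct variable are read and written with the . operator, much like fields of a Java object. The sketch below repeats the typedef so that it stands alone; make_date and same_year are hypothetical helper names.

```c
typedef struct {
  char *month;
  int date;
  int year;
} Date;

/* Builds a Date value field by field and returns it by value. */
Date make_date (char *month, int date, int year) {
  Date d;
  d.month = month;
  d.date = date;
  d.year = year;
  return d;
}

/* Returns 1 if both dates fall in the same year, 0 otherwise. */
int same_year (Date a, Date b) {
  return a.year == b.year;
}
```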

Union Types

A union type is a type that allows a particular piece of memory to store a value of different types at different times (a primitive precursor to subtyping). A union declaration looks a lot like a struct declaration:

union String_or_int {
  char *someString;
  int someInt;
};

The union itself does not keep track of which type is in it, so a union is typically used inside a struct where a second field of the struct remembers the type currently in the union field:

struct S_or_i {
  bool containsInt;
  union String_or_int x;
};
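
A sketch of how such a struct might be used follows. It is written under two assumptions: bool comes from the C99 <stdbool.h> header instead of a hand-written typedef, and the helper names from_int, from_string, and int_or_default are hypothetical. The point is that every read of the union first consults containsInt.

```c
#include <stdbool.h>    /* assumption: C99 bool instead of a typedef */

union String_or_int {
  char *someString;
  int someInt;
};

struct S_or_i {
  bool containsInt;
  union String_or_int x;
};

/* Stores an int and records that fact in the tag field. */
struct S_or_i from_int (int i) {
  struct S_or_i v;
  v.containsInt = true;
  v.x.someInt = i;
  return v;
}

/* Stores a string pointer and records that the union holds no int. */
struct S_or_i from_string (char *s) {
  struct S_or_i v;
  v.containsInt = false;
  v.x.someString = s;
  return v;
}

/* Reads the int only when the tag says one is present. */
int int_or_default (struct S_or_i v, int fallback) {
  return v.containsInt ? v.x.someInt : fallback;
}
```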

Pointers

In Java, all references to objects are pointers to objects. All references to primitive types, like int, are values. In C, using a type name always means that the variable will have a value of that type. It is possible to introduce pointers to values explicitly and also to create types whose values are pointers to other values. Suppose we have a Date type. Here is how we would declare a variable that is to contain a pointer to a date and also a type to represent a pointer to a date:

    Date *someDate;         // Variable containing a pointer to a Date

    typedef Date *DatePtr;  // Type defining a pointer to a Date
    DatePtr date2;          // Variable containing a pointer to a Date

    date2 = someDate;

someDate and date2 both contain pointers to dates. The assignment statement results in both variables pointing to the same memory location and therefore sharing the same value as happens in Java.

Contrast the above with the following similar code that does not use pointers:

    Date someDate;
    Date date2;

    date2 = someDate;

Assuming that Date is simply a struct type, not a pointer type, the assignment statement above copies the value from someDate to date2. If the value referenced in either variable is changed, it has no effect on the other value. In Java, you would need to explicitly clone the value to have this effect. Unless you know the definition of the type involved in an assignment, you cannot tell whether the assignment results in value-sharing or value-copying.

A field of a struct is accessed through a pointer to that struct using the -> syntax:

    some_pointer->some_field

To get a pointer to an object, you use the & operator:

    int *intPtr;
    int anInt;

    anInt = 1;
    intPtr = &anInt;

To get the value pointed to by a pointer, you use the * operator:

    int *intPtr;
    int anInt, int2;

    anInt = 1;
    intPtr = &anInt;
    int2 = *intPtr;
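
With ->, & and * in hand, the difference between sharing and copying described earlier in this section can be checked directly. This is a sketch only: the function names year_via_shared_pointer and year_after_copy are hypothetical, and the Date typedef is repeated so that the example stands alone.

```c
typedef struct {
  char *month;
  int date;
  int year;
} Date;

/* Two pointers to one Date: a change made through date2
   is visible through someDate. */
int year_via_shared_pointer (void) {
    Date storage = { "Sep", 14, 1998 };
    Date *someDate = &storage;
    Date *date2 = someDate;

    date2->year = 2001;         /* same as (*date2).year = 2001 */
    return someDate->year;      /* 2001: both point at storage */
}

/* Two separate Date values: assignment copies every field. */
int year_after_copy (void) {
    Date someDate = { "Sep", 14, 1998 };
    Date date2 = someDate;

    date2.year = 2001;          /* changes only the copy */
    return someDate.year;       /* still 1998 */
}
```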

In C, all parameters are passed by value. If you want to be able to change the value of a parameter as a side effect, you must declare the parameter type to be a pointer and you must pass in the address of the variable that you want to change:

    void increment (int * anInt) {
        (*anInt)++;
    }

    int i = 0;
    increment (&i);
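
The same technique extends to functions that must change more than one variable. As a sketch, here is a swap function (a hypothetical name, not part of the original notes) that exchanges two ints through pointers:

```c
/* Exchanges the values stored at the two addresses. */
void swap (int *first, int *second) {
    int temp = *first;
    *first = *second;
    *second = temp;
}
```

A caller with int variables a and b would write swap (&a, &b). Writing swap (a, b) with plain int parameters could not work, because the function would only receive copies of the two values.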

Memory Management

Java is a garbage-collected language. C is not. In Java, memory is allocated for an object when that object is constructed. The memory is deallocated when there are no more references to that object.

In C, values (including structs) can be allocated either automatically or manually. Variables whose types are not pointers (such as structs) are automatically allocated and deallocated. They are allocated when they are declared and deallocated at the end of the block in which they are declared. C has no new operator, and none is needed to allocate a variable whose type is a struct.

With pointer types, the programmer must explicitly allocate and deallocate memory. You must allocate memory before assigning a value to the object. You should deallocate the memory when you believe there are no more references to that memory. To allocate memory you need to know how big the value is. You can find this out using the sizeof operator:

    sizeof (some_type)

Here sizeof is given a type name. It can also be applied to an expression, such as a variable, in which case it yields the size of that expression's type.

The syntax for allocating memory is:

    some_struct = (some_struct_type *) malloc (sizeof (some_struct_type));

To free memory, you use the free function:

free (some_struct);

For every value allocated with malloc there should be a deallocation with free.
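
Put together, allocating and freeing a struct might look like the sketch below. The names new_date and free_date are hypothetical. malloc returns NULL when no memory is available, so the result is checked before it is used.

```c
#include <stdlib.h>

typedef struct {
  char *month;
  int date;
  int year;
} Date;

/* Allocates a Date on the heap and fills in its fields.
   Returns NULL if the allocation fails. */
Date *new_date (char *month, int date, int year) {
    Date *d = (Date *) malloc (sizeof (Date));
    if (d == NULL) {
        return NULL;
    }
    d->month = month;
    d->date = date;
    d->year = year;
    return d;
}

/* Releases a Date obtained from new_date. The month string is not
   freed because new_date did not allocate it. */
void free_date (Date *d) {
    free (d);
}
```

Each successful call to new_date should eventually be matched by one call to free_date.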

Similarity between Arrays and Pointers

Suppose you want to have an array variable, but you do not know how big the array should be. Since C requires you to declare the size of the array when you declare the array variable, you cannot declare it to be an array. Instead you must declare a pointer to the desired element type and later allocate the appropriate amount of memory yourself:

    int *intArray;
    intArray = (int *) calloc (10, sizeof (int));

Even though you declared the variable to be a pointer, you can still index it as an array, as in intArray[3]!
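
As a sketch of this idea, the hypothetical function make_squares below allocates an array whose size is known only at run time and fills it using ordinary array indexing:

```c
#include <stdlib.h>

/* Returns a newly allocated array holding 0*0, 1*1, ..., (n-1)*(n-1),
   or NULL if the allocation fails. The caller must free the result. */
int *make_squares (size_t n) {
    int *squares = (int *) calloc (n, sizeof (int));
    if (squares == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        squares[i] = (int) (i * i);     /* pointer indexed as an array */
    }
    return squares;
}
```

The expression squares[i] means the same thing as *(squares + i), which is why a pointer and an array can be used interchangeably here. Remember that the bounds are still not checked.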

There is no string data type in C. Strings are simply arrays of characters. Since we typically want to allow variable length strings, string variables are typically declared to be pointers to characters:

    char *someString;

All strings must have the special null character '\0' as their last character to identify the end of the string since there is no length recorded with the string. String constants are enclosed in "" and implicitly end in the null character. Since strings are pointers, assigning one string variable to another results in the two variables pointing to the same piece of memory. Changing one string changes the other. If you do not want this effect, you must use the strcpy function:

    char *month;            // Declare the string
    month = malloc (4);     // Allocate memory for string ending in null.
    strcpy (month, "Sep");  // Assigns a value to a string variable.

Remember to free the string when it is no longer being used.
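
When the length of the string is not known in advance, the amount to allocate can be computed with strlen, adding one for the null character. The function copy_string below is a hypothetical helper sketched for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of source, or NULL if the
   allocation fails. The caller must free the result. */
char *copy_string (const char *source) {
    char *copy = malloc (strlen (source) + 1);  /* +1 for '\0' */
    if (copy == NULL) {
        return NULL;
    }
    strcpy (copy, source);
    return copy;
}
```

Unlike plain assignment of one string variable to another, changing the copy has no effect on the original string.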

One bizarre-looking thing you might see in C code is:

    char **stringArray;

This is a pointer to a pointer to a character. Keeping in mind that pointer declarations often mean variably-sized arrays, this syntax represents a variably-sized array of strings (since char * means string).
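
A minimal sketch of working with such a variable follows. The names new_string_array and total_length are hypothetical. The array of pointers is allocated first, and each element can then be pointed at a string:

```c
#include <stdlib.h>
#include <string.h>

/* Allocates an array of count string pointers, all set to NULL.
   Returns NULL if the allocation fails. */
char **new_string_array (size_t count) {
    char **stringArray = (char **) malloc (count * sizeof (char *));
    if (stringArray == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        stringArray[i] = NULL;
    }
    return stringArray;
}

/* Adds up the lengths of all the non-NULL strings in the array. */
size_t total_length (char **stringArray, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (stringArray[i] != NULL) {
            total += strlen (stringArray[i]);
        }
    }
    return total;
}
```

The argv parameter of main is an array of exactly this kind, with argc giving the number of elements.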


Last modified by Barbara Lerner on February 28, 2001