Wednesday, September 3, 2008

The Java IAQ:
Infrequently Answered Questions

by Peter Norvig


Q: What is an Infrequently Answered Question?

A question is infrequently answered either because few people know the answer or because it is about an obscure, subtle point (but a point that may be crucial to you). I thought I had invented the term, but it also shows up at the very informative About.com Urban Legends site. There are lots of Java FAQs around, but this is the only Java IAQ. (There are a few Infrequently Asked Questions lists, including a satirical one on C.)

Q:The code in a finally clause will never fail to execute, right?

Well, hardly ever. But here's an example where the finally code will not execute, regardless of the value of the boolean choice:

  try {
    if (choice) {
      while (true) ;
    } else {
      System.exit(1);
    }
  } finally {
    code.to.cleanup();
  }


Q:Within a method m in a class C, isn't this.getClass() always C?

No. It's possible that for some object x that is an instance of some subclass C1 of C either there is no C1.m() method, or some method on x called super.m(). In either case, this.getClass() is C1, not C within the body of C.m(). If C is final, then you're ok.
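A minimal sketch of the first case, where the subclass C1 does not define its own m(). The class names follow the text; the demo class GetClassDemo is a hypothetical name added for illustration.

```java
// C.m() reports the run-time class of the object it is invoked on.
class C {
    Class<?> m() {
        return this.getClass();
    }
}

// C1 does not override m(), so C.m() runs with "this" bound to a C1.
class C1 extends C {
}

// Hypothetical driver class, not part of the original text.
class GetClassDemo {
    public static void main(String[] args) {
        System.out.println(new C().m() == C.class);    // true
        System.out.println(new C1().m() == C.class);   // false
        System.out.println(new C1().m() == C1.class);  // true
    }
}
```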

Q: I defined an equals method, but Hashtable ignores it. Why?

equals methods are surprisingly hard to get right. Here are the places to look first for a problem:
  1. You defined the wrong equals method. For example, you wrote:

    public class C {
    public boolean equals(C that) { return id(this) == id(that); }
    }

    But in order for table.get(c) to work you need to make the equals method take an Object as the argument, not a C:

    public class C {
    public boolean equals(Object that) {
    return (that instanceof C) && id(this) == id((C)that);
    }
    }

    Why? The code for Hashtable.get looks something like this:

    public class Hashtable {
    public Object get(Object key) {
    Object entry;
    ...
    if (entry.equals(key)) ...
    }
    }

    Now the method invoked by entry.equals(key) depends upon the actual run-time type of the object referenced by entry, and the declared, compile-time type of the variable key. So when you as a user call table.get(new C(...)), this looks in class C for the equals method with argument of type Object. If you happen to have defined an equals method with argument of type C, that's irrelevant. It ignores that method, and looks for a method with signature equals(Object), eventually finding Object.equals(Object). If you want to over-ride a method, you need to match argument types exactly. In some cases, you may want to have two methods, so that you don't pay the overhead of casting when you know you have an object of the right class:

    public class C {
    public boolean equals(Object that) {
    return (this == that)
    || ((that instanceof C) && this.equals((C)that));
    }

    public boolean equals(C that) {
    return id(this) == id(that); // Or whatever is appropriate for class C
    }
    }

  2. You didn't properly implement equals as an equality predicate: equals must be symmetric, transitive, and reflexive. Symmetric means a.equals(b) must have the same value as b.equals(a). (This is the one most people mess up.) Transitive means that if a.equals(b) and b.equals(c) then a.equals(c) must be true. Reflexive means that a.equals(a) must be true, and is the reason for the (this == that) test above (it's also often good practice to include this because of efficiency reasons: testing for == is faster than looking at all the slots of an object, and to partially break the recursion problem on objects that might have circular pointer chains).
  3. You forgot the hashCode method. Anytime you define an equals method, you should also define a hashCode method. You must make sure that two equal objects have the same hashCode, and if you want better hashtable performance, you should try to make most non-equal objects have different hashCodes. Some classes cache the hash code in a private slot of an object, so that it need be computed only once. If that is the case then you will probably save time in equals if you include a line that says if (this.hashSlot != that.hashSlot) return false.
  4. You didn't handle inheritance properly. First of all, consider if two objects of different class can be equal. Before you say "NO! Of course not!" consider a class Rectangle with width and height fields, and a Box class, which has the above two fields plus depth. Is a Box with depth == 0 equal to the equivalent Rectangle? You might want to say yes. If you are dealing with a non-final class, then it is possible that your class might be subclassed, and you will want to be a good citizen with respect to your subclass. In particular, you will want to allow an extender of your class C to use your C.equals method using super as follows:

    public class C2 extends C {

    int newField = 0;

    public boolean equals(Object that) {
    if (this == that) return true;
    else if (!(that instanceof C2)) return false;
    else return this.newField == ((C2)that).newField && super.equals(that);
    }

    }

    To allow this to work, you have to be careful about how you treat classes in your definition of C.equals. For example, check for that instanceof C rather than that.getClass() == C.class. See the previous IAQ question to learn why. Use this.getClass() == that.getClass() if you are sure that two objects must be of the same class to be considered equals.

  5. You didn't handle circular references properly. Consider:

    public class LinkedList {

    Object contents;
    LinkedList next = null;

    public boolean equals(Object that) {
    return (this == that)
    || ((that instanceof LinkedList) && this.equals((LinkedList)that));
    }

    public boolean equals(LinkedList that) { // Buggy!
    return Util.equals(this.contents, that.contents) &&
    Util.equals(this.next, that.next);
    }

    }

    Here I have assumed there is a Util class with:

      public static boolean equals(Object x, Object y) {
    return (x == y) || (x != null && x.equals(y));
    }

    I wish this method were in Object; without it you always have to throw in tests against null. Anyway, the LinkedList.equals method will never return if asked to compare two LinkedLists with circular references in them (a pointer from one element of the linked list back to another element). See the description of the Common Lisp function list-length for an explanation of how to handle this problem in linear time with only two words of extra storage. (I don't give the answer here in case you want to try to figure it out for yourself first.)
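Points 1 and 3 of the list above can be seen in a small, self-contained sketch. The class names Overloaded, Overridden and EqualsDemo are hypothetical; both classes define hashCode, so the only difference is whether equals takes an Object.

```java
import java.util.Hashtable;

// Overloads equals with a more specific parameter type; Object.equals(Object)
// is NOT overridden, so Hashtable still uses identity comparison.
class Overloaded {
    final int id;
    Overloaded(int id) { this.id = id; }
    public boolean equals(Overloaded that) { return that != null && id == that.id; }
    @Override public int hashCode() { return id; }
}

// Overrides equals(Object) and keeps hashCode consistent with it.
class Overridden {
    final int id;
    Overridden(int id) { this.id = id; }
    @Override public boolean equals(Object that) {
        return (that instanceof Overridden) && id == ((Overridden) that).id;
    }
    @Override public int hashCode() { return id; }
}

class EqualsDemo {
    // Looks up an equal-but-distinct key in a table keyed by Overloaded.
    static boolean foundOverloaded() {
        Hashtable<Object, String> table = new Hashtable<>();
        table.put(new Overloaded(1), "one");
        return table.get(new Overloaded(1)) != null;
    }

    // Same lookup, but the key class overrides equals(Object).
    static boolean foundOverridden() {
        Hashtable<Object, String> table = new Hashtable<>();
        table.put(new Overridden(1), "one");
        return table.get(new Overridden(1)) != null;
    }

    public static void main(String[] args) {
        System.out.println(foundOverloaded());  // false
        System.out.println(foundOverridden());  // true
    }
}
```

Annotating the method with @Override, as in the second class, makes the compiler reject an accidental overload.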


Q: I tried to forward a method to super, but it occasionally doesn't work. Why?

This is the code in question, simplified for this example:

/** A version of Hashtable that lets you do
* table.put("dog", "canine");, and then have
* table.get("dogs") return "canine". **/

public class HashtableWithPlurals extends Hashtable {

/** Make the table map both key and key + "s" to value. **/
public Object put(Object key, Object value) {
super.put(key + "s", value);
return super.put(key, value);
}
}

You need to be careful when passing to super that you fully understand what the super method does. In this case, the contract for Hashtable.put is that it will record a mapping between the key and the value in the table. However, if the hashtable gets too full, then Hashtable.put will allocate a larger array for the table, copy all the old objects over, and then recursively re-call table.put(key, value). Now, because Java resolves methods based on the runtime type of the target, in our example this recursive call within the code for Hashtable will go to HashtableWithPlurals.put(key, value), and the net result is that occasionally (when the size of the table overflows at just the wrong time), you will get an entry for "dogss" as well as for "dogs" and "dog". Now, does it state anywhere in the documentation for put that doing this recursive call is a possibility? No. In cases like this, it sure helps to have source code access to the JDK.
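The effect is easy to reproduce with a toy base class. The sketch below does not use java.util.Hashtable (current implementations may not re-call put when they grow); GrowingTable is a hypothetical stand-in that behaves the way the paragraph above describes, and PluralTable plays the role of HashtableWithPlurals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical base class: when full, it grows and then re-invokes put.
class GrowingTable {
    private final Map<String, String> entries = new HashMap<>();
    private int capacity = 2;

    public String put(String key, String value) {
        if (entries.size() >= capacity && !entries.containsKey(key)) {
            capacity *= 2;
            // Dynamic dispatch: in a subclass instance this calls the subclass's put.
            return put(key, value);
        }
        return entries.put(key, value);
    }

    // Sorted copy of the keys, for predictable printing.
    public Set<String> keys() {
        return new TreeSet<>(entries.keySet());
    }
}

// Maps both key and key + "s" to value, forwarding to super.
class PluralTable extends GrowingTable {
    @Override
    public String put(String key, String value) {
        super.put(key + "s", value);
        return super.put(key, value);
    }
}

class ForwardingDemo {
    public static void main(String[] args) {
        PluralTable table = new PluralTable();
        table.put("cat", "feline");   // fits without growing
        table.put("dog", "canine");   // triggers growth inside super.put
        System.out.println(table.keys());  // [cat, cats, dog, dogs, dogss]
    }
}
```

The stray "dogss" entry appears only for the key that happens to trigger growth, which is why the bug shows up occasionally rather than always.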


Q: Why does my Properties object ignore the defaults when I do a get?

You shouldn't do a get on a Properties object; you should do a getProperty instead. Many people assume that the only difference is that getProperty has a declared return type of String, while get is declared to return an Object. But actually there is a bigger difference: getProperty looks at the defaults. get is inherited from Hashtable, and it ignores the default, thereby doing exactly what is documented in the Hashtable class, but probably not what you expect. Other methods that are inherited from Hashtable (like isEmpty and toString) will also ignore defaults. Example code:

Properties defaults = new Properties();
defaults.put("color", "black");

Properties props = new Properties(defaults);

System.out.println(props.get("color") + ", " +
    props.getProperty("color"));
// This prints "null, black"

Is this justified by the documentation? Maybe. The documentation in Hashtable talks about entries in the table, and the behavior of Properties is consistent if you assume that defaults are not entries in the table. If for some reason you thought defaults were entries (as you might be led to believe by the behavior of getProperty) then you will be confused.


Q:Inheritance seems error-prone. How can I guard against these errors?

The previous two questions show that a programmer needs to be very careful when extending a class, and sometimes just in using a class that extends another class. Problems like these two led John Ousterhout to say "Implementation inheritance causes the same intertwining and brittleness that have been observed when goto statements are overused. As a result, OO systems often suffer from complexity and lack of reuse." (Scripting, IEEE Computer, March 1998) and Edsger Dijkstra to allegedly say "Object-oriented programming is an exceptionally bad idea which could only have originated in California." (from a collection of signature files). I don't think there's a general way to ensure being safe, but there are a few things to be aware of:
  • Extending a class that you don't have source code for is always risky; the documentation may be incomplete in ways you can't foresee.
  • Calling super tends to make these unforeseen problems jump out.
  • You need to pay as much attention to the methods that you don't over-ride as the methods that you do. This is one of the big fallacies of Object-Oriented design using inheritance. It is true that inheritance lets you write less code. But you still have to think about the code you don't write.
  • You're especially looking for trouble if the subclass changes the contract of any of the methods, or of the class as a whole. It is difficult to tell when a contract is changed, since contracts are informal (there is a formal part in the type signature, but the rest appears only in comments). In the Properties example, it is not clear if a contract is being broken, because it is not clear if the defaults are to be considered "entries" in the table or not.

Q:What are some alternatives to inheritance?

Delegation is an alternative to inheritance. Delegation means that you include an instance of another class as an instance variable, and forward messages to the instance. It is often safer than inheritance because it forces you to think about each message you forward, because the instance is of a known class, rather than a new class, and because it doesn't force you to accept all the methods of the super class: you can provide only the methods that really make sense. On the other hand, it makes you write more code, and it is harder to re-use (because it is not a subclass).

For the HashtableWithPlurals example, delegation would give you this (note: as of JDK 1.2, Dictionary is considered obsolete; use Map instead):

/** A version of Hashtable that lets you do
* table.put("dog", "canine");, and then have
* table.get("dogs") return "canine". **/

public class HashtableWithPlurals extends Dictionary {

Hashtable table = new Hashtable();

/** Make the table map both key and key + "s" to value. **/
public Object put(Object key, Object value) {
table.put(key + "s", value);
return table.put(key, value);
}

... // Need to implement other methods as well
}

The Properties example, if you wanted to enforce the interpretation that default values are entries, would be better done with delegation. Why was it done with inheritance, then? Because the Java implementation team was rushed, and took the course that required writing less code.
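Following the note about Map, here is a minimal sketch of the same delegation idea built on java.util.HashMap. The name PluralMap and the choice to expose only put, get and size are assumptions made for illustration; a real class would forward whichever methods make sense for it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical delegating class: it holds a Map instead of extending one.
class PluralMap {
    private final Map<String, Object> table = new HashMap<>();

    /** Make the table map both key and key + "s" to value. */
    public Object put(String key, Object value) {
        table.put(key + "s", value);
        return table.put(key, value);
    }

    public Object get(String key) {
        return table.get(key);
    }

    public int size() {
        return table.size();
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        PluralMap map = new PluralMap();
        map.put("dog", "canine");
        System.out.println(map.get("dogs"));   // canine
        System.out.println(map.get("dogss"));  // null
    }
}
```

Because the inner HashMap never calls back into PluralMap.put, its internal resizing cannot produce a "dogss" entry.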


Q: Why are there no global variables in Java?

Global variables are considered bad form for a variety of reasons:
  • Adding state variables breaks referential transparency (you no longer can understand a statement or expression on its own: you need to understand it in the context of the settings of the global variables).
  • State variables lessen the cohesion of a program: you need to know more to understand how something works. A major point of Object-Oriented programming is to break up global state into more easily understood collections of local state.
  • When you add one variable, you limit the use of your program to one instance. What you thought was global, someone else might think of as local: they may want to run two copies of your program at once.
For these reasons, Java decided to ban global variables.

Q: I still miss global variables. What can I do instead?

That depends on what you want to do. In each case, you need to decide two things: how many copies of this so-called global variable do I need? And where would be a convenient place to put it? Here are some common solutions:

If you really want only one copy each time a user invokes Java by starting up a Java virtual machine, then you probably want a static variable. For example, you have a MainWindow class in your application, and you want to count the number of windows that the user has opened, and initiate the "Really quit?" dialog when the user has closed the last one. For that, you want:
// One variable per class (per JVM)
public class MainWindow {
  static int numWindows = 0;
  ...
  // when opening: MainWindow.numWindows++;
  // when closing: MainWindow.numWindows--;
}
In many cases, you really want a class instance variable. For example, suppose you wrote a web browser and wanted to have the history list as a global variable. In Java, it would make more sense to have the history list be an instance variable in the Browser class. Then a user could run two copies of the browser at once, in the same JVM, without having them step on each other.
// One variable per instance
public class Browser {
HistoryList history = new HistoryList();
...
// Make entries in this.history
}
Now suppose that you have completed the design and most of the implementation of your browser, and you discover that, deep down in the details of, say, the Cookie class, inside the Http class, you want to display an error message. But you don't know where to display the message. You could easily add an instance variable to the Browser class to hold the display stream or frame, but you haven't passed the current instance of the browser down into the methods in the Cookie class. You don't want to change the signatures of many methods to pass the browser along. You can't use a static variable, because there might be multiple browsers running. However, if you can guarantee that there will be only one browser running per thread (even if each browser may have multiple threads) then there is a good solution: store a table of thread-to-browser mappings as a static variable in the Browser class, and look up the right browser (and hence display) to use via the current thread:
// One "variable" per thread
public class Browser {
  static Hashtable browsers = new Hashtable();
  public Browser() { // Constructor
    browsers.put(Thread.currentThread(), this);
  }
  ...
  public void reportError(String message) {
    Thread t = Thread.currentThread();
    ((Browser)Browser.browsers.get(t))
      .show(message);
  }
}
Finally, suppose you want the value of a global variable to persist between invocations of the JVM, or to be shared among multiple JVMs in a network of machines. Then you probably should use a database which you access through JDBC, or you should serialize data and write it to a file.
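Returning to the one-"variable"-per-thread case: since JDK 1.2 the class java.lang.ThreadLocal offers the same per-thread lookup without a hand-built table. The sketch below is one possible arrangement, not the author's code; the name field, the displayed buffer and the BrowserDemo helper are hypothetical stand-ins for a real display. Like the Hashtable version, it only registers the thread that ran the constructor.

```java
class Browser {
    // One slot per thread; each thread sees only the value it set itself.
    private static final ThreadLocal<Browser> CURRENT = new ThreadLocal<>();

    private final String name;                                    // hypothetical label
    private final StringBuilder displayed = new StringBuilder();  // stands in for a window

    Browser(String name) {
        this.name = name;
        CURRENT.set(this);  // register this browser for the constructing thread
    }

    void show(String message) {
        displayed.append(name).append(": ").append(message);
    }

    String displayed() {
        return displayed.toString();
    }

    // Callable from deep inside other classes without passing a Browser along.
    static void reportError(String message) {
        CURRENT.get().show(message);
    }
}

class BrowserDemo {
    // Creates a Browser on a fresh thread, reports an error from that thread,
    // and returns what that browser displayed.
    static String displayedOn(String name, String message) {
        Browser[] holder = new Browser[1];
        Thread t = new Thread(() -> {
            holder[0] = new Browser(name);
            Browser.reportError(message);
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0].displayed();
    }

    public static void main(String[] args) {
        System.out.println(displayedOn("A", "cookie rejected"));  // A: cookie rejected
        System.out.println(displayedOn("B", "timeout"));          // B: timeout
    }
}
```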


Q: Can I write sin(x) instead of Math.sin(x)?

Short answer: Before Java 1.5, no. As of Java 1.5, yes, using static imports; you can now write import static java.lang.Math.* and then use sin(x) with impunity. But note the warning from Sun: "So when should you use static import? Very sparingly!"
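A minimal sketch of the static-import form. It imports the individual members it uses rather than the wildcard, in keeping with the advice to use the feature sparingly; the class and method names are hypothetical.

```java
import static java.lang.Math.PI;
import static java.lang.Math.cos;
import static java.lang.Math.sin;

class StaticImportDemo {
    // sin and cos resolve to java.lang.Math through the static imports above.
    static double sumOfSquares(double x) {
        return sin(x) * sin(x) + cos(x) * cos(x);
    }

    public static void main(String[] args) {
        System.out.println(sin(PI / 6));
        System.out.println(sumOfSquares(0.3));
    }
}
```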

Here are some of the options that could be used before Java 1.5:

If you only want a few methods, you can put in calls to them within your own class:
public static double sin(double x) { return Math.sin(x); }
public static double cos(double x) { return Math.cos(x); }
...
sin(x)
Static methods take a target (thing to the left of the dot) that is either a class name, or is an object whose value is ignored, but must be declared to be of the right class. So you could save three characters per call by doing:
// Can't instantiate Math, so it must be null.
Math m = null;
...
m.sin(x)
java.lang.Math is a final class, so you can't inherit from it, but if you have your own set of static methods that you would like to share among many of your own classes, then you can package them up and inherit them:
public abstract class MyStaticMethods {
public static double mysin(double x) { ... }
}

public class MyClass1 extends MyStaticMethods {
...
mysin(x)
}

Peter van der Linden, author of Just Java, recommends against both of the last two practices in his FAQ. I agree with him that Math m = null is a bad idea in most cases, but I'm not convinced that the MyStaticMethods demonstrates "very poor OOP style to use inheritance to obtain a trivial name abbreviation (rather than to express a type hierarchy)." First of all, trivial is in the eye of the beholder; the abbreviation may be substantial. (See an example of how I used this approach to what I thought was good effect.) Second, it is rather presumptuous to say that this is very bad OOP style. You could make a case that it is bad Java style, but in languages with multiple inheritance, this idiom would be more acceptable.

Another way of looking at it is that features of Java (and any language) necessarily involve trade-offs, and conflate many issues. I agree it is bad to use inheritance in such a way that you mislead the user into thinking that MyClass1 is inheriting behavior from MyStaticMethods, and it is bad to prohibit MyClass1 from extending whatever other class it really wants to extend. But in Java the class is also the unit of encapsulation, compilation (mostly), and name scope. The MyStaticMethods approach scores negative points on the type hierarchy front, but positive points on the name scope front. If you say that the type hierarchy view is more important, I won't argue with you. But I will argue if you think of a class as doing only one thing, rather than many things at once, and if you think of style guides as absolute rather than as trade-offs.


Q: Is null an Object?

Absolutely not. By that, I mean (null instanceof Object) is false. Some other things you should know about null:
  1. You can't call a method on null: x.m() is an error when x is null and m is a non-static method. (When m is a static method it is fine, because it is the class of x that matters; the value is ignored.)
  2. There is only one null, not one for each class. Thus, ((String) null == (Hashtable) null), for example.
  3. It is ok to pass null as an argument to a method, as long as the method is expecting it. Some methods do; some do not. So, for example, System.out.println((Object) null) is ok (the cast is needed because a bare null literal is ambiguous among the overloads of println), but string.compareTo(null) is not. For methods you write, your javadoc comments should say whether null is ok, unless it is obvious.
  4. In JDK 1.1 to 1.1.5, passing null as the literal argument to a constructor of an anonymous inner class (e.g., new SomeClass(null) { ...}) caused a compiler error. It's ok to pass an expression whose value is null, or to pass a coerced null, like new SomeClass((String) null) { ...}.
  5. There are at least three different meanings that null is commonly used to express:
    • Uninitialized. A variable or slot that hasn't yet been assigned its real value.
    • Non-existent/not applicable. For example, terminal nodes in a binary tree might be represented by a regular node with null child pointers.
    • Empty. For example, you might use null to represent the empty tree. Note that this is subtly different from the previous case, although some people make the mistake of confusing the two cases. The difference is whether null is an acceptable tree node, or whether it is a signal to not treat the value as a tree node. Compare the following three implementations of binary tree nodes with an in-order print method:

// null means not applicable
// There is no empty tree.

class Node {
  Object data;
  Node left, right;

  void print() {
    if (left != null)
      left.print();
    System.out.println(data);
    if (right != null)
      right.print();
  }
}

// null means empty tree
// Note static, non-static methods

class Node {
  Object data;
  Node left, right;

  static void print(Node node) {
    if (node != null) node.print();
  }

  void print() {
    print(left);
    System.out.println(data);
    print(right);
  }
}

// Separate class for Empty
// null is never used

interface Node { void print(); }

class DataNode implements Node {
  Object data;
  Node left, right;

  public void print() {
    left.print();
    System.out.println(data);
    right.print();
  }
}

class EmptyNode implements Node {
  public void print() { }
}
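Going back to the first point in the list (and the opening claim about instanceof), a short sketch makes the behavior concrete. NullDemo and its method names are hypothetical, chosen for illustration.

```java
class NullDemo {
    static String greet() {
        return "hello";
    }

    // instanceof is false for null, whatever the type on the right.
    static boolean nullIsObject() {
        Object x = null;
        return x instanceof Object;
    }

    // A static method is selected by the declared type of d; the null value is ignored.
    static String staticViaNull() {
        NullDemo d = null;
        return d.greet();
    }

    // A non-static method invoked on null throws NullPointerException.
    static boolean instanceCallThrows() {
        String s = null;
        try {
            s.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullIsObject());        // false
        System.out.println(staticViaNull());       // hello
        System.out.println(instanceCallThrows());  // true
    }
}
```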


Q: How big is an Object? Why is there no sizeof?

C has a sizeof operator, and it needs to have one, because the user has to manage calls to malloc, and because the size of primitive types (like long) is not standardized. Java doesn't need a sizeof, but it would still have been a convenient aid. Since it's not there, you can do this:

static Runtime runtime = Runtime.getRuntime();
...
long start, end;
Object obj;
runtime.gc();
start = runtime.freeMemory();
obj = new Object(); // Or whatever you want to look at
end = runtime.freeMemory();
System.out.println("That took " + (start-end) + " bytes.");

This method is not foolproof, because a garbage collection could occur in the middle of the code you are instrumenting, throwing off the byte count. Also, if you are using a just-in-time compiler, some bytes may come from generating code.

You might be surprised to find that an Object takes 16 bytes, or 4 words, in the Sun JDK VM. This breaks down as follows: There is a two-word header, where one word is a pointer to the object's class, and the other points to the instance variables. Even though Object has no instance variables, Java still allocates one word for the variables. Finally, there is a "handle", which is another pointer to the two-word header. Sun says that this extra level of indirection makes garbage collection simpler. (There have been high performance Lisp and Smalltalk garbage collectors that do not use the extra level for at least 15 years. I have heard but have not confirmed that the Microsoft JVM does not have the extra level of indirection.)

An empty new String() takes 40 bytes, or 10 words: 3 words of pointer overhead, 3 words for the instance variables (the start index, end index, and character array), and 4 words for the empty char array. Creating a substring of an existing string takes "only" 6 words, because the char array is shared. Putting an Integer key and Integer value into a Hashtable takes 64 bytes (in addition to the four bytes that were pre-allocated in the Hashtable array): I'll let you work out why.


Q: In what order is initialization code executed? What should I put where?

Instance variable initialization code can go in three places within a class:

In an instance variable initializer for a class (or a superclass).
class C {
String var = "val";

In a constructor for a class (or a superclass).
    public C() { var = "val"; }

In an object initializer block. This is new in Java 1.1; it's just like a static initializer block but without the keyword static.
    { var = "val"; }
}

The order of evaluation (ignoring out of memory problems) when you say new C() is:

  1. Call a constructor for C's superclass (unless C is Object, in which case it has no superclass). It will always be the no-argument constructor, unless the programmer explicitly coded super(...) as the very first statement of the constructor.
  2. Once the super constructor has returned, execute any instance variable initializers and object initializer blocks in textual (left-to-right) order. Don't be confused by the fact that javadoc and javap use alphabetical ordering; that's not important here.
  3. Now execute the remainder of the body for the constructor. This can set instance variables or do anything else.
In general, you have a lot of freedom to choose any of these three forms. My recommendation is to use instance variable initializers in cases where there is a variable that takes the same value regardless of which constructor is used. Use object initializer blocks only when initialization is complex (e.g. it requires a loop) and you don't want to repeat it in multiple constructors. Use a constructor for the rest.

Here's another example:

Program:
class A {
String a1 = ABC.echo(" 1: a1");
String a2 = ABC.echo(" 2: a2");
public A() {ABC.echo(" 3: A()");}
}

class B extends A {
String b1 = ABC.echo(" 4: b1");
String b2;
public B() {
ABC.echo(" 5: B()");
b1 = ABC.echo(" 6: b1 reset");
a2 = ABC.echo(" 7: a2 reset");
}
}

class C extends B {
String c1;
{ c1 = ABC.echo(" 8: c1"); }
String c2;
String c3 = ABC.echo(" 9: c3");

public C() {
ABC.echo("10: C()");
c2 = ABC.echo("11: c2");
b2 = ABC.echo("12: b2");
}
}

public class ABC {
static String echo(String arg) {
System.out.println(arg);
return arg;
}

public static void main(String[] args) {
new C();
}
}

Output:
 1: a1
 2: a2
 3: A()
 4: b1
 5: B()
 6: b1 reset
 7: a2 reset
 8: c1
 9: c3
10: C()
11: c2
12: b2


Q: What about class initialization?

It is important to distinguish class initialization from instance creation. An instance is created when you call a constructor with new. A class C is initialized the first time it is actively used. At that time, the initialization code for the class is run, in textual order. There are two kinds of class initialization code: static initializer blocks (static { ... }), and class variable initializers (static String var = ...).

Active use is defined as the first time you do any one of the following:

  1. Create an instance of C by calling a constructor;
  2. Call a static method that is defined in C (not inherited);
  3. Assign or access a static variable that is declared (not inherited) in C. It does not count if the static variable is initialized with a constant expression (one involving only primitive operators (like + or ||), literals, and static final variables), because these are initialized at compile time.

Here is an example:

Program:
class A {
static String a1 = ABC.echo(" 1: a1");
static String a2 = ABC.echo(" 2: a2");
}

class B extends A {
static String b1 = ABC.echo(" 3: b1");
static String b2;
static {
ABC.echo(" 4: B()");
b1 = ABC.echo(" 5: b1 reset");
a2 = ABC.echo(" 6: a2 reset");
}
}

class C extends B {
static String c1;
static { c1 = ABC.echo(" 7: c1"); }
static String c2;
static String c3 = ABC.echo(" 8: c3");

static {
ABC.echo(" 9: C()");
c2 = ABC.echo("10: c2");
b2 = ABC.echo("11: b2");
}
}

public class ABC {
static String echo(String arg) {
System.out.println(arg);
return arg;
}

public static void main(String[] args) {
new C();
}
}

Output:
 1: a1
 2: a2
 3: b1
 4: B()
 5: b1 reset
 6: a2 reset
 7: c1
 8: c3
 9: C()
10: c2
11: b2
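The exception for constant expressions in point 3 of the active-use list can be observed directly. In this sketch the class names InitLog, Lazy and ClassInitDemo are hypothetical; Lazy records its own initialization in a separate class so that checking the record does not itself touch Lazy.

```java
import java.util.ArrayList;
import java.util.List;

// Records initialization events; kept apart from the class under observation.
class InitLog {
    static final List<String> events = new ArrayList<>();
}

class Lazy {
    static final int CONSTANT = 42;  // compile-time constant: reading it is not an active use
    static int counter = 7;          // ordinary static variable: reading it is an active use

    static {
        InitLog.events.add("Lazy initialized");
    }
}

class ClassInitDemo {
    // Returns the number of logged events after each of the two reads.
    static int[] run() {
        int[] sizes = new int[2];
        int k = Lazy.CONSTANT;
        sizes[0] = InitLog.events.size();
        int c = Lazy.counter;
        sizes[1] = InitLog.events.size();
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println(sizes[0]);  // 0: Lazy not yet initialized
        System.out.println(sizes[1]);  // 1: initialized by the read of counter
    }
}
```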


Q: I have a class with six instance variables, each of which could be initialized or not. Should I write 64 constructors?

Of course you don't need 64 (2^6) constructors. Let's say you have a class C defined as follows:

public class C { int a,b,c,d,e,f; }

Here are some things you can do for constructors:

  1. Guess at what combinations of variables will likely be wanted, and provide constructors for those combinations. Pro: That's how it's usually done. Con: Difficult to guess correctly; lots of redundant code to write.

  2. Define setters that can be cascaded because they return this. That is, define a setter for each instance variable, then use them after a call to the default constructor:

    public C setA(int val) { a = val; return this; }
    ...
    new C().setA(1).setC(3).setE(5);

    Pro: This is a reasonably simple and efficient approach. A similar idea is discussed by Bjarne Stroustrup on page 156 of The Design and Evolution of C++. Con: You need to write all the little setters, they aren't JavaBean-compliant (since they return this, not void), and they don't work if there are interactions between two values.

  3. Use the default constructor for an anonymous sub-class with a non-static initializer:

    new C() {{ a = 1; c = 3; e = 5; }}

    Pro: Very concise; no mess with setters. Con: The instance variables can't be private, you have the overhead of a sub-class, your object won't actually have C as its class (although it will still be an instanceof C), it only works if you have accessible instance variables, and many people, including experienced Java programmers, won't understand it. Actually, it's quite simple: You are defining a new, unnamed (anonymous) subclass of C, with no new methods or variables, but with an initialization block that initializes a, c, and e. Along with defining this class, you are also making an instance. When I showed this to Guy Steele, he said "heh, heh! That's pretty cute, all right, but I'm not sure I would advocate widespread use..." As usual, Guy is right. (By the way, you can also use this to create and initialize a vector. You know how great it is to create and initialize, say, a String array with new String[] {"one", "two", "three"}. Now with inner classes you can do the same thing for a vector, where previously you thought you'd have to use assignment statements: new Vector(3) {{add("one"); add("two"); add("three");}}.)

  4. You can switch to a language that directly supports this idiom. For example, C++ has optional arguments. So you can do this:

    class C {
    public: C(int a=1, int b=2, int c=3, int d=4, int e=5);
    };
    ...
    new C(10); // Construct an instance with defaults for b,c,d,e

    Common Lisp and Python have keyword arguments as well as optional arguments, so you can do this:

    C(a=10, c=30, e=50)            # Construct an instance; use defaults for b and d.
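Options 2 and 3 from the list above can be put side by side in a runnable sketch. The class name Options, its three fields and the helper class are hypothetical, trimmed down from the six-variable class C for brevity.

```java
class Options {
    int a, b, c;  // not private, so the initializer-block form below can reach them

    // Cascadable setters: each returns this.
    Options setA(int val) { a = val; return this; }
    Options setB(int val) { b = val; return this; }
    Options setC(int val) { c = val; return this; }
}

class ConstructorOptionsDemo {
    // Option 2: default constructor followed by cascaded setters.
    static Options viaSetters() {
        return new Options().setA(1).setC(3);
    }

    // Option 3: anonymous subclass with an instance initializer block.
    static Options viaInitializerBlock() {
        return new Options() {{ a = 1; c = 3; }};
    }

    public static void main(String[] args) {
        Options s = viaSetters();
        Options i = viaInitializerBlock();
        System.out.println(s.a + " " + s.b + " " + s.c);      // 1 0 3
        System.out.println(i.a + " " + i.b + " " + i.c);      // 1 0 3
        System.out.println(s.getClass() == Options.class);    // true
        System.out.println(i.getClass() == Options.class);    // false
        System.out.println(i instanceof Options);             // true
    }
}
```

The last three lines show the cost named in the text: the object built with the initializer block is an instance of an anonymous subclass, not of Options itself.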


Q:When should I use constructors, and when should I use other methods?

The glib answer is to use constructors when you want a new object; that's what the keyword new is for. The infrequent answer is that constructors are often over-used, both in when they are called and in how much they have to do. Here are some points to consider:
  • Modifiers: As we saw in the previous question, one can go overboard in providing too many constructors. It is usually better to minimize the number of constructors, and then provide modifier methods that do the rest of the initialization. If the modifiers return this, then you can create a useful object in one expression; if not, you will need to use a series of statements. Modifiers are good because often the changes you want to make during construction are also changes you will want to make later, so why duplicate code between constructors and methods?
  • Factories: Often you want to create something that is an instance of some class or interface, but you either don't care exactly which subclass to create, or you want to defer that decision to runtime. For example, if you are writing a calculator applet, you might wish that you could call new Number(string), and have this return a Double if string is in floating point format, or a Long if string is in integer format. But you can't do that for two reasons: Number is an abstract class, so you can't invoke its constructor directly, and any call to a constructor must return a new instance of that class directly, not of a subclass. A method which returns objects like a constructor but that has more freedom in how the object is made (and what type it is) is called a factory. Java has no built-in support or conventions for factories, but you will want to invent conventions for using them in your code.
  • Caching and Recycling: A constructor must create a new object. But creating a new object is a fairly expensive operation. Just as in the real world, you can avoid costly garbage collection by recycling. For example, new Boolean(x) creates a new Boolean, but you should almost always use instead (x ? Boolean.TRUE : Boolean.FALSE), which recycles an existing value rather than wastefully creating a new one. Java would have been better off if it advertised a method that did just this, rather than advertising the constructor. Boolean is just one example; you should also consider recycling of other immutable classes, including Character, Integer, and perhaps many of your own classes. Below is an example of a recycling factory for Numbers. If I had my choice, I would call this Number.make, but of course I can't add methods to the Number class, so it will have to go somewhere else.

    public Number numberFactory(String str) throws NumberFormatException {
        try {
            long l = Long.parseLong(str);
            if (l >= 0 && l < cachedLongs.length) {
                int i = (int)l;
                if (cachedLongs[i] != null) return cachedLongs[i];
                else return cachedLongs[i] = new Long(str);
            } else {
                return new Long(l);
            }
        } catch (NumberFormatException e) {
            double d = Double.parseDouble(str);
            return d == 0.0 ? ZERO : d == 1.0 ? ONE : new Double(d);
        }
    }

    private Long[] cachedLongs = new Long[100];
    private Double ZERO = new Double(0.0);
    private Double ONE = new Double(1.0);

We see that new is a useful convention, but that factories and recycling are also useful. Java chose to support only new because it is the simplest possibility, and the Java philosophy is to keep the language itself as simple as possible. But that doesn't mean your class libraries need to stick to the lowest common denominator. (And it shouldn't have meant that the built-in libraries stuck to it, but alas, they did.)
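To make the Modifiers point above concrete, here is a minimal sketch of a class with one small constructor plus modifier methods that return this. The class Gadget, its fields, and its default values are hypothetical, invented only for this illustration.

```java
// Hypothetical example class; the name, fields and defaults are illustrative only.
class Gadget {
    private final String title;
    private int width = 100;          // defaults replace the extra constructors
    private int height = 100;         // we chose not to write
    private boolean visible = false;

    Gadget(String title) { this.title = title; }

    // Each modifier returns this, so calls can be chained in one expression.
    Gadget size(int width, int height) {
        this.width = width;
        this.height = height;
        return this;
    }

    Gadget visible(boolean visible) {
        this.visible = visible;
        return this;
    }

    String describe() {
        return title + " " + width + "x" + height + (visible ? " shown" : " hidden");
    }

    public static void main(String[] args) {
        // One expression creates a fully configured object.
        Gadget g = new Gadget("Calc").size(300, 200).visible(true);
        System.out.println(g.describe());   // prints Calc 300x200 shown

        // The same modifier is reused for a later change; no code is duplicated.
        g.visible(false);
        System.out.println(g.describe());   // prints Calc 300x200 hidden
    }
}
```

If the modifiers returned void instead, the first line of main would have to become three separate statements.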

Q: Will I get killed by the overhead of object creation and GC?

Suppose the application has to do with manipulating lots of 3D geometric points. The obvious Java way to do it is to have a class Point with doubles for x,y,z coordinates. But allocating and garbage collecting lots of points can indeed cause a performance problem. You can help by managing your own storage in a resource pool. Instead of allocating each point when you need it, you can allocate a large array of Points at the start of the program. The array (wrapped in a class) acts as a factory for Points, but it is a socially-conscious recycling factory. The method call pool.point(x,y,z) takes the first unused Point in the array, sets its 3 fields to the specified values, and marks it as used.

Now you as a programmer are responsible for returning Points to the pool once they are no longer needed. There are several ways to do this. The simplest is when you know you will be allocating Points in blocks that are used for a while, and then discarded. Then you do int pos = pool.mark() to mark the current position of the pool. When you are done with the section of code, you call pool.restore(pos) to set the mark back to the position. If there are a few Points that you would like to keep, just allocate them from a different pool.

The resource pool saves you from garbage collection costs (if you have a good model of when your objects will be freed) but you still have the initial object creation costs. You can get around that by going "back to Fortran": using arrays of x,y and z coordinates rather than individual point objects. You have a class of Points but no class for an individual point. Consider this resource pool class:

 
public class PointPool {
    /** Allocate a pool of n Points. **/
    public PointPool(int n) {
        x = new double[n];
        y = new double[n];
        z = new double[n];
        next = 0;
    }
    public double x[], y[], z[];

    /** Initialize the next point, represented as an integer index. **/
    int point(double x1, double y1, double z1) {
        x[next] = x1; y[next] = y1; z[next] = z1;
        return next++;
    }

    /** Initialize the next point, initialized to zeros. **/
    int point() { return point(0.0, 0.0, 0.0); }

    /** Initialize the next point as a copy of a point in some pool. **/
    int point(PointPool pool, int p) {
        return point(pool.x[p], pool.y[p], pool.z[p]);
    }

    public int next;
}
You would use this class as follows:

 
PointPool pool = new PointPool(1000000);
PointPool results = new PointPool(100);
...
int pos = pool.next;
doComplexCalculation(...);
pool.next = pos;

...

void doComplexCalculation(...) {
    ...
    int p1 = pool.point(x, y, z);
    int p2 = pool.point(p, q, r);
    double diff = pool.x[p1] - pool.x[p2];
    ...
    int p_final = results.point(pool,p1);
    ...
}

Allocating a million points took half a second for the PointPool approach, and 6 seconds for the straightforward approach that allocates a million instances of a Point class, so that's a 12-fold speedup.

Wouldn't it be nice if you could declare p1, p2 and p_final as Point rather than int? In C or C++, you could just do typedef int Point, but Java doesn't allow that. If you're adventurous, you can set up make files to run your files through the C preprocessor before the Java compiler, and then you can do #define Point int.
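
For comparison, the first variant described at the start of this answer (a preallocated array of Point objects with mark and restore) might look something like the sketch below. The text gives only the calls pool.point(x,y,z), pool.mark() and pool.restore(pos); the class names Point3 and ObjectPointPool, the field layout, and the exhaustion check are assumptions made for this illustration.

```java
// Sketch only: Point3 and ObjectPointPool are hypothetical names.
class Point3 {
    double x, y, z;
}

class ObjectPointPool {
    private final Point3[] points;
    private int next = 0;   // index of the first unused point

    /** Allocate all n points up front, so none are created later. */
    ObjectPointPool(int n) {
        points = new Point3[n];
        for (int i = 0; i < n; i++) points[i] = new Point3();
    }

    /** Take the first unused point and overwrite its three fields. */
    Point3 point(double x, double y, double z) {
        if (next >= points.length) {
            throw new IllegalStateException("pool exhausted");
        }
        Point3 p = points[next++];
        p.x = x; p.y = y; p.z = z;
        return p;
    }

    /** Remember the current position of the pool. */
    int mark() { return next; }

    /** Recycle every point handed out since the matching mark(). */
    void restore(int pos) { next = pos; }

    public static void main(String[] args) {
        ObjectPointPool pool = new ObjectPointPool(1000);
        int pos = pool.mark();
        Point3 a = pool.point(1.0, 2.0, 3.0);
        pool.restore(pos);                      // a is now free for reuse
        Point3 b = pool.point(4.0, 5.0, 6.0);
        System.out.println(a == b);             // prints true: same object, recycled
    }
}
```

Note the hazard the last line demonstrates: after restore, any reference you kept to a recycled point silently sees the new values, which is why points you want to keep belong in a different pool.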


Q: I have a complex expression inside a loop. For efficiency, I'd like the computation to be done only once. But for readability, I want it to stay inside the loop where it is used. What can I do?

Let's assume an example where match is a regular expression pattern match routine, and compile compiles a string into a finite state machine that can be used by match:

for(;;) {
    ...
    String str = ...
    match(str, compile("a*b*c*"));
    ...
}

Since Java has no macros, and little control over time of execution, your choices are limited here. One possibility, although not very pretty, is to use an inner interface with a variable initializer:

for(;;) {
    ...
    String str = ...
    interface P1 {FSA f = compile("a*b*c*");}
    match(str, P1.f);
    ...
}

The value for P1.f gets initialized on the first use of P1, and is not changed, since variables in interfaces are implicitly static final. If you don't like that, you can switch to a language that gives you better control. In Common Lisp, the character sequence #. means to evaluate the following expression at read (compile) time, not run time. So you could write:

(loop
...
(match str #.(compile "a*b*c*"))
...)
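
Here is a runnable sketch of the interface-initializer trick, under some assumptions. It substitutes java.util.regex.Pattern for the hypothetical FSA type, and the names HoistDemo, compileCount and countMatches are invented for the example. It also declares P1 as a member of the enclosing class rather than inside the loop, because, as far as I know, only fairly recent Java compilers accept an interface declared inside a method body.

```java
import java.util.regex.Pattern;

// Sketch only: HoistDemo, compileCount and countMatches are hypothetical names,
// and Pattern stands in for the FSA type used in the text.
class HoistDemo {
    static int compileCount = 0;   // counts how often the expensive step runs

    static Pattern compile(String regex) {
        compileCount++;
        return Pattern.compile(regex);
    }

    // Fields of an interface are implicitly public static final. Because the
    // initializer is a method call rather than a constant expression, it runs
    // when the interface is initialized, on the first use of P1.F.
    interface P1 { Pattern F = compile("a*b*c*"); }

    static int countMatches(String[] inputs) {
        int n = 0;
        for (String str : inputs) {
            if (P1.F.matcher(str).matches()) n++;   // no recompilation here
        }
        return n;
    }

    public static void main(String[] args) {
        String[] inputs = {"aabbc", "abc", "cba", ""};
        System.out.println(countMatches(inputs));   // prints 3
        System.out.println(compileCount);           // prints 1
    }
}
```

The pattern is compiled once no matter how many times the loop body runs or how often countMatches is called.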


Q: What other operations are surprisingly slow?

Where do I begin? Here are a few that are most useful to know about. I wrote a timing utility that runs snippets of code in a loop, reporting the results in terms of thousands of iterations per second (K/sec) and microseconds per iteration (uSecs). Timing was done on a Sparc 20 with the JDK 1.1.4 JIT compiler. I note the following:
  • These were all done in 1998. Compilers have changed since then.
  • Counting down (i.e. for (int i=n; i>0; i--)) is twice as fast as counting up: my machine can count down to 144 million in a second, but up to only 72 million.
  • Calling Math.max(a,b) is 7 times slower than (a > b) ? a : b. This is the cost of a method call.
  • Arrays are 15 to 30 times faster than Vectors. Hashtables are 2/3 as fast as Vectors.
  • Using bitset.get(i) is 60 times slower than bits & 1 << i. This is the cost of a synchronized method call, mostly. Of course, if you want more than 64 bits, you can't use my bit-twiddling example. Here's a chart of times for getting and setting elements of various data structures:

      K/sec     uSecs   Code                    Operation
    =========  =======  ======================  ===========================
      147,058    0.007  a = a & 0x100;          get element of int bits
          314    3.180  bitset.get(3);          get element of Bitset
       20,000    0.050  obj = objs[1];          get element of Array
        5,263    0.190  str.charAt(5);          get element of String
          361    2.770  buf.charAt(5);          get element of StringBuffer
          337    2.960  objs2.elementAt(1);     get element of Vector
          241    4.140  hash.get("a");          get element of Hashtable

          336    2.970  bitset.set(3);          set element of Bitset
        5,555    0.180  objs[1] = obj;          set element of Array
          355    2.810  buf.setCharAt(5,' ')    set element of StringBuffer
          308    3.240  objs2.setElementAt(1    set element of Vector
          237    4.210  hash.put("a", obj);     set element of Hashtable

  • Java compilers are very poor at lifting constant expressions out of loops. The C/Java for loop is a bad abstraction, because it encourages re-computation of the end value in the most typical case. So for(int i=0; i<str.length(); i++) is three times slower than int len = str.length(); for(int i=0; i<len; i++).
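
The timing utility itself is not shown in this article, so the following is only a rough sketch of what such a tool might look like; the class name Timing and its methods are invented here. It reports the same two figures as the chart above (K/sec and uSecs) and uses the two loop forms from the last bullet as sample snippets. Naive loops like this are easily distorted by JIT warm-up and dead-code elimination, so treat any numbers it prints as rough indications, and do not expect a modern JVM to reproduce the 1998 ratios.

```java
// Sketch only: Timing and its method names are hypothetical, not the author's utility.
class Timing {
    /** Thousands of iterations per second. */
    static double kPerSec(long iterations, long elapsedNanos) {
        return (iterations / 1000.0) / (elapsedNanos / 1e9);
    }

    /** Microseconds per iteration. */
    static double usecs(long iterations, long elapsedNanos) {
        return (elapsedNanos / 1000.0) / iterations;
    }

    /** Run the snippet n times and return the elapsed time in nanoseconds. */
    static long time(Runnable snippet, long n) {
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) snippet.run();
        return System.nanoTime() - start;
    }

    static void report(String label, Runnable snippet, long n) {
        time(snippet, n);                       // crude warm-up pass
        long nanos = Math.max(1, time(snippet, n));
        System.out.println(String.format("%,12.0f %8.3f  %s",
                kPerSec(n, nanos), usecs(n, nanos), label));
    }

    public static void main(String[] args) {
        final String str = "a reasonably long string to iterate over";
        final int[] sink = new int[1];          // result is stored so the work is not trivially dead
        long n = 1_000_000;

        report("end value recomputed", () -> {
            for (int i = 0; i < str.length(); i++) sink[0] += str.charAt(i);
        }, n);
        report("end value hoisted", () -> {
            int len = str.length();
            for (int i = 0; i < len; i++) sink[0] += str.charAt(i);
        }, n);
    }
}
```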


Q: Can I get good advice from books on Java?

There are a lot of Java books out there, falling into three classes:

Bad. Most Java books are written by people who couldn't get a job as a Java programmer (since programming almost always pays more than book writing; I know because I've done both). These books are full of errors, bad advice, and bad programs. These books are dangerous to the beginner, but are easily recognized and rejected by a programmer with even a little experience in another language.

Excellent. There are a small number of excellent Java books. I like the official specification and the books by Arnold and Gosling, Marty Hall, and Peter van der Linden. For reference I like the Java in a Nutshell series and the online references at Sun. (I copy the javadoc APIs and the language specification and its amendments to my local disk and bookmark them in my browser so I'll always have fast access.)

Iffy. In between these two extremes is a collection of sloppy writing by people who should know better, but either haven't taken the time to really understand how Java works, or are just rushing to get something published fast. One such example of half-truths is Edward Yourdon's Java and the new Internet programming paradigm from Rise and Resurrection of the American Programmer [footnote on Yourdon]. Here's what Yourdon says about how different Java is:

  • "Functions have been eliminated" It's true that there is no "function" keyword in Java. Java calls them methods (and Perl calls them subroutines, and Scheme calls them procedures, but you wouldn't say these languages have eliminated functions). One could reasonably say that there are no global functions in Java. But I think it would be more precise to say that there are functions with global extent; it's just that they must be defined within a class, and are called "static method C.f" instead of "function f".
  • "Automatic coercions of data types have been eliminated" It's true that there are limits in the coercions that are made, but they are far from eliminated. You can still say (1.0 + 2) and 2 will be automatically coerced to a double. Or you can say ("one" + 2) and 2 will be coerced to a string.
  • "Pointers and pointer arithmetic have been eliminated" It's true that explicit pointer arithmetic has been eliminated (and good riddance). But pointers remain; in fact, every reference to an object is a pointer. (That's why we have NullPointerException.) It is impossible to be a competent Java programmer without understanding this. Every Java programmer needs to know that when you do:
        int[] a = {0, 1, 2};
        int[] b = a;
        b[0] = 99;
    then a[0] is 99 because a and b are pointers (or references) to the same object.
  • "Because structures are gone, and arrays and strings are represented as objects, the need for pointers has largely disappeared." This is also misleading. First of all, structures aren't gone, they're just renamed "classes". What is gone is programmer control over whether structure/class instances are allocated on the heap or on the stack. In Java all objects are allocated on the heap. That is why there is no need for syntactic markers (such as *) for pointers--if it references an object in Java, it's a pointer. Yourdon is correct in saying that having pointers to the middle of a string or array is considered good idiomatic usage in C and assembly language (and by some people in C++), but it is neither supported nor missed in other languages.
  • Yourdon also includes a number of minor typos, like saying that arrays have a length() method (instead of a length field) and that modifiable strings are represented by StringClass (instead of StringBuffer). These are annoying, but not as harmful as the more basic half-truths.
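
The first two points in the list above (functions and coercions) are easy to check for yourself. Here is a small sketch; the class name CoercionDemo and the method square are made up for the example.

```java
// Sketch only: CoercionDemo and square are hypothetical names.
class CoercionDemo {
    // A function with global extent: callable from anywhere as CoercionDemo.square(x).
    static double square(double x) { return x * x; }

    public static void main(String[] args) {
        double sum = 1.0 + 2;     // the int 2 is automatically coerced to a double
        String s = "one" + 2;     // the int 2 is automatically coerced to a string
        double sq = square(3);    // the int 3 is coerced to match the double parameter

        System.out.println(sum);  // prints 3.0
        System.out.println(s);    // prints one2
        System.out.println(sq);   // prints 9.0
    }
}
```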