Welcome to Siva's Blog

Scribbles by Sivananda Hanumanthu
My experiences and learnings on Technology, Leadership, Domains, Life and on various topics as a reference!
What you can expect here, it could be something on Java, J2EE, Databases, or altogether on a newer Programming language, Software Engineering Best Practices, Software Architecture, SOA, REST, Web Services, Micro Services, APIs, Technical Architecture, Design, Programming, Cloud, Application Security, Artificial Intelligence, Machine Learning, Big data and Analytics, Integrations, Middleware, Continuous Delivery, DevOps, Cyber Security, Application Security, QA/QE, Automations, Emerging Technologies, B2B, B2C, ERP, SCM, PLM, FinTech, IoT, RegTech or any other domain, Tips & Traps, News, Books, Life experiences, Notes, latest trends and many more...

Wednesday, September 22, 2010

Java Anti-Patterns and detailed Examples

An antipattern is a classified bad design; in other words, it is the opposite of a design pattern that suggests good design. Antipatterns present bad solutions in a manner that makes it easy for concerned persons to understand the underlying problems and their consequences. While it is important to know design patterns, I believe it is equally important, if not more so, to understand antipatterns.

Let me justify that position. The world of software revolves around maintenance of applications. Of course, each software product's lifecycle starts with construction, but after the initial roll out, it needs to be maintained. Depending on the development team's skill, the product might have either a good or a bad design, with the terms good and bad here applied within a context, because a perfect design can qualify as an antipattern when applied in the wrong context. For example, using a Singleton object might be appropriate in a single-application server environment, but it can actually create issues if not handled properly in a clustered-application server environment. In contrast to a positive design pattern, antipatterns elicit a negative solution or legacy approach (a yesteryear's solution might be an antipattern in today's world) that could be the result of a team member's lack of sufficient information or bad judgment in approaching the design or solving an issue.


Ref:

Java Anti-Patterns

This page collects some bad code that may not look so obviously bad to beginners. Beginners often struggle with the language syntax. They also have little knowledge about the standard JDK class library and how to make the best use of it. In fact I have collected all examples from everyday junior code. I have modified the original code to give it example character and so that it highlights the problems. Many of these problems can easily be detected by FindBugs, which is available as a simple Eclipse plug-in. I strongly recommend this tool to any beginner programmer. Pros should also run it from time to time on their codebase, and review its output carefully. It is an easy to use tool and I always find some bugs when I use it.
Some of these may seem like micro-optimization or even premature optimization without profiling. But performance and memory wasted in millions of these small places adds up quickly and will grind an application to a crawl. And when I say application, I mean a server-side application running on an application server. That's what I do for a living. On desktop GUI applications the situation may not be as bad. In the end a lot of your application's performance depends on the overall quality of your code. By the way you should never underestimate the importance of memory footprint. Even though garbage collection is quite fast, most server-side code's scalability is dominated and limited by its memory use per request/transaction and the request/transaction duration.

String concatenation

String s = "";
for (Person p : persons) {
    s += ", " + p.getName();
}
s = s.substring(2); //remove first comma
This is a real performance killer: O(persons.length²). The repeated concatenation of strings in a loop causes excess garbage and array copying. Moreover it is ugly that the resulting string has to be fixed for an extra comma.
StringBuilder sb = new StringBuilder(persons.size() * 16); // well estimated buffer
for (Person p : persons) {
    if (sb.length() > 0) sb.append(", ");
    sb.append(p.getName());
}
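As a self-contained sketch of the same fix, the following uses a plain List<String> instead of the Person class from the snippet above. The class name NameJoiner and the estimate of 16 characters per name are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

final class NameJoiner {

    /** Joins the names with ", " using a single pre-sized StringBuilder. */
    static String join(List<String> names) {
        // 16 characters per name is only a rough estimate for the buffer size.
        StringBuilder sb = new StringBuilder(names.size() * 16);
        for (String name : names) {
            if (sb.length() > 0) {
                sb.append(", "); // separator only between entries
            }
            sb.append(name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("Alice", "Bob", "Carol")));
    }
}
```

Because the separator is written before every entry except the first, no trailing or leading comma has to be cut off afterwards.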

Lost StringBuffer performance

StringBuffer sb = new StringBuffer();
sb.append("Name: ");
sb.append(name + '\n');
sb.append("!");
...
String s = sb.toString();
Despite good intentions the above code is not perfect. The most obvious mistake is the string concatenation in line 3. In line 4 appending a char would be faster than appending a String. Another major omission is the missing length initialization of the buffer, which may incur unnecessary resizing (array copying). In JDK 1.5 and above a StringBuilder should have been used instead of a StringBuffer: because it is only a local variable, the implicit synchronization is overkill. Actually, simple String concatenation compiles to almost perfect byte code: it only lacks the length initialization.
StringBuilder sb = new StringBuilder(100);
sb.append("Name: ");
sb.append(name);
sb.append("\n!");
String s = sb.toString();
String s = "Name: " + name + "\n!";

Testing for string equality

if (name.compareTo("John") == 0) ...
if (name == "John") ...
if (name.equals("John")) ...
if ("".equals(name)) ...
None of the above comparisons is wrong, but neither are they really good. The compareTo method is overkill and too verbose. The == operator tests for object identity, which is probably not what you want. The equals method is the way to go, but reversing the constant and variable gives you extra safety if name is null, plus an increase in speed because the equals method is always called on the same object if used in a loop. When testing for empty strings it's faster to check whether their length is 0, because the equals method has to perform additional checks (identity and type) before it compares the content. Note that name.length() throws a NullPointerException if name is null.
if ("John".equals(name)) ...
if (name.length() == 0) ...
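The difference between the variants can be tried out with a small sketch. The class and method names (StringChecks, isJohn, isNullOrEmpty) are made up for this illustration, and the null check in isNullOrEmpty is an addition that the one-liner above does not have.

```java
final class StringChecks {

    /** Constant-first comparison: returns false instead of throwing when name is null. */
    static boolean isJohn(String name) {
        return "John".equals(name);
    }

    /** Null-safe emptiness check; a bare name.length() == 0 would throw on null. */
    static boolean isNullOrEmpty(String name) {
        return name == null || name.length() == 0;
    }

    public static void main(String[] args) {
        String copy = new String("John"); // distinct object with equal content
        System.out.println("identity: " + (copy == "John"));
        System.out.println("equals:   " + isJohn(copy));
        System.out.println("null:     " + isJohn(null));
    }
}
```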

Converting numbers to Strings

"" + set.size()
new Integer(set.size()).toString() 
The return type of the Set.size() method is int. A conversion to String is wanted. Both examples do in fact perform the conversion. But the first incurs the penalty of a concatenation operation (it translates to new StringBuilder().append(i).toString()). And the second creates an intermediate Integer wrapper. The correct way of doing it is
String.valueOf(set.size())

Not taking advantage of immutable objects

zero = new Integer(0);
return Boolean.valueOf("true");
Integer as well as Boolean are immutable. Thus it doesn't make sense to create several objects that represent the same value. Those classes have built-in caches for frequently used instances. In the case of Boolean there are even only two possible instances. The programmer can take advantage of this:
zero = Integer.valueOf(0);
return Boolean.TRUE;
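The caching can be observed directly by comparing references. This is only a sketch with a hypothetical class name (CachedWrappers); the identity comparison is used here on purpose to show the shared instances and is not something to do in production code.

```java
final class CachedWrappers {

    /** Small values (-128 to 127) are guaranteed to come from the Integer cache. */
    static boolean zeroIsShared() {
        return Integer.valueOf(0) == Integer.valueOf(0);
    }

    /** Boolean.valueOf never allocates; it returns one of the two constants. */
    static boolean parsedTrueIsConstant() {
        return Boolean.valueOf("true") == Boolean.TRUE;
    }

    public static void main(String[] args) {
        System.out.println("shared zero:   " + zeroIsShared());
        System.out.println("constant TRUE: " + parsedTrueIsConstant());
    }
}
```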

XML parsers are for sissies

int start = xml.indexOf("<name>") + "<name>".length();
int end = xml.indexOf("</name>");
String name = xml.substring(start, end);
This naïve XML parsing only works with the simplest XML documents. It will however fail if a) the name element is not unique in the document, b) the content of name is not only character data, c) the text data of name contains escaped characters, d) the text data is specified as a CDATA section, e) the document uses XML namespaces. XML is way too complex for string operations. There is a reason why XML parsers like Xerces are jar files of over one megabyte! The equivalent with JDOM is:
SAXBuilder builder = new SAXBuilder(false);
Document doc = builder.build(new StringReader(xml));
String name = doc.getRootElement().getChild("name").getText();
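If JDOM is not on the classpath, roughly the same thing can be done with the DOM parser that ships with the JDK. The following is a sketch under that assumption; the class name NameExtractor and the choice to return null for a missing element are mine, not part of the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NameExtractor {

    /** Returns the text of the first name element, or null if there is none. */
    static String extractName(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Node name = doc.getElementsByTagName("name").item(0);
            // getTextContent() resolves entities and CDATA sections for us.
            return name == null ? null : name.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractName("<root><name>Tom &amp; Jerry</name></root>"));
    }
}
```

Escaped characters and CDATA sections, cases c) and d) from the list above, are handled by the parser without any extra code.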

Assembling XML with String operations

String name = ...
String attribute = ...
String xml = "<root>"
            +"<name att=\""+ attribute +"\">"+ name +"</name>"
            +"</root>";
Many beginners are tempted to produce XML output as shown above, by using String operations (which they know so well and which are so easy). Indeed it is very simple and almost beautiful code. However it has one severe shortcoming: it fails to escape reserved characters. So if the variables name or attribute contain any of the reserved characters <, >, &, " or ' this code would produce invalid XML. Also, as soon as the XML uses namespaces, String operations may quickly become nasty and hard to maintain. Instead, XML should be assembled in a DOM. The JDOM library is quite nice for that.
Element root = new Element("root");
root.setAttribute("att", attribute);
root.setText(name);
Document doc = new Document();
doc.setRootElement(root);
XMLOutputter out = new XMLOutputter(Format.getPrettyFormat());
String xml = out.outputString(doc);
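For readers without JDOM, here is a sketch of the same idea using the StAX XMLStreamWriter from the JDK. Note that this is a streaming writer rather than a DOM, so it is a substitute technique, and the class name XmlAssembler is hypothetical. The point it illustrates is the same: the library, not the programmer, escapes reserved characters.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class XmlAssembler {

    /** Builds a root element with one attribute and text content. */
    static String toXml(String name, String attribute) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("root");
            writer.writeAttribute("att", attribute); // reserved characters are escaped
            writer.writeCharacters(name);            // same for the element text
            writer.writeEndElement();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("a < b & c", "x\"y"));
    }
}
```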

The XML encoding trap

String xml = FileUtils.readTextFile("my.xml");
It is a very bad idea to read an XML file and store it in a String. An XML document specifies its encoding in the XML header. But when reading a file into a String you have to know the encoding beforehand! Also storing an XML file in a String wastes memory. All XML parsers accept an InputStream as a parsing source and they figure out the encoding themselves correctly. So you can feed them an InputStream instead of storing the whole file in memory temporarily. The byte order (big-endian, little-endian) is another trap when a multi-byte encoding (such as UTF-16) is used. XML files may carry a byte order mark at the beginning that specifies the byte order. XML parsers handle them correctly.
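A minimal sketch of feeding raw bytes to the parser, assuming the JDK's built-in DOM parser. The class name EncodingAwareParsing and the sample document are invented for the illustration; a ByteArrayInputStream stands in for the file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class EncodingAwareParsing {

    /** Parses raw bytes; the parser takes the encoding from the XML declaration. */
    static String rootText(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(in).getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    /** Encodes a sample document as ISO-8859-1 bytes and parses it back. */
    static String demo() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>Z\u00fcrich</root>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return rootText(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        System.out.println(demo().equals("Z\u00fcrich"));
    }
}
```

The calling code never names the encoding; it is taken from the document itself.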

char is not int

int i = in.read();
char c = (char) i;
The above code assumes that you can create a character from a number. It's true technically: a character's number is the 16 bit Unicode codepoint number. But it is semantic nonsense. In Java the character is a semantic entity of its own. The character's byte representation is completely decoupled from that. If we encounter a char we don't need to worry whether the character is stored in UTF-8, UTF-16, UCS-4 or ISO-8859-1 internally. It simply doesn't matter. We can compare it to other characters and it will always behave as expected. This concept is not known in C for example. In C the char type is just a numeric type. It can contain anything, even invalid data that does not represent characters. In C you have to know exactly which character encoding a char array uses or you may do wrong things when sorting, printing, searching etc. Also C programs may wrongly assume that a char is one byte long and contains values 0-127 or 0-255, which is true for ASCII, but not for many other character encodings (known as "multi-byte" character encodings). Anyway, in Java use Reader/Writer or CharsetEncoder/CharsetDecoder instead to convert between characters and their byte representation (see following paragraph).

Undefined encoding

Reader r = new FileReader(file);
Writer w = new FileWriter(file);
Reader r = new InputStreamReader(inputStream);
Writer w = new OutputStreamWriter(outputStream);
String s = new String(byteArray); // byteArray is a byte[]
byte[] a = string.getBytes();
Each line of the above converts between byte and char using the default platform encoding. The code behaves differently depending on the platform it runs on. This is harmful if the data flows from one platform to another. It is considered bad practice to rely on the default platform encoding at all. Conversions should always be performed with a defined encoding.
Reader r = new InputStreamReader(new FileInputStream(file), "ISO-8859-1");
Writer w = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-1");
Reader r = new InputStreamReader(inputStream, "UTF-8");
Writer w = new OutputStreamWriter(outputStream, "UTF-8");
String s = new String(byteArray, "ASCII");
byte[] a = string.getBytes("ASCII");
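On Java 7 and later (an assumption, since the snippets above also work on older versions) the StandardCharsets constants are a convenient alternative to encoding names passed as Strings: they cannot be misspelled and do not force you to handle UnsupportedEncodingException. A small sketch with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;

final class ExplicitEncoding {

    /** Encodes text to bytes with a fixed, platform-independent encoding. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes using the same encoding that was used to produce them. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Gr\u00fc\u00dfe");
        System.out.println(bytes.length + " bytes, round trip ok: "
                + decode(bytes).equals("Gr\u00fc\u00dfe"));
    }
}
```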

Unbuffered streams

InputStream in = new FileInputStream(file);
int b;
while ((b = in.read()) != -1) {
   ...
}
The above code reads a file byte by byte. Every read() call on the stream will cause a native (JNI) call to the native implementation of the filesystem. Depending on the implementation this may cause a syscall to the operating system. JNI calls are expensive and so are syscalls. The number of native calls can be reduced dramatically by wrapping the stream into a BufferedInputStream. Reading 1 MB of data from /dev/zero with the above code took about 1 second on my laptop. With the fixed code below it was down to 60 milliseconds! That's a 94% saving. This also applies for output streams of course. And it is true not only for the file system but also for sockets.
InputStream in = new BufferedInputStream(new FileInputStream(file));
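The effect of the buffer can be made visible without timing anything by counting how many read requests actually reach the underlying stream. The sketch below does that with an in-memory source, so it says nothing about real JNI or syscall costs; ReadCallCounter is a helper class invented for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCounter extends FilterInputStream {

    int calls; // number of read requests that reached the underlying stream

    ReadCallCounter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }

    /** Reads size bytes one at a time and returns how many calls hit the source. */
    static int countCalls(int size, boolean buffered) {
        ReadCallCounter source = new ReadCallCounter(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // consume byte by byte, exactly like the loop above
            }
        } catch (IOException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + countCalls(100000, false));
        System.out.println("buffered:   " + countCalls(100000, true));
    }
}
```

With the buffer in place the loop still calls read() once per byte, but almost all of those calls are served from memory.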

Infinite heap

byte[] pdf = toPdf(file);
Here a method creates a PDF file from some input and returns the binary PDF data as a byte array. This code assumes that the generated file is small enough to fit into the available heap memory. If this code can not make this 100% sure then it is vulnerable to an out of memory condition. Especially if this code is run server-side which usually means many parallel threads. Bulk data must never be handled with byte arrays. Streams should be used and the data should be spooled to disk or a database.
File pdf = toPdf(file);
A similar anti-pattern is to buffer streaming input from an "untrusted" (security term) source. Such as buffering data that arrives on a network socket. If the application doesn't know how much data will be arriving it must make sure that it keeps an eye on the size of the data. If the amount of buffered data exceeds sane limits an error condition (exception) should be signalled to the caller, rather than driving the application against the wall by letting it run into an out of memory condition.
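A minimal sketch of such a guard, assuming the data really has to be buffered in memory. The class name BoundedReader, the chunk size and the choice of IOException for the error condition are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedReader {

    /** Buffers the stream in memory but fails once more than maxBytes have arrived. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (buffer.size() + n > maxBytes) {
                // Signal the caller instead of running into an out of memory condition.
                throw new IOException("input exceeds limit of " + maxBytes + " bytes");
            }
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readAtMost(new ByteArrayInputStream(new byte[1000]), 4096);
        System.out.println("read " + data.length + " bytes");
    }
}
```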

Infinite time

Socket socket = ...
socket.connect(remote);
InputStream in = socket.getInputStream();
int i = in.read();
The above code has two blocking calls that use unspecified timeouts. Imagine if the timeout is infinite. That may cause the application to hang forever. Generally it is an extremely stupid idea to have infinite timeouts in the first place. Infinity is extremely long. Even by the time the Sun turns into a red giant (it explodes), it's still a looong way to Infinity. The average programmer dies at 72. There is simply no real-world situation where we want to wait that long. Infinite timeout is just an absurd thing. Use an hour, day, week, month, 1 year, 10 years. But not Infinity. To connect to a remote machine I personally find 20 seconds plenty of timeout. A human is not even as patient and would cancel the operation before. While there is a nice overload of the connect() method that takes a timeout parameter, there is no such thing for read(). But you can modify a Socket's socket timeout before every blocking call. (Not just once! You can set different timeouts for different situations.) The socket will throw an exception on blocking calls after that timeout. Also frameworks that communicate over the network should provide an API to control these timeouts and use sensible default values. Infinity is not sensible - it's insane and drives you mad. Who came up with this absolutely useless infinity timeout anyway?
Socket socket = ...
socket.connect(remote, 20000); // fail after 20s
InputStream in = socket.getInputStream();
socket.setSoTimeout(15000);
int i = in.read();
Unfortunately the file system API (FileInputStream, FileChannel, FileDescriptor, File) provides no way to set timeouts on file operations. That's very unfortunate. Because these are the most common blocking calls in a Java application: writing to stdout/stderr and reading from stdin are file operations, and writing to log files is common. Operations on the standard input/output streams depend directly on other processes outside of our Java VM. If they decide to block forever, so will reads/writes to these streams in our application. Disk I/O is a limited resource for which all processes on a system compete. There is no guarantee that a simple read/write on a file is quick. It may incur unspecified wait time. Also today remote file systems are ubiquitous. Disks may be on a SAN/NAS, or file systems may be mounted over the network (NFS, AFS, CIFS/Samba). So a filesystem call may actually be a network call: too bad that we don't have the power of the network API here! So if the OS decides that the timeout for the write is 60 seconds you're stuck with it. Solutions to this problem are: adequate buffering and queuing/asynchronous processing.

Catch all: I don't know the right runtime exception

Query q = ...
Person p;
try {
    p = (Person) q.getSingleResult();
} catch(Exception e) {
    p = null;
}
This is an example of a J2EE EJB3 query. The getSingleResult throws runtime exceptions when a) the result is not unique, b) there is no result c) when the query could not be executed due to database failure or so. The code above just catches any exception. A typical catch-all block. Using null as a result may be the right thing for case b) but not for case a) or c). In general one should not catch more exceptions than necessary. The correct exception handling is
Query q = ...
Person p;
try {
    p = (Person) q.getSingleResult();
} catch(NoResultException e) {
    p = null;
}

Exceptions are annoying

try {
    doStuff();
} catch(Exception e) {
    log.fatal("Could not do stuff");
}
doMoreStuff();
There are two problems with this tiny piece of code. First, if this is really a fatal condition then the method should abort and notify the caller of the fatal condition with an appropriate exception (so why is it caught in the first place?). Hardly ever can you just continue after a fatal condition. Second, this code is very hard to debug because the reason for the failure is lost. Exception objects carry detailed information about where the error occurred and what caused it. Individual subclasses may actually carry a lot of extra information that the caller can use to deal with the situation properly. It's a lot more than a simple error code (which is so popular in the C world. Just look at the Linux kernel. return -EINVAL everywhere...). If you catch high-level exceptions then at least log the message and stack trace. You should not see exceptions as a necessary evil. They are a great tool for error handling.
try {
    doStuff();
} catch(Exception e) {
    throw new MyRuntimeException("Could not do stuff because: "+ e.getMessage(), e);
}

Re-wrapping RuntimeException

try {
  doStuff();
} catch(Exception e) {
  throw new RuntimeException(e);
}
Sometimes you really want to re-throw any checked exception as RuntimeException. The above piece of code doesn't take into account however, that RuntimeException extends Exception. The RuntimeException doesn't need to be caught here. Also the exception's message is not propagated properly. A bit better is to catch the RuntimeException separately and not wrap it. Even better is to catch all the checked exceptions individually (even if they are a lot).
try {
  doStuff();
} catch(RuntimeException e) {
  throw e;
} catch(Exception e) {
  throw new RuntimeException(e.getMessage(), e);
}
try {
  doStuff();
} catch(IOException e) {
  throw new RuntimeException(e.getMessage(), e);
} catch(NamingException e) {
  throw new RuntimeException(e.getMessage(), e);
}

Not properly propagating the exception

try {
} catch(ParseException e) {
  throw new RuntimeException();
  throw new RuntimeException(e.toString());
  throw new RuntimeException(e.getMessage());
  throw new RuntimeException(e);
}
This code just wraps a parsing error into a runtime exception in different ways. None of them provides really good information to the caller. The first just loses all information. The second may do anything depending on what information toString() produces. The default toString() implementation lists the fully qualified exception name followed by the message. Nesting many exceptions will produce an unwieldy long and ugly string, unsuitable for a user. The third just preserves the message, which is better than nothing. The last preserves the cause, but sets the message of the runtime exception to toString() of its cause (see above). The most useful and readable version is to propagate only the cause message in the runtime exception and pass the original exception as the cause:
try {
} catch(ParseException e) {
  throw new RuntimeException(e.getMessage(), e);
}
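The difference between the last variant and the recommended one is easy to check. The following sketch uses a hypothetical WrapDemo class and a made-up error message:

```java
import java.text.ParseException;

final class WrapDemo {

    /** Cause only: the message becomes the cause's toString(). */
    static RuntimeException causeOnly(ParseException e) {
        return new RuntimeException(e);
    }

    /** Message and cause: readable message, full information preserved. */
    static RuntimeException messageAndCause(ParseException e) {
        return new RuntimeException(e.getMessage(), e);
    }

    public static void main(String[] args) {
        ParseException cause = new ParseException("bad input", 0);
        System.out.println(causeOnly(cause).getMessage());
        System.out.println(messageAndCause(cause).getMessage());
    }
}
```

The first line printed carries the fully qualified class name of the cause in front of the message, the second only the message. Both keep the original exception reachable through getCause().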

Catching to log

try {
    ...
} catch(ExceptionA e) {
    log.error(e.getMessage(), e);
    throw e;
} catch(ExceptionB e) {
    log.error(e.getMessage(), e);
    throw e;
}
This code only catches exceptions to write out a log statement and then rethrows the same exception. This is stupid. Let the caller decide if the message is important to log and remove the whole try/catch clause. It's only useful when you know that the caller doesn't log it. That's the case if the method is called by a framework which is not under your control.

Incomplete exception handling

try {
    is = new FileInputStream(inFile);
    os = new FileOutputStream(outFile);
} finally {
    try {
        is.close();
        os.close();
    } catch(IOException e) {
        /* we can't do anything */
    }
}
If streams are not closed, the underlying operating system can't free native resources. This programmer wanted to be careful about closing both streams. So he put the close in a finally clause. But if is.close() throws an IOException then os.close() is not even executed. Both close statements must be wrapped in their own try/catch clause. Moreover, if creating the input stream throws an exception (because the file was not found) then os is null and os.close() will throw a NullPointerException. To make this less verbose I have stripped some newlines.
try {
    is = new FileInputStream(inFile);
    os = new FileOutputStream(outFile);
} finally {
    try { if (is != null) is.close(); } catch(IOException e) {/* we can't do anything */}
    try { if (os != null) os.close(); } catch(IOException e) {/* we can't do anything */}
}
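On Java 7 and later, which the article predates, the try-with-resources statement takes care of the same bookkeeping. The sketch below is one possible modern rewrite, not the author's code; the SafeCopy class and the copy loop are assumptions added to make it complete.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class SafeCopy {

    /** Copies inFile to outFile; both streams are always closed. */
    static void copy(File inFile, File outFile) throws IOException {
        // Resources are closed automatically in reverse order, even when
        // opening the second stream or the copy loop throws.
        try (InputStream is = new FileInputStream(inFile);
             OutputStream os = new FileOutputStream(outFile)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = is.read(chunk)) != -1) {
                os.write(chunk, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("copy-src", ".tmp");
        File dst = File.createTempFile("copy-dst", ".tmp");
        copy(src, dst);
        System.out.println("copied " + dst.length() + " bytes");
        src.delete();
        dst.delete();
    }
}
```

Unlike the version above, an exception thrown by close() is not silently swallowed here: it is propagated, or attached as a suppressed exception if the body already failed.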

The exception that never happens

try {
  ... do risky stuff ...
} catch(SomeException e) {
  // never happens
}
... do some more ...
Here the developer executes some code in a try/catch block. He doesn't want to rethrow the exception that one of the called methods declares to his annoyance. As the developer is clever he knows that in his particular situation the exception will never be thrown, so he just inserts an empty catch block. He even puts a nice comment in the empty catch block - but they are famous last words... The problem with this is: how can he be sure? What if the implementation of the called method changes? What if the exception is still thrown in some special case but he just didn't think of it? The code after the try/catch may do the wrong thing in that situation. The exception will go completely unnoticed. The code can be made much more reliable by throwing a runtime exception in that case. This works like an assertion and adheres to the "crash early" principle. The developer will notice if his assumption was wrong. The code after the try/catch will not be executed if the exception occurred against all honest hope and expectation. If the exception really never occurs - fine, nothing changed.
try {
  ... do risky stuff ...
} catch(SomeException e) {
  // never happens hopefully
  throw new IllegalStateException(e.getMessage(), e); // crash early, passing all information
}
... do some more ...

The transient trap

public class A implements Serializable {
    private String someState;
    private transient Log log = LogFactory.getLog(getClass());
    
    public void f() {
        log.debug("enter f");
        ...
    }
}
Log objects are not serializable. The programmer knew this and correctly declared the log field as transient so it is not serialized. However the initialization of this variable happens in the class's initializer. Upon deserialization initializers and constructors are not executed! This leaves the deserialized object with a null log variable which subsequently causes a NullPointerException in f(). Rule of thumb: never use class initialization with transient variables. You can solve this case either by using a static variable or by using a local variable:
public class A implements Serializable {
    private String someState;
    private static final Log log = LogFactory.getLog(A.class); // getClass() is not available in a static context
    
    public void f() {
        log.debug("enter f");
        ...
    }
}

public class A implements Serializable {
    private String someState;
    
    public void f() {
        Log log = LogFactory.getLog(getClass());
        log.debug("enter f");
        ...
    }
}
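The trap can be reproduced without any logging library. In this sketch a transient StringBuilder stands in for the Log field; the class TransientDemo and its methods are made up for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientDemo implements Serializable {

    private static final long serialVersionUID = 1L;

    private String someState = "state";
    // Initialized on construction only; deserialization skips this initializer.
    private transient StringBuilder scratch = new StringBuilder();

    String getSomeState() {
        return someState;
    }

    boolean scratchIsNull() {
        return scratch == null;
    }

    /** Serializes the object to a byte array and reads it back. */
    static TransientDemo roundTrip(TransientDemo original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientDemo) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println("before: null? " + new TransientDemo().scratchIsNull());
        System.out.println("after:  null? " + roundTrip(new TransientDemo()).scratchIsNull());
    }
}
```

The regular field survives the round trip, while the transient one comes back as null because its initializer never ran on the deserialized copy.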

Overkill initialization

public class B {
    private int count = 0;
    private String name = null;
    private boolean important = false;
}
This programmer used to code in C. So naturally he wants to make sure every variable is properly initialized. Here however it is not necessary. The Java language specification guarantees that member variables are initialized with certain values automatically: 0, null, false. By declaring them explicitly the programmer causes a class initializer to be executed before the constructor. This is unnecessary overkill and should be avoided.
public class B {
    private int count;
    private String name;
    private boolean important;
}

Log instances: static or not?

An earlier version of this section suggested not storing log instances in static variables. Turns out I was wrong. Mea culpa. I apologize.
Store the darn log instance in a static final variable and be happy.
private static final Log log = LogFactory.getLog(MyClass.class);
Here is why:
  • Automatically thread-safe. But only with the final keyword included!
  • Usable from static and non-static code.
  • No problems with serializable classes.
  • Initialization cost only once: getLog() may not be as cheap as you might suppose.
  • Nobody is going to unload the Log class loader anyway.

Choosing the wrong class loader

Class clazz = Class.forName(name);
This code uses the class loader that loaded the current class. This is hardly ever what you want when you dynamically load an additional class. Especially in managed environments like Application servers, Servlet engines or Java Webstart this is most certainly wrong. This code will behave very differently depending on the environment it is run in. Environments use the context class loader to provide applications with a class loader they should use to retrieve "their own" classes.
ClassLoader cl = Thread.currentThread().getContextClassLoader();
if (cl == null) cl = getClass().getClassLoader(); // fallback
Class clazz = cl.loadClass(name);

Poor use of reflection

Class beanClass = ...
if (beanClass.newInstance() instanceof TestBean) ...
This programmer is struggling with the reflection API. He needs a way to check for inheritance but didn't find a way to do it. So he just creates a new instance and uses the instanceof operator he is used to. Creating an instance of a class you don't know is dangerous. You never know what this class does. It may be very expensive. Or the default constructor may not even exist. Then this if statement would throw an exception. The correct way of doing this check is to use the Class.isAssignableFrom(Class) method. Its semantics are the reverse of instanceof: the receiver is the supertype and the argument is the class being tested.
Class beanClass = ...
if (TestBean.class.isAssignableFrom(beanClass)) ...
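Because the argument order is easy to get wrong, it can help to hide the call behind a helper whose parameter names state the direction. A sketch with JDK classes standing in for TestBean; the names TypeChecks and isSubtype are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TypeChecks {

    /** True if instances of candidate could be assigned to a variable of type expected. */
    static boolean isSubtype(Class<?> candidate, Class<?> expected) {
        return expected.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        System.out.println(isSubtype(ArrayList.class, List.class));
        System.out.println(isSubtype(List.class, ArrayList.class));
    }
}
```

No instance of the candidate class is ever created, so it does not matter whether it has a default constructor or how expensive that constructor is.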

Synchronization overkill

Collection l = new Vector();
for (...) {
   l.add(object);
}
Vector is a synchronized ArrayList. And Hashtable is a synchronized HashMap. Both classes should only be used if synchronization is explicitly required. If however those collections are used as local temporary variables the synchronization is complete overkill and degrades performance considerably.
Collection l = new ArrayList();
for (...) {
   l.add(object);
}

Wrong list type

Without sample code. Junior developers often have difficulty choosing the right list type. They usually choose quite randomly from Vector, ArrayList and LinkedList. But there are performance considerations to make! The implementations behave quite differently when adding, iterating or accessing objects by index. I'll ignore Vector in this list because it behaves like an ArrayList, just slower. NB: n is the size of the list, not the number of operations!

                  ArrayList                                LinkedList
add (append)      O(1) or ~O(log(n)) if growing            O(1)
insert (middle)   O(n) or ~O(n*log(n)) if growing          O(n)
remove (middle)   O(n) (always performs complete copy)     O(n)
iterate           O(n)                                     O(n)
get by index      O(1)                                     O(n)
The insert performance of the ArrayList depends on whether it has to grow during the insert or if the initial size is reasonably set. The growing occurs exponentially (by a factor of about 1.5 in the JDK's ArrayList; Vector doubles by default) so the number of resize operations is O(log(n)). The exponential growing however may use much more memory than you actually need. The sudden need to resize the list also makes the response time sluggish and will probably cause a major garbage collection if the list is large. Iterating over the lists is equally inexpensive. Indexed list element access however is very slow in linked lists of course.
Memory considerations: LinkedList wraps every element into a wrapper object. ArrayList allocates a completely new array each time it needs to grow and performs an array copy on every remove(). All standard Collections can not reuse their Iterator objects, which may cause iterator churn especially when recursively iterating large tree structures.
Personally I almost never use LinkedList. It would really only make sense when you wanted to insert objects in the middle of a list. But without access to the wrapper object this doesn't scale with O(1) but O(n) because you must first traverse the list until you find the insert position. So what exactly is the point of the LinkedList class? I recommend using ArrayLists only.

The HashMap size trap

Map map = new HashMap(collection.size());
for (Object o : collection) {
  map.put(o.key, o.value);
}
This developer had good intentions and wanted to make sure that the HashMap doesn't need to be resized. He thus set its initial size to the number of elements he was going to put into it. Unfortunately the HashMap implementation doesn't quite behave like this. It sets its internal threshold to threshold = (int)(capacity * loadFactor). So it will resize after 75% of the collection have been inserted into the map. The above code will thus always cause extra garbage.
Map map = new HashMap((int) (collection.size() / 0.75));
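The calculation can be wrapped in a small helper. The sketch below assumes the default load factor of 0.75 and adds 1 to guard against the truncation of the integer cast; the names MapSizing, capacityFor and newMapFor are made up. HashMap rounds the requested capacity up to a power of two internally, so the result is an upper-bound request rather than the exact table size.

```java
import java.util.HashMap;
import java.util.Map;

final class MapSizing {

    /** Initial capacity that holds expectedEntries without a resize at load factor 0.75. */
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    /** Creates a HashMap pre-sized for the expected number of entries. */
    static <K, V> Map<K, V> newMapFor(int expectedEntries) {
        return new HashMap<K, V>(capacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newMapFor(1000);
        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size() + " entries, requested capacity " + capacityFor(1000));
    }
}
```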

Hashtable, HashMap and HashSet are overrated

These classes are extremely popular. Because they have great usability for the developer. Unfortunately they are also horribly inefficient. Hashtable and HashMap wrap every key/value pair into an Entry wrapper object. An Entry object is surprisingly large. Not only does it hold a reference to key and value, but also stores the hash code and a forward reference to the next Entry of the hash bucket. When you look at heap dumps with a memory analyzer you will be shocked by how much space is wasted by them in large applications like an application server. When you look at the source code of HashSet you will see that the developers were extremely lazy and just used a HashMap in the backend!
Before using any of these classes, think again. IdentityHashMap can be a viable alternative. But be careful, it intentionally breaks the Map interface. It is much more memory efficient by implementing an open hashtable (no buckets), doesn't need an Entry wrapper and uses a simple Object[] as its backend. Instead of a HashSet a simple ArrayList may do similarly well (you can use contains(Object)) as long as it's small and lookups are rare.
For Sets that contain only a handful of entries the whole hashing is overkill and the memory wasted for the HashMap backend plus the wrapper objects is just nuts. Just use an ArrayList or even an array.
Actually it's a shame that there are no efficient Map and Set implementations in the standard JDK!

Lists are overrated

Also List implementations are very popular. But even lists are often not necessary. Simple arrays may do as well. I am not saying that you should not use Lists at all. They are great to work with. But know when to use arrays. The following are indicators that you should be using an array instead of a list:
  • The list has a fixed size. Example: days of the week. A set of constants.
  • The list is often (10'000 times) traversed.
  • The list contains wrapper objects for numbers (there are no lists of primitive types).
Let me illustrate that in code:
List codes = new ArrayList();
codes.add(Integer.valueOf(10));
codes.add(Integer.valueOf(20));
codes.add(Integer.valueOf(30));
codes.add(Integer.valueOf(40));

versus

int[] codes = { 10, 20, 30, 40 };
// horribly slow and a memory waster if l has a few thousand elements (try it yourself!)
List<Mergeable> l = ...;
for (int i = 0; i < l.size() - 1; i++) {
    Mergeable one = l.get(i);
    Iterator<Mergeable> j = l.listIterator(i + 1); // memory allocation!
    while (j.hasNext()) {
        Mergeable other = j.next();
        if (one.canMergeWith(other)) {
            one.merge(other);
            j.remove(); // removes the element last returned by next()
        }
    }
}

versus

// quite fast and no memory allocation
Mergeable[] l = ...;
for (int i=0; i < l.length-1; i++) {
    Mergeable one = l[i];
    for (int j=i+1; j < l.length; j++) {
        Mergeable other = l[j];
        if (one.canMergeWith(other)) {
            one.merge(other);
            l[j] = null;
        }
    }
}
You save an extra list object (wrapping an array), wrapper objects and possibly lots of iterator instances. Even Sun realized this. That's why Collections.sort() actually copies the list into an array and performs the sort on the array.

Object arrays are soooo flexible

/**
 * @return [1]: Location, [2]: Customer, [3]: Incident
 */
Object[] getDetails(int id) {...}
Even though documented, this way of passing back values from a method is ugly and error prone. You should really declare a small class that holds the objects together. This is analogous to a struct in C.
Details getDetails(int id) {...}

private class Details {
    public Location location;
    public Customer customer;
    public Incident incident;
}

Premature object decomposition

public void notify(Person p) {
    ...
    sendMail(p.getName(), p.getFirstName(), p.getEmail());
    ...
}
class PhoneBook {
    String lookup(String employeeId) {
        Employee emp = ...
        return emp.getPhone();
    }
}
In the first example it's painful to decompose an object just to pass its state on to a method. In the second example the use of this method is very limited. If the overall design allows it, pass the object itself.
public void notify(Person p) {
    ...
    sendMail(p);
    ...
}
class EmployeeDirectory {
    Employee lookup(String employeeId) {
        Employee emp = ...
        return emp;
    }
}

Modifying setters

private String name;

public void setName(String name) {
    this.name = name.trim();
}

public String getName() {
    return this.name;
}
This poor developer suffered from spaces at the beginning or end of a name entered by the user. He thought he was being clever and just removed the spaces inside the setter method of a bean. But how odd is a bean that modifies its data instead of just holding it? Now the getter returns different data than was set by the setter! If this were done inside an EJB3 entity bean, a simple read from the DB would actually modify the data: for every INSERT there would be an UPDATE statement. Let alone how hard it is to debug these side effects! In general, a bean should not modify its data. It is a data container, not business logic. Do the trimming where it makes sense: in the controller where the input occurs or in the logic where the spaces are not wanted.
person.setName(textInput.getText().trim());

Unnecessary Calendar

Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("Europe/Zurich"));
cal.setTime(date);
cal.add(Calendar.HOUR_OF_DAY, 8);
date = cal.getTime();
A typical mistake by a developer who is confused about date, time, calendars and time zones. To add 8 hours to a Date there is no need for a Calendar. Neither is the time zone of any relevance. (Think about it if you don't understand this!) However, if we wanted to add days (not hours) we would need a Calendar, because we don't know the length of a day for sure (on DST changes a day may have 23 or 25 hours).
date = new Date(date.getTime() + 8L * 3600L * 1000L); // add 8 hrs
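For the opposite case, adding days, a sketch like the following shows why the Calendar and the time zone do matter. The class and method names are hypothetical; the example assumes the Europe/Zurich zone, where clocks went forward on 29 March 2009.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class DayArithmetic {
    // Adds calendar days in the given zone, keeping the local wall-clock time.
    static Date addDays(Date date, int days, TimeZone tz) {
        Calendar cal = new GregorianCalendar(tz);
        cal.setTime(date);
        cal.add(Calendar.DAY_OF_MONTH, days);
        return cal.getTime();
    }

    // Helper for the example: 12:00 local time on the given day.
    static Date noon(int year, int month, int day, TimeZone tz) {
        Calendar cal = new GregorianCalendar(tz);
        cal.clear();
        cal.set(year, month, day, 12, 0, 0);
        return cal.getTime();
    }

    public static void main(String[] args) {
        TimeZone zurich = TimeZone.getTimeZone("Europe/Zurich");
        Date before = noon(2009, Calendar.MARCH, 28, zurich);
        Date after = addDays(before, 1, zurich);
        // The day of the DST change is one hour short.
        System.out.println((after.getTime() - before.getTime()) / 3600000L); // 23
    }
}
```

Adding 24L * 3600L * 1000L milliseconds instead would land on 13:00 local time the next day.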

Relying on the default TimeZone

Calendar cal = new GregorianCalendar();
cal.setTime(date);
cal.set(Calendar.HOUR_OF_DAY, 0);
cal.set(Calendar.MINUTE, 0);
cal.set(Calendar.SECOND, 0);
Date startOfDay = cal.getTime();
The developer wanted to calculate the start of the day (0h00). First, he obviously missed the millisecond field of the Calendar. But the really big mistake is not setting the TimeZone of the Calendar object. The Calendar will thus use the default time zone. This may be fine in a desktop application, but in server-side code this is hardly ever what you want: 0h00 in Shanghai is a very different moment than 0h00 in London. The developer needs to check which time zone is relevant for this computation.
Calendar cal = new GregorianCalendar(user.getTimeZone());
cal.setTime(date);
cal.set(Calendar.HOUR_OF_DAY, 0);
cal.set(Calendar.MINUTE, 0);
cal.set(Calendar.SECOND, 0);
cal.set(Calendar.MILLISECOND, 0);
Date startOfDay = cal.getTime();
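To see how far apart the two midnights are, the corrected code can be wrapped in a method that takes the zone as a parameter. This is a sketch with hypothetical names; the zones and the sample instant (15 January 2009, 12:00 UTC) are chosen only for illustration.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class StartOfDay {
    // Start of the local day (0h00) that contains the given instant,
    // as seen in the given time zone.
    static Date startOfDay(Date instant, TimeZone tz) {
        Calendar cal = new GregorianCalendar(tz);
        cal.setTime(instant);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date instant = new Date(1232020800000L); // 2009-01-15 12:00 UTC
        Date london = startOfDay(instant, TimeZone.getTimeZone("Europe/London"));
        Date shanghai = startOfDay(instant, TimeZone.getTimeZone("Asia/Shanghai"));
        // Midnight in Shanghai happened eight hours before midnight in London.
        System.out.println((london.getTime() - shanghai.getTime()) / 3600000L); // 8
    }
}
```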

Time zone "conversion"

public static Date convertTz(Date date, TimeZone tz) {
  Calendar cal = Calendar.getInstance();
  cal.setTimeZone(TimeZone.getTimeZone("UTC"));
  cal.setTime(date);
  cal.setTimeZone(tz);
  return cal.getTime();
}
If you think this method does something useful, please go and read the article about time. This developer had not read the article and was desperately trying to "fix" the time zone of his date. Actually the method does nothing. The returned Date will not have any different value than the input, because a Date does not carry time zone information. It is always UTC. And the getTime / setTime methods of Calendar always convert between UTC and the actual time zone of the Calendar.
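What such a developer usually wants is to display the same instant in another time zone, and that is a job for the formatter, not for the Date. A minimal sketch, with hypothetical class and method names and a pattern picked only for illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class ZoneDisplay {
    // Renders one instant as local wall-clock time in the given zone.
    // The Date itself is never changed; only its text representation differs.
    static String render(Date instant, TimeZone tz) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        format.setTimeZone(tz);
        return format.format(instant);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        System.out.println(render(epoch, TimeZone.getTimeZone("UTC")));       // 1970-01-01 00:00
        System.out.println(render(epoch, TimeZone.getTimeZone("GMT+08:00"))); // 1970-01-01 08:00
    }
}
```

A new formatter is created per call here, which avoids sharing a SimpleDateFormat between threads.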

Using Calendar.getInstance()

Calendar c = Calendar.getInstance();
c.set(2009, Calendar.JANUARY, 15);
This code assumes a Gregorian calendar. But what if the returned Calendar subclass is a Buddhist, Julian, Hebrew, Islamic, Iranian or Discordian calendar? In these the year 2009 has a very different meaning. And a month called January doesn't exist. Calendar.getInstance() uses the current default locale to select an appropriate implementation. It depends on the Java implementation which implementations are available. The utility of Calendar.getInstance() is thus very limited, and its use should be avoided as its result is not well defined.
Calendar c = new GregorianCalendar(timeZone);
c.set(2009, Calendar.JANUARY, 15);

Calling Date.setTime()

account.changePassword(oldPass, newPass);
Date lastmod = account.getLastModified();
lastmod.setTime(System.currentTimeMillis());
The above code updates the last modified date of the account entity. The programmer wants to be conservative and avoids creating a new Date object. Instead she uses the setTime method to modify the existing Date instance.
There is actually nothing wrong with that. But I just do not recommend this practice. Date objects are usually passed around carelessly. The same Date instance could be passed to numerous objects, which don't make a copy in their setters. Dates are often used like primitives. Thus if you modify a Date instance, other objects that use this instance might behave unexpectedly. Of course it is unclean design if an object exposes its intrinsic Date instance to the outside world, if you write code that strictly adheres to classical OO-principles (which I think is too inconvenient). General everyday Java practice however is to just copy Date references and not clone the object in setters. Thus every programmer should treat Date as immutable and should not modify existing instances. This should only be done for performance reasons in special situations. Even then the use of a simple long is probably equally good.
account.changePassword(oldPass, newPass);
account.setLastModified(new Date());
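The kind of surprise described above can be reproduced in a few lines. The Account class here is a hypothetical stand-in that, like most everyday beans, stores the Date reference without copying it:

```java
import java.util.Date;

final class SharedDateDemo {
    // Hypothetical bean: the setter keeps the reference as-is, no defensive copy.
    static final class Account {
        private Date lastModified;
        void setLastModified(Date d) { this.lastModified = d; }
        Date getLastModified() { return lastModified; }
    }

    public static void main(String[] args) {
        Date shared = new Date(0L);
        Account a = new Account();
        Account b = new Account();
        a.setLastModified(shared);
        b.setLastModified(shared);

        a.getLastModified().setTime(1000L);                // mutates the shared instance
        System.out.println(b.getLastModified().getTime()); // 1000: b changed as well

        a.setLastModified(new Date(2000L));                // replacing the reference is safe
        System.out.println(b.getLastModified().getTime()); // still 1000
    }
}
```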

Assuming SimpleDateFormat was thread-safe

public class Constants {
    public static final SimpleDateFormat date = new SimpleDateFormat("dd.MM.yyyy");
}
The above code is flawed in several ways. It's broken because it shares a static instance of a SimpleDateFormat with possibly any number of threads. SimpleDateFormat is not thread-safe. If multiple threads use this object concurrently, the results are undefined. You may get strange output from format and parse, or even exceptions. Unfortunately this mistake is very common!
Yes, sharing a SimpleDateFormat requires proper synchronization. Yes that comes at a price (cache flushes, lock contention, etc.). And yes, creating a SimpleDateFormat is not free either (pattern parsing, object allocation). But simply ignoring thread-safety is not the solution, but a source of a lot of problems.
Of course this code also doesn't take the time zone into account. And then defining a class called Constants screams of yet another anti-pattern (see next section).
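One possible way out, sketched here under a few assumptions, is to give every thread its own formatter. The class name DateFormats and the choice of UTC are illustrative only, and ThreadLocal.withInitial requires Java 8 or later:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateFormats {
    // One SimpleDateFormat per thread: no sharing, so no synchronization needed.
    private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat format = new SimpleDateFormat("dd.MM.yyyy", Locale.ROOT);
                // Assumption for this sketch: dates are rendered in UTC.
                format.setTimeZone(TimeZone.getTimeZone("UTC"));
                return format;
            });

    static String formatDay(Date date) {
        return DAY_FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(formatDay(new Date(0L))); // 01.01.1970
    }
}
```

On Java 8 and later, java.time.format.DateTimeFormatter is documented as immutable and thread-safe, so it can be shared as a constant where the newer date API is an option.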

Having a global Configuration/Parameters/Constants class

public interface Constants {
    String version = "1.0";
    String dateFormat = "dd.MM.yyyy";
    String configFile = ".apprc";
    int maxNameLength = 32;
    String someQuery = "SELECT * FROM ...";
}
Often seen in large projects: one class or interface that contains all sorts of constants that are used throughout the application. Why is this bad? Because these constants are unrelated to each other. This class is the only thing that they have in common. And the reference to this class will pollute many again unrelated components of the application. You want to later extract a component and use it in a different application? Or share some classes between a server and a remote client? You may need to ship the constants class as well! This class has introduced a dependency between otherwise unrelated components. This inhibits reuse and loose coupling and gives way to chaos.
Instead put constants where they belong. In no case should constants be used across component boundaries. This is only allowed if the component is a library, on which an explicit dependency is wanted.
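As a rough sketch of what "where they belong" could look like for two of the constants above: each one moves into the component that owns the concept. The class names PersonValidator and AppConfigLocator are invented for this example.

```java
// The name length limit belongs to the code that validates names.
final class PersonValidator {
    static final int MAX_NAME_LENGTH = 32;

    static boolean isValidName(String name) {
        return name != null && !name.isEmpty() && name.length() <= MAX_NAME_LENGTH;
    }
}

// The config file name belongs to the code that loads the configuration.
final class AppConfigLocator {
    static final String CONFIG_FILE = ".apprc";
}
```

A component that only validates names can now be extracted without dragging the config file name or an SQL query along with it.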

Not noticing overflows

public int getFileSize(File f) {
  long l = f.length();
  return (int) l;
}
This developer, for whatever reason, wrapped a call to determine the size of a file into a method that returns an int instead of a long. This code does not support files larger than 2 GB and just returns a wrong length in that case. Code that casts a value to a smaller size type must first check for a possible overflow and throw an exception.
public int getFileSize(File f) {
  long l = f.length();
  if (l > Integer.MAX_VALUE) throw new IllegalStateException("int overflow");
  return (int) l;
}
Another version of an overflow bug is the following. Note the missing parentheses in the first println statement.
long a = System.currentTimeMillis();
long b = a + 100;
System.out.println((int) b-a);
System.out.println((int) (b-a)); 
And last, a true gem that I uprooted during code review. Note how the programmer tried to be careful, but then failed so badly by assuming an int could ever become larger than its maximum value.
int a = l.size();
a = a + 100;
if (a > Integer.MAX_VALUE)
    throw new ArithmeticException("int overflow");
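A check that actually works has to happen before or during the operation, not after the value has already wrapped around. A minimal sketch, assuming Java 8 or later, where Math.toIntExact and Math.addExact throw an ArithmeticException on overflow (the wrapper class SafeInts is hypothetical):

```java
final class SafeInts {
    // Narrows a long to an int, throwing ArithmeticException if it does not fit.
    static int toInt(long value) {
        return Math.toIntExact(value);
    }

    // Adds two ints, throwing ArithmeticException instead of silently wrapping.
    static int add(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        System.out.println(toInt(42L));               // 42
        System.out.println(Integer.MAX_VALUE + 100);  // wraps to a negative number
        try {
            add(Integer.MAX_VALUE, 100);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```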

Using == with float or double

for (float f = 10f; f!=0; f-=0.1) {
  System.out.println(f);
}
The above code doesn't behave as expected. It causes an endless loop. Because 0.1 has no finite binary representation, f will never be exactly 0. Generally you should never compare float or double values with the equality operator ==. Always use less than or greater than. Java compilers should be changed to issue a warning in that case. Or even make == an illegal operation for floating point types in the Java Language Spec. It makes really no sense to have this feature.
for (float f = 10f; f>0; f-=0.1) {
  System.out.println(f);
}
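When two floating point values really have to be checked for "equality", the usual workaround is a comparison against a tolerance, which again only uses less than. A small sketch; the method name nearlyEqual and the tolerance 1e-9 are assumptions, and a suitable tolerance depends on the magnitude of the values involved:

```java
final class FloatCompare {
    // True if a and b differ by no more than the given tolerance.
    static boolean nearlyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                  // false
        System.out.println(nearlyEqual(sum, 0.3, 1e-9)); // true
    }
}
```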

Storing money in floating point variables

float total = 0.0f;
for (OrderLine line : lines) {
  total += line.price * line.count;
}
double a = 1.14 * 75; // 85.5 represented as 85.4999...
System.out.println(Math.round(a)); // surprising output: 85
BigDecimal d = new BigDecimal(1.14);
I have seen many developers coding such a loop, including myself in my early days. When this code sums 100 order lines with every line having one $0.30 item, the resulting total is calculated to exactly 29.999971. The developer notices the strange behaviour and changes the float to the more precise double, only to get the result 30.000001192092896. The somewhat surprising result is of course due to the difference in representation of numbers by humans (in decimal format) and computers (in binary format). It always occurs in its most annoying form when you add fractional amounts of money or calculate the VAT.
Binary representation of floating point numbers was invented for inherently inexact values like measurements. Perfect for engineering! But unusable when you want exact math. Like banks.
There are business cases where you cannot afford to lose precision. You lose precision when converting between decimal and binary and when rounding happens in a manner that is not well defined or at indeterminate points. To avoid losing precision you must use fixed point or integer arithmetic. This does not only apply to monetary values, but it is a frequent source of annoyance in business applications and therefore makes a good example. In the second example an unsuspecting user of the program would simply say the computer's calculator is broken. That's of course very embarrassing for the programmer.
Consequently a non-integer amount of money should never ever be stored in a floating point data type. Please note that it is not just any calculation that is inexact. Even a simple multiplication with an integer can already yield an inexact result. It is the mere fact of storing a value in a binary representation (float, double) that may already cause rounding! You simply can not store 0.1 as an exact value in float or double. If you see a float or double in your financial code base, the code will most likely yield inexact results. Instead either a string or fixed point representation should be chosen. A text representation must be in a well-defined format and is not to be confused with user input/output in a locale specific format. Both representations must define the precision (number of digits before and after the decimal point) that is stored.
For calculations the class BigDecimal provides an excellent facility. The class can be used such that it throws runtime exceptions if precision is unexpectedly lost in an operation. This is very helpful to uproot subtle numerical bugs and enables the developer to correct the calculation.
BigDecimal total = BigDecimal.ZERO;
for (OrderLine line : lines) {
  BigDecimal price = new BigDecimal(line.price);
  BigDecimal count = new BigDecimal(line.count);
  total = total.add(price.multiply(count)); // BigDecimal is immutable!
}
total = total.setScale(2, RoundingMode.HALF_UP);
BigDecimal a = (new BigDecimal("1.14")).multiply(new BigDecimal(75)); // 85.5 exact
a = a.setScale(0, RoundingMode.HALF_UP); // 86
System.out.println(a); // correct output: 86
BigDecimal a = new BigDecimal("1.14");
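The remark that BigDecimal can throw when precision would be lost refers to RoundingMode.UNNECESSARY. A minimal sketch (the class ExactMoney and the method toCents are hypothetical names):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ExactMoney {
    // Returns the amount with exactly two decimal places, or throws
    // ArithmeticException if that would require rounding.
    static BigDecimal toCents(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        System.out.println(toCents(new BigDecimal("85.5"))); // 85.50
        try {
            toCents(new BigDecimal("85.505"));
        } catch (ArithmeticException e) {
            System.out.println("precision would be lost");
        }
        // The double constructor captures the binary approximation, not the literal.
        System.out.println(new BigDecimal(0.1).compareTo(new BigDecimal("0.1")) == 0); // false
    }
}
```

This is also why the examples above construct BigDecimal from a String wherever the value has a fractional part.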

Abusing finalize()

public class FileBackedCache {
   private RandomAccessFile backingStore; // a java.io.File has no close() method
   
   ...
   
   protected void finalize() throws IOException {
      if (backingStore != null) {
        backingStore.close();
        backingStore = null;
      }
   }
}
This class uses the finalize method to release a file handle. The problem is that you don't know when the method is called. The method is called by the garbage collector. If you are running out of file handles you want this method to be called rather sooner than later. But the GC will probably only invoke the method when you are about to run out of heap, which is a very different situation. It may take anything from milliseconds to days until GC and finalization runs. The garbage collector manages memory only. It does that very well. But it must not be abused to manage any other resources apart from that. The GC is not a generic resource management mechanism! I find Sun's API Doc of the finalize method very misleading in that respect. It actually suggests using this method to close I/O resources - complete bullshit if you ask me. Again: I/O has nothing to do with memory!
Better code provides a public close method, which must be called by a well-defined lifecycle management, like JBoss MBeans or so.
public class FileBackedCache {
   private RandomAccessFile backingStore;
   
   ...
   
   public void close() throws IOException {
      if (backingStore != null) {
        backingStore.close();
        backingStore = null;
      }
   }
}
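Where no container manages the lifecycle, the caller can do it. On Java 7 or later a class with a close method can implement AutoCloseable and be used in a try-with-resources statement, which calls close() deterministically when the block ends. The sketch below uses a hypothetical TrackedResource as a stand-in for FileBackedCache; it only records whether close() ran.

```java
final class CloseDemo {
    // Hypothetical stand-in for a class holding a file handle.
    static final class TrackedResource implements AutoCloseable {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true; // a real implementation would release the handle here
        }
    }

    // Uses the resource and returns it afterwards so the caller can inspect it.
    static TrackedResource useAndClose() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            // work with r here; close() runs when this block is left,
            // whether normally or through an exception
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().isClosed()); // true
    }
}
```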
