Java I/O Performance Iterating Directories

A while ago at work we were confronted with the task of creating a directory listing in a Grails application. We tried a couple of approaches, one the Groovy way and one the Java way. Both performed poorly. A short search turned up a Stack Overflow thread addressing the issue of slow Java I/O performance, with only one real solution: switch from Java 6 to Java 7. That's not an option at work, but out of personal interest I gave it a try at home.

The Task

Write a simple Java application that scans a path passed on the command line, once the traditional way with File.list() and once with the new Java 7 features from the java.nio.file package. The application then prints how long the scan took and how many files and folders it encountered on its journey through the filesystem jungle.

File.list()

This is a straightforward approach. There's one class that contains the main() method and the workhorse countRecursive(), which calls itself for each directory it finds. Time is measured in milliseconds.

    import java.io.File;

    public class IOPerformance {
        private int files   = 0;
        private int folders = 0;

        // Counts files and folders below the given folder, recursing into each subdirectory.
        void countRecursive(File folder) {
            if (null == folder) {
                return;
            }

            // list() returns null if the path is not a directory or cannot be read
            String[] list = folder.list();
            if (null == list) {
                return;
            }

            for (String fileName : list) {
                String folderName = folder.getAbsolutePath();
                if (!folderName.endsWith(File.separator)) {
                    folderName = folderName + File.separator;
                }

                File item = new File(folderName + fileName);
                if (item.isDirectory()) {
                    folders++;
                    countRecursive(item);
                }
                else {
                    files++;
                }
            }
        }

        public void printResults() {
            System.out.println(String.format("Files found : %d", files));
            System.out.println(String.format("Folders found: %d", folders));
        }

        public static void main(String[] args) {
            IOPerformance tester = new IOPerformance();

            long start = System.currentTimeMillis();

            // args[0] is the path passed on the command line
            tester.countRecursive(new File(args[0]));

            long end = System.currentTimeMillis();

            System.out.println(String.format("Time passed : %d", end - start));
            tester.printResults();
        }
    }

One interesting note: this code automatically follows symbolic links!
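If following links is not what you want, the traditional loop can be told to skip them. The following is only a sketch and not the code I benchmarked: it assumes Java 7 is available (Files.isSymbolicLink() and File.toPath() are Java 7 additions), and the class name LinkAwareCounter and the skippedLinks counter are made up for illustration.

```java
import java.io.File;
import java.nio.file.Files;

class LinkAwareCounter {
    int files        = 0;
    int folders      = 0;
    int skippedLinks = 0; // hypothetical extra counter, not in the original program

    void countRecursive(File folder) {
        // listFiles() returns null if folder is not a directory or cannot be read
        File[] items = folder.listFiles();
        if (null == items) {
            return;
        }

        for (File item : items) {
            if (Files.isSymbolicLink(item.toPath())) {
                // neither count nor descend into symbolic links
                skippedLinks++;
            }
            else if (item.isDirectory()) {
                folders++;
                countRecursive(item);
            }
            else {
                files++;
            }
        }
    }

    public static void main(String[] args) {
        LinkAwareCounter counter = new LinkAwareCounter();
        counter.countRecursive(new File(args[0]));
        System.out.println(String.format("Files found : %d", counter.files));
        System.out.println(String.format("Folders found: %d", counter.folders));
        System.out.println(String.format("Links skipped: %d", counter.skippedLinks));
    }
}
```

Skipping links also protects the recursion from running in circles when a link points back to one of its parent folders.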

java.nio.file

This package is new in Java 7 and is supposed to make working with filesystems easier.
A recursive scan is actually much easier now since it is reduced to one method call: Files.walkFileTree(...). Well, that'd be too easy, so you'll need to implement an interface (FileVisitor), too. The whole process is explained in the Java Tutorials so I won't repeat that. I'll just show you my code.

    import java.io.IOException;
    import java.nio.file.FileVisitOption;
    import java.nio.file.FileVisitResult;
    import java.nio.file.FileVisitor;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.EnumSet;

    // FileVisitor must be parameterized with Path, otherwise the @Override methods do not compile
    public class IOPerformance implements FileVisitor<Path> {
        private int files   = 0;
        private int folders = 0;

        public void countWithVisitor(String source) throws IOException {
            Path p = Paths.get(source);
            // FOLLOW_LINKS mirrors the link-following behaviour of the File.list() version
            Files.walkFileTree(
                p,
                EnumSet.of(FileVisitOption.FOLLOW_LINKS),
                Integer.MAX_VALUE,
                this);
        }

        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
            folders++;
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
            files++;
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) {
            // report unreadable entries and keep walking
            System.out.println("Could not visit: " + file);
            return FileVisitResult.CONTINUE;
        }

        public void printResults() {
            System.out.println(String.format("Files found : %d", files));
            System.out.println(String.format("Folders found: %d", folders));
        }

        public static void main(String[] args) {
            try {
                IOPerformance tester = new IOPerformance();

                long start = System.currentTimeMillis();

                tester.countWithVisitor(args[0]);

                long end = System.currentTimeMillis();

                System.out.println(String.format("Time passed : %d", end - start));
                tester.printResults();
            }
            catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

If you prefer the easy way out, extend SimpleFileVisitor<Path> instead of implementing the whole interface. You then only override the methods you actually need instead of all four; for this counter that means visitFile() and preVisitDirectory().
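Here is roughly what the SimpleFileVisitor variant could look like. Treat it as a sketch, not as the code I measured: the class name SimpleCounter and the two getters are invented for this example. I also override visitFileFailed() because the default implementation in SimpleFileVisitor rethrows the exception, which would abort the walk at the first unreadable entry.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;

class SimpleCounter extends SimpleFileVisitor<Path> {
    private int files   = 0;
    private int folders = 0;

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        // called once per directory, including the start directory itself
        folders++;
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        files++;
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) {
        // keep walking instead of rethrowing like the default implementation
        System.out.println("Could not visit: " + file);
        return FileVisitResult.CONTINUE;
    }

    public int getFiles()   { return files; }
    public int getFolders() { return folders; }

    public static void main(String[] args) throws IOException {
        SimpleCounter counter = new SimpleCounter();
        Files.walkFileTree(
            Paths.get(args[0]),
            EnumSet.of(FileVisitOption.FOLLOW_LINKS),
            Integer.MAX_VALUE,
            counter);
        System.out.println(String.format("Files found : %d", counter.getFiles()));
        System.out.println(String.format("Folders found: %d", counter.getFolders()));
    }
}
```

Note that the visitor counts the start directory as a folder, while the File.list() version only counts the folders below it, so the two folder totals can differ by one.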

Results

So is it worth it? Has anything changed at all? Before I answer these questions, here is a short overview of what I tested.

  • The traditional approach …
    • compiled with Java 6 running on Java 6
    • compiled with Java 6 running on Java 7
    • compiled with Java 7 running on Java 7
  • The new approach (obviously compiled with and running on Java 7)
  • Just for fun: the traditional approach with C++

I ran each test four times but only measured the last three. The first run counts as warm-up: the runtime and the application are being loaded, which spoils the result. Example: running the Java 6 version for the first time took 12 seconds to finish. Subsequent runs were way faster.
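I simply reran the program by hand. If you'd rather automate the "one warm-up run, three measured runs" scheme, a small harness along these lines could do it. This is a sketch under assumptions: the class WarmupTimer and its method averageMillis() are invented for this example, and because it repeats the task inside a single JVM it also benefits from JIT warm-up, so its numbers are not directly comparable to separate program launches.

```java
import java.io.File;

class WarmupTimer {
    // Runs the task warmupRuns times unmeasured, then measuredRuns times measured,
    // and returns the mean duration of the measured runs in milliseconds.
    static double averageMillis(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }

        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1000000.0 / measuredRuns;
    }

    public static void main(String[] args) {
        final File root = new File(args[0]);

        // stand-in task: a single directory listing; plug in the real scan here
        Runnable scan = new Runnable() {
            @Override
            public void run() {
                root.list();
            }
        };

        // one warm-up run, three measured runs, as in the post
        System.out.println(String.format("Average : %.1f ms", averageMillis(scan, 1, 3)));
    }
}
```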

Technique      Compiler  Runtime  Result
Traditional    Java 6    Java 6   4410 ms
Traditional    Java 6    Java 7   4813 ms
Traditional    Java 7    Java 7   4764 ms
java.nio.file  Java 7    Java 7   3610 ms
C++            Clang 3   Native   1526 ms

That is an advantage of roughly one second for java.nio.file. I scanned my home folder, which contains 332802 files and 81678 folders. It is faster but not blazingly fast. Only when scanning much larger or more complex tree structures will that one second add up to a real difference. It's progress nevertheless and I appreciate that! However, C++ beats the crap out of Java in this regard.
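To put the timings in relation to the size of the tree, you can convert them into entries per second. The little program below is just arithmetic on the numbers from the table; the class name Throughput and the method entriesPerSecond() are made up for this example, and it treats files plus folders as the number of entries scanned.

```java
class Throughput {
    // entries scanned per second for a run that took the given number of milliseconds
    static double entriesPerSecond(long entries, long millis) {
        return entries * 1000.0 / millis;
    }

    public static void main(String[] args) {
        long entries = 332802L + 81678L; // files + folders in the scanned home folder

        String[] labels = {
            "Traditional (Java 6 / 6)",
            "Traditional (Java 6 / 7)",
            "Traditional (Java 7 / 7)",
            "java.nio.file (Java 7 / 7)",
            "C++ (Clang 3, native)"
        };
        long[] millis = { 4410, 4813, 4764, 3610, 1526 }; // results from the table

        for (int i = 0; i < labels.length; i++) {
            System.out.println(String.format("%-28s %8.0f entries/s",
                labels[i], entriesPerSecond(entries, millis[i])));
        }
    }
}
```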
