Improving Parallel I/O Performance using Interval I/O
Today's most advanced scientific applications run on large computer systems consisting of hundreds of thousands of processors, access state-of-the-art parallel file systems that distribute files across hundreds of disks, and use advanced interconnection systems with theoretical speeds of hundreds of gigabytes per second. Despite these technologies, such applications often fail to obtain a reasonable proportion of the hardware's theoretical peak speed. The reasons for the poor performance of these distributed file accesses include the noncontiguous access patterns used in scientific computing, increased contention due to false sharing, and the somewhat finicky nature of parallel file system performance. We argue that a more fundamental cause of this problem is the legacy view of a file as a linear sequence of bytes.
To address these issues, we introduce Interval I/O, an approach that uses application access patterns to partition a file into a series of intervals, which serve as the fundamental unit for subsequent file accesses. This approach provides superior performance for noncontiguous access patterns. It also reduces false contention and the unnecessary serialization it causes, and it significantly increases the performance of atomic mode operations. Finally, Interval I/O includes a technique for supporting collaborative file accesses among cooperating applications. We provide a prototype implementation of our Interval I/O system and use it to demonstrate performance improvements of as much as 1000% over commonly used methods on several common benchmarks.
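
As a rough illustration of partitioning a file into intervals from access patterns, the sketch below splits the file at every boundary where some process's access begins or ends, and records which processes touch each resulting interval. It is not the prototype's algorithm: the function name `partition_into_intervals`, the `(rank, offset, length)` access format, and the boundary-splitting rule are all assumptions made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical access record: (process rank, byte offset, length in bytes).
Access = Tuple[int, int, int]
# Hypothetical interval record: (start offset, end offset (exclusive), ranks).
Interval = Tuple[int, int, Set[int]]


def partition_into_intervals(accesses: List[Access]) -> List[Interval]:
    """Split the file at every access boundary (an assumed partitioning rule).

    Each returned interval is either wholly inside or wholly outside every
    access, so the set of ranks that touch it is well defined.
    """
    # Ignore empty accesses; they cover no bytes.
    accesses = [(r, off, ln) for r, off, ln in accesses if ln > 0]

    # Every start and end offset is a potential interval boundary.
    boundaries = sorted({b for _, off, ln in accesses for b in (off, off + ln)})

    intervals: List[Interval] = []
    for start, end in zip(boundaries, boundaries[1:]):
        # A rank touches the interval if its access fully covers it.
        ranks = {r for r, off, ln in accesses if off <= start and end <= off + ln}
        if ranks:  # skip gaps that no process accesses
            intervals.append((start, end, ranks))
    return intervals


if __name__ == "__main__":
    demo = [(0, 0, 100), (1, 50, 100), (2, 200, 50)]
    for start, end, ranks in partition_into_intervals(demo):
        print(f"[{start}, {end}) accessed by ranks {sorted(ranks)}")
```

In this sketch, an interval touched by only one rank would need no coordination with other processes, while an interval touched by several ranks marks a region of genuine sharing. How the actual system treats each case is beyond what this example assumes.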
