Participating teams must deliver the best solutions and I/O performance for the following tasks: three benchmarks, common HPC use cases, and a secret task:

Task descriptions

IO500: The IO500 benchmark is a comprehensive performance evaluation tool designed to assess the efficiency and scalability of HPC storage systems. It consists of several tests, including IOR for sequential read/write performance and mdtest for metadata operations, ensuring a thorough analysis of storage subsystems. The benchmark helps organizations and researchers identify bottlenecks, compare different storage solutions, and guide the development of optimized storage architectures. Regularly updated results and rankings foster a competitive environment, encouraging continuous innovation in HPC storage technologies.

MD-Workbench: MD-Workbench is a specialized benchmark designed to evaluate the performance of metadata operations in HPC filesystems. By simulating a range of metadata-intensive workloads, such as file creation, deletion, and attribute modification, it provides detailed insights into the efficiency and scalability of file system metadata handling.

Elbencho: Elbencho is a benchmark designed to evaluate the performance of storage systems under various workloads. It assesses the read and write capabilities of file systems, including network file systems, by generating synthetic workloads that simulate real-world usage patterns. Elbencho supports both single-threaded and multi-threaded operations, allowing for comprehensive performance analysis across different configurations and scales.

NVIDIA DALI: NVIDIA DALI (Data Loading Library) is an advanced library designed to accelerate data preprocessing and augmentation for deep learning applications. By leveraging GPU acceleration, DALI streamlines the data pipeline, significantly reducing the time required to load, transform, and prepare data for neural network training. It supports a wide range of operations, including image and video decoding, resizing, cropping, and normalization, all performed with high efficiency. DALI integrates with popular deep learning frameworks such as TensorFlow and PyTorch, allowing easy incorporation into existing workflows.

Scoring for tasks

The scoring for the SSC is outlined in the table below. For any task tied to a benchmark, higher results earn more points. The maximum number of points equals the number of teams in the competition, meaning that for 5 teams the point spread is as follows:

The acronym DLIO refers to the Deep Learning I/O benchmark and is used as a reference here.

Application / Task / Points

IO500
- Submission to the IO500 webpage (Research section). A full submission earns 3 points, a submission with a partly missing description 2 points, and 1 point is given for the reproducibility questionnaire. (3 points)
- 10-client setup. Scoring based on results. (5 points)
- Description of the configurations measured and the performance improvements made: at least 5 different node/process combinations, with reasoning, on one summary page. (5 points)
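The IO500 runs are driven by an ini-style configuration file. Below is a minimal sketch loosely following the layout of the config-minimal.ini shipped with the benchmark; the paths are placeholders for the competition system, and key names should be verified against the files in your IO500 checkout.

```ini
; Sketch of an IO500 configuration (layout follows config-minimal.ini;
; verify key names against your own IO500 checkout before use).
[global]
datadir = /mnt/storage/io500-data   ; placeholder path on the shared filesystem
resultdir = ./results

[debug]
stonewall-time = 300                ; shorten only for test runs; valid runs need 300 s
```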
MD-Workbench
- Lowest maximum latency for the fixed configuration. Scoring based on results. (5 points)
Elbencho
- Run with an arbitrary number of client nodes, 100 KByte files (same file size as DLIO). The goal is to find out whether a larger number of clients is automatically faster and a smaller number automatically slower, or not. Submit write and read results; ranking is based on the read result. (2 points)
- Run with 10 client nodes: 100 KByte files (same file size as DLIO) vs. 100 KByte random reads in shared files. The goal is to compare the read performance of reading 100 KB files directly versus randomly reading 100 KB records from one or multiple large files. Submit write and read results; ranking is based on the read result. (5 points)
- Run with 1 client node, single thread: 100 KByte random reads in large files. The goal is to see how many read IOPS a single thread can achieve. Submit write and read results. (5 points)

HINT: See the built-in help pages "elbencho --help-large" and "elbencho --help-multi" for examples of working with large shared files and many small files, and "elbencho --help-dist" for examples of working with multiple clients. The "--dryrun" option of elbencho can be helpful to see how big the resulting dataset will be, especially for distributed runs where each thread creates many small files.

NOTE: In all cases, make the dataset large enough that the read test runs for at least 20 seconds without using test parameters such as "--infloop" or "--iterations" that read the same data multiple times. The goal is to measure drive access performance rather than RAM cache performance. "Large file" means a file size of 1 GB or more. Generate the large files with sequential writes within each file (i.e., write without the "--rand" parameter).
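The NOTE above implies a quick sizing calculation before launching a run. The sketch below shows one way to estimate the minimum dataset size; the aggregate read throughput is an assumed placeholder, not a measured value for the competition system.

```python
# Sketch: size the Elbencho dataset so a single read pass lasts at least
# 20 seconds. The throughput figure below is an assumed placeholder.
min_runtime_s = 20
expected_read_mbps = 5000          # assumed aggregate read throughput in MB/s
file_size_kb = 100                 # per-file size from the task description

min_dataset_mb = min_runtime_s * expected_read_mbps
num_files = min_dataset_mb * 1024 // file_size_kb
print(f"dataset >= {min_dataset_mb} MB, i.e. ~{num_files} files of {file_size_kb} KB")
```

The faster the system, the larger the dataset has to be, so it is worth re-running the estimate with the throughput actually observed during the write phase.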
NVIDIA DALI Pipeline
- Large-file access pipeline with prepared Python code; see https://gitlab.gwdg.de/ssc/dali-benchmark. Scoring based on results. (5 points)
Performance analysis
- Written report: comparison between the benchmarks and the theoretical maximum; 1-2 pages of description, analysis, reasoning, and conclusion. (10 points)
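For the report's comparison against the theoretical maximum, the back-of-the-envelope arithmetic can be as simple as the following sketch. The server count, drive count, per-drive throughput, and measured result are all illustrative placeholders, not specifications or results of the competition system.

```python
# Sketch: compare a measured benchmark result against the theoretical
# hardware maximum. All numbers here are illustrative placeholders.
servers = 4
drives_per_server = 8
per_drive_gb_s = 3.0               # assumed sequential read rate per NVMe drive

theoretical_peak = servers * drives_per_server * per_drive_gb_s  # GB/s
measured_gb_s = 48.0               # placeholder for your benchmark result

efficiency = measured_gb_s / theoretical_peak
print(f"theoretical peak: {theoretical_peak} GB/s, efficiency: {efficiency:.0%}")
```

When reasoning about the gap, remember that the network between clients and servers may be the tighter bound than the drives, so it helps to compute both limits and compare against the smaller one.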
Secret task
- Run IO500 and MD-Workbench in parallel. Scoring based on results. (5 points)