Publication Date
2021
Document Type
Dissertation/Thesis
First Advisor
Papka, Michael E.
Degree Name
M.S. (Master of Science)
Legacy Department
Department of Computer Science
Abstract
High-performance computing (HPC) resources at facilities such as Argonne National Laboratory's Leadership Computing Facility (ALCF) enable a wide array of scientific experiments and research applications. In day-to-day operation, these platforms collect copious amounts of system, performance, and debugging logs, capturing data about how jobs, individual tasks, and the system as a whole operate and perform. This thesis builds on previous efforts to examine how these logs can be used to better understand user and application behavior and system resource usage, and demonstrates machine-learning-based (ML-based) techniques for characterizing applications and predicting job behavior using log data. Five datasets collected from the operation of two supercomputers at the ALCF from 2014 to 2020 were used for the analysis.
We first demonstrate that the usage of ALCF supercomputers is consistent, repetitive, and patterned, suggesting that it is suitable for training ML models. We next investigate the usage of workflow-based HPC jobs compared to "traditional" single-task HPC jobs, as well as the utilization of the solid-state drive (SSD) cache drives on ALCF's Theta supercomputer. From these analyses, we enumerate potentially advantageous changes and adaptations the ALCF and other facilities might consider in current and future systems. We also show that hardware performance counters provide a viable alternative for application identity verification and resource-intensiveness classification using ML-based approaches, accomplishing near-parity in testing accuracy without the overhead and coverage constraints faced by prior log-based approaches. Finally, we investigate methods to improve an ML-based technique for application runtime estimation, with implications for job scheduling on HPC systems.
Recommended Citation
Lewis, Ryan David, "Log Analysis and Visualization of HPC Application Performance Data" (2021). Graduate Research Theses & Dissertations. 7297.
https://huskiecommons.lib.niu.edu/allgraduate-thesesdissertations/7297
Extent
49 pages
Language
eng
Publisher
Northern Illinois University
Rights Statement
In Copyright
Rights Statement 2
NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.
Media Type
Text