Providers of computing services such as data science clouds need to maintain large hardware infrastructures, often with thousands of nodes. Using commodity hardware leads to heterogeneous setups that differ significantly in individual nodes’ performance, which must be understood to allow for accounting and strategic planning, and to identify problems and bottlenecks. The method of choice today is active benchmarking, but active benchmarks disturb normal operations and are too expensive to run continuously. They also struggle to remain representative of an ever-changing workload. We therefore design a passive benchmarking technique, which computes expressive and accurate performance metrics based on actual workloads. We demonstrate the quality and performance benefits of our passive benchmark on a practical workload in one of the world’s largest scientific computing infrastructures, the CERN Computing Center. Our approach allows continuous benchmarking of the active system while avoiding downtime costs, and it achieves prediction quality comparable to the state-of-the-art approach of active benchmarking.