Modeling the performance and power matrix of disparate computer systems using machine learning techniques (Modeling Computer Systems Selection)
In the last couple of decades, there has been exponential growth in the processor, cache, and memory features of computer systems. These hardware features play a vital role in determining the performance and power consumption of a software application executed on different computer systems; even minor alterations to the hardware features or to the application can change both. Compute-intensive (compute-bound) applications depend more heavily on processor features, while data-intensive (memory-bound) applications depend more heavily on memory features. To meet customized performance and power budgets, selecting computer systems with appropriate hardware features (processor, cache, and memory) becomes essential. Selecting systems to user-specific budgets ordinarily requires access to physical systems on which to gather performance and power-utilization data; expecting a user to have such access is prohibitive in cost, so it becomes essential to develop a virtual model that obviates the need for physical systems. Researchers have used system-level simulators for decades to build simulated computer systems from processor, cache, and memory features and thereby estimate performance and power. One approach, building virtual systems with a full-system simulator (FSS), provides performance and power estimates closest to those of a physical system. In the recent past, machine learning algorithms have been trained on such accurate FSS models to predict performance and power for varying features in similar systems, achieving fairly accurate results. However, building multiple computer systems in a full-system simulator is complex and extremely slow, and the problem is compounded by the fact that access to such accurate simulators is limited.
There is an alternative approach: using the open-source gem5 simulator in its emulation mode to rapidly build simulated systems. Unfortunately, this compromises the accuracy of the performance and power measurements compared to FSS models, so machine learning algorithms trained on these results produce slightly less accurate predictions than those trained on FSS models. To make this approach useful, one needs to reduce the prediction inaccuracy introduced by the nature and design of gem5's emulation mode, and, as a consequence, the variation introduced by the type of application, whether compute-intensive or data-intensive. This dissertation undertakes the challenge of effectively combining the speed of the open-access gem5 simulated system with the accuracy of a physical system to acquire accurate machine learning predictions. If this challenge is met, a user would be able to select a system, either in the cloud or in the real world, on which to run applications within one's power and performance budget. In our proposed methodology, we first created several gem5 models in emulation mode for available systems with varying features, such as the type of processor (instruction set architecture, speed, and cache configuration) and the type, speed, and size of memory. We executed compute-intensive and data-intensive benchmark applications on these models to procure performance results. In the second step, 80% of the models generated with the gem5 simulator in emulation mode were used to train machine learning algorithms (linear, support vector, Gaussian, tree-based, and neural network); the remaining 20% were used for performance prediction. The tree-based algorithm predicted performance values closest to the simulated systems' results obtained with the above-mentioned gem5 models.
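The train/predict step above can be sketched as follows. This is a minimal illustration, not the dissertation's actual pipeline: the feature names, value ranges, and the synthetic runtime formula are our assumptions standing in for real gem5 output.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical gem5-style system features: clock (GHz), L2 cache (KB), DRAM (GB).
X = rng.uniform([1.0, 256.0, 4.0], [4.0, 8192.0, 64.0], size=(100, 3))
# Synthetic "runtime" dominated by clock speed, with smaller cache/memory terms.
y = 100.0 / X[:, 0] + 5e4 / X[:, 1] + 50.0 / X[:, 2]

# 80% of the simulated models train the learner; 20% are held out for prediction.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
pred = tree.predict(X_te)
```

In the dissertation's experiments this comparison was run across several learner families; the sketch shows only the tree-based one, which performed best.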
We subsequently fed the hardware configuration and application execution statistics generated by the gem5 models to the Multicore Power, Area, and Timing (McPAT) modeling tool to estimate power usage. To check the accuracy of the gem5 simulator results, the above-mentioned benchmark applications were run on real systems with identical features; the application code was modified to invoke the Performance Application Programming Interface (PAPI) to measure power consumption. There was a sizeable difference between the gem5 models and the real systems in both performance and power. We conceptualized the idea of using scaling and transfer learning to bridge the gap between predicted and actual values. We proposed a scaling technique that establishes an application-specific scaling factor using a correlation coefficient between hardware features and performance/power; this factor captures the difference and is applied to a set of predicted values to conform them to those of the physical system. The results demonstrate that, for the selected benchmark applications, the scaling technique achieves a prediction accuracy of 75%-90% for performance and 60%-95% for power. The accuracy of these results validates that the scaling technique effectively brings predicted performance and power values closer to those of physical systems, enabling the selection of an appropriate computer system(s). Another method for achieving better prediction values is to develop a model based on the existing transfer learning technique. Here we train a decision tree algorithm on two sets of data: one from a simulated system and the other from a closely matching physical system. Using the trained models, we attempt to predict the performance and power of the target physical system.
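A minimal sketch of the scaling idea follows. The specific formula, a mean simulated-to-physical ratio blended in proportion to the correlation strength, is our illustrative assumption, not the dissertation's exact formulation.

```python
import numpy as np

def scale_predictions(pred_sim, real_ref, feature):
    """Scale simulator-trained predictions toward physical-system measurements.

    pred_sim: predictions from the gem5-trained model
    real_ref: measurements from a reference physical system
    feature:  one hardware feature (e.g. clock speed) across those systems
    """
    # Correlation between the hardware feature and the physical measurements.
    r = np.corrcoef(feature, real_ref)[0, 1]
    # Hypothetical application-specific factor: the mean sim-to-real ratio,
    # weighted by the strength of the correlation.
    ratio = np.mean(real_ref / pred_sim)
    factor = 1.0 + abs(r) * (ratio - 1.0)
    return pred_sim * factor

# If the simulator consistently reports half the real value and the feature
# tracks the measurements perfectly, scaling recovers the physical numbers.
real = np.array([10.0, 20.0, 40.0])
sim = real * 0.5
scaled = scale_predictions(sim, real, feature=real)  # -> [10. 20. 40.]
```

The point of the design is that a strongly correlated feature lends confidence to the correction, while a weak correlation leaves the predictions largely untouched.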
The target system is different from the source physical system used to train the machine learning algorithm. This model uses performance and power from a source physical system during training to bring predicted values closer to those of the target system. For the selected benchmark applications, the transfer learning technique yields a mean prediction accuracy between 10% and 50% across the different target systems. In this work, we have demonstrated that our proposed techniques, scaling and transfer learning, are effective in estimating fairly accurate performance and power values for a physical system from the predictions of a machine learning model trained on a gem5 simulated-systems dataset. These techniques therefore provide a way to estimate performance and power for physical computer systems with known hardware features, without requiring access to those systems. With the estimated performance and power values coupled with the hardware features of the physical systems, we can select system(s) that meet user-provided performance and power budget(s).
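One minimal way to realize the transfer idea with a decision tree is to pool the simulated and source-physical data and tag each row's domain; the domain-indicator feature and the synthetic 1.3x "real systems run slower" offset below are our illustrative assumptions, and the dissertation's exact transfer-learning setup may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Simulated (gem5) systems and a closely matching source physical system.
X_sim = rng.uniform([1.0, 256.0], [4.0, 8192.0], size=(60, 2))
y_sim = 100.0 / X_sim[:, 0] + 5e4 / X_sim[:, 1]             # simulated runtime
X_phys = rng.uniform([1.0, 256.0], [4.0, 8192.0], size=(15, 2))
y_phys = 1.3 * (100.0 / X_phys[:, 0] + 5e4 / X_phys[:, 1])  # assumed sim-to-real gap

# Instance-based transfer: pool both datasets, tagging each row's domain
# (0 = simulated, 1 = physical) so the tree can learn the sim-to-real shift.
X_pool = np.vstack([np.c_[X_sim, np.zeros(len(X_sim))],
                    np.c_[X_phys, np.ones(len(X_phys))]])
y_pool = np.concatenate([y_sim, y_phys])
tree = DecisionTreeRegressor(random_state=0).fit(X_pool, y_pool)

# Predict a *target* physical system (unseen hardware) with the physical flag set.
X_target = np.array([[2.5, 1024.0]])
pred = tree.predict(np.c_[X_target, np.ones(len(X_target))])
```

The abundant simulated rows give the tree coverage of the feature space, while the few physical rows let it learn the offset; prediction with the physical flag set then queries the physical side of that learned relationship.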