== DATASET CONTENTS ==

3000 out-of-order CPU cores running EEMBC FPMark benchmarks.


== SUMMARY ==

This dataset contains gem5 simulation results and McPAT power consumption
figures for 3000 out-of-order CPU cores running EEMBC FPMark benchmarks. The
dataset is referred to internally as "exp2_big03fp". The benchmarks have been
compiled for the ARM ISA with FP support and are run for 1G instructions or
until completion. The simulations do not include an L2 cache.

This dataset has not been used in any publications. It is, however, directly
comparable to two datasets that have been used in publications [1,2].

The remainder of this README describes the data and associated tools in more
detail. Please take particular note of the CAUTIONS section before using this
data.


== COPYRIGHT NOTICE ==

The dataset (simulation results) and this README file are released under the
Creative Commons Attribution 4.0 International license (CC BY 4.0). This
basically means that you are free to share and transform the dataset as long
as you give appropriate credit. See
https://creativecommons.org/licenses/by/4.0/ for details.

The datasets contain results from the gem5 simulator and McPAT power model.
These tools are governed by their own copyright terms. See:

  http://gem5.org
  https://code.google.com/archive/p/mcpat/


== DATA ==

The data is stored in an xz-compressed archive. The uncompressed archive is
several gigabytes in size, so make sure you have enough disk space. Extract
with:

  $> tar -xJf exp2_big03fp.tar.xz

The archive contains the following items. Replace NAME with the internal name
(exp2_big03fp).

NAME/
|
+ DESCRIPTION           A short description of the experiment.
|
+ NAME.pl               The definition of the simulation run. All parameter
|                       combinations in this file have been run.
|
+ NAME.conf             A list of parameters that are simulated together.
|                       All combinations of parameters in NAME.pl are run with
|                       each line in NAME.conf. Note that NAME.conf files can
|                       contain unused configurations.
|                       8000 configurations are listed, but only 3000 have
|                       actually been simulated.
|
+ fpmark1.conf          A list of EEMBC FPMark benchmarks with command line
|                       options for each one.
|
+ WORKLOAD              A list of benchmarks used for the experiment, with
|                       compile date and compiler used.
|
+ HG_HEAD               The latest changeset applied to gem5. Also lists
|                       locally applied patches.
|
+ mcpat-template.xml    Template McPAT XML file. This file is based off of
|                       work done by Andrew Rice. See
|                       https://www.cl.cam.ac.uk/~acr31/sicsa/
|
+ mcpat-run-log_*       Log file(s) from running McPAT on the simulation
|                       results. The logs often contain warnings about missing
|                       fields. These can be safely ignored. The warnings
|                       occur when an L2 cache has not been simulated with
|                       gem5 or when a given counter is 0 and therefore is not
|                       listed in stats.txt.
|
+ SHASUMS               SHA checksum for each file in the archive. Check with
|                       $> shasum -c SHASUMS
|
+-NNNNNN/               Six-digit numbered directories. Each directory
  |                     contains one simulation. One simulation is one
  |                     benchmark run on one CPU configuration with gem5, with
  |                     power calculated with McPAT. The simulation
  |                     directories contain the following files:
  |
  + ecdf-run_log*       The log file from the gem5 wrapper script.
  |
  + param_overrides.py  Auto-generated python file that sets the gem5 CPU
  |                     model's parameters to the required values.
  |
  + config.ini          gem5 configuration from gem5.
  |
  + config.json         gem5 configuration in JSON format.
  |
  + stats.txt           Hardware counters from gem5. Note that some stats.txt
  |                     files can contain more than one set of counters if
  |                     stats have been dumped more than once during the
  |                     simulation. Care must be taken to ensure the right
  |                     counters are used.
  |
  + simout              STDOUT from gem5. Might include STDOUT from the
  |                     benchmark.
  |
  + stderr              STDERR from gem5.
  |
  + mcpat_report.xml    McPAT configuration XML file. This is
  |                     mcpat-template.xml with data filled in from config.ini
  |                     and stats.txt.
  |
  + mcpat_report        Output from McPAT.


== TOOLS ==

Scripts needed for parsing the dataset have been posted to GitHub.

  $> git clone https://github.com/etomzak/McPAT-Utils.git utils_trunk

gather.pl builds a tab-separated table of values extracted from each
simulation's config.ini, stats.txt, ecdf-run_log* and mcpat_report files. The
table generated by gather.pl is defined with a configuration file. The
McPAT::ParseOut Perl module contains functionality for parsing output from
McPAT. Documentation is available with:

  $> perldoc utils_trunk/gather.pl
  $> perldoc utils_trunk/McPAT/ParseOut.pm

An example configuration file is provided in utils_trunk/gather-example.conf.
The file declares a Perl hash, with keys for each of the four types of files
that will be parsed. Each key points to an array of strings. Each string will
become a column in the output table. The strings begin with "NN!", where NN is
the number of the column. The rest of the string is a (potentially
period-separated hierarchical) reference to a value in a file. For example,
'2!system.cpu.numROBEntries' means that the second column will be the number
of CPU reorder buffer entries.

The simplest way to run gather.pl is like this:

  $> ./gather.pl --config gather-example.conf \
                 --root /path/to/NAME/ \
                 --mcpat mcpat_report \
                 --outfile out.csv \
                 --bar


== CAUTIONS ==

The limitations of gem5 and McPAT are widely known and occasionally
documented. See, e.g., [6]. Fundamentally, gem5 and McPAT use different
microarchitectural models, and there is very little that an end-user can do to
reconcile the two. Any conclusions drawn from gem5 and McPAT must be informed
by an understanding of the tools' strengths and weaknesses as well as an
understanding of the hardware that is being modeled. Just because gem5+McPAT
say that a particular CPU design is good does not mean that it actually is
good, or even that it is physically feasible.
For ideas on how to work around these limitations, see, e.g., section 2.3 in
[3], section 4.2 in [4], and sections 4.3.2, 4.3.3 and 6.2 in [5].

The simulations have been run with gem5 changeset 9351 and McPAT version 0.8
with minor bug fixes (including a fix for the search bug in ArrayST). These
versions date from 2012. With every update to gem5 and McPAT, the data becomes
more and more out of date. Consequently, the data is unlikely to yield new,
accurate microarchitectural insights. It might, however, be useful for
comparative studies, developing statistical techniques, etc.

Finally, the dataset contains a mcpat-template.xml file. PLEASE, PLEASE DO
*NOT* JUST REUSE THIS FILE FOR YOUR EXPERIMENTS and assume that it will give
you correct results. Due to the differences between gem5 and McPAT, there is
no universally correct way to convert between the two. Depending on the
experiment, this file might not include the correct parameters, or it might
not use the parameters correctly for your particular experiment.

Remember that this dataset is distributed under the CC BY 4.0 license, which
means that the "Licensor offers the Licensed Material as-is and as-available,
and makes no representations or warranties of any kind concerning the Licensed
Material..." (paragraph 5.a). I.e., there is no guarantee that anything in the
dataset is correct for you, or even that anything is correct at all (although
I obviously made every effort to ensure that the dataset was correct for my
use case).


== AUTHOR ==

This dataset has been generated by Erik Tomusk.


== ACKNOWLEDGEMENTS ==

A large part of this data has been generated with resources provided by the
Edinburgh Compute and Data Facility (ECDF) (http://www.ecdf.ed.ac.uk/). The
ECDF is partially supported by the eDIKT initiative (http://www.edikt.org.uk).


== REFERENCES ==

[1] E. Tomusk. "EEMBC Benchmark Suite Simulations." Dataset, 2016.
    Available: https://dx.doi.org/10.7488/ds/1571

[2] E. Tomusk.
    "SPEC 2006 Integer Benchmark Suite Simulations." Dataset, 2016.
    Available: https://dx.doi.org/10.7488/ds/1584

[3] E. Tomusk, C. Dubach, M. O'Boyle, "Four metrics to evaluate heterogeneous
    multicores," in the ACM Transactions on Architecture and Code Optimization
    (TACO), vol. 12, no. 4, Nov 2015.
    Available: http://doi.acm.org/10.1145/2829950

[4] E. Tomusk, C. Dubach, M. O'Boyle, "Selecting heterogeneous cores for
    diversity," in the ACM Transactions on Architecture and Code Optimization
    (TACO) (to appear).

[5] E. Tomusk. Heterogeneous Processor Composition: Metrics and Methods. PhD
    Thesis, University of Edinburgh. 2016.

[6] S. Xi, H. Jacobson, P. Bose, G. Wei, and D. Brooks. "Quantifying sources
    of error in McPAT and potential impacts on architectural studies," in the
    International Symposium on High Performance Computer Architecture (HPCA),
    2015. Available: http://dx.doi.org/10.1109/HPCA.2015.7056064
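

== EXAMPLE: SEPARATING stats.txt DUMPS ==

As noted in the DATA section, some stats.txt files contain more than one set
of counters. The Python sketch below is not part of the dataset or of
McPAT-Utils; it is only one possible way to separate the dumps before deciding
which one to use. It assumes that each dump is delimited by gem5's
"Begin Simulation Statistics" and "End Simulation Statistics" marker lines,
and that each counter line has the form "name value # description". The
function name split_stats_dumps is made up for this example. Check both
assumptions against your own files.

```python
# Assumed gem5 marker text; verify against the stats.txt files in the archive.
BEGIN = "Begin Simulation Statistics"
END = "End Simulation Statistics"


def split_stats_dumps(text):
    """Split the contents of a stats.txt file into one dict per dump.

    Each dict maps a counter name to its first value, kept as a string so
    that no precision or formatting is lost. Any further columns on a line
    (e.g. percentages for distribution entries) are ignored.
    """
    dumps = []
    current = None  # None while outside a Begin/End block
    for line in text.splitlines():
        if BEGIN in line:
            current = {}
        elif END in line:
            if current is not None:
                dumps.append(current)
            current = None
        elif current is not None:
            # Drop the trailing "# description" comment, then tokenize.
            fields = line.split("#", 1)[0].split()
            if len(fields) >= 2:
                current[fields[0]] = fields[1]
    return dumps
```

Which dump holds the right counters depends on how the simulation was run, and
this README does not specify it. One cautious approach is to read a stats.txt
file, call split_stats_dumps on its contents, and compare a counter such as
sim_insts across the returned dicts before choosing one.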