[opendc-sc18-software] A Reference Architecture for Datacenter Scheduling: Software Artifacts

This release contains the software artifacts of the paper "A Reference Architecture for Datacenter Scheduling", presented at Supercomputing 2018.

For the paper, experiments have been run on the following traces:

  • Askalon (W-Eng) - askalon_workload_ee
  • Chronos (W-Ind) - chronos_exp_noscaler_ca

Each trace directory has the following structure:

  • /setup.txt
    This text file describes the trace used for the experiment, the number of times the experiment was repeated, and the number of warm-up runs.
  • /setup.json
    This JSON file describes the topology of the datacenter used in the experiments. Each item represents the identifiers of the resource (here, CPU type) to use in the machine. The available CPU types are (1) Intel i7 (4 cores, 4100 MHz) and (2) Intel i5 (2 cores, 3500 MHz).
  • /trace
    This directory contains the trace used in the simulation. The trace is stored in the Grid Workload Format. See the Grid Workload Archive for more information.
  • /data/experiments.csv
    A CSV file containing information about all simulations that have been run on the OpenDC platform for this experiment.
  • /data/job_metrics.csv
    A CSV file containing metrics (NSL, JMS, etc.) for each job that ran during the simulations.
  • /data/stage_measurements.csv
    A CSV file containing timing measurements for the scheduling stages that ran during the simulations.
  • /data/task_metrics.csv
    A CSV file containing metrics for each task that ran during the simulations.
  • /data/tasks.csv
    A CSV file containing information about the tasks (submit time, runtime, etc.) that ran during the simulations as extracted from the traces.

The format of each data file is described in its associated metadata file.
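
For a quick look at a data file before consulting its metadata file, the column names can be listed from the first line. The following is a minimal sketch, assuming each CSV file starts with a comma-separated header row without quoted commas (this is not confirmed here). The helper name csv_columns is hypothetical.

```shell
# Print the column names of a CSV file, one per line.
# Assumption: the first line is a comma-separated header without quoted commas.
csv_columns() {
    # Take the first line, drop a possible carriage return, split on commas.
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```

For example, csv_columns askalon_workload_ee/data/job_metrics.csv would print one column name per line if the file has such a header.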

Hardware

The experiments were run on a MacBook Pro with a 2.9 GHz Intel Core i7 processor and 16 GB of 2133 MHz LPDDR3 memory.

Reproduction

This section describes how to reproduce the paper results using the provided Docker image. Please make sure you have Docker installed and running.

For reproduction, you will run the following experiments:

  • askalon_workload_ee
    This is the larger experiment of the paper and will take approximately 4 hours to complete on similar hardware.
  • chronos_exp_noscaler_ca
    This is the smaller experiment of the paper and will take approximately 5 minutes to complete on similar hardware.

The Docker image atlargeresearch/sc18-experiment-runner can be used for running the experiments. A volume can be attached to the directory /home/gradle/simulator/data to capture the results of the experiments.

Make sure you have, in your current working directory, the following files:

  • /setup.json
    This JSON file describes the topology of the datacenter and can be found in this archive at askalon_workload_ee/setup.json.
  • /askalon_workload_ee.gwf
    This file contains the trace for the Askalon workload. This file can be found in the archive at askalon_workload_ee/trace/askalon_workload_ee.gwf.
  • /chronos_exp_noscaler_ca.gwf
    This file contains the trace for the Chronos workload. This file can be found in the archive at chronos_exp_noscaler_ca/trace/chronos_exp_noscaler_ca.gwf.
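
Collecting these three files can be scripted. The following is a minimal sketch, assuming the archive has been extracted into a directory of your choice. The helper name stage_inputs and its two arguments are hypothetical; the paths inside the archive are the ones listed above.

```shell
# Copy the three input files from an extracted archive into a working directory.
# $1: directory containing the extracted archive (hypothetical argument)
# $2: working directory to fill; defaults to the current directory
stage_inputs() {
    archive_dir="$1"
    work_dir="${2:-.}"
    mkdir -p "$work_dir" || return 1
    # The topology file of the Askalon experiment is used for both experiments.
    cp "$archive_dir/askalon_workload_ee/setup.json" "$work_dir/setup.json" &&
    cp "$archive_dir/askalon_workload_ee/trace/askalon_workload_ee.gwf" "$work_dir/" &&
    cp "$archive_dir/chronos_exp_noscaler_ca/trace/chronos_exp_noscaler_ca.gwf" "$work_dir/"
}
```

Calling stage_inputs /path/to/extracted/archive from the intended working directory leaves setup.json and both .gwf files next to each other, which is the layout the commands in this section expect.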

Then, you can start the Askalon experiments as follows:

$ docker run -it --rm -v "$(pwd)":/home/gradle/simulator/data atlargeresearch/sc18-experiment-runner -r 32 -w 4 -s data/setup.json data/askalon_workload_ee.gwf

The experiment runner can be configured with the following options:

  • -r, --repeat
    The number of times to repeat an experiment for each scheduler.
  • -w, --warm-up
    The number of warm-up runs of the simulator for each scheduler.
  • -p, --parallelism
    The number of experiments to run in parallel.
  • --schedulers
    The list of schedulers to test, separated by spaces. The following schedulers are available: SRTF-BESTFIT, SRTF-FIRSTFIT, SRTF-WORSTFIT, FIFO-BESTFIT, FIFO-FIRSTFIT, FIFO-WORSTFIT, RANDOM-BESTFIT, RANDOM-FIRSTFIT, RANDOM-WORSTFIT.
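
As an illustration of how the options combine, the sketch below prints (but does not run) a command for a reduced experiment on a subset of schedulers. It rests on assumptions not confirmed by this document: that -p takes an integer and that the scheduler names follow --schedulers as separate arguments. The helper name print_runner_command and the values -r 4 -w 1 -p 2 are hypothetical and are not the settings used for the paper.

```shell
# Print (do not run) a docker command for a reduced experiment.
# Assumptions, not confirmed by this document: -p takes an integer and the
# scheduler names follow --schedulers as separate arguments.
# The values -r 4 -w 1 -p 2 are illustrative, not the settings of the paper.
print_runner_command() {
    trace="$1"    # trace file name, located in the current directory
    shift         # remaining arguments: scheduler names
    image="atlargeresearch/sc18-experiment-runner"
    volume="$(pwd):/home/gradle/simulator/data"
    printf '%s\n' "docker run -it --rm -v \"$volume\" $image -r 4 -w 1 -p 2 --schedulers $* -s data/setup.json data/$trace"
}
```

For instance, print_runner_command chronos_exp_noscaler_ca.gwf SRTF-BESTFIT FIFO-FIRSTFIT prints a command restricted to those two schedulers, which can be inspected before it is executed.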

After the Askalon experiments have finished, you can start the Chronos experiments. First copy the Askalon result files to another location, as the result files in your directory will be overwritten.

$ docker run -it --rm -v "$(pwd)":/home/gradle/simulator/data atlargeresearch/sc18-experiment-runner -r 32 -w 4 -s data/setup.json data/chronos_exp_noscaler_ca.gwf
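
The Askalon result files can be set aside before the command above is started. The following is a minimal sketch, assuming the runner writes its results as CSV files directly into the mounted directory (the file names and their location are an assumption). The helper name archive_results and the results-<label> directory name are hypothetical.

```shell
# Move all CSV files from a directory into a labelled subdirectory.
# Assumption: the experiment runner writes its results as *.csv files
# directly into the mounted directory.
# $1: directory that was mounted into the container
# $2: label for the subdirectory, e.g. askalon
archive_results() {
    src_dir="$1"
    dest_dir="$src_dir/results-$2"
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/*.csv; do
        [ -e "$f" ] || continue    # no CSV files: the glob stayed literal
        mv "$f" "$dest_dir/" || return 1
    done
}
```

Running archive_results "$(pwd)" askalon before starting the Chronos experiments moves the CSV files into results-askalon and leaves setup.json and the trace files in place.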