GPGPU Sim Tutorial

  • 8/18/2019 GPGPU Sim Tutorial

    GPGPU-Sim Tutorial

    Zhen Lin

    North Carolina State University

    Based on GPGPU-Sim Tutorial and Manual by UBC

    Outline

    • GPGPU-Sim Overview

    • Demo1: Setup & Configuration

    • GPGPU-Sim Internals

    • Demo2: Scheduling Study

    GPGPU-Sim in a Nutshell

    • Microarchitecture timing model of contemporary GPUs

    • Runs unmodified CUDA/OpenCL

    What GPGPU-Sim Simulates

    • Functional model

    • PTX

    • SASS

    • Timing model for the compute part of a GPU

    • Not for CPU or PCIe

    • Only models microarchitecture timing relevant to compute

    Functional model

    • PTX

    • A low-level, data-parallel virtual machine and instruction set architecture (ISA)

    • Between CUDA and hardware ISA (SASS)

    • Stable ISA that spans multiple GPU generations

    • SASS/PTXPLUS

    • Hardware native ISA

    • PTX -> Translate + Optimize -> SASS

    • More accurate, but not well supported

    • CUDA tool chain

    Functional Model (PTX)

    • Scalar ISA

    • SSA representation: register allocation not done in PTX

    Timing Model for GPU Micro-Architecture

    • GPGPU-Sim simulates the timing model of a GPU running each launched CUDA kernel

    • Reports stats (e.g. # cycles) for each kernel

    • Excludes any time spent on data transfer over the PCIe bus

    • CPU is assumed to be idle when the GPU is working
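
Since stats are reported per kernel, a natural first step is to pull the cycle counts out of the simulator's text output. The sketch below assumes the log contains lines of the form `kernel_name = ...` followed by `gpu_sim_cycle = ...`; the field names and layout may differ between GPGPU-Sim versions, and the sample log is made up, so treat both as illustrative.

```python
import re

# Assumed log format: "kernel_name = <name>" followed later by
# "gpu_sim_cycle = <int>" for that kernel. Adjust the patterns to your log.
KERNEL_RE = re.compile(r"^kernel_name\s*=\s*(\S+)")
CYCLE_RE = re.compile(r"^gpu_sim_cycle\s*=\s*(\d+)")


def cycles_per_kernel(log_text):
    """Return a list of (kernel_name, cycles) in launch order."""
    results = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        m = KERNEL_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = CYCLE_RE.match(line)
        if m and current is not None:
            results.append((current, int(m.group(1))))
            current = None
    return results


# Made-up sample, not real simulator output.
SAMPLE_LOG = """
kernel_name = vecAdd
gpu_sim_cycle = 1200
kernel_name = reduce
gpu_sim_cycle = 3400
"""

if __name__ == "__main__":
    for name, cycles in cycles_per_kernel(SAMPLE_LOG):
        print(name, cycles)
```

The counts cover GPU execution only; PCIe transfer time and CPU time are outside the model, so they never appear in these numbers.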

    Demo1

    • Setup

    • Stats

    • Configuration

    Overview of the Architecture

    Inside a SIMT Core

    Pipeline stages

    • Fetch

    • Decode

    • Issue

    • Read operand

    • Execution

    • Writeback

    Fetch + Decode

    • Arbitrates the I-cache among warps

    • Cache miss handled by fetching again later

    • Fetched instruction is decoded and then stored in the I-Buffer

    • 1 or more entries / warp

    • Only warps with vacant entries are considered in fetch
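
The fetch rules above can be sketched as a small arbiter. This is a simplified illustration, not GPGPU-Sim's implementation: the round-robin order, the two-entry I-Buffer, the `Warp` class, and the set used as a stand-in I-cache are all assumptions made for the example.

```python
from collections import deque

IBUFFER_ENTRIES = 2  # assumed I-Buffer entries per warp


class Warp:
    def __init__(self, wid):
        self.wid = wid
        self.pc = 0
        self.ibuffer = deque()  # decoded instructions waiting to issue

    def has_vacancy(self):
        return len(self.ibuffer) < IBUFFER_ENTRIES


def fetch_cycle(warps, last_wid, icache):
    """Round-robin pick of one warp with a vacant I-Buffer entry.

    `icache` is a set of PCs currently cached (a stand-in for a real cache).
    On a miss the line is filled and the warp simply retries in a later cycle.
    Returns the id of the warp that was granted the I-cache, or None.
    """
    n = len(warps)
    for offset in range(1, n + 1):
        warp = warps[(last_wid + offset) % n]
        if not warp.has_vacancy():
            continue  # full I-Buffer: not considered for fetch
        if warp.pc in icache:
            warp.ibuffer.append(("decoded", warp.pc))  # fetch + decode
            warp.pc += 1
        else:
            icache.add(warp.pc)  # miss: fill the line, fetch again later
        return warp.wid
    return None
```

A miss costs the warp its turn without stalling the others, which is the "fetch again later" behaviour in the list above.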

    Issue

    • Selects a warp with a ready instruction

    • Acquires the active mask from TOS of SIMT stack

    • Invalidates the I-Buffer entry of the issued instruction

    December 2012 GPGPU-Sim Tutorial (MICRO 2012) 4: Microarchitecture Model

    Scoreboard

    • Checks for RAW and WAW dependency hazards

    • Flags instructions with hazards as not ready in the I-Buffer (masking them out from the scheduler)

    • Instructions reserve destination registers at issue

    • Release them at writeback
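
A minimal sketch of the reserve/release protocol described above. The class and method names are hypothetical and registers are plain strings; the simulator's own data structures will differ.

```python
class Scoreboard:
    """Tracks destination registers reserved by in-flight instructions."""

    def __init__(self):
        self.reserved = {}  # warp id -> set of pending destination registers

    def has_hazard(self, warp_id, srcs, dsts):
        pending = self.reserved.get(warp_id, set())
        # RAW: a source is still being written; WAW: a destination is.
        return any(r in pending for r in list(srcs) + list(dsts))

    def reserve(self, warp_id, dsts):
        """Called at issue."""
        self.reserved.setdefault(warp_id, set()).update(dsts)

    def release(self, warp_id, dsts):
        """Called at writeback."""
        self.reserved.get(warp_id, set()).difference_update(dsts)
```

Because reservations are kept per warp, a pending write in one warp never blocks an instruction from another warp.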

    Read Operand

    • Operand Collector Architecture (US Patent: 7834881)

     – Interleave operand fetch from different threads to achieve full utilization

    Bank 0   Bank 1   Bank 2   Bank 3
    R0       R1       R2       R3
    R4       R5       R6       R7
    R8       R9       R10      R11
    …        …        …        …

    add.s32 R3, R1, R2;   No conflict
    mul.s32 R3, R0, R4;   Conflict at bank 0
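
The two examples can be checked mechanically. The sketch below assumes the mapping implied by the bank table, register index modulo four, and looks only at source operands; the function names are hypothetical.

```python
from collections import Counter

NUM_BANKS = 4  # matches the four-bank layout in the table above


def bank_of(reg):
    """Map a register name such as 'R5' to its bank (register index mod banks)."""
    return int(reg[1:]) % NUM_BANKS


def conflicting_banks(src_regs):
    """Return the sorted banks that more than one distinct source register reads."""
    counts = Counter(bank_of(r) for r in set(src_regs))
    return sorted(b for b, n in counts.items() if n > 1)


print(conflicting_banks(["R1", "R2"]))  # sources of add.s32 R3, R1, R2 -> []
print(conflicting_banks(["R0", "R4"]))  # sources of mul.s32 R3, R0, R4 -> [0]
```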

    Operand Collector

    [Figure: operand collector; input arrives from the instruction issue stage, output goes to dispatch]

    Execution

    • ALU

    • Streaming processor (SP)

    • Special function unit (SFU)

    • MEM

    • Shared memory

    • Local memory

    • Global memory

    • Texture memory

    • Constant memory

    ALU Pipelines

    • SIMD Execution Unit

    • Fully Pipelined

    • Each pipe may execute a subset of instructions

    • Configurable bandwidth and latency (depending on the instruction)

    • Default: SP + SFU pipes
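
To make "fully pipelined" concrete, the sketch below models a unit that accepts a new operation every cycle and completes each one a fixed number of cycles later. The latencies and op names are hypothetical, and bandwidth limits are left out for brevity.

```python
from collections import deque


class PipelinedUnit:
    """Fully pipelined unit: accepts one new operation per cycle, and each
    operation completes a fixed number of cycles (>= 1) after it enters."""

    def __init__(self, name, latency, supported_ops):
        self.name = name
        self.latency = latency
        self.supported_ops = set(supported_ops)
        self.in_flight = deque()  # (completion_cycle, op), oldest first

    def can_execute(self, op):
        return op in self.supported_ops

    def issue(self, op, cycle):
        self.in_flight.append((cycle + self.latency, op))

    def completed(self, cycle):
        """Pop and return every op whose completion cycle has arrived."""
        done = []
        while self.in_flight and self.in_flight[0][0] <= cycle:
            done.append(self.in_flight.popleft()[1])
        return done


# Hypothetical latencies and op names, chosen only for illustration.
sp = PipelinedUnit("SP", latency=4, supported_ops={"add", "mul"})
sfu = PipelinedUnit("SFU", latency=8, supported_ops={"sin", "rsqrt"})
```

Two operations issued in back-to-back cycles finish in back-to-back cycles, so latency delays results without reducing throughput.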

    Memory Unit

    • Models timing for memory instructions

    • Supports half-warp (16 threads) access

    • Double-clocks the unit

    • Each cycle services half the warp

    • Has a private writeback path

    [Figure: memory unit. Labels: AGU, access coalescing, shared memory bank conflict, constant cache, texture cache, data cache, MSHR, memory port]
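
Half-warp servicing and access coalescing can be illustrated with a toy model that counts how many memory segments each half-warp touches. The 128-byte segment size and the function names are assumptions for this sketch, not values taken from the simulator.

```python
SEGMENT_BYTES = 128   # assumed coalescing granularity
HALF_WARP = 16        # threads serviced per (doubled) clock cycle


def coalesce(addresses, active_mask):
    """Return the sorted segment base addresses touched by active threads."""
    segments = {
        (addr // SEGMENT_BYTES) * SEGMENT_BYTES
        for addr, active in zip(addresses, active_mask)
        if active
    }
    return sorted(segments)


def warp_accesses(addresses, active_mask):
    """Split a 32-thread warp into two half-warps and coalesce each one."""
    return [
        coalesce(addresses[i:i + HALF_WARP], active_mask[i:i + HALF_WARP])
        for i in range(0, len(addresses), HALF_WARP)
    ]


mask = [True] * 32
unit_stride = [4 * t for t in range(32)]     # consecutive 4-byte words
wide_stride = [128 * t for t in range(32)]   # one segment per thread
```

With `unit_stride` each half-warp collapses to a single segment, while `wide_stride` produces sixteen separate accesses per half-warp.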

    Writeback

    • Write result to register file

    • Scoreboard updates the r-bit

    Stack-Based Branch Divergence Hardware

    • When a branch diverges

    • New entries are pushed to the SIMT stack

    • RPC (reconvergence PC) set to the immediate post-dominator

    • Active mask indicates which threads are active

    • PC is sent to fetch unit

    • When RPC is reached

    • Pop the TOS

    • PC of new TOS is sent to the fetch unit
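
The push and pop rules above can be sketched as follows. This is a simplified model under stated assumptions: masks are Python integers used as bit vectors, the not-taken path is pushed before the taken path, and the class and method names are hypothetical.

```python
class SIMTStack:
    """Each entry is [pc, rpc, active_mask]; the last list element is the TOS."""

    def __init__(self, start_pc, full_mask):
        self.entries = [[start_pc, None, full_mask]]

    @property
    def tos(self):
        return self.entries[-1]

    def diverge(self, taken_pc, fallthrough_pc, taken_mask, reconv_pc):
        """Branch at TOS where only the threads in taken_mask take it."""
        _, _, mask = self.tos
        not_taken_mask = mask & ~taken_mask
        taken_mask &= mask
        # The current entry resumes at the reconvergence point.
        self.tos[0] = reconv_pc
        if not_taken_mask:
            self.entries.append([fallthrough_pc, reconv_pc, not_taken_mask])
        if taken_mask:
            self.entries.append([taken_pc, reconv_pc, taken_mask])
        return self.tos[0]  # PC sent to the fetch unit

    def advance(self, next_pc):
        """Move TOS to next_pc, popping while TOS has reached its RPC."""
        self.tos[0] = next_pc
        while len(self.entries) > 1 and self.tos[0] == self.tos[1]:
            self.entries.pop()
        return self.tos[0], self.tos[2]
```

After both paths reach the reconvergence PC, the stack is back to a single entry with the full active mask.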

    Demo2

    • Software framework overview

    • Monitor the warp scheduling order

    • Compare different scheduling policies
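
As a rough picture of what such a comparison looks like, the sketch below records the issue order produced by two policies on the same readiness trace. The policies shown, loose round-robin and greedy-then-oldest, are assumptions chosen for illustration since the slides do not name the ones used in the demo; the trace is hypothetical and warp age is approximated by warp id.

```python
def loose_round_robin(ready, last_issued, num_warps):
    """Pick the first ready warp after last_issued, wrapping around."""
    for offset in range(1, num_warps + 1):
        wid = (last_issued + offset) % num_warps
        if wid in ready:
            return wid
    return None


def greedy_then_oldest(ready, last_issued, num_warps):
    """Keep issuing from the same warp while it is ready, else the oldest
    (lowest id is treated as oldest in this sketch)."""
    if last_issued in ready:
        return last_issued
    return min(ready) if ready else None


def issue_order(policy, ready_per_cycle, num_warps):
    """Record which warp the policy issues in each cycle."""
    order, last = [], -1  # -1 means no warp has issued yet
    for ready in ready_per_cycle:
        wid = policy(ready, last, num_warps)
        order.append(wid)
        if wid is not None:
            last = wid
    return order


# Hypothetical readiness trace: the set of ready warp ids in each cycle.
trace = [{0, 1, 2}, {0, 1, 2}, {1, 2}, {0, 2}]
print(issue_order(loose_round_robin, trace, 3))
print(issue_order(greedy_then_oldest, trace, 3))
```

Printing the two orders side by side shows how the greedy policy sticks with one warp until it stalls, while round-robin rotates every cycle.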

    For More Information

    • http://www.gpgpu-sim.org/

    • Thanks! Questions?