National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 
Package  Platform  Version  Module  Docs
IHPCT    bassi     1.0      ihpct   Vendor
(*) Denotes limited support

IBM High Performance Computing Toolkit (IHPCT)

The IBM High Performance Computing Toolkit (IHPCT) is a collection of tools and libraries for collecting performance data from a program. It provides libraries to gather data from programs parallelized with OpenMP, SHMEM, and MPI, and it can also collect data from the hardware performance counters. The SiGMA tool shows memory utilization at a very fine granularity. The PeekPerf GUI displays the collected data in a single window.

Usage

To use the IHPCT, load the ihpct module:

s00509> module load ihpct

Components

HPM Toolkit
The Hardware Performance Monitor (HPM) Toolkit was developed for performance measurement of applications running on IBM systems. It supports POWER 5, although the IBM documentation does not mention it. See the hardware counter information at the end of this document. [HTML Docs] Sample hpmcount output using the default counter set is given below.
Trace Library
These libraries collect profiling and tracing data for MPI and TurboSHMEM programs. [PDF Docs]
SiGMA
The Simulation Guided Memory Analyzer (SiGMA) is a toolkit designed to help programmers understand the precise memory references in scientific programs that cause poor utilization of the memory subsystem. [PDF Docs]
Dynamic Performance Monitoring Interface for OpenMP (DPOMP)
DPOMP is a standard API for performance monitoring of OpenMP applications. [HTML Docs]
PeekPerf
PeekPerf is a tool that maps your collected performance data back to the source code. [PDF Docs]

Documentation

See the IBM ACTC website.

hpmcount sample output

Here is sample output of running hpmcount on a serial program.

bassi% hpmcount name_of_your_executable

Of particular interest are the "Maximum resident set size," which is the memory used (524776 Kbytes in this example), and the "Algebraic flop rate," an oft-quoted performance metric.

 hpmcount (V 3.1.1a) summary

 Execution time (wall clock time): 14.9649360179901 seconds

 ########  Resource Usage Statistics  ########  

 Total amount of time in user mode            : 14.766147 seconds
 Total amount of time in system mode          : 0.002758 seconds
 Maximum resident set size                    : 524776 Kbytes
 Average shared memory use in text segment    : 1168 Kbytes*sec
 Average unshared memory use in data segment  : 7760076 Kbytes*sec
 Number of page faults without I/O activity   : 175
 Number of page faults with I/O activity      : 8
 Number of times process was swapped out      : 0
 Number of times file system performed INPUT  : 0
 Number of times file system performed OUTPUT : 0
 Number of IPC messages sent                  : 0
 Number of IPC messages received              : 0
 Number of signals delivered                  : 0
 Number of voluntary context switches         : 6
 Number of involuntary context switches       : 194

 #######  End of Resource Statistics  ########

  PM_FPU_1FLOP (FPU executed one flop instruction)   :     12997150350
  PM_FPU_FMA (FPU executed multiply-add instruction) :      4267130255
  PM_ST_REF_L1 (L1 D cache store references)         :      6385968983
  PM_LD_REF_L1 (L1 D cache load references)          :     18877027723
  PM_INST_CMPL (Instructions completed)              :     49974865601
  PM_RUN_CYC (Run cycles)                            :     28160978574

  Utilization rate                               :          98.672 %
  MIPS                                           :        3339.464 
  Instructions per run cycle                     :           1.775 
  Instructions per load/store                    :           1.978 
  Algebraic floating point operations            :       21531.411 M
  Algebraic flop rate (flops / WCT)              :        1438.791 Mflop/s
  Algebraic flops / user time                    :        1458.160 Mflop/s
  FMA percentage                                 :          39.636 %
  % of peak performance                          :          19.165 %
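
The derived metrics in the lower part of the output can be reproduced from the raw counters and timings. The Python sketch below is an illustration only: the formulas are inferred from the sample numbers above (for example, counting each multiply-add as two floating point operations), not taken from the IBM documentation, and the function and key names are hypothetical. The "% of peak performance" line is left out because it depends on a peak rate that the output does not state.

```python
# Raw counter values copied from the sample hpmcount output above.
counters = {
    "PM_FPU_1FLOP": 12997150350,
    "PM_FPU_FMA": 4267130255,
    "PM_ST_REF_L1": 6385968983,
    "PM_LD_REF_L1": 18877027723,
    "PM_INST_CMPL": 49974865601,
    "PM_RUN_CYC": 28160978574,
}
wall_clock_s = 14.9649360179901  # "Execution time (wall clock time)"
user_time_s = 14.766147          # "Total amount of time in user mode"


def derive_metrics(c, wall_s, user_s):
    """Recompute the derived hpmcount metrics (formulas are assumptions)."""
    # Assumption: one multiply-add (FMA) counts as two flops.
    flops = c["PM_FPU_1FLOP"] + 2 * c["PM_FPU_FMA"]
    loads_stores = c["PM_LD_REF_L1"] + c["PM_ST_REF_L1"]
    return {
        "utilization_pct": 100.0 * user_s / wall_s,
        "mips": c["PM_INST_CMPL"] / wall_s / 1e6,
        "inst_per_run_cycle": c["PM_INST_CMPL"] / c["PM_RUN_CYC"],
        "inst_per_load_store": c["PM_INST_CMPL"] / loads_stores,
        "algebraic_mflop": flops / 1e6,
        "mflops_wct": flops / wall_s / 1e6,
        "mflops_user": flops / user_s / 1e6,
        "fma_pct": 100.0 * 2 * c["PM_FPU_FMA"] / flops,
    }


metrics = derive_metrics(counters, wall_clock_s, user_time_s)
for name, value in metrics.items():
    print(f"{name:22s} {value:12.3f}")
```

Under these assumptions the printed values agree with the sample output to the three decimal places shown, which suggests that the "Algebraic flop rate" is simply the algebraic flop count divided by the wall clock time.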

HPM data

The IHPCT includes the hpmcount hardware performance utility and the hpm library. See the ACTC link above for documentation. The counters and sets on POWER 5 are much more complicated than those on POWER 3 (Seaborg). A listing of POWER 5 counters and counter sets follows.

POWER5 supports 6 counters
 Counter 0: 213 events
  0: PM_0INST_CLB_CYC <--> Cycles no instructions in CLB
  1: PM_1INST_CLB_CYC <--> Cycles 1 instruction in CLB
  2: PM_1PLUS_PPC_CMPL <--> One or more PPC instruction completed
  3: PM_2INST_CLB_CYC <--> Cycles 2 instructions in CLB
  4: PM_3INST_CLB_CYC <--> Cycles 3 instructions in CLB
  5: PM_4INST_CLB_CYC <--> Cycles 4 instructions in CLB
  6: PM_5INST_CLB_CYC <--> Cycles 5 instructions in CLB
  7: PM_6INST_CLB_CYC <--> Cycles 6 instructions in CLB
  8: PM_BRQ_FULL_CYC <--> Cycles branch queue full
  9: PM_BR_UNCOND <--> Unconditional branch
  10: PM_CLB_FULL_CYC <--> Cycles CLB full
  11: PM_CR_MAP_FULL_CYC <--> Cycles CR logical operation mapper full
  12: PM_CYC <--> Processor cycles
  13: PM_DATA_FROM_L2 <--> Data loaded from L2
  14: PM_DATA_FROM_L25_SHR <--> Data loaded from L2.5 shared
  15: PM_DATA_FROM_L275_MOD <--> Data loaded from L2.75 modified
  16: PM_DATA_FROM_L3 <--> Data loaded from L3
  17: PM_DATA_FROM_L35_SHR <--> Data loaded from L3.5 shared
  18: PM_DATA_FROM_L375_MOD <--> Data loaded from L3.75 modified
  19: PM_DATA_FROM_RMEM <--> Data loaded from remote memory
  20: PM_DATA_TABLEWALK_CYC <--> Cycles doing data tablewalks
  21: PM_DSLB_MISS <--> Data SLB misses
  22: PM_DTLB_MISS <--> Data TLB misses
  23: PM_DTLB_MISS_16M <--> Data TLB miss for 16M page
  24: PM_DTLB_MISS_4K <--> Data TLB miss for 4K page
  25: PM_DTLB_REF_16M <--> Data TLB reference for 16M page
  26: PM_DTLB_REF_4K <--> Data TLB reference for 4K page
  27: PM_FAB_CMD_ISSUED <--> Fabric command issued
  28: PM_FAB_DCLAIM_ISSUED <--> dclaim issued
  29: PM_FAB_HOLDtoNN_EMPTY <--> Hold buffer to NN empty
  30: PM_FAB_HOLDtoVN_EMPTY <--> Hold buffer to VN empty
  31: PM_FAB_M1toP1_SIDECAR_EMPTY <--> M1 to P1 sidecar empty
  32: PM_FAB_P1toM1_SIDECAR_EMPTY <--> P1 to M1 sidecar empty
  33: PM_FAB_PNtoNN_DIRECT <--> PN to NN beat went straight to its destination
  34: PM_FAB_PNtoVN_DIRECT <--> PN to VN beat went straight to its destination
  35: PM_FPR_MAP_FULL_CYC <--> Cycles FPR mapper full
  36: PM_FPU0_1FLOP <--> FPU0 executed add, mult, sub, cmp or sel instruction
  37: PM_FPU0_DENORM <--> FPU0 received denormalized data
  38: PM_FPU0_FDIV <--> FPU0 executed FDIV instruction
  39: PM_FPU0_FMA <--> FPU0 executed multiply-add instruction
  40: PM_FPU0_FSQRT <--> FPU0 executed FSQRT instruction
  41: PM_FPU0_FULL_CYC <--> Cycles FPU0 issue queue full
  42: PM_FPU0_SINGLE <--> FPU0 executed single precision instruction
  43: PM_FPU0_STALL3 <--> FPU0 stalled in pipe3
  44: PM_FPU0_STF <--> FPU0 executed store instruction
  45: PM_FPU1_1FLOP <--> FPU1 executed add, mult, sub, cmp or sel instruction
  46: PM_FPU1_DENORM <--> FPU1 received denormalized data
  47: PM_FPU1_FDIV <--> FPU1 executed FDIV instruction
  48: PM_FPU1_FMA <--> FPU1 executed multiply-add instruction
  49: PM_FPU1_FSQRT <--> FPU1 executed FSQRT instruction
  50: PM_FPU1_FULL_CYC <--> Cycles FPU1 issue queue full
  51: PM_FPU1_SINGLE <--> FPU1 executed single precision instruction
  52: PM_FPU1_STALL3 <--> FPU1 stalled in pipe3
  53: PM_FPU1_STF <--> FPU1 executed store instruction
  54: PM_FPU_DENORM <--> FPU received denormalized data
  55: PM_FPU_FDIV <--> FPU executed FDIV instruction
  56: PM_FPU_1FLOP <--> FPU executed one flop instruction 
  57: PM_FPU_FULL_CYC <--> Cycles FPU issue queue full
  58: PM_FPU_SINGLE <--> FPU executed single precision instruction
  59: PM_FXU_IDLE <--> FXU idle
  60: PM_GCT_NOSLOT_CYC <--> Cycles no GCT slot allocated
  61: PM_GCT_FULL_CYC <--> Cycles GCT full
  62: PM_GCT_USAGE_00to59_CYC <--> Cycles GCT less than 60% full
  63: PM_GRP_BR_REDIR <--> Group experienced branch redirect
  64: PM_GRP_BR_REDIR_NONSPEC <--> Group experienced non-speculative branch redirect
  65: PM_GRP_DISP_REJECT <--> Group dispatch rejected
  66: PM_GRP_DISP_VALID <--> Group dispatch valid
  67: PM_GRP_IC_MISS <--> Group experienced I cache miss
  68: PM_GRP_IC_MISS_BR_REDIR_NONSPEC <--> Group experienced non-speculative I cache miss or branch redirect
  69: PM_GRP_IC_MISS_NONSPEC <--> Group experienced non-speculative I cache miss
  70: PM_GRP_MRK <--> Group marked in IDU
  71: PM_IC_PREF_REQ <--> Instruction prefetch requests
  72: PM_IERAT_XLATE_WR <--> Translation written to ierat
  73: PM_IOPS_CMPL <--> IOPS instructions completed
  74: PM_INST_DISP_ATTEMPT <--> Instructions dispatch attempted
  75: PM_INST_FETCH_CYC <--> Cycles at least 1 instruction fetched
  76: PM_INST_FROM_L2 <--> Instructions fetched from L2
  77: PM_INST_FROM_L25_SHR <--> Instruction fetched from L2.5 shared
  78: PM_INST_FROM_L3 <--> Instruction fetched from L3
  79: PM_INST_FROM_L35_SHR <--> Instruction fetched from L3.5 shared
  80: PM_ISLB_MISS <--> Instruction SLB misses
  81: PM_ITLB_MISS <--> Instruction TLB misses
  82: PM_L2SA_MOD_TAG <--> L2 slice A transition from modified to tagged
  83: PM_L2SA_RCLD_DISP <--> L2 Slice A RC load dispatch attempt
  84: PM_L2SA_RCLD_DISP_FAIL_RC_FULL <--> L2 Slice A RC load dispatch attempt failed due to all RC full
  85: PM_L2SA_RCST_DISP <--> L2 Slice A RC store dispatch attempt
  86: PM_L2SA_RCST_DISP_FAIL_RC_FULL <--> L2 Slice A RC store dispatch attempt failed due to all RC full
  87: PM_L2SA_RC_DISP_FAIL_CO_BUSY <--> L2 Slice A RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
  88: PM_L2SA_SHR_MOD <--> L2 slice A transition from shared to modified
  89: PM_L2SA_ST_REQ <--> L2 slice A store requests
  90: PM_L2SB_MOD_TAG <--> L2 slice B transition from modified to tagged
  91: PM_L2SB_RCLD_DISP <--> L2 Slice B RC load dispatch attempt
  92: PM_L2SB_RCLD_DISP_FAIL_RC_FULL <--> L2 Slice B RC load dispatch attempt failed due to all RC full
  93: PM_L2SB_RCST_DISP <--> L2 Slice B RC store dispatch attempt
  94: PM_L2SB_RCST_DISP_FAIL_RC_FULL <--> L2 Slice B RC store dispatch attempt failed due to all RC full
  95: PM_L2SB_RC_DISP_FAIL_CO_BUSY <--> L2 Slice B RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
  96: PM_L2SB_SHR_MOD <--> L2 slice B transition from shared to modified
  97: PM_L2SB_ST_REQ <--> L2 slice B store requests
  98: PM_L2SC_MOD_TAG <--> L2 slice C transition from modified to tagged
  99: PM_L2SC_RCLD_DISP <--> L2 Slice C RC load dispatch attempt
  100: PM_L2SC_RCLD_DISP_FAIL_RC_FULL <--> L2 Slice C RC load dispatch attempt failed due to all RC full
  101: PM_L2SC_RCST_DISP <--> L2 Slice C RC store dispatch attempt
  102: PM_L2SC_RCST_DISP_FAIL_RC_FULL <--> L2 Slice C RC store dispatch attempt failed due to all RC full
  103: PM_L2SC_RC_DISP_FAIL_CO_BUSY <--> L2 Slice C RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
  104: PM_L2SC_SHR_MOD <--> L2 slice C transition from shared to modified
  105: PM_L2SC_ST_REQ <--> L2 slice C store requests
  106: PM_L3SA_ALL_BUSY <--> L3 slice A active for every cycle all CI/CO machines busy
  107: PM_L3SA_MOD_TAG <--> L3 slice A transition from modified to TAG
  108: PM_L3SA_REF <--> L3 slice A references
  109: PM_L3SB_ALL_BUSY <--> L3 slice B active for every cycle all CI/CO machines busy
  110: PM_L3SB_MOD_TAG <--> L3 slice B transition from modified to TAG
  111: PM_L3SB_REF <--> L3 slice B references
  112: PM_L3SC_ALL_BUSY <--> L3 slice C active for every cycle all CI/CO machines busy
  113: PM_L3SC_MOD_TAG <--> L3 slice C transition from modified to TAG
  114: PM_L3SC_REF <--> L3 slice C references
  115: PM_LARX_LSU0 <--> Larx executed on LSU0
  116: PM_LR_CTR_MAP_FULL_CYC <--> Cycles LR/CTR mapper full
  117: PM_LSU0_BUSY_REJECT <--> LSU0 busy due to reject
  118: PM_LSU0_DERAT_MISS <--> LSU0 DERAT misses
  119: PM_LSU0_FLUSH_LRQ <--> LSU0 LRQ flushes
  120: PM_LSU0_FLUSH_SRQ <--> LSU0 SRQ flushes
  121: PM_LSU0_FLUSH_ULD <--> LSU0 unaligned load flushes
  122: PM_LSU0_FLUSH_UST <--> LSU0 unaligned store flushes
  123: PM_LSU0_REJECT_ERAT_MISS <--> LSU0 reject due to ERAT miss
  124: PM_LSU0_REJECT_LMQ_FULL <--> LSU0 reject due to LMQ full or missed data coming
  125: PM_LSU0_REJECT_RELOAD_CDF <--> LSU0 reject due to reload CDF or tag update collision
  126: PM_LSU0_REJECT_SRQ_LHS <--> LSU0 SRQ rejects
  127: PM_LSU0_SRQ_STFWD <--> LSU0 SRQ store forwarded
  128: PM_LSU1_BUSY_REJECT <--> LSU1 busy due to reject
  129: PM_LSU1_DERAT_MISS <--> LSU1 DERAT misses
  130: PM_LSU1_FLUSH_LRQ <--> LSU1 LRQ flushes
  131: PM_LSU1_FLUSH_SRQ <--> LSU1 SRQ flushes
  132: PM_LSU1_FLUSH_ULD <--> LSU1 unaligned load flushes
  133: PM_LSU1_FLUSH_UST <--> LSU1 unaligned store flushes
  134: PM_LSU1_REJECT_ERAT_MISS <--> LSU1 reject due to ERAT miss
  135: PM_LSU1_REJECT_LMQ_FULL <--> LSU1 reject due to LMQ full or missed data coming
  136: PM_LSU1_REJECT_RELOAD_CDF <--> LSU1 reject due to reload CDF or tag update collision
  137: PM_LSU1_REJECT_SRQ_LHS <--> LSU1 SRQ rejects
  138: PM_LSU1_SRQ_STFWD <--> LSU1 SRQ store forwarded
  139: PM_LSU_BUSY_REJECT <--> LSU busy due to reject
  140: PM_LSU_FLUSH_LRQ_FULL <--> Flush caused by LRQ full
  141: PM_LSU_FLUSH_SRQ <--> SRQ flushes
  142: PM_LSU_FLUSH_ULD <--> LRQ unaligned load flushes
  143: PM_LSU_LRQ_S0_ALLOC <--> LRQ slot 0 allocated
  144: PM_LSU_LRQ_S0_VALID <--> LRQ slot 0 valid
  145: PM_LSU_REJECT_ERAT_MISS <--> LSU reject due to ERAT miss
  146: PM_LSU_REJECT_SRQ_LHS <--> LSU SRQ rejects
  147: PM_LSU_SRQ_S0_ALLOC <--> SRQ slot 0 allocated
  148: PM_LSU_SRQ_S0_VALID <--> SRQ slot 0 valid
  149: PM_LSU_SRQ_STFWD <--> SRQ store forwarded
  150: PM_MEM_FAST_PATH_RD_CMPL <--> Fast path memory read completed
  151: PM_MEM_HI_PRIO_PW_CMPL <--> High priority partial-write completed
  152: PM_MEM_HI_PRIO_WR_CMPL <--> High priority write completed
  153: PM_MEM_PWQ_DISP <--> Memory partial-write queue dispatched
  154: PM_MEM_PWQ_DISP_BUSY2or3 <--> Memory partial-write queue dispatched with 2-3 queues busy
  155: PM_MEM_READ_CMPL <--> Memory read completed or canceled
  156: PM_MEM_RQ_DISP <--> Memory read queue dispatched
  157: PM_MEM_RQ_DISP_BUSY8to15 <--> Memory read queue dispatched with 8-15 queues busy
  158: PM_MEM_WQ_DISP_BUSY1to7 <--> Memory write queue dispatched with 1-7 queues busy
  159: PM_MEM_WQ_DISP_WRITE <--> Memory write queue dispatched due to write
  160: PM_MRK_DATA_FROM_L2 <--> Marked data loaded from L2
  161: PM_MRK_DATA_FROM_L25_SHR <--> Marked data loaded from L2.5 shared
  162: PM_MRK_DATA_FROM_L275_MOD <--> Marked data loaded from L2.75 modified
  163: PM_MRK_DATA_FROM_L3 <--> Marked data loaded from L3
  164: PM_MRK_DATA_FROM_L35_SHR <--> Marked data loaded from L3.5 shared
  165: PM_MRK_DATA_FROM_L375_MOD <--> Marked data loaded from L3.75 modified
  166: PM_MRK_DATA_FROM_RMEM <--> Marked data loaded from remote memory
  167: PM_MRK_DTLB_MISS_16M <--> Marked Data TLB misses for 16M page
  168: PM_MRK_DTLB_MISS_4K <--> Marked Data TLB misses for 4K page
  169: PM_MRK_DTLB_REF_16M <--> Marked Data TLB reference for 16M page
  170: PM_MRK_DTLB_REF_4K <--> Marked Data TLB reference for 4K page
  171: PM_MRK_GRP_DISP <--> Marked group dispatched
  172: PM_MRK_GRP_ISSUED <--> Marked group issued
  173: PM_MRK_IMR_RELOAD <--> Marked IMR reloaded
  174: PM_INST_CMPL <--> Instructions completed
  175: PM_MRK_LD_MISS_L1 <--> Marked L1 D cache load misses
  176: PM_MRK_LD_MISS_L1_LSU0 <--> LSU0 L1 D cache load misses
  177: PM_MRK_LD_MISS_L1_LSU1 <--> LSU1 L1 D cache load misses
  178: PM_MRK_STCX_FAIL <--> Marked STCX failed
  179: PM_MRK_ST_CMPL <--> Marked store instruction completed
  180: PM_MRK_ST_MISS_L1 <--> Marked L1 D cache store misses
  181: PM_PMC4_OVERFLOW <--> PMC4 Overflow
  182: PM_PMC5_OVERFLOW <--> PMC5 Overflow
  183: PM_PTEG_FROM_L2 <--> PTEG loaded from L2
  184: PM_PTEG_FROM_L25_SHR <--> PTEG loaded from L2.5 shared
  185: PM_PTEG_FROM_L275_MOD <--> PTEG loaded from L2.75 modified
  186: PM_PTEG_FROM_L3 <--> PTEG loaded from L3
  187: PM_PTEG_FROM_L35_SHR <--> PTEG loaded from L3.5 shared
  188: PM_PTEG_FROM_L375_MOD <--> PTEG loaded from L3.75 modified
  189: PM_PTEG_FROM_RMEM <--> PTEG loaded from remote memory
  190: PM_RUN_CYC <--> Run cycles
  191: PM_SNOOP_DCLAIM_RETRY_QFULL <--> Snoop dclaim/flush retry due to write/dclaim queues full
  192: PM_SNOOP_PW_RETRY_RQ <--> Snoop partial-write retry due to collision with active read queue
  193: PM_SNOOP_RD_RETRY_QFULL <--> Snoop read retry due to read queue full
  194: PM_SNOOP_RD_RETRY_RQ <--> Snoop read retry due to collision with active read queue
  195: PM_SNOOP_RETRY_1AHEAD <--> Snoop retry due to one ahead collision
  196: PM_SNOOP_TLBIE <--> Snoop TLBIE
  197: PM_SNOOP_WR_RETRY_RQ <--> Snoop write/dclaim retry due to collision with active read queue
  198: PM_STCX_FAIL <--> STCX failed
  199: PM_STCX_PASS <--> Stcx passes
  200: PM_SUSPENDED <--> Suspended
  201: PM_TB_BIT_TRANS <--> Time Base bit transition
  202: PM_THRD_ONE_RUN_CYC <--> One of the threads in run cycles
  203: PM_THRD_PRIO_1_CYC <--> Cycles thread running at priority level 1
  204: PM_THRD_PRIO_2_CYC <--> Cycles thread running at priority level 2
  205: PM_THRD_PRIO_3_CYC <--> Cycles thread running at priority level 3
  206: PM_THRD_PRIO_4_CYC <--> Cycles thread running at priority level 4
  207: PM_THRD_PRIO_5_CYC <--> Cycles thread running at priority level 5
  208: PM_THRD_PRIO_6_CYC <--> Cycles thread running at priority level 6
  209: PM_THRD_PRIO_7_CYC <--> Cycles thread running at priority level 7
  210: PM_TLB_MISS <--> TLB misses
  211: PM_XER_MAP_FULL_CYC <--> Cycles XER mapper full
  212: PM_INST_FROM_L2MISS <--> Instructions fetched missed L2
 Counter 1: 205 events
  0: PM_0INST_CLB_CYC <--> Cycles no instructions in CLB
  1: PM_1INST_CLB_CYC <--> Cycles 1 instruction in CLB
  2: PM_2INST_CLB_CYC <--> Cycles 2 instructions in CLB
  3: PM_3INST_CLB_CYC <--> Cycles 3 instructions in CLB
  4: PM_4INST_CLB_CYC <--> Cycles 4 instructions in CLB
  5: PM_5INST_CLB_CYC <--> Cycles 5 instructions in CLB
  6: PM_6INST_CLB_CYC <--> Cycles 6 instructions in CLB
  7: PM_BRQ_FULL_CYC <--> Cycles branch queue full
  8: PM_BR_PRED_TA <--> A conditional branch was predicted, target prediction
  9: PM_CLB_FULL_CYC <--> Cycles CLB full
  10: PM_CMPLU_STALL_DCACHE_MISS <--> Completion stall caused by D cache miss
  11: PM_CMPLU_STALL_FDIV <--> Completion stall caused by FDIV or FSQRT instruction
  12: PM_CMPLU_STALL_FXU <--> Completion stall caused by FXU instruction
  13: PM_CMPLU_STALL_LSU <--> Completion stall caused by LSU instruction
  14: PM_CR_MAP_FULL_CYC <--> Cycles CR logical operation mapper full
  15: PM_CYC <--> Processor cycles
  16: PM_DATA_FROM_L25_MOD <--> Data loaded from L2.5 modified
  17: PM_DATA_FROM_L35_MOD <--> Data loaded from L3.5 modified
  18: PM_DATA_FROM_LMEM <--> Data loaded from local memory
  19: PM_DATA_TABLEWALK_CYC <--> Cycles doing data tablewalks
  20: PM_DSLB_MISS <--> Data SLB misses
  21: PM_DTLB_MISS <--> Data TLB misses
  22: PM_DTLB_MISS_16M <--> Data TLB miss for 16M page
  23: PM_DTLB_MISS_4K <--> Data TLB miss for 4K page
  24: PM_DTLB_REF_16M <--> Data TLB reference for 16M page
  25: PM_DTLB_REF_4K <--> Data TLB reference for 4K page
  26: PM_FAB_CMD_ISSUED <--> Fabric command issued
  27: PM_FAB_DCLAIM_ISSUED <--> dclaim issued
  28: PM_FAB_HOLDtoNN_EMPTY <--> Hold buffer to NN empty
  29: PM_FAB_HOLDtoVN_EMPTY <--> Hold buffer to VN empty
  30: PM_FAB_M1toP1_SIDECAR_EMPTY <--> M1 to P1 sidecar empty
  31: PM_FAB_P1toM1_SIDECAR_EMPTY <--> P1 to M1 sidecar empty
  32: PM_FAB_PNtoNN_DIRECT <--> PN to NN beat went straight to its destination
  33: PM_FAB_PNtoVN_DIRECT <--> PN to VN beat went straight to its destination
  34: PM_FPR_MAP_FULL_CYC <--> Cycles FPR mapper full
  35: PM_FPU0_1FLOP <--> FPU0 executed add, mult, sub, cmp or sel instruction
  36: PM_FPU0_DENORM <--> FPU0 received denormalized data
  37: PM_FPU0_FDIV <--> FPU0 executed FDIV instruction
  38: PM_FPU0_FMA <--> FPU0 executed multiply-add instruction
  39: PM_FPU0_FSQRT <--> FPU0 executed FSQRT instruction
  40: PM_FPU0_FULL_CYC <--> Cycles FPU0 issue queue full
  41: PM_FPU0_SINGLE <--> FPU0 executed single precision instruction
  42: PM_FPU0_STALL3 <--> FPU0 stalled in pipe3
  43: PM_FPU0_STF <--> FPU0 executed store instruction
  44: PM_FPU1_1FLOP <--> FPU1 executed add, mult, sub, cmp or sel instruction
  45: PM_FPU1_DENORM <--> FPU1 received denormalized data
  46: PM_FPU1_FDIV <--> FPU1 executed FDIV instruction
  47: PM_FPU1_FMA <--> FPU1 executed multiply-add instruction
  48: PM_FPU1_FSQRT <--> FPU1 executed FSQRT instruction
  49: PM_FPU1_FULL_CYC <--> Cycles FPU1 issue queue full
  50: PM_FPU1_SINGLE <--> FPU1 executed single precision instruction
  51: PM_FPU1_STALL3 <--> FPU1 stalled in pipe3
  52: PM_FPU1_STF <--> FPU1 executed store instruction
  53: PM_FPU_FSQRT <--> FPU executed FSQRT instruction
  54: PM_FPU_FMA <--> FPU executed multiply-add instruction
  55: PM_FPU_STALL3 <--> FPU stalled in pipe3
  56: PM_FPU_STF <--> FPU executed store instruction
  57: PM_FXU_BUSY <--> FXU busy
  58: PM_MRK_FXU_FIN <--> Marked instruction FXU processing finished
  59: PM_GCT_NOSLOT_IC_MISS <--> No slot in GCT caused by I cache miss
  60: PM_GCT_FULL_CYC <--> Cycles GCT full
  61: PM_GCT_USAGE_60to79_CYC <--> Cycles GCT 60-79% full
  62: PM_GRP_BR_REDIR <--> Group experienced branch redirect
  63: PM_GRP_BR_REDIR_NONSPEC <--> Group experienced non-speculative branch redirect
  64: PM_GRP_DISP <--> Group dispatches
  65: PM_GRP_DISP_REJECT <--> Group dispatch rejected
  66: PM_GRP_DISP_VALID <--> Group dispatch valid
  67: PM_GRP_IC_MISS <--> Group experienced I cache miss
  68: PM_HV_CYC <--> Hypervisor Cycles
  69: PM_IC_PREF_REQ <--> Instruction prefetch requests
  70: PM_IERAT_XLATE_WR <--> Translation written to ierat
  71: PM_IOPS_CMPL <--> IOPS instructions completed
  72: PM_INST_DISP_ATTEMPT <--> Instructions dispatch attempted
  73: PM_INST_FETCH_CYC <--> Cycles at least 1 instruction fetched
  74: PM_INST_FROM_L1 <--> Instruction fetched from L1
  75: PM_INST_FROM_L25_MOD <--> Instruction fetched from L2.5 modified
  76: PM_INST_FROM_L35_MOD <--> Instruction fetched from L3.5 modified
  77: PM_INST_FROM_LMEM <--> Instruction fetched from local memory
  78: PM_ISLB_MISS <--> Instruction SLB misses
  79: PM_ITLB_MISS <--> Instruction TLB misses
  80: PM_L2SA_MOD_TAG <--> L2 slice A transition from modified to tagged
  81: PM_L2SA_RCLD_DISP <--> L2 Slice A RC load dispatch attempt
  82: PM_L2SA_RCLD_DISP_FAIL_RC_FULL <--> L2 Slice A RC load dispatch attempt failed due to all RC full
  83: PM_L2SA_RCST_DISP <--> L2 Slice A RC store dispatch attempt
  84: PM_L2SA_RCST_DISP_FAIL_RC_FULL <--> L2 Slice A RC store dispatch attempt failed due to all RC full
  85: PM_L2SA_RC_DISP_FAIL_CO_BUSY <--> L2 Slice A RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
  86: PM_L2SA_SHR_MOD <--> L2 slice A transition from shared to modified
  87: PM_L2SA_ST_REQ <--> L2 slice A store requests
  88: PM_L2SB_MOD_TAG <--> L2 slice B transition from modified to tagged
  89: PM_L2SB_RCLD_DISP <--> L2 Slice B RC load dispatch attempt
  90: PM_L2SB_RCLD_DISP_FAIL_RC_FULL <--> L2 Slice B RC load dispatch attempt failed due to all RC full
  91: PM_L2SB_RCST_DISP <--> L2 Slice B RC store dispatch attempt
  92: PM_L2SB_RCST_DISP_FAIL_RC_FULL <--> L2 Slice B RC store dispatch attempt failed due to all RC full
  93: PM_L2SB_RC_DISP_FAIL_CO_BUSY <--> L2 Slice B RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
  94: PM_L2SB_SHR_MOD <--> L2 slice B transition from shared to modified
  95: PM_L2SB_ST_REQ <--> L2 slice B store requests
  96: PM_L2SC_MOD_TAG <--> L2 slice C transition from modified to tagged
  97: PM_L2SC_RCLD_DISP <--> L2 Slice C RC load dispatch attempt
  98: PM_L2SC_RCLD_DISP_FAIL_RC_FULL <--> L2 Slice C RC load dispatch attempt failed due to all RC full
  99: PM_L2SC_RCST_DISP <--> L2 Slice C RC store dispatch attempt
  100: PM_L2SC_RCST_DISP_FAIL_RC_FULL <--> L2 Slice C RC store dispatch attempt failed due to all RC full
  101: PM_L2SC_RC_DISP_FAIL_CO_BUSY <--> L2 Slice C RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
  102: PM_L2SC_SHR_MOD <--> L2 slice C transition from shared to modified
  103: PM_L2SC_ST_REQ <--> L2 slice C store requests
  104: PM_L3SA_ALL_BUSY <--> L3 slice A active for every cycle all CI/CO machines busy
  105: PM_L3SA_MOD_TAG <--> L3 slice A transition from modified to TAG
  106: PM_L3SA_REF <--> L3 slice A references
  107: PM_L3SB_ALL_BUSY <--> L3 slice B active for every cycle all CI/CO machines busy
  108: PM_L3SB_MOD_TAG <--> L3 slice B transition from modified to TAG
  109: PM_L3SB_REF <--> L3 slice B references
  110: PM_L3SC_ALL_BUSY <--> L3 slice C active for every cycle all CI/CO machines busy
  111: PM_L3SC_MOD_TAG <--> L3 slice C transition from modified to TAG
  112: PM_L3SC_REF <--> L3 slice C references
  113: PM_LARX_LSU0 <--> Larx executed on LSU0
  114: PM_LR_CTR_MAP_FULL_CYC <--> Cycles LR/CTR mapper full
  115: PM_LSU0_BUSY_REJECT <--> LSU0 busy due to reject
  116: PM_LSU0_DERAT_MISS <--> LSU0 DERAT misses
  117: PM_LSU0_FLUSH_LRQ <--> LSU0 LRQ flushes
  118: PM_LSU0_FLUSH_SRQ <--> LSU0 SRQ flushes
  119: PM_LSU0_FLUSH_ULD <--> LSU0 unaligned load flushes
  120: PM_LSU0_FLUSH_UST <--> LSU0 unaligned store flushes
  121: PM_LSU0_REJECT_ERAT_MISS <--> LSU0 reject due to ERAT miss
  122: PM_LSU0_REJECT_LMQ_FULL <--> LSU0 reject due to LMQ full or missed data coming
  123: PM_LSU0_REJECT_RELOAD_CDF <--> LSU0 reject due to reload CDF or tag update collision
  124: PM_LSU0_REJECT_SRQ_LHS <--> LSU0 SRQ rejects
  125: PM_LSU0_SRQ_STFWD <--> LSU0 SRQ store forwarded
  126: PM_LSU1_BUSY_REJECT <--> LSU1 busy due to reject
  127: PM_LSU1_DERAT_MISS <--> LSU1 DERAT misses
  128: PM_LSU1_FLUSH_LRQ <--> LSU1 LRQ flushes
  129: PM_LSU1_FLUSH_SRQ <--> LSU1 SRQ flushes
  130: PM_LSU1_FLUSH_ULD <--> LSU1 unaligned load flushes
  131: PM_LSU1_FLUSH_UST <--> LSU1 unaligned store flushes
  132: PM_LSU1_REJECT_ERAT_MISS <--> LSU1 reject due to ERAT miss
  133: PM_LSU1_REJECT_LMQ_FULL <--> LSU1 reject due to LMQ full or missed data coming
  134: PM_LSU1_REJECT_RELOAD_CDF <--> LSU1 reject due to reload CDF or tag update collision
  135: PM_LSU1_REJECT_SRQ_LHS <--> LSU1 SRQ rejects
  136: PM_LSU1_SRQ_STFWD <--> LSU1 SRQ store forwarded
  137: PM_LSU_DERAT_MISS <--> DERAT misses
  138: PM_LSU_FLUSH_LRQ <--> LRQ flushes
  139: PM_LSU_FLUSH_LRQ_FULL <--> Flush caused by LRQ full
  140: PM_LSU_FLUSH_UST <--> SRQ unaligned store flushes
  141: PM_LSU_LMQ_SRQ_EMPTY_CYC <--> Cycles LMQ and SRQ empty
  142: PM_LSU_LRQ_S0_ALLOC <--> LRQ slot 0 allocated
  143: PM_LSU_LRQ_S0_VALID <--> LRQ slot 0 valid
  144: PM_LSU_REJECT_LMQ_FULL <--> LSU reject due to LMQ full or missed data coming
  145: PM_LSU_REJECT_RELOAD_CDF <--> LSU reject due to reload CDF or tag update collision
  146: PM_LSU_SRQ_S0_ALLOC <--> SRQ slot 0 allocated
  147: PM_LSU_SRQ_S0_VALID <--> SRQ slot 0 valid
  148: PM_MEM_FAST_PATH_RD_CMPL <--> Fast path memory read completed
  149: PM_MEM_HI_PRIO_PW_CMPL <--> High priority partial-write completed
  150: PM_MEM_HI_PRIO_WR_CMPL <--> High priority write completed
  151: PM_MEM_PWQ_DISP <--> Memory partial-write queue dispatched
  152: PM_MEM_PWQ_DISP_BUSY2or3 <--> Memory partial-write queue dispatched with 2-3 queues busy
  153: PM_MEM_READ_CMPL <--> Memory read completed or canceled
  154: PM_MEM_RQ_DISP <--> Memory read queue dispatched
  155: PM_MEM_RQ_DISP_BUSY8to15 <--> Memory read queue dispatched with 8-15 queues busy
  156: PM_MEM_WQ_DISP_BUSY1to7 <--> Memory write queue dispatched with 1-7 queues busy
  157: PM_MEM_WQ_DISP_WRITE <--> Memory write queue dispatched due to write
  158: PM_MRK_BRU_FIN <--> Marked instruction BRU processing finished
  159: PM_MRK_DATA_FROM_L25_MOD <--> Marked data loaded from L2.5 modified
  160: PM_MRK_DATA_FROM_L25_SHR_CYC <--> Marked load latency from L2.5 shared
  161: PM_MRK_DATA_FROM_L275_SHR_CYC <--> Marked load latency from L2.75 shared
  162: PM_MRK_DATA_FROM_L2_CYC <--> Marked load latency from L2
  163: PM_MRK_DATA_FROM_L35_MOD <--> Marked data loaded from L3.5 modified
  164: PM_MRK_DATA_FROM_L35_SHR_CYC <--> Marked load latency from L3.5 shared
  165: PM_MRK_DATA_FROM_L375_SHR_CYC <--> Marked load latency from L3.75 shared
  166: PM_MRK_DATA_FROM_L3_CYC <--> Marked load latency from L3
  167: PM_MRK_DATA_FROM_LMEM <--> Marked data loaded from local memory
  168: PM_MRK_DTLB_MISS_16M <--> Marked Data TLB misses for 16M page
  169: PM_MRK_DTLB_MISS_4K <--> Marked Data TLB misses for 4K page
  170: PM_MRK_DTLB_REF_16M <--> Marked Data TLB reference for 16M page
  171: PM_MRK_DTLB_REF_4K <--> Marked Data TLB reference for 4K page
  172: PM_MRK_GRP_BR_REDIR <--> Group experienced marked branch redirect
  173: PM_MRK_IMR_RELOAD <--> Marked IMR reloaded
  174: PM_INST_CMPL <--> Instructions completed
  175: PM_MRK_LD_MISS_L1_LSU0 <--> LSU0 L1 D cache load misses
  176: PM_MRK_LD_MISS_L1_LSU1 <--> LSU1 L1 D cache load misses
  177: PM_MRK_STCX_FAIL <--> Marked STCX failed
  178: PM_MRK_ST_GPS <--> Marked store sent to GPS
  179: PM_MRK_ST_MISS_L1 <--> Marked L1 D cache store misses
  180: PM_PMC1_OVERFLOW <--> PMC1 Overflow
  181: PM_PTEG_FROM_L25_MOD <--> PTEG loaded from L2.5 modified
  182: PM_PTEG_FROM_L35_MOD <--> PTEG loaded from L3.5 modified
  183: PM_PTEG_FROM_LMEM <--> PTEG loaded from local memory
  184: PM_SLB_MISS <--> SLB misses
  185: PM_SNOOP_DCLAIM_RETRY_QFULL <--> Snoop dclaim/flush retry due to write/dclaim queues full
  186: PM_SNOOP_PW_RETRY_RQ <--> Snoop partial-write retry due to collision with active read queue
  187: PM_SNOOP_RD_RETRY_QFULL <--> Snoop read retry due to read queue full
  188: PM_SNOOP_RD_RETRY_RQ <--> Snoop read retry due to collision with active read queue
  189: PM_SNOOP_RETRY_1AHEAD <--> Snoop retry due to one ahead collision
  190: PM_SNOOP_TLBIE <--> Snoop TLBIE
  191: PM_SNOOP_WR_RETRY_RQ <--> Snoop write/dclaim retry due to collision with active read queue
  192: PM_STCX_FAIL <--> STCX failed
  193: PM_STCX_PASS <--> Stcx passes
  194: PM_SUSPENDED <--> Suspended
  195: PM_GCT_EMPTY_CYC <--> Cycles GCT empty
  196: PM_THRD_GRP_CMPL_BOTH_CYC <--> Cycles group completed by both threads
  197: PM_THRD_PRIO_1_CYC <--> Cycles thread running at priority level 1
  198: PM_THRD_PRIO_2_CYC <--> Cycles thread running at priority level 2
  199: PM_THRD_PRIO_3_CYC <--> Cycles thread running at priority level 3
  200: PM_THRD_PRIO_4_CYC <--> Cycles thread running at priority level 4
  201: PM_THRD_PRIO_5_CYC <--> Cycles thread running at priority level 5
  202: PM_THRD_PRIO_6_CYC <--> Cycles thread running at priority level 6
  203: PM_THRD_PRIO_7_CYC <--> Cycles thread running at priority level 7
  204: PM_XER_MAP_FULL_CYC <--> Cycles XER mapper full
 Counter 2: 190 events
  0: PM_BR_ISSUED <--> Branches issued
  1: PM_BR_MPRED_CR <--> Branch mispredictions due to CR bit setting
  2: PM_BR_MPRED_TA <--> Branch mispredictions due to target address
  3: PM_BR_PRED_CR <--> A conditional branch was predicted, CR prediction
  4: PM_BR_PRED_TA <--> A conditional branch was predicted, target prediction
  5: PM_CRQ_FULL_CYC <--> Cycles CR issue queue full
  6: PM_CYC <--> Processor cycles
  7: PM_DATA_FROM_L25_MOD <--> Data loaded from L2.5 modified
  8: PM_DATA_FROM_L275_SHR <--> Data loaded from L2.75 shared
  9: PM_DATA_FROM_L35_MOD <--> Data loaded from L3.5 modified
  10: PM_DATA_FROM_L375_SHR <--> Data loaded from L3.75 shared
  11: PM_DATA_FROM_LMEM <--> Data loaded from local memory
  12: PM_DC_INV_L2 <--> L1 D cache entries invalidated from L2
  13: PM_DC_PREF_DST <--> DST (Data Stream Touch) stream start
  14: PM_DC_PREF_STREAM_ALLOC <--> D cache new prefetch stream allocated
  15: PM_EE_OFF <--> Cycles MSR(EE) bit off
  16: PM_EE_OFF_EXT_INT <--> Cycles MSR(EE) bit off and external interrupt pending
  17: PM_FAB_CMD_RETRIED <--> Fabric command retried
  18: PM_FAB_DCLAIM_RETRIED <--> dclaim retried
  19: PM_FAB_M1toVNorNN_SIDECAR_EMPTY <--> M1 to VN/NN sidecar empty
  20: PM_FAB_P1toVNorNN_SIDECAR_EMPTY <--> P1 to VN/NN sidecar empty
  21: PM_FAB_PNtoNN_SIDECAR <--> PN to NN beat went to sidecar first
  22: PM_FAB_PNtoVN_SIDECAR <--> PN to VN beat went to sidecar first
  23: PM_FAB_VBYPASS_EMPTY <--> Vertical bypass buffer empty
  24: PM_FLUSH_BR_MPRED <--> Flush caused by branch mispredict
  25: PM_FLUSH_IMBAL <--> Flush caused by thread GCT imbalance
  26: PM_FLUSH <--> Flushes
  27: PM_FLUSH_SB <--> Flush caused by scoreboard operation
  28: PM_FLUSH_SYNC <--> Flush caused by sync
  29: PM_FPU0_FEST <--> FPU0 executed FEST instruction
  30: PM_FPU0_FIN <--> FPU0 produced a result
  31: PM_FPU0_FMOV_FEST <--> FPU0 executed FMOV or FEST instructions
  32: PM_FPU0_FPSCR <--> FPU0 executed FPSCR instruction
  33: PM_FPU0_FRSP_FCONV <--> FPU0 executed FRSP or FCONV instructions
  34: PM_FPU1_FEST <--> FPU1 executed FEST instruction
  35: PM_FPU1_FIN <--> FPU1 produced a result
  36: PM_FPU1_FMOV_FEST <--> FPU1 executing FMOV or FEST instructions
  37: PM_FPU1_FRSP_FCONV <--> FPU1 executed FRSP or FCONV instructions
  38: PM_FPU_FMOV_FEST <--> FPU executing FMOV or FEST instructions
  39: PM_FPU_FRSP_FCONV <--> FPU executed FRSP or FCONV instructions
  40: PM_FXLS0_FULL_CYC <--> Cycles FXU0/LS0 queue full
  41: PM_FXLS1_FULL_CYC <--> Cycles FXU1/LS1 queue full
  42: PM_FXU0_BUSY_FXU1_IDLE <--> FXU0 busy FXU1 idle
  43: PM_FXU0_FIN <--> FXU0 produced a result
  44: PM_FXU1_FIN <--> FXU1 produced a result
  45: PM_FXU_FIN <--> FXU produced a result
  46: PM_GCT_NOSLOT_SRQ_FULL <--> No slot in GCT caused by SRQ full
  47: PM_GCT_USAGE_80to99_CYC <--> Cycles GCT 80-99% full
  48: PM_GPR_MAP_FULL_CYC <--> Cycles GPR mapper full
  49: PM_GRP_CMPL <--> Group completed
  50: PM_GRP_DISP_BLK_SB_CYC <--> Cycles group dispatch blocked by scoreboard
  51: PM_GRP_DISP_SUCCESS <--> Group dispatch success
  52: PM_IC_DEMAND_L2_BHT_REDIRECT <--> L2 I cache demand request due to BHT redirect
  53: PM_IC_DEMAND_L2_BR_REDIRECT <--> L2 I cache demand request due to branch redirect
  54: PM_IC_PREF_INSTALL <--> Instruction prefetched installed in prefetch buffer
  55: PM_IOPS_CMPL <--> IOPS instructions completed
  56: PM_INST_DISP <--> Instructions dispatched
  57: PM_INST_FROM_L275_SHR <--> Instruction fetched from L2.75 shared
  58: PM_INST_FROM_L375_SHR <--> Instruction fetched from L3.75 shared
  59: PM_INST_FROM_PREF <--> Instructions fetched from prefetch
  60: PM_L1_DCACHE_RELOAD_VALID <--> L1 reload data source valid
  61: PM_L1_PREF <--> L1 cache data prefetches
  62: PM_L1_WRITE_CYC <--> Cycles writing to instruction L1
  63: PM_L2SA_MOD_INV <--> L2 slice A transition from modified to invalid
  64: PM_L2SA_RCLD_DISP_FAIL_ADDR <--> L2 Slice A RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
  65: PM_L2SA_RCLD_DISP_FAIL_OTHER <--> L2 Slice A RC load dispatch attempt failed due to other reasons
  66: PM_L2SA_RCST_DISP_FAIL_ADDR <--> L2 Slice A RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
  67: PM_L2SA_RCST_DISP_FAIL_OTHER <--> L2 Slice A RC store dispatch attempt failed due to other reasons
  68: PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL <--> L2 Slice A RC dispatch attempt failed due to all CO busy
  69: PM_L2SA_SHR_INV <--> L2 slice A transition from shared to invalid
  70: PM_L2SA_ST_HIT <--> L2 slice A store hits
  71: PM_L2SB_MOD_INV <--> L2 slice B transition from modified to invalid
  72: PM_L2SB_RCLD_DISP_FAIL_ADDR <--> L2 Slice B RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
  73: PM_L2SB_RCLD_DISP_FAIL_OTHER <--> L2 Slice B RC load dispatch attempt failed due to other reasons
  74: PM_L2SB_RCST_DISP_FAIL_ADDR <--> L2 Slice B RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
  75: PM_L2SB_RCST_DISP_FAIL_OTHER <--> L2 Slice B RC store dispatch attempt failed due to other reasons
  76: PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL <--> L2 Slice B RC dispatch attempt failed due to all CO busy
  77: PM_L2SB_SHR_INV <--> L2 slice B transition from shared to invalid
  78: PM_L2SB_ST_HIT <--> L2 slice B store hits
  79: PM_L2SC_MOD_INV <--> L2 slice C transition from modified to invalid
  80: PM_L2SC_RCLD_DISP_FAIL_ADDR <--> L2 Slice C RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
  81: PM_L2SC_RCLD_DISP_FAIL_OTHER <--> L2 Slice C RC load dispatch attempt failed due to other reasons
  82: PM_L2SC_RCST_DISP_FAIL_ADDR <--> L2 Slice C RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
  83: PM_L2SC_RCST_DISP_FAIL_OTHER <--> L2 Slice C RC store dispatch attempt failed due to other reasons
  84: PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL <--> L2 Slice C RC dispatch attempt failed due to all CO busy
  85: PM_L2SC_SHR_INV <--> L2 slice C transition from shared to invalid
  86: PM_L2SC_ST_HIT <--> L2 slice C store hits
  87: PM_L2_PREF <--> L2 cache prefetches
  88: PM_L3SA_HIT <--> L3 slice A hits
  89: PM_L3SA_MOD_INV <--> L3 slice A transition from modified to invalid
  90: PM_L3SA_SHR_INV <--> L3 slice A transition from shared to invalid
  91: PM_L3SA_SNOOP_RETRY <--> L3 slice A snoop retries
  92: PM_L3SB_HIT <--> L3 slice B hits
  93: PM_L3SB_MOD_INV <--> L3 slice B transition from modified to invalid
  94: PM_L3SB_SHR_INV <--> L3 slice B transition from shared to invalid
  95: PM_L3SB_SNOOP_RETRY <--> L3 slice B snoop retries
  96: PM_L3SC_HIT <--> L3 Slice C hits
  97: PM_L3SC_MOD_INV <--> L3 slice C transition from modified to invalid
  98: PM_L3SC_SHR_INV <--> L3 slice C transition from shared to invalid
  99: PM_L3SC_SNOOP_RETRY <--> L3 slice C snoop retries
  100: PM_LD_MISS_L1 <--> L1 D cache load misses
  101: PM_LD_MISS_L1_LSU0 <--> LSU0 L1 D cache load misses
  102: PM_LD_MISS_L1_LSU1 <--> LSU1 L1 D cache load misses
  103: PM_LD_REF_L1_LSU0 <--> LSU0 L1 D cache load references
  104: PM_LD_REF_L1_LSU1 <--> LSU1 L1 D cache load references
  105: PM_LSU0_LDF <--> LSU0 executed Floating Point load instruction
  106: PM_LSU0_NCLD <--> LSU0 non-cacheable loads
  107: PM_LSU1_LDF <--> LSU1 executed Floating Point load instruction
  108: PM_LSU1_NCLD <--> LSU1 non-cacheable loads
  109: PM_LSU_FLUSH <--> Flush initiated by LSU
  110: PM_LSU_FLUSH_SRQ_FULL <--> Flush caused by SRQ full
  111: PM_LSU_LMQ_FULL_CYC <--> Cycles LMQ full
  112: PM_LSU_LMQ_LHR_MERGE <--> LMQ LHR merges
  113: PM_LSU_LMQ_S0_ALLOC <--> LMQ slot 0 allocated
  114: PM_LSU_LMQ_S0_VALID <--> LMQ slot 0 valid
  115: PM_LSU_LMQ_SRQ_EMPTY_CYC <--> Cycles LMQ and SRQ empty
  116: PM_LSU_LRQ_FULL_CYC <--> Cycles LRQ full
  117: PM_DC_PREF_STREAM_ALLOC_BLK <--> D cache prefetch stream allocations blocked
  118: PM_LSU_SRQ_FULL_CYC <--> Cycles SRQ full
  119: PM_LSU_SRQ_SYNC_CYC <--> SRQ sync duration
  120: PM_LWSYNC_HELD <--> LWSYNC held at dispatch
  121: PM_MEM_LO_PRIO_PW_CMPL <--> Low priority partial-write completed
  122: PM_MEM_LO_PRIO_WR_CMPL <--> Low priority write completed
  123: PM_MEM_PW_CMPL <--> Memory partial-write completed
  124: PM_MEM_PW_GATH <--> Memory partial-write gathered
  125: PM_MEM_RQ_DISP_BUSY1to7 <--> Memory read queue dispatched with 1-7 queues busy
  126: PM_MEM_SPEC_RD_CANCEL <--> Speculative memory read canceled
  127: PM_MEM_WQ_DISP_BUSY8to15 <--> Memory write queue dispatched with 8-15 queues busy
  128: PM_MEM_WQ_DISP_DCLAIM <--> Memory write queue dispatched due to dclaim/flush
  129: PM_MRK_DATA_FROM_L25_MOD <--> Marked data loaded from L2.5 modified
  130: PM_MRK_DATA_FROM_L275_SHR <--> Marked data loaded from L2.75 shared
  131: PM_MRK_DATA_FROM_L35_MOD <--> Marked data loaded from L3.5 modified
  132: PM_MRK_DATA_FROM_L375_SHR <--> Marked data loaded from L3.75 shared
  133: PM_MRK_DATA_FROM_LMEM <--> Marked data loaded from local memory
  134: PM_MRK_DSLB_MISS <--> Marked Data SLB misses
  135: PM_MRK_DTLB_MISS <--> Marked Data TLB misses
  136: PM_MRK_FPU_FIN <--> Marked instruction FPU processing finished
  137: PM_MRK_INST_FIN <--> Marked instruction finished
  138: PM_MRK_L1_RELOAD_VALID <--> Marked L1 reload data source valid
  139: PM_MRK_LSU0_FLUSH_LRQ <--> LSU0 marked LRQ flushes
  140: PM_MRK_LSU0_FLUSH_SRQ <--> LSU0 marked SRQ flushes
  141: PM_MRK_LSU0_FLUSH_UST <--> LSU0 marked unaligned store flushes
  142: PM_MRK_LSU0_FLUSH_ULD <--> LSU0 marked unaligned load flushes
  143: PM_MRK_LSU1_FLUSH_LRQ <--> LSU1 marked LRQ flushes
  144: PM_MRK_LSU1_FLUSH_SRQ <--> LSU1 marked SRQ flushes
  145: PM_MRK_LSU1_FLUSH_ULD <--> LSU1 marked unaligned load flushes
  146: PM_MRK_LSU1_FLUSH_UST <--> LSU1 marked unaligned store flushes
  147: PM_MRK_LSU_FLUSH_LRQ <--> Marked LRQ flushes
  148: PM_MRK_LSU_FLUSH_UST <--> Marked unaligned store flushes
  149: PM_MRK_LSU_SRQ_INST_VALID <--> Marked instruction valid in SRQ
  150: PM_MRK_ST_CMPL_INT <--> Marked store completed with intervention
  151: PM_PMC2_OVERFLOW <--> PMC2 Overflow
  152: PM_PMC6_OVERFLOW <--> PMC6 Overflow
  153: PM_PTEG_FROM_L25_MOD <--> PTEG loaded from L2.5 modified
  154: PM_PTEG_FROM_L275_SHR <--> PTEG loaded from L2.75 shared
  155: PM_PTEG_FROM_L35_MOD <--> PTEG loaded from L3.5 modified
  156: PM_PTEG_FROM_L375_SHR <--> PTEG loaded from L3.75 shared
  157: PM_PTEG_FROM_LMEM <--> PTEG loaded from local memory
  158: PM_SNOOP_PARTIAL_RTRY_QFULL <--> Snoop partial write retry due to partial-write queues full
  159: PM_SNOOP_PW_RETRY_WQ_PWQ <--> Snoop partial-write retry due to collision with active write or partial-write queue
  160: PM_SNOOP_RD_RETRY_WQ <--> Snoop read retry due to collision with active write queue
  161: PM_SNOOP_WR_RETRY_QFULL <--> Snoop read retry due to read queue full
  162: PM_SNOOP_WR_RETRY_WQ <--> Snoop write/dclaim retry due to collision with active write queue
  163: PM_STOP_COMPLETION <--> Completion stopped
  164: PM_ST_MISS_L1 <--> L1 D cache store misses
  165: PM_ST_REF_L1 <--> L1 D cache store references
  166: PM_ST_REF_L1_LSU0 <--> LSU0 L1 D cache store references
  167: PM_ST_REF_L1_LSU1 <--> LSU1 L1 D cache store references
  168: PM_SUSPENDED <--> Suspended
  169: PM_CLB_EMPTY_CYC <--> Cycles CLB empty
  170: PM_THRD_L2MISS_BOTH_CYC <--> Cycles both threads in L2 misses
  171: PM_THRD_PRIO_DIFF_0_CYC <--> Cycles no thread priority difference
  172: PM_THRD_PRIO_DIFF_1or2_CYC <--> Cycles thread priority difference is 1 or 2
  173: PM_THRD_PRIO_DIFF_3or4_CYC <--> Cycles thread priority difference is 3 or 4
  174: PM_THRD_PRIO_DIFF_5or6_CYC <--> Cycles thread priority difference is 5 or 6
  175: PM_THRD_PRIO_DIFF_minus1or2_CYC <--> Cycles thread priority difference is -1 or -2
  176: PM_THRD_PRIO_DIFF_minus3or4_CYC <--> Cycles thread priority difference is -3 or -4
  177: PM_THRD_PRIO_DIFF_minus5or6_CYC <--> Cycles thread priority difference is -5 or -6
  178: PM_THRD_SEL_OVER_CLB_EMPTY <--> Thread selection overrides caused by CLB empty
  179: PM_THRD_SEL_OVER_GCT_IMBAL <--> Thread selection overrides caused by GCT imbalance
  180: PM_THRD_SEL_OVER_ISU_HOLD <--> Thread selection overrides caused by ISU holds
  181: PM_THRD_SEL_OVER_L2MISS <--> Thread selection overrides caused by L2 misses
  182: PM_THRD_SEL_T0 <--> Decode selected thread 0
  183: PM_THRD_SEL_T1 <--> Decode selected thread 1
  184: PM_THRD_SMT_HANG <--> SMT hang detected
  185: PM_THRESH_TIMEO <--> Threshold timeout
  186: PM_TLBIE_HELD <--> TLBIE held at dispatch
  187: PM_DATA_FROM_L2MISS <--> Data loaded missed L2
  188: PM_MRK_DATA_FROM_L2MISS <--> Marked data loaded missed L2
  189: PM_PTEG_FROM_L2MISS <--> PTEG loaded from L2 miss
 Counter 3: 193 events
  0: PM_0INST_FETCH <--> No instructions fetched
  1: PM_BR_ISSUED <--> Branches issued
  2: PM_BR_MPRED_CR <--> Branch mispredictions due to CR bit setting
  3: PM_BR_MPRED_TA <--> Branch mispredictions due to target address
  4: PM_BR_PRED_CR <--> A conditional branch was predicted, CR prediction
  5: PM_BR_PRED_CR_TA <--> A conditional branch was predicted, CR and target prediction
  6: PM_BR_PRED_TA <--> A conditional branch was predicted, target prediction
  7: PM_CMPLU_STALL_DIV <--> Completion stall caused by DIV instruction
  8: PM_CMPLU_STALL_ERAT_MISS <--> Completion stall caused by ERAT miss
  9: PM_CMPLU_STALL_FPU <--> Completion stall caused by FPU instruction
  10: PM_CMPLU_STALL_REJECT <--> Completion stall caused by reject
  11: PM_CRQ_FULL_CYC <--> Cycles CR issue queue full
  12: PM_CYC <--> Processor cycles
  13: PM_DATA_FROM_L275_MOD <--> Data loaded from L2.75 modified
  14: PM_DATA_FROM_L375_MOD <--> Data loaded from L3.75 modified
  15: PM_DATA_FROM_RMEM <--> Data loaded from remote memory
  16: PM_DC_INV_L2 <--> L1 D cache entries invalidated from L2
  17: PM_DC_PREF_DST <--> DST (Data Stream Touch) stream start
  18: PM_DC_PREF_STREAM_ALLOC <--> D cache new prefetch stream allocated
  19: PM_EE_OFF <--> Cycles MSR(EE) bit off
  20: PM_EE_OFF_EXT_INT <--> Cycles MSR(EE) bit off and external interrupt pending
  21: PM_EXT_INT <--> External interrupts
  22: PM_FAB_CMD_RETRIED <--> Fabric command retried
  23: PM_FAB_DCLAIM_RETRIED <--> dclaim retried
  24: PM_FAB_M1toVNorNN_SIDECAR_EMPTY <--> M1 to VN/NN sidecar empty
  25: PM_FAB_P1toVNorNN_SIDECAR_EMPTY <--> P1 to VN/NN sidecar empty
  26: PM_FAB_PNtoNN_SIDECAR <--> PN to NN beat went to sidecar first
  27: PM_FAB_PNtoVN_SIDECAR <--> PN to VN beat went to sidecar first
  28: PM_FAB_VBYPASS_EMPTY <--> Vertical bypass buffer empty
  29: PM_FLUSH_BR_MPRED <--> Flush caused by branch mispredict
  30: PM_FLUSH_IMBAL <--> Flush caused by thread GCT imbalance
  31: PM_FLUSH <--> Flushes
  32: PM_FLUSH_SB <--> Flush caused by scoreboard operation
  33: PM_FLUSH_SYNC <--> Flush caused by sync
  34: PM_FPU0_FEST <--> FPU0 executed FEST instruction
  35: PM_FPU0_FIN <--> FPU0 produced a result
  36: PM_FPU0_FMOV_FEST <--> FPU0 executed FMOV or FEST instructions
  37: PM_FPU0_FPSCR <--> FPU0 executed FPSCR instruction
  38: PM_FPU0_FRSP_FCONV <--> FPU0 executed FRSP or FCONV instructions
  39: PM_FPU1_FEST <--> FPU1 executed FEST instruction
  40: PM_FPU1_FIN <--> FPU1 produced a result
  41: PM_FPU1_FMOV_FEST <--> FPU1 executing FMOV or FEST instructions
  42: PM_FPU1_FRSP_FCONV <--> FPU1 executed FRSP or FCONV instructions
  43: PM_FPU_FEST <--> FPU executed FEST instruction
  44: PM_FPU_FIN <--> FPU produced a result
  45: PM_FXLS0_FULL_CYC <--> Cycles FXU0/LS0 queue full
  46: PM_FXLS1_FULL_CYC <--> Cycles FXU1/LS1 queue full
  47: PM_FXLS_FULL_CYC <--> Cycles FXLS queue is full
  48: PM_FXU0_FIN <--> FXU0 produced a result
  49: PM_FXU1_BUSY_FXU0_IDLE <--> FXU1 busy FXU0 idle
  50: PM_FXU1_FIN <--> FXU1 produced a result
  51: PM_GCT_NOSLOT_BR_MPRED <--> No slot in GCT caused by branch mispredict
  52: PM_GCT_FULL_CYC <--> Cycles GCT full
  53: PM_GPR_MAP_FULL_CYC <--> Cycles GPR mapper full
  54: PM_GRP_DISP_BLK_SB_CYC <--> Cycles group dispatch blocked by scoreboard
  55: PM_GRP_DISP_REJECT <--> Group dispatch rejected
  56: PM_IC_DEMAND_L2_BHT_REDIRECT <--> L2 I cache demand request due to BHT redirect
  57: PM_IC_DEMAND_L2_BR_REDIRECT <--> L2 I cache demand request due to branch redirect
  58: PM_IC_PREF_INSTALL <--> Instruction prefetched installed in prefetch buffer
  59: PM_IOPS_CMPL <--> IOPS instructions completed
  60: PM_INST_DISP <--> Instructions dispatched
  61: PM_INST_FROM_L275_MOD <--> Instruction fetched from L2.75 modified
  62: PM_INST_FROM_L375_MOD <--> Instruction fetched from L3.75 modified
  63: PM_INST_FROM_RMEM <--> Instruction fetched from remote memory
  64: PM_L1_DCACHE_RELOAD_VALID <--> L1 reload data source valid
  65: PM_L1_PREF <--> L1 cache data prefetches
  66: PM_L1_WRITE_CYC <--> Cycles writing to instruction L1
  67: PM_L2SA_MOD_INV <--> L2 slice A transition from modified to invalid
  68: PM_L2SA_RCLD_DISP_FAIL_ADDR <--> L2 Slice A RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
  69: PM_L2SA_RCLD_DISP_FAIL_OTHER <--> L2 Slice A RC load dispatch attempt failed due to other reasons
  70: PM_L2SA_RCST_DISP_FAIL_ADDR <--> L2 Slice A RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
  71: PM_L2SA_RCST_DISP_FAIL_OTHER <--> L2 Slice A RC store dispatch attempt failed due to other reasons
  72: PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL <--> L2 Slice A RC dispatch attempt failed due to all CO busy
  73: PM_L2SA_SHR_INV <--> L2 slice A transition from shared to invalid
  74: PM_L2SA_ST_HIT <--> L2 slice A store hits
  75: PM_L2SB_MOD_INV <--> L2 slice B transition from modified to invalid
  76: PM_L2SB_RCLD_DISP_FAIL_ADDR <--> L2 Slice B RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
  77: PM_L2SB_RCLD_DISP_FAIL_OTHER <--> L2 Slice B RC load dispatch attempt failed due to other reasons
  78: PM_L2SB_RCST_DISP_FAIL_ADDR <--> L2 Slice B RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
  79: PM_L2SB_RCST_DISP_FAIL_OTHER <--> L2 Slice B RC store dispatch attempt failed due to other reasons
  80: PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL <--> L2 Slice B RC dispatch attempt failed due to all CO busy
  81: PM_L2SB_SHR_INV <--> L2 slice B transition from shared to invalid
  82: PM_L2SB_ST_HIT <--> L2 slice B store hits
  83: PM_L2SC_MOD_INV <--> L2 slice C transition from modified to invalid
  84: PM_L2SC_RCLD_DISP_FAIL_ADDR <--> L2 Slice C RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
  85: PM_L2SC_RCLD_DISP_FAIL_OTHER <--> L2 Slice C RC load dispatch attempt failed due to other reasons
  86: PM_L2SC_RCST_DISP_FAIL_ADDR <--> L2 Slice C RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
  87: PM_L2SC_RCST_DISP_FAIL_OTHER <--> L2 Slice C RC store dispatch attempt failed due to other reasons
  88: PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL <--> L2 Slice C RC dispatch attempt failed due to all CO busy
  89: PM_L2SC_SHR_INV <--> L2 slice C transition from shared to invalid
  90: PM_L2SC_ST_HIT <--> L2 slice C store hits
  91: PM_L2_PREF <--> L2 cache prefetches
  92: PM_L3SA_HIT <--> L3 slice A hits
  93: PM_L3SA_MOD_INV <--> L3 slice A transition from modified to invalid
  94: PM_L3SA_SHR_INV <--> L3 slice A transition from shared to invalid
  95: PM_L3SA_SNOOP_RETRY <--> L3 slice A snoop retries
  96: PM_L3SB_HIT <--> L3 slice B hits
  97: PM_L3SB_MOD_INV <--> L3 slice B transition from modified to invalid
  98: PM_L3SB_SHR_INV <--> L3 slice B transition from shared to invalid
  99: PM_L3SB_SNOOP_RETRY <--> L3 slice B snoop retries
  100: PM_L3SC_HIT <--> L3 Slice C hits
  101: PM_L3SC_MOD_INV <--> L3 slice C transition from modified to invalid
  102: PM_L3SC_SHR_INV <--> L3 slice C transition from shared to invalid
  103: PM_L3SC_SNOOP_RETRY <--> L3 slice C snoop retries
  104: PM_LD_MISS_L1_LSU0 <--> LSU0 L1 D cache load misses
  105: PM_LD_MISS_L1_LSU1 <--> LSU1 L1 D cache load misses
  106: PM_LD_REF_L1 <--> L1 D cache load references
  107: PM_LD_REF_L1_LSU0 <--> LSU0 L1 D cache load references
  108: PM_LD_REF_L1_LSU1 <--> LSU1 L1 D cache load references
  109: PM_LSU0_LDF <--> LSU0 executed Floating Point load instruction
  110: PM_LSU0_NCLD <--> LSU0 non-cacheable loads
  111: PM_LSU1_LDF <--> LSU1 executed Floating Point load instruction
  112: PM_LSU1_NCLD <--> LSU1 non-cacheable loads
  113: PM_LSU_FLUSH <--> Flush initiated by LSU
  114: PM_LSU_FLUSH_SRQ_FULL <--> Flush caused by SRQ full
  115: PM_LSU_LDF <--> LSU executed Floating Point load instruction
  116: PM_LSU_LMQ_FULL_CYC <--> Cycles LMQ full
  117: PM_LSU_LMQ_LHR_MERGE <--> LMQ LHR merges
  118: PM_LSU_LMQ_S0_ALLOC <--> LMQ slot 0 allocated
  119: PM_LSU_LMQ_S0_VALID <--> LMQ slot 0 valid
  120: PM_LSU_LRQ_FULL_CYC <--> Cycles LRQ full
  121: PM_DC_PREF_STREAM_ALLOC_BLK <--> D cache prefetch stream allocations blocked
  122: PM_LSU_SRQ_EMPTY_CYC <--> Cycles SRQ empty
  123: PM_LSU_SRQ_FULL_CYC <--> Cycles SRQ full
  124: PM_LSU_SRQ_SYNC_CYC <--> SRQ sync duration
  125: PM_LWSYNC_HELD <--> LWSYNC held at dispatch
  126: PM_MEM_LO_PRIO_PW_CMPL <--> Low priority partial-write completed
  127: PM_MEM_LO_PRIO_WR_CMPL <--> Low priority write completed
  128: PM_MEM_PW_CMPL <--> Memory partial-write completed
  129: PM_MEM_PW_GATH <--> Memory partial-write gathered
  130: PM_MEM_RQ_DISP_BUSY1to7 <--> Memory read queue dispatched with 1-7 queues busy
  131: PM_MEM_SPEC_RD_CANCEL <--> Speculative memory read canceled
  132: PM_MEM_WQ_DISP_BUSY8to15 <--> Memory write queue dispatched with 8-15 queues busy
  133: PM_MEM_WQ_DISP_DCLAIM <--> Memory write queue dispatched due to dclaim/flush
  134: PM_MRK_CRU_FIN <--> Marked instruction CRU processing finished
  135: PM_MRK_DATA_FROM_L25_MOD_CYC <--> Marked load latency from L2.5 modified
  136: PM_MRK_DATA_FROM_L275_MOD <--> Marked data loaded from L2.75 modified
  137: PM_MRK_DATA_FROM_L275_MOD_CYC <--> Marked load latency from L2.75 modified
  138: PM_MRK_DATA_FROM_L35_MOD_CYC <--> Marked load latency from L3.5 modified
  139: PM_MRK_DATA_FROM_L375_MOD <--> Marked data loaded from L3.75 modified
  140: PM_MRK_DATA_FROM_L375_MOD_CYC <--> Marked load latency from L3.75 modified
  141: PM_MRK_DATA_FROM_LMEM_CYC <--> Marked load latency from local memory
  142: PM_MRK_DATA_FROM_RMEM <--> Marked data loaded from remote memory
  143: PM_MRK_DATA_FROM_RMEM_CYC <--> Marked load latency from remote memory
  144: PM_MRK_DSLB_MISS <--> Marked Data SLB misses
  145: PM_MRK_DTLB_MISS <--> Marked Data TLB misses
  146: PM_MRK_GRP_CMPL <--> Marked group completed
  147: PM_MRK_GRP_IC_MISS <--> Group experienced marked I cache miss
  148: PM_MRK_GRP_TIMEO <--> Marked group completion timeout
  149: PM_MRK_L1_RELOAD_VALID <--> Marked L1 reload data source valid
  150: PM_MRK_LSU0_FLUSH_LRQ <--> LSU0 marked LRQ flushes
  151: PM_MRK_LSU0_FLUSH_SRQ <--> LSU0 marked SRQ flushes
  152: PM_MRK_LSU0_FLUSH_UST <--> LSU0 marked unaligned store flushes
  153: PM_MRK_LSU0_FLUSH_ULD <--> LSU0 marked unaligned load flushes
  154: PM_MRK_LSU1_FLUSH_LRQ <--> LSU1 marked LRQ flushes
  155: PM_MRK_LSU1_FLUSH_SRQ <--> LSU1 marked SRQ flushes
  156: PM_MRK_LSU1_FLUSH_ULD <--> LSU1 marked unaligned load flushes
  157: PM_MRK_LSU1_FLUSH_UST <--> LSU1 marked unaligned store flushes
  158: PM_MRK_LSU_FIN <--> Marked instruction LSU processing finished
  159: PM_MRK_LSU_FLUSH_SRQ <--> Marked SRQ flushes
  160: PM_MRK_LSU_FLUSH_ULD <--> Marked unaligned load flushes
  161: PM_MRK_LSU_SRQ_INST_VALID <--> Marked instruction valid in SRQ
  162: PM_PMC3_OVERFLOW <--> PMC3 Overflow
  163: PM_PTEG_FROM_L275_MOD <--> PTEG loaded from L2.75 modified
  164: PM_PTEG_FROM_L375_MOD <--> PTEG loaded from L3.75 modified
  165: PM_PTEG_FROM_RMEM <--> PTEG loaded from remote memory
  166: PM_SNOOP_PARTIAL_RTRY_QFULL <--> Snoop partial write retry due to partial-write queues full
  167: PM_SNOOP_PW_RETRY_WQ_PWQ <--> Snoop partial-write retry due to collision with active write or partial-write queue
  168: PM_SNOOP_RD_RETRY_WQ <--> Snoop read retry due to collision with active write queue
  169: PM_SNOOP_WR_RETRY_QFULL <--> Snoop read retry due to read queue full
  170: PM_SNOOP_WR_RETRY_WQ <--> Snoop write/dclaim retry due to collision with active write queue
  171: PM_ST_MISS_L1 <--> L1 D cache store misses
  172: PM_ST_REF_L1_LSU0 <--> LSU0 L1 D cache store references
  173: PM_ST_REF_L1_LSU1 <--> LSU1 L1 D cache store references
  174: PM_SUSPENDED <--> Suspended
  175: PM_CLB_EMPTY_CYC <--> Cycles CLB empty
  176: PM_THRD_L2MISS_BOTH_CYC <--> Cycles both threads in L2 misses
  177: PM_THRD_PRIO_DIFF_0_CYC <--> Cycles no thread priority difference
  178: PM_THRD_PRIO_DIFF_1or2_CYC <--> Cycles thread priority difference is 1 or 2
  179: PM_THRD_PRIO_DIFF_3or4_CYC <--> Cycles thread priority difference is 3 or 4
  180: PM_THRD_PRIO_DIFF_5or6_CYC <--> Cycles thread priority difference is 5 or 6
  181: PM_THRD_PRIO_DIFF_minus1or2_CYC <--> Cycles thread priority difference is -1 or -2
  182: PM_THRD_PRIO_DIFF_minus3or4_CYC <--> Cycles thread priority difference is -3 or -4
  183: PM_THRD_PRIO_DIFF_minus5or6_CYC <--> Cycles thread priority difference is -5 or -6
  184: PM_THRD_SEL_OVER_CLB_EMPTY <--> Thread selection overrides caused by CLB empty
  185: PM_THRD_SEL_OVER_GCT_IMBAL <--> Thread selection overrides caused by GCT imbalance
  186: PM_THRD_SEL_OVER_ISU_HOLD <--> Thread selection overrides caused by ISU holds
  187: PM_THRD_SEL_OVER_L2MISS <--> Thread selection overrides caused by L2 misses
  188: PM_THRD_SEL_T0 <--> Decode selected thread 0
  189: PM_THRD_SEL_T1 <--> Decode selected thread 1
  190: PM_THRD_SMT_HANG <--> SMT hang detected
  191: PM_TLBIE_HELD <--> TLBIE held at dispatch
  192: PM_WORK_HELD <--> Work held
 Counter 4: 1 events
  0: PM_INST_CMPL <--> Instructions completed
 Counter 5: 1 events
  0: PM_RUN_CYC <--> Run cycles
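
The per-counter event listing above is regular enough to load into a lookup table, which can be handy when checking which counter offers a given event. The following is a minimal Python sketch, not part of IHPCT: the function name parse_event_listing and the SAMPLE text are illustrative assumptions, and the parser only assumes the "index: NAME <--> description" layout shown in this listing. It also folds any indented continuation line into the preceding description.

```python
import re

# Matches e.g. "  12: PM_DC_INV_L2 <--> L1 D cache entries invalidated from L2"
EVENT_LINE = re.compile(r"^\s*(\d+):\s+(PM_\w+)\s+<-->\s+(.*\S)\s*$")
# Matches e.g. " Counter 2: 190 events"
COUNTER_LINE = re.compile(r"^\s*Counter\s+(\d+):\s+(\d+)\s+events?\s*$")


def parse_event_listing(text):
    """Return {counter: {index: (event_name, description)}} (hypothetical helper)."""
    counters = {}
    current = None  # events of the counter currently being read
    last = None     # most recent [name, description] entry, for wrapped lines
    for line in text.splitlines():
        if not line.strip():
            last = None  # a blank line ends any wrapped description
            continue
        m = COUNTER_LINE.match(line)
        if m:
            current = counters.setdefault(int(m.group(1)), {})
            last = None
            continue
        m = EVENT_LINE.match(line)
        if m and current is not None:
            last = [m.group(2), m.group(3)]
            current[int(m.group(1))] = last
            continue
        if last is not None and line[0] in " \t":
            # Indented line that is not an event: continuation of a description
            last[1] += " " + line.strip()
        else:
            last = None
    return {c: {i: tuple(e) for i, e in ev.items()} for c, ev in counters.items()}


# Small excerpt in the same layout as the listing (illustrative input only)
SAMPLE = (
    " Counter 2: 190 events\n"
    "  64: PM_L2SA_RCLD_DISP_FAIL_ADDR <--> L2 Slice A RC load dispatch attempt \n"
    "\tfailed due to address collision with RC/CO/SN/SQ\n"
    " Counter 4: 1 events\n"
    "  0: PM_INST_CMPL <--> Instructions completed\n"
)
events = parse_event_listing(SAMPLE)
print(events[4][0])
```

The counter numbers are taken verbatim from the "Counter N" headers; the sketch makes no assumption about how they map to physical PMC registers.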

POWER5: number of groups supported: 148
 Group 0:
 PM_RUN_CYC --- Run cycles
 PM_IOPS_CMPL - IOPS instructions completed
 PM_INST_DISP - Instructions dispatched
 PM_CYC ------- Processor cycles
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles
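
The events in Group 0 are enough to derive a few common ratios, such as instructions completed per processor cycle. The Python sketch below shows the arithmetic only; the function name group0_metrics and the counter values are made up for illustration and are not output from hpmcount.

```python
def group0_metrics(counts):
    """Derive simple ratios from raw Group 0 counter values (hypothetical helper)."""
    cycles = counts["PM_CYC"]
    completed = counts["PM_INST_CMPL"]
    if cycles == 0 or completed == 0:
        raise ValueError("PM_CYC and PM_INST_CMPL must be non-zero")
    return {
        # Instructions completed per processor cycle
        "ipc": completed / cycles,
        # Processor cycles per completed instruction
        "cpi": cycles / completed,
        # Instructions dispatched per instruction completed
        "dispatch_per_completed": counts["PM_INST_DISP"] / completed,
    }


# Illustrative values only, not measured on any system
example_counts = {
    "PM_CYC": 2_000_000,
    "PM_INST_CMPL": 1_000_000,
    "PM_INST_DISP": 1_250_000,
}
metrics = group0_metrics(example_counts)
print(metrics)
```

A dispatch-to-completion ratio well above 1 would suggest that many dispatched instructions never complete, for example because of flushes, but interpreting the numbers depends on the application.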

 Group 1:
 PM_1PLUS_PPC_CMPL - One or more PPC instruction completed
 PM_GCT_EMPTY_CYC -- Cycles GCT empty
 PM_GRP_CMPL ------- Group completed
 PM_CYC ------------ Processor cycles
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 2:
 PM_GRP_DISP_VALID ------ Group dispatch valid
 PM_GRP_DISP_REJECT ----- Group dispatch rejected
 PM_GRP_DISP_BLK_SB_CYC - Cycles group dispatch blocked by scoreboard
 PM_INST_DISP ----------- Instructions dispatched
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 3:
 PM_0INST_CLB_CYC ------------- Cycles no instructions in CLB
 PM_2INST_CLB_CYC ------------- Cycles 2 instructions in CLB
 PM_CLB_EMPTY_CYC ------------- Cycles CLB empty
 PM_MRK_DATA_FROM_L35_MOD_CYC - Marked load latency from L3.5 modified
 PM_INST_CMPL ----------------- Instructions completed
 PM_RUN_CYC ------------------- Run cycles

 Group 4:
 PM_5INST_CLB_CYC ---------- Cycles 5 instructions in CLB
 PM_6INST_CLB_CYC ---------- Cycles 6 instructions in CLB
 PM_MRK_LSU_SRQ_INST_VALID - Marked instruction valid in SRQ
 PM_IOPS_CMPL -------------- IOPS instructions completed
 PM_INST_CMPL -------------- Instructions completed
 PM_RUN_CYC ---------------- Run cycles

 Group 5:
 PM_GCT_NOSLOT_CYC ------ Cycles no GCT slot allocated
 PM_GCT_NOSLOT_IC_MISS -- No slot in GCT caused by I cache miss
 PM_GCT_NOSLOT_SRQ_FULL - No slot in GCT caused by SRQ full
 PM_GCT_NOSLOT_BR_MPRED - No slot in GCT caused by branch mispredict
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 6:
 PM_GCT_USAGE_00to59_CYC - Cycles GCT less than 60% full
 PM_GCT_USAGE_60to79_CYC - Cycles GCT 60-79% full
 PM_GCT_USAGE_80to99_CYC - Cycles GCT 80-99% full
 PM_GCT_FULL_CYC --------- Cycles GCT full
 PM_INST_CMPL ------------ Instructions completed
 PM_RUN_CYC -------------- Run cycles

 Group 7:
 PM_LSU_LRQ_S0_ALLOC - LRQ slot 0 allocated
 PM_LSU_LRQ_S0_VALID - LRQ slot 0 valid
 PM_LSU_LMQ_S0_ALLOC - LMQ slot 0 allocated
 PM_LSU_LMQ_S0_VALID - LMQ slot 0 valid
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 8:
 PM_LSU_SRQ_S0_ALLOC - SRQ slot 0 allocated
 PM_LSU_SRQ_S0_VALID - SRQ slot 0 valid
 PM_LSU_SRQ_SYNC_CYC - SRQ sync duration
 PM_LSU_SRQ_FULL_CYC - Cycles SRQ full
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 9:
 PM_LSU_SRQ_STFWD --------- SRQ store forwarded
 PM_LSU_LMQ_SRQ_EMPTY_CYC - Cycles LMQ and SRQ empty
 PM_LSU_LMQ_LHR_MERGE ----- LMQ LHR merges
 PM_LSU_SRQ_EMPTY_CYC ----- Cycles SRQ empty
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 10:
 PM_INST_FROM_L2MISS --------- Instructions fetched missed L2
 PM_INST_FETCH_CYC ----------- Cycles at least 1 instruction fetched
 PM_DC_PREF_STREAM_ALLOC_BLK - D cache prefetch stream allocations blocked
 PM_DC_PREF_STREAM_ALLOC ----- D cache new prefetch stream allocated
 PM_INST_CMPL ---------------- Instructions completed
 PM_RUN_CYC ------------------ Run cycles

 Group 11:
 PM_IOPS_CMPL ------- IOPS instructions completed
 PM_CLB_FULL_CYC ---- Cycles CLB full
 PM_L1_PREF --------- L1 cache data prefetches
 PM_IC_PREF_INSTALL - Instruction prefetched installed in prefetch buffer
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 12:
 PM_LSU_BUSY_REJECT - LSU busy due to reject
 PM_1INST_CLB_CYC --- Cycles 1 instruction in CLB
 PM_L2_PREF --------- L2 cache prefetches
 PM_IOPS_CMPL ------- IOPS instructions completed
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 13:
 PM_LSU0_REJECT_SRQ_LHS - LSU0 SRQ rejects
 PM_LSU1_REJECT_SRQ_LHS - LSU1 SRQ rejects
 PM_DC_PREF_DST --------- DST (Data Stream Touch) stream start
 PM_L2_PREF ------------- L2 cache prefetches
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 14:
 PM_LSU_REJECT_ERAT_MISS - LSU reject due to ERAT miss
 PM_LSU_REJECT_LMQ_FULL -- LSU reject due to LMQ full or missed data coming
 PM_FLUSH_IMBAL ---------- Flush caused by thread GCT imbalance
 PM_MRK_LSU_FLUSH_SRQ ---- Marked SRQ flushes
 PM_INST_CMPL ------------ Instructions completed
 PM_RUN_CYC -------------- Run cycles

 Group 15:
 PM_LSU0_REJECT_RELOAD_CDF - LSU0 reject due to reload CDF or tag update collision
 PM_LSU1_REJECT_RELOAD_CDF - LSU1 reject due to reload CDF or tag update collision
 PM_IOPS_CMPL -------------- IOPS instructions completed
 PM_L1_WRITE_CYC ----------- Cycles writing to instruction L1
 PM_INST_CMPL -------------- Instructions completed
 PM_RUN_CYC ---------------- Run cycles

 Group 16:
 PM_LSU0_REJECT_ERAT_MISS - LSU0 reject due to ERAT miss
 PM_LSU1_REJECT_ERAT_MISS - LSU1 reject due to ERAT miss
 PM_LWSYNC_HELD ----------- LWSYNC held at dispatch
 PM_TLBIE_HELD ------------ TLBIE held at dispatch
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 17:
 PM_LSU0_REJECT_LMQ_FULL - LSU0 reject due to LMQ full or missed data coming
 PM_LSU1_REJECT_LMQ_FULL - LSU1 reject due to LMQ full or missed data coming
 PM_IOPS_CMPL ------------ IOPS instructions completed
 PM_BR_ISSUED ------------ Branches issued
 PM_INST_CMPL ------------ Instructions completed
 PM_RUN_CYC -------------- Run cycles

 Group 18:
 PM_LSU_REJECT_SRQ_LHS ---- LSU SRQ rejects
 PM_LSU_REJECT_RELOAD_CDF - LSU reject due to reload CDF or tag update collision
 PM_LSU_FLUSH ------------- Flush initiated by LSU
 PM_FLUSH ----------------- Flushes
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 19:
 PM_IOPS_CMPL ----- IOPS instructions completed
 PM_LSU_FLUSH_UST - SRQ unaligned store flushes
 PM_FLUSH_IMBAL --- Flush caused by thread GCT imbalance
 PM_DC_INV_L2 ----- L1 D cache entries invalidated from L2
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 20:
 PM_ITLB_MISS -- Instruction TLB misses
 PM_IOPS_CMPL -- IOPS instructions completed
 PM_FLUSH_SB --- Flush caused by scoreboard operation
 PM_FLUSH_SYNC - Flush caused by sync
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 21:
 PM_LSU_FLUSH_SRQ - SRQ flushes
 PM_LSU_FLUSH_LRQ - LRQ flushes
 PM_IOPS_CMPL ----- IOPS instructions completed
 PM_LSU_FLUSH ----- Flush initiated by LSU
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 22:
 PM_LSU0_FLUSH_LRQ - LSU0 LRQ flushes
 PM_LSU1_FLUSH_LRQ - LSU1 LRQ flushes
 PM_LSU_FLUSH ------ Flush initiated by LSU
 PM_IOPS_CMPL ------ IOPS instructions completed
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 23:
 PM_LSU0_FLUSH_SRQ - LSU0 SRQ flushes
 PM_LSU1_FLUSH_SRQ - LSU1 SRQ flushes
 PM_IOPS_CMPL ------ IOPS instructions completed
 PM_LSU_FLUSH ------ Flush initiated by LSU
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 24:
 PM_LSU_FLUSH_ULD - LRQ unaligned load flushes
 PM_LSU_FLUSH_UST - SRQ unaligned store flushes
 PM_BR_ISSUED ----- Branches issued
 PM_IOPS_CMPL ----- IOPS instructions completed
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 25:
 PM_LSU0_FLUSH_ULD - LSU0 unaligned load flushes
 PM_LSU1_FLUSH_ULD - LSU1 unaligned load flushes
 PM_LSU_FLUSH ------ Flush initiated by LSU
 PM_IOPS_CMPL ------ IOPS instructions completed
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 26:
 PM_LSU0_FLUSH_UST - LSU0 unaligned store flushes
 PM_LSU1_FLUSH_UST - LSU1 unaligned store flushes
 PM_IOPS_CMPL ------ IOPS instructions completed
 PM_LSU_FLUSH ------ Flush initiated by LSU
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 27:
 PM_LSU_FLUSH_LRQ_FULL - Flush caused by LRQ full
 PM_IOPS_CMPL ---------- IOPS instructions completed
 PM_MRK_LSU_FLUSH_LRQ -- Marked LRQ flushes
 PM_LSU_FLUSH_SRQ_FULL - Flush caused by SRQ full
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 28:
 PM_GRP_MRK ------------ Group marked in IDU
 PM_CMPLU_STALL_LSU ---- Completion stall caused by LSU instruction
 PM_IOPS_CMPL ---------- IOPS instructions completed
 PM_CMPLU_STALL_REJECT - Completion stall caused by reject
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 29:
 PM_IOPS_CMPL --------------- IOPS instructions completed
 PM_CMPLU_STALL_DCACHE_MISS - Completion stall caused by D cache miss
 PM_CYC --------------------- Processor cycles
 PM_CMPLU_STALL_ERAT_MISS --- Completion stall caused by ERAT miss
 PM_INST_CMPL --------------- Instructions completed
 PM_RUN_CYC ----------------- Run cycles

 Group 30:
 PM_GRP_IC_MISS_BR_REDIR_NONSPEC - Group experienced non-speculative I cache miss or branch redirect
 PM_CMPLU_STALL_FXU -------------- Completion stall caused by FXU instruction
 PM_IOPS_CMPL -------------------- IOPS instructions completed
 PM_CMPLU_STALL_DIV -------------- Completion stall caused by DIV instruction
 PM_INST_CMPL -------------------- Instructions completed
 PM_RUN_CYC ---------------------- Run cycles

 Group 31:
 PM_FPU_FULL_CYC ----- Cycles FPU issue queue full
 PM_CMPLU_STALL_FDIV - Completion stall caused by FDIV or FQRT instruction
 PM_IOPS_CMPL -------- IOPS instructions completed
 PM_CMPLU_STALL_FPU -- Completion stall caused by FPU instruction
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 32:
 PM_LARX_LSU0 -------- Larx executed on LSU0
 PM_BRQ_FULL_CYC ----- Cycles branch queue full
 PM_LSU_LRQ_FULL_CYC - Cycles LRQ full
 PM_LSU_LMQ_FULL_CYC - Cycles LMQ full
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 33:
 PM_FPU0_FULL_CYC -- Cycles FPU0 issue queue full
 PM_FPU1_FULL_CYC -- Cycles FPU1 issue queue full
 PM_FXLS0_FULL_CYC - Cycles FXU0/LS0 queue full
 PM_FXLS1_FULL_CYC - Cycles FXU1/LS1 queue full
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 34:
 PM_CR_MAP_FULL_CYC ----- Cycles CR logical operation mapper full
 PM_LR_CTR_MAP_FULL_CYC - Cycles LR/CTR mapper full
 PM_GPR_MAP_FULL_CYC ---- Cycles GPR mapper full
 PM_CRQ_FULL_CYC -------- Cycles CR issue queue full
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 35:
 PM_FPR_MAP_FULL_CYC ----- Cycles FPR mapper full
 PM_XER_MAP_FULL_CYC ----- Cycles XER mapper full
 PM_MRK_DATA_FROM_L2MISS - Marked data loaded missed L2
 PM_IOPS_CMPL ------------ IOPS instructions completed
 PM_INST_CMPL ------------ Instructions completed
 PM_RUN_CYC -------------- Run cycles

 Group 36:
 PM_STCX_FAIL - STCX failed
 PM_STCX_PASS - STCX passed
 PM_LSU0_NCLD - LSU0 non-cacheable loads
 PM_LSU1_NCLD - LSU1 non-cacheable loads
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles

 Group 37:
 PM_LSU0_BUSY_REJECT ---------- LSU0 busy due to reject
 PM_LSU1_BUSY_REJECT ---------- LSU1 busy due to reject
 PM_IC_DEMAND_L2_BHT_REDIRECT - L2 I cache demand request due to BHT redirect
 PM_IC_DEMAND_L2_BR_REDIRECT -- L2 I cache demand request due to branch redirect
 PM_INST_CMPL ----------------- Instructions completed
 PM_RUN_CYC ------------------- Run cycles

 Group 38:
 PM_IERAT_XLATE_WR -- Translation written to IERAT
 PM_IC_PREF_REQ ----- Instruction prefetch requests
 PM_IC_PREF_INSTALL - Prefetched instruction installed in prefetch buffer
 PM_0INST_FETCH ----- No instructions fetched
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 39:
 PM_GRP_IC_MISS_NONSPEC ---- Group experienced non-speculative I cache miss
 PM_GRP_IC_MISS ------------ Group experienced I cache miss
 PM_L1_DCACHE_RELOAD_VALID - L1 reload data source valid
 PM_IOPS_CMPL -------------- IOPS instructions completed
 PM_INST_CMPL -------------- Instructions completed
 PM_RUN_CYC ---------------- Run cycles

 Group 40:
 PM_TLB_MISS ---- TLB misses
 PM_SLB_MISS ---- SLB misses
 PM_BR_MPRED_CR - Branch mispredictions due to CR bit setting
 PM_BR_MPRED_TA - Branch mispredictions due to target address
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 41:
 PM_BR_UNCOND ----- Unconditional branch
 PM_BR_PRED_TA ---- A conditional branch was predicted, target prediction
 PM_BR_PRED_CR ---- A conditional branch was predicted, CR prediction
 PM_BR_PRED_CR_TA - A conditional branch was predicted, CR and target prediction
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 42:
 PM_GRP_BR_REDIR_NONSPEC - Group experienced non-speculative branch redirect
 PM_GRP_BR_REDIR --------- Group experienced branch redirect
 PM_FLUSH_BR_MPRED ------- Flush caused by branch mispredict
 PM_IOPS_CMPL ------------ IOPS instructions completed
 PM_INST_CMPL ------------ Instructions completed
 PM_RUN_CYC -------------- Run cycles

 Group 43:
 PM_DATA_TABLEWALK_CYC - Cycles doing data tablewalks
 PM_DTLB_MISS ---------- Data TLB misses
 PM_LD_MISS_L1 --------- L1 D cache load misses
 PM_LD_REF_L1 ---------- L1 D cache load references
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles
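
Group 43 pairs L1 D cache load misses with load references, and like every group it also carries PM_INST_CMPL and PM_RUN_CYC, so two simple ratios can be derived from a single run. The following is a minimal Python sketch, assuming the raw counts have already been copied from the hpmcount report into a dictionary. The function name and the sample numbers are hypothetical and are not measured values.

```python
def group43_ratios(counts):
    """Derive simple ratios from raw Group 43 counter values.

    `counts` maps event names to raw counts; how the values are
    extracted from the hpmcount report is left to the caller.
    """
    refs = counts["PM_LD_REF_L1"]
    cycles = counts["PM_RUN_CYC"]
    return {
        # Fraction of L1 D cache load references that missed.
        "l1_load_miss_ratio": counts["PM_LD_MISS_L1"] / refs if refs else 0.0,
        # Instructions completed per run cycle.
        "ipc": counts["PM_INST_CMPL"] / cycles if cycles else 0.0,
    }


# Made-up numbers, for illustration only.
sample = {
    "PM_LD_REF_L1": 2_000_000,
    "PM_LD_MISS_L1": 50_000,
    "PM_INST_CMPL": 8_000_000,
    "PM_RUN_CYC": 10_000_000,
}
print(group43_ratios(sample))
```

The same pattern applies to any group that lists a miss counter next to its reference counter, for example the store counters in Group 44.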

 Group 44:
 PM_DATA_FROM_L2 --- Data loaded from L2
 PM_LSU_DERAT_MISS - DERAT misses
 PM_ST_REF_L1 ------ L1 D cache store references
 PM_ST_MISS_L1 ----- L1 D cache store misses
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 45:
 PM_DSLB_MISS ------- Data SLB misses
 PM_ISLB_MISS ------- Instruction SLB misses
 PM_LD_MISS_L1_LSU0 - LSU0 L1 D cache load misses
 PM_LD_MISS_L1_LSU1 - LSU1 L1 D cache load misses
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 46:
 PM_DTLB_REF_4K ---- Data TLB reference for 4K page
 PM_DTLB_MISS_4K --- Data TLB miss for 4K page
 PM_LD_REF_L1_LSU0 - LSU0 L1 D cache load references
 PM_LD_REF_L1_LSU1 - LSU1 L1 D cache load references
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 47:
 PM_DTLB_REF_16M --- Data TLB reference for 16M page
 PM_DTLB_MISS_16M -- Data TLB miss for 16M page
 PM_ST_REF_L1_LSU0 - LSU0 L1 D cache store references
 PM_ST_REF_L1_LSU1 - LSU1 L1 D cache store references
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 48:
 PM_DATA_FROM_L3 --- Data loaded from L3
 PM_DATA_FROM_LMEM - Data loaded from local memory
 PM_FLUSH ---------- Flushes
 PM_IOPS_CMPL ------ IOPS instructions completed
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 49:
 PM_DATA_FROM_L3 ----- Data loaded from L3
 PM_DATA_FROM_LMEM --- Data loaded from local memory
 PM_DATA_FROM_L2MISS - Data loaded missed L2
 PM_DATA_FROM_RMEM --- Data loaded from remote memory
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 50:
 PM_DATA_FROM_L25_SHR -- Data loaded from L2.5 shared
 PM_DATA_FROM_L25_MOD -- Data loaded from L2.5 modified
 PM_DATA_FROM_L275_SHR - Data loaded from L2.75 shared
 PM_DATA_FROM_L275_MOD - Data loaded from L2.75 modified
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 51:
 PM_DATA_FROM_L35_SHR -- Data loaded from L3.5 shared
 PM_DATA_FROM_L35_MOD -- Data loaded from L3.5 modified
 PM_DATA_FROM_L375_SHR - Data loaded from L3.75 shared
 PM_DATA_FROM_L375_MOD - Data loaded from L3.75 modified
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 52:
 PM_INST_FROM_L3 --- Instruction fetched from L3
 PM_INST_FROM_L1 --- Instruction fetched from L1
 PM_INST_FROM_PREF - Instructions fetched from prefetch
 PM_INST_FROM_RMEM - Instruction fetched from remote memory
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 53:
 PM_INST_FROM_L2 --- Instructions fetched from L2
 PM_INST_FROM_LMEM - Instruction fetched from local memory
 PM_IOPS_CMPL ------ IOPS instructions completed
 PM_0INST_FETCH ---- No instructions fetched
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 54:
 PM_INST_FROM_L25_SHR -- Instruction fetched from L2.5 shared
 PM_INST_FROM_L25_MOD -- Instruction fetched from L2.5 modified
 PM_INST_FROM_L275_SHR - Instruction fetched from L2.75 shared
 PM_INST_FROM_L275_MOD - Instruction fetched from L2.75 modified
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 55:
 PM_INST_FROM_L35_SHR -- Instruction fetched from L3.5 shared
 PM_INST_FROM_L35_MOD -- Instruction fetched from L3.5 modified
 PM_INST_FROM_L375_SHR - Instruction fetched from L3.75 shared
 PM_INST_FROM_L375_MOD - Instruction fetched from L3.75 modified
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 56:
 PM_PTEG_FROM_L25_SHR -- PTEG loaded from L2.5 shared
 PM_PTEG_FROM_L25_MOD -- PTEG loaded from L2.5 modified
 PM_PTEG_FROM_L275_SHR - PTEG loaded from L2.75 shared
 PM_PTEG_FROM_L275_MOD - PTEG loaded from L2.75 modified
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 57:
 PM_PTEG_FROM_L35_SHR -- PTEG loaded from L3.5 shared
 PM_PTEG_FROM_L35_MOD -- PTEG loaded from L3.5 modified
 PM_PTEG_FROM_L375_SHR - PTEG loaded from L3.75 shared
 PM_PTEG_FROM_L375_MOD - PTEG loaded from L3.75 modified
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 58:
 PM_PTEG_FROM_L2 ----- PTEG loaded from L2
 PM_PTEG_FROM_LMEM --- PTEG loaded from local memory
 PM_PTEG_FROM_L2MISS - PTEG loaded from L2 miss
 PM_PTEG_FROM_RMEM --- PTEG loaded from remote memory
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 59:
 PM_PTEG_FROM_L3 ----- PTEG loaded from L3
 PM_GRP_DISP --------- Group dispatches
 PM_GRP_DISP_SUCCESS - Group dispatch success
 PM_DC_INV_L2 -------- L1 D cache entries invalidated from L2
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 60:
 PM_L2SA_RCLD_DISP -------------- L2 Slice A RC load dispatch attempt
 PM_L2SA_RCLD_DISP_FAIL_RC_FULL - L2 Slice A RC load dispatch attempt failed due to all RC full
 PM_L2SA_RCLD_DISP_FAIL_ADDR ---- L2 Slice A RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
 PM_L2SA_RCLD_DISP_FAIL_OTHER --- L2 Slice A RC load dispatch attempt failed due to other reasons
 PM_INST_CMPL ------------------- Instructions completed
 PM_RUN_CYC --------------------- Run cycles

 Group 61:
 PM_L2SA_RCST_DISP -------------- L2 Slice A RC store dispatch attempt
 PM_L2SA_RCST_DISP_FAIL_RC_FULL - L2 Slice A RC store dispatch attempt failed due to all RC full
 PM_L2SA_RCST_DISP_FAIL_ADDR ---- L2 Slice A RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
 PM_L2SA_RCST_DISP_FAIL_OTHER --- L2 Slice A RC store dispatch attempt failed due to other reasons
 PM_INST_CMPL ------------------- Instructions completed
 PM_RUN_CYC --------------------- Run cycles

 Group 62:
 PM_L2SA_RC_DISP_FAIL_CO_BUSY ----- L2 Slice A RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
 PM_L2SA_ST_REQ ------------------- L2 slice A store requests
 PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL - L2 Slice A RC dispatch attempt failed due to all CO busy
 PM_L2SA_ST_HIT ------------------- L2 slice A store hits
 PM_INST_CMPL --------------------- Instructions completed
 PM_RUN_CYC ----------------------- Run cycles

 Group 63:
 PM_L2SB_RCLD_DISP -------------- L2 Slice B RC load dispatch attempt
 PM_L2SB_RCLD_DISP_FAIL_RC_FULL - L2 Slice B RC load dispatch attempt failed due to all RC full
 PM_L2SB_RCLD_DISP_FAIL_ADDR ---- L2 Slice B RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
 PM_L2SB_RCLD_DISP_FAIL_OTHER --- L2 Slice B RC load dispatch attempt failed due to other reasons
 PM_INST_CMPL ------------------- Instructions completed
 PM_RUN_CYC --------------------- Run cycles

 Group 64:
 PM_L2SB_RCST_DISP -------------- L2 Slice B RC store dispatch attempt
 PM_L2SB_RCST_DISP_FAIL_RC_FULL - L2 Slice B RC store dispatch attempt failed due to all RC full
 PM_L2SB_RCST_DISP_FAIL_ADDR ---- L2 Slice B RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
 PM_L2SB_RCST_DISP_FAIL_OTHER --- L2 Slice B RC store dispatch attempt failed due to other reasons
 PM_INST_CMPL ------------------- Instructions completed
 PM_RUN_CYC --------------------- Run cycles

 Group 65:
 PM_L2SB_RC_DISP_FAIL_CO_BUSY ----- L2 Slice B RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
 PM_L2SB_ST_REQ ------------------- L2 slice B store requests
 PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL - L2 Slice B RC dispatch attempt failed due to all CO busy
 PM_L2SB_ST_HIT ------------------- L2 slice B store hits
 PM_INST_CMPL --------------------- Instructions completed
 PM_RUN_CYC ----------------------- Run cycles

 Group 66:
 PM_L2SC_RCLD_DISP -------------- L2 Slice C RC load dispatch attempt
 PM_L2SC_RCLD_DISP_FAIL_RC_FULL - L2 Slice C RC load dispatch attempt failed due to all RC full
 PM_L2SC_RCLD_DISP_FAIL_ADDR ---- L2 Slice C RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ
 PM_L2SC_RCLD_DISP_FAIL_OTHER --- L2 Slice C RC load dispatch attempt failed due to other reasons
 PM_INST_CMPL ------------------- Instructions completed
 PM_RUN_CYC --------------------- Run cycles

 Group 67:
 PM_L2SC_RCST_DISP -------------- L2 Slice C RC store dispatch attempt
 PM_L2SC_RCST_DISP_FAIL_RC_FULL - L2 Slice C RC store dispatch attempt failed due to all RC full
 PM_L2SC_RCST_DISP_FAIL_ADDR ---- L2 Slice C RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ
 PM_L2SC_RCST_DISP_FAIL_OTHER --- L2 Slice C RC store dispatch attempt failed due to other reasons
 PM_INST_CMPL ------------------- Instructions completed
 PM_RUN_CYC --------------------- Run cycles

 Group 68:
 PM_L2SC_RC_DISP_FAIL_CO_BUSY ----- L2 Slice C RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy
 PM_L2SC_ST_REQ ------------------- L2 slice C store requests
 PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL - L2 Slice C RC dispatch attempt failed due to all CO busy
 PM_L2SC_ST_HIT ------------------- L2 slice C store hits
 PM_INST_CMPL --------------------- Instructions completed
 PM_RUN_CYC ----------------------- Run cycles

 Group 69:
 PM_L3SA_MOD_TAG - L3 slice A transition from modified to TAG
 PM_IOPS_CMPL ---- IOPS instructions completed
 PM_L3SA_MOD_INV - L3 slice A transition from modified to invalid
 PM_L3SA_SHR_INV - L3 slice A transition from shared to invalid
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 70:
 PM_IOPS_CMPL ---- IOPS instructions completed
 PM_L3SB_MOD_TAG - L3 slice B transition from modified to TAG
 PM_L3SB_MOD_INV - L3 slice B transition from modified to invalid
 PM_L3SB_SHR_INV - L3 slice B transition from shared to invalid
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 71:
 PM_IOPS_CMPL ---- IOPS instructions completed
 PM_L3SC_MOD_TAG - L3 slice C transition from modified to TAG
 PM_L3SC_MOD_INV - L3 slice C transition from modified to invalid
 PM_L3SC_SHR_INV - L3 slice C transition from shared to invalid
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 72:
 PM_L2SA_MOD_TAG - L2 slice A transition from modified to tagged
 PM_L2SA_SHR_MOD - L2 slice A transition from shared to modified
 PM_L2SA_MOD_INV - L2 slice A transition from modified to invalid
 PM_L2SA_SHR_INV - L2 slice A transition from shared to invalid
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 73:
 PM_L2SB_MOD_TAG - L2 slice B transition from modified to tagged
 PM_L2SB_SHR_MOD - L2 slice B transition from shared to modified
 PM_L2SB_MOD_INV - L2 slice B transition from modified to invalid
 PM_L2SB_SHR_INV - L2 slice B transition from shared to invalid
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 74:
 PM_L2SC_MOD_TAG - L2 slice C transition from modified to tagged
 PM_L2SC_SHR_MOD - L2 slice C transition from shared to modified
 PM_L2SC_MOD_INV - L2 slice C transition from modified to invalid
 PM_L2SC_SHR_INV - L2 slice C transition from shared to invalid
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 75:
 PM_L3SA_ALL_BUSY ---- L3 slice A active for every cycle all CI/CO machines busy
 PM_L3SB_ALL_BUSY ---- L3 slice B active for every cycle all CI/CO machines busy
 PM_L3SA_SNOOP_RETRY - L3 slice A snoop retries
 PM_L3SB_SNOOP_RETRY - L3 slice B snoop retries
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 76:
 PM_L3SA_REF -- L3 slice A references
 PM_L3SB_REF -- L3 slice B references
 PM_L3SA_HIT -- L3 slice A hits
 PM_L3SB_HIT -- L3 slice B hits
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles

 Group 77:
 PM_L3SC_ALL_BUSY ---- L3 slice C active for every cycle all CI/CO machines busy
 PM_L3SC_REF --------- L3 slice C references
 PM_L3SC_SNOOP_RETRY - L3 slice C snoop retries
 PM_L3SC_HIT --------- L3 slice C hits
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 78:
 PM_FPU_FDIV ------ FPU executed FDIV instruction
 PM_FPU_FMA ------- FPU executed multiply-add instruction
 PM_FPU_FMOV_FEST - FPU executed FMOV or FEST instructions
 PM_FPU_FEST ------ FPU executed FEST instruction
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 79:
 PM_FPU_1FLOP ------ FPU executed one flop instruction 
 PM_FPU_FSQRT ------ FPU executed FSQRT instruction
 PM_FPU_FRSP_FCONV - FPU executed FRSP or FCONV instructions
 PM_FPU_FIN -------- FPU produced a result
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 80:
 PM_FPU_DENORM - FPU received denormalized data
 PM_FPU_STALL3 - FPU stalled in pipe3
 PM_FPU0_FIN --- FPU0 produced a result
 PM_FPU1_FIN --- FPU1 produced a result
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 81:
 PM_FPU_SINGLE - FPU executed single precision instruction
 PM_FPU_STF ---- FPU executed store instruction
 PM_IOPS_CMPL -- IOPS instructions completed
 PM_LSU_LDF ---- LSU executed Floating Point load instruction
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 82:
 PM_FPU0_FSQRT - FPU0 executed FSQRT instruction
 PM_FPU1_FSQRT - FPU1 executed FSQRT instruction
 PM_FPU0_FEST -- FPU0 executed FEST instruction
 PM_FPU1_FEST -- FPU1 executed FEST instruction
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 83:
 PM_FPU0_DENORM ---- FPU0 received denormalized data
 PM_FPU1_DENORM ---- FPU1 received denormalized data
 PM_FPU0_FMOV_FEST - FPU0 executed FMOV or FEST instructions
 PM_FPU1_FMOV_FEST - FPU1 executed FMOV or FEST instructions
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 84:
 PM_FPU0_FDIV ------- FPU0 executed FDIV instruction
 PM_FPU1_FDIV ------- FPU1 executed FDIV instruction
 PM_FPU0_FRSP_FCONV - FPU0 executed FRSP or FCONV instructions
 PM_FPU1_FRSP_FCONV - FPU1 executed FRSP or FCONV instructions
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 85:
 PM_FPU0_STALL3 - FPU0 stalled in pipe3
 PM_FPU1_STALL3 - FPU1 stalled in pipe3
 PM_IOPS_CMPL --- IOPS instructions completed
 PM_FPU0_FPSCR -- FPU0 executed FPSCR instruction
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 86:
 PM_FPU0_SINGLE - FPU0 executed single precision instruction
 PM_FPU1_SINGLE - FPU1 executed single precision instruction
 PM_LSU0_LDF ---- LSU0 executed Floating Point load instruction
 PM_LSU1_LDF ---- LSU1 executed Floating Point load instruction
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 87:
 PM_FPU0_FMA -------- FPU0 executed multiply-add instruction
 PM_FPU1_FMA -------- FPU1 executed multiply-add instruction
 PM_IOPS_CMPL ------- IOPS instructions completed
 PM_FPU1_FRSP_FCONV - FPU1 executed FRSP or FCONV instructions
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 88:
 PM_FPU0_1FLOP - FPU0 executed add, mult, sub, cmp or sel instruction
 PM_FPU1_1FLOP - FPU1 executed add, mult, sub, cmp or sel instruction
 PM_FPU0_FIN --- FPU0 produced a result
 PM_IOPS_CMPL -- IOPS instructions completed
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 89:
 PM_FPU0_STF -- FPU0 executed store instruction
 PM_FPU1_STF -- FPU1 executed store instruction
 PM_LSU0_LDF -- LSU0 executed Floating Point load instruction
 PM_IOPS_CMPL - IOPS instructions completed
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles

 Group 90:
 PM_FXU_IDLE ------------ FXU idle
 PM_FXU_BUSY ------------ FXU busy
 PM_FXU0_BUSY_FXU1_IDLE - FXU0 busy FXU1 idle
 PM_FXU1_BUSY_FXU0_IDLE - FXU1 busy FXU0 idle
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 91:
 PM_MRK_GRP_DISP ----- Marked group dispatched
 PM_MRK_GRP_BR_REDIR - Group experienced marked branch redirect
 PM_FXU_FIN ---------- FXU produced a result
 PM_FXLS_FULL_CYC ---- Cycles FXLS queue is full
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 92:
 PM_3INST_CLB_CYC - Cycles 3 instructions in CLB
 PM_4INST_CLB_CYC - Cycles 4 instructions in CLB
 PM_FXU0_FIN ------ FXU0 produced a result
 PM_FXU1_FIN ------ FXU1 produced a result
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 93:
 PM_THRD_PRIO_4_CYC --------- Cycles thread running at priority level 4
 PM_THRD_PRIO_7_CYC --------- Cycles thread running at priority level 7
 PM_THRD_PRIO_DIFF_0_CYC ---- Cycles no thread priority difference
 PM_THRD_PRIO_DIFF_1or2_CYC - Cycles thread priority difference is 1 or 2
 PM_INST_CMPL --------------- Instructions completed
 PM_RUN_CYC ----------------- Run cycles

 Group 94:
 PM_THRD_PRIO_3_CYC --------- Cycles thread running at priority level 3
 PM_THRD_PRIO_6_CYC --------- Cycles thread running at priority level 6
 PM_THRD_PRIO_DIFF_3or4_CYC - Cycles thread priority difference is 3 or 4
 PM_THRD_PRIO_DIFF_5or6_CYC - Cycles thread priority difference is 5 or 6
 PM_INST_CMPL --------------- Instructions completed
 PM_RUN_CYC ----------------- Run cycles

 Group 95:
 PM_THRD_PRIO_2_CYC -------------- Cycles thread running at priority level 2
 PM_THRD_PRIO_5_CYC -------------- Cycles thread running at priority level 5
 PM_THRD_PRIO_DIFF_minus1or2_CYC - Cycles thread priority difference is -1 or -2
 PM_THRD_PRIO_DIFF_minus3or4_CYC - Cycles thread priority difference is -3 or -4
 PM_INST_CMPL -------------------- Instructions completed
 PM_RUN_CYC ---------------------- Run cycles

 Group 96:
 PM_THRD_PRIO_1_CYC -------------- Cycles thread running at priority level 1
 PM_HV_CYC ----------------------- Hypervisor Cycles
 PM_THRD_PRIO_DIFF_minus5or6_CYC - Cycles thread priority difference is -5 or -6
 PM_IOPS_CMPL -------------------- IOPS instructions completed
 PM_INST_CMPL -------------------- Instructions completed
 PM_RUN_CYC ---------------------- Run cycles

 Group 97:
 PM_THRD_ONE_RUN_CYC ------- One of the threads in run cycles
 PM_THRD_GRP_CMPL_BOTH_CYC - Cycles group completed by both threads
 PM_IOPS_CMPL -------------- IOPS instructions completed
 PM_THRD_L2MISS_BOTH_CYC --- Cycles both threads in L2 misses
 PM_INST_CMPL -------------- Instructions completed
 PM_RUN_CYC ---------------- Run cycles

 Group 98:
 PM_SNOOP_TLBIE - Snoop TLBIE
 PM_IOPS_CMPL --- IOPS instructions completed
 PM_THRD_SEL_T0 - Decode selected thread 0
 PM_THRD_SEL_T1 - Decode selected thread 1
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 99:
 PM_IOPS_CMPL --------------- IOPS instructions completed
 PM_0INST_CLB_CYC ----------- Cycles no instructions in CLB
 PM_THRD_SEL_OVER_CLB_EMPTY - Thread selection overrides caused by CLB empty
 PM_THRD_SEL_OVER_GCT_IMBAL - Thread selection overrides caused by GCT imbalance
 PM_INST_CMPL --------------- Instructions completed
 PM_RUN_CYC ----------------- Run cycles

 Group 100:
 PM_IOPS_CMPL -------------- IOPS instructions completed
 PM_CYC -------------------- Processor cycles
 PM_THRD_SEL_OVER_ISU_HOLD - Thread selection overrides caused by ISU holds
 PM_THRD_SEL_OVER_L2MISS --- Thread selection overrides caused by L2 misses
 PM_INST_CMPL -------------- Instructions completed
 PM_RUN_CYC ---------------- Run cycles

 Group 101:
 PM_FAB_CMD_ISSUED ----- Fabric command issued
 PM_FAB_DCLAIM_ISSUED -- dclaim issued
 PM_FAB_CMD_RETRIED ---- Fabric command retried
 PM_FAB_DCLAIM_RETRIED - dclaim retried
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 102:
 PM_FAB_P1toM1_SIDECAR_EMPTY ----- P1 to M1 sidecar empty
 PM_FAB_HOLDtoVN_EMPTY ----------- Hold buffer to VN empty
 PM_FAB_P1toVNorNN_SIDECAR_EMPTY - P1 to VN/NN sidecar empty
 PM_FAB_VBYPASS_EMPTY ------------ Vertical bypass buffer empty
 PM_INST_CMPL -------------------- Instructions completed
 PM_RUN_CYC ---------------------- Run cycles

 Group 103:
 PM_FAB_PNtoNN_DIRECT -- PN to NN beat went straight to its destination
 PM_FAB_PNtoVN_DIRECT -- PN to VN beat went straight to its destination
 PM_FAB_PNtoNN_SIDECAR - PN to NN beat went to sidecar first
 PM_FAB_PNtoVN_SIDECAR - PN to VN beat went to sidecar first
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 104:
 PM_FAB_M1toP1_SIDECAR_EMPTY ----- M1 to P1 sidecar empty
 PM_FAB_HOLDtoNN_EMPTY ----------- Hold buffer to NN empty
 PM_EE_OFF ----------------------- Cycles MSR(EE) bit off
 PM_FAB_M1toVNorNN_SIDECAR_EMPTY - M1 to VN/NN sidecar empty
 PM_INST_CMPL -------------------- Instructions completed
 PM_RUN_CYC ---------------------- Run cycles

 Group 105:
 PM_SNOOP_RD_RETRY_QFULL ----- Snoop read retry due to read queue full
 PM_SNOOP_DCLAIM_RETRY_QFULL - Snoop dclaim/flush retry due to write/dclaim queues full
 PM_SNOOP_WR_RETRY_QFULL ----- Snoop write retry due to write queue full
 PM_SNOOP_PARTIAL_RTRY_QFULL - Snoop partial write retry due to partial-write queues full
 PM_INST_CMPL ---------------- Instructions completed
 PM_RUN_CYC ------------------ Run cycles

 Group 106:
 PM_SNOOP_RD_RETRY_RQ -- Snoop read retry due to collision with active read queue
 PM_SNOOP_RETRY_1AHEAD - Snoop retry due to one ahead collision
 PM_SNOOP_RD_RETRY_WQ -- Snoop read retry due to collision with active write queue
 PM_IOPS_CMPL ---------- IOPS instructions completed
 PM_INST_CMPL ---------- Instructions completed
 PM_RUN_CYC ------------ Run cycles

 Group 107:
 PM_SNOOP_WR_RETRY_RQ --- Snoop write/dclaim retry due to collision with active read queue
 PM_MEM_HI_PRIO_WR_CMPL - High priority write completed
 PM_SNOOP_WR_RETRY_WQ --- Snoop write/dclaim retry due to collision with active write queue
 PM_MEM_LO_PRIO_WR_CMPL - Low priority write completed
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 108:
 PM_SNOOP_PW_RETRY_RQ ----- Snoop partial-write retry due to collision with active read queue
 PM_MEM_HI_PRIO_PW_CMPL --- High priority partial-write completed
 PM_SNOOP_PW_RETRY_WQ_PWQ - Snoop partial-write retry due to collision with active write or partial-write queue
 PM_MEM_LO_PRIO_PW_CMPL --- Low priority partial-write completed
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 109:
 PM_MEM_RQ_DISP ----------- Memory read queue dispatched
 PM_MEM_RQ_DISP_BUSY8to15 - Memory read queue dispatched with 8-15 queues busy
 PM_MEM_RQ_DISP_BUSY1to7 -- Memory read queue dispatched with 1-7 queues busy
 PM_EE_OFF_EXT_INT -------- Cycles MSR(EE) bit off and external interrupt pending
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 110:
 PM_MEM_READ_CMPL --------- Memory read completed or canceled
 PM_MEM_FAST_PATH_RD_CMPL - Fast path memory read completed
 PM_MEM_SPEC_RD_CANCEL ---- Speculative memory read canceled
 PM_EXT_INT --------------- External interrupts
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 111:
 PM_MEM_WQ_DISP_WRITE ----- Memory write queue dispatched due to write
 PM_MEM_WQ_DISP_BUSY1to7 -- Memory write queue dispatched with 1-7 queues busy
 PM_MEM_WQ_DISP_DCLAIM ---- Memory write queue dispatched due to dclaim/flush
 PM_MEM_WQ_DISP_BUSY8to15 - Memory write queue dispatched with 8-15 queues busy
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 112:
 PM_MEM_PWQ_DISP ---------- Memory partial-write queue dispatched
 PM_MEM_PWQ_DISP_BUSY2or3 - Memory partial-write queue dispatched with 2-3 queues busy
 PM_MEM_PW_GATH ----------- Memory partial-write gathered
 PM_MEM_PW_CMPL ----------- Memory partial-write completed
 PM_INST_CMPL ------------- Instructions completed
 PM_RUN_CYC --------------- Run cycles

 Group 113:
 PM_MRK_GRP_DISP --- Marked group dispatched
 PM_MRK_IMR_RELOAD - Marked IMR reloaded
 PM_THRESH_TIMEO --- Threshold timeout
 PM_MRK_LSU_FIN ---- Marked instruction LSU processing finished
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 114:
 PM_MRK_GRP_DISP --- Marked group dispatched
 PM_MRK_ST_MISS_L1 - Marked L1 D cache store misses
 PM_MRK_INST_FIN --- Marked instruction finished
 PM_MRK_GRP_CMPL --- Marked group completed
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 115:
 PM_MRK_GRP_ISSUED ------ Marked group issued
 PM_MRK_BRU_FIN --------- Marked instruction BRU processing finished
 PM_MRK_L1_RELOAD_VALID - Marked L1 reload data source valid
 PM_MRK_GRP_IC_MISS ----- Group experienced marked I cache miss
 PM_INST_CMPL ----------- Instructions completed
 PM_RUN_CYC ------------- Run cycles

 Group 116:
 PM_MRK_DATA_FROM_L2 ---------- Marked data loaded from L2
 PM_MRK_DATA_FROM_L2_CYC ------ Marked load latency from L2
 PM_MRK_DATA_FROM_L25_MOD ----- Marked data loaded from L2.5 modified
 PM_MRK_DATA_FROM_L25_MOD_CYC - Marked load latency from L2.5 modified
 PM_INST_CMPL ----------------- Instructions completed
 PM_RUN_CYC ------------------- Run cycles
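
The count and latency counters in Group 116 can be combined into an average latency per marked load. The following is a minimal Python sketch, assuming that each _CYC counter accumulates the total cycles spent on the marked loads counted by its partner (this reading of the counters is an assumption, not something the list above states). The function name and the counter values are hypothetical and for illustration only.

```python
def avg_marked_latency(load_count, latency_cycles):
    """Average cycles per marked load, or None if no marked loads were counted."""
    if load_count == 0:
        return None
    return latency_cycles / load_count

# Hypothetical counter values, not measured data.
counters = {
    "PM_MRK_DATA_FROM_L2": 2000,
    "PM_MRK_DATA_FROM_L2_CYC": 26000,
}

l2_latency = avg_marked_latency(
    counters["PM_MRK_DATA_FROM_L2"],
    counters["PM_MRK_DATA_FROM_L2_CYC"],
)
print("average marked L2 load latency (cycles):", l2_latency)
```

The same ratio can be formed for the other count/latency pairs in Groups 116 through 122, under the same assumption.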

 Group 117:
 PM_MRK_DATA_FROM_L25_SHR ----- Marked data loaded from L2.5 shared
 PM_MRK_DATA_FROM_L25_SHR_CYC - Marked load latency from L2.5 shared
 PM_IOPS_CMPL ----------------- IOPS instructions completed
 PM_FPU_FIN ------------------- FPU produced a result
 PM_INST_CMPL ----------------- Instructions completed
 PM_RUN_CYC ------------------- Run cycles

 Group 118:
 PM_MRK_DATA_FROM_L3 ---------- Marked data loaded from L3
 PM_MRK_DATA_FROM_L3_CYC ------ Marked load latency from L3
 PM_MRK_DATA_FROM_L35_MOD ----- Marked data loaded from L3.5 modified
 PM_MRK_DATA_FROM_L35_MOD_CYC - Marked load latency from L3.5 modified
 PM_INST_CMPL ----------------- Instructions completed
 PM_RUN_CYC ------------------- Run cycles

 Group 119:
 PM_MRK_DATA_FROM_RMEM --------- Marked data loaded from remote memory
 PM_MRK_DATA_FROM_L275_SHR_CYC - Marked load latency from L2.75 shared
 PM_MRK_DATA_FROM_L275_SHR ----- Marked data loaded from L2.75 shared
 PM_MRK_DATA_FROM_RMEM_CYC ----- Marked load latency from remote memory
 PM_INST_CMPL ------------------ Instructions completed
 PM_RUN_CYC -------------------- Run cycles

 Group 120:
 PM_MRK_DATA_FROM_L35_SHR ----- Marked data loaded from L3.5 shared
 PM_MRK_DATA_FROM_L35_SHR_CYC - Marked load latency from L3.5 shared
 PM_MRK_DATA_FROM_LMEM -------- Marked data loaded from local memory
 PM_MRK_DATA_FROM_LMEM_CYC ---- Marked load latency from local memory
 PM_INST_CMPL ----------------- Instructions completed
 PM_RUN_CYC ------------------- Run cycles

 Group 121:
 PM_MRK_DATA_FROM_L275_MOD ----- Marked data loaded from L2.75 modified
 PM_MRK_DATA_FROM_L275_SHR_CYC - Marked load latency from L2.75 shared
 PM_IOPS_CMPL ------------------ IOPS instructions completed
 PM_MRK_DATA_FROM_L275_MOD_CYC - Marked load latency from L2.75 modified
 PM_INST_CMPL ------------------ Instructions completed
 PM_RUN_CYC -------------------- Run cycles

 Group 122:
 PM_MRK_DATA_FROM_L375_MOD ----- Marked data loaded from L3.75 modified
 PM_MRK_DATA_FROM_L375_SHR_CYC - Marked load latency from L3.75 shared
 PM_MRK_DATA_FROM_L375_SHR ----- Marked data loaded from L3.75 shared
 PM_MRK_DATA_FROM_L375_MOD_CYC - Marked load latency from L3.75 modified
 PM_INST_CMPL ------------------ Instructions completed
 PM_RUN_CYC -------------------- Run cycles

 Group 123:
 PM_MRK_DTLB_MISS_4K -- Marked Data TLB misses for 4K page
 PM_MRK_DTLB_MISS_16M - Marked Data TLB misses for 16M page
 PM_MRK_DTLB_MISS ----- Marked Data TLB misses
 PM_MRK_DSLB_MISS ----- Marked Data SLB misses
 PM_INST_CMPL --------- Instructions completed
 PM_RUN_CYC ----------- Run cycles

 Group 124:
 PM_MRK_DTLB_REF_4K -- Marked Data TLB reference for 4K page
 PM_MRK_DTLB_REF_16M - Marked Data TLB reference for 16M page
 PM_IOPS_CMPL -------- IOPS instructions completed
 PM_MRK_DSLB_MISS ---- Marked Data SLB misses
 PM_INST_CMPL -------- Instructions completed
 PM_RUN_CYC ---------- Run cycles

 Group 125:
 PM_MRK_LD_MISS_L1 -- Marked L1 D cache load misses
 PM_IOPS_CMPL ------- IOPS instructions completed
 PM_MRK_ST_CMPL_INT - Marked store completed with intervention
 PM_MRK_CRU_FIN ----- Marked instruction CRU processing finished
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 126:
 PM_MRK_ST_CMPL ------- Marked store instruction completed
 PM_MRK_ST_MISS_L1 ---- Marked L1 D cache store misses
 PM_MRK_LSU_FLUSH_UST - Marked unaligned store flushes
 PM_MRK_LSU_FLUSH_ULD - Marked unaligned load flushes
 PM_INST_CMPL --------- Instructions completed
 PM_RUN_CYC ----------- Run cycles

 Group 127:
 PM_MRK_STCX_FAIL - Marked STCX failed
 PM_MRK_ST_GPS ---- Marked store sent to GPS
 PM_MRK_FPU_FIN --- Marked instruction FPU processing finished
 PM_MRK_GRP_TIMEO - Marked group completion timeout
 PM_INST_CMPL ----- Instructions completed
 PM_RUN_CYC ------- Run cycles

 Group 128:
 PM_DATA_FROM_L2 - Data loaded from L2
 PM_INST_FROM_L1 - Instruction fetched from L1
 PM_ST_REF_L1 ---- L1 D cache store references
 PM_LD_REF_L1 ---- L1 D cache load references
 PM_INST_CMPL ---- Instructions completed
 PM_RUN_CYC ------ Run cycles

 Group 129:
 PM_DATA_FROM_L3 --- Data loaded from L3
 PM_DATA_FROM_LMEM - Data loaded from local memory
 PM_ST_REF_L1 ------ L1 D cache store references
 PM_LD_REF_L1 ------ L1 D cache load references
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 130:
 PM_ITLB_MISS - Instruction TLB misses
 PM_DTLB_MISS - Data TLB misses
 PM_ST_REF_L1 - L1 D cache store references
 PM_LD_REF_L1 - L1 D cache load references
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles
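
Group 130 counts TLB misses alongside L1 data cache references, so a data TLB miss rate per memory reference can be derived from one run. A minimal Python sketch follows; the function name and the counter values are hypothetical, and treating loads plus stores as the total reference count is an assumption made for this example.

```python
def dtlb_miss_rate(dtlb_miss, ld_ref, st_ref):
    """Data TLB misses per L1 D cache reference (loads plus stores)."""
    refs = ld_ref + st_ref
    if refs == 0:
        return 0.0
    return dtlb_miss / refs

# Hypothetical counter values, not measured data.
counters = {
    "PM_DTLB_MISS": 500,
    "PM_LD_REF_L1": 600000,
    "PM_ST_REF_L1": 400000,
}

rate = dtlb_miss_rate(
    counters["PM_DTLB_MISS"],
    counters["PM_LD_REF_L1"],
    counters["PM_ST_REF_L1"],
)
print("DTLB misses per data reference:", rate)
```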

 Group 131:
 PM_DATA_FROM_L3 --- Data loaded from L3
 PM_DATA_FROM_LMEM - Data loaded from local memory
 PM_LD_MISS_L1 ----- L1 D cache load misses
 PM_ST_MISS_L1 ----- L1 D cache store misses
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 132:
 PM_CYC --------- Processor cycles
 PM_IC_PREF_REQ - Instruction prefetch requests
 PM_L1_PREF ----- L1 cache data prefetches
 PM_L2_PREF ----- L2 cache prefetches
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 133:
 PM_BR_UNCOND -- Unconditional branch
 PM_BR_PRED_TA - A conditional branch was predicted, target prediction
 PM_BR_PRED_CR - A conditional branch was predicted, CR prediction
 PM_BR_ISSUED -- Branches issued
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 134:
 PM_FPU0_STALL3 - FPU0 stalled in pipe3
 PM_FPU1_STALL3 - FPU1 stalled in pipe3
 PM_FPU0_FIN ---- FPU0 produced a result
 PM_FPU0_FPSCR -- FPU0 executed FPSCR instruction
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 135:
 PM_FPU0_FMA -------- FPU0 executed multiply-add instruction
 PM_FPU1_FMA -------- FPU1 executed multiply-add instruction
 PM_FPU0_FRSP_FCONV - FPU0 executed FRSP or FCONV instructions
 PM_FPU1_FRSP_FCONV - FPU1 executed FRSP or FCONV instructions
 PM_INST_CMPL ------- Instructions completed
 PM_RUN_CYC --------- Run cycles

 Group 136:
 PM_FPU0_1FLOP - FPU0 executed add, mult, sub, cmp or sel instruction
 PM_FPU1_1FLOP - FPU1 executed add, mult, sub, cmp or sel instruction
 PM_FPU0_FIN --- FPU0 produced a result
 PM_FPU1_FIN --- FPU1 produced a result
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 137:
 PM_FPU_1FLOP - FPU executed one flop instruction 
 PM_FPU_FMA --- FPU executed multiply-add instruction
 PM_ST_REF_L1 - L1 D cache store references
 PM_LD_REF_L1 - L1 D cache load references
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles
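
The counters in Group 137 are enough to estimate floating-point work and its ratio to memory traffic. The Python sketch below counts each multiply-add as two floating-point operations, which is a common convention; whether hpmcount's own derived metrics use exactly this formula is not confirmed here. The function name and the counter values are hypothetical.

```python
def flop_metrics(c):
    """Derive simple metrics from Group 137 counter values held in a dict."""
    # Assumption: one flop per PM_FPU_1FLOP event, two per multiply-add.
    flops = c["PM_FPU_1FLOP"] + 2 * c["PM_FPU_FMA"]
    mem_refs = c["PM_LD_REF_L1"] + c["PM_ST_REF_L1"]
    return {
        "flops": flops,
        "flops_per_cycle": flops / c["PM_RUN_CYC"],
        "flops_per_mem_ref": flops / mem_refs,
        "instructions_per_cycle": c["PM_INST_CMPL"] / c["PM_RUN_CYC"],
    }

# Hypothetical counter values, not measured data.
counters = {
    "PM_FPU_1FLOP": 1_000_000,
    "PM_FPU_FMA": 4_000_000,
    "PM_ST_REF_L1": 1_500_000,
    "PM_LD_REF_L1": 3_000_000,
    "PM_INST_CMPL": 12_000_000,
    "PM_RUN_CYC": 10_000_000,
}

metrics = flop_metrics(counters)
for name, value in metrics.items():
    print(name, value)
```

Since PM_INST_CMPL and PM_RUN_CYC appear in every group, the instructions-per-cycle ratio can be computed from any of them.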

 Group 138:
 PM_FPU_SINGLE - FPU executed single precision instruction
 PM_FPU_STF ---- FPU executed store instruction
 PM_FPU0_FIN --- FPU0 produced a result
 PM_FPU1_FIN --- FPU1 produced a result
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 139:
 PM_FPU_FDIV ------- FPU executed FDIV instruction
 PM_FPU_FSQRT ------ FPU executed FSQRT instruction
 PM_FPU_FRSP_FCONV - FPU executed FRSP or FCONV instructions
 PM_FPU_FIN -------- FPU produced a result
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 140:
 PM_FPU_1FLOP --- FPU executed one flop instruction 
 PM_CYC --------- Processor cycles
 PM_MRK_FPU_FIN - Marked instruction FPU processing finished
 PM_FPU_FIN ----- FPU produced a result
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 141:
 PM_CYC ------- Processor cycles
 PM_FPU_STF --- FPU executed store instruction
 PM_INST_DISP - Instructions dispatched
 PM_LSU_LDF --- LSU executed Floating Point load instruction
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles

 Group 142:
 PM_CYC --------------- Processor cycles
 PM_INST_DISP_ATTEMPT - Instructions dispatch attempted
 PM_LD_MISS_L1 -------- L1 D cache load misses
 PM_ST_MISS_L1 -------- L1 D cache store misses
 PM_INST_CMPL --------- Instructions completed
 PM_RUN_CYC ----------- Run cycles
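
Group 142 pairs L1 data cache misses with cycle and instruction counts, which allows two quick normalized figures: L1 misses per thousand completed instructions and cycles per instruction. A minimal Python sketch with hypothetical function names and made-up counter values:

```python
def l1_misses_per_kilo_inst(ld_miss, st_miss, inst_cmpl):
    """L1 D cache misses (loads plus stores) per 1000 completed instructions."""
    return 1000.0 * (ld_miss + st_miss) / inst_cmpl

def cycles_per_instruction(cycles, inst_cmpl):
    """Processor cycles per completed instruction."""
    return cycles / inst_cmpl

# Hypothetical counter values, not measured data.
counters = {
    "PM_CYC": 12_000_000,
    "PM_LD_MISS_L1": 30_000,
    "PM_ST_MISS_L1": 10_000,
    "PM_INST_CMPL": 8_000_000,
}

mpki = l1_misses_per_kilo_inst(
    counters["PM_LD_MISS_L1"],
    counters["PM_ST_MISS_L1"],
    counters["PM_INST_CMPL"],
)
cpi = cycles_per_instruction(counters["PM_CYC"], counters["PM_INST_CMPL"])
print("L1 misses per 1000 instructions:", mpki)
print("cycles per instruction:", cpi)
```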

 Group 143:
 PM_TLB_MISS -- TLB misses
 PM_CYC ------- Processor cycles
 PM_ST_REF_L1 - L1 D cache store references
 PM_LD_REF_L1 - L1 D cache load references
 PM_INST_CMPL - Instructions completed
 PM_RUN_CYC --- Run cycles

 Group 144:
 PM_CYC --------- Processor cycles
 PM_MRK_FXU_FIN - Marked instruction FXU processing finished
 PM_FXU_FIN ----- FXU produced a result
 PM_FXU0_FIN ---- FXU0 produced a result
 PM_INST_CMPL --- Instructions completed
 PM_RUN_CYC ----- Run cycles

 Group 145:
 PM_INST_CMPL -- Instructions completed
 PM_CYC -------- Processor cycles
 PM_LD_MISS_L1 - L1 D cache load misses
 PM_DC_INV_L2 -- L1 D cache entries invalidated from L2
 PM_INST_CMPL -- Instructions completed
 PM_RUN_CYC ---- Run cycles

 Group 146:
 PM_MRK_LD_MISS_L1 - Marked L1 D cache load misses
 PM_INST_CMPL ------ Instructions completed
 PM_ST_REF_L1 ------ L1 D cache store references
 PM_LD_REF_L1 ------ L1 D cache load references
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

 Group 147:
 PM_MRK_ST_MISS_L1 - Marked L1 D cache store misses
 PM_INST_CMPL ------ Instructions completed
 PM_INST_DISP ------ Instructions dispatched
 PM_ST_MISS_L1 ----- L1 D cache store misses
 PM_INST_CMPL ------ Instructions completed
 PM_RUN_CYC -------- Run cycles

Page last modified: Fri, 30 Jun 2006 21:14:30 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/tools/ihpct.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
