Part 1: Adding Counters in the Simulator (10 points)
Overview
In this assignment, you will extend your simulator to calculate power and energy.
To keep the assignment simple, we measure power and energy consumption only for a few important hardware structures.
The basic approach is to add counters to your simulator. At the end of the simulation, you use the collected counter values to calculate power and energy consumption.
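The step from power to energy is ordinary arithmetic. Energy is average power multiplied by execution time, and execution time is the simulated cycle count divided by the clock frequency (E = P × cycles / f). The sketch below assumes you take P from McPAT's report (for example, runtime dynamic power plus leakage power) and take the cycle count from your own simulator. The function name and the 3 GHz clock in the example are illustrative assumptions.

```python
def energy_joules(avg_power_watts: float, cycles: int, clock_hz: float) -> float:
    """Energy = average power * execution time, with time = cycles / clock frequency.

    avg_power_watts: assumed to be the average total power read from McPAT's report.
    cycles: total number of cycles counted by the simulator.
    clock_hz: core clock frequency (an assumption; use your configured value).
    """
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    seconds = cycles / clock_hz
    return avg_power_watts * seconds


# Example: 2 W average power over 3,000,000 cycles at 3 GHz is 1 ms of run time.
print(energy_joules(2.0, 3_000_000, 3e9))  # prints 0.002 (joules)
```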
You need to download McPAT from the McPAT website.
McPAT is a multicore power and area simulator.
You need to add counters to your simulator. McPAT takes XML files as input, so you also need to provide the counter values to McPAT as an XML file. Your simulator does not have to generate the XML file directly. Instead, it should write the counter values to a text file, power_counters.txt, and you can then type the values into the XML file by hand. You can use Excel or another XML editor to edit the XML file. The format of power_counters.txt is not specified.
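The simulator side can be as small as one record of integers that you update during simulation and dump at the end. The sketch below assumes a Python simulator and one "name value" pair per line in power_counters.txt, since the assignment leaves the format open. The counter names are hypothetical placeholders chosen to resemble McPAT statistic names. The actual set is the one the assignment's counter list specifies.

```python
from dataclasses import dataclass, fields


@dataclass
class PowerCounters:
    """Event counters gathered during simulation (hypothetical names)."""

    total_cycles: int = 0
    total_instructions: int = 0
    int_instructions: int = 0
    fp_instructions: int = 0
    load_instructions: int = 0
    store_instructions: int = 0
    branch_instructions: int = 0

    # Instruction classes this sketch recognizes (a plain class attribute, not a field).
    OP_CLASSES = ("int", "fp", "load", "store", "branch")

    def tick(self) -> None:
        """Call once per simulated clock cycle."""
        self.total_cycles += 1

    def count_instruction(self, op_class: str) -> None:
        """Call once per committed instruction of the given class."""
        if op_class not in self.OP_CLASSES:
            raise ValueError(f"unknown instruction class: {op_class}")
        self.total_instructions += 1
        name = f"{op_class}_instructions"
        setattr(self, name, getattr(self, name) + 1)

    def to_text(self) -> str:
        """Render one 'name value' line per counter, in declaration order."""
        return "".join(f"{f.name} {getattr(self, f.name)}\n" for f in fields(self))

    def write(self, path: str = "power_counters.txt") -> None:
        """Write all counters to a text file at the end of simulation."""
        with open(path, "w") as out:
            out.write(self.to_text())
```

In the simulator's main loop you would call tick() every cycle and count_instruction() at commit, then call write() once after the simulation finishes.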
Because we are building a simplified power model, we generate only a few counter values. For any counter not mentioned in this assignment, use the default value from the original XML file.
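The assignment only requires you to type the values into the XML file by hand. If you prefer to script that step, the optional sketch below uses Python's standard xml.etree.ElementTree module. It makes three assumptions:

- power_counters.txt holds one "name value" pair per line.
- The McPAT XML stores statistics as <stat name="..." value="..."/> elements inside <component id="..."> elements.
- The component id "system.core0" used in the test matches your template.

Any statistic you do not supply keeps its default value, which matches the rule above.

```python
import xml.etree.ElementTree as ET


def read_counters(text: str) -> dict:
    """Parse 'name value' lines; blank lines and '#' comment lines are skipped."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()
        counters[name] = value
    return counters


def patch_mcpat_xml(xml_text: str, component_id: str, counters: dict):
    """Overwrite matching <stat> values of one component; all other values keep their defaults.

    Returns the patched XML text and the set of counter names that had no
    matching <stat> entry (useful for catching typos in counter names).
    """
    root = ET.fromstring(xml_text)
    component = root.find(f".//component[@id='{component_id}']")
    if component is None:
        raise KeyError(f"no component with id {component_id!r}")
    updated = set()
    for stat in component.findall("stat"):
        name = stat.get("name")
        if name in counters:
            stat.set("value", str(counters[name]))
            updated.add(name)
    return ET.tostring(root, encoding="unicode"), set(counters) - updated
```

To use it, read power_counters.txt with read_counters, pass the template XML text to patch_mcpat_xml, and write the returned text to a new file for McPAT. ElementTree drops XML comments and the XML declaration when it re-serializes the template.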
Here are the counters that you have to add to your simulator. We also show how these counters map to attributes (counters) in a McPAT XML file.
Original loop:

LOOP:  LD     F0, 0(R1)
       Add    F4, F0, F2
       Mul    F5, F4, F3
       Store  F5, 0(R1)
       Add    R1, R1, #-8
       Br     R1, R2, LOOP

Software-pipelined loop:

LOOP:  Store  F5, 0(R1)
       Mul    F5, F4, F3
       Add    F4, F0, F2
       LD     F0, -24(R1)
       Add    R1, R1, #-8
       Br     R1, R2, LOOP

With all start-up/clean-up code:

       LD     F0, 0(R1)        ; start-up code
       Add    F4, F0, F2
       Mul    F5, F4, F3
       LD     F0, -8(R1)
       Add    F4, F0, F2
       LD     F0, -16(R1)
LOOP:  Store  F5, 0(R1)
       Mul    F5, F4, F3
       Add    F4, F0, F2
       LD     F0, -24(R1)
       Add    R1, R1, #-8
       Br     R1, R2, LOOP
       Store  F5, -8(R1)       ; clean-up code
       Mul    F5, F4, F3
       Store  F5, -16(R1)
       Add    F4, F0, F2
       Mul    F5, F4, F3
       Store  F5, -24(R1)
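One way to check that the start-up, loop, and clean-up code above compute the same values as the original loop is to replay both orderings on a list of inputs. The sketch below models each iteration as y = (x + F2) * F3 and uses list indices in place of the R1 offsets. It therefore checks the ordering of the operations, not the address arithmetic. The function names are hypothetical.

```python
def original_loop(xs, f2, f3):
    """One iteration per element: load, add, multiply, store."""
    out = list(xs)
    for i, x in enumerate(xs):
        f0 = x            # LD    F0
        f4 = f0 + f2      # Add   F4, F0, F2
        f5 = f4 * f3      # Mul   F5, F4, F3
        out[i] = f5       # Store F5
    return out


def pipelined_loop(xs, f2, f3):
    """Software-pipelined order: start-up, kernel, clean-up (needs >= 3 elements)."""
    n = len(xs)
    if n < 3:
        raise ValueError("this schedule needs at least 3 iterations")
    out = list(xs)
    # Start-up: iteration 0 through Mul, iteration 1 through Add, iteration 2 load.
    f0 = xs[0]
    f4 = f0 + f2
    f5 = f4 * f3
    f0 = xs[1]
    f4 = f0 + f2
    f0 = xs[2]
    # Kernel: store iteration i, multiply i+1, add i+2, load i+3.
    for i in range(n - 3):
        out[i] = f5
        f5 = f4 * f3
        f4 = f0 + f2
        f0 = xs[i + 3]
    # Clean-up: drain the three iterations still in flight (Store, Mul, Store, Add, Mul, Store).
    out[n - 3] = f5
    f5 = f4 * f3
    out[n - 2] = f5
    f4 = f0 + f2
    f5 = f4 * f3
    out[n - 1] = f5
    return out
```

For example, both functions return [22, 24, 26, 28, 30] for xs = [1, 2, 3, 4, 5], f2 = 10, f3 = 2. In the sketch, the kernel runs n - 3 times and the start-up and clean-up code cover the remaining partial iterations.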
VLIW                                                   | Superscalar
------------------------------------------------------ | ----------------------------------------------
Statically combines more than one instruction          | Dynamically combines more than one instruction
Static issue                                           | Dynamic issue
Compiler optimized                                     | Runtime optimized in hardware
Cannot react to latencies                              | Can react to latencies in caches, memory, etc.
Larger view of program may lead to better optimization | Smaller window for optimization
Less complex hardware, less power consumed             | Complex hardware, more power consumed
And more...
I1: 0xa000  ADD    R0, R0, R2
I2: 0xa004  ADD    R1, R1, R0
I3: 0xa008  FADD   F3, F2, F3
I4: 0xa00B  ADD    R1, R1, R0
I5: 0xa010  FADD   F3, F2, F3
I6: 0xa014  LD     R2, MEM[R6]
I7: 0xa008  ADD    R1, R1, R0
I8: 0xa01B  FMUL   F1, F5, F4
I9: 0xa020  LD     F2, MEM[R0]
IA: 0xa024  FADD   F4, F1, F2
IB: 0xa028  LD     F3, MEM[R0]
IC: 0xa030  STORE  MEM[R5], F4
ID: 0xa034  FADD   F4, F3, F4
IE: 0xa038  ADD    R2, R2, R0
IF: 0xa03B  BR     R2, 0x8000

ADD:   simple ALU instruction (1-cycle latency)
FADD:  floating-point instruction (1-cycle latency)
FMUL:  floating-point instruction (2-cycle latency)
LD:    load instruction (2-cycle latency)
STORE: store instruction (1-cycle latency)
BR:    branch instruction (1-cycle latency)

Rename registers to remove the false (WAR/WAW) dependences on F2 and F3:
I9: F2 to F6
IA: F2 to F6
IB: F3 to F7
ID: F3 to F7

After renaming, the VLIW schedule (one row per cycle, three slots per row) is:
----------------
| I1 | I6 | I8 |
----------------
| I9 | IB | I2 |
----------------
| I4 | I3 | IE |
----------------
| I7 | IA | I5 |
----------------
| IC | ID | IF |
----------------
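The legality of this schedule can also be checked mechanically. The sketch below encodes the renamed instructions as (label, destination, sources, latency) tuples and treats each row of the table as one cycle. It reports three kinds of hazard:

- RAW: a consumer issues before its producer's latency has elapsed.
- WAR: a later writer's result becomes visible before an earlier reader has read the old value.
- WAW: a later writer finishes no later than an earlier writer of the same register.

The sketch assumes that operands are read at issue and that a result becomes visible "latency" cycles after issue. Memory operands are modeled only through their address registers, so memory dependences are not checked. The encoding and the function name are illustrative.

```python
# Renamed program: (label, destination or None, source registers, latency in cycles)
PROGRAM = [
    ("I1", "R0", ["R0", "R2"], 1),  # ADD   R0, R0, R2
    ("I2", "R1", ["R1", "R0"], 1),  # ADD   R1, R1, R0
    ("I3", "F3", ["F2", "F3"], 1),  # FADD  F3, F2, F3
    ("I4", "R1", ["R1", "R0"], 1),  # ADD   R1, R1, R0
    ("I5", "F3", ["F2", "F3"], 1),  # FADD  F3, F2, F3
    ("I6", "R2", ["R6"], 2),        # LD    R2, MEM[R6]
    ("I7", "R1", ["R1", "R0"], 1),  # ADD   R1, R1, R0
    ("I8", "F1", ["F5", "F4"], 2),  # FMUL  F1, F5, F4
    ("I9", "F6", ["R0"], 2),        # LD    F6, MEM[R0]  (F2 renamed to F6)
    ("IA", "F4", ["F1", "F6"], 1),  # FADD  F4, F1, F6
    ("IB", "F7", ["R0"], 2),        # LD    F7, MEM[R0]  (F3 renamed to F7)
    ("IC", None, ["R5", "F4"], 1),  # STORE MEM[R5], F4
    ("ID", "F4", ["F7", "F4"], 1),  # FADD  F4, F7, F4
    ("IE", "R2", ["R2", "R0"], 1),  # ADD   R2, R2, R0
    ("IF", None, ["R2"], 1),        # BR    R2, 0x8000
]

# One inner list per cycle (VLIW bundle).
SCHEDULE = [
    ["I1", "I6", "I8"],
    ["I9", "IB", "I2"],
    ["I4", "I3", "IE"],
    ["I7", "IA", "I5"],
    ["IC", "ID", "IF"],
]


def check_schedule(program, schedule, width=3):
    """Return a list of hazard descriptions; an empty list means the schedule is legal."""
    errors = []
    cycle = {}
    for c, bundle in enumerate(schedule, start=1):
        if len(bundle) > width:
            errors.append(f"cycle {c} has more than {width} instructions")
        for label in bundle:
            cycle[label] = c
    labels = [p[0] for p in program]
    if sum(len(b) for b in schedule) != len(labels) or set(cycle) != set(labels):
        errors.append("schedule must contain every instruction exactly once")
        return errors
    for j, (name_j, dest_j, srcs_j, lat_j) in enumerate(program):
        for src in srcs_j:
            # RAW: the closest earlier writer of src must have produced its result.
            for name_i, dest_i, _, lat_i in reversed(program[:j]):
                if dest_i == src:
                    if cycle[name_j] < cycle[name_i] + lat_i:
                        errors.append(f"RAW {name_i}->{name_j} on {src}")
                    break
            # WAR: a later writer must not make its value visible before name_j reads src.
            for name_k, dest_k, _, lat_k in program[j + 1:]:
                if dest_k == src and cycle[name_k] + lat_k <= cycle[name_j]:
                    errors.append(f"WAR {name_j}->{name_k} on {src}")
        # WAW: a later writer of the same register must finish after this one.
        if dest_j is not None:
            for name_k, dest_k, _, lat_k in program[j + 1:]:
                if dest_k == dest_j and cycle[name_k] + lat_k <= cycle[name_j] + lat_j:
                    errors.append(f"WAW {name_j}->{name_k} on {dest_j}")
    return errors
```

Under these assumptions, check_schedule(PROGRAM, SCHEDULE) returns an empty list. Undoing the renaming (F6 back to F2, F7 back to F3) and keeping the same schedule produces WAR and WAW hazards on F2 and F3. For example, I5 reads F2 in cycle 4, the same cycle in which the result of I9's load becomes visible. This is why the renaming step is needed.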