國立中央大學資訊工程學系 104 學年度第一學期博士班資格考試題紙

## <u>科目: 計算機結構 (Computer Architecture) 第一頁 共三頁(page 1 of 3)</u>

- 1. (8%) The information of a virtual memory system is assumed to be:
  - Page size: 512 words

С

- Number of virtual pages: 8
- Number of physical pages: 4
- The current page table is

| VPN | 0 | 1 | 2    | 3    | 4 | 5    | 6 | 7    |
|-----|---|---|------|------|---|------|---|------|
| PPN | 2 | 1 | NULL | NULL | 3 | NULL | 0 | NULL |

For the following (decimal) virtual word addresses: VA1: 2559, VA2: 1024, VA3: 500, VA4:

3632, what are their corresponding physical addresses, PA1, PA2, PA3, PA4?

- 2. (8%) A cache with data size 1Mbytes contains 65536 blocks and is 16-way set associative. The address 0x21E9A3B2 is accessed and is a hit in this cache. What data, i.e., the range of memory address, will surely be found in the cache? What is its "tag"?
- 3. (10%) Given cache, TLB, page table, main memory and the secondary storage (disk), describe the process of a memory access in a systematical way. (10 pts.)
- 4. (8%) Two computers, M1 and M2. M1 has a clock rate of 2GHz and M2 has a clock rate of 1.6GHz. The following shows the CPI of 3 instruction classes:

| Instruction Class                                                                | CPI for N | M1 CP | I for M2 |  |  |  |
|----------------------------------------------------------------------------------|-----------|-------|----------|--|--|--|
| А                                                                                | 3         | 2     |          |  |  |  |
| В                                                                                | 4         | 4     |          |  |  |  |
| С                                                                                | 4         | 3     |          |  |  |  |
| There are 3 compilers: C1, C2, C3 that produce the instruction mixes as follows: |           |       |          |  |  |  |
| Instruction Class                                                                | C1        | C2    | C3       |  |  |  |
| А                                                                                | 30%       | 25%   | 50%      |  |  |  |
| В                                                                                | 50%       | 25%   | 30%      |  |  |  |
|                                                                                  |           |       |          |  |  |  |

50%

20%

Assume that any of the three compilers (C1, C2, C3) can be used in M1 and M2. We also assume that the numbers of instructions from C1 and C2 are the same for both M1 and M2 and equal to 1000, while the number of instructions from C3 is 1050 (for both M1 and M2). Please calculate the execution time to determine the best combinations of compiler + computer.

5. (8%) In memory hierarchy, please describe the positive and negative effects of (a) increasing cache size (b) increasing associativity (c) increasing block size.

20%

6. (8%) The CPI of a machine without any memory stall is 1.1. Assume that the instruction mix is 50% ALU, 30% data memory-access and 20% control. Now, 10% of data memory-access instructions will cause misses with the penalty 50 clock cycles. Besides, 1% of miss will also happen in the instruction memory with the penalty 50 clock cycles. Please calculate the resulting CPI.

背面還有 Please Turn Over

## <u>科目: 計算機結構 (Computer Architecture) 第二頁 共三頁(page 2of 3)</u>

7. (10%) Suppose we have the following measurements:

Frequency of Floating Point (FP) operations = 20%

Average CPI of FP operations = 4

Average CPI of other instructions = 1.5

Frequency of Floating Point Square Root (FPSQR)=5%

CPI of FPSQR = 20

There are two design alternatives to improve the system.

Alternative A: to decrease the CPI of FPSQR to 2

Alternative B: to decrease the average CPI of all FP operations to 2. Compare these two design alternatives by calculating the average CPI of two improvement alternatives. Which alternative is better? Use **Amdahl's Law** to compute the speedup of the two alternatives over the system before improvement.

- 8. (10%) Briefly and clearly explain how a (m,n) correlating branch predictor works? For a (3,2) correlating branch predictor, how many branch-selected entries are there in the predictor with a total 16K bits in the prediction buffer. How many bits from the last bits of branch address are used to select the prediction entries?
- 9. (10%) Consider the following code sequence.

for (i=1; i<=100; i=i+1){

}

| Z[i] = X[i] / c;   | /*S1*/ |
|--------------------|--------|
| X[i] = Z[i] + c;   | /*S2*/ |
| Z[i] = X[i] + c;   | /*S3*/ |
| Y[i] = c - Y[i-1]; | /*S4*/ |
|                    |        |

(a) Find all the true dependences, output dependences, and anti-dependences

(b) Is there a loop-carried dependence?

10. (10%) Consider the following code sequence. F1- F10 are registers. Assume the following execution time for different operations: MUL takes 8 cycles, DIV takes 20 cycles, and ADD and SUB both take 1 cycle to complete. The instructions are issued into the pipeline in order, but out of order execution and completion is allowed. What hazards will be encountered when executing this code sequence?MUL F4, F5, F6

SUB F2, F7, F8

國立中央大學資訊工程學系 104 學年度第一學期博士班資格考試題紙

## <u>科目: 計算機結構 (Computer Architecture) 第三頁 共三頁(page 3of 3)</u>

DIV F10, F4, F5 ADD F5, F8, F2

11. (10%) For multi-processor machines, cache coherence is an important issue. The state transition diagrams of a write invalidate, cache coherence protocol for a write-back cache are illustrated below. Suppose two processors P1 and P2 are both in Shared state for a data block X. P1 writes X at time T1. Then, P2 writes X at time T2. Afterwards, P1 reads X at time T3. Please write down and explain the state transitions, bus activities, and the actions of P1 and P2 at time T1, T2, and T3.

