Because it is easier to work with, and because we will deal with simple processors in this chapter, we use CPI. Designers sometimes also use instructions per clock (IPC), which is the inverse of CPI.

Consider our performance example on page 52, here modified to use measurements of the frequency of the instructions and of the instruction CPI values, which, in practice, are obtained by simulation or by hardware instrumentation.

Compare these two design alternatives using the processor performance equation.

Answer: First, observe that only the CPI changes; the clock rate and instruction count remain identical. In particular, it may be difficult to measure things such as the fraction of execution time for which a set of instructions is used. In practice, this would probably be computed by summing the product of the instruction count and the CPI for each of the instructions in the set.
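The comparison can be sketched numerically. The snippet below is a minimal illustration, assuming a hypothetical instruction mix and hypothetical CPI values (none of these numbers come from the example referred to above). It computes the overall CPI as the frequency-weighted sum of the per-class CPIs and, because instruction count and clock rate are unchanged, takes the speedup as the ratio of the two CPIs.

```python
def overall_cpi(mix):
    """Return the frequency-weighted CPI for a dict of class -> (frequency, cpi)."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cpi for freq, cpi in mix.values())


# Hypothetical measurements: class -> (fraction of instructions, CPI).
baseline = {
    "alu": (0.50, 1.0),
    "load_store": (0.30, 2.0),
    "fp": (0.20, 4.0),
}

# Hypothetical design alternative: only the FP CPI changes (4.0 -> 2.5).
alternative = {
    "alu": (0.50, 1.0),
    "load_store": (0.30, 2.0),
    "fp": (0.20, 2.5),
}

cpi_base = overall_cpi(baseline)
cpi_alt = overall_cpi(alternative)

# With identical instruction count and clock rate, execution time is
# proportional to CPI, so the speedup is simply the CPI ratio.
speedup = cpi_base / cpi_alt

print(f"baseline CPI    = {cpi_base:.3f}")
print(f"alternative CPI = {cpi_alt:.3f}")
print(f"speedup         = {speedup:.3f}")
```

The same weighted sum, restricted to one set of instructions and divided by the total, gives the fraction of execution time spent in that set.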

Since the starting point is the individual instruction count and CPI measurements, the processor performance equation is especially useful. To use the processor performance equation as a design tool, we need to be able to measure the various factors.

For an existing processor, it is possible to obtain the execution time by measurement, and we know the default clock speed. The challenge lies in discovering the instruction count or the CPI. Most processors include counters for both instructions and clock cycles.

By periodically monitoring these counters, it is also possible to attach execution time and instruction count to segments of the code, which can be helpful to programmers trying to understand and tune the performance of an application. Often designers or programmers will want to understand performance at a more fine-grained level than what is available from the hardware counters.
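As a rough sketch of how periodic counter readings can be attributed to code segments, the snippet below differences consecutive samples of cumulative instruction and cycle counters. The `CounterSample` type, the segment labels, the counter values, and the 2 GHz clock are all hypothetical; reading real hardware counters is platform-specific and is not shown. The sketch also assumes a fixed clock rate and nonzero counts in every segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    label: str         # name of the code segment that ends at this sample
    instructions: int  # cumulative instruction counter
    cycles: int        # cumulative clock-cycle counter


def per_segment_stats(samples, clock_hz):
    """Difference consecutive samples into per-segment counts, CPI, IPC, and time."""
    stats = []
    for prev, cur in zip(samples, samples[1:]):
        instr = cur.instructions - prev.instructions
        cyc = cur.cycles - prev.cycles
        stats.append({
            "segment": cur.label,
            "instructions": instr,
            "cycles": cyc,
            "cpi": cyc / instr,          # clock cycles per instruction
            "ipc": instr / cyc,          # inverse of CPI
            "seconds": cyc / clock_hz,   # valid only for a fixed clock rate
        })
    return stats


# Hypothetical cumulative counter readings taken at segment boundaries.
samples = [
    CounterSample("start", 0, 0),
    CounterSample("parse", 2_000_000, 3_000_000),
    CounterSample("compute", 10_000_000, 23_000_000),
]

segment_stats = per_segment_stats(samples, clock_hz=2_000_000_000)
for s in segment_stats:
    print(f"{s['segment']}: CPI={s['cpi']:.2f}, time={s['seconds'] * 1e3:.2f} ms")
```

Counters tell you what the CPI of a segment is, but not why; that finer-grained question is where simulation comes in.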

For example, they may want to know why the CPI is what it is. In such cases, the simulation techniques used are like those for processors that are being designed. Techniques that help with energy efficiency, such as dynamic voltage frequency scaling and overclocking (see Section 1.

A simple approach is to turn off those features to make the results reproducible. In this section, we look at measures of performance and power-performance in small servers using the SPECpower benchmark.

To keep the price comparison fair, all are Dell PowerEdge servers. We selected a two-socket system (so 44 cores total) with 128 GB of ECC-protected 2400 MHz DDR4 DRAM. The next server is the PowerEdge C630, with the same processor, number of sockets, and DRAM. We calculated the cost of the processors by subtracting the cost of a second processor.

Similarly, we calculated the overall cost of memory by seeing what the cost of extra memory was. Hence the base cost of the server is adjusted by subtracting the estimated cost of the default processor and memory. Chapter 5 describes how these multisocket systems are connected together, and Chapter 6 describes how clusters are connected together. All are running the Oracle Java HotSpot version 1. Note that because of the forces of benchmarking (see Section 1.

The systems in Figure 1. Rather than run statically linked C programs of SPEC CPU, SPECpower uses a more modern software stack written in Java. It exercises not only the processor of the server, as does SPEC CPU, but also the caches, memory system, and even the multiprocessor interconnection system. In addition, it exercises the JVM, including the JIT runtime compiler and garbage collector, as well as portions of the underlying operating system.

As the last two rows of Figure 1. Therefore the single-node R630 has the best power-performance. We call such errors fallacies. When discussing a fallacy, we try to give a counterexample. We also discuss pitfalls, or easily made mistakes. Often pitfalls are violations of principles that are true in a limited context.

The purpose of these sections is to help you avoid making these errors in computers that you design.

Pitfall: All exponential laws must come to an end. The first to go was Dennard scaling. Thus chips could be designed to operate faster and still use less power.

The threshold voltage was driven so low that static power became a significant fraction of overall power. The next deceleration was hard disk drives.



