As we embark on an interesting 2010 and even more interesting 2011, it is clear that the age of clock speed is behind us; it’s all about cores for the next few years until “Fusion-based” computing hits the server market.
And let’s be clear – AMD is designing its AMD Opteron™ processors to have the cores that customers need to drive their enterprise applications.
To begin with, let’s dissect the difference between threads and cores. Cores are physical blocks of logic in the processor that can run applications. In the old world, it was simple: one CPU = one core. Today, Six-Core AMD Opteron processors (formerly code-named “Istanbul”) are quickly becoming the mainstream, and by the end of this quarter, eight and twelve will be the operative core counts per processor.
Threads on the other hand aren’t physical – they are software-generated tasks that can execute independently. In order for a program to run on multiple cores, you need to thread the program, or run multiple tasks simultaneously. The operating system takes the threads spawned by the program and schedules them to run on available cores.
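As a minimal sketch of that idea, the Python snippet below splits one job into as many tasks as the machine reports cores and hands them to a thread pool; the operating system then decides which core each thread runs on. The names `count_primes` and `run_threaded` are hypothetical, chosen only for illustration. One caveat: in standard CPython builds the global interpreter lock keeps pure-Python threads from executing bytecode in parallel, so this shows the shape of a threaded program rather than a guaranteed speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_primes(start, stop):
    """Count primes in [start, stop) by trial division (deliberately simple)."""
    total = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total


def run_threaded(limit, workers=None):
    """Split [0, limit) into roughly one chunk per worker and sum the results."""
    # Assumption: one worker thread per core reported by the OS.
    workers = workers or os.cpu_count() or 1
    step = max(1, -(-limit // workers))  # ceiling division, never zero
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    # Each chunk becomes a task; the pool's threads are scheduled by the OS.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_primes(*c), chunks))
```

Calling `run_threaded(100_000)` returns the same count regardless of how many workers are used; only the way the work is divided changes.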
So – cores are like bikes, threads are the riders. Running more threads increases throughput for applications as long as you have available cores. If you have threads waiting to be scheduled and no available cores – you have a bottleneck.
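The bikes-and-riders picture can be put into rough numbers. The sketch below is a toy model, not a description of any real scheduler: it assumes every thread does the same amount of work and holds a core until it finishes. The function names `scheduling_rounds` and `waiting_threads` are hypothetical.

```python
import math


def scheduling_rounds(threads, cores):
    """Rounds needed if each core runs one equal-length thread per round."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return math.ceil(threads / cores)


def waiting_threads(threads, cores):
    """Threads left without a core in the first round (the bottleneck)."""
    return max(0, threads - cores)


# Same 12 threads, different core counts.
for cores in (4, 6, 12):
    print(cores, scheduling_rounds(12, cores), waiting_threads(12, cores))
```

Under these assumptions, twelve threads on twelve cores finish in one round with nobody waiting, while the same twelve threads on six cores leave six riders standing at the curb for the second round.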
There are two major strategies for getting more efficiency out of your server. The first is the simple, straightforward way – feed that application more cores. That is why you are seeing 4+ cores in processors today. Nobody will argue against the point that giving applications more real cores will help increase overall throughput. However, some see another answer and wonder why AMD has chosen not to go down that path.
Simultaneous Multithreading (SMT) is a method for squeezing two threads into one core. SMT was first researched by IBM in 1968 and introduced to x86 processors by Intel in 2002 under the name of HyperThreading. That sounds great, in concept. Carpooling is more efficient than giving everyone their own car, right?
Well, carpooling falls apart if the two employees live too far from each other and the office is close. If Bob lives 3 miles north of the office and Mary lives 2 miles south of the office, it really doesn’t make sense for them to carpool. In the bike and rider example above, think of SMT as a tandem bike. Yes, it can move two riders, but not as quickly or efficiently as two separate bikes.
The challenge with SMT is that as a technology, it forces two threads to share a single physical core.
Consider a software thread running on a hardware thread, where a second runnable software thread is then executed on another hardware thread on the same core. This could be triggered by an event like a stall due to a cache miss. The second thread does not necessarily thrash the cache; in fact, there are situations where the cache lines used by both threads are shared, resulting in little cache churn. However, in many cases the second thread causes the cache to be refilled with its own data, requiring the first thread to refill the cache in turn when it resumes execution. This competition for shared core resources is what can result in diminishing returns for SMT-based processors or, worse, in negative performance. (This paragraph was updated for clarity and to correct a statement that could have been misinterpreted…)
Generally speaking, SMT can give applications as much as a 10-20% increase in performance, which feels like that mythical “free lunch” that you were always told doesn’t exist. Well, don’t start eating yet, because there is a dark side to SMT. What if adding that extra thread actually decreased your throughput? What if 8 threads on 4 cores provided worse throughput than 4 threads on 4 cores?
Here are a few examples of opinions on the other side of the SMT discussion:
* A consultant who deals with Cognos, a leading BI software package from IBM, recommends disabling HyperThreading because it “frequently degrades performance and proves unstable.”
* Microsoft recommends turning off HyperThreading when running PeopleSoft applications because “our lab testing has shown little or no improvement.”
* A Microsoft TechNet article recommends that Hyper-threading be disabled on production Exchange servers and “only enabled if absolutely necessary as a temporary measure to increase CPU capacity until additional hardware can be obtained.”
* Advanced Clustering found when running High Performance Linpack (HPL) that “Using HT on the other hand causes a ~10% drop in performance compared to HT not being used.”
There are more examples, but the “free lunch” is obviously not quite as tasty as you might have originally expected.
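To put the figures quoted above side by side, here is a back-of-the-envelope sketch. It measures throughput in “core-equivalents” (one thread on one dedicated core = 1.0) and makes two simplifying assumptions that real workloads may not honor: throughput scales linearly with core count, and SMT changes every core’s output by the same fraction. The function name `core_equivalents` is hypothetical; the +10%, +20% and -10% inputs are taken from the numbers cited in this post.

```python
def core_equivalents(cores, smt_gain=0.0):
    """Modeled throughput, in units of one thread on one dedicated core.

    smt_gain is the fractional change from running two threads per core,
    e.g. 0.20 for +20% or -0.10 for a 10% drop (assumed uniform per core).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * (1.0 + smt_gain)


scenarios = {
    "4 cores, 4 threads (no SMT)": core_equivalents(4),
    "4 cores, 8 threads (SMT +10%)": core_equivalents(4, 0.10),
    "4 cores, 8 threads (SMT +20%)": core_equivalents(4, 0.20),
    "4 cores, 8 threads (SMT -10%)": core_equivalents(4, -0.10),
    "8 cores, 8 threads (no SMT)": core_equivalents(8),
}
for label, value in scenarios.items():
    print(f"{label}: {value:.1f} core-equivalents")
```

In this toy model the best SMT case stays well short of eight dedicated cores, and the -10% case lands below four cores with SMT switched off.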
So, if SMT (or “core sharing”) yields both positive and negative results, what is the better answer? How about more cores? When you add more cores, you add more throughput. Period.
When you run multiple threads over multiple cores, you can expect better performance, and that is the AMD strategy. With “Magny-Cours” we’re planning 8 and 12 cores per processor running 8 and 12 threads, not 8 or 12 threads sharing 4 or 6 cores. No sharing needed; every thread can be as selfish as it needs to be. Then in 2011, we plan to introduce “Interlagos” and increase the core count again, to 12 and 16. With “Interlagos” we’re designing some shared components that help reduce power consumption and die size, but you won’t see us sharing integer pipelines, the “meat” of the core.
By keeping discrete integer cores, and delivering more of those cores per CPU, AMD is designing processors to help you get more throughput for your enterprise applications.
Here’s AMD’s Core Commitment for servers:
1. AMD is working to deliver more cores for your business-critical applications and a wider choice of core configurations. From 4 cores through 12 cores per processor planned for 2010 and 6 to 16 cores planned in 2011, AMD is working to deliver more of the resources that you need to drive your business forward.
2. Our cores are real. Threads can run faster when they have their own core underneath rather than having to share. If you have to run 12 threads, we know you would rather have 12 cores with unfettered access than worry about sharing cores.
Of course, there are those who will say “well, things like SMT can be implemented inexpensively and don’t consume that much power.” To them I ask: historically, hasn’t AMD been the one committed to delivering better value and lower power? Why would we stray from our core principles?
If you can get all the cores you need at the price you need and the power envelope that you need, then why would you ever consider anything else? Why would you ever compromise? Have your cake, and eat it too. THAT is your free lunch. And it’s delicious.