If you’ve been around the block for a while, you’ve noticed that some database systems seem to run and run and run, while others are the perpetual squeaky wheel. In trying to determine the real issue, after you’ve checked for long-running queries, huge insert statements, and obvious blocking (sp_who2), you may want to take a close look at the wait stats.
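For that quick blocking check, sp_who2 is the old standby; the BlkBy column shows which SPID, if any, is blocking each session:

```sql
-- List current sessions; a non-blank BlkBy column means that
-- session is being blocked by the SPID shown there.
EXEC sp_who2;
```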
What’s a wait stat? Well, SQL Server tracks waits to tell you which task is waiting, on what resource, and for how long, whenever it can’t proceed with a particular piece of work. I’ve learned a great deal from other authors like Paul Randal (http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/), and he’s leveraged the work of his fellow SQL Server MVPs.
In my personal experience, examining the SQL waits has been a darn expedient way to achieve better overall performance.
A lot has been written about the CXPacket wait, and I think Brent Ozar has given some of the best explanations of the CXPacket wait type and how to address the issues behind it. At the last MS Ignite conference I had the pleasure of a conversation with a very smart man, whose name unfortunately escapes me or I’d credit him here. He works on large implementations, something with over 1000 cores dedicated to SQL processing for HP. We attended a presentation on SQL optimization and tuning in Availability Groups in SQL 2014 and got to talking afterwards. He explained the CXPacket waits they were seeing, and how they tuned MAXDOP to correlate directly with the number of cores on a NUMA node. Four cores meant a MAXDOP of 4, and a corresponding increase in the Cost Threshold for Parallelism to something like 50 decreased the CXPacket waits and gave the processing system more throughput.
His numbers came from a configuration quite different from the one I was working with directly, but they proved very close to right once I was testing in a loaded environment and monitoring the wait stats closely. Sure enough, setting MAXDOP to the number of cores per NUMA node correlated directly with the best performance on queries with a parallel execution plan, and raising the cost threshold brought even more throughput. Why is the number of cores on a single node important? The cache on the processor node is shared among its cores, so the parallel threads aren’t coordinating results across nodes and hence produce virtually no CXPacket wait.
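As a sketch, those settings would be applied like this. The specific values (MAXDOP 4 for a 4-core NUMA node, cost threshold 50) are just the examples from the conversation above, not universal recommendations; measure your own waits before and after:

```sql
-- These are advanced options, so expose them first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap parallel plans at the number of cores per NUMA node (4 in this example).
EXEC sp_configure 'max degree of parallelism', 4;

-- Raise the bar for when the optimizer even considers a parallel plan,
-- from the default of 5 to something like 50.
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```

Both settings take effect without a restart, which makes it easy to test a change under load and watch the wait stats respond.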
If you have a running install of SQL Server and haven’t looked deeply into the wait stats (sys.dm_os_wait_stats), then read up on what Brent and Paul have to say on the subject. I think they do it a lot more justice than I’m capable of, and why would I reinvent the wheel?
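Still, a minimal starting point for peeking at sys.dm_os_wait_stats looks something like this. The list of wait types to ignore is a small sample of the common benign/background waits (the scripts Brent and Paul publish filter a much longer list):

```sql
-- Top waits accumulated since the last service restart (or since
-- the stats were cleared), skipping a few benign background waits.
SELECT TOP (10)
    wait_type,
    wait_time_ms / 1000.0        AS wait_time_s,
    signal_wait_time_ms / 1000.0 AS signal_wait_s,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
    'SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TO_FLUSH',
    'BROKER_EVENTHANDLER', 'WAITFOR', 'CLR_AUTO_EVENT',
    'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
    'CHECKPOINT_QUEUE', 'SQLTRACE_BUFFER_FLUSH'
)
ORDER BY wait_time_ms DESC;
```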
Good Data to you.
Data Chef out…