Intel has been reeling (alongside AMD, to a lesser extent) since the Meltdown and Spectre chip vulnerabilities were revealed. The kernel-level Meltdown flaw leaves most Intel-powered machines open to attack. Patches have been issued, but they are buggy and also compromise performance. Now Intel has made another confession regarding the supposed fix.
The company's patches have caused unexpected reboots on many systems. Intel initially believed the reboot problem affected only Broadwell and Haswell chips. Unfortunately, internal testing has found that it also affects machines powered by Skylake and Kaby Lake chips, as well as older Ivy Bridge and Sandy Bridge systems.
Intel says it is still working to fix the buggy patches it has already sent out. Vice president Navin Shenoy said the company has nonetheless patched many machines:
“We have now issued firmware updates for 90 percent of Intel CPUs introduced in the past five years, but we have more work to do. As I noted in my blog post last week, while the firmware updates are effective at mitigating exposure to the security issues, customers have reported more frequent reboots on firmware updated systems.”
How the patch affects performance has been a major concern. Servers and datacenters could see a performance loss from the measures taken to thwart Meltdown and Spectre. Shenoy described how Intel has tested the performance trade-off that comes with the patch.
The company benchmarked server platforms using two-socket Intel Xeon Scalable systems. Most common enterprise workloads, including server-side Java and energy efficiency tests, showed impacts of 0% to 4%, but some storage benchmarks slowed considerably more:
- Impacts ranging from 0-2% on industry-standard measures of integer and floating point throughput, Linpack, STREAM, server-side Java and energy efficiency benchmarks. These benchmarks represent several common workloads important to enterprise and cloud customers.
- An online transaction processing (OLTP) benchmark simulating a brokerage firm's customer-broker-stock exchange interaction showed a 4% impact. More analytics testing is in progress and the results will be dependent on system configuration, test setup and benchmark used.
- Benchmarks for storage also showed a range of results depending on the benchmark, test setup and system configuration:
  - For FlexibleIO, a benchmark simulating different types of I/O loads, results depend on many factors, including read/write mix, block size, drives and CPU utilization. When we conducted testing to stress the CPU (100% write case), we saw an 18% decrease in throughput performance because there was no CPU utilization headroom. When we used a 70/30 read/write model, we saw a 2% decrease in throughput performance. When CPU utilization was low (100% read case), as is the case with common storage provisioning, we saw an increase in CPU utilization, but no throughput performance impact.
  - Storage Performance Development Kit (SPDK) tests, which provide a set of tools and libraries for writing high performance, scalable, user-mode storage applications, were measured in multiple test configurations. Using SPDK iSCSI, we saw as much as a 25% impact while using only a single core. Using SPDK vHost, we saw no impact.
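To put the reported percentages in concrete terms, the sketch below converts each impact from Intel's list into the throughput a system would retain. The baseline of 10,000 operations per second is a hypothetical figure chosen only for illustration; the percentages come from the list, with the 0-2% range represented by its upper bound.

```python
# Worst-case impacts taken from Intel's reported benchmark results.
# The 0-2% range is represented by its upper bound (2%).
REPORTED_IMPACTS_PCT = {
    "integer/FP throughput, Linpack, STREAM, server-side Java": 2.0,
    "OLTP (brokerage simulation)": 4.0,
    "FlexibleIO, 100% write": 18.0,
    "FlexibleIO, 70/30 read/write": 2.0,
    "FlexibleIO, 100% read": 0.0,
    "SPDK iSCSI, single core": 25.0,
    "SPDK vHost": 0.0,
}


def retained_throughput(baseline: float, impact_pct: float) -> float:
    """Return the throughput left after a slowdown of impact_pct percent."""
    if not 0.0 <= impact_pct <= 100.0:
        raise ValueError("impact_pct must be between 0 and 100")
    return baseline * (1.0 - impact_pct / 100.0)


if __name__ == "__main__":
    baseline_ops = 10_000  # hypothetical pre-patch operations per second
    for workload, impact in REPORTED_IMPACTS_PCT.items():
        after = retained_throughput(baseline_ops, impact)
        print(f"{workload:<60} {after:>8,.0f} ops/s ({impact:g}% impact)")
```

At this hypothetical baseline, the 4% OLTP impact leaves about 9,600 operations per second, while the single-core SPDK iSCSI case drops to about 7,500.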
Meltdown and Spectre
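Both flaws abuse speculative execution, a performance technique in which the CPU runs instructions before it knows whether they are needed or permitted. Meltdown, the kernel-level flaw, exploits how operating systems handle the kernel. Each time a program makes a system call, the CPU hands control to the kernel. To keep that switch fast, the kernel's memory stays mapped into every running process, protected only by a permission check. Meltdown lets a process read kernel memory speculatively before that check takes effect, then recover the data from traces left in the CPU cache. It is this that leaves machines at risk.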
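Meltdown leaks kernel data through the CPU cache, and the idea can be illustrated with a deliberately simplified simulation. The sketch below is a toy model, not exploit code: `ToyCache`, `transient_read` and `leak_byte` are hypothetical names, nothing is actually executed speculatively, and the "cache" is just a Python set. It shows only the key idea: the illegal read is eventually rejected, but the secret-dependent memory access it triggered has already left a footprint the attacker can observe afterwards. In the real attack that footprint is read back by timing memory loads (a Flush+Reload side channel).

```python
from __future__ import annotations

from typing import Optional


class ToyCache:
    """Stand-in for a CPU cache: records which probe lines have been touched."""

    def __init__(self) -> None:
        self.lines: set[int] = set()

    def flush(self) -> None:
        self.lines.clear()

    def touch(self, line: int) -> None:
        self.lines.add(line)

    def is_cached(self, line: int) -> bool:
        # A real attack infers this by timing a load (fast means cached).
        return line in self.lines


def transient_read(cache: ToyCache, kernel_memory: bytes, addr: int) -> int:
    """Model a speculative user-mode load of one kernel byte."""
    secret = kernel_memory[addr]  # modeled as completing before the check
    cache.touch(secret)           # dependent access leaves a cache footprint
    # The permission check takes effect and the architectural result is discarded.
    raise PermissionError("user code may not read kernel memory")


def leak_byte(cache: ToyCache, kernel_memory: bytes, addr: int) -> Optional[int]:
    """Recover one byte by probing which of 256 probe lines became cached."""
    cache.flush()
    try:
        transient_read(cache, kernel_memory, addr)
    except PermissionError:
        pass  # the loaded value is lost, but the cache state is not
    for guess in range(256):
        if cache.is_cached(guess):
            return guess
    return None


if __name__ == "__main__":
    kernel_memory = b"hunter2"  # hypothetical secret held in kernel memory
    cache = ToyCache()
    leaked = bytes(leak_byte(cache, kernel_memory, i) for i in range(len(kernel_memory)))
    print(leaked)  # b'hunter2'
```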
The main defense against Meltdown is Kernel Page Table Isolation (PTI), a software workaround shipped in operating system updates rather than a feature built into the chips. PTI places the kernel in a dedicated address space, making it unavailable to running processes. The workaround is effective, but it also compromises performance: every switch into and out of the kernel now requires changing page tables. The cost is hardly noticeable on a single machine but can add up across a major computing operation.
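On Linux, administrators can check whether PTI is active. Kernels that carry the Meltdown fixes report the status in `/sys/devices/system/cpu/vulnerabilities/meltdown`, with text such as `Not affected`, `Vulnerable` or `Mitigation: PTI`. The sketch below reads and classifies that line; `meltdown_status` and `classify` are hypothetical helper names, and the category labels are an assumption of this example rather than anything the kernel defines.

```python
from pathlib import Path
from typing import Optional, Tuple

MELTDOWN_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def classify(status: str) -> Tuple[str, Optional[str]]:
    """Map the kernel's one-line Meltdown status to (category, detail)."""
    text = status.strip()
    if text.startswith("Not affected"):
        return "not_affected", None
    if text.startswith("Mitigation:"):
        # e.g. "Mitigation: PTI" -> detail "PTI"
        return "mitigated", text.split(":", 1)[1].strip()
    if text.startswith("Vulnerable"):
        return "vulnerable", None
    return "unknown", text or None


def meltdown_status(path: Path = MELTDOWN_SYSFS) -> Tuple[str, Optional[str]]:
    """Read the sysfs file; report 'unknown' if it is missing or unreadable."""
    try:
        return classify(path.read_text())
    except OSError:
        # Older kernels and non-Linux systems do not provide this file.
        return "unknown", None


if __name__ == "__main__":
    category, detail = meltdown_status()
    print(category, detail or "")
```

On a patched x86 system with PTI enabled, this would be expected to print `mitigated PTI`.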