This post was inspired by Measuring the size of the cache line empirically and is my first attempt to quantify the impact of false sharing in multi-threaded applications.
My goal is to measure the cache line on any system, and for that I need a fast and portable solution. These are the main steps:
- Create testing environment - I’ll start with Docker
- Write a C program that writes results to a json file
- Evaluate and compare the results - with a python helper script
Create testing environment
Surely, I could have just cloned the repo provided in the reference article, but I want to learn, compare and have fun 🏄♂️.
Pull my standard Ubuntu 22 dev environment
docker pull mmontanari/devenv:buildenv-ubuntu-22
keep it running
docker run -d --name buildenv mmontanari/devenv:buildenv-ubuntu-22 tail -f /dev/null
check that it runs
docker ps
use it as a dev container in VS Code with "Attach to Running Container…"
Write a C program that writes results
Actually, no. I’m too lazy and want to test Claude by having it write this for me. I will only prompt it to reproduce the reference article, without providing the article’s GitHub code to Claude.
This is what I got as response:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUFFER_SIZE (32 * 1024 * 1024) // 32 MB
#define NUM_ITERATIONS 10
#define MAX_STRIDE 512

double get_time() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

void strided_copy(char *src, char *dst, size_t size, size_t stride) {
    for (size_t i = 0; i < size; i += stride) {
        dst[i] = src[i];
    }
}

int main() {
    char *src = malloc(BUFFER_SIZE);
    char *dst = malloc(BUFFER_SIZE);
    if (!src || !dst) {
        fprintf(stderr, "Memory allocation failed\n");
        return 1;
    }

    // Initialize source buffer
    for (int i = 0; i < BUFFER_SIZE; i++) {
        src[i] = (char)i;
    }

    FILE *fp = fopen("cache_line_results.json", "w");
    if (!fp) {
        fprintf(stderr, "Failed to open output file\n");
        return 1;
    }

    fprintf(fp, "{\n\"results\": [\n");
    for (size_t stride = 16; stride <= MAX_STRIDE; stride++) {
        double min_speed = 1e9, max_speed = 0, total_speed = 0;
        for (int iter = 0; iter < NUM_ITERATIONS; iter++) {
            double start = get_time();
            strided_copy(src, dst, BUFFER_SIZE, stride);
            double end = get_time();
            double elapsed = end - start;
            double speed = (BUFFER_SIZE / 1e9) / elapsed; // GB/s
            if (speed < min_speed) min_speed = speed;
            if (speed > max_speed) max_speed = speed;
            total_speed += speed;
        }
        double avg_speed = total_speed / NUM_ITERATIONS;
        fprintf(fp, "  {\"stride\": %zu, \"min\": %.2f, \"max\": %.2f, \"avg\": %.2f}%s\n",
                stride, min_speed, max_speed, avg_speed,
                stride == MAX_STRIDE ? "" : ",");
    }
    fprintf(fp, "]\n}\n");

    fclose(fp);
    free(src);
    free(dst);
    return 0;
}
build and run with:
gcc -O3 -o cache_line_test cache_line_test.c && ./cache_line_test
all runs as expected 💯
Evaluate and compare the results
My dev container is missing pip and matplotlib, so let’s fix that:
apt-get update -y && apt-get install pip -y
pip install matplotlib
I don’t need to mess around with virtual environments here since the container will be destroyed afterwards.
Next, let’s run the Python script that Claude generated from the same prompt that produced the C code. Yes, you read that right: a single prompt generated all the artifacts and matched the reference plot, thanks to Claude’s vision capabilities.
import json
import matplotlib.pyplot as plt

# Read the JSON file
with open('cache_line_results.json', 'r') as f:
    data = json.load(f)

# Extract data
strides = [result['stride'] for result in data['results']]
min_speeds = [result['min'] for result in data['results']]
max_speeds = [result['max'] for result in data['results']]
avg_speeds = [result['avg'] for result in data['results']]

# Create the plot
plt.figure(figsize=(12, 6))
plt.plot(strides, min_speeds, label='Min Speed', marker='o')
plt.plot(strides, max_speeds, label='Max Speed', marker='o')
plt.plot(strides, avg_speeds, label='Avg Speed', marker='o')
plt.xlabel('Stride (bytes)')
plt.ylabel('Speed (GB/s)')
plt.title('Cache Line Size Test Results')
plt.legend()
plt.xscale('log', base=2)
plt.grid(True)

# Add vertical lines at powers of 2
for x in [32, 64, 128, 256]:
    plt.axvline(x=x, color='gray', linestyle='--', alpha=0.5)

plt.savefig('cache_line_test_results.png')
plt.show()
This runs just fine as well.
Results
This is my first attempt
It is really noisy and I can’t tell whether the speed rises at 64 or 128.
I decided to repeat the test after replacing malloc with aligned_alloc. This reduces the noise significantly.
Better now! Repeating the test gives more consistent results, and the cache line on my laptop seems to be 128 bytes. Common wisdom led me to expect 64 bytes instead. Let’s do more testing.
Compare against reference tests
Since I could not find any official information from Intel about my i9 CPU, I validated my results by running the reference code, which clearly takes a very similar approach. It runs in about 8 minutes.
While it ran, I compared the two programs: mine from Claude, the reference from ChatGPT. They are very similar, but the devil may be in the details. Also, I’d be very surprised if the Python code could actually reproduce the image it was prompted with 🤩. I’m very curious to see how they compare.
Here are the results from the reference code:
They match! This measures a cache line of 128 bytes (the cache line length corresponds to the stride at which the speed curve transitions from a plateau to a steep rise). Basically, the two programs produce consistent results. I looked for major differences, ran some tests and made some changes, but could not find anything substantially different.
Conclusion
Generating code with tools like ChatGPT, Claude, or other LLMs is great, but not perfect. These black boxes sometimes work great (see the Python snippet for plotting), sometimes they don’t, and then I basically have to debug someone else’s code, which takes time. Still, in this case, I got lucky and the result is genuinely impressive.
I gave an extremely simple prompt and augmented it with the full article (text and images). In return, I obtained fully working software that reproduced the desired results. Awesome 🤓.