Commit c203574

committed
Add docs for SIMD image blur example
1 parent feed1d5 commit c203574

2 files changed

Lines changed: 328 additions & 0 deletions

File tree

docs/sphinx/examples.rst

Lines changed: 1 addition & 0 deletions
@@ -31,3 +31,4 @@ can build the examples.
3131
examples/interest_calculator
3232
examples/1d_stencil
3333
examples/serialization
34+
examples/simd_image_blur
Lines changed: 327 additions & 0 deletions
@@ -0,0 +1,327 @@
1+
..
2+
Copyright (C) 2025 Dimitra Karatza
3+
4+
SPDX-License-Identifier: BSL-1.0
5+
Distributed under the Boost Software License, Version 1.0. (See accompanying
6+
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
7+
8+
.. _examples_image_blur:
9+
10+
==============================
11+
Parallel SIMD image processing
12+
==============================
13+
14+
Image processing is a classic example of a data-parallel workload: the same
15+
operation is applied independently to many pixels. This makes it a natural fit
16+
for SIMD vectorization and parallel execution.
17+
18+
This example demonstrates how to use |hpx| parallel algorithms together with
19+
SIMD-enabled execution policies to apply a simple vertical blur filter to an
20+
image. The example combines **task-level parallelism** and **data-level
21+
parallelism** using the :cpp:var:`hpx::execution::par_simd` execution policy.
22+
23+
.. _simd_blur_setup:
24+
25+
Setup
26+
=====
27+
28+
SIMD support must be enabled in your |hpx| build. Make sure |hpx| was configured with
29+
the Data Parallelism module enabled, for example:
30+
31+
.. code-block:: shell-session
32+
33+
-DHPX_WITH_DATAPAR=On
34+
-DHPX_WITH_DATAPAR_BACKEND=STD_EXPERIMENTAL_SIMD
35+
36+
This example uses the EasyBMP library to load and store images. EasyBMP is a
37+
lightweight BMP image library that provides direct access to pixel data.
38+
39+
Users are free to use any image library and any image they want to blur,
40+
as long as the library allows reading pixel values and writing the processed
41+
output back to an image file.
42+
43+
The program reads an input image, applies a simple vertical blur filter, and
44+
writes the blurred result to an output image. Only the blur computation itself
45+
depends on |hpx|.
46+
47+
.. _simd_blur_overview:
48+
49+
Overview of the blur operation
50+
==============================
51+
52+
The blur filter implemented in this example is a simple vertical 3-point
53+
stencil. For each pixel, a weighted average of the pixel itself and its
54+
vertical neighbors is computed:
55+
56+
.. math::
57+
58+
p_{out} = 0.25 \cdot p_{top} + 0.5 \cdot p_{center} + 0.25 \cdot p_{bottom}
59+
60+
This operation is applied independently to each color channel (red, green, and
61+
blue). Border pixels are excluded from the computation to avoid out-of-bounds
62+
memory accesses and are copied directly to the output image.
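
As a quick check of the formula, the following sketch evaluates the weighted average for a few sample channel values. The name ``blur_value`` is hypothetical and does not appear in the example program; it only mirrors the formula above.

```cpp
// Hypothetical helper mirroring the formula above; not part of the example.
// p_out = 0.25 * p_top + 0.5 * p_center + 0.25 * p_bottom
constexpr float blur_value(float top, float center, float bottom) {
    return 0.25f * top + 0.5f * center + 0.25f * bottom;
}

// Uniform region: the weights sum to 1, so the value is unchanged.
static_assert(blur_value(80.0f, 80.0f, 80.0f) == 80.0f, "uniform input");

// A bright pixel between two dark neighbors is pulled towards them.
static_assert(blur_value(0.0f, 200.0f, 0.0f) == 100.0f, "bright center");

// A dark pixel between two bright neighbors is brightened.
static_assert(blur_value(200.0f, 0.0f, 200.0f) == 100.0f, "dark center");
```

Because the weights are non-negative and sum to 1, the result always lies between the smallest and the largest of the three inputs.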
63+
64+
Full example code
65+
=================
66+
67+
The following listing shows the complete example. The image handling code may
68+
vary depending on the chosen image library, but the blur kernel and |hpx| usage
69+
remain the same.
70+
71+
.. code-block:: c++
72+
73+
#include <hpx/hpx_main.hpp>
74+
#include <hpx/include/parallel_algorithm.hpp>
75+
#include "./include/EasyBMP.h"
76+
#include <vector>
77+
#include <iostream>
78+
#include <algorithm>
79+
80+
using namespace easy_bmp;
81+
82+
// Apply vertical 3-point blur to each RGB channel separately
83+
void blur_image(const BMP& input, BMP& output) {
84+
int width = input.TellWidth();
85+
int height = input.TellHeight();
86+
87+
// Prepare rows to parallelize over (excluding borders)
88+
std::vector<int> rows(height - 2);
89+
for (int i = 0; i < height - 2; ++i)
90+
rows[i] = i + 1;
91+
92+
// Perform the blur
93+
hpx::for_each(
94+
hpx::execution::par_simd,
95+
rows.begin(), rows.end(),
96+
[&](int y) {
97+
for (int x = 0; x < width; ++x) {
98+
const RGBApixel* top = input(x, y - 1);
99+
const RGBApixel* mid = input(x, y);
100+
const RGBApixel* bot = input(x, y + 1);
101+
102+
auto blur_channel = [](int a, int b, int c) {
103+
return std::clamp(static_cast<int>(0.25f * a + 0.5f * b + 0.25f * c), 0, 255);
104+
};
105+
106+
output(x, y)->Red = blur_channel(top->Red, mid->Red, bot->Red);
107+
output(x, y)->Green = blur_channel(top->Green, mid->Green, bot->Green);
108+
output(x, y)->Blue = blur_channel(top->Blue, mid->Blue, bot->Blue);
109+
}
110+
}
111+
);
112+
}
113+
114+
int main() {
115+
BMP input;
116+
if (!input.ReadFromFile("image.bmp")) {
117+
std::cerr << "Could not open image.bmp\n";
118+
return 1;
119+
}
120+
121+
int width = input.TellWidth();
122+
int height = input.TellHeight();
123+
124+
BMP output;
125+
output.SetSize(width, height);
126+
output.SetBitDepth(24); // RGB
127+
128+
// Copy input to output to preserve borders
129+
for (int y = 0; y < height; ++y)
130+
for (int x = 0; x < width; ++x)
131+
*output(x, y) = *input(x, y);
132+
133+
blur_image(input, output);
134+
135+
output.WriteToFile("blurred.bmp");
136+
return 0;
137+
}
138+
139+
Explanation
140+
===========
141+
142+
The first lines pull in the necessary |hpx| headers, the EasyBMP library, and
143+
standard headers such as ``<vector>``, ``<iostream>``, and ``<algorithm>``.
144+
145+
Main function
146+
-------------
147+
148+
The ``main`` function is responsible for loading the input image, preparing
149+
the output image, invoking the blur kernel, and writing the result.
150+
151+
First, an input image object is created and populated from a file:
152+
153+
.. code-block:: c++
154+
155+
BMP input;
156+
if (!input.ReadFromFile("image.bmp")) {
157+
std::cerr << "Could not open image.bmp\n";
158+
return 1;
159+
}
160+
161+
The call to ``ReadFromFile()`` attempts to load the image. If the file
162+
cannot be opened or read, the program prints an error message and exits.
163+
164+
Next, the dimensions of the image are queried:
165+
166+
.. code-block:: c++
167+
168+
int width = input.TellWidth();
169+
int height = input.TellHeight();
170+
171+
These values are used to allocate an output image of the same size. The output
172+
image is configured to use a 24-bit RGB format:
173+
174+
.. code-block:: c++
175+
176+
BMP output;
177+
output.SetSize(width, height);
178+
output.SetBitDepth(24); // RGB
179+
180+
Before applying the blur, the input image is copied into the output image:
181+
182+
.. code-block:: c++
183+
184+
for (int y = 0; y < height; ++y)
185+
for (int x = 0; x < width; ++x)
186+
*output(x, y) = *input(x, y);
187+
188+
This step preserves the border pixels, which are not modified by the blur
189+
operation. Copying the image up front avoids special-case handling for the
190+
image boundaries inside the blur kernel.
191+
192+
Finally, the blur kernel is invoked and the resulting image is written:
193+
194+
.. code-block:: c++
195+
196+
blur_image(input, output);
197+
198+
output.WriteToFile("blurred.bmp");
199+
200+
At this point, the output image contains the blurred result, which is saved as
201+
a new image file.
202+
203+
Blur function
204+
-------------
205+
206+
The blur computation is implemented in the ``blur_image`` function. The input
207+
image is passed as a read-only reference, while the output image is modified in
208+
place:
209+
210+
.. code-block:: c++
211+
212+
void blur_image(const BMP& input, BMP& output)
213+
214+
The function begins by querying the dimensions of the image:
215+
216+
.. code-block:: c++
217+
218+
int width = input.TellWidth();
219+
int height = input.TellHeight();
220+
221+
These values are used to determine the iteration bounds of the blur operation.
222+
223+
Defining the iteration space
224+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
225+
226+
|hpx| parallel algorithms operate over ranges. In this example, the blur is
227+
applied only to the interior rows of the image, excluding the first and last
228+
rows to avoid out-of-bounds accesses.
229+
230+
A vector of row indices is constructed as follows:
231+
232+
.. code-block:: c++
233+
234+
std::vector<int> rows(height - 2);
235+
for (int i = 0; i < height - 2; ++i)
236+
rows[i] = i + 1;
237+
238+
Each element of the ``rows`` vector corresponds to a valid row index in the
239+
range ``[1, height - 2]``. This guarantees that for every processed pixel, both
240+
the pixel above and the pixel below exist.
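
The same index range can be built with ``std::iota``. The sketch below is one possible variant, not the code used in the example: the helper name ``interior_rows`` is hypothetical, and the early return is an added assumption that covers images with fewer than three rows, for which there are no interior rows (and for which ``height - 2`` can even be negative).

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the interior row indices 1 .. height - 2.
// Images with fewer than three rows have no interior rows, so the result
// is empty instead of constructing a vector from a non-positive size.
std::vector<int> interior_rows(int height) {
    if (height < 3)
        return {};
    std::vector<int> rows(static_cast<std::size_t>(height - 2));
    std::iota(rows.begin(), rows.end(), 1);   // fills 1, 2, ..., height - 2
    return rows;
}
```

For ``height == 5`` this yields the rows ``1, 2, 3``; for ``height == 2`` it yields an empty vector, so a subsequent loop over the rows does nothing.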
241+
242+
Parallel blur using ``par_simd``
243+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
244+
245+
The blur computation is expressed using :cpp:func:`hpx::for_each` with the
246+
:cpp:var:`hpx::execution::par_simd` execution policy:
247+
248+
.. code-block:: c++
249+
250+
hpx::for_each(
251+
hpx::execution::par_simd,
252+
rows.begin(), rows.end(),
253+
[&](int y) {
254+
// process row y
255+
}
256+
);
257+
258+
The ``par_simd`` execution policy enables |hpx| to distribute iterations across
259+
multiple worker threads while also applying SIMD vectorization within each
260+
thread. Each iteration of the loop processes a single row of the image.
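
Processing rows in parallel is safe here because each row only reads from the input image and only writes to its own row of the output image. The following sequential sketch illustrates that structure without depending on |hpx| or EasyBMP. The ``GrayImage`` type and the functions ``blur_row`` and ``blur`` are hypothetical and use a single-channel image stored in a flat buffer, which is an assumption made to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-channel image stored row by row in one flat buffer.
struct GrayImage {
    int width;
    int height;
    std::vector<unsigned char> data;   // expected size: width * height

    unsigned char at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
    unsigned char& at(int x, int y) {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Blur one interior row. Reads only from `in` and writes only to row y of
// `out`, so calls for different y values never write the same bytes.
void blur_row(const GrayImage& in, GrayImage& out, int y) {
    for (int x = 0; x < in.width; ++x) {
        float v = 0.25f * in.at(x, y - 1) + 0.5f * in.at(x, y) +
            0.25f * in.at(x, y + 1);
        // The weights sum to 1, so v stays within 0..255 for 8-bit inputs.
        out.at(x, y) = static_cast<unsigned char>(v);
    }
}

// Sequential driver; in the example, the parallel for_each plays this role.
GrayImage blur(const GrayImage& in) {
    GrayImage out = in;   // copy first so the border rows are preserved
    for (int y = 1; y + 1 < in.height; ++y)
        blur_row(in, out, y);
    return out;
}
```

For a one-pixel-wide image with the rows ``0, 200, 0``, the middle row becomes ``100`` while the first and last rows keep their original values.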
261+
262+
Processing pixels within a row
263+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
264+
265+
Inside the parallel loop, the code iterates over all columns of the current row:
266+
267+
.. code-block:: c++
268+
269+
for (int x = 0; x < width; ++x) {
270+
271+
For each pixel at position ``(x, y)``, three neighboring pixels are accessed:
272+
273+
.. code-block:: c++
274+
275+
const RGBApixel* top = input(x, y - 1);
276+
const RGBApixel* mid = input(x, y);
277+
const RGBApixel* bot = input(x, y + 1);
278+
279+
These correspond to the pixel above, the current pixel, and the pixel below. The
280+
blur is computed using these three values, forming a simple vertical 3-point
281+
stencil.
282+
283+
Blurring individual color channels
284+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
285+
286+
The blur operation is applied independently to each color channel (red, green,
287+
and blue). A small helper lambda computes a weighted average of three input
288+
values:
289+
290+
.. code-block:: c++
291+
292+
auto blur_channel = [](int a, int b, int c) {
293+
return std::clamp(
294+
static_cast<int>(0.25f * a + 0.5f * b + 0.25f * c), 0, 255);
295+
};
296+
297+
This lambda implements the blur formula and clamps the result to the valid range
298+
for image pixels. It is then used to update each color channel of the output
299+
image:
300+
301+
.. code-block:: c++
302+
303+
output(x, y)->Red = blur_channel(top->Red, mid->Red, bot->Red);
304+
output(x, y)->Green = blur_channel(top->Green, mid->Green, bot->Green);
305+
output(x, y)->Blue = blur_channel(top->Blue, mid->Blue, bot->Blue);
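
Note that ``static_cast<int>`` truncates towards zero rather than rounding. The sketch below contrasts the truncating computation with an integer-only variant that rounds to the nearest value. Both function names are hypothetical, and the rounding variant is an alternative shown for comparison, not what the example uses.

```cpp
// Truncating variant, matching the static_cast<int> used in the example
// (without the clamp, which is a no-op for inputs in 0..255).
int blur_truncate(int a, int b, int c) {
    return static_cast<int>(0.25f * a + 0.5f * b + 0.25f * c);
}

// Hypothetical integer-only variant that rounds to nearest (ties round up):
// (a + 2b + c) / 4, with 2 added before the division.
// Intended for non-negative channel values.
int blur_round(int a, int b, int c) {
    return (a + 2 * b + c + 2) / 4;
}
```

For the inputs ``1, 2, 2`` the exact weighted average is ``1.75``, so ``blur_truncate`` returns ``1`` and ``blur_round`` returns ``2``. For inputs in the range 0 to 255 both variants stay within that range, which makes the ``std::clamp`` call in the example a safety net rather than a necessity.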
306+
307+
Combining task-level and data-level parallelism
308+
-----------------------------------------------
309+
310+
By parallelizing over image rows and using the ``par_simd`` execution policy,
311+
this example combines two forms of parallelism:
312+
313+
* **Task-level parallelism**, by processing different rows concurrently
314+
* **Data-level parallelism (SIMD)**, by vectorizing the computations within each
315+
thread
316+
317+
This allows the blur operation to scale efficiently on modern multicore CPUs
318+
with SIMD support, without requiring explicit SIMD intrinsics or manual thread
319+
management.
320+
321+
Summary
322+
=======
323+
324+
This example illustrates how |hpx| execution policies enable expressive and
325+
efficient SIMD-aware parallel programming. Real-world workloads such as image
326+
processing can be parallelized with minimal effort, while still allowing
327+
fine-grained control over execution behavior through policy selection.
