|
| 1 | +.. |
| 2 | + Copyright (C) 2025 Dimitra Karatza |
| 3 | +
|
| 4 | + SPDX-License-Identifier: BSL-1.0 |
| 5 | + Distributed under the Boost Software License, Version 1.0. (See accompanying |
| 6 | + file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| 7 | + |
| 8 | +.. _examples_image_blur: |
| 9 | + |
| 10 | +============================== |
| 11 | +Parallel SIMD image processing |
| 12 | +============================== |
| 13 | + |
| 14 | +Image processing is a classic example of a data-parallel workload: the same |
| 15 | +operation is applied independently to many pixels. This makes it a natural fit |
| 16 | +for SIMD vectorization and parallel execution. |
| 17 | + |
| 18 | +This example demonstrates how to use |hpx| parallel algorithms together with |
| 19 | +SIMD-enabled execution policies to apply a simple vertical blur filter to an |
| 20 | +image. The example combines **task-level parallelism** and **data-level |
| 21 | +parallelism** using the :cpp:var:`hpx::execution::par_simd` execution policy. |
| 22 | + |
| 23 | +.. _simd_blur_setup: |
| 24 | + |
| 25 | +Setup |
| 26 | +===== |
| 27 | + |
| 28 | +SIMD support must be enabled in your |hpx| build. Make sure |hpx| was configured with |
| 29 | +the Data Parallelism module enabled, for example: |
| 30 | + |
| 31 | +.. code-block:: shell-session |
| 32 | +
|
| 33 | + -DHPX_WITH_DATAPAR=On |
| 34 | + -DHPX_WITH_DATAPAR_BACKEND=STD_EXPERIMENTAL_SIMD |
| 35 | +
|
| 36 | +This example uses the EasyBMP library to load and store images. EasyBMP is a |
| 37 | +lightweight BMP image library that provides direct access to pixel data. |
| 38 | + |
| 39 | +Any image library and any input image can be used instead, as long as the |
| 40 | +library allows reading pixel values and writing the processed output back to |
| 41 | +an image file. |
| 42 | + |
| 43 | +The program reads an input image, applies a simple vertical blur filter, and |
| 44 | +writes the blurred result to an output image. Only the blur computation itself |
| 45 | +depends on |hpx|. |
| 46 | + |
| 47 | +.. _simd_blur_overview: |
| 48 | + |
| 49 | +Overview of the blur operation |
| 50 | +============================== |
| 51 | + |
| 52 | +The blur filter implemented in this example is a simple vertical 3-point |
| 53 | +stencil. For each pixel, a weighted average of the pixel itself and its |
| 54 | +vertical neighbors is computed: |
| 55 | + |
| 56 | +.. math:: |
| 57 | +
|
| 58 | + p_{out} = 0.25 \cdot p_{top} + 0.5 \cdot p_{center} + 0.25 \cdot p_{bottom} |
| 59 | +
|
| 60 | +This operation is applied independently to each color channel (red, green, and |
| 61 | +blue). Border pixels are excluded from the computation to avoid out-of-bounds |
| 62 | +memory accesses and are copied directly to the output image. |
| 63 | + |
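To make the formula concrete, the following is a minimal, library-independent sketch of the same stencil applied to a single-channel image stored in a plain ``std::vector<int>`` in row-major order. The function name ``blur_vertical`` and the flat storage layout are assumptions made for illustration; they are not part of the example program or of EasyBMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Vertical 3-point blur on a single-channel, row-major image.
// The first and last rows are left unchanged, as described above.
std::vector<int> blur_vertical(const std::vector<int>& in, int width, int height) {
    assert(width >= 0 && height >= 0);
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    assert(in.size() == w * h);

    std::vector<int> out(in);  // copy first so the border rows are preserved
    for (std::size_t y = 1; y + 1 < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            const float v = 0.25f * in[(y - 1) * w + x]
                          + 0.5f  * in[y * w + x]
                          + 0.25f * in[(y + 1) * w + x];
            out[y * w + x] = static_cast<int>(v);  // truncates toward zero
        }
    }
    return out;
}
```

For a one-column image with the values 0, 100, 200, 40, calling ``blur_vertical(column, 1, 4)`` yields 0, 100, 135, 40. The two border rows are unchanged, and the interior rows evaluate to ``0.25 * 0 + 0.5 * 100 + 0.25 * 200 = 100`` and ``0.25 * 100 + 0.5 * 200 + 0.25 * 40 = 135``.
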
| 64 | +Full example code |
| 65 | +================= |
| 66 | + |
| 67 | +The following listing shows the complete example. The image handling code may |
| 68 | +vary depending on the chosen image library, but the blur kernel and |hpx| usage |
| 69 | +remain the same. |
| 70 | + |
| 71 | +.. code-block:: c++ |
| 72 | + |
| 73 | + #include <hpx/hpx_main.hpp> |
| 74 | + #include <hpx/include/parallel_algorithm.hpp> |
| 75 | + #include "./include/EasyBMP.h" |
| 76 | + #include <vector> |
| 77 | + #include <iostream> |
| 78 | + #include <algorithm> |
| 79 | + |
| 80 | + using namespace easy_bmp; |
| 81 | + |
| 82 | + // Apply vertical 3-point blur to each RGB channel separately |
| 83 | + void blur_image(const BMP& input, BMP& output) { |
| 84 | + int width = input.TellWidth(); |
| 85 | + int height = input.TellHeight(); |
| 86 | + |
| 87 | + // Prepare rows to parallelize over (excluding borders) |
| 88 | + std::vector<int> rows(height - 2); |
| 89 | + for (int i = 0; i < height - 2; ++i) |
| 90 | + rows[i] = i + 1; |
| 91 | + |
| 92 | + // Perform the blur |
| 93 | + hpx::for_each( |
| 94 | + hpx::execution::par_simd, |
| 95 | + rows.begin(), rows.end(), |
| 96 | + [&](int y) { |
| 97 | + for (int x = 0; x < width; ++x) { |
| 98 | + const RGBApixel* top = input(x, y - 1); |
| 99 | + const RGBApixel* mid = input(x, y); |
| 100 | + const RGBApixel* bot = input(x, y + 1); |
| 101 | + |
| 102 | + auto blur_channel = [](int a, int b, int c) { |
| 103 | + return std::clamp(static_cast<int>(0.25f * a + 0.5f * b + 0.25f * c), 0, 255); |
| 104 | + }; |
| 105 | + |
| 106 | + output(x, y)->Red = blur_channel(top->Red, mid->Red, bot->Red); |
| 107 | + output(x, y)->Green = blur_channel(top->Green, mid->Green, bot->Green); |
| 108 | + output(x, y)->Blue = blur_channel(top->Blue, mid->Blue, bot->Blue); |
| 109 | + } |
| 110 | + } |
| 111 | + ); |
| 112 | + } |
| 113 | + |
| 114 | + int main() { |
| 115 | + BMP input; |
| 116 | + if (!input.ReadFromFile("image.bmp")) { |
| 117 | + std::cerr << "Could not open image.bmp\n"; |
| 118 | + return 1; |
| 119 | + } |
| 120 | + |
| 121 | + int width = input.TellWidth(); |
| 122 | + int height = input.TellHeight(); |
| 123 | + |
| 124 | + BMP output; |
| 125 | + output.SetSize(width, height); |
| 126 | + output.SetBitDepth(24); // RGB |
| 127 | + |
| 128 | + // Copy input to output to preserve borders |
| 129 | + for (int y = 0; y < height; ++y) |
| 130 | + for (int x = 0; x < width; ++x) |
| 131 | + *output(x, y) = *input(x, y); |
| 132 | +
|
| 133 | + blur_image(input, output); |
| 134 | + |
| 135 | + output.WriteToFile("blurred.bmp"); |
| 136 | + return 0; |
| 137 | + } |
| 138 | + |
| 139 | +Explanation |
| 140 | +=========== |
| 141 | + |
| 142 | +The first lines pull in the necessary |hpx| headers, the EasyBMP library, and |
| 143 | +standard headers such as ``<vector>``, ``<iostream>``, and ``<algorithm>``. |
| 144 | + |
| 145 | +Main function |
| 146 | +------------- |
| 147 | + |
| 148 | +The ``main`` function is responsible for loading the input image, preparing |
| 149 | +the output image, invoking the blur kernel, and writing the result. |
| 150 | + |
| 151 | +First, an input image object is created and populated from a file: |
| 152 | + |
| 153 | +.. code-block:: c++ |
| 154 | + |
| 155 | + BMP input; |
| 156 | + if (!input.ReadFromFile("image.bmp")) { |
| 157 | + std::cerr << "Could not open image.bmp\n"; |
| 158 | + return 1; |
| 159 | + } |
| 160 | + |
| 161 | +The call to ``ReadFromFile()`` attempts to load the image. If the file |
| 162 | +cannot be opened or read, the program prints an error message and exits. |
| 163 | + |
| 164 | +Next, the dimensions of the image are queried: |
| 165 | + |
| 166 | +.. code-block:: c++ |
| 167 | + |
| 168 | + int width = input.TellWidth(); |
| 169 | + int height = input.TellHeight(); |
| 170 | + |
| 171 | +These values are used to allocate an output image of the same size. The output |
| 172 | +image is configured to use a 24-bit RGB format: |
| 173 | + |
| 174 | +.. code-block:: c++ |
| 175 | + |
| 176 | + BMP output; |
| 177 | + output.SetSize(width, height); |
| 178 | + output.SetBitDepth(24); // RGB |
| 179 | + |
| 180 | +Before applying the blur, the input image is copied into the output image: |
| 181 | + |
| 182 | +.. code-block:: c++ |
| 183 | + |
| 184 | + for (int y = 0; y < height; ++y) |
| 185 | + for (int x = 0; x < width; ++x) |
| 186 | + *output(x, y) = *input(x, y); |
| 187 | +
|
| 188 | +This step preserves the border pixels, which are not modified by the blur |
| 189 | +operation. Copying the image up front avoids special-case handling for the |
| 190 | +image boundaries inside the blur kernel. |
| 191 | + |
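Because the blur overwrites the red, green, and blue values of every interior pixel, only the first and last rows strictly need to be copied for those channels. The sketch below shows that variant on a flat, row-major buffer. The ``Pixel`` struct and the ``copy_border_rows`` function are hypothetical stand-ins chosen for illustration, not EasyBMP types, and the full copy used in the example remains the simpler choice (it also carries over the alpha value, which the blur never writes).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pixel type; not the EasyBMP RGBApixel.
struct Pixel {
    unsigned char r, g, b;
};

// Copy only the top and bottom rows of a row-major image.
// Both buffers are assumed to hold width * height pixels.
void copy_border_rows(const std::vector<Pixel>& in, std::vector<Pixel>& out,
                      int width, int height) {
    if (width <= 0 || height <= 0)
        return;
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t last = static_cast<std::size_t>(height - 1) * w;
    for (std::size_t x = 0; x < w; ++x) {
        out[x] = in[x];                // top row
        out[last + x] = in[last + x];  // bottom row
    }
}
```
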
| 192 | +Finally, the blur kernel is invoked and the resulting image is written: |
| 193 | + |
| 194 | +.. code-block:: c++ |
| 195 | + |
| 196 | + blur_image(input, output); |
| 197 | + |
| 198 | + output.WriteToFile("blurred.bmp"); |
| 199 | + |
| 200 | +At this point, the output image contains the blurred result, which is saved as |
| 201 | +a new image file. |
| 202 | + |
| 203 | +Blur function |
| 204 | +------------- |
| 205 | + |
| 206 | +The blur computation is implemented in the ``blur_image`` function. The input |
| 207 | +image is passed as a read-only reference, while the output image is modified in |
| 208 | +place: |
| 209 | + |
| 210 | +.. code-block:: c++ |
| 211 | + |
| 212 | + void blur_image(const BMP& input, BMP& output) |
| 213 | + |
| 214 | +The function begins by querying the dimensions of the image: |
| 215 | + |
| 216 | +.. code-block:: c++ |
| 217 | + |
| 218 | + int width = input.TellWidth(); |
| 219 | + int height = input.TellHeight(); |
| 220 | + |
| 221 | +These values are used to determine the iteration bounds of the blur operation. |
| 222 | + |
| 223 | +Defining the iteration space |
| 224 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 225 | + |
| 226 | +|hpx| parallel algorithms operate over ranges. In this example, the blur is |
| 227 | +applied only to the interior rows of the image, excluding the first and last |
| 228 | +rows to avoid out-of-bounds accesses. |
| 229 | + |
| 230 | +A vector of row indices is constructed as follows: |
| 231 | + |
| 232 | +.. code-block:: c++ |
| 233 | + |
| 234 | + std::vector<int> rows(height - 2); |
| 235 | + for (int i = 0; i < height - 2; ++i) |
| 236 | + rows[i] = i + 1; |
| 237 | + |
| 238 | +Each element of the ``rows`` vector corresponds to a valid row index in the |
| 239 | +range ``[1, height - 2]``. This guarantees that for every processed pixel, both |
| 240 | +the pixel above and the pixel below exist. |
| 241 | + |
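The same index list can be built with ``std::iota``. The sketch below also guards against very small images: an image with fewer than three rows has no interior rows, and for ``height < 2`` the expression ``height - 2`` would be negative when used as a vector size. The helper name ``interior_rows`` is an assumption introduced here for illustration and does not appear in the example.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Build the list of interior row indices 1, 2, ..., height - 2.
// Returns an empty vector when the image has fewer than three rows,
// because no row then has a neighbor both above and below.
std::vector<int> interior_rows(int height) {
    if (height < 3)
        return {};
    std::vector<int> rows(static_cast<std::size_t>(height - 2));
    std::iota(rows.begin(), rows.end(), 1);  // fill with 1, 2, 3, ...
    return rows;
}
```

With an empty result, a loop over ``rows.begin()`` and ``rows.end()`` simply does nothing, so the caller needs no additional special case.
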
| 242 | +Parallel blur using ``par_simd`` |
| 243 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 244 | + |
| 245 | +The blur computation is expressed using :cpp:func:`hpx::for_each` with the |
| 246 | +:cpp:var:`hpx::execution::par_simd` execution policy: |
| 247 | + |
| 248 | +.. code-block:: c++ |
| 249 | + |
| 250 | + hpx::for_each( |
| 251 | + hpx::execution::par_simd, |
| 252 | + rows.begin(), rows.end(), |
| 253 | + [&](int y) { |
| 254 | + // process row y |
| 255 | + } |
| 256 | + ); |
| 257 | + |
| 258 | +The ``par_simd`` execution policy enables |hpx| to distribute iterations across |
| 259 | +multiple worker threads while also applying SIMD vectorization within each |
| 260 | +thread. Each iteration of the loop processes a single row of the image. |
| 261 | + |
| 262 | +Processing pixels within a row |
| 263 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 264 | + |
| 265 | +Inside the parallel loop, the code iterates over all columns of the current row: |
| 266 | + |
| 267 | +.. code-block:: c++ |
| 268 | + |
| 269 | + for (int x = 0; x < width; ++x) { |
| 270 | + |
| 271 | +For each pixel at position ``(x, y)``, three neighboring pixels are accessed: |
| 272 | + |
| 273 | +.. code-block:: c++ |
| 274 | + |
| 275 | + const RGBApixel* top = input(x, y - 1); |
| 276 | + const RGBApixel* mid = input(x, y); |
| 277 | + const RGBApixel* bot = input(x, y + 1); |
| 278 | + |
| 279 | +These correspond to the pixel above, the current pixel, and the pixel below. The |
| 280 | +blur is computed using these three values, forming a simple vertical 3-point |
| 281 | +stencil. |
| 282 | + |
| 283 | +Blurring individual color channels |
| 284 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 285 | + |
| 286 | +The blur operation is applied independently to each color channel (red, green, |
| 287 | +and blue). A small helper lambda computes a weighted average of three input |
| 288 | +values: |
| 289 | + |
| 290 | +.. code-block:: c++ |
| 291 | + |
| 292 | + auto blur_channel = [](int a, int b, int c) { |
| 293 | + return std::clamp( |
| 294 | + static_cast<int>(0.25f * a + 0.5f * b + 0.25f * c), 0, 255); |
| 295 | + }; |
| 296 | + |
| 297 | +This lambda implements the blur formula and clamps the result to the valid |
| 298 | +range ``[0, 255]`` of an 8-bit color channel. It is then used to update each |
| 299 | +color channel of the output image: |
| 300 | + |
| 301 | +.. code-block:: c++ |
| 302 | + |
| 303 | + output(x, y)->Red = blur_channel(top->Red, mid->Red, bot->Red); |
| 304 | + output(x, y)->Green = blur_channel(top->Green, mid->Green, bot->Green); |
| 305 | + output(x, y)->Blue = blur_channel(top->Blue, mid->Blue, bot->Blue); |
| 306 | + |
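Two details of this arithmetic are worth spelling out. First, the ``static_cast<int>`` truncates toward zero rather than rounding. Second, because the weights are non-negative and sum to one, the result for inputs in ``[0, 255]`` always stays in that range, so the clamp acts only as a safety net. The sketch below places the floating-point formula next to an integer-only variant that rounds to the nearest value. Both function names are hypothetical and used only for this comparison; the example itself uses the lambda shown above.

```cpp
// Floating-point formula as used in the example (without the clamp):
// the cast truncates the weighted average toward zero.
inline int blur_channel_float(int a, int b, int c) {
    return static_cast<int>(0.25f * a + 0.5f * b + 0.25f * c);
}

// Hypothetical integer-only alternative: (a + 2b + c) / 4,
// with +2 added before the division so the result is rounded to nearest.
inline int blur_channel_rounded(int a, int b, int c) {
    return (a + 2 * b + c + 2) / 4;
}
```

For the inputs ``(0, 1, 1)`` the exact average is 0.75, so the floating-point variant yields 0 while the rounding variant yields 1. For ``(255, 255, 255)`` both yield 255.
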
| 307 | +Combining task-level and data-level parallelism |
| 308 | +----------------------------------------------- |
| 309 | + |
| 310 | +By parallelizing over image rows and using the ``par_simd`` execution policy, |
| 311 | +this example combines two forms of parallelism: |
| 312 | + |
| 313 | +* **Task-level parallelism**, by processing different rows concurrently |
| 314 | +* **Data-level parallelism (SIMD)**, by vectorizing the computations within each |
| 315 | + thread |
| 316 | + |
| 317 | +This lets the blur operation use multiple cores and the SIMD units of each |
| 318 | +core, without requiring explicit SIMD intrinsics or manual thread |
| 319 | +management. |
| 320 | + |
| 321 | +Summary |
| 322 | +======= |
| 323 | + |
| 324 | +This example illustrates how |hpx| execution policies enable expressive and |
| 325 | +efficient SIMD-aware parallel programming. Real-world workloads such as image |
| 326 | +processing can be parallelized with minimal effort, while still allowing |
| 327 | +fine-grained control over execution behavior through policy selection. |