LAHVA - Linear Algebra on Heterogenous/Vectorized Architecture

Motivation & Purpose

The motivation behind the LAHVA project is to create a commodity layer that enables faster and more user-friendly interaction with heterogeneous computing hardware. The APIs of BLAS and LAPACK libraries are often cumbersome, especially on accelerator hardware such as GPUs. Because of the complexity of the hardware and of the communication between host and device (the accelerator), additional objects are needed to control the execution of linear algebra operations. LAHVA simplifies the API by bundling these objects in a runtime.

Additionally, we have implemented tensor classes for vectors, matrices and lower triangular matrices. Since the memory spaces of host and device are usually separate, the library must also handle data transfers and address the correct memory space in each function. We therefore chose a design in which a tensor object can hold two pointers, one to host memory and one to device memory. Each is allocated by its own allocator, and the GPU allocators also implement the transfers between host and device.

The project relies heavily on template parameters, both for the numeric precision and for executing a function on either the host or the device, which is selected merely by changing the runtime used.

The iceberg graphic illustrates the motivation of this project: simplifying the interface between LAHVA and a vendor BLAS library such as NVIDIA's cuBLAS.

Compatibility

We test the implementation on combinations of the following operating systems, compilers and BLAS/LAPACK implementations, with and without GPU support:

Operating System | Compiler              | CPU BLAS/LAPACK       | CUDA
Ubuntu 20.04     | Intel oneAPI 2023.2.0 | Intel oneMKL 2023.2.0 | 11.8
                 | gcc-9                 | OpenBLAS              |

Build systems: meson (v1.4.0), CMake (> 3.18)
Build generators: ninja, make

We also provide Apptainer recipes for building and deployment. You can find them in the apptainer_recipe subfolder.

Compile from source

Currently, we support both the meson and CMake build systems.

Meson build

LAHVA can be compiled with or without GPU support (default: with GPU support; NVIDIA GPUs only).
This is controlled by the gpu option, set either on the command line (-Dgpu=true or -Dgpu=false) or in the meson_options.txt file.

GPU build

If you plan to use an NVIDIA GPU, you need the compute capability (cc) of your GPU, or of every GPU the software should be deployed to. One resource for looking up this value is TechPowerUp: search for your hardware and find the compute capability under Graphics Features, then CUDA. The cc value is given with a dot between the two digits (e.g. 8.6). When setting gpu_arch in meson_options.txt, remove the dot (e.g. 86) and add the cc values of all cards the program will run on to the array.

Next, make sure that meson can find your CUDA installation. The easiest way is to set the CUDA_ROOT environment variable, which meson uses. It should point to the root of your CUDA installation, e.g. /mnt/group-lib/nvidia-hpc-sdk/Linux_x86_64/24.7/cuda/11.8/.

If CUDA is installed in a non-standard location, for example on a shared HPC system, it may also be necessary to set the paths to libcudart (the CUDA runtime library) and to other libraries such as libcublas or libcusolver. You can do this by setting or extending the LIBRARY_PATH environment variable, for example:

export LIBRARY_PATH=/mnt/group-lib/nvidia-hpc-sdk/Linux_x86_64/24.7/cuda/11.8/lib64:$LIBRARY_PATH

Finally, if several nvcc versions are installed, it can help to set the path to nvcc explicitly as well.

With the compile environment set up for GPU compilation, we can set up the meson build. The optional lapack argument selects the LAPACK vendor (options: mkl, openblas; default: auto):

meson setup _build -Dgpu=true [-Dlapack=mkl|openblas]

After the setup, we can compile LAHVA like so:

meson compile -C _build

Lastly, we can test the library using the provided unit tests.

meson test -C _build

Use as subproject in other projects

A common use case is to include LAHVA as a subproject in other projects to reuse its tensor classes and BLAS interface.

With meson this is straightforward: add the following dependency to your meson.build file. If no installed lahva is found, the fallback entry makes meson build the lahva subproject (from your subprojects directory) and use its lahva_dep variable.

lahva_dep = dependency(
  'lahva',
  version: '>=0.0.0',
  fallback: ['lahva', 'lahva_dep'],
  default_options: ['default_library=static'],
)

Usage

LAHVA provides implementations of three kinds of tensor classes:

Vector (1D-Tensor)
Matrix (2D-Tensor)
Lower Triangular Matrix [LowTriMatrix] (symmetric 2D Tensor, in packed mode)
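Packed mode stores only the n(n+1)/2 entries on and below the diagonal instead of all n*n entries. The following is a minimal sketch, not LAHVA's code, of one common way to map an element (i, j) with j <= i into such a buffer, assuming row-major packing; LowTriMatrix may use a different ordering (LAPACK's packed format for uplo = 'L', for instance, is column-major). The class name PackedSymmetric is hypothetical. Because the matrix is symmetric, an access with j > i can simply swap the indices.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of stored elements of an n x n lower triangle, diagonal included.
constexpr std::size_t packed_size(std::size_t n) { return n * (n + 1) / 2; }

// Row-major packed offset of (i, j) with j <= i: rows 0..i-1 contain
// 1 + 2 + ... + i = i(i+1)/2 elements, then j more elements in row i.
constexpr std::size_t packed_index(std::size_t i, std::size_t j) {
    return i * (i + 1) / 2 + j;
}

// Hypothetical minimal symmetric matrix in packed storage.
class PackedSymmetric {
  public:
    explicit PackedSymmetric(std::size_t n, double init = 0.0)
        : n_(n), data_(packed_size(n), init) {}

    // (i, j) and (j, i) refer to the same stored value.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < n_ && j < n_);
        if (j > i) std::swap(i, j);
        return data_[packed_index(i, j)];
    }

    std::size_t storage_size() const { return data_.size(); }

  private:
    std::size_t n_;
    std::vector<double> data_;
};
```

For a 5 x 5 matrix this layout stores 15 values instead of 25.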

In line with the purpose of this library, these classes are available in a CPU-only version and in a combined GPU and CPU version. Like the standard library containers, they are available for various numerical precisions (e.g. int, float, double) via template parameters. Additionally, as with std::vector, allocators are used to allocate and deallocate memory for the containers. The allocator template parameter is optional.
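The optional allocator works through a defaulted template parameter, the same mechanism std::vector uses. The sketch below illustrates that pattern only and assumes nothing about LAHVA's internals; SketchVector and CountingAllocator are hypothetical names. Its (size, value) constructor mirrors the Vector<double> p(5, 2.0) calls shown further down.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical 1D tensor: the precision T and the allocator Alloc are template
// parameters, and Alloc defaults to std::allocator<T>, just like std::vector.
template <typename T, typename Alloc = std::allocator<T>>
class SketchVector {
    using Traits = std::allocator_traits<Alloc>;

  public:
    SketchVector(std::size_t n, T value, const Alloc& alloc = Alloc())
        : alloc_(alloc), size_(n), data_(Traits::allocate(alloc_, n)) {
        for (std::size_t i = 0; i < n; ++i) Traits::construct(alloc_, data_ + i, value);
    }
    ~SketchVector() {
        for (std::size_t i = 0; i < size_; ++i) Traits::destroy(alloc_, data_ + i);
        Traits::deallocate(alloc_, data_, size_);
    }
    // Copy and move are omitted to keep the sketch short.
    SketchVector(const SketchVector&) = delete;
    SketchVector& operator=(const SketchVector&) = delete;

    T& operator[](std::size_t i) {
        assert(i < size_);  // bounds check in debug builds
        return data_[i];
    }
    std::size_t size() const { return size_; }

  private:
    Alloc alloc_;
    std::size_t size_;
    typename Traits::pointer data_;
};

// A user-supplied allocator that counts its allocations (illustration only).
template <typename T>
struct CountingAllocator {
    using value_type = T;
    static inline int allocations = 0;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++allocations;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};
```

With this pattern, SketchVector<double> v(5, 2.0) uses the default allocator, while SketchVector<float, CountingAllocator<float>> swaps in a custom one without touching any other code.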

In general, there are two ways to use the provided tensor classes: a) statically, i.e. by using one namespace throughout; this works best outside of classes. b) polymorphically, where the concrete tensor type is selected at compile time through a template parameter of a user-defined class template.

Setup Tensor classes

Static fashion

For CPU-only tensor classes, we include the linalg.hpp header, which defines the tensor classes, and then use the namespace lahva::cpu.

#include <linalg.hpp>

lahva::cpu::Vector<double> p(5, 2.0);

using namespace lahva::cpu;
// construct a 5 by 5 matrix, using the Shape struct and initializing the values to 1.0
Matrix<float> s(Shape(5, 5), 1.0);

For tensor classes that also support GPUs, we include the same header but use the namespace lahva::gpu. Unlike CPU tensors, GPU tensors rely on two allocators: one for the host (CPU) memory space and one for the device (GPU) memory space, which also handles memory transfers between host and device.

#include <linalg.hpp>

lahva::gpu::Vector<double> p(5, 2.0);

using namespace lahva::gpu;
// as for the CPU Matrix, we construct a square 5 x 5 matrix;
// here the allocator template parameters are given explicitly instead of relying on the defaults
Matrix<float, CudaHostAllocator<float>, CudaDeviceAsyncAllocator<float>> s(5, 1.0);
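The two-allocator design, where a tensor owns a host pointer and a device pointer and the device allocator also performs the transfers, can be sketched as follows. This is an illustration under stated assumptions, not LAHVA's implementation: it does not use CUDA, and MockDeviceAllocator uses ordinary host memory as a stand-in for device memory so the example runs anywhere. The names DualVector, MockDeviceAllocator, copy_to_device, copy_to_host, to_device and to_host are hypothetical. A real CUDA device allocator such as CudaDeviceAsyncAllocator would presumably wrap runtime calls like cudaMallocAsync, cudaFreeAsync and cudaMemcpyAsync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>

// Stand-in for a device allocator. A CUDA version would allocate GPU memory and
// copy with the CUDA runtime; here "device" memory is ordinary host memory.
template <typename T>
struct MockDeviceAllocator {
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }

    // The device allocator also owns the transfers between the memory spaces.
    void copy_to_device(T* dev, const T* host, std::size_t n) {
        std::memcpy(dev, host, n * sizeof(T));
    }
    void copy_to_host(T* host, const T* dev, std::size_t n) {
        std::memcpy(host, dev, n * sizeof(T));
    }
};

// Hypothetical tensor holding one pointer per memory space.
template <typename T,
          typename HostAlloc = std::allocator<T>,
          typename DeviceAlloc = MockDeviceAllocator<T>>
class DualVector {
    static_assert(std::is_trivially_copyable_v<T>, "sketch assumes numeric element types");

  public:
    DualVector(std::size_t n, T value)
        : size_(n), host_(host_alloc_.allocate(n)), device_(device_alloc_.allocate(n)) {
        std::uninitialized_fill_n(host_, n, value);  // initial values live on the host
    }
    ~DualVector() {
        host_alloc_.deallocate(host_, size_);
        device_alloc_.deallocate(device_, size_);
    }
    DualVector(const DualVector&) = delete;
    DualVector& operator=(const DualVector&) = delete;

    void to_device() { device_alloc_.copy_to_device(device_, host_, size_); }
    void to_host() { device_alloc_.copy_to_host(host_, device_, size_); }

    T& operator[](std::size_t i) {  // host-side element access
        assert(i < size_);
        return host_[i];
    }
    T* host_data() { return host_; }
    T* device_data() { return device_; }  // what a GPU BLAS call would receive
    std::size_t size() const { return size_; }

  private:
    HostAlloc host_alloc_;
    DeviceAlloc device_alloc_;
    std::size_t size_;
    T* host_;
    T* device_;
};
```

In this sketch a CPU BLAS routine would receive host_data() and a GPU routine device_data(); keeping both pointers in one object lets a runtime pick the memory space without the caller passing extra pointers around.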

Polymorphic fashion

To switch between CPU and GPU tensors in a polymorphic fashion, a few additional components come in handy. We provide an example of this infrastructure in example/lahva_wrap.hpp, which extends the namespaces lahva::cpu and lahva::gpu with empty structs. These structs serve as template parameters and markers (tags) that steer the compiler to the appropriate functions and classes from the CPU or GPU namespace. In an application, we include lahva_wrap.hpp and implement our TestClass as follows.

In testclass.hpp:

#include "example/lahva_wrap.hpp"

using namespace lahva::cpu;
using namespace lahva::gpu;

template<typename blas_impl>
class TestClass
{
  public:
    template<typename U>
    using Vector = typename TensorFactory<blas_impl>::template Vector<U>;
    template<typename U>
    using Matrix = typename TensorFactory<blas_impl>::template Matrix<U>;   
    template<typename U>
    using LowTriMatrix = typename TensorFactory<blas_impl>::template LowTriMatrix<U>;
  private:
    Vector<double> vec1;
    Matrix<float> mat2;
    LowTriMatrix<int> low3;
};

For example, in an impl.cpp file we can instantiate both a CPU-only and a GPU-enabled version of the class:

#include "testclass.hpp"

TestClass<cpuBLAS> test_cpu;
TestClass<gpuBLAS> test_gpu;

To see this design choice in practice, have a look at GAMBITS, one of our libraries built on LAHVA.
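The contents of lahva_wrap.hpp are not reproduced here, so the following is a self-contained guess at how a TensorFactory keyed on empty tag structs can work. It uses a mock namespace demo with trivial Vector types in place of LAHVA's classes; everything inside demo is hypothetical. Each tag selects a specialization of TensorFactory at compile time, so TestClass<cpuBLAS> and TestClass<gpuBLAS> end up with different member types without any runtime dispatch.

```cpp
#include <type_traits>
#include <vector>

namespace demo {
namespace cpu {
struct cpuBLAS {};  // empty tag type marking the CPU backend
template <typename T> struct Vector { std::vector<T> data; };
}  // namespace cpu
namespace gpu {
struct gpuBLAS {};  // empty tag type marking the GPU backend
template <typename T> struct Vector { std::vector<T> host; std::vector<T> device; };
}  // namespace gpu

// Primary template left undefined: an unknown tag is a compile-time error.
template <typename BlasImpl> struct TensorFactory;

template <> struct TensorFactory<cpu::cpuBLAS> {
    template <typename U> using Vector = cpu::Vector<U>;
};
template <> struct TensorFactory<gpu::gpuBLAS> {
    template <typename U> using Vector = gpu::Vector<U>;
};
}  // namespace demo

template <typename blas_impl>
class TestClass {
  public:
    template <typename U>
    using Vector = typename demo::TensorFactory<blas_impl>::template Vector<U>;
    Vector<double> vec1;
};

// The backend is fixed at compile time by the tag type.
static_assert(std::is_same_v<decltype(TestClass<demo::cpu::cpuBLAS>::vec1),
                             demo::cpu::Vector<double>>);
static_assert(std::is_same_v<decltype(TestClass<demo::gpu::gpuBLAS>::vec1),
                             demo::gpu::Vector<double>>);
```

Leaving the primary TensorFactory undefined is one possible design choice: requesting an unsupported backend then fails at compile time instead of silently falling back to another one.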

Support

Please open an issue in this GitLab repo, so we can help you out.

Roadmap

If you have ideas for releases in the future, it is a good idea to list them in the README.

Authors and acknowledgment

Original author: Pit Steinbach
with contributions from: Mark Heezen
under the supervision of: Christoph Bannwarth

License

For open source projects, say how it is licensed.

Project status

This project is still experimental. Although we are committed to keeping the API stable, changes may occur.
