The most expensive procedures in Krylov methods are matrix-vector products (y ← Ax) and vector operations (dot products, vector norms, y ← αx + βy). They therefore directly affect the efficiency of the methods provided by `Krylov.jl`. In this section, we present various optimizations based on multithreading to speed up these procedures. Multithreading is a form of shared-memory parallelism that uses multiple cores to perform tasks.

## Multi-threaded BLAS and LAPACK operations

```
OPENBLAS_NUM_THREADS=N julia # Set the number of OpenBLAS threads to N
MKL_NUM_THREADS=N julia # Set the number of MKL threads to N
```

If you don't know the maximum number of threads available on your computer, you can obtain it with

`NMAX = Sys.CPU_THREADS`

and define the number of OpenBLAS/MKL threads at runtime with

```
BLAS.set_num_threads(N) # 1 ≤ N ≤ NMAX
BLAS.get_num_threads()
```

The recommended number of BLAS threads is the number of physical cores, not logical cores. If your CPU supports simultaneous multithreading (SMT), this is generally `N = NMAX / 2`.
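As an illustration, this recommendation can be applied directly; the sketch below assumes an SMT-enabled CPU, where the logical core count is twice the physical one:

```
using LinearAlgebra

NMAX = Sys.CPU_THREADS                   # number of logical cores
BLAS.set_num_threads(max(1, NMAX ÷ 2))   # physical cores on an SMT-enabled CPU
BLAS.get_num_threads()
```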

By default, Julia ships with OpenBLAS, but it is also possible to use Intel MKL BLAS and LAPACK with MKL.jl.

```
using LinearAlgebra
BLAS.vendor() # get_config() for Julia ≥ 1.7
```
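The active backend can also be inspected programmatically. The sketch below assumes Julia ≥ 1.7, where `BLAS.get_config()` lists the loaded BLAS/LAPACK libraries; after `using MKL`, it would report MKL instead of OpenBLAS:

```
using LinearAlgebra

cfg = BLAS.get_config()       # Julia ≥ 1.7
for lib in cfg.loaded_libs
    println(lib.libname)      # path of each loaded BLAS/LAPACK library
end
```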

## Multi-threaded sparse matrix-vector products

For sparse matrices, the Julia implementation of `mul!` in the SparseArrays library is not parallelized. A significant speed-up can be observed with the multithreaded `mul!` of MKLSparse.jl.

It is also possible to implement a generic multithreaded Julia version. For instance, the following function can be used for symmetric matrices

```
using Base.Threads, SparseArrays

function threaded_mul!(y::Vector{T}, A::SparseMatrixCSC{T}, x::Vector{T}) where T <: Number
    A.m == A.n || error("A is not a square matrix!")
    @threads for i = 1 : A.n
        tmp = zero(T)
        @inbounds for j = A.colptr[i] : (A.colptr[i+1] - 1)
            tmp += A.nzval[j] * x[A.rowval[j]]
        end
        @inbounds y[i] = tmp
    end
    return y
end
```
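As a quick sanity check (assuming `threaded_mul!` is defined as above), the result can be compared against the built-in product on a random symmetric matrix:

```
using LinearAlgebra, SparseArrays

A = sprandn(100, 100, 0.1)
A = A + A'              # make the matrix symmetric
x = randn(100)
y = zeros(100)
threaded_mul!(y, A, x)
y ≈ A * x               # agrees with the built-in product
```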

and wrapped inside a linear operator to solve symmetric linear systems

```
using LinearOperators
n, m = size(A)
sym = herm = true
T = eltype(A)
opA = LinearOperator(T, n, m, sym, herm, (y, v) -> threaded_mul!(y, A, v))
```
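Such an operator can then be passed directly to a solver. The sketch below assumes a small symmetric positive-definite matrix and uses `cg` from Krylov.jl; any product routine, such as the `threaded_mul!` above, could be wrapped in the operator in place of `mul!`:

```
using LinearAlgebra, SparseArrays, LinearOperators, Krylov

A = sparse([4.0 1.0; 1.0 3.0])    # small SPD example matrix
n, m = size(A)
opA = LinearOperator(Float64, n, m, true, true, (y, v) -> mul!(y, A, v))

b = [1.0, 2.0]
x, stats = cg(opA, b)             # conjugate gradient solver
stats.solved
```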

To enable multithreading in Julia itself, you can start julia with the environment variable `JULIA_NUM_THREADS` set, or with the `-t` / `--threads` option

```
julia -t auto # alternative: --threads auto
julia -t N # alternative: --threads N
JULIA_NUM_THREADS=N julia
```

Thereafter, you can verify the number of threads available to Julia with

```
using Base.Threads
nthreads()
```

The following benchmarks illustrate the time required in seconds to compute 1000 sparse matrix-vector products with symmetric matrices of the SuiteSparse Matrix Collection. The computer used for the benchmarks has 2 physical cores and Julia was launched with `JULIA_NUM_THREADS=2`.
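A similar measurement can be reproduced with a sketch like the following, which uses a random symmetric matrix as a stand-in for the SuiteSparse Collection matrices:

```
using LinearAlgebra, SparseArrays

A = sprandn(1000, 1000, 0.01)
A = A + A'                # random symmetric sparse matrix
x = randn(1000)
y = zeros(1000)

t = @elapsed for _ in 1:1000
    mul!(y, A, x)         # 1000 sparse matrix-vector products
end
println(t)                # elapsed time in seconds
```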