Important default behaviour

Environment variables are cleared before a library benchmark runs. See the Configuration section if you need to change this behaviour. Gungraun also deviates from some of the valgrind defaults:

Gungraun                 Valgrind (v3.23)
--trace-children=yes     --trace-children=no
--fair-sched=try         --fair-sched=no
--separate-threads=yes   --separate-threads=no
--cache-sim=yes          --cache-sim=no

The thread- and subprocess-specific valgrind options enable basic tracing of threads and subprocesses, but additional configuration is usually necessary to collect the metrics of threads and subprocesses.

As shown in the table above, benchmarks run with cache simulation switched on, which adds run time. If you don't need the cache metrics and the estimation of cycles, you can switch cache simulation off, for example with:

use gungraun::{LibraryBenchmarkConfig, Callgrind};

fn main() {
    // Pass --cache-sim=no to callgrind to switch off cache simulation
    LibraryBenchmarkConfig::default().tool(Callgrind::with_args(["--cache-sim=no"]));
}

To switch off cache simulation for all benchmarks in the same file:

// Stand-in for the library function under test
mod my_lib { pub fn fibonacci(a: u64) -> u64 { a } }
use gungraun::{
    main, library_benchmark_group, library_benchmark, LibraryBenchmarkConfig,
    Callgrind
};
use std::hint::black_box;

#[library_benchmark]
fn bench_fibonacci() -> u64 {
    black_box(my_lib::fibonacci(10))
}

library_benchmark_group!(name = fibonacci_group; benchmarks = bench_fibonacci);

// The `main!` macro generates the `main` function, so invoke it at the top level
main!(
    config = LibraryBenchmarkConfig::default()
        .tool(Callgrind::with_args(["--cache-sim=no"]));
    library_benchmark_groups = fibonacci_group
);

With cache simulation switched on (the default), Gungraun reports the cache hits and an estimation of CPU cycles:

test_lib_bench_readme_example_fibonacci::bench_fibonacci_group::bench_fibonacci short:10
  Instructions:                        1734|1734                 (No change)
  L1 Hits:                             2359|2359                 (No change)
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               3|3                    (No change)
  Total read+write:                    2362|2362                 (No change)
  Estimated Cycles:                    2464|2464                 (No change)

Gungraun result: Ok. 1 without regressions; 0 regressed; 0 filtered; 1 benchmarks finished in 0.49333s
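
The figures in this output are consistent with one another, and this can be checked by hand. The sketch below assumes the weighting commonly used for callgrind-based cycle estimates (1 cycle per L1 hit, 5 per LL hit, 35 per RAM hit). The formula is not stated here, so treat the weights and the hypothetical helpers `estimated_cycles` and `total_read_write` as assumptions rather than as Gungraun's API.

```rust
/// Hypothetical helper, not part of Gungraun: estimate CPU cycles from
/// cache-hit counts using assumed weights of 1 (L1), 5 (LL) and 35 (RAM).
fn estimated_cycles(l1_hits: u64, ll_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * ll_hits + 35 * ram_hits
}

/// Hypothetical helper: every access is served by exactly one level,
/// so the total is the sum of the hits at all levels.
fn total_read_write(l1_hits: u64, ll_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + ll_hits + ram_hits
}

fn main() {
    // Hit counts copied from the benchmark output above.
    let (l1, ll, ram) = (2359, 0, 3);
    println!("Total read+write: {}", total_read_write(l1, ll, ram));
    println!("Estimated Cycles: {}", estimated_cycles(l1, ll, ram));
}
```

Under these assumptions the three RAM hits account for 105 of the 2464 estimated cycles, which shows how strongly a handful of slow accesses can weigh on the estimate.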

If you prefer cache misses over cache hits, or just want both metrics displayed, you can fully customize the callgrind output format.
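
As a rough illustration of how the two views relate, the sketch below derives miss counts from the hit counts shown in the output above. It assumes a simple hierarchy in which every access that misses L1 is served either by the last-level cache or by RAM. The `CacheSummary` type and its methods are hypothetical and are not part of Gungraun.

```rust
/// Hypothetical summary of the hit counts printed by a benchmark run.
struct CacheSummary {
    l1_hits: u64,
    ll_hits: u64,
    ram_hits: u64,
}

impl CacheSummary {
    /// An access that missed L1 was served by the LL cache or by RAM.
    fn l1_misses(&self) -> u64 {
        self.ll_hits + self.ram_hits
    }

    /// An access that missed the LL cache was served by RAM.
    fn ll_misses(&self) -> u64 {
        self.ram_hits
    }
}

fn main() {
    // Hit counts copied from the benchmark output above.
    let summary = CacheSummary { l1_hits: 2359, ll_hits: 0, ram_hits: 3 };
    println!("L1 Hits:   {}", summary.l1_hits);
    println!("L1 Misses: {}", summary.l1_misses());
    println!("LL Misses: {}", summary.ll_misses());
}
```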