v0.18.0


Summary

This release marks a significant step forward in performance, reliability, and optimization. We've expanded our CI testing suite to catch multi-threading, lazy-evaluation, and async-execution issues, ensuring robust behavior across a growing number of supported platforms.

Matrix Multiplication Improvements

Optimized matrix multiplication kernels with specialized implementations for:

  • Matrix-vector (mat@vec)
  • Vector-matrix (vec@mat)
  • Inner product
  • Outer product

We also made the matrix multiplication kernel generation engine more flexible, going beyond traditional GEMM (General Matrix Multiply) approaches.

For more details, including performance benchmarks, check out our state-of-the-art multiplatform matrix multiplication post.
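
As an illustration (the shapes and backend choice below are ours, not from the release), these specialized paths are reached through the ordinary matmul call; the kernel dispatch happens internally based on operand shapes. A minimal sketch:

use burn::backend::Wgpu;
use burn::tensor::Tensor;

type B = Wgpu;

fn main() {
    let device = Default::default();

    // Plain GEMM: [m, k] @ [k, n]
    let a = Tensor::<B, 2>::ones([64, 128], &device);
    let b = Tensor::<B, 2>::ones([128, 32], &device);
    let _gemm = a.matmul(b);

    // Matrix-vector (mat@vec): [m, k] @ [k, 1]
    let m = Tensor::<B, 2>::ones([64, 128], &device);
    let v = Tensor::<B, 2>::ones([128, 1], &device);
    let _matvec = m.matmul(v);

    // Vector-matrix (vec@mat): [1, k] @ [k, n]
    let v = Tensor::<B, 2>::ones([1, 128], &device);
    let m = Tensor::<B, 2>::ones([128, 32], &device);
    let _vecmat = v.matmul(m);

    // Inner product: [1, k] @ [k, 1] -> [1, 1]
    let _inner = Tensor::<B, 2>::ones([1, 128], &device)
        .matmul(Tensor::<B, 2>::ones([128, 1], &device));

    // Outer product: [m, 1] @ [1, n] -> [m, n]
    let _outer = Tensor::<B, 2>::ones([64, 1], &device)
        .matmul(Tensor::<B, 2>::ones([1, 32], &device));
}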

Fusion Enhancements

  • Improved the reliability and performance of Burn Fusion through more advanced optimizations.
  • Added support for basic dead code elimination.
  • Introduced a new search engine that reorders operations to maximize optimization opportunities, making fusion resilient to the order in which tensor operations are registered (see the sketch after this list).
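
To make these concrete, here is a hedged sketch (our own example, assuming the Wgpu backend with its default fusion feature): the element-wise chain below is eligible for fusion into a single kernel when materialized, and the never-read tensor is the kind of work that basic dead code elimination can skip.

use burn::backend::Wgpu;
use burn::tensor::Tensor;

type B = Wgpu;

fn main() {
    let device = Default::default();
    let x = Tensor::<B, 2>::ones([512, 512], &device);
    let y = Tensor::<B, 2>::ones([512, 512], &device);

    // Registered lazily; eligible for fusion into one kernel.
    let z = (x.clone() + y.clone()).exp() * x;

    // Never read afterwards: a candidate for dead code elimination.
    let _unused = y * 2.0;

    // Materializing `z` triggers execution of the (fused) stream.
    println!("{}", z);
}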

Multi-Threading and Memory Management

  • Resolved critical multi-threading issues by adopting a new approach to supporting multiple concurrent streams (illustrated after this list).
  • Burn Fusion's lazy evaluation of registered operations across concurrent streams places greater demands on memory management. To address this, we:
    • Added a robust memory leak test to our CI pipeline that verifies the runtime's internal state, ensuring all handles and concurrent streams are properly cleaned up in every test case.
    • Fixed bugs related to premature memory deallocation, improving memory management stability.
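
As a sketch of the workload this targets (our own example, not from the release), several threads can now safely record independent operation streams against the same device:

use std::thread;

use burn::backend::Wgpu;
use burn::tensor::Tensor;

type B = Wgpu;

fn main() {
    let device = Default::default();

    // Each thread records its own lazy stream of operations; the runtime
    // keeps the streams isolated and cleans up their handles on drop.
    let workers: Vec<_> = (0..4)
        .map(|i| {
            let device = device.clone();
            thread::spawn(move || {
                let x = Tensor::<B, 2>::ones([256, 256], &device);
                (x * (i as f32 + 1.0)).sum().into_scalar()
            })
        })
        .collect();

    for worker in workers {
        println!("{}", worker.join().unwrap());
    }
}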

CubeCL Config

By default, CubeCL loads its configuration from a TOML file (cubecl.toml or CubeCL.toml) located in your current directory or any parent directory. If no configuration file is found, CubeCL falls back to sensible defaults.

A typical cubecl.toml file might look like this:

[profiling]
logger = { level = "basic", stdout = true }

[autotune]
level = "balanced"
logger = { level = "minimal", stdout = true }

[compilation]
logger = { level = "basic", file = "cubecl.log", append = true }

Each section configures a different aspect of CubeCL:

  • profiling: Controls performance profiling and logging.
  • autotune: Configures the autotuning system, which benchmarks and selects optimal kernel parameters.
  • compilation: Manages kernel compilation logging and cache.

For more info, check out the CubeCL book.

As with previous releases, this version includes various bug fixes, many internal optimizations, and backend upgrades that reinforce the framework's performance and flexibility across platforms.

Changelog

Breaking: the default stride(s) for pooling modules now match the kernel size instead of defaulting to 1. This changes output shapes wherever strides were not set explicitly. To keep the previous behavior, add the stride setting shown in the + lines below:

MaxPool2dConfig
let pool = MaxPool2dConfig::new(kernel_size)
+   .with_strides([1, 1])
    .with_padding(PaddingConfig2d::Same)
    .init();
MaxPool1dConfig
let pool = MaxPool1dConfig::new(kernel_size)
+   .with_stride(1)
    .with_padding(PaddingConfig1d::Same)
    .init();
AvgPool2dConfig
let pool = AvgPool2dConfig::new(kernel_size)
+   .with_strides([1, 1])
    .with_padding(PaddingConfig2d::Same)
    .init();
AvgPool1dConfig
let pool = AvgPool1dConfig::new(kernel_size)
+   .with_stride(1)
    .with_padding(PaddingConfig1d::Same)
    .init();
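
For a concrete sense of the change (an illustrative example of ours, using the NdArray backend): with kernel_size = 2 and no padding, a length-8 input now yields a length-4 output (stride 2) where it previously yielded length 7 (stride 1).

use burn::backend::NdArray;
use burn::nn::pool::MaxPool1dConfig;
use burn::tensor::Tensor;

type B = NdArray;

fn main() {
    let device = Default::default();
    // [batch, channels, length]
    let input = Tensor::<B, 3>::ones([1, 1, 8], &device);

    // New default: stride = kernel_size = 2 -> output length 4.
    let pool = MaxPool1dConfig::new(2).init();
    println!("{:?}", pool.forward(input.clone()).dims()); // [1, 1, 4]

    // Previous behavior restored explicitly: stride = 1 -> output length 7.
    let pool = MaxPool1dConfig::new(2).with_stride(1).init();
    println!("{:?}", pool.forward(input).dims()); // [1, 1, 7]
}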
