Benchmark Utility in PyLops-MPI
===============================

PyLops-MPI users can conveniently benchmark the performance of their code with a simple decorator.
:py:func:`pylops_mpi.utils.benchmark` and :py:func:`pylops_mpi.utils.mark` support various
function calling patterns that may arise when benchmarking distributed code.

- :py:func:`pylops_mpi.utils.benchmark` is a **decorator** used to time the execution of entire functions.
- :py:func:`pylops_mpi.utils.mark` is a **function** used inside decorated functions to insert fine-grained time measurements.

.. note::
   The benchmark utility is enabled by default, i.e., if the user decorates a function with
   :py:func:`benchmark`, every call goes through the time measurements, adding some overhead.
   Users can turn off benchmarking while leaving the decorator in place with

   .. code-block:: bash

      export BENCH_PYLOPS_MPI=0

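To see why the disabled path adds no overhead, here is a minimal sketch of an environment-variable-gated timing decorator in plain Python. This is a hypothetical illustration, not PyLops-MPI's actual implementation: the name ``benchmark_sketch`` and the decoration-time check are assumptions.

```python
import os
import time
from functools import wraps


def benchmark_sketch(func):
    # Hypothetical sketch: NOT the actual pylops_mpi.utils.benchmark.
    # If benchmarking is disabled, return the original function untouched,
    # so the decorated call carries no timing overhead at all.
    if os.environ.get("BENCH_PYLOPS_MPI", "1") == "0":
        return func

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter() - start:.6f} s")
        return result

    return wrapper
```

Under this sketch, setting ``BENCH_PYLOPS_MPI=0`` before the decorator runs makes ``benchmark_sketch(f)`` return ``f`` itself.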
The usage can be as simple as:

.. code-block:: python

    @benchmark
    def function_to_time():
        # your computation here
        ...

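:py:func:`benchmark` also accepts a ``description`` parameter to customize the printed label, e.g. ``@benchmark(description="printout_name")``. A decorator usable both bare and with arguments can be sketched as follows; this is a hypothetical implementation, with only the ``description`` parameter taken from the PyLops-MPI API.

```python
import time
from functools import wraps


def benchmark_with_description(func=None, description=None):
    # Hypothetical sketch of a decorator that works both as
    # @benchmark_with_description and @benchmark_with_description(description="...").
    def decorate(f):
        label = description if description is not None else f.__name__

        @wraps(f)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = f(*args, **kwargs)
            print(f"{label}: {time.perf_counter() - start:.6f} s")
            return result

        return wrapper

    if func is not None:  # used bare, without parentheses
        return decorate(func)
    return decorate       # used with arguments
```

The ``func=None`` check is the standard trick that lets one decorator serve both calling styles.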
The result is printed to the standard output.
For fine-grained time measurements, :py:func:`pylops_mpi.utils.mark` can be inserted in the code regions of benchmarked functions:

.. code-block:: python

    @benchmark
    def function_to_time():
        # your computation that you may want to ignore in the benchmark
        mark("Begin Region")
        # your computation, measured between the two marks
        mark("Finish Region")


You can also nest benchmarked functions to track execution times across layers of function calls, with the output correctly formatted.
Additionally, the result can be exported to a text file. For complete and runnable examples, visit :ref:`sphx_glr_tutorials_benchmarking.py`.