Benchmark Utility in PyLops-MPI
===============================

PyLops-MPI users can conveniently benchmark the performance of their code with a simple decorator.
:py:func:`pylops_mpi.utils.benchmark` and :py:func:`pylops_mpi.utils.mark` support various
function calling patterns that may arise when benchmarking distributed code.

- :py:func:`pylops_mpi.utils.benchmark` is a **decorator** used to time the execution of entire functions.
- :py:func:`pylops_mpi.utils.mark` is a **function** used inside decorated functions to insert fine-grained time measurements.

.. note::
   The benchmark utility is enabled by default, i.e., if the user decorates a function with
   :py:func:`benchmark`, every call goes through the time measurements, adding some overhead.
   Users can turn off benchmarking while leaving the decorator in place with

   .. code-block:: bash

      export BENCH_PYLOPS_MPI=0

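To see why the disabled path adds no overhead, here is a minimal sketch of an environment-variable-gated timing decorator in plain Python. This is a hypothetical illustration, not PyLops-MPI's actual implementation: the name ``benchmark_sketch`` and the decoration-time check are assumptions.

```python
import os
import time
from functools import wraps


def benchmark_sketch(func):
    # Hypothetical sketch: NOT the actual pylops_mpi.utils.benchmark.
    # If benchmarking is disabled, return the original function untouched,
    # so the decorated call carries no timing overhead at all.
    if os.environ.get("BENCH_PYLOPS_MPI", "1") == "0":
        return func

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter() - start:.6f} s")
        return result

    return wrapper
```

Under this sketch, setting ``BENCH_PYLOPS_MPI=0`` before the decorator runs makes ``benchmark_sketch(f)`` return ``f`` itself.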
The usage can be as simple as:

.. code-block:: python

    @benchmark
    def function_to_time():
        # your computation here
        ...

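:py:func:`benchmark` also accepts a ``description`` parameter to customize the printed label, e.g. ``@benchmark(description="printout_name")``. A decorator usable both bare and with arguments can be sketched as follows; this is a hypothetical implementation, with only the ``description`` parameter taken from the PyLops-MPI API.

```python
import time
from functools import wraps


def benchmark_with_description(func=None, description=None):
    # Hypothetical sketch of a decorator that works both as
    # @benchmark_with_description and @benchmark_with_description(description="...").
    def decorate(f):
        label = description if description is not None else f.__name__

        @wraps(f)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = f(*args, **kwargs)
            print(f"{label}: {time.perf_counter() - start:.6f} s")
            return result

        return wrapper

    if func is not None:  # used bare, without parentheses
        return decorate(func)
    return decorate       # used with arguments
```

The ``func=None`` check is the standard trick that lets one decorator serve both calling styles.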
The result is printed to the standard output.
For fine-grained time measurements, :py:func:`pylops_mpi.utils.mark` can be inserted in the code regions of benchmarked functions:

.. code-block:: python

    @benchmark
    def function_to_time():
        # your computation that you may want to ignore in the benchmark
        mark("Begin Region")
        # your computation, measured between the two marks
        mark("Finish Region")


You can also nest benchmarked functions to track execution times across layers of function calls, with the output correctly formatted.
Additionally, the result can be exported to a text file. For complete and runnable examples, visit :ref:`sphx_glr_tutorials_benchmarking.py`.