This page has moved to http://developer.lsst.io/en/latest/coding/profiling.html
It is important to generate a profile of code performance to understand where to focus optimization efforts. The following tools have been used effectively with python and C++ code.
Python Profiling
A useful guide to optimisation of python code in general, and scipy/numpy in particular, is: http://scipy-lectures.github.io/advanced/optimizing/.
Function-level Profiling
Consider the code in mosaic.py
. To profile it:
Code Block | ||
---|---|---|
| ||
python -m cProfile -o cprofile-mosaic.dat `which mosaic.py` /data3a/work/price/SUPA-MIT/rerun/cosmos --id field=COSMOS filter=W-S-I+ expTime=120.0 --clobber-config |
Then, in a python session:
Code Block | ||
---|---|---|
| ||
import pstats
p = pstats.Stats("cprofile-mosaic.dat")
p.sort_stats("cumulative").print_stats(30) # Print top 30 cumulative |
The results are:
Code Block | ||
---|---|---|
| ||
Tue Jun 18 04:16:45 2013 cprofile-mosaic.dat
6711700 function calls (6698794 primitive calls) in 36.536 seconds
Ordered by: cumulative time
List reduced from 4671 to 30 due to restriction <30>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 36.538 36.538 /home/price/hsc/meas_mosaic/bin/mosaic.py:3(<module>)
1 0.000 0.000 34.707 34.707 /data1a/ana/products2.1/Linux64/pipe_base/HSC-2.4.1a_hsc/python/lsst/pipe/base/cmdLineTask.py:243(parseAndRun)
1 0.000 0.000 34.324 34.324 /data1a/ana/products2.1/Linux64/pipe_base/HSC-2.4.1a_hsc/python/lsst/pipe/base/cmdLineTask.py:87(run)
30 0.000 0.000 34.317 1.144 {map}
1 0.000 0.000 34.303 34.303 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:45(__call__)
1 0.073 0.073 34.303 34.303 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:1112(run)
1 0.176 0.176 34.230 34.230 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:950(mosaic)
1 2.404 2.404 25.686 25.686 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:268(readCatalog)
360 0.740 0.002 20.289 0.056 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:205(getAllForCcd)
1008 0.012 0.000 13.592 0.013 /data1a/ana/products2.1/Linux64/daf_persistence/HSC-2.1.2a_hsc/python/lsst/daf/persistence/butlerSubset.py:171(get)
1008 0.025 0.000 13.579 0.013 /data1a/ana/products2.1/Linux64/daf_persistence/HSC-2.1.2a_hsc/python/lsst/daf/persistence/butler.py:209(get)
648 0.007 0.000 12.235 0.019 /data1a/ana/products2.1/Linux64/daf_persistence/HSC-2.1.2a_hsc/python/lsst/daf/persistence/butler.py:239(<lambda>)
648 0.014 0.000 12.228 0.019 /data1a/ana/products2.1/Linux64/daf_persistence/HSC-2.1.2a_hsc/python/lsst/daf/persistence/butler.py:386(_read)
324 0.001 0.000 10.380 0.032 /home/price/hsc/afw/python/lsst/afw/table/tableLib.py:7836(readFits)
324 10.379 0.032 10.379 0.032 {_tableLib.SourceCatalog_readFits}
1 0.121 0.121 7.344 7.344 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:318(mergeCatalog)
1 0.000 0.000 6.680 6.680 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicLib.py:1400(kdtreeSource)
1 6.680 6.680 6.680 6.680 {_mosaicLib.kdtreeSource}
360 0.001 0.000 2.248 0.006 /home/price/hsc/afw/python/lsst/afw/image/imageLib.py:8635(makeWcs)
360 2.137 0.006 2.248 0.006 {_imageLib.makeWcs}
648 0.331 0.001 2.148 0.003 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:173(selectStars)
153718 0.679 0.000 1.899 0.000 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicLib.py:776(__init__)
324 0.001 0.000 1.779 0.005 /home/price/hsc/afw/python/lsst/afw/table/tableLib.py:6266(readFits)
324 1.779 0.005 1.779 0.005 {_tableLib.BaseCatalog_readFits}
2916 0.178 0.000 1.450 0.000 /home/price/hsc/afw/python/lsst/afw/table/tableLib.py:726(find)
360 0.000 0.000 1.128 0.003 /data1a/ana/products2.1/Linux64/daf_persistence/HSC-2.1.2a_hsc/python/lsst/daf/persistence/butler.py:236(<lambda>)
360 0.001 0.000 1.127 0.003 /data1a/ana/products2.1/Linux64/daf_butlerUtils/HSC-2.2.0c_hsc/python/lsst/daf/butlerUtils/cameraMapper.py:315(<lambda>)
360 0.001 0.000 1.126 0.003 /home/price/hsc/afw/python/lsst/afw/image/imageLib.py:1159(readMetadata)
360 1.126 0.003 1.126 0.003 {_imageLib.readMetadata}
1 0.004 0.004 1.036 1.036 /home/price/hsc/meas_mosaic/python/lsst/meas/mosaic/mosaicTask.py:3(<module>) |
It is often most useful to look at the cumtime
column, which is the time spent in that function and what it calls. The results here show that 10/36=28% is being spent in readFits
, but 26/36=72% is devoted to I/O (readCatalog
). That might suggest that some python code should get pushed down to C++. If you do:
Code Block | ||
---|---|---|
| ||
p.print_callees("readCatalog")
p.print_callees("getAllForCcd") |
you can see that the cumtime
column doesn't add up to the cumtime
values in the above, so the remaining time is time spent within those functions doing work.
For more details on pstats and python profiling in general see:
A potentially useful tool for visualising the results is:
Line Profiling
Having found the particular function that's consuming all the time, you may want finer granularity. For this, use line profiler. Installation is a simple matter of:
Code Block | ||
---|---|---|
| ||
pip install line_profiler |
Put an @profile decorator on the function of interest, and run:
Code Block | ||
---|---|---|
| ||
kernprof.py -l -v /path/to/script.py <arguments> |
C++ Profiling
Profiling C++ code can be done with igprof:
Code Block | ||
---|---|---|
| ||
setup igprof v5.9.6
igprof -pp -z -o igprof-mosaic.pp.gz python `which mosaic.py` /data3a/work/price/SUPA-MIT/rerun/cosmos --id field=COSMOS filter=W-S-I+ expTime=120.0 --clobber-config
igprof-analyse -d -v -g igprof-mosaic.pp.gz > igprof-mosaic.pp.txt |
That provides the cumulative profile (top) and then the caller/callee profiles further down (see http://igprof.org/text-output-format.html). There is a fancy cgi-bin
setup for browsing the profiles, but it requires setting up your Apache server. This may or may not be worth the trouble.
Note that there is a bug in igprof (or its dependency, libunwind) that sometimes causes the process to hang. The recommended workaround is "to make sure you have a hot cache for your libraries (cat *.so >/dev/null
)". A slightly more complete command is
...
language | bash |
---|
...