==> git log -n1 <==
commit 193151fd5ddb35f4523d975b12cf448a6b55c2fe
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Wed May 22 20:13:08 2019 -0700

    1.5.x - GPU support, rotation-based recon, MSVC support (#404)
    
    * TiMemory + slight opt for CUDA
    
    * Fully working CUDA SIRT
    
    * Removed old CUDA code + cleanup
    
    * run_compare.sh script
    
    * MT optimizations + build fixes
    
    * Fixed gitignore
    
    * IMPORTANT
    
    - changed defaults to sirt in pyctest scripts
    - renamed _global functions to _kernel
    - approx 6x speed-up on TomoBank dataset
    - reduced memory for C++
    
    * Fixed __global__ in header when CUDA=OFF
    
    * Updated .travis.yml
    
    * Fixed benchmarking/.gitignore to not ignore itself
    
    * Enabled compilation without PTL
    
    * Update sirt.cu
    
    * Updates that improve the CUDA performance to > 50x speed-up
    
    * Update sirt.cc
    
    * Docker updates
    
    - .docker/Dockerfile.cuda --> Dockerfile
    - apt.sh includes clang-format
    - runtime-entrypoint.sh enters /home/tomopy directory
    - runtime-entrypoint.sh attempts install on start-up
    
    * Update sirt.cc
    
    * OpenCV support + CUDA mlem + modern CMake with CUDA
    
    - Added OpenCV support
    - Updated CMake to modern usage of CUDA as language
    - Updated environments
    
    * Fixed MLEM (slightly broken -- memory constraints)
    
    * MLEM impl w/o arrays + SIRT CPU updates
    
    - SIRT cpu has some testing code for expansion + compression
    - normalize SIRT and MLEM in Python extern.py
    
    * NPP Affine for CUDA
    
    - Migrated some utils_cuda.cu to sum.cu
    - Implemented CUDA affine transform
    - Link to NPP
    
    * Reorganized + CUDA NPP rotate + project.cc + test_nppi.cu
    
    * SIRT CUDA rt performance improvements
    
    * Reverted extern.py + PTL improvements
    
    * PyCTest + Travis fixes
    
    * Cleanup + CMake + IPP
    
    - Added prelim support for IPP (default = OFF)
    - Cleaned out the repository
    - CMake fixes for detecting CUDA
    - CMake fixes for PGI + OpenACC
    - envs install IPP
    - Warning fixes
    
    * rotate project + multiple device CUDA + MT fixes
    
    - project with rotation is semi-working
    - CUDA should support multiple devices
    - multithreading initialization simplification
    - CUDA should run slightly faster (hopefully)
    - clang-format update to break after templates
    - pyctest_tomopy_phantom prints projection
    - cxx_mlem disabled by default
    
    * PyBind11 fixes
    
    * Update tomocxx.hpp
    
    * Fixes for CUDA SIRT
    
    * Updated env/tomopy-python27.yml to not use Intel
    
    * Fixed SIRT C++ CPU segfault
    
    * Updated PTL
    
    * Fixed PGI compiler warnings
    
    * Docker installs nsight + removed deviceToDevice cudaMemcpy
    
    * Removed thrust header include
    
    * Overlapping streams + NVTX updates
    
    * Multi-GPU (potential) fixes
    
    * NVTX_RANGE_POP updates + optimizations for streams
    
    * Working multi-GPU version (requires TOMOPY_USE_PTL=OFF)
    
    * Fix to TOMOPY_USE_PTL=ON + multi-GPU
    
    - TaskRunManager is now thread-local static instance instead of static instance
    
    * partial reconstruction + PTL run manager fixes
    
    * Update common.hh
    
    * Update CPU vs. GPU task run manager
    
    * REF: Remove unused imports
    
    * REF: Move dxchange to lazy imports in tomo.prep.alignment
    
    dxchange has many dependencies but is rarely used.
    In order to remove it from the conda requirements, it is now
    optional. In this case, it cannot be replaced with tifffile
    because it provides additional functionality such as not
    overwriting existing files.
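
The lazy-import pattern described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `lazy_import` is a hypothetical helper name, and the standard-library `json` module stands in for an optional package such as dxchange.

```python
import importlib


def lazy_import(name):
    """Import `name` only when it is first needed (hypothetical helper).

    Raises an ImportError with a clearer message when the optional
    package is not installed.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"'{name}' is an optional dependency; install it to use this feature"
        ) from exc


def dump_shape(shape):
    # The import happens at call time, so loading this module never
    # requires the optional package to be present.
    json = lazy_import("json")  # stand-in for an optional package
    return json.dumps(list(shape))
```

Deferring the import to the call site means users who never call the function do not need the package at all, which is what lets it drop out of the hard requirements.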
    
    * REF: Replace dxchange with tifffile in recon.rotation
    
    In this case, the functionality of dxchange is easily replaced with
    a direct call to tifffile. This removes dxchange as a hard dependency
    of tomopy.
    
    * BLD: Remove dxchange from requirements.txt and meta.yaml
    
    Also reorder the requirements and update with tifffile and
    pywavelets because those are directly imported. Pywavelets
    is also required by scikit-image, but it should be listed here
    too.
    
    * SLURM files + extra messages in pyctest_tomopy_rec.py
    
    * Fixed warnings + pinned memory
    
    * Launch and synch optimizations
    
    * Launch and sync optimizations
    
    * Memory fixes and sync optimizations
    
    * Memory optimizations
    
    * Sync optimizations
    
    * Fix to TOMOPY_PYTHON_THREADS check
    
    * Thread ID info
    
    * PTL update + better PTL parallelism
    
    * PTL affinity configure via env (PTL_CPU_AFFINITY)
    
    * Updated gpu template functions to enforce async more explicitly
    
    * Docker updates
    
    * SLURM updates
    
    * GetEnv + CXX PTL + format
    
    * SLURM updates
    
    * PTL CPU affinity updates + lower memory overhead in SIRT CUDA
    
    * Removed Intel packages
    
    * BLD: More efficient way to exclude pyc files from install
    
    * BLD: More efficient way to add files in tomopy/ to install
    
    * BLD: Remove tests from installed packages
    
    * BLD: Don't copy compiled library to source tree
    
    The compiled library should only end up in the install directory and
    in the build directory.
    
    Devs who want the library copied to the source tree should use
    `python setup.py develop` or `pip install -e .` In that case, the
    install directory is the source directory.
    
    * BLD: Remove source files from final installation
    
    End users do not need copies of CMake files or the C source files. If
    they want these things, they can download the complete source from
    GitHub.
    
    * BLD: List installed packages manually in setup.py
    
    * BLD: Use CMake to set version in __init__.py
    
    * Docker updates + PTL updates + formatting
    
    * Fix to benchmarking phantom test construction
    
    * Update to coverage script
    
    * Fixed pyctest nosetest
    
    * Updated benchmarking/__init__.py
    
    * Updated benchmarking/__init__.py
    
    * Docker + CMake fixes
    
    * Fix to directories in pyctest_tomopy_phantom.py
    
    * Fixes to pyctest phantom
    
    * Folder restructuring
    
    * ART on GPU + rotate change + DeviceOption + MANIFEST
    
    - Fixed python starting extra threads
    - moved utils_cuda.h to .hh
    - Used GpuOption scheme to control CPU vs. GPU
    - Unified CXX selection
    - Fixed finding OpenCV
    
    * Fixed missing TOMOPY_USE_OPENCV
    
    - Fixed error about NPP when not using NVCC
    
    * cuda_mult_kernel + gpu final rotation fixes + clang-format fix
    
    * Low freq fix + CPU template rotates + GPU int rotates
    
    - Disabled TOMOPY_CXX_GRIDREC by default
    
    * Iteration info for C + ART
    
    * Partial recon + correct GPU + SIRT solution + cleanup
    
    * Removed debug exception + nosetest set environ
    
    * Updated CI
    
    * BLD: Replace VERSION with setuptools_scm
    
    Instead of manually managing the VERSION file, setuptools_scm
    will automatically create a version number based on git tags.
    CMake interrogates git separately and does not include the git
    hash on the version number because it only allows numbers.
    
    I chose setuptools_scm instead of versioneer because it doesn't
    require adding any additional files to the repo. Instead all of
    the logic is contained within the scm package which is a
    dependency which is automatically installed by setup.py at
    install time or it can be pre-installed in the environment.
    
    * REF: Reorganize files so source doesn't overshadow installed
    
    By moving the python module to /src, devs have the choice of
    testing against the source code or installed code by running the
    tests from either / or inside /tests.
    
    * REF: Move tests down one directory because of __init__.py
    
    tests is actually a Python module because it has an
    __init__.py. This means that the modules in src are also imported
    and the source will always overshadow the installed tomopy.
    Moving the tests module down one directory solves this problem.
    
    * REF: Adjust CMake to new file structure
    
    Appended /src to file paths and put if(NOT SKBUILD) around copy
    operations that are not necessary if building without an IDE.
    
    * REF: Change directories for coverage and tests
    
    * DOC: Correct spelling languae -> language
    
    * Updated SLURM scripts + fixed OpenCV includes
    
    * GPU MLEM + template execute - sync_freq
    
    - TOMOPY_USE_OPENMP=OFF by default
    - removed old gpu mlem implementation
    
    * Fixes to GPU MLEM and SIRT. Excellent recon!
    
    * CPU MLEM and SIRT + GetEnv choices + execute fixes
    
    - Both MLEM and SIRT have working rotate versions for GPU and CPU now
    - TOMOPY_INTER is restricted to choices (NN, LINEAR, CUBIC)
    
    * Fixed NaN at high iterations (SIRT, MLEM)
    
    * Update _forward_args_t to use std::move
    
    * Introduced invoker template and binding to fix expansion issues
    
    - execute template function was failing with some compilers
    
    * Minor cleanup changes + changes to pack expansion
    
    * Travis update + Linux conda compilers
    
    - Installing GCC for Linux in conda environments
    
    * Removed unnecessary calc +
    
    - fnx not needed
    - fixed tuple warnings in morph.py
    - generate_compare.sh support for mlem
    
    * Memory reduction + Open{MP,ACC} removal
    
    * Optimizations for sirt/mlem (reducing kernel launches)
    
    * CMake cleanup + no env compilers
    
    * Removed PyBind11 support
    
    * cxx_extern.h + reduce scikit install + removed allocator
    
    * Increase max jobs on Appveyor
    
    Increasing the maximum number of jobs allows pull request validation to occur faster because multiple builds can run in parallel.
    
    * CPU thread-local tasking run manager + python thread locking
    
    - cpu_data uses per-python thread mutexes to eliminate unnecessary locking
    - remove lambda execution in SIRT
    
    * Fixed util/dtype.py typecodes, algorithm.py tuple index in message
    
    - envs/tomopy-python35.yml uses older pyctest
    
    * TST: Use setup.cfg not .coveragerc to specify covered package
    
    Using this method instead of relative paths means that
    python-coverage will be able to find the installed tomopy
    and we don't need to use python setup.py develop inside
    pyctest_tomopy.py
    
    * Separate out computing sum_dist
    
    - sum_dist is now computed independently
    
    * SLURM updates
    
    * Update utils_cuda.cu
    
    * Update utils_cuda.cu
    
    * Update utils_cuda.cu
    
    * CUDA_*_SIZE -> TOMOPY_*_SIZE + env for block/grid dim3
    
    - SLURM updates (significant)
    
    * Immediate sum_dist calculation
    
    * Warm-up kernel launch
    
    * Removed async for cuda_compute_sum_dist
    
    * Update utils_cuda.cu
    
    * Update utils_cuda.hh
    
    * destroy_stream syncs + minor improvements
    
    * Fixes to SLURM env-common-settings.sh
    
    * Move cache reset up higher in SIRT and MLEM
    
    * Update env-common-settings.sh
    
    * Update env-common-settings.sh
    
    * Nearest-neighbor interpolation is default
    
    * Disable "TOMOPY_USE_C_ALGORITHMS" from affecting project
    
    * BUG: Resized shape must be iterable
    
    * BUG: util.dtype not compatible with numpy 1.16.1
    
    numpy/numpy#12769 breaks compatibility with TomoPy because
    np.ctypeslib._typecodes no longer exists. This patch uses public
    functions instead in a way that is backward compatible.
    
    Closes #392
    
    * BLD: Tell NVCC which host compiler to use (#7)
    
    NVCC should be told to use the same host compiler that CMake has
    identified as the CXX compiler. Otherwise, unexpected behavior may
    occur.
    
    * REF: Replace recon.algorithm switch with getattr()
    
    We can remove this long switch statement with getattr() because
    each of the functions that util.extern implements is an attribute
    of the module. getattr() is a lower maintenance option because
    we no longer have to add and remove options from this
    switch.
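
The getattr() dispatch described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `extern` is a hypothetical stand-in namespace for a module such as util.extern, and the `c_` prefix and the `get_func` name are invented for the example.

```python
import types

# Hypothetical stand-in for a module whose attributes are the
# implemented reconstruction functions.
extern = types.SimpleNamespace(
    c_sirt=lambda data: ("sirt", data),
    c_mlem=lambda data: ("mlem", data),
)


def get_func(algorithm):
    """Look up the implementation by name instead of using a long if/elif chain."""
    try:
        return getattr(extern, "c_" + algorithm)
    except AttributeError:
        # Unknown names fail in one place, with one message.
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None
```

With this shape, adding a new function to the module makes it selectable by name without touching the dispatch code.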
    
    * REF: Make recon.algorithms.allowed_kwargs global
    
    This makes the list of implemented functions public which is good
    for benchmarking because we can ask tomopy what options are
    available instead of reading the docs.
    
    * Potential optimization in summation for SIRT
    
    * Update sirt.cu
    
    * Update sirt.cu
    
    * Update sirt.cu
    
    * Update sirt.cu
    
    * DOC: Update badges in README
    
    Badges on the README are pointing to the wrong repositories. They
    should be pointing to the conda-forge anaconda channel and the
    tomopy/tomopy coveralls instead of dgursoy repos.
    
    * BLD: Add setuptools_scm_git_archive
    
    Without this setuptools_scm extension, you cannot build
    from a git archive such as the tarball that is downloadable
    from GitHub. This is because there is no repository to scrape
    the version number.
    
    * Removed thrust + update PTL + PTL simplified interface
    
    * Update source/gpu/gpu.cu
    
    * Updated PTL
    
    * Update sirt.cu
    
    - testing CUDA graph
    
    * Update PTL
    
    * Updates to SIRT graph exec (not working)
    
    * Fixes to CUDA graph
    
    * Massive cleanup + reorganization
    
    * Update data.hh
    
    * Update CUDA compute_projection for SIRT and MLEM
    
    * Update execute
    
    * Update common.hh
    
    - execute update
    
    * Updated execute (and usage) to not loop over slices
    
    * Cleanup + CPU rotation updates
    
    - Removed OpenMP and OpenACC from build system
    - Removed unnecessary TOMOPY_USE_GPU
    - Added common.cc
    - Removed duplicate macros
    - Added some CUDA queries to C++ when CUDA not available
    - Removed test/test_nppi.cu and test/test_opencv.cc
    - Added some docstrings
    
    * OpenCV header fix
    
    * clang-tidy + removed docker + removed slurm + PTL updates
    
    * Resetting the device at the end of GPU algorithms
    
    * CUDA_ARCH changes
    
    * Update Options.cmake
    
    - disable clang-tidy by default
    
    * Removed profile/run scripts from benchmarking
    
    * Updated output_dir for pyctest_tomopy_rec.py
    
    * GpuData (cache) safety
    
    * Update .travis.yml
    
    * Sync guards
    
    * Fixes to CUDA_ARCH
    
    * Update PTL
    
    - fixed customized CFLAGS and CXXFLAGS to_list instead of string
    
    * OpenCV requirement + removed cooperative_groups header include (unused) for CUDA 8 or earlier
    
    * REF: Arrange directories like jrmadsen/gpu
    
    * REF: Move python source into source folder
    
    * FIXME: Make separate tests for each back-end
    
    There are no tests for the new back-end, and the old back-ends
    need to be selected using an environment variable? The old
    implementation should be the default.
    
    * MAI: Remove unused files
    
    VERSION has been replaced by setuptools_scm.
    requirements has been moved to envs/
    conda meta.yaml is now stored on the conda-forge/tomopy-feedstock
    because we don't build it ourselves anymore
    
    * Restore benchmarking from jrmadsen/tomopy-gpu
    
    * REF: Moved tomopy.misc.benchmarks into the root benchmarks module
    
    * MAI: Remove VERSION from manifest
    
    * BLD: Clean up envs and CI yamls
    
    Added two environments for windows and removed logic comments
    because those don't work outside of recipes.
    
    * BLD: Remove coveralls from Travis python-37 build
    
    * BUG: Don't import the submodules
    
    We cannot import the submodules for benchmarking because that requires
    TomoPy, and TomoPy may not be installed.
    
    * BLD: Reorganize CI build to match anaconda recommendations
    
    * BLD: Use defaults:libopencv not conda-forge:opencv
    
    The default channel opencv is split into two subpackages: python-opencv
    and libopencv. The conda-forge package is not split. We only need the
    C/C++ libraries to build and run against.
    
    * BUG: Don't update conda on Appveyor [skip travis]
    
    There's some bug (appveyor/ci#2270) where the conda environment is
    disrupted if you update.
    
    * BUG: Use git clone instead of tarball [skip travis]
    
    setuptools_scm needs a git hash to function. The shallow clone option
    for appveyor downloads a tarball without repo information.
    
    * BUG: Win compiler missing M_PI definition
    
    * BLD: Add git to build requirements
    
    Also coveralls is not compatible with py 3.7
    
    * DOC: Add more docstrings
    
    * BUG: Replace setenv with cv::setNumThreads
    
    setenv is not part of the ISO C standard, thus it will not compile
    on Windows. Here we are replacing these setenv calls with the
    OpenCV setNumThreads function call to accomplish the same task.
    
    * BLD: Pull Windows build updates for PTL
    
    * CMake cleanup + update PTL + OSX envs + fix benchmark
    
    - moved tomopy-python*.yml to linux-*.yml
    - add CMAKE_OSX_DEPLOYMENT_TARGET to setup.py
    - moved util.py to utilities/__init__.py because 'import util' was importing 'timemory.util'
    
    * Simplified thread-pool and initialization sets for algorithms
    
    - enable/disable tasking option
    
    * Moved benchmarking/utilities to source/tomopy/misc/benchmark.py
    
    - Moving to this location is breaking pyctest
    
    * Update phantom.py
    
    * Update phantom.py
    
    * Update __init__.py
    
    * Reverted PTL to master branch because of strange static cleanup behavior
    
    * BUG: Fix default value for --exclude-phantoms
    
    The default option for --exclude-phantoms in pyctest_tomopy.py
    should be the empty list `[]` instead of `None` because this parameter
    is treated as an iterator.
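
The reason for the `[]` default can be sketched with argparse. This is a minimal illustration, not the contents of pyctest_tomopy.py: only the `--exclude-phantoms` flag name comes from the commit message, while `build_parser`, `selected`, and the phantom names are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # An empty list (not None) keeps membership tests and loops working
    # when the flag is not given on the command line.
    parser.add_argument("--exclude-phantoms", nargs="*", default=[])
    return parser


def selected(phantoms, args):
    # With default=None, `p not in args.exclude_phantoms` would raise
    # TypeError because None is not iterable.
    return [p for p in phantoms if p not in args.exclude_phantoms]
```

For example, `selected(["a", "b"], build_parser().parse_args([]))` keeps both entries, while passing `--exclude-phantoms a` drops the first.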
    
    * BLD: Add coverage tests back to Travis CI
    
    pyctest.pyctest.run() will always return `None` regardless of whether
    the build or other tests failed. Thus, instead of the Travis tests
    failing when the build fails, we need to check whether we can run
    tests.
    
    Also, because we are using `conda` to install all the dependencies, we
    can use the `minimal` Travis image.
    
    * Fixes GPU sirt and mlem + phantom/rec scripts use png
    
    - Updated PTL to implicit-manager-interface branch
    - CUDA error checks
    
    * Windows fixes (uses MSVC compiler now)
    
    - due to issues with MinGW + OpenCV, Windows builds now utilize MSVC
    - Windows does not support OpenMP SIMD so TOMOPY_USE_OPENMP is disregarded
    - Windows uses C++ version of gridrec (std::complex)
    - Removed MinGW from envs/win-{36,37}.yml
    - Removed tbb-devel from envs/win-{36,37}.yml
    - Added vs2015_win-64 to envs/win-{36,37}.yml
    - Added cv::setNumThreads(1)
    
    * Removed timemory from envs/win-37.yml
    
    * CMake coverage fixes + suppress setup.py warnings + PTL update for sequential tids
    
    * Migration of settings to Python interface
    
    - added ['accelerated', 'pool_size', 'interpolation', 'device', 'grid_size', 'block_size'] to mlem and sirt
    - created RuntimeOptions class
    
    * Updated kwargs for accelerated algorithms
    
    * GPU documentation
    
    * Build opts to force flags/libs + PTL shared lib support + PTL bug fix
    
    - PTL has a fix for a bug that would sporadically cause a segfault as it destroyed the thread-pool
    - tomopy can build PTL as a shared library
    - Added TOMOPY_USER_FLAGS and TOMOPY_USER_LIBRARIES CMake Options
    
    * Safer thread-pool cleanup + fix to strange behavior in NPP rotating integers
    
    * Update macros.hh
    
    - fix to dummy thread-pool when tasking is disabled
    
    * BUG: Add double braces for C++11 compatibility
    
    The conda-forge gxx compiler is missing a patch which allows the
    initialization of std::array without double braces. Read more about
    this problem here:
    https://en.cppreference.com/w/cpp/container/array
    https://stackoverflow.com/a/11400125/4459405
    
    * Update PTL
    
    * Updates fixing a sporadic bug deleting thread-local thread-pool
    
    - updated PTL to new revision that resolves the occasional data race when the threads in thread-pool exit the execute_thread function after ThreadPool instance was destroyed. The error arose because those threads were trying to unlock a mutex that was created by the ThreadPool instance that was already destroyed
    - removed .dockerignore
    
    * Disable linux-{27,36,37}.yml from using OpenBLAS. May also be needed on macOS
    
    * Specify scipy<1.3 for envs/linux-{36,37}.yml until scipy-feedstock is fixed
==> git describe --tags --dirty <==
1.5.0
==> git status <==
HEAD detached at 1.5.0
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	bld.bat
	build_env_setup.bat
	conda_build.bat
	metadata_conda_debug.yaml

nothing added to commit but untracked files present (use "git add" to track)
