20/06/2022 Tensorflow environments in conda pt2

So from the latest attempt, I got the error:

ImportError: cannot import name 'dtensor' from 'tensorflow.compat.v2.experimental'

I googled the error, and some posts suggested it's caused by a version mismatch between the tensorflow and keras packages. That surprised me, because from what I've learnt, keras is now included in the tensorflow package and no separate keras package is needed (maybe that only applies to installing?).

This might be a lead. I checked my conda environment and found keras at version 2.9.0, whereas tensorflow is at version 2.6.0. A breakthrough, perhaps?
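Instead of eyeballing conda list, the versions can also be read from the Python interpreter that actually does the importing. Here's a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The PACKAGES tuple is just my choice of names to check, and this only reports version numbers, not which channel (conda or PyPI) a package came from:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names worth comparing when tensorflow imports fail (my own choice).
PACKAGES = ("tensorflow", "tensorflow-gpu", "keras")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it isn't installed."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:15} {ver or 'not installed'}")
```

Run it with the environment's own python so it sees the same site-packages that import tensorflow sees.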

So I need to downgrade keras from 2.9.0 to 2.6.0; the following command should do it:

   conda install keras==2.6.0

Interestingly, this did not remove keras 2.9.0. Taking a closer look at the keras installation on my machine, I found that its source is PyPI. Wait, what? Why do I have keras from pip? Let's uninstall it and see what happens.

   pip uninstall keras

Now conda list shows keras 2.6.0, installed from the conda channel.
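To spot other pip-installed packages hiding in a conda env, here's a rough sketch. It assumes that pip writes "pip" into each package's INSTALLER metadata file while conda-built packages usually record "conda" there. That's a convention rather than a guarantee, so the pypi channel column in conda list remains the authoritative check:

```python
from importlib.metadata import distributions
from typing import Dict


def installer_of(dist) -> str:
    """Return the tool recorded in a distribution's INSTALLER file, or 'unknown'."""
    text = dist.read_text("INSTALLER")
    return text.strip() if text else "unknown"


def pip_installed_packages() -> Dict[str, str]:
    """Map name -> version for every distribution whose INSTALLER file says 'pip'."""
    found = {}
    for dist in distributions():
        if installer_of(dist) == "pip":
            name = dist.metadata["Name"] or "<unnamed>"
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, ver in sorted(pip_installed_packages().items()):
        print(f"{name} {ver}  (installed by pip)")
```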

There are a few other packages installed from PyPI whose versions aren't compatible with my tensorflow version, namely 'tensorflow-estimator 2.9.0', 'tensorboard 2.9.1', and 'tensorflow-io-gcs-filesystem 0.26.0'. If any other problems occur, this might be the reason.
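A quick way to flag these is to compare each package's major.minor version against tensorflow's. This sketch assumes the companion packages (keras, tensorflow-estimator, tensorboard) follow tensorflow's major.minor numbering. tensorflow-io-gcs-filesystem uses its own numbering (0.26.0), so it can't be judged this way and is left out. The function names are made up for illustration:

```python
from typing import Dict, List, Tuple


def major_minor(ver: str) -> Tuple[int, int]:
    """Turn '2.9.1' into (2, 9); patch levels and suffixes are ignored."""
    major, minor = ver.split(".")[:2]
    return int(major), int(minor)


def mismatched_companions(tf_version: str, installed: Dict[str, str]) -> List[str]:
    """Return (sorted) the packages whose major.minor differs from tensorflow's."""
    target = major_minor(tf_version)
    return sorted(name for name, ver in installed.items() if major_minor(ver) != target)


if __name__ == "__main__":
    # Versions as they appeared in my environment after the keras downgrade.
    env = {"keras": "2.6.0", "tensorflow-estimator": "2.9.0", "tensorboard": "2.9.1"}
    print(mismatched_companions("2.6.0", env))  # → ['tensorboard', 'tensorflow-estimator']
```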

Let's try it again with keras 2.6.0:

Nope, the same error occurs.

I'm quite fed up with the conda distributions at this point, so I'm going to uninstall tensorflow completely, clean up the environment, and work with pip inside the conda virtual environment...

So I removed the conda tensorflow package and tried to install it with pip:

   pip install tensorflow-gpu

To my surprise, I got an error!

AttributeError: module 'brotli' has no attribute 'error'

Well, the module 'brotli' is certainly very sure of himself. Looking up the error on Stack Overflow, the fixes seem a bit hacky, involving installing extra packages, which would probably mess up the environment even more. So it's decided: I'll start over with a fresh environment.

First, I need to remove the environment:

   conda env remove --name tf_gpu

Before I start, I'd like to pin down how I want the installation to go:

Although conda seems to have cudatoolkit and cudnn sorted within the virtual environment, after some research it seems inevitable that you'll have to install them by hand system-wide (at least on Windows). I already have both on my system, so let's check their versions before I move on.

Checking the cudatoolkit version is easy: just type nvcc --version. My CUDA build is 11.2.
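If you want to script this check, the version can be pulled out of nvcc's output. This sketch assumes nvcc is on PATH and that its output contains a line like "Cuda compilation tools, release 11.2, V11.2.152", which is the format I'm used to seeing:

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract '11.2' from text like 'Cuda compilation tools, release 11.2, V11.2.152'."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def system_cuda_version() -> Optional[str]:
    """Run `nvcc --version` if nvcc is on PATH and return the release number."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # no CUDA toolkit visible from this shell/environment
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit:", system_cuda_version() or "nvcc not found")
```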

Checking the cuDNN version is slightly more complicated, as there's no cuDNN --version command you can use. You'll need to dig into the cuDNN header files, located in 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include' (the same place where you put the downloaded cuDNN files during installation). Open the file cudnn_version.h and the version is stated in there:

   #define CUDNN_MAJOR 8
   #define CUDNN_MINOR 1
   #define CUDNN_PATCHLEVEL 1

This means cuDNN 8.1.1 is installed. Checking the tensorflow compatibility list again, we can see that cuDNN 8.1 and CUDA 11.2 support tensorflow versions 2.5 to 2.9.
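Reading the header and the compatibility list by eye is fine, but both steps can be sketched in Python. The header location below is the one from my machine, and TESTED_CONFIGS is a hypothetical lookup holding only the single row quoted above (CUDA 11.2 with cuDNN 8.1 supports tensorflow 2.5 to 2.9), not the full official table:

```python
import re
from pathlib import Path
from typing import Optional

# Header location on my machine; adjust the CUDA version folder for yours.
DEFAULT_HEADER = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include\cudnn_version.h")

# Hypothetical, partial table: (CUDA, cuDNN major.minor) -> supported (major, minor) TF versions.
TESTED_CONFIGS = {("11.2", "8.1"): [(2, minor) for minor in range(5, 10)]}


def parse_cudnn_version(header_text: str) -> Optional[str]:
    """Build 'MAJOR.MINOR.PATCHLEVEL' from the #define lines in cudnn_version.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{key}\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def tf_supported(tf_version: str, cuda: str, cudnn: str) -> bool:
    """Check a tensorflow version against the partial TESTED_CONFIGS table."""
    cudnn_mm = ".".join(cudnn.split(".")[:2])
    tf_mm = tuple(int(p) for p in tf_version.split(".")[:2])
    return tf_mm in TESTED_CONFIGS.get((cuda, cudnn_mm), [])


if __name__ == "__main__":
    if DEFAULT_HEADER.exists():
        cudnn = parse_cudnn_version(DEFAULT_HEADER.read_text())
        print("cuDNN:", cudnn)
        if cudnn:
            print("tensorflow 2.6 listed for CUDA 11.2:", tf_supported("2.6.0", "11.2", cudnn))
    else:
        print("cudnn_version.h not found at", DEFAULT_HEADER)
```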

Testing cudatoolkit/cudnn installation

Before I get ahead of myself, I want to do a little experiment. The official tensorflow documentation recommends installing cudatoolkit and cudnn with conda inside the virtual environment, but others recommend having them in the system environment. I want to try out both combinations (cudatoolkit/cudnn in the system but not in the virtual environment, and cudatoolkit/cudnn in the virtual environment but not in the system).

For the experiment, tensorflow is installed in a conda python=3.9 virtual environment using pip install tensorflow-gpu.

cudatoolkit/cudnn in the system but not in the virtual environment: works

cudatoolkit/cudnn in the virtual environment but not in the system: works

This shows you can keep the cudatoolkit/cudnn installation local to the virtual environment if you're worried about a system-wide CUDA install breaking your GPU driver.
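A quick way to confirm that a combination "works" is to ask tensorflow which GPUs it can see. Here's a minimal sketch, assuming tensorflow 2.1 or newer (where tf.config.list_physical_devices exists). visible_gpus is just an illustrative name, and it returns None instead of crashing when tensorflow can't be imported:

```python
from typing import List, Optional


def visible_gpus() -> Optional[List[str]]:
    """Return the names of GPUs tensorflow can see, or None if tensorflow isn't importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow could not be imported in this environment")
    else:
        print("GPUs visible to tensorflow:", gpus or "none")
```

An empty list means tensorflow imported fine but found no usable CUDA/cuDNN setup.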

Testing jupyter installation

From experience, jupyter notebook doesn't normally work out of the box when it's in a virtual environment.

But this time, after running conda install -c conda-forge jupyter and opening jupyter notebook within the virtual env, tensorflow could be imported! This is rather weird, because normally I have to install nb_conda to switch Python kernels. I also tested running jupyter notebook from the system environment, and there tensorflow could not be imported!

This means a jupyter notebook installed within the virtual environment uses the Python kernel from within that environment.

Actually, on second thought, I think the reason jupyter notebook in the virtual environment can see the environment's packages is that a separate Python distribution was installed into it: we created the env with conda create -n tf-gpu python=3.9, so the notebook runs on the environment's own Python rather than the base conda Python, and therefore the Python packages in this environment are recognised by jupyter notebook. Anyhow, this version works.
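To confirm which Python a notebook kernel is really running on, a cell like the following helps. It relies on sys.executable, sys.prefix, and the CONDA_PREFIX variable that conda activate sets; kernel_in_env is a hypothetical helper name:

```python
import os
import sys
from pathlib import Path


def kernel_in_env(env_prefix: str, prefix: str = sys.prefix) -> bool:
    """True if the interpreter's prefix is the given environment directory."""
    return Path(prefix).resolve() == Path(env_prefix).resolve()


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("sys.prefix:", sys.prefix)
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        print("Kernel runs inside the active conda env:", kernel_in_env(conda_prefix))
    else:
        print("CONDA_PREFIX not set (no conda env was active when jupyter started)")
```

If sys.executable points into the tf-gpu env folder, the notebook is using that env's Python and its packages.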

Quick conclusion on what's in my env:

System env:

conda env (tf-gpu):