Machine Learning Cookbook

This is a personal collection of frequently used commands and snippets for ML projects.

Conda

Install OpenCV in conda

conda install -c conda-forge opencv

Update conda

conda update -n base -c defaults conda

Create/Update conda environment from file

conda env create -f environment.yml
conda env update -f environment.yml

Install CUDA toolkit in conda

conda install cudatoolkit=9.2 -c pytorch
conda install cudatoolkit=10.0 -c pytorch

Disable auto-activation of conda environment

conda config --set auto_activate_base false

Celery

Run celery workers
The file tasks.py contains the celery object. Concurrency is set to 1, and -P solo runs the worker in a single process without threads.

celery -A tasks.celery worker --loglevel=info --concurrency=1 -P solo

Colab

Force remount google drive

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Docker

Start docker-compose as daemon

docker-compose up --build -d

Disable pip cache and version check

ENV PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

Dockerfile for FastAPI
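A minimal sketch, assuming the FastAPI app object lives in main.py and requirements.txt lists fastapi and uvicorn:

# Dockerfile
FROM python:3.7
WORKDIR /app
COPY requirements.txt ./requirements.txt
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
# Serve the FastAPI app with uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]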

FastAPI

Use debugging mode

# server.py
import uvicorn

if __name__ == "__main__":
    uvicorn.run("server:app", host="0.0.0.0", port=8000, reload=True)

Enable CORS
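A minimal sketch using FastAPI's CORSMiddleware; the wildcard origins are placeholders and should be restricted in production.

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow cross-origin requests from the frontend
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)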

Run FastAPI in Jupyter Notebook
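One way is to patch the notebook's already-running event loop with nest_asyncio (pip install nest-asyncio) and then start uvicorn directly; a sketch:

import nest_asyncio
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def home():
    return {"message": "running inside the notebook"}

# Allow uvicorn to reuse the notebook's event loop
nest_asyncio.apply()
uvicorn.run(app, port=8000)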

Flask

Test API in flask
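A sketch using Flask's built-in test client with pytest, assuming the Flask app is importable from a module named app:

import pytest
from app import app  # hypothetical module exposing the Flask app

@pytest.fixture
def client():
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client

def test_home(client):
    response = client.get('/')
    assert response.status_code == 200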

Git

Prevent git from asking for password

git config credential.helper 'cache --timeout=1800'

Whitelist in .gitignore

# First, ignore everything
*

# Whitelist all directories
!*/

# Only .py, markdown and .gitignore files
!*.py
!*.md
!*.gitignore

Clone private repo using personal token

Create a personal access token from GitHub settings and run:

git clone https://<token>@github.com/amitness/example.git

Create alias to run command

# git test
git config --global alias.test "!python -m doctest"

Gunicorn

Increase timeout

gunicorn --bind 0.0.0.0:5000 app:app --timeout 6000

Check error logs

tail -f /var/log/gunicorn/error_

Run two workers

gunicorn app:app --preload -w 2 -b 0.0.0.0:5000

Huey

Add background task to add 2 numbers
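A sketch using huey with the Redis backend; module and task names are placeholders.

# tasks.py
from huey import RedisHuey

huey = RedisHuey()  # assumes redis running on localhost:6379

@huey.task()
def add(a, b):
    return a + b

Start a worker with huey_consumer tasks.huey and enqueue the task from Python:

from tasks import add

result = add(1, 2)                # returns immediately with a result handle
print(result.get(blocking=True))  # 3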

Jupyter Notebook

Auto-import common libraries

  1. Create a startup folder in ~/.ipython/profile_default
  2. Create a python file start.py inside it
  3. Add the imports there.
# start.py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Auto-print all expressions
Edit ~/.ipython/profile_default/ipython_config.py and add

c = get_config()

# Run all nodes interactively
# c.InteractiveShell.ast_node_interactivity = "last_expr"
c.InteractiveShell.ast_node_interactivity = "all"

Add conda kernel to jupyter
Activate the conda environment and run the commands below.

pip install --user ipykernel
python -m ipykernel install --user --name=condaenv

Add R kernel to jupyter

conda install -c r r-irkernel

# Link to fix issue with readline
cd /lib/x86_64-linux-gnu/
sudo ln -s libreadline.so.7.0 libreadline.so.6

Start notebook on remote server

jupyter notebook --ip=0.0.0.0 --no-browser

Serve as voila app

voila --port=$PORT --no-browser app.ipynb

Kaggle

Add kaggle credentials

pip install --upgrade kaggle kaggle-cli

mkdir ~/.kaggle
mv kaggle.json ~/.kaggle
chmod 600 ~/.kaggle/kaggle.json

Linux

Zip a folder

zip -r folder.zip folder

Use remote server as VPN

ssh -D 8888 -f -C -q -N ubuntu@example.com

SSH Tunneling for multiple ports (5555, 5556)

ssh -N -f -L localhost:5555:127.0.0.1:5555 -L localhost:5556:127.0.0.1:5556 ubuntu@example.com

Reverse SSH tunneling
Enable GatewayPorts=yes in /etc/ssh/sshd_config on server.

ssh -NT -R example.com:5000:localhost:5000 ubuntu@example.com -i ~/.ssh/xyz.pem -o GatewayPorts=yes

Copy remote files to local

scp ubuntu@example.com:/mnt/file.zip .

Set correct permission for PEM file

chmod 400 credentials.pem

Clear DNS cache

sudo service network-manager restart
sudo service dns-clean restart
sudo systemctl restart dnsmasq
sudo iptables -F

Unzip .xz file

sudo apt-get install xz-utils
unxz ne.txt.xz

Auto-generate help for make files
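One common pattern is to document each target with a trailing ## comment and add a help target that greps them out; a sketch (recipe lines must start with a tab, and the train target is just an example):

# Makefile
.PHONY: help
help:  ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*## ' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*## "}; {printf "%-20s %s\n", $$1, $$2}'

train:  ## Train the model
	python train.py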

Markdown

Add comparison of code blocks side by side

Nginx

Assign path to port

location /demo/ {
    proxy_pass http://localhost:5000/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

Increase timeout for nginx
The default timeout is 60s. Edit /etc/nginx/proxy_params and add the timeout settings below.

sudo nano /etc/nginx/proxy_params
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_connect_timeout   300;
proxy_send_timeout      300;
proxy_read_timeout      300;

Setup nginx for prodigy

Pandas

Save with quoted strings

import csv

df.to_csv('data.csv',
          index=False,
          quotechar='"',
          quoting=csv.QUOTE_NONNUMERIC)

Pycharm

Add keyboard shortcut for custom command

Enable pytest as default test runner

Pydantic

Allow camel case field name from frontend
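A pydantic v1-style sketch using an alias generator; the model and fields are placeholders.

from pydantic import BaseModel

def to_camel(string):
    # first_name -> firstName
    first, *rest = string.split('_')
    return first + ''.join(word.capitalize() for word in rest)

class User(BaseModel):
    first_name: str
    last_name: str

    class Config:
        alias_generator = to_camel
        allow_population_by_field_name = True

# Accepts camelCase keys coming from the frontend
user = User(**{'firstName': 'John', 'lastName': 'Doe'})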

Validate fields
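A minimal pydantic v1-style validator; the field and rule are just examples.

from pydantic import BaseModel, validator

class Item(BaseModel):
    name: str
    price: float

    @validator('price')
    def price_must_be_positive(cls, value):
        if value <= 0:
            raise ValueError('price must be positive')
        return value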

Python

Install build utilities

sudo apt update
sudo apt install build-essential python3-dev
sudo apt install python-pip virtualenv

Install mysqlclient

sudo apt-get install libmysqlclient-dev mysql-server
pip install mysqlclient

Convert python package to command line tool
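A sketch using setuptools entry points; the package and function names are hypothetical.

# setup.py
from setuptools import setup, find_packages

setup(
    name='mytool',
    version='0.1.0',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            # installs a `mytool` command that calls main() in mytool/cli.py
            'mytool = mytool.cli:main',
        ],
    },
)

After pip install ., the mytool command becomes available on the PATH.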

Send email with SMTP
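A sketch with the standard library's smtplib; addresses, server and credentials are placeholders (for Gmail, use an app password).

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'Test'
msg['From'] = 'me@example.com'
msg['To'] = 'you@example.com'
msg.set_content('Hello from Python')

with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server:
    server.login('me@example.com', 'app-password')
    server.send_message(msg)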

Run selenium on chromium

sudo apt-get update
sudo apt install chromium-chromedriver
cp /usr/lib/chromium-browser/chromedriver /usr/bin
pip install selenium

from selenium import webdriver

# set options to be headless:
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', options=options)

Generate fake user agent in selenium
Run pip install fake_useragent.

from fake_useragent import UserAgent
from selenium import webdriver

ua = UserAgent(verify_ssl=False)
user_agent = ua.random

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f"user-agent={user_agent}")
driver = webdriver.Chrome(chrome_options=chrome_options)

PyTorch

Install CPU-only version of PyTorch

conda install pytorch torchvision cpuonly -c pytorch

Auto-select proper pytorch version based on GPU

pip install light-the-torch
ltt install torch torchvision

Set random seed

import random
import numpy as np
import torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

Create custom transformation

class Reshape:
    def __init__(self, new_shape):
        self.new_shape = new_shape

    def __call__(self, img):
        return torch.reshape(img, self.new_shape)
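
The transform can then be used like any built-in one, e.g. to flatten each image:

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    Reshape((-1,)),  # flatten each image tensor
])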

Pytorch Lightning

Use model checkpoint callback
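A sketch assuming pytorch-lightning >= 1.0, where callbacks are passed to the Trainer; the monitored key and model are placeholders.

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the single best checkpoint as measured by validation loss
checkpoint_callback = ModelCheckpoint(
    monitor='val_loss',
    save_top_k=1,
    mode='min',
)

trainer = Trainer(callbacks=[checkpoint_callback])
trainer.fit(model)  # model is your LightningModule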

Redis

Connect to redis from commandline

redis-cli -h 1.1.1.1 -p 6380 -a password

Connect to local redis

from redis import Redis
conn = Redis(host='127.0.0.1', port=6379)
conn.set('age', 100)

Add password to redis server

Edit /etc/redis/redis.conf file.

sudo nano /etc/redis/redis.conf

Uncomment this line and set password.

# requirepass yourpassword

Restart redis server.

sudo service redis-server restart

Requests

Post JSON data to endpoint

import json
import requests

headers = {'Content-Type': 'application/json'}
data = {}
response = requests.post('http://example.com',
                         data=json.dumps(data),
                         headers=headers)

SSH

Add server alias to SSH config
Add to ~/.ssh/config
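A sketch; the host alias, user and key path are placeholders.

Host myserver
    HostName example.com
    User ubuntu
    IdentityFile ~/.ssh/xyz.pem

Then connect with ssh myserver.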

Streamlit

Disable CORS
Create ~/.streamlit/config.toml

[server]
enableCORS = false

File Uploader

file = st.file_uploader("Upload file", 
                        type=['csv', 'xlsx'], 
                        encoding='latin-1')
df = pd.read_csv(file)

Create download link for CSV file

import base64

csv = df.to_csv(index=False)
filename = 'data.csv'
b64 = base64.b64encode(csv.encode()).decode()
href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">Download CSV File</a>'
st.markdown(href, unsafe_allow_html=True)

Run on docker

FROM python:3.7
EXPOSE 8501
WORKDIR /app
COPY requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD streamlit run src/main.py

Docker compose for streamlit

Add Dockerfile to app folder.

Add project.conf to nginx folder.

Add Dockerfile to nginx folder.

Add docker-compose.yml at the root.

Run on heroku
Add requirements.txt, create Procfile and setup.sh.
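A commonly used sketch for the last two files, assuming the entry point is app.py:

# setup.sh
mkdir -p ~/.streamlit/
echo "[server]
headless = true
port = $PORT
enableCORS = false
" > ~/.streamlit/config.toml

# Procfile
web: sh setup.sh && streamlit run app.py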

Deploy streamlit on google cloud
Create Dockerfile, app.yaml and run:

gcloud config set project your_projectname
gcloud app deploy

Render SVG

import base64
import streamlit as st

def render_svg(svg):
    """Renders the given svg string."""
    b64 = base64.b64encode(svg.encode('utf-8')).decode("utf-8")
    html = f'<img src="data:image/svg+xml;base64,{b64}"/>'
    st.write(html, unsafe_allow_html=True)

Tensorflow

Install CPU-only version of Tensorflow

conda install tensorflow-mkl

or

pip install tensorflow-cpu==2.1.0

Install custom builds for CPU

Find link from https://github.com/lakshayg/tensorflow-build

pip install --ignore-installed --upgrade "url"

Use only single GPU

export CUDA_VISIBLE_DEVICES=0

Allocate memory as needed

export TF_FORCE_GPU_ALLOW_GROWTH='true'

Enable XLA

import tensorflow as tf
tf.config.optimizer.set_jit(True)

Load saved model with custom layer

from tensorflow.keras.models import load_model
import tensorflow_hub as hub

model = load_model(model_name, 
                   custom_objects={'KerasLayer':hub.KerasLayer})

Ensure Conda doesn’t cause tensorflow issue

Upload tensorboard data to cloud

tensorboard dev upload --logdir ./logs \
    --name "XYZ" \
    --description "some model"

Use TPU in Keras
TPU survival guide on Google Colaboratory

Textblob

Backtranslate a text

from textblob import TextBlob

def back_translate(text):
    # Translate English -> Chinese, then back to English to get a paraphrase.
    t = TextBlob(text)
    return (TextBlob(t.translate('en', 'zh').raw)
            .translate('zh', 'en')
            .raw)