Parallel Processing with tqdm

A dead-simple way to perform parallel processing with progress bars natively in tqdm
Author: Amit Chaudhary

Published: October 20, 2024

tqdm is a popular library for displaying progress bars that's widely used across open-source Python ML libraries. As such, it's often already installed as a dependency of other libraries when working on machine learning projects.

pip show tqdm

Required-by: datasets, dvc, evaluate, huggingface-hub, openai, sentence-transformers, spacy, transformers

For example, consider a task where we loop over a list of websites and need to fetch the status code for each.

Naive loop
import requests

def ping(url):
    return requests.head(url).status_code

urls = ['https://amitness.com'] * 10
statuses = [ping(url) for url in urls]

To get a progress bar, it’s as easy as wrapping the urls list with the tqdm class.

pip install tqdm

import requests
from tqdm.auto import tqdm

def ping(url):
    return requests.head(url).status_code

urls = ['https://amitness.com'] * 10
statuses = [ping(url) for url in tqdm(urls)]
Import the tqdm object; importing from tqdm.auto is preferred as it automatically selects the best progress bar (Jupyter-compatible or console-based). Simply wrap the list of items and you get a progress bar.
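
As an aside, tqdm also accepts optional keyword arguments alongside the iterable; for example, desc labels the bar and total supplies the length when tqdm can't infer it from the iterable. A minimal sketch (the placeholder loop body is mine):

from tqdm.auto import tqdm

urls = ['https://amitness.com'] * 10
# desc labels the bar; total is useful for generators whose length
# tqdm cannot infer on its own.
for url in tqdm(urls, desc="pinging", total=len(urls)):
    ...  # process each url here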

While this use of tqdm as a progress bar library is well known, there are two relatively undocumented features in tqdm that give you progress bars while doing concurrent or parallel processing.

Running Concurrent Threads

You can execute a function on the list concurrently with multiple threads using the thread_map function. It takes the function to run as the first argument and the list of items as the second, and returns the results as a list.

import requests
from tqdm.contrib.concurrent import thread_map

def ping(url):
    return requests.head(url).status_code

urls = ['https://amitness.com'] * 500
statuses = thread_map(ping, urls, max_workers=4)
The number of worker threads to use can be specified using the max_workers parameter.

This is useful for speeding up I/O-bound tasks such as scraping websites, calling a remote third-party API, or querying a remote database.

Internally, thread_map leverages ThreadPoolExecutor from the concurrent.futures standard library.¹
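
In rough terms, this amounts to wrapping ThreadPoolExecutor.map with a tqdm bar yourself. Below is a simplified sketch of the idea (the name thread_map_sketch is mine; the real implementation supports extra options such as tqdm_class and chunksize):

from concurrent.futures import ThreadPoolExecutor
from tqdm.auto import tqdm

def thread_map_sketch(fn, items, max_workers=None):
    # Simplified stand-in for tqdm.contrib.concurrent.thread_map.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order; tqdm advances as results
        # are consumed from the iterator.
        return list(tqdm(executor.map(fn, items), total=len(items)))

# statuses = thread_map_sketch(ping, urls, max_workers=4)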

Running Parallel Processes

For compute-bound tasks, tqdm provides a process_map function with a similar API to process the list in parallel using multiple child processes.

import requests
from tqdm.contrib.concurrent import process_map

def ping(url):
    return requests.head(url).status_code

urls = ['https://amitness.com'] * 500
statuses = process_map(ping, urls, max_workers=4)
The number of processes to use can be specified using the max_workers parameter.

This is particularly useful when the task involves heavy computation such as generating sentence embeddings for a large dataset or running batch model inference on CPU.
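
For long lists of relatively cheap items, it can also help to pass a chunksize, which process_map forwards to ProcessPoolExecutor.map so that items are sent to workers in batches instead of one at a time. A small sketch, assuming a reasonably recent tqdm version (the square function and numbers here are purely illustrative):

from tqdm.contrib.concurrent import process_map

def square(x):
    # Stand-in for a CPU-bound function.
    return x * x

if __name__ == "__main__":  # needed for process-based parallelism; see the note below
    numbers = list(range(100_000))
    # Sending items to workers in chunks of 1000 reduces inter-process overhead.
    results = process_map(square, numbers, max_workers=4, chunksize=1000)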

Internally, process_map uses ProcessPoolExecutor from the concurrent.futures standard library.²
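
One caveat that follows from this: the usual multiprocessing rules apply. The mapped function must be defined at module level so it can be pickled, and on platforms that use the spawn start method (Windows, and macOS on recent Python versions) the call should sit behind an if __name__ == "__main__": guard. A minimal sketch of the earlier example with the guard in place:

import requests
from tqdm.contrib.concurrent import process_map

def ping(url):
    # Defined at module level so child processes can pickle and import it.
    return requests.head(url).status_code

if __name__ == "__main__":
    # The guard stops child processes from re-running this block when the
    # spawn start method re-imports the module.
    urls = ['https://amitness.com'] * 500
    statuses = process_map(ping, urls, max_workers=4)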

Conclusion

Thus, thread_map and process_map are handy tools to add to your toolbox when you need parallel processing. Since tqdm is usually already installed via other ML libraries you use, they offer a quick and easy way to parallelize your program logic and get progress bars for free.

Footnotes

  1. Source code for thread_map

  2. Source code for process_map