Amit Chaudhary

Evals for Diversity in Synthetic Data

synthetic-data

evals

llm

An overview of evaluation metrics for measuring linguistic diversity in LLM-generated synthetic data

Feb 9, 2025

13 min

Zero-Cost Custom Feeds on Bluesky

misc

A simple stack for generating custom feeds for Bluesky programmatically without a backend server

Dec 1, 2024

Parallel Processing with tqdm

python

A dead-simple way to perform parallel processing with progress bars natively in tqdm

Oct 20, 2024

4 min

A Visual Guide to Regular Expression

python

nlp

A mental model of how the various components of a regular expression work, from the bottom up.

Oct 21, 2020

Thumbnail showing process of clustering for knowledge transfer

Knowledge Transfer in Self Supervised Learning

self-supervised-learning

A general framework to transfer knowledge from deep self-supervised models to shallow task-specific models

Oct 4, 2020

11 min

Thumbnail showing interactive visualization of texts clustered and plotted on a graph

Interactive Analysis of Sentence Embeddings

nlp

embeddings

Learn how to interactively explore sentence embeddings and labels in TensorFlow Embedding Projector

Sep 24, 2020

VSCode on Google Colab

colab

Learn how to set up and use VSCode as an IDE on Google Colab and Kaggle.

Sep 1, 2020

5 min

Thumbnail showing process of backtranslation from English to French and back to English

Text Data Augmentation with MarianMT

nlp

data-augmentation

Learn how to use machine translation models in Hugging Face Transformers for data augmentation

Aug 30, 2020

5 min

Thumbnail showing the steps in the pipeline of keyphrase extraction from documents

Unsupervised Keyphrase Extraction

nlp

Learn about unsupervised algorithms for automatically extracting representative keywords and phrases from documents

Aug 30, 2020

16 min

Thumbnail showing grading of different retrieved documents per query as relevant or not

Evaluation Metrics For Information Retrieval

information-retrieval

evals

rag

Learn about common metrics used to evaluate the performance of information retrieval systems

Aug 4, 2020

22 min

Thumbnail showing a modification to a text that should have a deterministic impact on the sentiment of the text

Behavioral Testing of NLP models

nlp

evals

An overview of the “CheckList” framework for fine-grained evaluation of NLP models

Jul 28, 2020

60 min

Semi-Supervised Learning in Computer Vision

semi-supervised-learning

A comprehensive overview of recent semi-supervised learning methods in Computer Vision

Jul 12, 2020

14 min

FastAPI for Flask Users

python

A comprehensive guide to FastAPI with a side-by-side code comparison with Flask

Jun 29, 2020

10 min

Google Colab Tips for Power Users

colab

Learn about lesser-known features in Google Colaboratory to improve your productivity.

Jun 26, 2020

Thumbnail showing the process of generating character n-grams from a word in fasttext embeddings

A Visual Guide to FastText Word Embeddings

nlp

embeddings

A deep-dive into how FastText enriches word vectors with subword information

Jun 21, 2020

15 min

Thumbnail showing the three tasks that the Universal Sentence Encoder is trained on in multi-task learning

Universal Sentence Encoder Visually Explained

nlp

embeddings

A deep-dive into how Universal Sentence Encoder learns to generate fixed-length sentence embeddings

Jun 15, 2020

10 min

Thumbnail showing the difference between regular classification and zero-shot classification via GPT-2

Zero-shot Text Classification With Generative Language Models

nlp

zero-shot-learning

llm

A text generation approach to zero-shot text classification with GPT-2

Jun 7, 2020

21 min

A screenshot of the YouTube video where Alec Radford gives a presentation on knowledge captured in language models

Exploring Knowledge Captured in Probability of Strings

nlp

llm

An exploration of simple knowledge captured by language models

Jun 7, 2020

Thumbnail showing the high-level overview of the 'Train Once, Test Anywhere' paper for zero-shot text classification

Zero Shot Learning for Text Classification

nlp

zero-shot-learning

A summary of the “Train Once, Test Anywhere” paper for zero-shot text classification

May 30, 2020

7 min

Thumbnail showing the pipeline relating self-supervised learning to downstream supervised learning

Self Supervised Representation Learning in NLP

nlp

self-supervised-learning

An overview of self-supervised pretext tasks in Natural Language Processing

May 23, 2020

7 min

Thumbnail showing the difference between augmentation on an image vs. text and the challenge of semantically invariant transformation in NLP

A Visual Survey of Data Augmentation in NLP

nlp

data-augmentation

An extensive overview of text data augmentation techniques for Natural Language Processing

May 16, 2020

Thumbnail showing a hypothetical GitHub repo where papers derived from BERT are shown as forks of the original BERT repo

A Commit History of BERT and its Forks

nlp

What could a commit history of version-controlled research papers look like?

May 9, 2020

Thumbnail showing the relation between the usage of the SimpleRNN class in Keras and a diagram showing how the RNN works under the hood

A Visual Guide to Recurrent Layers in Keras

nlp

Understand how to use Recurrent Layers like RNN, GRU, and LSTM in Keras with diagrams

Apr 23, 2020

Thumbnail showing the overall pipeline for the DeepCluster method

A Visual Exploration of DeepCluster

self-supervised-learning

DeepCluster is a self-supervised method to combine clustering and representation learning

Apr 14, 2020

9 min

Thumbnail showing the overall pipeline for the Self-Labelling method for representation learning

A Visual Guide to Self-Labelling Images

self-supervised-learning

A self-supervised method to generate labels via simultaneous clustering and representation learning

Apr 10, 2020

The Illustrated FixMatch for Semi-Supervised Learning

semi-supervised-learning

Learn how to leverage unlabeled data using FixMatch for semi-supervised learning

Mar 31, 2020

15 min

The Python Magic Behind PyTorch

python

pytorch

Learn about the advanced native Python features behind PyTorch

Mar 23, 2020

The Illustrated PIRL: Pretext-Invariant Representation Learning

self-supervised-learning

Learn how PIRL generates image representations invariant to transformation in a self-supervised manner

Mar 16, 2020

Thumbnail showing the overall pipeline for SimCLR framework for contrastive learning of visual representations

The Illustrated SimCLR Framework

self-supervised-learning

A visual guide to the SimCLR framework for contrastive learning of visual representations.

Mar 4, 2020

The Illustrated Self-Supervised Learning

self-supervised-learning

A visual introduction to self-supervised learning methods for visual representations.

Feb 25, 2020

10 min

Back Translation for Text Augmentation with Google Sheets

nlp

data-augmentation

Learn how to augment existing labeled text data for free using Google Sheets.

Feb 19, 2020

3 min

Thumbnail showing the concept of parameter sharing in the ALBERT pre-training method

A Visual Guide to ALBERT (A Lite BERT)

nlp

An illustrated summary of the ALBERT paper

Feb 8, 2020

Thumbnail showing the task of clickbait detection on news articles

Transfer Learning in NLP with Tensorflow Hub and Keras

nlp

Learn how to integrate and fine-tune TensorFlow Hub modules in TensorFlow 2.0

Feb 2, 2020

Thumbnail showing comparison between os.path module and pathlib module for getting the current working directory

Migrating from OS.PATH to PATHLIB Module in Python

python

Learn how to use the modern pathlib module to perform tasks you have been using os.path for

Dec 29, 2019

2 min

Thumbnail showing the equivalent python code for the math formula of summation of a list

Math Symbols Explained with Python

maths

Learn the meaning behind mathematical symbols used in Machine Learning using your knowledge of Python.

Aug 3, 2019

5 min

Language Detection in Python

nlp

Learn how to detect the language of a given piece of text using Natural Language Processing.

Jul 15, 2019