AI Engineering Tools

A collection of open source libraries for every stage of AI engineering, from data and evaluation to building, deploying, and monitoring AI systems
Author

Amit Chaudhary

Published

May 11, 2025

Modified

February 4, 2026

Data

Category Tool Remarks
Synthetic Data curator, datadreamer, distilabel, fabricator, promptwright
auto-label Classification
chatito Synthetic texts
text_renderer Synthetic OCR
Pretraining pipelines datatrove, nemo-curator
Data Exploration lilac
turftopic, bertopic Topic Modeling
Subset selection twinning, apricot Submodular optimization
semhash, rensa Deduplication
imagehash, imagededup Image dedup
Import smart_open, pathy Remote storage
camelot, tabula-py, parsr, pdftotext, pdfplumber, pymupdf, grobid, PyPDF2, pdf2image PDF
pyarxiv Arxiv
talon Email
openpyxl Excel
gdown, pydrive, googlesearch, geo-heatmap, google-play-scraper Google
gspread Google Sheets
py-image-dataset-generator, idt, jmd-imagescraper Image websearch
pytube, scrapetube Youtube
news-please, news-catcher, pygooglenews News
twint, tweepy, twarc Twitter
wikipedia, wikitextparser, wikitables, wikidata Wikipedia
Web Scraping crawl4ai, MechanicalSoup, libextract, pyppeteer
autoscraper Specify target data and get scraping code
selectolax, html2text, hext, trafilatura, justext, python-readability, html-text Text Extraction
Language Detection fast-langdetect
Data Annotation labelstudio, awesome-data-labeling, prodigy
makesense.ai, labelimg, via, cvat Image
doccano, brat Text
audio-annotator, audino Audio
superintendent, pigeon Notebook annotation
disagree, simpledorff Inter-rater agreement
Weak Supervision snorkel
Data Augmentation audiomentations, muda Audio
imgaug, albumentations, augmentor, solt, deepaugment Image
nlpaug, noisemix, textattack, textaugment, niacin, SeaQuBe, DataAug4NLP, NL-Augmenter Text
TextRecognitionDataGenerator, genalog OCR
deltapy Tabular
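The inter-rater agreement tools in the Data table report chance-corrected agreement between annotators. As a minimal sketch of the idea, here is Cohen's kappa for two annotators in plain Python. It is not the API of disagree or simpledorff (simpledorff computes Krippendorff's alpha, a related but more general metric), and `cohen_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined, treat as perfect.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


rater_1 = ["pos", "pos", "neg", "neg"]
rater_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(rater_1, rater_2))  # 0.5
```

Kappa is 1.0 for perfect agreement and 0.0 when agreement is no better than chance, which is why it is preferred over raw percent agreement when label frequencies are skewed.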

Evals

Category Tool Remarks
Libraries athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai
evaluate, torchmetrics, torcheval Off-the-shelf metrics
mir-eval
Agents agentevals
Benchmarks openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness, openbench
llmtest-needleinahaystack Needle in a Haystack
agentbench, textarena Agents
open-llm-leaderboard, LMArena Arena
Diversity diversity
Hallucination lettucedetect, hallucination-leaderboard, selfcheckgpt
RAG ragas, ragchecker
auto-evaluator Generate synthetic QA pairs from docs
chunking-evaluation Evaluate chunking strategies
NER/POS Tagging seqeval NER, POS tagging
Information Retrieval ranking-metrics, cute_ranking
mir_eval Music IR
Text Generation bleurt, bertscore, moverscore
Deep Research search_evals
Recommendation System rs_metrics
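Ranking-metric packages such as ranking-metrics wrap standard information-retrieval formulas. A dependency-free sketch of two common ones, mean reciprocal rank (MRR) and recall@k, is below; the function names are hypothetical and not taken from any listed package.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


runs = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d2", "d9", "d4"], {"d2", "d4"}),  # first relevant hit at rank 1
]
print(mean_reciprocal_rank(runs))                          # 0.75
print(recall_at_k(["d2", "d9", "d4"], {"d2", "d4"}, k=2))  # 0.5
```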

Development Workflow

Category Tool Remarks
Experiment Management trulens, mlflow, wandb, mlop
Coding Agents claude-code, codex, gemini-cli, qwen-code, mistral-vibe, amp, opencode, openhands
vibe-kanban, conductor Orchestration
ruler, agent-rules-sync Rules
superpowers, awesome-copilot, upskill Skills
continuous-claude, claude-mem Context Management
beads Shared Memory
claudecode-telegram, takopi Telegram bridge
agent-trace Code annotation
Voice Typing vscode-speech
Frontend Feedback agentation
Linting ruff, pylint, pycodestyle, black, pydocstyle
safety, bandit, shellcheck Vulnerabilities
mypy Type-checking
CLI Formatting rich, tabulate, prompt-toolkit
Debugging PySnooper
Dependency Management uv
pip-chill pip freeze without dependencies
pipreqs Reverse engineer requirements.txt from imports
conda-pack Package conda environments for offline use
Documentation mkdocs, pdoc
Progress bar fastprogress, tqdm
Testing xdoctest Improved doctest
crosshair Find failure cases for functions
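pipreqs builds a requirements list from a project's import statements. The core step can be sketched with the standard-library `ast` module, as below. This is an assumption about the general approach, not pipreqs' code: a real tool also has to drop standard-library modules and map import names to PyPI package names (for example, `cv2` is provided by `opencv-python`). `top_level_imports` is a hypothetical helper.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Collect the top-level module names that a Python source string imports."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path as p" -> "os"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from pandas import DataFrame" -> "pandas"; relative imports (level > 0) are skipped
            modules.add(node.module.split(".")[0])
    return modules


code = "import numpy as np\nfrom pandas import DataFrame\nimport os.path\nfrom . import utils\n"
print(sorted(top_level_imports(code)))  # ['numpy', 'os', 'pandas']
```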

Context Engineering

Category Tool Remarks
Automatic Prompt Engineering dspy, textgrad, adalflow, zenbase, gepa
dspydantic Pydantic models
openevolve Evolutionary code optimization
System Prompts llm-system-prompts, leaked-system-prompts, system-prompt-leaks, awesome-ai-system-prompts, cl4r1t4s, system-prompts-and-models-of-ai-tools
Prompt Compression toon, llm-lingua
Function Calling functionary
Structured Output guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose
json-repair Post-process broken JSON
llm-scraper Webpage to JSON
Memory mem0, letta, memobase, memary, langmem, memoripy
cognee Memory using knowledge graphs
Code Interpreter gpt-code-interpreter, open-interpreter, codeinterpreter-api
Steering Vectors dialz, repeng
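Structured-output libraries constrain generation, while json-repair fixes malformed JSON after the fact. A much simpler post-processing sketch is below: it strips a Markdown code fence, keeps the outermost `{...}` span, and drops trailing commas before parsing. `parse_llm_json` is a hypothetical helper, not json-repair's API, and the trailing-comma regex is naive (it would also rewrite a literal `,}` inside a string value).

```python
import json
import re


def parse_llm_json(text: str):
    """Best-effort parse of a JSON object embedded in LLM output."""
    # 1. If the object is wrapped in a Markdown code fence, keep only the fenced body.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # 2. Keep the outermost {...} span to drop chatty text before or after it.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    candidate = candidate[start:end + 1]
    # 3. Drop trailing commas before a closing brace or bracket (naive: ignores string contents).
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    return json.loads(candidate)


fence = "`" * 3  # built at runtime so this example does not contain a literal fence
raw = "Sure! Here is the JSON:\n" + fence + 'json\n{"a": 1, "b": [1, 2,],}\n' + fence
print(parse_llm_json(raw))  # {'a': 1, 'b': [1, 2]}
```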

RAG

Category Tool Remarks
Libraries llama-index, verba, fastrag, haystack, ragbits
Vector Index annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus
pgvector, sqlite-vec SQL Extensions
simsimd Faster dot-product on CPUs
Chunking wtsplit, semchunk, chonkie, langchain-text-splitters, chonky
chonky-distilbert-base-multilingual-cased Multilingual Chunking
open-parse, doclayout-yolo Visual layout parsing
sparseprimingrepresentations SPR
Embeddings fastembed, sentence-transformers
model2vec Static Vectors
embedding-atlas Visualize
Retrieval pyserini
bm25s, rank_bm25 Sparse retrieval
Reranking rerankers, flashrank
pyversity Diversity re-ranking
Late Interaction ragatouille, pylate Train ColBERT models
maxsim-cpu Speed-up max-sim for ColBERT
Graph RAG fast-graphrag, graphrag, nano-graphrag
Metric learning metric-learn, pytorch-metric-learning
OCR textract, deepseek-ocr-2
nougat Academic Documents
Document Understanding donut
table-transformer Table Extraction
marker, pdf2md, mineru, docext, docling PDF to markdown
RAG on internal docs onyx, xyne, pipeshub-ai
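bm25s and rank_bm25 implement Okapi BM25, the classic scoring function behind sparse retrieval. A plain-Python sketch of the scoring is below. `bm25_scores` is a hypothetical helper, k1=1.5 and b=0.75 are common defaults rather than values prescribed here, and the IDF uses the non-negative Lucene-style variant, which differs slightly from the IDF in rank_bm25's `BM25Okapi`.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF: rarer terms weigh more, and the value is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


corpus = ["the cat sat on the mat", "the dog barked", "a cat and a dog played outside in the park"]
docs = [text.split() for text in corpus]
scores = bm25_scores(["cat"], docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(corpus[best])  # the cat sat on the mat
```

Both matching documents contain "cat" once, so the shorter one ranks higher because of the length normalization controlled by `b`.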

Agents

Category Tool Remarks
Libraries autogen, crewai, langroid, openai-agents, pydantic-ai, marvin, metagpt, semantic-kernel
agent-sdk, claude-agent-sdk, copilot-sdk SDKs
langgraph Graphs
smolagents Code-based agent
MCP fastmcp, enrichmcp
mcp-scan Scan for security vulnerabilities
Web Use browser-use, agent-browser
magnitude Automated Testing
Deep Research deer-flow, deeptutor, deepagents
Computer Use ui-tars-desktop, cua
Sandbox microsandbox, e2b, screenenv, sandbox-runtime, yolobox, cco, daytona, docker-sandbox
Web Search API sonar

Finetuning

Category Tool Remarks
LLM Finetuning axolotl, unsloth, torchtune, peft, litgpt, llama-factory
onebitllms, matmulfreellm 1.58-bit LLMs
Model Merging mergekit, mergoo, mergenetic
Multi-modal VLMs maestro, nanovlm, mlx-vlm
smol-vision Recipes
Distributed Training megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai
hivemind, psyche Decentralized Training
Tokenization supertokenizer Train multi-word BPE
tokendagger Faster tiktoken
Classification adaptive-classifier Continuous Learning
Abliteration heretic
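mergekit supports several merge methods, one of which is spherical linear interpolation (SLERP) between two models' weights. The sketch below shows the formula on a single flat weight vector in plain Python; `slerp` is a hypothetical helper, and real merge tools apply the idea tensor by tensor with extra handling not shown here.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between weight vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1 + eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    if math.sin(theta) < eps:
        # Nearly parallel (or anti-parallel) vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


merged = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print([round(x, 4) for x in merged])  # [0.7071, 0.7071]
```

Unlike a straight average, which would give `[0.5, 0.5]` here, SLERP follows the arc between the two vectors and so preserves their norm when both inputs have the same length.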

Post-training

Category Tool Remarks
Libraries art, trl, verl, openrlhf, atropos, retrain, nemo-rl, slime, skyrl
Instruction Tuning llava text+image
airoboros self-instruct
Verifiers verifiers
Environments harbor, openspiel, openenv, gym, prime-environments, llmgym, reasoning-gym, gem, agentgym
MCTS search-and-learn
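Verifier-based post-training rewards a completion only when its answer can be checked programmatically. A minimal exact-match reward for numeric answers is sketched below. `numeric_answer_reward` is a hypothetical function, not the verifiers library's API, and treating the last number in the completion as the final answer is a simplifying assumption.

```python
import re


def numeric_answer_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last number in the completion equals the reference answer, else 0.0."""
    # Assumption: the final answer is the last integer or decimal that appears in the text.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(reference) else 0.0


print(numeric_answer_reward("2 + 2 = 4, so the answer is 4.", "4"))  # 1.0
print(numeric_answer_reward("I think it is 5.", "4"))               # 0.0
```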

Voice

Category Tool Remarks
Libraries livekit
Pre-processing moviepy
denoiser Denoising
Voice Activity Detection silero-vad, smart-turn, yamnet
General models seamless-communication
Source Separation samaudio, spleeter, nussl, open-unmix-pytorch, asteroid
indicconformerasr, vistaar Indic Languages
Speech Recognition faster-whisper, whisperx, vibevoice-asr, kaldi, speech_recognition, delta, pocketsphinx-python, deepspeech, stt, vosk
Speaker Identification whisperkitlive, resemblyzer
Speech Synthesis chatterbox, festvox, cmuflite, tts
whisper-streaming Realtime
pocket-tts CPU

Vision

Category Tool Remarks
Vision Language Models CogVLM
Watermarking meta-seal
Facial recognition deepface, face_recognition, mtcnn, insightface, face-detection, terran
face-alignment Find facial landmarks
Facial-Expression-Recognition.Pytorch Face Emotion
Face swapping faceit, faceit-live, avatarify
GANs mimicry, imaginaire, pytorch-lightning-gans
Image Processing scikit-image, imutils, opencv-wrapper, opencv-python
torchio Medical Images
Object detection luminoth, detectron2, mmdetection, icevision
OCR keras-ocr, pytesseract, keras-craft, ocropy, doc2text
easyocr, kraken, PaddleOCR Multilingual OCR
layout-parser, pdftabextract Table Extraction
Segmentation segmentation_models.pytorch Segmentation models in PyTorch
Pretrained models pretrained-models.pytorch, pytorchcv, pytorch-image-models Pre-trained ConvNets
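Object-detection libraries such as detectron2 and mmdetection score predicted boxes by their intersection over union (IoU) with ground-truth boxes. A minimal sketch for axis-aligned boxes is below; the `(x_min, y_min, x_max, y_max)` box format and the `box_iou` name are assumptions for illustration.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Overlap rectangle; its width or height clamps to 0 when the boxes do not intersect.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285 (= 1/7)
```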

Inference

Category Tool Remarks
Inference Servers ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm, exllamav2, fastgen, tokasaurus
Rate Limiting backoff, tenacity, ratelimit, requests-cache, retrying
limits Rate-limit for own APIs
Batch Processing skypilot
airflow, luigi, dagster, oozie, prefect, kubernetes-cron-jobs, argo Batch jobs
pyspark, hive Data
rq, schedule, huey Task Queue
GPU Snapshot inferx
Multi-token prediction medusa
Multi-LoRA inference s-lora, punica, lorax
Caching kvzip, lmcache KV Cache
gptcache Semantic Cache
cachetools, cachew (cache to local sqlite), redis-py, pymemcache
panza Caching for async functions
LLM Gateway litellm, portkey, any-llm, tensorzero, bifrost
LLM Routing routellm, automix, openrouter, awesome-ai-model-routing, helicone
Model Cascade frugalgpt
Quantization llm-compressor, bitsandbytes, optimum, sinq
nn_pruning Movement Pruning
Embeddings text-embeddings-inference, infinity
Kernels liger-kernel
attorch Triton kernels
sparse_transformers Sparse kernels
Local Servers lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp
Frontend UI gradio, streamlit, dash, voila
copilot-kit, assistant-ui, ai-elements Chat UI components
agent-inbox Agent UI
ai-sdk, chatkit, a2ui, ag-ui Generative UI
Authentication pyjwt (JWT), auth0, okta, cognito
Stream Processing flink, kafka, apache beam
Configuration Management config, python-decouple, python-dotenv, dynaconf
Database flask-sqlalchemy, tinydb, flask-pymongo, odmantic
tortoise-orm Asyncio ORM similar to Django
Monitoring whylogs Data Logging
grafana, prometheus Metric
sentry, honeybadger Error Reporting
Data Validation schema, jsonschema, cerberus, pydantic, marshmallow, validators
Model Serving cortex, torchserve, ray-serve, bentoml, seldon-core Serving Framework
flask, fastapi API Frameworks
flask-cors CORS
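backoff, tenacity, and retrying wrap calls to rate-limited APIs in retry loops. The core pattern, exponential backoff with "full jitter", can be sketched as a small decorator. `retry_with_backoff` and its parameters are hypothetical and do not mirror any of these libraries' APIs; the `sleep` argument is injectable so the logic can be exercised without actually waiting.

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Retry the wrapped function on `retry_on` errors with capped, jittered exponential delays."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay.
                    sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
        return wrapper
    return decorator


calls = {"n": 0}


@retry_with_backoff(max_attempts=4, sleep=lambda seconds: None)  # no real waiting in this demo
def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


print(flaky_call(), calls["n"])  # ok 3
```

Randomizing the delay spreads out retries from many clients that failed at the same moment, so they do not all hit the rate-limited API again in lockstep.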

Monitoring

Category Tool Remarks
Observability openllmetry, phoenix, logfire, context-viewer
gpumonitor, nvtop, jupyterlab-nvdashboard GPU usage
Alerts knockknock, jupyter-notify, apprise, pynotifier Notifications
Guardrails openai-guardrails, giskard, langkit, garak, deepchecks, nemo-guardrails
rebuff Prompt Injection Detection
curse-words, badwords, LDNOOBW, profanity Profanity
uqlm Uncertainty Quantification
Logging loguru
Drift Detection alibi-detect, torchdrift, boxkite Outlier and drift detection
ft-drift Detect drift in OpenAI messages
Edge Deployment TensorFlow Lite, coreml, TensorFlow.js
AI Detection binoculars
Testing schemathesis Automatic test generation from Swagger
mktestdocs, exdown Test code present in markdown files
Benchmarking pytest-benchmark Profile time in pytest
torchprof Profile PyTorch layers
scalene, pyinstrument Profile python code
k6 Load test API
ai-benchmark Benchmark a VM on 19 different models
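Drift detectors such as alibi-detect compare a reference window of data with recent production data using statistical tests, and the two-sample Kolmogorov-Smirnov (KS) test is one of them. The sketch below computes only the KS statistic (the largest gap between the two empirical CDFs) in plain Python; `ks_statistic` is a hypothetical helper, and turning the statistic into a p-value or alert threshold is left out.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the empirical CDFs of two samples."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # Both CDFs are step functions that only change at observed values,
    # so checking every sample point is enough to find the largest gap.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


print(ks_statistic([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]))  # 0.0
print(ks_statistic([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]))  # 1.0
```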

Classic ML

Category Tool Remarks
AutoML auto-sklearn, mljar-supervised, automl-gs, pycaret, evalml
lazypredict Run all sklearn models at once
tpot Genetic programming
autocat text-classification
mindsdb, ludwig Autogenerate ML code
Active Learning modAL
Anomaly detection adtk
Contrastive Learning contrastive-learner
Gradient Boosting catboost, xgboost, ngboost, lightgbm, thundergbm
Graph Neural Networks spektral GNN for Keras
Graph Embedding and Community Detection karateclub, python-louvain, communities
Hidden Markov Models hmmlearn
Interpretable Models imodels Models that show rules
Multi-view Learning mvlearn
Noisy Label Learning cleanlab
Optimization nevergrad Gradient Free Optimization
cvxpy Convex Optimization
Optimal Transport pot, geomloss
Probabilistic modeling pomegranate, pymc3
Rule based classifier sklearn-expertsys
Self-Supervised Learning lightly, vissl, solo-learn Implementations of SSL models
self_supervised Self-supervised models in Fast.AI
Spiking Neural Network norse
Support Vector Machines thundersvm Run SVM on GPU
Survival Analysis lifelines
Feature engineering featuretools, autopandas
tsfresh, python-holidays, skits, catch22 Time series
Dimensionality reduction fbpca, fitsne, trimap
Data Cleaning imblearn Class Imbalance
category_encoders, dirty_cat Categorical encoding
missingno Missing values
Hyperparameter tuning hyperopt, optuna, evol, talos Libraries
keras-tuner Keras
hyperopt-sklearn, scikit-optimize Bayesian Optimization
sklearn-deap, sklearn-generic-opt Evolutionary algorithm
Adversarial Attack cleverhans General
foolbox Image
triggers NLP
Interpretability eli5, lime, shap, alibi, tf-explain, treeinterpreter, pybreakdown, xai, lofo-importance, interpretML, shapash
Language Interpretability Tool, transformers-interpret transformers
exbert, bertviz BERT
word2viz, whatlies word-vectors
Tabular Data tabpfn
Time series prophet, tslearn, pyts, seglearn, cesium, stumpy, darts, gluon-ts, stldecompose, sktime
atspy Automated time-series models
orion, luminaire Anomaly detection
pmdarima ARIMA models
Recommendation System apyori Apriori algorithm
implicit Collaborative Filtering
xlearn, DeepCTR, RankFM Factorization machines (FM) and field-aware factorization machines (FFM)
libmf-python Matrix Factorization
lightfm, spotlight Popular Recsys algos
CaseRecommender Pytorch
surprise scikit-learn like API
Pytorch pytorch-summary Keras-like summary
torchtyping, tsalib Type annotation for tensors
einops Einstein Notation
kornia Computer Vision Methods
nonechucks Drop corrupt data automatically in DataLoader
pytorch-optimizer Collection of optimizers
pytorch-block-sparse Sparse matrix replacement for nn.Linear
pytorch-forecasting Time series forecasting in PyTorch lightning
pytorch-lightning Lightweight wrapper for PyTorch
skorch Wrap pytorch in scikit-learn compatible API
torchcontrib SOTA Building Blocks in PyTorch
bitsandbytes 8-bit optimizers for PyTorch
Scikit-learn scikit-lego, iterative-stratification
iterstrat Cross-validation for multi-label data
scikit-multilearn Multi-label classification
tscv Time-series cross-validation
sparseml Sparsification
Helpers mlxtend Extra utilities not present in frameworks
Visualization matplotlib, seaborn, pygal, plotly, plotnine
yellowbrick, scikit-plot libraries
pyldavis Topic models
dtreeviz Decision trees
txtmarker Highlight text in PDF
metriculous Visualize model performance
mermaid markdown
squarify Tree-map chart
babyplots 3D charts
dl-visuals, ml-visuals, chalk Diagrams
bar_chart_race bar chart race
pandas_alive Animated charts in pandas
umap, ivis High-dimensional data
bokeh, flourish-studio, mpld3 Interactive charts
netron, nn-svg Model visualization
tensor-sensor Visualize tensors
keract Activation maps for keras
keras-vis Visualize keras models
PlotNeuralNet LaTeX code for drawing neural networks
loss-landscape-anim Animate optimizer trajectories on a loss landscape
open-color Color Schemes
mplcyberpunk Cyberpunk style
chart.xkcd XKCD style
adjustText Prevent overlap when plotting point text label
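Time-series cross-validation (as in tscv) differs from ordinary k-fold in that every test fold comes strictly after its training data. An expanding-window splitter in the spirit of scikit-learn's `TimeSeriesSplit`, with an optional gap between train and test to limit leakage, can be sketched as follows; `expanding_window_splits` is a hypothetical function, not tscv's API.

```python
def expanding_window_splits(n_samples, n_splits, gap=0):
    """Yield (train_indices, test_indices) with each test fold placed after its training window."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too many splits for the number of samples")
    for i in range(n_splits):
        # Test folds are aligned to the end of the series, oldest fold first.
        test_start = n_samples - (n_splits - i) * test_size
        train_end = test_start - gap  # leave `gap` samples out between train and test
        if train_end <= 0:
            raise ValueError("gap leaves no training data for this fold")
        yield list(range(train_end)), list(range(test_start, test_start + test_size))


for train, test in expanding_window_splits(n_samples=10, n_splits=4, gap=1):
    print(train, test)
# [0] [2, 3]
# [0, 1, 2] [4, 5]
# [0, 1, 2, 3, 4] [6, 7]
# [0, 1, 2, 3, 4, 5, 6] [8, 9]
```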

NLP

Category Tool Remarks
Libraries spacy, nltk, corenlp, deeppavlov, kashgari, transformers, ernie, stanza, nlp-architect, spark-nlp, pytext, FARM
headliner, txt2txt seq2seq models
Nvidia NeMo Toolkit for ASR, NLP and TTS
nlu 1-line models for NLP
pyconverse Conversational Text Analysis
booknlp NLP for Books
finetune scikit-learn style
compromise JavaScript NLP
CPU-optimizations turbo_transformers, onnx_transformers, fastT5
Preprocessing textacy, texthero, textpipe, nlpretext
JamSpell, pyhunspell, pyspellchecker, cython_hunspell, hunspell-dictionaries, autocorrect (can add more languages), symspellpy, spello (train your own spelling correction), contextualSpellCheck, neuspell, nlprule, spylls Spelling Correction
language-tool-python, gingerit, gramformer Grammatical Error Correction
ekphrasis Pre-processing for social media texts
editop Compute edit-operations for text normalization
contractions, pycontractions Contraction Mapping
truecase Fix casing
nnsplit, deepsegment, sentence-doctor, pysbd, sentence-splitter Sentence Segmentation
wordninja Probabilistic Word Segmentation
punctuator2 Punctuation Restoration
stopwords-iso Stopwords for all languages
language-check, langdetect, polyglot, pycld2, cld2, cld3, langid, lumi_language_id Language Identification
langcodes Get language from language code
neuralcoref Coreference Resolution
inflect, lemminflect, pyinflect Inflections
scrubadub PII removal
ftfy, clean-text, text-unidecode Fix Unicode Issues
fastpunct Punctuation Restoration
pyphen Hyphenate words into syllables
pypostal, mordecai, usaddress, libpostal Parse Street Addresses
geopy, geocoder, nominatim, pelias, photon, lieu Geocoding
probablepeople, python-nameparser Parse person name
python-phonenumbers Parse phone numbers
numerizer, word2number Parse natural language number
dateparser Parse natural dates
ctparse Parse natural language time
daterangeparser Parse date ranges in natural language
emoji Handle emoji
pyarabic Arabic
Tokenization sentencepiece, youtokentome, subword-nmt
sacremoses Rule-based
jieba, pkuseg Chinese Word Segmentation
kytea Japanese word segmentation
Clustering kmodes, star-clustering, genieclust
spherecluster K-means with cosine distance
sib Sequential Information Bottleneck
kneed Automatically find number of clusters from elbow curve
OptimalCluster Automatically find optimal number of clusters
gsdmm Short-text clustering
Code Switching codeswitch
Constituency Parsing benepar, allennlp, chunk-english-fast
Compact Models mobilebert, distilbert, tinybert, BERT-of-Theseus-MNLI, MiniLM
Cross-lingual Embeddings muse, laserembeddings, xlm, LaBSE
transvec, vecmap Train mapping between monolingual embeddings
MuRIL Embeddings for 17 indic languages with transliteration
BPEmb Subword Embeddings in 275 Languages
piecelearn Train own sub-word embeddings
Dictionary vocabulary
Domain-specific codebert Code
clinicalbert-mimicnotes, clinicalbert-discharge-summary Clinical Domain
twitter-roberta-base twitter
scispacy bio-medical data
blackstone Legal text
Entity Linking dbpedia-spotlight, GENRE
Entity Matching py_entitymatching, deepmatcher
Embeddings InferSent, embedding-as-service, bert-as-service, sent2vec, sense2vec, glove-python, fse
counterix Train custom Count-based DSM
embeddix Convert word vectors format
wiki2vec Word2Vec trained on DBPedia Entities
chars2vec Character embeddings for handling typos and slang
rank_bm25, BM25Transformer BM25
sentence-transformers, DeCLUTR BERT sentence embeddings
conceptnet-numberbatch Word embeddings trained with common-sense knowledge graph
word2vec-twitter Word2vec trained on twitter
pymagnitude Access word embeddings programmatically
chakin Download pre-trained word vectors
zeugma Pretrained-word embeddings as scikit-learn transformers
starspace Learn embeddings for anything
svd2vec Learn embeddings from co-occurrence
all-but-the-top Post-processing for word vectors
entity-embed Train custom embeddings for named entities
Emotion Classification goemotion-pytorch, text2emotion
emosent-py Sentiment scores for Emojis
Feature Generation homer, textstat Readability scores
LexicalRichness Lexical Richness Measure
Finite State Transducer OpenFST
Gibberish Detection nostril, gibberish-detector
Grammar Induction gitta, grasp Generate CFG from sentences
Information Extraction claucy
GiveMe5W1H Extract 5W1H (who, what, when, where, why, how) phrases from news
spikex Spacy pipeline for knowledge extraction
Keyword extraction rake, multi-rake, pke, phrasemachine, keybert, word2phrase
pyate Automated Term Extraction
Knowledge conceptnet-lite
stanford-openie Knowledge Graphs
verbnet-parser VerbNet parser
Knowledge Distillation textbrewer, aquvitae
Language Model Scoring lm-scorer, bertscore, kenlm, spacy_kenlm, mlm-scoring
Lexical Simplification easee Evaluation metric
Morphology unimorph Morphology data for many languages
Multilingual support polyglot, trankit
inltk, indic_nlp Indic Languages
cltk Latin / Classic languages
langrank Auto-select optimal transfer language
Named Entity Recognition (NER) spaCy, Stanford NER, sklearn-crfsuite
med7 Medical records
Nearest neighbor faiss, sparse_dot_topn, n2, autofaiss
NLU snips-nlu
ParlAI Dialogue System
Paraphrasing parrot
pegasus Question Paraphrasing
paraphrase_diversity_ranker Rank paraphrases of sentence
sentaugment Paraphrase mining
Phonetics epitran Transliterate text into IPA
allosaurus Recognize phones in 2000 languages
Phonology panphon Generate phonological feature representations
phoible Database of segment inventories for 2186 languages
Probabilistic parsing parserator Create domain-specific parser for address, name etc.
Profanity detection profanity-check
Pronunciation pronouncing
Question Answering haystack Build end-to-end QA system
mcQA Multiple Choice Question Answering
TAPAS Table Question Answering
Relation Extraction OpenNRE
Search elasticsearch-dsl, meilisearch-python, jina Wrapper for Elasticsearch
Semantic parsing quepy
Sentiment vaderSentiment, afinn Rule based
absa Aspect Based Sentiment Analysis
Spacy Extensions spacy-pattern-builder Generate dependency matcher patterns automatically
spacy_grammar Rule-based grammar error detection
role-pattern-builder Pattern based SRL
textpipeliner Extract RDF triples
tenseflow Convert tense of sentence
camphr Wrapper to transformers, elmo, udify
spleno Domain-specific lemmatization
spacy-udpipe Use UDPipe from Spacy
spacymoji Add emoji metadata to spacy docs
String match phrase-seeker, textsearch
jellyfish, fuzzy, doublemetaphone Perform string and phonetic comparison
clavier Edit distance based on keyboard layout
flashtext Super-fast extract and replace keywords
pythonverbalexpressions Verbally describe regex
commonregex Ready-made regex for email/phone etc.
textdistance, editdistance, word-mover-distance, edlib Text distances
wmd-relax Word mover distance for spacy
fuzzywuzzy, spaczz, PolyFuzz, rapidfuzz, fuzzymatcher Fuzzy Search
deduplipy, dedupe Active-Learning based fuzzy matching
recordlinkage Record Linkage
Summarization textrank, pytldr, bert-extractive-summarizer, sumy, fast-pagerank, sumeval
doc2query Expand documents with predicted queries
summarizers Controllable summarization
insight_extractor Extract insightful sentences from docs
Text Extraction textract (Image, Audio, PDF)
Text Generation gpt2-client, textgenrnn, gpt-2-simple, aitextgen
markovify Markov chain
accelerated-text Template-based generation
keytotext Keyword to Sentence Generation
Transliteration wiktra
Machine Translation MarianMT, Opus-MT, joeynmt, OpenNMT, EasyNMT, argos-translate, dl-translate
googletrans, word2word, translate-python, deep_translator Translation libraries
mosesdecoder Statistical MT
apertium RBMT
translators Free calls to multiple translation APIs
giza++, fastalign, simalign, eflomal, awesome-align Word Alignment
Thesaurus python-datamuse
Toxicity Detection detoxify
Topic Modeling gensim, guidedlda, enstop, top2vec, contextualized-topic-models, corex_topic, lda2vec, bertopic, tomotopy, ToModAPI
zeroshot_topics Zero-shot topic modeling
octis Evaluate topic models
Typology lang2vec Compare typological features of languages
Visualization stylecloud Word Clouds
scattertext Compare word usage across segments
picture-text Interactive tree-maps for hierarchical clustering
ipymarkup Visualize NER and syntax
Verb Conjugation nodebox_linguistics_extended, mlconj3
Word Sense Disambiguation pywsd, ewiser, supwsd
frame-english-fast Verb Disambiguation
Zero Shot Learning setfit
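Edit-distance packages such as editdistance and textdistance compute Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The dynamic-programming recurrence fits in a few lines of plain Python; `levenshtein` is a hypothetical helper, and libraries such as editdistance use optimized native code instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions that turn a into b."""
    # previous[j] holds the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only the previous row makes memory proportional to the length of `b` rather than the full `len(a) * len(b)` table.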

Misc

Category Tool Remarks
Automation pyuserinput, pyautogui, pynput Control mouse and keyboard
Code to Maths latexify-py, handcalcs