AI Engineering Toolbox

A collection of libraries for all phases of AI Engineering.
Author

Amit Chaudhary

Published

May 11, 2025

Modified

May 31, 2025

Phase: Data

Data Preparation

Category Tool Remarks
Synthetic Data curator, datadreamer, distillabel, fabricator, promptwright
auto-label Generate annotated datasets
Pretraining pipelines datatrove, nemo-curator
Data Exploration lilac
Deduplication semhash
Web Scraping trafilatura, crawl4ai

Phase: Evaluation

Evals

Category Tool Remarks
Libraries athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai
evaluate, torchmetrics, torcheval Off-the-shelf metrics
mir-eval
Agents agentsevals
Benchmarks openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness
llmtest-needleinahaystack Needle in a Hatstack
agentbench, textarena Agents
Diversity diversity
Hallucination lettucedetect, hallucination-leaderboard
RAG ragas, ragchecker
auto-evaluator Generate synthetic QA pairs from docs
chunking-evaluation Evaluate chunking strategies
Text Generation bleurt

Phase: Modeling

Experiment Tracking

Category Tool Remarks
Libraries trulens, mlflow, wandb, mlop

Prompt Engineering

Category Tool Remarks
Automatic Prompt Engineering dspy, textgrad, adalflow
openevolve Evolutionary code optimization
Function Calling functionary
Structured Output guidance, instructor, jsonformer,lm-format-enforcer, outlines, xgrammar, lqml, fructose
llm-scraper webpage to json
Memory mem0, letta, memobase, memary, langmem, memoripy,
Rate Limiting backoff, tenacity, ratelimit
limits Rate-limit for own APIs
Code Interpreter gpt-code-interpreter, open-interpreter, codeinterpreter-api
Steering Vectors dialz, repeng

RAG

Category Tool Remarks
Libraries llama-index, verba, fastrag, haystack
Chunking wtsplit, semchunk, chonkie, langchain-text-splitters
open-parse Layout parsing visually
sparseprimingrepresentations SPR
Reranking rerankers, flaskrank
Retrieval pyserini
bm25s Sparse retrieval
Embeddings fastembed, sentence-transformers
model2vec Static Vectors
Vector Index annoy, diskann, faiss, chroma, qdrant, pinecone, weviate, milvus
pgvector, sqlite-vec SQL Extensions
simsimd Faster dot-product on CPUs
Late Interaction ragatouille, pylate Train ColBERT models
Graph RAG fast-graphrag, graphrag, nano-graphrag
OCR textract
nougat Academic Documents
Document Understanding donut
table-transformer Table Extraction
marker PDF to markdown
RAG on internal docs onyx, xyne

Agents

Category Tool Remarks
Libraries autogen, crewai, langroid, openai-agents, pydantic-ai, marvin, metagpt, semantic-kernel
langgraph Graphs
smolagents Code-based agent
Web Use browser-use
deer-flow Deep Research
magnitude Automated Testing
Computer Use ui-tars-desktop, cuda
Code Execution Sandbox microsandbox, e2b

Finetuning

Category Tool Remarks
LLM Finetuning axolotl, unsloth, torchtune, peft, litgpt, llama-factory
onebitllms 1.58-bit LLMs
Model Merging mergekit, mergoo
Multi-modal VLMs maestro, nanovlm, mlx-vlm
smol-vision Recipes
llava Visual Instruction Tuning
RL art, trl, verl, openrlhf, atropos, retrain
verifiers Verifiers
Distributed Training metagron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai
hivemind, psyche Decentralized Training
Self-instruct airoboros
Tokenization supertokenizer Train multi-word BPE
Classification adaptive-classifier Continuous Learning

Multimodality

Audio

Category Tool Remarks
Voice Activity Detection silero-vad
General models seamless-communication
Speech to Text faster-whisper, whisperx
Text to Speech chatterbox
whisper-streaming Realtime

Vision

Category Tool Remarks
Facial recognition deepface
VLM CogVLM

Phase: Deployment

Inference

Category Tool Remarks
Inference Servers ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctransalte2, mlc-llm, deepspeed-mii, openllm, exllamav2, fastgen
Batch Processing skypilot
Multi-token prediction medusa
Multi-LoRA inference s-lora, punica, lorax
LLM Gateway litellm, portkey
LLM Routing routellm, automix, openrouter, awesome-ai-model-routing
Model Cascade frugalgpt
Semantic Cache gptcache
Quantization llm-compressor, bitsandbytes, optimum
Embeddings text-embeddings-inference, infinity
Kernels liger-kernel
attorch Triton kernels
Local Servers lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp
Frontend UI gradio, streamlit
copilot-kit Chat UI components
Caching panza Caching for async functions

Monitoring

Category Tool Remarks
Observability openllmetry, phoenix, logfire
Guardrails giskard, langkit, garak, deepchecks, nemo-guardrails
rebuff Prompt Injection Detection
uqlm Uncertainty Quantification
Drift Detection ft-drift Detect drift in OpenAI messages
AI Detection binoculars