AI Engineering Toolbox

A collection of libraries for all phases of AI Engineering.

Author: Amit Chaudhary
Published: May 11, 2025
Modified: May 27, 2025

Phase: Data

Data Preparation

| Category | Tool | Remarks |
| --- | --- | --- |
| Synthetic Data | curator, datadreamer, distilabel, fabricator, promptwright | |
| | auto-label | Generate annotated datasets |
| Pretraining pipelines | datatrove, nemo-curator | |
| Data Exploration | lilac | |
| Deduplication | semhash | |
| Web Scraping | trafilatura, crawl4ai | |

Phase: Evaluation

Evals

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai | |
| | evaluate, torchmetrics, torcheval | Off-the-shelf metrics |
| | mir-eval | |
| Agents | agentevals | |
| Benchmarks | openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness | |
| | llmtest-needleinahaystack | Needle in a Haystack |
| | agentbench, textarena | Agents |
| Diversity | diversity | |
| Hallucination | lettucedetect, hallucination-leaderboard | |
| RAG | ragas, ragchecker | |
| | auto-evaluator | Generate synthetic QA pairs from docs |
| | chunking-evaluation | Evaluate chunking strategies |
| Text Generation | bleurt | |

Phase: Modeling

Experiment Tracking

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | trulens, mlflow, wandb | |

Prompt Engineering

| Category | Tool | Remarks |
| --- | --- | --- |
| Automatic Prompt Engineering | dspy, textgrad, adalflow | |
| Function Calling | functionary | |
| Structured Output | guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose | |
| | llm-scraper | Webpage to JSON |
| Memory | mem0, letta, memobase, memary, langmem, memoripy | |
| Rate Limiting | backoff, tenacity, ratelimit | |
| | limits | Rate limiting for your own APIs |
| Code Interpreter | gpt-code-interpreter, open-interpreter, codeinterpreter-api | |

RAG

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | llama-index, verba, fastrag, haystack | |
| Chunking | wtsplit, semchunk, chonkie, langchain-text-splitters | |
| | open-parse | Visual layout parsing |
| | sparseprimingrepresentations | SPR |
| Reranking | rerankers, flashrank | |
| Retrieval | pyserini | |
| | bm25s | Sparse retrieval |
| Embeddings | fastembed, sentence-transformers | |
| | model2vec | Static Vectors |
| Vector Index | annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus | |
| | pgvector, sqlite-vec | SQL Extensions |
| | simsimd | Faster dot-product on CPUs |
| Late Interaction | ragatouille, pylate | Train ColBERT models |
| Graph RAG | fast-graphrag, graphrag, nano-graphrag | |
| OCR | textract | |
| | nougat | Academic Documents |
| Document Understanding | donut | |
| | table-transformer | Table Extraction |
| | marker | PDF to Markdown |
| RAG on internal docs | onyx, xyne | |

Agents

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | autogen, crewai, langroid, openai-agents, pydantic-ai, marvin, metagpt, semantic-kernel | |
| | langgraph | Graphs |
| | smolagents | Code-based agent |
| Web Use | browser-use | |
| | deer-flow | Deep Research |
| | magnitude | Automated Testing |
| Computer Use | ui-tars-desktop | |

Finetuning

| Category | Tool | Remarks |
| --- | --- | --- |
| LLM Finetuning | axolotl, unsloth, torchtune, peft, litgpt, llama-factory | |
| | onebitllms | 1.58-bit LLMs |
| Model Merging | mergekit, mergoo | |
| Multi-modal VLMs | maestro, nanovlm, mlx-vlm | |
| | smol-vision | Recipes |
| | llava | Visual Instruction Tuning |
| RL | art, trl, verl, openrlhf, atropos, retrain | |
| | verifiers | Verifiers |
| Distributed Training | megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai | |
| | hivemind, psyche | Decentralized Training |
| Self-instruct | airoboros | |
| Tokenization | supertokenizer | Train multi-word BPE |

Multimodality

Audio

| Category | Tool | Remarks |
| --- | --- | --- |
| Voice Activity Detection | silero-vad | |
| General models | seamless-communication | |
| Speech to Text | faster-whisper, whisperx | |
| | whisper-streaming | Realtime |

Vision

| Category | Tool | Remarks |
| --- | --- | --- |
| Facial recognition | deepface | |
| VLM | CogVLM | |

Phase: Deployment

Inference

| Category | Tool | Remarks |
| --- | --- | --- |
| Inference Servers | ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm, exllamav2, fastgen | |
| Batch Processing | skypilot | |
| Multi-token prediction | medusa | |
| Multi-LoRA inference | s-lora, punica, lorax | |
| LLM Gateway | litellm, portkey | |
| LLM Routing | routellm, automix, openrouter, awesome-ai-model-routing | |
| Model Cascade | frugalgpt | |
| Semantic Cache | gptcache | |
| Quantization | llm-compressor, bitsandbytes, optimum | |
| Embeddings | text-embeddings-inference, infinity | |
| Kernels | liger-kernel | |
| | attorch | Triton kernels |
| Local Servers | lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp | |
| Frontend UI | gradio, streamlit | |
| | copilot-kit | Chat UI components |

Monitoring

| Category | Tool | Remarks |
| --- | --- | --- |
| Observability | openllmetry, phoenix, logfire | |
| Guardrails | giskard, langkit, garak, deepchecks, nemo-guardrails | |
| | rebuff | Prompt Injection Detection |
| | uqlm | Uncertainty Quantification |
| Drift Detection | ft-drift | Detect drift in OpenAI messages |
| AI Detection | binoculars | |