AI Engineering Toolbox

A collection of libraries for all phases of AI Engineering.

Author: Amit Chaudhary
Published: May 11, 2025
Modified: June 16, 2025

Phase: Data

Data Preparation

| Category | Tool | Remarks |
| --- | --- | --- |
| Synthetic Data | curator, datadreamer, distilabel, fabricator, promptwright | |
| | auto-label | Generate annotated datasets |
| Pretraining pipelines | datatrove, nemo-curator | |
| Data Exploration | lilac | |
| Deduplication | semhash, rensa | |
| Web Scraping | trafilatura, crawl4ai | |
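
To make the Deduplication row concrete, here is a minimal stdlib-only sketch of near-duplicate filtering using Jaccard similarity over word shingles. It is an illustration of the general idea only, not how semhash or rensa work internally; the names `shingles`, `jaccard`, `dedupe`, and the `threshold` default are hypothetical choices made for this example.

```python
from typing import List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams (shingles) in text."""
    tokens = text.lower().split()
    # Texts shorter than n words yield a single, shorter shingle.
    return {tuple(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


def dedupe(texts: List[str], threshold: float = 0.8) -> List[str]:
    """Keep the first occurrence of each group of near-duplicate texts."""
    kept: List[str] = []
    kept_shingles: List[Set[Tuple[str, ...]]] = []
    for text in texts:
        current = shingles(text)
        # Drop the text if it is too similar to anything already kept.
        if any(jaccard(current, seen) >= threshold for seen in kept_shingles):
            continue
        kept.append(text)
        kept_shingles.append(current)
    return kept
```

This compares every candidate against every kept text, so it is quadratic in the number of documents; it is only meant for small samples, and dedicated deduplication libraries exist to avoid that cost.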

Phase: Evaluation

Evals

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai | |
| | evaluate, torchmetrics, torcheval | Off-the-shelf metrics |
| | mir-eval | |
| Agents | agentevals | |
| Benchmarks | openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness | |
| | llmtest-needleinahaystack | Needle in a Haystack |
| | agentbench, textarena | Agents |
| | open-llm-leaderboard, LMArena | Arena |
| Diversity | diversity | |
| Hallucination | lettucedetect, hallucination-leaderboard, selfcheckgpt | |
| RAG | ragas, ragchecker | |
| | auto-evaluator | Generate synthetic QA pairs from docs |
| | chunking-evaluation | Evaluate chunking strategies |
| Text Generation | bleurt, bertscore, moverscore | |

Phase: Modeling

Experiment Tracking

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | trulens, mlflow, wandb, mlop | |

Developer Tools

| Category | Tool | Remarks |
| --- | --- | --- |
| Coding Agent Rules | ruler | |

Prompt Engineering

| Category | Tool | Remarks |
| --- | --- | --- |
| Automatic Prompt Engineering | dspy, textgrad, adalflow, zenbase | |
| | openevolve | Evolutionary code optimization |
| Function Calling | functionary | |
| Structured Output | guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose | |
| | llm-scraper | Webpage to JSON |
| Memory | mem0, letta, memobase, memary, langmem, memoripy | |
| | cognee | Memory using knowledge graphs |
| Rate Limiting | backoff, tenacity, ratelimit | |
| | limits | Rate-limit for own APIs |
| Code Interpreter | gpt-code-interpreter, open-interpreter, codeinterpreter-api | |
| Steering Vectors | dialz, repeng | |
| System Prompts | llm-system-prompts, leaked-system-prompts, system-prompt-leaks, awesome-ai-system-prompts, cl4r1t4s, system-prompts-and-models-of-ai-tools | |
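
The Rate Limiting row lists retry helpers such as backoff and tenacity. As a rough illustration of the pattern such helpers package up, here is a stdlib-only sketch of retrying with exponential backoff and jitter. The function name `retry_with_backoff` and all of its parameters are hypothetical, chosen for this example; they are not the API of any listed library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random amount up to the ceiling so that
            # many clients retrying at once do not synchronize.
            sleep(random.uniform(0, cap))
    raise AssertionError("loop always returns or raises")
```

In real use, `fn` would wrap a single LLM API call, and the `except` clause would usually be narrowed to the provider's rate-limit or timeout errors instead of catching every exception. The `sleep` parameter exists only so the delay can be stubbed out in tests.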

RAG

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | llama-index, verba, fastrag, haystack, ragbits | |
| Chunking | wtsplit, semchunk, chonkie, langchain-text-splitters | |
| | open-parse, doclayout-yolo | Layout parsing visually |
| | sparseprimingrepresentations | SPR |
| Reranking | rerankers, flashrank | |
| Retrieval | pyserini | |
| | bm25s | Sparse retrieval |
| Embeddings | fastembed, sentence-transformers | |
| | model2vec | Static Vectors |
| Vector Index | annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus | |
| | pgvector, sqlite-vec | SQL Extensions |
| | simsimd | Faster dot-product on CPUs |
| Late Interaction | ragatouille, pylate | Train ColBERT models |
| Graph RAG | fast-graphrag, graphrag, nano-graphrag | |
| OCR | textract | |
| | nougat | Academic Documents |
| Document Understanding | donut | |
| | table-transformer | Table Extraction |
| | marker, pdf2md, mineru, docext | PDF to markdown |
| RAG on internal docs | onyx, xyne | |
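
For readers unfamiliar with the "Sparse retrieval" remark in the Retrieval row, the following is a small stdlib-only sketch of Okapi BM25 scoring over pre-tokenized documents. It is not the bm25s or pyserini API; the function name `bm25_scores` and the defaults `k1=1.5` and `b=0.75` are assumptions made for this example, and it assumes a non-empty corpus.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that contain none of the query terms score exactly zero, which is what makes the representation sparse: only terms that literally appear in a document contribute to its score.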

Agents

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | autogen, crewai, langroid, openai-agents, pydantic-ai, marvin, metagpt, semantic-kernel | |
| | langgraph | Graphs |
| | smolagents | Code-based agent |
| Web Use | browser-use | |
| | deer-flow | Deep Research |
| | magnitude | Automated Testing |
| Computer Use | ui-tars-desktop, cua | |
| Code Execution Sandbox | microsandbox, e2b | |
| Web Search API | sonar | |

Finetuning

| Category | Tool | Remarks |
| --- | --- | --- |
| LLM Finetuning | axolotl, unsloth, torchtune, peft, litgpt, llama-factory | |
| | onebitllms | 1.58-bit LLMs |
| Model Merging | mergekit, mergoo | |
| Multi-modal VLMs | maestro, nanovlm, mlx-vlm | |
| | smol-vision | Recipes |
| | llava | Visual Instruction Tuning |
| RL | art, trl, verl, openrlhf, atropos, retrain, nemo-rl | |
| | verifiers | Verifiers |
| Distributed Training | megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai | |
| | hivemind, psyche | Decentralized Training |
| Self-instruct | airoboros | |
| Tokenization | supertokenizer | Train multi-word BPE |
| Classification | adaptive-classifier | Continuous Learning |

Multimodality

Audio

| Category | Tool | Remarks |
| --- | --- | --- |
| Voice Activity Detection | silero-vad | |
| General models | seamless-communication | |
| Speech to Text | faster-whisper, whisperx | |
| Text to Speech | chatterbox | |
| | whisper-streaming | Realtime |

Vision

| Category | Tool | Remarks |
| --- | --- | --- |
| Facial recognition | deepface | |
| VLM | CogVLM | |

Phase: Deployment

Inference

| Category | Tool | Remarks |
| --- | --- | --- |
| Inference Servers | ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm, exllamav2, fastgen, tokasaurus | |
| Batch Processing | skypilot | |
| Multi-token prediction | medusa | |
| Multi-LoRA inference | s-lora, punica, lorax | |
| KV Cache | kvzip | |
| LLM Gateway | litellm, portkey, tensorzero | |
| LLM Routing | routellm, automix, openrouter, awesome-ai-model-routing | |
| Model Cascade | frugalgpt | |
| Semantic Cache | gptcache | |
| Quantization | llm-compressor, bitsandbytes, optimum | |
| Embeddings | text-embeddings-inference, infinity | |
| Kernels | liger-kernel | |
| | attorch | Triton kernels |
| | sparse_transformers | Sparse kernels |
| Local Servers | lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp | |
| Frontend UI | gradio, streamlit | |
| | copilot-kit | Chat UI components |
| Caching | panza | Caching for async functions |

Monitoring

| Category | Tool | Remarks |
| --- | --- | --- |
| Observability | openllmetry, phoenix, logfire | |
| Guardrails | giskard, langkit, garak, deepchecks, nemo-guardrails | |
| | rebuff | Prompt Injection Detection |
| | uqlm | Uncertainty Quantification |
| Drift Detection | ft-drift | Detect drift in OpenAI messages |
| AI Detection | binoculars | |