AI Engineering Toolbox

Phase: Data

Category	Tool	Remarks
Synthetic Data	curator, datadreamer, distilabel, fabricator, promptwright
	auto-label	Generate annotated datasets
Pretraining pipelines	datatrove, nemo-curator
Data Exploration	lilac
	turftopic, bertopic	Topic Modeling
Deduplication	semhash, rensa
Web Scraping	trafilatura, crawl4ai
	autoscraper	Specify a target and get scraping code

Category	Tool	Remarks
Libraries	athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai
	evaluate, torchmetrics, torcheval	Off-the-shelf metrics
	mir-eval
Agents	agentevals
Benchmarks	openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness
	llmtest-needleinahaystack	Needle in a Haystack
	agentbench, textarena	Agents
	open-llm-leaderboard, LMArena	Arena
Diversity	diversity
Hallucination	lettucedetect, hallucination-leaderboard, selfcheckgpt
RAG	ragas, ragchecker
	auto-evaluator	Generate synthetic QA pairs from docs
	chunking-evaluation	Evaluate chunking strategies
Text Generation	bleurt, bertscore, moverscore

Category	Tool	Remarks
Libraries	trulens, mlflow, wandb, mlop

Category	Tool	Remarks
Coding Agent Rules	ruler

Category	Tool	Remarks
Automatic Prompt Engineering	dspy, textgrad, adalflow, zenbase
	openevolve	Evolutionary code optimization
Function Calling	functionary
Structured Output	guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose
	json-repair	Post-process broken JSON
	llm-scraper	Webpage to JSON
Memory	mem0, letta, memobase, memary, langmem, memoripy
	cognee	Memory using knowledge graphs
Rate Limiting	backoff, tenacity, ratelimit
	limits	Rate limiting for your own APIs
Code Interpreter	gpt-code-interpreter, open-interpreter, codeinterpreter-api
Steering Vectors	dialz, repeng
System Prompts	llm-system-prompts, leaked-system-prompts, system-prompt-leaks, awesome-ai-system-prompts, cl4r1t4s, system-prompts-and-models-of-ai-tools

Category	Tool	Remarks
Libraries	llama-index, verba, fastrag, haystack, ragbits
Chunking	wtsplit, semchunk, chonkie, langchain-text-splitters
	open-parse, doclayout-yolo	Visual layout parsing
	sparseprimingrepresentations	Sparse Priming Representations (SPR)
Reranking	rerankers, flashrank
Retrieval	pyserini
	bm25s	Sparse retrieval
Embeddings	fastembed, sentence-transformers
	model2vec	Static Vectors
Vector Index	annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus
	pgvector, sqlite-vec	SQL Extensions
	simsimd	Faster dot-product on CPUs
Late Interaction	ragatouille, pylate	Train ColBERT models
	maxsim-cpu	Speed-up max-sim for ColBERT
Graph RAG	fast-graphrag, graphrag, nano-graphrag
OCR	textract
	nougat	Academic Documents
Document Understanding	donut
	table-transformer	Table Extraction
	marker, pdf2md, mineru, docext, docling	PDF to Markdown
RAG on internal docs	onyx, xyne

Category	Tool	Remarks
Libraries	autogen, crewai, langroid, openai-agents, pydantic-ai, marvin, metagpt, semantic-kernel
	langgraph	Graphs
	smolagents	Code-based agent
MCP	fastmcp, enrichmcp
	mcp-scan	Scan for security vulnerabilities
Web Use	browser-use
	deer-flow	Deep Research
	magnitude	Automated Testing
Computer Use	ui-tars-desktop, cua
Code Execution Sandbox	microsandbox, e2b, screenenv
Coding Agents	opencode
Web Search API	sonar

Category	Tool	Remarks
LLM Finetuning	axolotl, unsloth, torchtune, peft, litgpt, llama-factory
	onebitllms, matmulfreellm	1.58-bit LLMs
Model Merging	mergekit, mergoo
Multi-modal VLMs	maestro, nanovlm, mlx-vlm
	smol-vision	Recipes
	llava	Visual Instruction Tuning
RL	art, trl, verl, openrlhf, atropos, retrain, nemo-rl
	verifiers	Verifiers
Distributed Training	megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai
	hivemind, psyche	Decentralized Training
Self-instruct	airoboros
Tokenization	supertokenizer	Train multi-word BPE
	tokendagger	Faster tiktoken
Classification	adaptive-classifier	Continuous Learning

Category	Tool	Remarks
Facial recognition	deepface
VLM	CogVLM

Category	Tool	Remarks
Inference Servers	ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm, exllamav2, fastgen, tokasaurus
Batch Processing	skypilot
Multi-token prediction	medusa
Multi-LoRA inference	s-lora, punica, lorax
KV Cache	kvzip, lmcache
LLM Gateway	litellm, portkey, tensorzero
LLM Routing	routellm, automix, openrouter, awesome-ai-model-routing, ai-gateway
Model Cascade	frugalgpt
Semantic Cache	gptcache
Quantization	llm-compressor, bitsandbytes, optimum
Embeddings	text-embeddings-inference, infinity
Kernels	liger-kernel
	attorch	Triton kernels
	sparse_transformers	Sparse kernels
Local Servers	lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp
Frontend UI	gradio, streamlit
	copilot-kit, assistant-ui	Chat UI components
	agent-inbox	Agent UI
Caching	panza	Caching for async functions

Category	Tool	Remarks
Observability	openllmetry, phoenix, logfire
Guardrails	giskard, langkit, garak, deepchecks, nemo-guardrails
	rebuff	Prompt Injection Detection
	uqlm	Uncertainty Quantification
Drift Detection	ft-drift	Detect drift in OpenAI messages
AI Detection	binoculars