How Airtable Saved Millions by Cutting Archive Storage Costs by 100x (11 minute read)
Airtable cut archive storage costs by about 100x by moving cold, mostly immutable MySQL data into S3 as partitioned Parquet files and querying it with embedded Apache DataFusion. The dataset became 10x smaller, while S3 was about 10x cheaper per byte. A Flink-based migration, bulk and shadow validation, tiered caching, custom secondary indexes, and Parquet bloom filters preserved interactive latency and enterprise guarantees.
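The headline number is just two ~10x factors compounding, and the layout is a standard Hive-style partitioning scheme. A minimal sketch with made-up names and figures (these are illustrative, not Airtable's actual layout or pricing):

```python
# Hypothetical numbers showing how the two ~10x factors compound to ~100x.
def archive_cost_ratio(compression_factor: float, price_ratio: float) -> float:
    """Old cost / new cost when the data shrinks by `compression_factor`
    and the new store is `price_ratio` times cheaper per byte."""
    return compression_factor * price_ratio

def s3_partition_key(table_id: str, year: int, month: int) -> str:
    """One plausible Hive-style partition layout for cold rows in S3;
    the path scheme here is assumed, not Airtable's actual one."""
    return f"archive/table_id={table_id}/year={year}/month={month:02d}/part-0.parquet"

print(archive_cost_ratio(10, 10))  # -> 100.0
print(s3_partition_key("tbl123", 2024, 3))
```

Partitioning by table and date lets an engine like DataFusion prune whole files before the Parquet bloom filters are even consulted.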
|
Internal vs. External Storage: What's the Limit of External Tables? (26 minute read)
Internal tables store and manage both data and metadata within the database system; external tables store only metadata and point to data that lives outside it, leaving the underlying files untouched. Internal tables enable tighter lifecycle management, whereas external tables decouple storage and compute, making it easier to scale, share, and access large datasets without moving or duplicating data.
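The lifecycle difference shows up most clearly on DROP. A toy catalog sketch, assuming nothing about any real engine's API (the `Catalog` and `TableMeta` names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class TableMeta:
    name: str
    location: str   # where the data bytes live
    managed: bool   # True = internal (engine owns lifecycle), False = external

class Catalog:
    def __init__(self):
        self.tables = {}
        self.storage = {}  # stands in for engine-managed storage

    def create_internal(self, name, rows):
        # Internal table: the engine stores the data AND the metadata.
        self.storage[name] = list(rows)
        self.tables[name] = TableMeta(name, f"managed://{name}", managed=True)

    def create_external(self, name, path):
        # External table: only metadata is recorded; the data stays put.
        self.tables[name] = TableMeta(name, path, managed=False)

    def drop(self, name):
        meta = self.tables.pop(name)
        if meta.managed:
            # Dropping an internal table deletes its data too.
            self.storage.pop(name, None)
        # Dropping an external table removes only the metadata entry.

cat = Catalog()
cat.create_internal("orders", [{"id": 1}])
cat.create_external("events", "s3://bucket/events/")
cat.drop("orders")   # data gone with the table
cat.drop("events")   # files at s3://bucket/events/ are untouched
```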
|
|
Databases Were Not Designed For This (16 minute read)
Databases were built for predictable apps and human-written queries, not AI agents that generate queries on the fly, retry automatically, and can make silent mistakes at scale. Teams now need stronger guardrails like tighter permissions, timeouts, audit logs, idempotent writes, and clearer schemas so databases stay safe when AI becomes the caller.
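The idempotent-writes guardrail can be sketched in a few lines: the caller (here, an agent) attaches a stable idempotency key to each logical operation, and retries with the same key become no-ops. The class and key format below are illustrative assumptions, not a specific database's API:

```python
class IdempotentWriter:
    def __init__(self):
        self.applied = {}  # idempotency_key -> result of the first attempt
        self.table = {}

    def write(self, idempotency_key: str, row_id: str, value: dict) -> dict:
        # An automatic retry with the same key returns the original result,
        # so an agent that retries on timeout cannot double-apply a write.
        if idempotency_key in self.applied:
            return self.applied[idempotency_key]
        self.table[row_id] = value
        result = {"row_id": row_id, "status": "applied"}
        self.applied[idempotency_key] = result
        return result

db = IdempotentWriter()
db.write("agent-42:op-1", "user:7", {"credits": 10})
db.write("agent-42:op-1", "user:7", {"credits": 10})  # retried; no second apply
```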
|
When a Cloud Region Fails: Rethinking High Availability in a Geopolitically Unstable World (15 minute read)
Cloud high availability can no longer assume regions are safe, independent failure domains: sanctions, data localization laws, conflict zones, and submarine cable cuts can take out an entire region or make it noncompliant. Treat region-level disruption as a first-class risk, with multi-region, jurisdiction-aware data placement, control-plane separation, and dependency audits. The added cost and complexity should be justified with Annual Loss Expectancy modeling rather than assumed.
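The Annual Loss Expectancy check is simple arithmetic: ALE = single-loss expectancy (SLE) × annual rate of occurrence (ARO), compared against the yearly cost of the mitigation. The figures below are made up purely to show the shape of the calculation:

```python
def ale(sle: float, aro: float) -> float:
    """Annual Loss Expectancy = single-loss expectancy x annual rate."""
    return sle * aro

region_outage_ale = ale(sle=2_000_000, aro=0.1)  # ~one major outage per decade
multi_region_cost = 300_000                       # hypothetical yearly spend

# Spend on multi-region only if the expected loss exceeds the mitigation cost;
# with these numbers the mitigation costs more than the expected loss.
worth_it = region_outage_ale > multi_region_cost
print(region_outage_ale, worth_it)  # -> 200000.0 False
```

Plugging in your own SLE and ARO is the "justified rather than assumed" step the article argues for.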
|
Stop Letting Tools Lead Your Platform Decisions (3 minute read)
Data platform decisions should start with use cases, constraints, and operating requirements, not with Kafka, Spark, Snowflake, or Airflow. The key questions are latency, data freshness, cost, failure handling, and who will consume the system. Choose the simplest stack that fits the problem, team, budget, and timelines.
|
|
DuckDB Extension - Whisper (Tool)
Whisper is a DuckDB extension that lets you transcribe audio into text directly with SQL, making voice data easier to search, analyze, and use alongside your normal tables.
|
Jaeger adopts OpenTelemetry at its core to solve the AI agent observability gap (4 minute read)
Jaeger v2 rebuilds its core on the OpenTelemetry Collector, natively ingesting OTLP and unifying metrics, logs, and traces in one deployment model to improve ingestion and eliminate translation steps. It's also adding agent-facing interfaces like MCP, ACP, and AG-UI so engineers can use natural language to translate incident context into deterministic trace queries and collaborate with AI agents.
|
tda-mapper (GitHub Repo)
tda-mapper is a Python library that helps find hidden shapes, clusters, and patterns in messy data using the Mapper algorithm from topological data analysis. It's built to scale to large datasets, works with scikit-learn pipelines, and includes visual tools to explore complex data more clearly.
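The Mapper algorithm itself is small enough to sketch: cover the range of a lens function with overlapping intervals, cluster the points within each interval, and connect clusters that share points. This is a pure-Python illustration of the idea, not tda-mapper's actual API, with a naive 1-D gap-based clusterer standing in for a real one:

```python
def mapper_graph(points, lens, n_intervals=4, overlap=0.5, gap=1.0):
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for i in range(n_intervals):
        # Overlapping cover: each interval is widened by `overlap * step`.
        a = lo + i * step - overlap * step
        b = lo + (i + 1) * step + overlap * step
        members = sorted(p for p, v in zip(points, values) if a <= v <= b)
        # Cluster 1-D members by splitting at gaps larger than `gap`.
        cluster = []
        for p in members:
            if cluster and p - cluster[-1] > gap:
                nodes.append(frozenset(cluster))
                cluster = []
            cluster.append(p)
        if cluster:
            nodes.append(frozenset(cluster))
    # Connect clusters that share at least one point.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if u & v:
                edges.add((u, v))
    return nodes, edges

# Two well-separated blobs yield two connected components in the graph.
nodes, edges = mapper_graph([0, 1, 2, 10, 11, 12], lambda x: x)
```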
|
|
Fixing What LLMs Get Wrong (22 minute read)
Enterprise LLM systems can produce fluent but factually wrong answers against private structured knowledge, creating a "hallucination tax" on pricing, policy, org, and legal data. Fine-tuning, RAG, and static verification each help, but none learn from repeated failures. Reflexion closes the loop by storing natural-language reflections from verified errors in episodic memory and reinjecting them into future prompts.
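The Reflexion loop described above reduces to a small store-and-reinject pattern. A minimal sketch, assuming a verifier has already caught the error; the class and task keys are illustrative, and a real model call would replace the prompt string:

```python
class ReflexionMemory:
    def __init__(self):
        self.episodes = {}  # task key -> list of reflection strings

    def remember(self, task: str, reflection: str):
        # Called only after a verifier confirms the model's answer was wrong.
        self.episodes.setdefault(task, []).append(reflection)

    def build_prompt(self, task: str, question: str) -> str:
        # Reinject past reflections ahead of the new question.
        lessons = self.episodes.get(task, [])
        header = "".join(
            f"Lesson from a past verified error: {lesson}\n" for lesson in lessons
        )
        return header + question

memory = ReflexionMemory()
# A verifier caught a wrong answer on pricing data; store the reflection.
memory.remember("pricing", "Enterprise tier prices are per seat, not per org.")
prompt = memory.build_prompt("pricing", "What does the enterprise tier cost?")
```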
|