Machine-Learning Predictive Autoscaling for Flink (8 minute read)
Grab's Flink usage grew 2.5x in a year, driving the need for efficient, self-service stream processing. Traditional reactive autoscaling proved inefficient due to restart spikes, parallelism constraints, and manual tuning overhead, often causing resource waste and latency spikes. Grab prototyped a predictive autoscaler that uses time-series forecasting and regression models to align CPU provisioning with real-time Kafka workload patterns, achieving over 35% average cloud cost savings and simplifying deployment by automating scaling decisions.
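A minimal sketch of the idea (illustrative, not Grab's implementation; the function names and throughput figure are invented): forecast the next window's Kafka ingest rate, then size parallelism against a measured per-core throughput.

```python
# Forecast-driven autoscaling sketch: predict the next window's message
# rate, then map it to a target parallelism with safety headroom.
import numpy as np

def forecast_next_rate(rates: list[float]) -> float:
    """Naive linear-trend forecast over recent per-minute message rates."""
    t = np.arange(len(rates))
    slope, intercept = np.polyfit(t, rates, 1)
    return max(0.0, slope * len(rates) + intercept)

def target_parallelism(predicted_rate: float,
                       records_per_core_sec: float,
                       headroom: float = 1.2) -> int:
    """Size the job so the predicted load fits with some headroom."""
    return max(1, int(np.ceil(predicted_rate * headroom / records_per_core_sec)))

recent = [8_000, 9_500, 11_000, 12_800, 14_100]  # msgs/sec, hypothetical
rate = forecast_next_rate(recent)
print(target_parallelism(rate, records_per_core_sec=2_500))
```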
|
From Outages to Order: Netflix's Approach to Database Resilience with WAL (2 minute read)
Netflix deployed a modular Write-Ahead Log (WAL) system to address data loss, replication entropy, and cross-region consistency, capturing all database mutations in a durable log for robust recoverability. The pluggable design separates producers and consumers, leverages SQS/Kafka with dead-letter queues for reliability, and enables flexible routing to multiple storage backends without code changes. It is capable of handling millions of writes per second with sub-second tail latencies.
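A toy sketch of the pattern (not Netflix's code; the in-memory lists stand in for SQS/Kafka): every mutation is appended to a durable log before acknowledgment, and consumers replay it to pluggable sinks, parking failures in a dead-letter queue.

```python
# Conceptual write-ahead-log sketch: log first, replay to backends later.
import json, time
from typing import Callable

LOG: list[str] = []          # stand-in for a durable Kafka/SQS log
DEAD_LETTER: list[str] = []  # stand-in for a dead-letter queue

def append_mutation(table: str, key: str, op: str, payload: dict) -> None:
    """Producer side: record the mutation durably before acking the write."""
    LOG.append(json.dumps({"ts": time.time(), "table": table,
                           "key": key, "op": op, "payload": payload}))

def replay(sink: Callable[[dict], None]) -> None:
    """Consumer side: route each logged mutation to a backend sink."""
    for entry in LOG:
        event = json.loads(entry)
        try:
            sink(event)
        except Exception:
            DEAD_LETTER.append(entry)  # retry later without losing data

append_mutation("users", "u42", "UPSERT", {"plan": "premium"})
replay(lambda e: print("applied", e["op"], "to", e["table"]))
```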
|
From JSON to AVRO in the CDC Pipeline (8 minute read)
Fresha upgraded its Change Data Capture (CDC) pipeline from JSON to AVRO format to address scalability issues as the business grew. Originally using Postgres, Debezium, Kafka with JSON, Snowpipe Streaming, and Snowflake, the pipeline was simple but inefficient for handling increased schema changes, storage, and query performance. The transition to AVRO, enabled by Snowflake's schema evolution support, introduced automated schema management via a Schema Registry, reducing manual work and improving efficiency.
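A small sketch of why the format change matters (assumes the fastavro package; the record and field names are invented): Avro data written under an old schema can be read under a newer one, with added columns filled from defaults, which is the kind of evolution a Schema Registry automates.

```python
# Avro schema evolution in miniature: write with v1, read with v2.
import io
from fastavro import schemaless_writer, schemaless_reader

schema_v1 = {
    "type": "record", "name": "booking", "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
    ],
}
# v2 adds a column with a default, a backward-compatible change a
# Schema Registry can validate automatically.
schema_v2 = {
    "type": "record", "name": "booking", "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
        {"name": "channel", "type": "string", "default": "web"},
    ],
}

buf = io.BytesIO()
schemaless_writer(buf, schema_v1, {"id": 1, "status": "confirmed"})
buf.seek(0)
# Reading v1 data with the v2 schema fills the missing field's default.
print(schemaless_reader(buf, schema_v1, schema_v2))
```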
|
|
"You Don't Need Kafka, Just Use Postgres" Considered Harmful (16 minute read)
Many "just use Postgres" arguments ignore that Kafka and Postgres solve fundamentally different problems. Kafka provides persistent ordered logs, consumer groups, low latency streaming, fault tolerance, and a massive connector ecosystem, which are difficult and costly to recreate on top of Postgres. Postgres can handle simple queuing at small scale, but using it as an event streaming platform eventually leads to complexity and performance issues, so the right approach is to let Postgres manage state and Kafka handle event streaming.
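For scale, here is the small-scale Postgres queue the article concedes works (a sketch assuming psycopg 3 and an existing jobs table): one SKIP LOCKED claim query, with none of Kafka's consumer groups, replayable ordered log, or connector ecosystem.

```python
# Classic Postgres job-queue pattern: claim-and-delete with SKIP LOCKED
# so concurrent workers never block on or double-process the same row.
import psycopg

CLAIM = """
DELETE FROM jobs
WHERE id = (
    SELECT id FROM jobs
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

with psycopg.connect("dbname=app") as conn:  # commits on clean exit
    with conn.cursor() as cur:
        cur.execute(CLAIM)
        job = cur.fetchone()
        if job:
            print("processing", job)
```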
|
Why You'll Never Have a FAANG Data Infrastructure and That's the Point (9 minute read)
Enterprises seeking FAANG-level data capabilities often try to replicate decades-old, custom-built infrastructure. Instead, they should emulate the core design philosophies: abstraction, automation, and accountability. By taking a hybrid "buy + build" approach, using managed/SaaS platforms layered with organizational domain models and robust data governance, organizations can achieve 90% of the outcomes (scalable ingestion, analytics, ML, and self-service) without FAANG-scale budgets or engineering headcount. The key is treating data artefacts as products, enforcing standards, and leveraging internal developer platforms to maximize visibility, reliability, and business alignment.
|
Simple, Battle-Tested Algorithms Still Outperform AI (5 minute read)
Companies are losing over $200 billion annually (a negative 45% ROI) by deploying AI systems instead of proven, simpler algorithms that routinely deliver off-the-charts ROI. Executive fascination with AI and vendor pressures drive costly, inefficient implementations, while well-established methods like EOQ and Little's Law, managed by skilled programmers, remain dramatically more effective for most operational decisions.
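For reference, the two workhorse formulas named above, worked with invented numbers:

```python
# EOQ and Little's Law, computed end to end with made-up inputs.
from math import sqrt

# Economic Order Quantity: Q* = sqrt(2 * D * S / H)
D, S, H = 12_000, 50.0, 2.4   # annual demand, cost per order, holding cost/unit/yr
eoq = sqrt(2 * D * S / H)     # ~707 units per order

# Little's Law: L = lambda * W (items in system = arrival rate * time in system)
lam, W = 30.0, 0.5            # 30 orders/hour, 0.5 hours each
L = lam * W                   # 15 orders in flight on average

print(round(eoq), L)
```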
|
Ladder of Evidence in Understanding Effectiveness of New Products (7 minute read)
Meta data scientists use a "ladder of evidence" framework to evaluate the effectiveness of new products. Randomized Controlled Trials (RCTs) are used as the gold standard for establishing causality, but causal inference methods are essential alternatives when RCTs are not feasible due to constraints like cost, sample size, or launch requirements.
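A hedged sketch of the ladder's top rung (invented numbers, not Meta's tooling): estimating treatment lift from an RCT with a normal-approximation confidence interval.

```python
# Two-group RCT lift estimate with a 95% normal-approximation CI.
from math import sqrt

def rct_lift(conv_t, n_t, conv_c, n_c, z=1.96):
    pt, pc = conv_t / n_t, conv_c / n_c
    lift = pt - pc
    se = sqrt(pt * (1 - pt) / n_t + pc * (1 - pc) / n_c)
    return lift, (lift - z * se, lift + z * se)

lift, ci = rct_lift(conv_t=540, n_t=10_000, conv_c=480, n_c=10_000)
print(f"lift={lift:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```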
|
|
dbt Labs Open Sources MetricFlow: An Independent Schema for Data Interoperability (4 minute read)
dbt Labs has open-sourced MetricFlow (a Semantic Layer SQL generation tool and its JSON-based metadata schema) to drive semantic interoperability across the data ecosystem under the Apache 2.0 license. The universal metadata layer enables common metric definitions and lineage tracing between tools and data platforms. This approach facilitates data product transparency and audit, particularly relevant for agentic and LLM-powered workflows, by standardizing data and metric exchange across platforms.
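To make the idea concrete, here is the rough shape of a portable metric definition (the field names are hypothetical, not MetricFlow's published schema): tools exchange one metadata document instead of each re-deriving the SQL.

```python
# Illustrative portable metric definition exchanged between tools.
import json

metric = {
    "name": "weekly_active_users",
    "type": "count_distinct",
    "expr": "user_id",
    "source_table": "analytics.events",
    "filters": ["event_name = 'session_start'"],
    "time_dimension": {"column": "event_ts", "grain": "week"},
}

print(json.dumps(metric, indent=2))
```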
|
FlinkSketch (GitHub Repo)
FlinkSketch is a library offering various sketching algorithms compatible with Apache Flink. Developers can integrate these algorithms into their applications using Flink's DataStream API.
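FlinkSketch's own API isn't reproduced here; as a generic illustration of what a sketching algorithm does (fixed memory, approximate answers over a stream), a count-min sketch in plain Python:

```python
# Count-min sketch: approximate per-item counts in bounded memory.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item: str):
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item: str) -> None:
        for row, col in self._buckets(item):
            self.table[row][col] += 1

    def estimate(self, item: str) -> int:
        # Never underestimates; hash collisions can only inflate counts.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for event in ["click", "click", "view", "click"]:
    cms.add(event)
print(cms.estimate("click"))  # ~3
```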
|
|
Anonymous Credentials: Rate-Limiting Bots and Agents Without Compromising Privacy (21 minute read)
The rapid adoption of agentic AI is shifting web traffic from individual users to high-frequency, platform-driven requests, challenging existing rate-limiting and security mechanisms, especially as traditional fingerprinting becomes ineffective and risks unjustly blocking large user cohorts. Cloudflare uses cryptographically robust anonymous credentials (ARC and ACT), enabling fine-grained, privacy-preserving rate limiting and per-user controls with sublinear communication costs and support for multi-use tokens and late origin-binding. These protocols, under IETF standardization, offer actionable primitives (algebraic MACs and zero-knowledge proofs) that seamlessly integrate with modern automation frameworks (e.g., MCP Tools), effectively balancing resource management, security, and user privacy.
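A miniature of what the credential enables on the origin side (illustrative only; the real ARC/ACT protocols handle issuance and unlinkable verification with algebraic MACs and zero-knowledge proofs, which this sketch omits): rate limiting keyed by an opaque per-client token rather than an IP or fingerprint.

```python
# Token-bucket rate limiting keyed by an anonymous credential.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.burst = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}  # keyed by credential, not user identity

def handle_request(credential: str) -> int:
    bucket = buckets.setdefault(credential, TokenBucket(rate_per_sec=5, burst=10))
    return 200 if bucket.allow() else 429

print(handle_request("opaque-token-abc"))  # 200 until the bucket drains
```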
|
The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance (3 minute read)
Enterprise architecture is rapidly evolving as AI agents transition into primary execution engines, directly orchestrating workflows and CRUD operations via protocols like Model Context Protocol (MCP), while backends focus on governance and permissions. Gartner forecasts that 40% of enterprise applications will feature autonomous agents by 2026, up from less than 5% today, with agent-driven systems poised to deliver up to $6 trillion in economic value by 2028.
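One illustrative reading of "backends retreat to governance" (hypothetical policy table and function, not a specific MCP implementation): the backend no longer runs the workflow itself; it only rules on whether the agent's requested mutation is allowed.

```python
# Governance-only backend sketch: the agent executes, the backend decides.
POLICIES = {"orders.update": {"role": "agent", "max_amount": 500}}

def governed_execute(tool: str, caller_role: str, args: dict) -> str:
    policy = POLICIES.get(tool)
    if policy is None or caller_role != policy["role"]:
        return "DENY: no policy for this caller"
    if args.get("amount", 0) > policy["max_amount"]:
        return "DENY: exceeds limit, escalate to a human"
    return f"ALLOW: executing {tool} with {args}"

print(governed_execute("orders.update", "agent", {"amount": 120}))
```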