Machine-Learning Predictive Autoscaling for Flink (8 minute read)
Grab's Flink usage grew 2.5x in a year, driving the need for efficient, self-service stream processing. Traditional reactive autoscaling proved inefficient due to restart spikes, parallelism constraints, and manual tuning overhead, often causing resource waste and latency spikes. Grab prototyped a predictive autoscaler that uses time-series forecasting and regression models to align CPU provisioning with real-time Kafka workload patterns, achieving over 35% average cloud cost savings and simplifying deployment by automating scaling decisions.
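A minimal sketch of the idea (illustrative, not Grab's implementation; the function names and throughput figure are invented): forecast the next window's Kafka ingest rate, then size parallelism against a measured per-core throughput.

```python
# Forecast-driven autoscaling sketch: predict the next window's message
# rate, then map it to a target parallelism with safety headroom.
import numpy as np

def forecast_next_rate(rates: list[float]) -> float:
    """Naive linear-trend forecast over recent per-minute message rates."""
    t = np.arange(len(rates))
    slope, intercept = np.polyfit(t, rates, 1)
    return max(0.0, slope * len(rates) + intercept)

def target_parallelism(predicted_rate: float,
                       records_per_core_sec: float,
                       headroom: float = 1.2) -> int:
    """Size the job so the predicted load fits with some headroom."""
    return max(1, int(np.ceil(predicted_rate * headroom / records_per_core_sec)))

recent = [8_000, 9_500, 11_000, 12_800, 14_100]  # msgs/sec, hypothetical
rate = forecast_next_rate(recent)
print(target_parallelism(rate, records_per_core_sec=2_500))
```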
|
From Outages to Order: Netflix's Approach to Database Resilience with WAL (2 minute read)
Netflix deployed a modular Write-Ahead Log (WAL) system to address data loss, replication entropy, and cross-region consistency, capturing all database mutations in a durable log for robust recoverability. The pluggable design separates producers and consumers, leverages SQS/Kafka with dead-letter queues for reliability, and enables flexible routing to multiple storage backends without code changes. It is capable of handling millions of writes per second with sub-second tail latencies.
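A toy sketch of the pattern (not Netflix's code; the in-memory lists stand in for SQS/Kafka): every mutation is appended to a durable log before acknowledgment, and consumers replay it to pluggable sinks, parking failures in a dead-letter queue.

```python
# Conceptual write-ahead-log sketch: log first, replay to backends later.
import json, time
from typing import Callable

LOG: list[str] = []          # stand-in for a durable Kafka/SQS log
DEAD_LETTER: list[str] = []  # stand-in for a dead-letter queue

def append_mutation(table: str, key: str, op: str, payload: dict) -> None:
    """Producer side: record the mutation durably before acking the write."""
    LOG.append(json.dumps({"ts": time.time(), "table": table,
                           "key": key, "op": op, "payload": payload}))

def replay(sink: Callable[[dict], None]) -> None:
    """Consumer side: route each logged mutation to a backend sink."""
    for entry in LOG:
        event = json.loads(entry)
        try:
            sink(event)
        except Exception:
            DEAD_LETTER.append(entry)  # retry later without losing data

append_mutation("users", "u42", "UPSERT", {"plan": "premium"})
replay(lambda e: print("applied", e["op"], "to", e["table"]))
```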
|
From JSON to AVRO in the CDC Pipeline (8 minute read)
Fresha upgraded its Change Data Capture (CDC) pipeline from JSON to AVRO format to address scalability issues as the business grew. Originally using Postgres, Debezium, Kafka with JSON, Snowpipe Streaming, and Snowflake, the pipeline was simple but inefficient for handling increased schema changes, storage, and query performance. The transition to AVRO, enabled by Snowflake's schema evolution support, introduced automated schema management via a Schema Registry, reducing manual work and improving efficiency.
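A small sketch of why the format change matters (assumes the fastavro package; the record and field names are invented): Avro data written under an old schema can be read under a newer one, with added columns filled from defaults, which is the kind of evolution a Schema Registry automates.

```python
# Avro schema evolution in miniature: write with v1, read with v2.
import io
from fastavro import schemaless_writer, schemaless_reader

schema_v1 = {
    "type": "record", "name": "booking", "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
    ],
}
# v2 adds a column with a default, a backward-compatible change a
# Schema Registry can validate automatically.
schema_v2 = {
    "type": "record", "name": "booking", "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
        {"name": "channel", "type": "string", "default": "web"},
    ],
}

buf = io.BytesIO()
schemaless_writer(buf, schema_v1, {"id": 1, "status": "confirmed"})
buf.seek(0)
# Reading v1 data with the v2 schema fills the missing field's default.
print(schemaless_reader(buf, schema_v1, schema_v2))
```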
|
|
"You Don't Need Kafka, Just Use Postgres" Considered Harmful (16 minute read)
Many "just use Postgres" arguments ignore that Kafka and Postgres solve fundamentally different problems. Kafka provides persistent ordered logs, consumer groups, low latency streaming, fault tolerance, and a massive connector ecosystem, which are difficult and costly to recreate on top of Postgres. Postgres can handle simple queuing at small scale, but using it as an event streaming platform eventually leads to complexity and performance issues, so the right approach is to let Postgres manage state and Kafka handle event streaming.
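For scale, here is the small-scale Postgres queue the article concedes works (a sketch assuming psycopg 3 and an existing jobs table): one SKIP LOCKED claim query, with none of Kafka's consumer groups, replayable ordered log, or connector ecosystem.

```python
# Classic Postgres job-queue pattern: claim-and-delete with SKIP LOCKED
# so concurrent workers never block on or double-process the same row.
import psycopg

CLAIM = """
DELETE FROM jobs
WHERE id = (
    SELECT id FROM jobs
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

with psycopg.connect("dbname=app") as conn:  # commits on clean exit
    with conn.cursor() as cur:
        cur.execute(CLAIM)
        job = cur.fetchone()
        if job:
            print("processing", job)
```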
|
Why You'll Never Have a FAANG Data Infrastructure and That's the Point (9 minute read)
Enterprises seeking FAANG-level data capabilities often try to replicate decades-old, custom-built infrastructure. Instead, they should emulate the core design philosophies: abstraction, automation, and accountability. By taking a hybrid "buy + build" approach, using managed/SaaS platforms layered with organizational domain models and robust data governance, organizations can achieve 90% of the outcomes (scalable ingestion, analytics, ML, and self-service) without FAANG-scale budgets or engineering headcount. The key is treating data artefacts as products, enforcing standards, and leveraging internal developer platforms to maximize visibility, reliability, and business alignment.
|
Simple, Battle-Tested Algorithms Still Outperform AI (5 minute read)
Companies are losing over $200 billion annually (a negative 45% ROI) by deploying AI systems instead of proven, simpler algorithms that routinely deliver off-the-charts ROI. Executive fascination with AI and vendor pressures drive costly, inefficient implementations, while well-established methods like EOQ and Little's Law, managed by skilled programmers, remain dramatically more effective for most operational decisions.
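For reference, the two workhorse formulas named above, worked with invented numbers:

```python
# EOQ and Little's Law, computed end to end with made-up inputs.
from math import sqrt

# Economic Order Quantity: Q* = sqrt(2 * D * S / H)
D, S, H = 12_000, 50.0, 2.4   # annual demand, cost per order, holding cost/unit/yr
eoq = sqrt(2 * D * S / H)     # ~707 units per order

# Little's Law: L = lambda * W (items in system = arrival rate * time in system)
lam, W = 30.0, 0.5            # 30 orders/hour, 0.5 hours each
L = lam * W                   # 15 orders in flight on average

print(round(eoq), L)
```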
|
Ladder of Evidence in Understanding Effectiveness of New Products (7 minute read)
Meta data scientists use a "ladder of evidence" framework to evaluate the effectiveness of new products. Randomized Controlled Trials (RCTs) are used as the gold standard for establishing causality, but causal inference methods are essential alternatives when RCTs are not feasible due to constraints like cost, sample size, or launch requirements.
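A hedged sketch of the ladder's top rung (invented numbers, not Meta's tooling): estimating treatment lift from an RCT with a normal-approximation confidence interval.

```python
# Two-group RCT lift estimate with a 95% normal-approximation CI.
from math import sqrt

def rct_lift(conv_t, n_t, conv_c, n_c, z=1.96):
    pt, pc = conv_t / n_t, conv_c / n_c
    lift = pt - pc
    se = sqrt(pt * (1 - pt) / n_t + pc * (1 - pc) / n_c)
    return lift, (lift - z * se, lift + z * se)

lift, ci = rct_lift(conv_t=540, n_t=10_000, conv_c=480, n_c=10_000)
print(f"lift={lift:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```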
|
|
dbt Labs Open Sources MetricFlow: An Independent Schema for Data Interoperability (4 minute read)
dbt Labs has open-sourced MetricFlow (a Semantic Layer SQL generation tool and its JSON-based metadata schema) to drive semantic interoperability across the data ecosystem under the Apache 2.0 license. The universal metadata layer enables common metric definitions and lineage tracing between tools and data platforms. This approach facilitates data product transparency and audit, particularly relevant for agentic and LLM-powered workflows, by standardizing data and metric exchange across platforms.
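To make the idea concrete, here is the rough shape of a portable metric definition (the field names are hypothetical, not MetricFlow's published schema): tools exchange one metadata document instead of each re-deriving the SQL.

```python
# Illustrative portable metric definition exchanged between tools.
import json

metric = {
    "name": "weekly_active_users",
    "type": "count_distinct",
    "expr": "user_id",
    "source_table": "analytics.events",
    "filters": ["event_name = 'session_start'"],
    "time_dimension": {"column": "event_ts", "grain": "week"},
}

print(json.dumps(metric, indent=2))
```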
|
FlinkSketch (GitHub Repo)
FlinkSketch is a library offering various sketching algorithms compatible with Apache Flink. Developers can integrate these algorithms into their applications using Flink's DataStream API.
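FlinkSketch's own API isn't reproduced here; as a generic illustration of what a sketching algorithm does (fixed memory, approximate answers over a stream), a count-min sketch in plain Python:

```python
# Count-min sketch: approximate per-item counts in bounded memory.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item: str):
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item: str) -> None:
        for row, col in self._buckets(item):
            self.table[row][col] += 1

    def estimate(self, item: str) -> int:
        # Never underestimates; hash collisions can only inflate counts.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for event in ["click", "click", "view", "click"]:
    cms.add(event)
print(cms.estimate("click"))  # ~3
```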
|
|
Anonymous Credentials: Rate-Limiting Bots and Agents Without Compromising Privacy (21 minute read)
The rapid adoption of agentic AI is shifting web traffic from individual users to high-frequency, platform-driven requests, challenging existing rate-limiting and security mechanisms, especially as traditional fingerprinting becomes ineffective and risks unjustly blocking large user cohorts. Cloudflare uses cryptographically robust anonymous credentials (ARC and ACT), enabling fine-grained, privacy-preserving rate limiting and per-user controls with sublinear communication costs and support for multi-use tokens and late origin-binding. These protocols, under IETF standardization, offer actionable primitives (algebraic MACs and zero-knowledge proofs) that seamlessly integrate with modern automation frameworks (e.g., MCP Tools), effectively balancing resource management, security, and user privacy.
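A miniature of what the credential enables on the origin side (illustrative only; the real ARC/ACT protocols handle issuance and unlinkable verification with algebraic MACs and zero-knowledge proofs, which this sketch omits): rate limiting keyed by an opaque per-client token rather than an IP or fingerprint.

```python
# Token-bucket rate limiting keyed by an anonymous credential.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.burst = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}  # keyed by credential, not user identity

def handle_request(credential: str) -> int:
    bucket = buckets.setdefault(credential, TokenBucket(rate_per_sec=5, burst=10))
    return 200 if bucket.allow() else 429

print(handle_request("opaque-token-abc"))  # 200 until the bucket drains
```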
|
The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance (3 minute read)
Enterprise architecture is rapidly evolving as AI agents transition into primary execution engines, directly orchestrating workflows and CRUD operations via protocols like Model Context Protocol (MCP), while backends focus on governance and permissions. Gartner forecasts that 40% of enterprise applications will feature autonomous agents by 2026, up from less than 5% today, with agent-driven systems poised to deliver up to $6 trillion in economic value by 2028.
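One illustrative reading of "backends retreat to governance" (hypothetical policy table and function, not a specific MCP implementation): the backend no longer runs the workflow itself; it only rules on whether the agent's requested mutation is allowed.

```python
# Governance-only backend sketch: the agent executes, the backend decides.
POLICIES = {"orders.update": {"role": "agent", "max_amount": 500}}

def governed_execute(tool: str, caller_role: str, args: dict) -> str:
    policy = POLICIES.get(tool)
    if policy is None or caller_role != policy["role"]:
        return "DENY: no policy for this caller"
    if args.get("amount", 0) > policy["max_amount"]:
        return "DENY: exceeds limit, escalate to a human"
    return f"ALLOW: executing {tool} with {args}"

print(governed_execute("orders.update", "agent", {"amount": 120}))
```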