👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’ve been forwarded this email, you can subscribe here.
Building, launching, and scaling ChatGPT Images
ChatGPT Images is OpenAI’s biggest launch yet, with 100 million NEW users generating 700 million images in the first week. But how was it built? A deepdive with OpenAI’s engineering team
ChatGPT is the fastest-growing app of all time: from its launch in November 2022, it took the AI chat assistant only 12 months to hit 100M weekly active users. And new figures show that growth is speeding up. ChatGPT Images was released at the end of March, and an incredible 100 million new users signed up in the first week. This load was far higher than OpenAI had expected and prepared for, but the launch passed with no major outages.
Afterwards, I sat down with two engineering leaders deeply involved in ChatGPT Images: Sulman Choudhry (head of engineering, ChatGPT) and Srinivas Narayanan (VP of engineering, OpenAI). In this article, they share previously-unreleased details about the Images project, and behind-the-scenes details of how the team pulled the launch off.
For extra context, check out a previous deepdive with the ChatGPT team, including how OpenAI ships so fast, and the real-world scaling challenges they solved.
Today, we cover:
This article is from the Real-world engineering challenges series. See all others here.
1. Launch
OpenAI keeps launching new features at a rapid pace. In one month, they launched:
Then, on Tuesday, 25 March 2025, OpenAI released Image Generation using the 4o model:
It’s hard to predict if a launch will achieve “cut through” by becoming an event in itself. ChatGPT’s head of engineering, Sulman Choudhry, reveals this wasn’t widely expected of ChatGPT Images, internally:
Sulman calls the launch the craziest of his entire career – and this from someone who scaled Facebook Video to 5 billion daily views back in 2014. The team designed ChatGPT Images expecting it to drive meaningful growth, similar to the DALL-E 3 images feature launched in October 2023. But they did not expect anything like the growth that followed.
The plan was to initially release ChatGPT Images to paying subscribers, and then to free users later on the same day. However, as things unfolded, it was decided to postpone launching to free users:
But the OpenAI team wanted to get Images in front of free users, so despite the high ongoing load, a day later (on 27 March) they started a gradual rollout to free users. At this point, traffic ratcheted up, big time.
Viral in India
As soon as free users got access to ChatGPT Images, usage in India blew up: celebrities in the country, such as India’s most famous cricketer, Sachin Tendulkar, shared images created by ChatGPT in the Ghibli animation style:
The Prime Minister of India, Narendra Modi, was depicted in images recreated in Ghibli style:
Srinivas told me:
Ghibli-style generation has remained one of the most common use cases – and one I’ve played around with by turning existing photos into cheerful, anime-style images.
Launch stats
The team worked around the clock to ensure paying users could keep generating images, and to prepare for the launch to free users. Five days after everyone got access, the load was still high enough that additional work was needed to keep things up and running:
On day six, 31 March, yet another viral spike added one million users in just one hour:
Launch stats:
Despite unexpectedly high traffic, ChatGPT avoided hard outages like the site going down, and maintained availability for existing users. At peak load, latency did regress, but the team prioritized keeping the service accessible. They kept the site responsive by applying rate limits and increasing compute allocations to stabilize performance. Shortly after the peak, they returned to normal rate limits and brought latency back to acceptable levels.
A rule of thumb the ChatGPT engineering team uses is to intentionally prioritize access over latency. So, at times of unexpected growth, latency is often the first tradeoff made to keep the platform up.
2. How ChatGPT Images works
Here’s how image generation works, as described by Sulman:
Here’s an image generated by these steps:
Another feature of ChatGPT Images is that you can iterate on a generated image with new prompts. This operation takes the existing image (with tokens) and applies a new prompt on top, meaning it’s possible to tweak an image:
“Tweaking” an existing image is a practical feature, but it involves a lot more resource usage because the same compute operations execute each time a “tweaked” image is generated.
Tech stack
The technology choices behind the product are surprisingly simple; dare I say, pragmatic!
3. Changing the engine while speeding on the highway
The ChatGPT team designed Images to be a synchronous product: when an image starts rendering, it needs to finish in a synchronous way. If the process is interrupted, there is no way to restart it, and while an image is rendering, it continues using GPU and memory resources.
The problem with this setup is that it can’t handle peak load by taking advantage of excess capacity at non-peak times. Sulman recounts how the team decided to rewrite the image generation engine while dealing with rapidly rising load:
Isolating other OpenAI systems
A viral launch is usually great news, except when it takes down other parts of the system! In the case of ChatGPT, the product is used by paying users (many of whom are developers), as well as larger enterprises on enterprise plans. Ideally, ChatGPT Images’ unprecedented load should not impact other systems. But due to the increased load, several underlying infrastructure systems were impacted:
Fortunately, the engineering team had prioritized reliability ahead of the launch. OpenAI has always had strict reliability standards for the OpenAI API, so many systems were already isolated from ChatGPT traffic. However, there were still some shared components that the team planned to isolate but had not yet got to, such as a couple of compute clusters and a shared database instance.
Seeing the sudden surge and its impact on otherwise independent systems made the team speed up this isolation work. Most OpenAI API endpoints stayed stable during the Images-related spike thanks to the prior work on isolation, and the team wrapped up decoupling the remaining non-ChatGPT endpoints from ChatGPT infra.
Improving performance while finding new capacity
Images encountered a compute bottleneck, so the team sought quick performance wins and pushed changes that improved performance even before capacity was increased. Srinivas (VP of engineering, OpenAI) recalls:
4. Reliability challenge
ChatGPT working reliably with higher-than-expected load was key to the successful launch. Here’s how the team prioritized this non-functional requirement...