Design like Karpathy is watching
A talk about building AI products with LLMs in mind
I gave this talk at the AI Engineer World's Fair in San Francisco on June 5, 2025.
It's a story about how the legendary AI researcher Andrej Karpathy built an app using Replicate and ran into some challenges along the way. The moral of the story is that in this new world where LLMs write the majority of the code on behalf of software engineers, we need to design our web apps and services to be more accessible to LLMs.
The official talk
The rough-cut talk
I didn't want to wait for the official talk to be published, so I recorded a rough-cut version of the talk. It's a bit more raw, but maybe also worth publishing...
The transcript
This is the transcript of the talk. It's here more for SEO juice than for human benefit...
How many of you know who Andrej Karpathy is? Raise your hand. Okay, maybe half of you. Raise your hand if you are not Andrej Karpathy. Just trying to gauge audience participation here. Okay, I got about 80% there. Got a lot of Andrejs in the room right now. Raise your hand if you work at Replicate. All right, so if you want to talk to any Replicate folks, there's your group right there.

For those who don't know who Andrej Karpathy is, I'll jump into that and explain. There's a GitHub repo that corresponds to these slides, and I'll put this slide up at the end too, so you can track down any URLs or anything else I mention in the talk. My name is Zeke. I'm Zeke on GitHub and Zeke on X as well, and I work for Replicate. Replicate is a cloud platform that lets you run AI models with an API. We have open-source models, like all the great Flux models from Black Forest Labs, but we also have proprietary models from Anthropic, OpenAI, Google, et cetera. And of course, you can also run your own custom public and private models on Replicate.

So, let's get to the point. Who is Andrej Karpathy? He's an AI researcher who's worked at all these big companies and organizations: Google, OpenAI, Tesla, OpenAI again, and now Eureka Labs, his new educational platform. But most importantly to me, he is a YouTube educator who gives some really amazing, highly accessible talks explaining how AI and machine learning work for general audiences. He coined the term "vibe coding" a few months ago, and of course that's taken the world by storm; we're all really interested in that now. And he subscribes to the idea that the hottest new programming language is English, which is kind of a hot take. He also wrote the Software 2.0 manifesto, now seven years ago, kind of an eternity in machine learning time, basically predicting this world in which machine learning models would write code for us, and would be better at it than humans. And here we are.

So today I want to talk about MenuGen. MenuGen is an app that Andrej created recently, I think at a hackathon, as a vibe coding experiment. It's a web app where you take a photo of a restaurant menu, which is all text, and it generates image representations of the contents of the menu for you. So if you don't know what the words mean, or English isn't your first language, or you just like to see tantalizing photos of food, it might be good for you. That was the idea behind it. He was actually able to build this app, which he described as an exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Many of you have probably experienced this: you're working on something locally, you have it running on your machine, and oh cool, it really works, it's amazing. Then you try to deploy it to Vercel or Cloudflare or something like that, and that's where a lot of the pain begins. So we're going to talk about that.
So, Andrej wrote a blog post about the experience of creating MenuGen, saying, you know, I was able to make this thing, publish it, get it online, add payments for it, and it's a working, functioning app that people can pay for, and it was super fun to build. However, he also rakes all these different companies over the coals for the developer experience challenges of working with them. For me, it was cool because, oh, okay, Replicate is mentioned among all these big hotshot companies like OpenAI and Vercel. But it also means we all have work to do to improve our products and make them better.

Here's a blurb about what he experienced when he started using the Replicate API: the LLM's knowledge of Replicate was outdated, the docs on Replicate were out of date, there had been changes in the API, and he hit rate limiting that made the app difficult to debug. He was told later that these are common protection measures services use to mitigate fraud, but they also make it harder to get started with new, legitimate accounts. This is kind of embarrassing, but it's also an opportunity to fix our product, make it better, and really listen to the voices that are loud and correct about the problems with our products.

So, what can Replicate do better? One thing is embracing llms.txt. llms.txt is a convention where you modify your website or your API or existing services to render text-based or markdown-based versions of your documentation, in a format that's friendlier for language models to consume than the HTML contents of a web page. As Andrej put it: Tired: elaborate docs pages with fancy color palettes, branding, animations, transitions, dark mode. Wired: one single docs.md file and a "copy to clipboard" button. It sounds simple, and maybe not the most glamorous thing, but it is actually what your language models want to consume.

In response to this, we added a new feature on the Replicate website: when you're viewing any model page, you have a button to copy the contents of that page as markdown for a language model, or to send the page directly to Claude and have an interaction with the contents of the model page, to learn more about what the model can do. Similarly, we added support for linking to ChatGPT: you're on a model page, you jump into ChatGPT, and you start having a conversation about the model. It's a lot more interactive than just going to a web page, reading, and trying to find the most relevant content. Of course, we also just dump the markdown there too, so if you're using a tool like Cursor or Windsurf, you can grab this content, put it into your editor, and it knows how to run the model.

Next thing. This wasn't necessarily from the blog post; I'm grabbing some quotes from recent tweets from Andrej Karpathy: LLMs don't like to click, they like to curl. Love it or hate it, curl is a tool that's here to stay. It's been around since, I don't know, the 90s, maybe. It's installed on everyone's machine, and it's basically a standardized way to make API calls without any specialized tooling. So, let's look at this curl command. Maybe it looks ugly, right? There's a lot of syntax. It's not glamorous.
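The slide showed a command along these lines. This is a representative sketch rather than the exact command from the slide; the model name and prompt here are just for illustration:

```bash
# Create a prediction on Replicate and wait for the result.
# "Prefer: wait" makes this a blocking request; drop it for async.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d '{"input": {"prompt": "a plate of french toast, studio lighting"}}' \
  https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions
```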
But it covers everything that you, or a language model, needs to know about how to make an API request. What's the HTTP method? What's the JSON payload? How do you send your credentials? What kind of response do you want? Do you want to make a blocking request or an asynchronous request? What's the API endpoint? That's all covered in this one little command. And this is exactly the kind of thing LLMs want to consume. If you give this content to an LLM, it now knows how to make API requests to your service. It's really powerful.

We have a tool called Cog, at cog.run, which is an open-source tool you can use to package machine learning models in production-ready Docker containers. It creates a standardized API around your model, with standard inputs and outputs, using OpenAPI. We took all of Cog's documentation and stuffed it into a single llms.txt file at cog.run. What you can do with that is drop it into your editor on an existing project. Say you've cloned some open-source Cog model and you're like, I don't really know how this code works, but I want to change it. You open up the model, you drop in a reference to that llms.txt, and your editor knows how to consume that content, bring it into context, and use it to write code. Pretty powerful stuff.

All right. So, the primary audience of your thing (product, service, library, et cetera) is now an LLM, not a human. This might be a tough pill to swallow, but I think it's the world we're in right now.

If you've been at this conference for a couple of days, you've probably heard everybody talking about MCP, right? It's such a big deal. But what even is it? How many of you actually feel like you really know what MCP is? Okay, I like the honesty here. There are like eight hands going up. So I'm going to explain it for you. Hopefully.

OpenAPI is a way of writing a JSON schema that defines the behavior of your HTTP API. It's basically one giant JSON file that says: here are the paths, here are the endpoints, here are the query parameters, here's the payload for the body, here's how you create a prediction, here's how you get your predictions, here's how you search, all that sort of stuff. One big JSON file that describes the whole behavior of your API. We have that at Replicate, and when you go to our HTTP API page, all the content on that page is generated from that schema. We just have a template that renders it all out as a human-friendly representation of how to use our API. Here's an example where you can search for models.

So, here's where the MCP part comes in. MCP is basically a way of taking an OpenAPI schema and stuffing it into a format where a language model knows what to do with it. We now have an MCP server for Replicate, which you can install very easily. You open up Claude Desktop, for example (the desktop app, not the web app), go into your developer settings, and add a tiny little bit of JSON. All of a sudden, Claude knows how to do everything the Replicate API can do, and it has an API token. You didn't have to install any software. All you had to do was get a token from the Replicate website, and Claude takes care of installing the MCP server locally. And now you can have an interaction with Claude where it's able to run API requests on Replicate for you.
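For reference, that bit of JSON goes in Claude Desktop's claude_desktop_config.json and looks something like this. The server command and URL below are my best guess at the shape, not a copy from the slide, so check Replicate's current docs for the real values:

```json
{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.replicate.com/sse"]
    }
  }
}
```

The mcp-remote pattern here bridges a remote MCP server into a desktop client; once Claude restarts, the Replicate tools just show up in the conversation.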
So, there are a few use cases here. You can use this for discovery: you don't know how to use a product yet and you want to know what it's capable of, or you want a language model to do searches for you, or you want to start scaffolding out the beginning of a project and you want your language model to help with that. That's exactly what MCP is for. It's a way of connecting tools to your language model so that it can do all sorts of powerful things.

And I want to emphasize that at Replicate, all we really had to do to make this possible was invest in an OpenAPI schema that was very well written, very well documented, and covered everything our APIs are capable of doing. (I'll sketch what a schema entry like that looks like in a moment.) From there, turning it into an MCP server lets it connect with tools like Claude and GitHub Copilot in Visual Studio Code. And actually, I think OpenAI added MCP support to their Agents SDK earlier this week. MCP is just going to be all over the map, and it's a way to really accommodate language models helping you do things.

So, this is sort of a note to self: the things we got wrong for Andrej and the things we want to fix. Some of them we've already addressed, as I showed in this talk. Some of them we still need to get right.

Maybe kind of a no-brainer: accept payments. Andrej went on the website, signed up for an API key, and entered his credit card info in Replicate, basically a legitimate user. Then he started hammering Replicate with API requests to generate images of French toast, and for whatever reason, the way he was doing it, he was making a ton of API requests. He triggered some kind of abuse mechanism on our website that said: this user has only existed for one hour and they've already sent us a thousand requests, something must be wrong. So we blocked him. And this isn't something you want to do, right? You want to let your power users come to your product and dive right in. They know what they're doing, they know what they want; don't get in their way. Luckily, our CEO saw the blog post from Andrej and immediately contacted him and unblocked his account. But not everyone has the power of writing a blog post that everybody in the world sees and knows about. So the lesson here for us is that Replicate should accept prepaid credit. If I go on a website, I should be able to say: here's 500 bucks, let me go nuts, do whatever I want, and don't ban me. We're working on that. We're going to fix it.

Next: document your shit. Literally. When you ship features on your product, don't just merge the pull request and walk away. It's not done until it's documented, the world knows about it, and an LLM can consume the content and put it to use. Always document everything, especially now that LLMs are in charge. We're still in charge, but you know what I mean.

Okay, feed the machines. That's basically a matter of producing content in forms that language models can understand and consume more easily than traditional HTML web pages.

Use boring technology. If a technology has been around for a long time, like SQL, which has been around longer than some of us have been alive, that means language models know how to write it because they've encountered so much of it.
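Here's that sketch: a minimal, hypothetical example of a schema entry for a model search endpoint. The path, parameter, and descriptions are made up for illustration; this is not Replicate's actual schema:

```json
{
  "openapi": "3.0.3",
  "info": { "title": "Example models API", "version": "1.0.0" },
  "paths": {
    "/v1/models": {
      "get": {
        "operationId": "searchModels",
        "summary": "Search public models by free-text query",
        "parameters": [
          {
            "name": "query",
            "in": "query",
            "required": true,
            "schema": { "type": "string" },
            "description": "Search terms, e.g. \"text to image\""
          }
        ],
        "responses": {
          "200": { "description": "A paginated list of matching models" }
        }
      }
    }
  }
}
```

The point is that every path, parameter, and description in a file like this is exactly the context an MCP server hands to a language model, which is why investing in the schema pays off twice.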
So, when you're building products, keep in mind that your language models will have a better chance of writing this software and using it if it's built on well-established technology that doesn't change a lot.

And lastly, practice good API hygiene. When you're writing your HTTP service and designing what the JSON response should look like, keep in mind that it's probably going into the context window of a language model now, and that context window has limits. So instead of dumping a JSON payload that has everything about all the models under the sun, consider making it a small, slimmed-down, information-dense version of what an LLM wants to see. That's all I got. Thank you.

It looks like I've got two minutes, if anybody has questions. No questions? Okay, I answered everything. Here we go.

Yeah, the question is: what are some recommendations for generating docs? First thing, start by generating your own OpenAPI schema. Write schemas in YAML or JSON that describe the behavior of your API. There are a ton of tools out there: Docusaurus, Read the Docs, ReadMe (readme.com), a whole bunch of services that know how to take an OpenAPI schema and turn it into not only documentation but also SDKs, clients in different programming languages, all that stuff.

Yeah, so you mean ways that we're thinking about discovery or distribution, in light of what I mentioned about LLMs being in charge and making purchasing decisions in the future? I think the key to that is making sure our API has really good search capabilities, and that the information users need to make informed decisions is actually available via the API. For example, with Replicate models right now, pricing is something you have to go to the web page to look at, either on the pricing page or on the individual model pages. If we expose pricing as a JSON structure that a public user can consume via our API, then it becomes a lot easier to do something like jump into a session with Claude and say: look, I'm evaluating all of the video models. I'm looking at Veo and Kling and Minimax and all the other things out there. Show me a comparison of which models are the most expensive, which are the fastest, which produce the highest quality output, et cetera. If the language model has access to the structured data and can answer those questions, it's going to be a lot easier to make those decisions. All right. Thanks, y'all.
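To illustrate that last answer: an information-dense, LLM-friendly response to a model comparison query might look something like this. Every field name, model name, and number here is invented for illustration; this is not Replicate's actual API:

```json
{
  "models": [
    {
      "name": "acme/video-gen",
      "description": "Text-to-video, clips up to 10 seconds at 720p",
      "price_per_video_second_usd": 0.05,
      "avg_generation_time_seconds": 42
    },
    {
      "name": "acme/video-gen-turbo",
      "description": "Faster, lower-quality text-to-video",
      "price_per_video_second_usd": 0.02,
      "avg_generation_time_seconds": 9
    }
  ]
}
```

A few short, comparable fields per model is the kind of shape a language model can reason over without blowing out its context window.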