Some of the best cooking content today lives on social media. A chef on YouTube walks you through a dish in real time. A home cook on Instagram shares a 60-second reel of their signature pasta. Someone on TikTok reveals the secret to perfect crispy potatoes. These aren't traditional recipes with neat ingredient lists and numbered steps, but they contain everything you need to cook the dish. You just have to extract it.
In RFD #0007, we described how we import recipes from traditional websites using JSON-LD, known DOM patterns, and intelligent HTML digestion thanks to LLMs.
Social media platforms require a completely different approach. The recipe isn't written on the page. It's spoken in a video, scattered across a caption, or hidden in the description.
Traditional websites serve HTML to anyone who asks. Social media platforms are different. Most are built as JavaScript applications that render content client-side, protected by aggressive bot detection, and optimized for their own apps rather than external access.
This project does not crawl or indiscriminately extract content from these platforms. Every extraction is user-initiated, targets a single publicly available URL that the user has shared, and is a continuation of that user's own interaction with the content.
To let users access such URLs reliably, we use specialized proxy services that can render JavaScript and work around most of these limitations. This is slower and more expensive than a simple HTTP request, but it's the only way to reliably access social media content.
We currently support three major platforms where cooking content thrives:
YouTube is the richest source. Cooking videos often include detailed explanations, technique demonstrations, and complete walkthroughs. Creators frequently link to full recipes in their descriptions. Many videos have accurate subtitles, either auto-generated or manually added.
Instagram (including Facebook Reels) presents recipes in short-form video, usually 30 to 90 seconds. The content is fast and visual, with key information often in captions or on-screen text. Creators typically direct viewers to external links for the full recipe.
TikTok is similar to Instagram in format but with its own ecosystem. Recipe videos tend to be quick and personality-driven, with ingredients and steps revealed rapidly. Important details often live in the caption or comments.
Each platform has its quirks, but they share a common pattern: the recipe exists across multiple sources that need to be gathered and combined.
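As a rough illustration, routing an incoming URL to one of the three supported platforms could look like the sketch below. The `Platform` enum, the `_HOSTS` table, and `detect_platform` are hypothetical names, and the domain list is an assumption rather than the importer's actual configuration.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class Platform(Enum):
    YOUTUBE = "youtube"
    INSTAGRAM = "instagram"
    TIKTOK = "tiktok"


# Hypothetical host table; the real importer may recognize other domains.
_HOSTS = {
    "youtube.com": Platform.YOUTUBE,
    "youtu.be": Platform.YOUTUBE,
    "instagram.com": Platform.INSTAGRAM,
    # Assumption: Facebook Reels are handled by the Instagram path.
    "facebook.com": Platform.INSTAGRAM,
    "tiktok.com": Platform.TIKTOK,
}


def detect_platform(url: str) -> Optional[Platform]:
    """Return the platform for a URL, or None if it is not a supported one."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in _HOSTS.items():
        # Match the bare domain or any subdomain (www., m., and so on).
        if host == domain or host.endswith("." + domain):
            return platform
    return None
```

A URL that returns `None` here would presumably fall through to the regular website pipeline from RFD #0007.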
The most valuable content in a cooking video is often spoken aloud. The creator explains what they're doing, lists ingredients as they add them, mentions cooking times, and shares tips that never appear in any written description. To extract this, we need the transcript.
YouTube makes this relatively straightforward. Most videos have subtitle tracks, either manually created by the uploader or auto-generated by YouTube's speech recognition. We fetch these subtitles and get a timestamped transcript of everything said in the video.
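To make "timestamped transcript" concrete, here is a minimal sketch that turns a subtitle track into a list of timed cues and then into plain text. It assumes the track has already been fetched and is in WebVTT form; the post doesn't specify the format we actually receive, and `Cue`, `parse_vtt`, and `transcript_text` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    text: str


# Matches the start time of a WebVTT timing line: [hh:]mm:ss.ttt -->
_TIMING = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->")


def parse_vtt(vtt: str) -> List[Cue]:
    """Parse WebVTT text into cues, ignoring the header and untimed blocks."""
    cues: List[Cue] = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if not match:
                continue
            hours, minutes, seconds, millis = match.groups()
            start = (
                int(hours or 0) * 3600
                + int(minutes) * 60
                + int(seconds)
                + int(millis) / 1000
            )
            # Everything after the timing line is the spoken text.
            text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
            if text:
                cues.append(Cue(start, text))
            break
    return cues


def transcript_text(cues: List[Cue]) -> str:
    """Join cue texts into one plain transcript string."""
    return " ".join(cue.text for cue in cues)
```

Keeping the start times around is a design choice for the sketch: the plain text is what a language model would read, while the timestamps make it possible to point back to the moment in the video where something was said.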
For Instagram and TikTok, transcription is harder. These platforms don't expose subtitles the same way. When available, we use speech-to-text services to transcribe the audio directly. This adds cost and processing time, but for a 60-second cooking reel, the transcript often contains the entire recipe.
This transcription capability does double duty. In RFD #0006, we described our file import system and mentioned future support for audio. The same speech-to-text pipeline that transcribes social media videos also powers audio imports. You can record yourself reading a recipe aloud, or capture someone explaining a dish, and we'll transcribe and extract it the same way.
Social media posts frequently reference external content. A YouTube description might say "Full recipe on my blog!" with a link. A TikTok caption might just have a link buried in it.
We extract these URLs and follow them. Each external link gets processed through our website extraction pipeline from RFD #0007: JSON-LD if available, known DOM patterns if recognized, HTML digestion otherwise. If the creator linked to their blog, and their blog has a proper recipe page, we get a clean structured recipe from it.
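Pulling links out of a description or caption can be sketched as follows. This is a simplified illustration, not our actual extractor: the regular expression, the trailing-punctuation cleanup, and the decision to skip links that point back to social platforms are all assumptions, and `extract_external_urls` is a hypothetical name.

```python
import re
from typing import List
from urllib.parse import urlparse

# Rough URL matcher: stops at whitespace, quotes, and angle brackets.
_URL = re.compile(r"https?://[^\s<>\"']+")

# Punctuation that commonly trails a URL in running text.
_TRAILING = ".,;:!?)]}"

# Assumption: links back to social platforms are not recipe sources.
_SOCIAL = ("youtube.com", "youtu.be", "instagram.com", "facebook.com", "tiktok.com")


def extract_external_urls(text: str) -> List[str]:
    """Return unique non-social URLs in the order they appear in the text."""
    seen = set()
    urls: List[str] = []
    for raw in _URL.findall(text):
        url = raw.rstrip(_TRAILING)
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in _SOCIAL):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Each URL returned this way would then be handed to the website extraction pipeline described in RFD #0007.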
This happens in parallel. We'll follow up to three external links simultaneously, racing to find the best recipe source. Often the linked website has more complete information than the video itself: exact measurements, precise temperatures, serving sizes, and notes that didn't fit in a short video.
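The parallel step can be sketched with `asyncio`. Note what this sketch does and does not do: it caps the work at three links, runs the extractions concurrently, waits for all of them, and keeps the most complete result. The `extract` callable stands in for the website pipeline and is supplied by the caller, and the `completeness` score is a hypothetical heuristic; the real selection logic isn't described in this post.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

MAX_LINKS = 3  # from the text: up to three external links at once

# Stand-in for the website extraction pipeline: URL in, recipe dict or None out.
Extractor = Callable[[str], Awaitable[Optional[dict]]]


def completeness(recipe: dict) -> int:
    """Hypothetical score: more ingredients and steps means more complete."""
    return len(recipe.get("ingredients", [])) + len(recipe.get("steps", []))


async def best_external_recipe(urls: List[str], extract: Extractor) -> Optional[dict]:
    """Process up to MAX_LINKS URLs concurrently and keep the best result."""
    tasks = [extract(url) for url in urls[:MAX_LINKS]]
    # return_exceptions=True: a failing link yields an exception object
    # in the results instead of aborting the whole batch.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    recipes = [r for r in results if isinstance(r, dict)]
    return max(recipes, key=completeness, default=None)
```

A stricter "race" could return as soon as the first good recipe arrives, at the cost of possibly missing a more complete one from a slower site.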
At this point we might have several versions of the same recipe:
From the transcript: "about two cups of flour, maybe a little more"
From the description: "2 cups flour, 1 tsp baking soda, pinch of salt"
From the external website: a JSON-LD recipe including yield, prep time, and nutritional information
Each source has value. The transcript captures tips and explanations that aren't written anywhere. The description is often more precise about quantities. The external website has the most structured data.
Our final step is enhancement: we blend all these sources into a single, coherent recipe. The language model sees everything we've gathered and produces one recipe that combines the best of each source. Precise measurements from the website. Technique tips from the transcript. The creator's personality from how they described the dish.
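One way to picture the input to this step is a single prompt that lays out every gathered source under its own label. The sketch below only assembles that text and makes no model call. The `Sources` container, the section labels, and the instruction wording are all hypothetical, not the prompt we actually use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sources:
    transcript: Optional[str] = None
    description: Optional[str] = None
    # Recipes already extracted from linked websites, serialized as text.
    external_recipes: List[str] = field(default_factory=list)


# Hypothetical instruction text for the language model.
_INSTRUCTIONS = (
    "Combine the sources below into one coherent recipe. "
    "Prefer exact measurements from structured sources and keep "
    "technique tips that only appear in the spoken content."
)


def build_enhancement_prompt(sources: Sources) -> str:
    """Assemble one prompt from whichever sources were found."""
    sections: List[str] = []
    if sources.transcript:
        sections.append("[Transcript]\n" + sources.transcript.strip())
    if sources.description:
        sections.append("[Description]\n" + sources.description.strip())
    for i, recipe in enumerate(sources.external_recipes, start=1):
        sections.append(f"[External recipe {i}]\n" + recipe.strip())
    return _INSTRUCTIONS + "\n\n" + "\n\n".join(sections)
```

Missing sources are simply omitted, so the same function covers a video with only a transcript as well as one with a transcript, a description, and several linked recipes.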
The result is often better than any single source alone. It's not just transcription or just scraping. It's synthesis.
Putting it all together, here's what happens when you paste a YouTube URL:
YouTube URL arrives
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 1. Fetch video metadata │
│ Title, channel, description, thumbnail │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. Extract transcript │
│ From subtitle tracks (manual or auto-generated) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. Find external URLs │
│ Parse description for links to recipe websites │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. Process external URLs (parallel) │
│ Each URL → website extraction pipeline (RFD #0007) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 5. Enhance and blend │
│ Combine transcript + description + external recipes │
│ → One coherent, complete recipe │
└─────────────────────────────────────────────────────────────────┘
Instagram and TikTok follow a similar flow, with platform-specific adjustments for how we access content and extract transcripts.
Social media has become where many people discover recipes. A generation of home cooks learns from YouTube videos, not cookbooks. TikTok recipes go viral and become dinner that same week. Ignoring these platforms would mean ignoring how people actually find things to cook.
But social media content is ephemeral. Videos get deleted. Accounts disappear. Platforms change. That 30-second reel that taught you the perfect scrambled eggs might not exist next year.
By extracting recipes from social media, we give users a way to preserve what they've found. The recipe that exists only as a fast-talking video becomes structured text you can read while cooking. The tips scattered across a caption and transcript become notes attached to your saved recipe. The link to the creator's website stays connected, preserving attribution even as social platforms come and go.
The recipe was always there, spoken and shown. We just write it down.
Author: Jorge Bastida
Published: January 7, 2026
RFD: #0009
If you'd like to discuss this RFD, share your thoughts, or simply chat about it, feel free to reach out to me. To stay up to date with the project's development, you can follow me on X.