Help make Supadata better.

Tell us how we could make the product more useful to you.


Change in Transcript Output Format

Hi Supadata team,

I've been using the YouTube transcript API for a while and recently noticed a change in the output format that is breaking my application.

Previously, when fetching a transcript, the content was returned as clean, structured text where each speaker's speech was grouped into a complete sentence or paragraph. This made it easy to process and identify speaker turns.

However, sometime around January 17, 2026, the output format changed. The transcript is now returned as a sequence of short, fragmented chunks, exactly as the captions appear on YouTube (short timed snippets). This makes it very difficult to work with programmatically, as the text is broken mid-sentence and speaker turns are no longer clearly grouped.

Here is a simplified example of what I was getting before:

Speaker 1 sentence here, complete and readable.
Speaker 2 response here, also complete.

And here is what I get now:

word word word word word word word word word

I haven't made any changes to my API request. I'm using the following parameters:

mode: auto
lang: ar

I have two transcripts generated on the same day (2026-01-17) where the first one has the old format and the second already has the new format, which suggests the change happened on your side on that date.

Could you please clarify:

- Was there an intentional change to the output format around that date?
- Is there a parameter I can use to get back the old structured format?
- If this was a breaking change, will it be documented going forward?

Thank you for your time and I look forward to your response.

contact.galdi 11 days ago

Completed

Bug Report: GET /v1/youtube/transcript returns 206 (Transcript Unavailable) when lang parameter is omitted, even when captions exist

Severity: High (causes false negatives for videos with non-English captions)

Description

When calling GET /v1/youtube/transcript without the lang query parameter, the API returns HTTP 206 with {"error": "transcript-unavailable"} for videos that do have captions available (e.g., Portuguese auto-generated captions).

However, when the same request includes any lang parameter (even a language that doesn't match the video's captions), the API correctly falls back to the first available language and returns HTTP 200 with the full transcript.

Steps to Reproduce

Video: mkn_Bx-qe4o (has Portuguese auto-generated captions, visible in the YouTube player)

Request 1 (WITHOUT lang, fails):

    curl -s -w "\nHTTP: %{http_code}" "https://api.supadata.ai/v1/youtube/transcript?videoId=mkn_Bx-qe4o&text=true" -H "x-api-key: YOUR_KEY"

Response:

    HTTP: 206
    {"error":"transcript-unavailable","message":"Transcript Unavailable","details":"No transcript is available for this video","documentationUrl":"https://docs.supadata.ai/errors/transcript-unavailable"}

Request 2 (WITH lang=pt, succeeds):

    curl -s -w "\nHTTP: %{http_code}" "https://api.supadata.ai/v1/youtube/transcript?videoId=mkn_Bx-qe4o&text=true&lang=pt" -H "x-api-key: YOUR_KEY"

Response:

    HTTP: 200
    {"lang":"pt","availableLangs":["pt"],"content":"[música] Boa noite, seja novamente muito bem-vindo ao dia 2 do nosso seminário..."}

Request 3 (WITH lang=en, also succeeds, falls back to PT):

    curl -s -w "\nHTTP: %{http_code}" "https://api.supadata.ai/v1/youtube/transcript?videoId=mkn_Bx-qe4o&text=true&lang=en" -H "x-api-key: YOUR_KEY"

Response:

    HTTP: 200
    {"lang":"pt","availableLangs":["pt"],"content":"[música] Boa noite, seja novamente muito bem-vindo ao dia 2 do nosso seminário..."}

Expected Behavior

When lang is omitted, the API should behave the same as when lang is provided with a non-matching language: fall back to the first available transcript language and return HTTP 200 with the content.

Per your own documentation: "If the video does not have a transcript in the preferred language, the endpoint will return a transcript in the first available language and a list of other available languages." This fallback should also apply when no lang preference is specified at all.

Actual Behavior

When lang is omitted, the API returns 206 with transcript-unavailable instead of falling back. This only happens when there is no "default" language transcript (likely English). Videos with only non-English captions (e.g., Portuguese, Spanish) are affected.

Impact

This bug causes all videos with only non-English captions to appear as having no transcripts available, even though captions are clearly present in the YouTube player. The workaround is to always include a lang parameter, which triggers the correct fallback behavior.

Environment

Plan: Free (but the behavior is API logic, not plan-related)
Date tested: March 20, 2026
Endpoint: GET /v1/youtube/transcript
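
For anyone hitting the same issue, the workaround described above (always send a lang value) could be wrapped roughly like this. It is only a sketch: the helper name transcript_url, the default of "en", and the SUPADATA_API_KEY variable are my own assumptions, not part of the Supadata API. The endpoint and query parameters are the ones used in the requests above, and the video ID is inserted as-is without URL-encoding.

```shell
# Hypothetical helper: build the transcript URL so that lang is never omitted.
# Usage: transcript_url VIDEO_ID [LANG]
transcript_url() {
  video_id="$1"
  # Assumed default. Per this report, any lang value appears to trigger the
  # fallback to the first available language, so the exact value may not matter.
  lang="${2:-en}"
  printf '%s?videoId=%s&text=true&lang=%s\n' \
    "https://api.supadata.ai/v1/youtube/transcript" "$video_id" "$lang"
}

# Example call (needs a real key in SUPADATA_API_KEY, an assumed variable name):
# curl -s -w "\nHTTP: %{http_code}" "$(transcript_url mkn_Bx-qe4o)" -H "x-api-key: $SUPADATA_API_KEY"
```

Since the response still reports the language actually returned in its lang field, callers can check that field rather than assume the requested language was honored.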

rodrigo.pinto 25 days ago