Change in Transcript Output Format

Hi Supadata team,

I've been using the YouTube transcript API for a while and recently noticed a change in the output format that is breaking my application.

Previously, when fetching a transcript, the content was returned as clean, structured text where each speaker's speech was grouped into a complete sentence or paragraph. This made it easy to process and identify speaker turns.

However, sometime around January 17, 2026, the output format changed. The transcript is now returned as a sequence of short, fragmented chunks, exactly as the captions appear on YouTube (short timed snippets). This makes it very difficult to work with programmatically, as the text is broken mid-sentence and speaker turns are no longer clearly grouped.

Here is a simplified example of what I was getting before:

Speaker 1 sentence here, complete and readable. Speaker 2 response here, also complete.

And here is what I get now:

word word word word word word word word word
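In the meantime, here is a rough sketch of the kind of client-side stitching I would need to do to work around this. It assumes each chunk is a dict with a `text` field (that shape and the `merge_chunks` name are my own illustration, not taken from the API docs), and it only works when the captions contain sentence punctuation. It cannot restore the speaker grouping, which is why I'd prefer the old format.

```python
import re

# Split after sentence-ending punctuation, including the Arabic question mark.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def merge_chunks(chunks):
    """Join caption fragments and re-split them on sentence boundaries.

    `chunks` is assumed to be a list of dicts with a "text" key
    (hypothetical shape, for illustration only).
    """
    # Skip empty fragments and join the rest with single spaces.
    parts = [c["text"].strip() for c in chunks if c.get("text", "").strip()]
    text = re.sub(r"\s+", " ", " ".join(parts))
    # Drop the empty string produced when there is no text at all.
    return [s for s in _SENTENCE_END.split(text) if s]


sentences = merge_chunks([
    {"text": "Hello there,"},
    {"text": "how are"},
    {"text": "you? I am"},
    {"text": "fine."},
])
```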

I haven't made any changes to my API request. I'm using the following parameters:

  • mode: auto

  • lang: ar

I have two transcripts generated on the same day (2026-01-17) where the first one has the old format and the second already has the new format, which suggests the change happened on your side on that date.

Could you please clarify:

  1. Was there an intentional change to the output format around that date?

  2. Is there a parameter I can use to get back the old structured format?

  3. If this was a breaking change, will it be documented going forward?

Thank you for your time and I look forward to your response.

Status

In Review

Board
💡

Feature Request

Date

11 days ago

Author

contact.galdi