Producing an Automatic Transcript for a YouTube Video
Still of the Wilson Center presentation.
Partial view of the transcript that was produced.
Partial view of the Python code.
(For the README file, code, and csv file please click here.)
Executive Summary
In July 2023, I was working on an article about intergenerational trauma in Korea stemming from various episodes throughout its history. One of the sources I cite in the piece is "U.S.-Korea Relations: Retrospective on the Jeju April 3 Incident, Human Rights, and Alliance," a presentation held in December 2022 at the Wilson Center, a think tank in Washington, DC.
At first, I typed out the panelists' comments by hand. Although I only intended to pull a few quotes, the video is approximately three hours long, and transcribing manually proved too time-consuming. A Google search turned up https://github.com/jdepoix/youtube-transcript-api, which served as my starting point. I then modified the Python code so that a) a timestamp appears every 30 seconds in the transcript and b) the transcript is written to a txt file.
The final transcript can be seen in the attached file "transcript_with_timestamps.txt".
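The two tweaks can be illustrated with a minimal sketch. This is not the exact code in the repository. It assumes the captions have already been fetched and are available as a list of dictionaries with `text` and `start` (seconds) keys, which is the raw shape youtube-transcript-api has returned; the fetch call itself differs between library versions, so it is left out. The function names and the sample captions below are hypothetical and used only for illustration.

```python
from pathlib import Path


def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_transcript(entries, interval=30):
    """Join caption entries into text, adding a timestamp line
    whenever a caption starts in a new `interval`-second window."""
    lines = []
    next_mark = 0  # earliest start time that triggers the next timestamp
    for entry in entries:
        start = entry["start"]
        if start >= next_mark:
            # Snap the label to the beginning of the current window.
            mark = int(start // interval) * interval
            lines.append(f"[{format_timestamp(mark)}]")
            next_mark = mark + interval
        lines.append(entry["text"])
    return "\n".join(lines) + "\n"


def save_transcript(entries, path, interval=30):
    """Write the timestamped transcript to a text file."""
    Path(path).write_text(build_transcript(entries, interval), encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample captions (assumed shape: dicts with "text" and "start").
    sample = [
        {"text": "Good morning, everyone.", "start": 0.0},
        {"text": "Thank you for joining us.", "start": 12.4},
        {"text": "Let me introduce the panel.", "start": 31.0},
        {"text": "We begin with some history.", "start": 95.2},
    ]
    save_transcript(sample, "transcript_with_timestamps.txt")
```

With the sample captions, the file would contain the markers `[00:00:00]`, `[00:00:30]`, and `[00:01:30]`. No marker is written for a window in which no caption starts, so silent stretches of the video do not produce empty timestamps.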
Tools Used
Programming language: Python