[Python] Collecting YouTube Comments by using YouTube Data API

▶️ Updated on March 6th, 2025

YouTube is one of the largest social media platforms, generating millions of comments daily across various videos. Whether you are conducting sentiment analysis, market research, or simply analyzing audience engagement, collecting YouTube comments can provide valuable insights.

The YouTube Data API v3 allows developers to access and retrieve YouTube content, including video details, channel statistics, and, most importantly, comments. In this tutorial, I walk through the process of extracting comments from YouTube videos using Python and the YouTube Data API.

By the end of this guide, you’ll learn how to:

  • Set up your YouTube API key
  • Install the required Python libraries
  • Fetch comments from a specific YouTube video


Step 1: Request Your Own API Key

You need to enable the YouTube Data API v3 at this link (https://console.cloud.google.com/apis/library/youtube.googleapis.com). The API has a daily quota limit (10,000 units per day by default at the time of writing), but that free allowance is enough to collect a fair amount of data.

On the following screen, click “Create Credentials” and enter your information.

It will return an API key. Do not share it with others; copy it so you can paste it into the Python code.

First, install and import the required libraries, then put your API key in place of “put-your-key-here.”

Python
!pip install google-api-python-client isodate

import os
import time
import pandas as pd
from googleapiclient.discovery import build
import isodate  # For parsing video duration

API_KEY = 'put-your-key-here'
youtube = build("youtube", "v3", developerKey=API_KEY)
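
To confirm that your key works before moving on, you can make a single inexpensive request. This is just a minimal sanity check, not part of the original workflow (videos.list with chart="mostPopular" costs one quota unit):

Python
# Sanity check: one cheap request to confirm the API key works
test = youtube.videos().list(part="snippet", chart="mostPopular", maxResults=1).execute()
print(test["items"][0]["snippet"]["title"])  # Prints the title of a currently trending video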

Step 2: Find the ID of the YouTube Playlist or Video

Now you need to find the ID of the YouTube playlist or video whose comments you want to collect. It’s really simple: you can find it in the URL of the playlist or video.

For example, if the URL of the playlist is “https://www.youtube.com/watch?v=ZfCNFYAd77o&list=PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR,” the string after &list= (PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR) is the playlist ID. The video ID is the string after watch?v= (ZfCNFYAd77o).
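
If you prefer to extract these IDs programmatically, here is a minimal sketch using only Python’s standard library (the helper name extract_ids is my own, not part of any API):

Python
from urllib.parse import urlparse, parse_qs

def extract_ids(url):
    """Pull the video ID and playlist ID out of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    video_id = query.get("v", [None])[0]
    playlist_id = query.get("list", [None])[0]
    return video_id, playlist_id

url = "https://www.youtube.com/watch?v=ZfCNFYAd77o&list=PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR"
print(extract_ids(url))  # ('ZfCNFYAd77o', 'PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR')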

Python Code to Collect a YouTube Video ID List

Instead of finding YouTube IDs one by one, you can collect a list of video IDs matching a keyword using the following Python code.

Python
# File path for saving video metadata
VIDEO_DATA_FILE = "video_data.csv"

def get_video_data(keyword):
    """
    Searches YouTube for videos based on a keyword and returns a list of video metadata.
    Includes a check for whether the video is a YouTube Shorts.
    """
    video_data = []
    next_page_token = None

    while True:
        # Note: each search().list call costs 100 quota units,
        # so a broad keyword can exhaust the daily quota quickly
        request = youtube.search().list(
            q=keyword,
            part="id,snippet",
            maxResults=50,
            type="video",
            pageToken=next_page_token
        )
        response = request.execute()

        video_ids = [item["id"]["videoId"] for item in response.get("items", [])]

        if not video_ids:
            break

        # Fetch detailed video statistics & duration
        video_request = youtube.videos().list(
            part="snippet,statistics,contentDetails",
            id=",".join(video_ids)
        )
        video_response = video_request.execute()

        for item in video_response.get("items", []):
            # Extract duration and check if it's a Shorts
            duration = item["contentDetails"]["duration"]
            parsed_duration = isodate.parse_duration(duration).total_seconds()
            is_shorts = parsed_duration < 60  # Heuristic: treat videos under 60 seconds as Shorts

            video_data.append({
                "video_id": item["id"],
                "title": item["snippet"]["title"],
                "publish_date": item["snippet"]["publishedAt"],
                "channel_title": item["snippet"]["channelTitle"],
                "like_count": item["statistics"].get("likeCount", 0),
                "comment_count": item["statistics"].get("commentCount", 0),
                "view_count": item["statistics"].get("viewCount", 0),
                "duration": duration,  # Keep raw duration format
                "is_shorts": is_shorts  # True if video is a Shorts
            })

        next_page_token = response.get("nextPageToken")

        if not next_page_token:
            break

        time.sleep(1)  # Avoid hitting rate limits

    return video_data

def save_video_data(video_data):
    """
    Saves video metadata to a CSV file with proper encoding.
    """
    df = pd.DataFrame(video_data)
    df.to_csv(VIDEO_DATA_FILE, index=False, encoding="utf-8-sig")  # Prevents encoding issues
    print(f"Saved {len(video_data)} video metadata entries to {VIDEO_DATA_FILE}")

if __name__ == "__main__":
    keyword = input("Enter the search keyword: ")
    video_data = get_video_data(keyword)
    save_video_data(video_data)

By using this code, you can collect a list of YouTube video IDs together with their titles, publish dates, and like, comment, and view counts for a given keyword.
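
For example, you can load the saved CSV and inspect the collected metadata; the column names match the dictionary keys in the code above:

Python
import pandas as pd

df = pd.read_csv("video_data.csv")
print(df.columns.tolist())
# ['video_id', 'title', 'publish_date', 'channel_title',
#  'like_count', 'comment_count', 'view_count', 'duration', 'is_shorts']
print(df.head())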

Step 3: Collect Comments from ID List, Playlist ID, or Video ID

You can choose one of the following options depending on what you need: collecting the comments from all the IDs we gathered in Step 2, from a specific playlist ID, or from a single video ID.

Option 1: Collect Comments from ID List

Building on the code from Step 2, you can collect the comments for all the video IDs we collected.

Python
# File paths for checkpoints
VIDEO_DATA_FILE = "video_data.csv"
COMMENTS_FILE = "youtube_comments.csv"

def load_video_ids():
    """
    Loads video IDs from the saved CSV file.
    """
    if os.path.exists(VIDEO_DATA_FILE):
        return set(pd.read_csv(VIDEO_DATA_FILE)["video_id"].tolist())
    return set()

def get_video_comments(video_id):
    """
    Fetches all comments along with metadata from a given YouTube video ID.
    """
    comments = []
    next_page_token = None

    while True:
        try:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=100,
                pageToken=next_page_token
            )
            response = request.execute()

            for item in response.get("items", []):
                snippet = item["snippet"]["topLevelComment"]["snippet"]

                comments.append({
                    "video_id": video_id,
                    "comment_id": item["id"],
                    "comment_text": snippet["textDisplay"],
                    "comment_author": snippet["authorDisplayName"],
                    "comment_author_id": snippet["authorChannelId"]["value"],
                    "comment_publish_date": snippet["publishedAt"],
                    "like_count": snippet.get("likeCount", 0),
                    "reply_count": item["snippet"].get("totalReplyCount", 0),
                    "is_reply": False  # This is a top-level comment
                })

                # Fetch replies if available (note: only the first 100 replies
                # per thread are fetched here; paginate with pageToken for more)
                if item["snippet"].get("totalReplyCount", 0) > 0:
                    reply_request = youtube.comments().list(
                        part="snippet",
                        parentId=item["id"],
                        maxResults=100
                    )
                    reply_response = reply_request.execute()

                    for reply in reply_response.get("items", []):
                        reply_snippet = reply["snippet"]
                        comments.append({
                            "video_id": video_id,
                            "comment_id": reply["id"],
                            "comment_text": reply_snippet["textDisplay"],
                            "comment_author": reply_snippet["authorDisplayName"],
                            "comment_author_id": reply_snippet["authorChannelId"]["value"],
                            "comment_publish_date": reply_snippet["publishedAt"],
                            "like_count": reply_snippet.get("likeCount", 0),
                            "reply_count": 0,  # Replies don't have further replies
                            "is_reply": True
                        })

            next_page_token = response.get("nextPageToken")

            if not next_page_token:
                break

            time.sleep(1)  # Avoid hitting rate limits
        except Exception as e:
            print(f"Error fetching comments for video {video_id}: {e}")
            break

    return comments

def save_comments(comments):
    """
    Saves comments to a CSV file with proper encoding.
    """
    df = pd.DataFrame(comments)
    df.to_csv(COMMENTS_FILE, mode="a", header=not os.path.exists(COMMENTS_FILE), index=False, encoding="utf-8-sig")
    print(f"Saved {len(comments)} comments to {COMMENTS_FILE}")

def load_existing_comments():
    """
    Loads existing comments and returns the set of already processed video IDs.
    """
    if os.path.exists(COMMENTS_FILE):
        return set(pd.read_csv(COMMENTS_FILE)["video_id"].unique())
    return set()

if __name__ == "__main__":
    video_ids = load_video_ids()
    if not video_ids:
        print("No video IDs found. Run collect_video_data.py first.")
        exit()

    processed_videos = load_existing_comments()
    print(f"Already processed {len(processed_videos)} videos.")

    for video_id in video_ids:
        if video_id in processed_videos:
            print(f"Skipping already processed video: {video_id}")
            continue

        print(f"Fetching comments for video: {video_id}")
        comments = get_video_comments(video_id)

        if comments:
            save_comments(comments)

        time.sleep(2)  # Avoid hitting API rate limits

This code prints its progress while collecting comments and saves the file as it goes, so you can keep the data already collected even if the run stops because of the API quota limit.
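
Once the run finishes (or stops at the quota limit), the collected comments are in youtube_comments.csv. A minimal sketch for loading and summarizing them:

Python
import pandas as pd

comments = pd.read_csv("youtube_comments.csv")
print(f"Collected {len(comments)} comments across {comments['video_id'].nunique()} videos")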

Option 2: Collect Comments from a YouTube Playlist

If you want to collect the comments from every video in a playlist, you can use the following code instead.

Python
def get_playlist_video_ids(service, **kwargs):
    video_ids = []
    results = service.playlistItems().list(**kwargs).execute()
    while results:
        for item in results['items']:
            video_ids.append(item['snippet']['resourceId']['videoId'])

        # check if there are more videos
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.playlistItems().list(**kwargs).execute()
        else:
            break

    return video_ids
    
def get_video_comments(service, **kwargs):
    comments, dates, likes, video_titles = [], [], [], []

    # Fetch the video title once, instead of making one API call per comment
    video_title = service.videos().list(
        part='snippet', id=kwargs['videoId']
    ).execute()['items'][0]['snippet']['title']

    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            snippet = item['snippet']['topLevelComment']['snippet']

            comments.append(snippet['textDisplay'])
            dates.append(snippet['publishedAt'])
            likes.append(snippet['likeCount'])
            video_titles.append(video_title)

        # check if there are more comments
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return pd.DataFrame({'Video Title': video_titles, 'Comments': comments, 'Date': dates, 'Likes': likes})

The following code will return a pandas DataFrame that includes the Video Title column.

Python
from googleapiclient.errors import HttpError

def main():
    # build the service
    youtube = build("youtube", "v3", developerKey=API_KEY)

    # get playlist video ids (replace with your own playlist ID)
    playlist_id = 'PLxNb_gmvauiRtxQrQsKLEWlFVUmRixmtS'
    video_ids = get_playlist_video_ids(youtube, part='snippet', maxResults=50, playlistId=playlist_id)

    # get the comments from each video
    all_comments_df = pd.DataFrame()

    for video_id in video_ids:
        try:
            comments_df = get_video_comments(youtube, part='snippet', videoId=video_id, textFormat='plainText')
            all_comments_df = pd.concat([all_comments_df, comments_df], ignore_index=True)
        except HttpError as e:
            print(f"An HTTP error {e.resp.status} occurred:\n{e.content}")

    return all_comments_df  # return the DataFrame

if __name__ == '__main__':
    df = main()
    print(df)  # print the DataFrame here

df.to_csv('output.csv', index=False, encoding='utf-8-sig')  # Export the DataFrame returned by main() to CSV
Option 3: Collect Comments from YouTube Video

If you want to collect the comments from a single video, you can run this code instead of the code above.

Python
def get_video_comments(service, **kwargs):
    comments, dates, likes = [], [], []
    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            snippet = item['snippet']['topLevelComment']['snippet']

            comments.append(snippet['textDisplay'])
            dates.append(snippet['publishedAt'])
            likes.append(snippet['likeCount'])

        # check if there are more comments
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return pd.DataFrame({'Comments': comments, 'Date': dates, 'Likes': likes})

The following will return a pandas DataFrame with the comment text, date, and number of likes 🙂

Python
comments_df = None

def main():
    global comments_df
    
    # Build the service
    youtube = build("youtube", "v3", developerKey=API_KEY)

    # Get the comments
    video_id = 'your-video-key-here' 
    comments_df = get_video_comments(youtube, part='snippet', videoId=video_id, textFormat='plainText')

if __name__ == '__main__':
    main()

print(comments_df)

comments_df.to_csv('output.csv', index=False, encoding='utf-8-sig')  # Export it to CSV format
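
As a quick sanity check on the collected data, you can, for example, look at the most-liked comments. This is a minimal sketch assuming the comments_df DataFrame produced by the Option 3 code above:

Python
# Show the five most-liked comments from the collected DataFrame
top = comments_df.sort_values('Likes', ascending=False).head(5)
print(top[['Comments', 'Likes']])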
