[Python] Collecting YouTube Comments with the YouTube Data API
▶️ Updated on March 6th, 2025
YouTube is one of the largest social media platforms, generating millions of comments daily across various videos. Whether you are conducting sentiment analysis, market research, or simply analyzing audience engagement, collecting YouTube comments can provide valuable insights.
The YouTube Data API v3 allows developers to access and retrieve YouTube content, including video details, channel statistics, and, most importantly, comments. In this tutorial, I introduce the process of extracting comments from YouTube videos using Python and the YouTube Data API.
By the end of this guide, you'll know how to:
- Set up your YouTube API key
- Install the required Python libraries
- Fetch comments from a specific YouTube video
Step 1: Request your own API key
You need to enable the YouTube Data API v3 at this link (https://console.cloud.google.com/apis/library/youtube.googleapis.com). The API has a daily quota (10,000 units by default), but that is enough to collect a fair amount of data for free.
In the following screen, click “Create credentials” and fill in your information.
It will return an API key. Copy it into your Python code, and do not share it with anyone.
First, install and import the required libraries, then replace “put-your-key-here” with your API key.
!pip install google-api-python-client isodate
import os
import time
import pandas as pd
from googleapiclient.discovery import build
import isodate # For parsing video duration
API_KEY = 'put-your-key-here'  # Replace with your own API key
youtube = build("youtube", "v3", developerKey=API_KEY)
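Hardcoding the key works for a quick test, but a safer habit is to read it from an environment variable so the key never ends up in shared code. Here is a minimal sketch; the variable name YOUTUBE_API_KEY is my own choice, not something the API requires:
import os
from googleapiclient.discovery import build

# Read the key from an environment variable instead of hardcoding it,
# e.g. set it first in your shell with: export YOUTUBE_API_KEY="your-key"
API_KEY = os.environ.get("YOUTUBE_API_KEY")
if not API_KEY:
    raise RuntimeError("Set the YOUTUBE_API_KEY environment variable first.")

youtube = build("youtube", "v3", developerKey=API_KEY)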
Step 2: Find the ID of the YouTube playlist or video
Now, you need to find the ID of the YouTube playlist or video whose comments you want to collect. It’s really simple! You can find it in the URL of the playlist or video.
For example, if the URL of the playlist is “https://www.youtube.com/watch?v=ZfCNFYAd77o&list=PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR”, then PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR (the part after &list=) is the playlist ID, and ZfCNFYAd77o (the part after watch?v=) is the video ID.
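If you prefer to extract these IDs programmatically, a minimal sketch using only the standard library might look like this (the helper name extract_ids is my own):
from urllib.parse import urlparse, parse_qs

def extract_ids(url):
    """Extract the video ID and (optional) playlist ID from a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    video_id = query.get("v", [None])[0]        # value of the "v" parameter
    playlist_id = query.get("list", [None])[0]  # value of the "list" parameter
    return video_id, playlist_id

print(extract_ids("https://www.youtube.com/watch?v=ZfCNFYAd77o&list=PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR"))
# ('ZfCNFYAd77o', 'PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR')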
Python Code to Collect a YouTube Video ID List
Instead of finding YouTube IDs one by one, you can collect a list of video IDs matching a keyword with the following Python code.
# File path for saving video metadata
VIDEO_DATA_FILE = "video_data.csv"

def get_video_data(keyword):
    """
    Searches YouTube for videos based on a keyword and returns a list of video metadata.
    Includes a check for whether the video is a YouTube Shorts.
    """
    video_data = []
    next_page_token = None

    while True:
        # Note: search().list calls are quota-expensive (100 units each)
        request = youtube.search().list(
            q=keyword,
            part="id,snippet",
            maxResults=50,
            type="video",
            pageToken=next_page_token
        )
        response = request.execute()

        video_ids = [item["id"]["videoId"] for item in response.get("items", [])]
        if not video_ids:
            break

        # Fetch detailed video statistics & duration
        video_request = youtube.videos().list(
            part="snippet,statistics,contentDetails",
            id=",".join(video_ids)
        )
        video_response = video_request.execute()

        for item in video_response.get("items", []):
            # Extract duration and check if it's a Shorts
            duration = item["contentDetails"]["duration"]
            parsed_duration = isodate.parse_duration(duration).total_seconds()
            is_shorts = parsed_duration < 60  # Heuristic: treat videos under 60 seconds as Shorts

            video_data.append({
                "video_id": item["id"],
                "title": item["snippet"]["title"],
                "publish_date": item["snippet"]["publishedAt"],
                "channel_title": item["snippet"]["channelTitle"],
                "like_count": item["statistics"].get("likeCount", 0),
                "comment_count": item["statistics"].get("commentCount", 0),
                "view_count": item["statistics"].get("viewCount", 0),
                "duration": duration,  # Keep raw ISO 8601 duration format
                "is_shorts": is_shorts  # True if video is a Shorts
            })

        next_page_token = response.get("nextPageToken")
        if not next_page_token:
            break
        time.sleep(1)  # Avoid hitting rate limits

    return video_data

def save_video_data(video_data):
    """
    Saves video metadata to a CSV file with proper encoding.
    """
    df = pd.DataFrame(video_data)
    df.to_csv(VIDEO_DATA_FILE, index=False, encoding="utf-8-sig")  # utf-8-sig prevents encoding issues in Excel
    print(f"Saved {len(video_data)} video metadata entries to {VIDEO_DATA_FILE}")

if __name__ == "__main__":
    keyword = input("Enter the search keyword: ")
    video_data = get_video_data(keyword)
    save_video_data(video_data)
With this code, you can collect a list of YouTube video IDs along with each video's title, publish date, channel, and like, comment, and view counts for a given keyword.
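To sanity-check the output, you can load the saved CSV back into pandas; a quick sketch:
import pandas as pd

# Load the saved metadata and preview the first rows
df = pd.read_csv("video_data.csv")
print(df[["video_id", "title", "view_count", "is_shorts"]].head())
print(f"{len(df)} videos collected, {df['is_shorts'].sum()} of them Shorts")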
Step 3: Collect Comments from ID List, Playlist ID, or Video ID
Choose one of the following options depending on what you need: collecting comments for all the IDs gathered in Step 2, for a specific playlist ID, or for a single video ID.
Option 1: Collect Comments from the ID List
Building on the code from Step 2, you can collect the comments for every video ID we gathered.
# File paths for checkpoints
VIDEO_DATA_FILE = "video_data.csv"
COMMENTS_FILE = "youtube_comments.csv"

def load_video_ids():
    """
    Loads video IDs from the saved CSV file.
    """
    if os.path.exists(VIDEO_DATA_FILE):
        return set(pd.read_csv(VIDEO_DATA_FILE)["video_id"].tolist())
    return set()

def get_video_comments(video_id):
    """
    Fetches all comments along with metadata from a given YouTube video ID.
    """
    comments = []
    next_page_token = None

    while True:
        try:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=100,
                pageToken=next_page_token
            )
            response = request.execute()

            for item in response.get("items", []):
                snippet = item["snippet"]["topLevelComment"]["snippet"]
                comments.append({
                    "video_id": video_id,
                    "comment_id": item["id"],
                    "comment_text": snippet["textDisplay"],
                    "comment_author": snippet["authorDisplayName"],
                    # authorChannelId can be missing (e.g. deleted accounts), so use .get()
                    "comment_author_id": snippet.get("authorChannelId", {}).get("value"),
                    "comment_publish_date": snippet["publishedAt"],
                    "like_count": snippet.get("likeCount", 0),
                    "reply_count": item["snippet"].get("totalReplyCount", 0),
                    "is_reply": False  # This is a top-level comment
                })

                # Fetch replies if available (up to 100; this call is not paginated here)
                if item["snippet"].get("totalReplyCount", 0) > 0:
                    reply_request = youtube.comments().list(
                        part="snippet",
                        parentId=item["id"],
                        maxResults=100
                    )
                    reply_response = reply_request.execute()

                    for reply in reply_response.get("items", []):
                        reply_snippet = reply["snippet"]
                        comments.append({
                            "video_id": video_id,
                            "comment_id": reply["id"],
                            "comment_text": reply_snippet["textDisplay"],
                            "comment_author": reply_snippet["authorDisplayName"],
                            "comment_author_id": reply_snippet.get("authorChannelId", {}).get("value"),
                            "comment_publish_date": reply_snippet["publishedAt"],
                            "like_count": reply_snippet.get("likeCount", 0),
                            "reply_count": 0,  # Replies don't have further replies
                            "is_reply": True
                        })

            next_page_token = response.get("nextPageToken")
            if not next_page_token:
                break
            time.sleep(1)  # Avoid hitting rate limits
        except Exception as e:
            print(f"Error fetching comments for video {video_id}: {e}")
            break

    return comments

def save_comments(comments):
    """
    Saves comments to a CSV file with proper encoding.
    """
    df = pd.DataFrame(comments)
    df.to_csv(COMMENTS_FILE, mode="a", header=not os.path.exists(COMMENTS_FILE), index=False, encoding="utf-8-sig")
    print(f"Saved {len(comments)} comments to {COMMENTS_FILE}")

def load_existing_comments():
    """
    Loads existing comments and returns the set of already processed video IDs.
    """
    if os.path.exists(COMMENTS_FILE):
        return set(pd.read_csv(COMMENTS_FILE)["video_id"].unique())
    return set()

if __name__ == "__main__":
    video_ids = load_video_ids()
    if not video_ids:
        print("No video IDs found. Run the Step 2 script first.")
        exit()

    processed_videos = load_existing_comments()
    print(f"Already processed {len(processed_videos)} videos.")

    for video_id in video_ids:
        if video_id in processed_videos:
            print(f"Skipping already processed video: {video_id}")
            continue

        print(f"Fetching comments for video: {video_id}")
        comments = get_video_comments(video_id)
        if comments:
            save_comments(comments)
        time.sleep(2)  # Avoid hitting API rate limits
This script prints its progress and appends comments to the CSV as it runs, so even if collection stops because of the API quota, the data gathered so far is already saved.
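Before analyzing the appended file, it is worth deduplicating on the comment_id column as a cheap safety step. A minimal sketch:
import pandas as pd

# Drop any duplicate rows that may have slipped in across interrupted runs
df = pd.read_csv("youtube_comments.csv")
df = df.drop_duplicates(subset="comment_id", keep="first")
df.to_csv("youtube_comments.csv", index=False, encoding="utf-8-sig")
print(f"{len(df)} unique comments kept")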
Option 2: Collect Comments from a YouTube Playlist
from googleapiclient.errors import HttpError  # Needed for the error handling in main() below

def get_playlist_video_ids(service, **kwargs):
    video_ids = []
    results = service.playlistItems().list(**kwargs).execute()
    while results:
        for item in results['items']:
            video_ids.append(item['snippet']['resourceId']['videoId'])
        # Check if there are more videos
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.playlistItems().list(**kwargs).execute()
        else:
            break
    return video_ids

def get_video_comments(service, **kwargs):
    comments, dates, likes, video_titles = [], [], [], []
    # Fetch the video title once per video instead of once per comment,
    # which keeps the quota usage down
    video_title = service.videos().list(part='snippet', id=kwargs['videoId']).execute()['items'][0]['snippet']['title']
    results = service.commentThreads().list(**kwargs).execute()
    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            date = item['snippet']['topLevelComment']['snippet']['publishedAt']
            like = item['snippet']['topLevelComment']['snippet']['likeCount']
            comments.append(comment)
            dates.append(date)
            likes.append(like)
            video_titles.append(video_title)
        # Check if there are more comments
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return pd.DataFrame({'Video Title': video_titles, 'Comments': comments, 'Date': dates, 'Likes': likes})
The following code returns a pandas DataFrame that includes a Video Title column.
def main():
    # Build the service
    youtube = build("youtube", "v3", developerKey=API_KEY)

    # Get the playlist's video IDs
    playlist_id = 'PLxNb_gmvauiRtxQrQsKLEWlFVUmRixmtS'
    video_ids = get_playlist_video_ids(youtube, part='snippet', maxResults=50, playlistId=playlist_id)

    # Get the comments from each video
    all_comments_df = pd.DataFrame()
    for video_id in video_ids:
        try:
            comments_df = get_video_comments(youtube, part='snippet', videoId=video_id, textFormat='plainText')
            all_comments_df = pd.concat([all_comments_df, comments_df], ignore_index=True)
        except HttpError as e:
            print(f"An HTTP error {e.resp.status} occurred:\n{e.content}")
    return all_comments_df  # Return the combined DataFrame

if __name__ == '__main__':
    df = main()
    print(df)  # Print the DataFrame here
    df.to_csv('output.csv', index=False, encoding='utf-8-sig')  # Export it to CSV format
Option 3: Collect Comments from a Single YouTube Video
If you only need the comments from a single video, run this code instead of the scripts above.
def get_video_comments(service, **kwargs):
    comments, dates, likes = [], [], []
    results = service.commentThreads().list(**kwargs).execute()
    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            date = item['snippet']['topLevelComment']['snippet']['publishedAt']
            like = item['snippet']['topLevelComment']['snippet']['likeCount']
            comments.append(comment)
            dates.append(date)
            likes.append(like)
        # Check if there are more comments
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return pd.DataFrame({'Comments': comments, 'Date': dates, 'Likes': likes})
The following returns a pandas DataFrame with the comment text, date, and number of likes.
comments_df = None

def main():
    global comments_df
    # Build the service (reusing the API_KEY defined in Step 1)
    youtube = build("youtube", "v3", developerKey=API_KEY)

    # Get the comments
    video_id = 'your-video-key-here'
    comments_df = get_video_comments(youtube, part='snippet', videoId=video_id, textFormat='plainText')

if __name__ == '__main__':
    main()
    print(comments_df)
    comments_df.to_csv('output.csv', index=False, encoding='utf-8-sig')  # Export it to CSV format
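Once the comments are in a DataFrame, a quick look at the most-liked ones is a natural first step. A minimal sketch using the 'Comments', 'Date', and 'Likes' columns from the code above:
# Sort by like count to preview the most-liked comments
top = comments_df.sort_values('Likes', ascending=False).head(10)
print(top[['Comments', 'Likes']])

# Count comments per month as a quick engagement-over-time view
comments_df['Date'] = pd.to_datetime(comments_df['Date'])
print(comments_df['Date'].dt.strftime('%Y-%m').value_counts().sort_index())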