[Python] Collecting YouTube Comments Using the YouTube Data API
Step 1: Request your own API key
First, enable the YouTube Data API v3 for your Google Cloud project at this link (https://console.cloud.google.com/apis/library/youtube.googleapis.com). The API has a daily quota, but usage within that quota is free.
On the next screen, click “Create credentials” and fill in the requested information.
This will generate an API key. Copy it so you can paste it into the Python code, and do not share it with others.
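One way to keep the key out of the source file is to read it from an environment variable. This is only a minimal sketch: the variable name YOUTUBE_API_KEY and the helper load_api_key are illustrative choices, not something the API requires.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

You could then write DEVELOPER_KEY = load_api_key() in place of the hard-coded string in Step 3.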
Step 2: Find the key for the YouTube playlist or video
Now you need to find the key (ID) of the YouTube playlist or video whose comments you want to collect. It’s really simple! You can find it in the URL of the playlist or video.
For example, if the URL of the playlist is “https://www.youtube.com/watch?v=ZfCNFYAd77o&list=PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR”, the playlist key is PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR, the part after &list=. The video key is ZfCNFYAd77o, the part after watch?v=.
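If you would rather not copy the keys by hand, the same lookup can be done with Python's standard urllib.parse module. This is a small sketch; the helper name extract_youtube_ids is hypothetical, and it assumes a standard watch URL like the one in the example.

```python
from urllib.parse import urlparse, parse_qs

def extract_youtube_ids(url):
    """Return (video_id, playlist_id) from a watch URL; a missing part is None."""
    query = parse_qs(urlparse(url).query)
    video_id = query.get("v", [None])[0]
    playlist_id = query.get("list", [None])[0]
    return video_id, playlist_id

url = "https://www.youtube.com/watch?v=ZfCNFYAd77o&list=PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR"
print(extract_youtube_ids(url))
# prints ('ZfCNFYAd77o', 'PL5gua8hQg_DoHCEBeOISWjUK2I3r00puR')
```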
Step 3: Run it in Python
First, import the libraries and replace 'put-your-key-here' with your API key.
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import pandas as pd
DEVELOPER_KEY = 'put-your-key-here'
YOUTUBE_API_SERVICE_NAME = 'youtube'
YOUTUBE_API_VERSION = 'v3'
Python Code to Collect Comments from YouTube playlist
def get_playlist_video_ids(service, **kwargs):
    video_ids = []
    results = service.playlistItems().list(**kwargs).execute()
    while results:
        for item in results['items']:
            video_ids.append(item['snippet']['resourceId']['videoId'])
        # check if there are more videos
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.playlistItems().list(**kwargs).execute()
        else:
            break
    return video_ids
def get_video_comments(service, **kwargs):
    comments, dates, likes, video_titles = [], [], [], []
    results = service.commentThreads().list(**kwargs).execute()
    # look up the video title once per video (not once per comment) to save API quota
    video = service.videos().list(part='snippet', id=kwargs['videoId']).execute()
    video_title = video['items'][0]['snippet']['title']
    while results:
        for item in results['items']:
            snippet = item['snippet']['topLevelComment']['snippet']
            comments.append(snippet['textDisplay'])
            dates.append(snippet['publishedAt'])
            likes.append(snippet['likeCount'])
            video_titles.append(video_title)
        # check if there are more comments
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return pd.DataFrame({'Video Title': video_titles, 'Comments': comments, 'Date': dates, 'Likes': likes})
The following code returns a pandas DataFrame that includes a Video Title column.
def main():
    # build the service
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
    # get playlist video ids
    playlist_id = 'PLxNb_gmvauiRtxQrQsKLEWlFVUmRixmtS'
    video_ids = get_playlist_video_ids(youtube, part='snippet', maxResults=50, playlistId=playlist_id)
    # get the comments from each video
    frames = []
    for video_id in video_ids:
        try:
            frames.append(get_video_comments(youtube, part='snippet', videoId=video_id, textFormat='plainText'))
        except HttpError as e:
            # report the failed video and move on to the next one
            print(f"An HTTP error {e.resp.status} occurred:\n{e.content}")
    # combine the per-video DataFrames; return an empty one if nothing was collected
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
if __name__ == '__main__':
    df = main()
    print(df)  # print the DataFrame here
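Both functions above repeat the same loop: request a page, read its items, and continue while the response contains a nextPageToken. As a rough sketch, that pattern could be factored into a generator. The name iter_pages is hypothetical, and it assumes only what the code above already relies on, namely that calling the list method returns a request object with an execute() method.

```python
def iter_pages(list_method, **kwargs):
    """Yield every page of results from a paginated `list` call.

    `list_method` is a bound method such as `service.commentThreads().list`.
    """
    while True:
        page = list_method(**kwargs).execute()
        yield page
        token = page.get("nextPageToken")
        if not token:
            break  # no more pages
        kwargs["pageToken"] = token  # ask for the next page on the next request
```

With this helper, collecting the video IDs could be written as a loop over iter_pages(youtube.playlistItems().list, part='snippet', maxResults=50, playlistId=playlist_id), reading page['items'] from each page.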
Python Code to Collect Comments from YouTube Video
If you want to collect the comments from a single video, you can run this code instead of the code above.
def get_video_comments(service, **kwargs):
    comments, dates, likes = [], [], []
    results = service.commentThreads().list(**kwargs).execute()
    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            date = item['snippet']['topLevelComment']['snippet']['publishedAt']
            like = item['snippet']['topLevelComment']['snippet']['likeCount']
            comments.append(comment)
            dates.append(date)
            likes.append(like)
        # check if there are more comments
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return pd.DataFrame({'Comments': comments, 'Date': dates, 'Likes': likes})
The following returns a pandas DataFrame with the comment text, date, and number of likes:
def main():
    # build the service
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
    # get the comments
    video_id = 'your-video-key-here'
    return get_video_comments(youtube, part='snippet', videoId=video_id, textFormat='plainText')
if __name__ == '__main__':
    comments_df = main()  # keep the result at module level for the export step
    print(comments_df)
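To see where the three columns come from, it can help to trace the nested lookups on a single item. The dictionary below is a hand-written stand-in containing only the fields that get_video_comments reads; real responses carry more fields, and the values here are invented for illustration. The helper name comment_row is also hypothetical.

```python
# Hand-written stand-in for one element of results['items'] (values are invented).
sample_item = {
    "snippet": {
        "topLevelComment": {
            "snippet": {
                "textDisplay": "Great video!",
                "publishedAt": "2023-01-15T08:30:00Z",
                "likeCount": 3,
            }
        }
    }
}

def comment_row(item):
    """Pull the comment text, timestamp, and like count out of one item."""
    snippet = item["snippet"]["topLevelComment"]["snippet"]
    return {
        "Comments": snippet["textDisplay"],
        "Date": snippet["publishedAt"],
        "Likes": snippet["likeCount"],
    }

print(comment_row(sample_item))
# prints {'Comments': 'Great video!', 'Date': '2023-01-15T08:30:00Z', 'Likes': 3}
```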
Step 4: Export it to CSV format
The final step is to export the data so you can use it later or share it with others. For a pandas DataFrame, this takes one line. (If you ran the playlist version, the DataFrame is named df rather than comments_df.)
comments_df.to_csv('output.csv')
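By default, to_csv also writes the DataFrame index as an unnamed first column. If you would rather leave it out, and want accented or other non-ASCII characters in comments to open cleanly in Excel, a variant along these lines may help. The sample rows and the temporary file location are invented for illustration; index and encoding are standard to_csv parameters.

```python
import os
import tempfile

import pandas as pd

# Small invented DataFrame standing in for comments_df.
demo_df = pd.DataFrame({
    "Comments": ["Great video!", "Très bien !"],
    "Date": ["2023-01-15T08:30:00Z", "2023-01-16T10:00:00Z"],
    "Likes": [3, 0],
})

path = os.path.join(tempfile.mkdtemp(), "output.csv")
# index=False drops the row-number column; utf-8-sig adds a BOM that Excel recognizes.
demo_df.to_csv(path, index=False, encoding="utf-8-sig")

# Read the file back to check that the columns and values survived the round trip.
restored = pd.read_csv(path, encoding="utf-8-sig")
print(restored)
```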