[Translation] How I used Python to find interesting people to follow on Medium

Date: 2019-12-16

Original article: How I used Python to find interesting people to follow on Medium

Original author: Radu Raicea

Translation from: Juejin Translation Project

Permanent link: github.com/xitu/gold-m…

Translator: Park-ma

Proofreader: mingxing47

Cover image source: Old Medium logo

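Medium has a large amount of content, a large number of users, and an almost overwhelming number of posts. When you try to find interesting users to follow, you may find yourself at a loss.

My definition of an interesting user is someone from your network who is active and regularly writes high-quality responses in the Medium community.

I looked at the latest posts from the users I follow to see who was responding to them. I figured that if they responded to the users I follow, they probably have interests similar to mine.

The process was tedious, and it reminded me of the most valuable lesson I learned during my last internship:

Any tedious task can and should be automated.

I wanted my automation to do the following: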

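Get all the users from my Followings
Get the latest posts from each user
Get all the responses to each post
Filter out responses older than 30 days
Filter out responses with fewer than a minimum number of recommendations
Get the username of the author of each response

Let's get started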

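I first looked at Medium's API, but found it very limited. It offered too little: I could only retrieve information about my own account, not about other users.

On top of that, the last update to Medium's API was more than a year ago, and there was no sign of recent development.

I realized I would have to rely on HTTP requests to get my data, so I started poking around with Chrome DevTools.

The first goal was to get my list of Followings.

I opened DevTools and went to the Network tab. I filtered out everything except XHR to see where Medium fetches my Followings from. I refreshed my profile page, but nothing interesting happened.

What if I clicked the Followings button on my profile? Bingo!

I found the link that returns the user's Followings list.

Behind that link was a very large JSON response. It was well-formed JSON, except for a string of characters at the beginning of the response: ])}while(1);</x>

I wrote a function to strip that prefix and convert the JSON into a Python dictionary.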

import json

def clean_json_response(response):
    return json.loads(response.text.split('])}while(1);</x>')[1])
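
The split-based cleanup above assumes the prefix is always present. A minimal sketch of a slightly more defensive variant follows; it works on a plain string instead of a requests response, and the names parse_prefixed_json and PREFIX are hypothetical, not from the original article.

```python
import json

# The prefix observed at the start of Medium's JSON responses
PREFIX = '])}while(1);</x>'

def parse_prefixed_json(text):
    """Parse a JSON body, dropping the leading prefix only if it is present."""
    if text.startswith(PREFIX):
        text = text[len(PREFIX):]
    return json.loads(text)

# Sample body shaped like the response shown in the article
sample = PREFIX + '{"success": true, "payload": {"user": {"userId": "d540942266d0"}}}'
parsed = parse_prefixed_json(sample)
print(parsed['payload']['user']['userId'])
```

Checking for the prefix first means the same helper also accepts a body that is already plain JSON.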

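I had found an entry point. Let's start coding.

Getting all the users from my Followings list

To query the endpoint, I needed my user ID (I already knew it, but I did this for educational purposes).

While looking for a way to get a user ID, I found that appending ?format=json to a Medium URL returns a JSON response for that page. I tried it on my profile page.

And there it was: my user ID.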

])}while(1);</x>{"success":true,"payload":{"user":{"userId":"d540942266d0","name":"Radu Raicea","username":"Radu_Raicea",
...

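I wrote a function to extract the user ID from a given username. Again, I used the clean_json_response function to remove the unwanted characters at the beginning of the response.

I also defined a constant called MEDIUM that holds the base string every Medium URL contains.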

import requests

MEDIUM = 'https://medium.com'

def get_user_id(username):

    print('Retrieving user ID...')

    url = MEDIUM + '/@' + username + '?format=json'
    response = requests.get(url)
    response_dict = clean_json_response(response)
    return response_dict['payload']['user']['userId']

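With the user ID, I queried the /_/api/users/<user_id>/following endpoint to get the list of usernames from my Followings list.

When I did this in DevTools, I noticed that the JSON response contained only eight usernames. Weird!

After I clicked "Show more people", I found the missing usernames. It turns out Medium uses pagination to display the Followings list.

Pagination works by specifying a limit (elements per page) and to (the first element of the next page), so I had to find a way to get the ID of the next page.

At the end of the JSON response from /_/api/users/<user_id>/following, I noticed an interesting key-value pair.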

...
"paging":{"path":"/_/api/users/d540942266d0/followers","next":{"limit":8,"to":"49260b62a26c"}}},"v":3,"b":"31039-15ed0e5"}
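
To make the role of limit and to concrete, here is a minimal sketch that reads the paging object and builds the URL of the next page. The helper name next_page_url is hypothetical, and the payload shape is assumed from the sample response above rather than from any official documentation.

```python
from urllib.parse import urlencode

MEDIUM = 'https://medium.com'

def next_page_url(user_id, payload):
    """Return the URL of the next Followings page, or None on the last page."""
    next_page = payload.get('paging', {}).get('next')
    if not next_page or 'to' not in next_page:
        return None
    # Fall back to 8 per page, the page size seen in the sample response
    query = urlencode({'limit': next_page.get('limit', 8), 'to': next_page['to']})
    return MEDIUM + '/_/api/users/' + user_id + '/following?' + query

# Sample payload copied from the paging fragment shown above
sample_payload = {'paging': {'path': '/_/api/users/d540942266d0/followers',
                             'next': {'limit': 8, 'to': '49260b62a26c'}}}
print(next_page_url('d540942266d0', sample_payload))
```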

From there, it was easy to write a loop that gets all the usernames from my Followings list.

def get_list_of_followings(user_id):

    print('Retrieving users from Followings...')

    next_id = False
    followings = []
    while True:

        if next_id:
            # Not the first page of the Followings list
            url = (MEDIUM + '/_/api/users/' + user_id
                   + '/following?limit=8&to=' + next_id)
        else:
            # First page of the Followings list
            url = MEDIUM + '/_/api/users/' + user_id + '/following'

        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        for user in payload['value']:
            followings.append(user['username'])

        try:
            # If the "to" key is missing, we have reached the end
            # of the list and a KeyError is raised.
            next_id = payload['paging']['next']['to']
        except KeyError:
            break

    return followings

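Getting the latest posts from each user

Once I had the list of users I follow, I wanted their latest posts. I could get them by sending a request to https://medium.com/@<username>/latest?format=json.

So I wrote a function that takes a list of usernames and returns a Python list containing the IDs of the latest posts from all of those users.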

def get_list_of_latest_posts_ids(usernames):

    print('Retrieving the latest posts...')

    post_ids = []
    for username in usernames:
        url = MEDIUM + '/@' + username + '/latest?format=json'
        response = requests.get(url)
        response_dict = clean_json_response(response)

        try:
            posts = response_dict['payload']['references']['Post']
        except KeyError:
            # This user has no posts in the response
            posts = []

        if posts:
            for key in posts.keys():
                post_ids.append(posts[key]['id'])

    return post_ids
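
The extraction step can be tried without touching the network. The sketch below assumes the response shape used by the function above (payload, then references, then Post, keyed by post ID); the helper name extract_post_ids and the sample IDs are made up for illustration.

```python
def extract_post_ids(response_dict):
    """Collect post IDs from an already parsed '/latest?format=json' response."""
    posts = response_dict.get('payload', {}).get('references', {}).get('Post', {})
    return [post['id'] for post in posts.values()]

# Hypothetical sample responses: one user with two posts, one with none
with_posts = {'payload': {'references': {'Post': {
    'abc123': {'id': 'abc123'},
    'def456': {'id': 'def456'},
}}}}
without_posts = {'payload': {'references': {}}}

print(extract_post_ids(with_posts))
print(extract_post_ids(without_posts))
```

Using dict.get with empty defaults covers the "no posts" case without a try block.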

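Getting all the responses to each post

With the list of posts, I extracted all the responses through https://medium.com/_/api/posts/<post_id>/responses.

This function takes a Python list of post IDs and returns a Python list of responses.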

def get_post_responses(posts):

    print('Retrieving the post responses...')

    responses = []

    for post in posts:
        url = MEDIUM + '/_/api/posts/' + post + '/responses'
        response = requests.get(url)
        response_dict = clean_json_response(response)
        responses += response_dict['payload']['value']

    return responses

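Filtering the responses

Initially, I wanted responses that had reached a minimum number of claps. But I realized that claps may not reflect the community's appreciation of a response very well, because a single user can clap for the same response multiple times.

Instead, I filtered by the number of recommendations. It measures much the same thing as claps, but a user cannot recommend the same response more than once.

I wanted this minimum to be adjustable, so I passed it in as a variable named recommend_min.

The function below takes a response and the recommend_min variable, and checks whether the response's recommendations reach that minimum.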

def check_if_high_recommends(response, recommend_min):
    # Return an explicit bool instead of True or an implicit None
    return response['virtuals']['recommends'] >= recommend_min
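
As an illustration of the threshold, here is a small sketch run against made-up response dictionaries. The helper has_enough_recommends is a hypothetical variant of check_if_high_recommends that treats a missing count as zero; the sample data is invented.

```python
def has_enough_recommends(response, recommend_min):
    """True if the response has at least recommend_min recommendations."""
    # A missing 'virtuals' or 'recommends' key is treated as zero recommendations
    return response.get('virtuals', {}).get('recommends', 0) >= recommend_min

# Invented sample responses
sample_responses = [
    {'creatorId': 'aaa', 'virtuals': {'recommends': 12}},
    {'creatorId': 'bbb', 'virtuals': {'recommends': 3}},
    {'creatorId': 'ccc', 'virtuals': {}},
]

kept = [r['creatorId'] for r in sample_responses if has_enough_recommends(r, 10)]
print(kept)
```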

I also wanted only recent responses, so I used this function to filter out responses older than 30 days.

from datetime import datetime, timedelta

def check_if_recent(response):
    limit_date = datetime.now() - timedelta(days=30)
    # 'createdAt' is in milliseconds; fromtimestamp expects seconds
    creation_epoch_time = response['createdAt'] / 1000
    creation_date = datetime.fromtimestamp(creation_epoch_time)

    # Return an explicit bool instead of True or an implicit None
    return creation_date >= limit_date
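
Because the check above depends on the current time, it is awkward to test. A minimal sketch of a variant that takes the reference time as a parameter follows; is_recent, to_millis and the sample dates are hypothetical, and createdAt is assumed to be a millisecond timestamp as in the function above.

```python
from datetime import datetime, timedelta

def is_recent(response, now, max_age_days=30):
    """True if the response was created within max_age_days before now."""
    created = datetime.fromtimestamp(response['createdAt'] / 1000)
    return created >= now - timedelta(days=max_age_days)

def to_millis(dt):
    """Convert a local datetime to a millisecond epoch timestamp."""
    return int(dt.timestamp() * 1000)

# Fixed reference time so the result does not depend on when this runs
now = datetime(2019, 12, 16, 12, 0, 0)
fresh = {'createdAt': to_millis(now - timedelta(days=5))}
stale = {'createdAt': to_millis(now - timedelta(days=45))}

print(is_recent(fresh, now), is_recent(stale, now))
```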

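Getting the username of each response's author

After filtering the responses, I used the following function to collect the user IDs of all the authors.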

def get_user_ids_from_responses(responses, recommend_min):

    print('Retrieving user IDs from the responses...')

    user_ids = []

    for response in responses:
        recent = check_if_recent(response)
        high = check_if_high_recommends(response, recommend_min)

        if recent and high:
            user_ids.append(response['creatorId'])

    return user_ids
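
The function above appends one ID per qualifying response, so an author with several qualifying responses appears several times. If that is not wanted, one possible refinement (not part of the original script) is to drop duplicates while keeping the first-seen order; the helper name unique_in_order and the sample IDs are hypothetical.

```python
def unique_in_order(items):
    """Remove duplicates while preserving the order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))

# Invented user IDs with repeats
user_ids = ['aaa', 'bbb', 'aaa', 'ccc', 'bbb']
print(unique_in_order(user_ids))
```

Deduplicating before looking up usernames would also avoid repeated requests for the same user.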

A user ID is useless when you try to visit someone's profile. So I wrote a function that queries the /_/api/users/<user_id> endpoint to get the username.

def get_usernames(user_ids):

    print('Retrieving usernames of interesting users...')

    usernames = []

    for user_id in user_ids:
        url = MEDIUM + '/_/api/users/' + user_id
        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        usernames.append(payload['value']['username'])

    return usernames

Putting all the functions together

With all the functions done, I created a pipeline to get my list of recommended users.

def get_interesting_users(username, recommend_min):

    print('Looking for interesting users for %s...' % username)

    user_id = get_user_id(username)

    usernames = get_list_of_followings(user_id)

    posts = get_list_of_latest_posts_ids(usernames)

    responses = get_post_responses(posts)

    users = get_user_ids_from_responses(responses, recommend_min)

    return get_usernames(users)

The script was finally done! To test it, you just call the pipeline.

interesting_users = get_interesting_users('Radu_Raicea', 10)
print(interesting_users)

Image credit: Know Your Meme

Finally, I added an option to store the results, along with a timestamp, in a CSV file.

import csv
from datetime import datetime

def list_to_csv(interesting_users_list):
    # newline='' lets the csv module control line endings itself
    with open('recommended_users.csv', 'a', newline='') as file:
        writer = csv.writer(file)

        now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

        # Write the timestamp first, without mutating the caller's list
        writer.writerow([now] + interesting_users_list)

interesting_users = get_interesting_users('Radu_Raicea', 10)
list_to_csv(interesting_users)
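
Each row written this way starts with a timestamp followed by the usernames. A minimal sketch for reading the file back follows; read_recommendations is a hypothetical helper, and the demo writes to a temporary file with made-up usernames so it does not touch a real recommended_users.csv.

```python
import csv
import os
import tempfile

def read_recommendations(path):
    """Return a list of (timestamp, usernames) tuples, one per CSV row."""
    with open(path, newline='') as file:
        return [(row[0], row[1:]) for row in csv.reader(file) if row]

# Demo: write one row in the same layout, then read it back
path = os.path.join(tempfile.mkdtemp(), 'recommended_users.csv')
with open(path, 'a', newline='') as file:
    csv.writer(file).writerow(['2019-12-16 10:00:00', 'user_one', 'user_two'])

rows = read_recommendations(path)
print(rows)
```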