【443】Tweets Analysis Q&A

Date: 2019-11-06

Tags: tweets analysis q&a


【Question 01】

　　When converting tweet info to a CSV file, commas inside a field (e.g. location: Sydney, NSW) break the CSV structure by creating extra columns.

　　The solution is to wrap the field in double quotation marks, like this:

fo.write("\"" + str(tweet["user"]["location"]) + "\"")
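　　As an alternative sketch (not the author's approach), Python's standard csv module quotes any field that contains the delimiter automatically, so a comma in a location no longer creates an extra column. The helper name `write_rows` is hypothetical:

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text (hypothetical helper).

    With the default QUOTE_MINIMAL setting, csv.writer wraps a field in double
    quotes only when it contains a special character such as the comma.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


text = write_rows([["id", "location"], [1, "Sydney, NSW"]])
print(text, end="")
# id,location
# 1,"Sydney, NSW"

# Reading it back yields two columns, not three
print(list(csv.reader(io.StringIO(text))))  # [['id', 'location'], ['1', 'Sydney, NSW']]
```

When writing to a real file instead of a StringIO, open it with `newline=""`, as the csv module documentation recommends.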

【Question 02】

　　When the CSV file is opened in Excel, the text sometimes shows up garbled, even though it displays correctly in Notepad.

　　ref: "Garbled text when opening a CSV file: what are the ways to fix it?"

　　One solution is to open the file with Notepad++.

　　Another solution is to write a UTF-8 byte order mark (BOM) at the beginning of the file, which lets Excel recognize the file as UTF-8, like this:

fo = open(r"D:\Twitter Data\Data\test\tweets.csv", "w", encoding="utf-8")
fo.write("\ufeff")  # UTF-8 BOM; only encodes as EF BB BF when the file is opened as UTF-8
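　　A sketch of the same idea using Python's built-in "utf-8-sig" codec, which writes the BOM automatically on the first write. The temporary file path is a placeholder standing in for the author's D:\ path:

```python
import os
import tempfile

# Placeholder path for this demo only (the original writes to a D:\ path)
path = os.path.join(tempfile.gettempdir(), "tweets_bom_demo.csv")

# "utf-8-sig" emits the BOM (bytes EF BB BF) before the first character written,
# so no explicit fo.write("\ufeff") is needed
with open(path, "w", encoding="utf-8-sig", newline="") as fo:
    fo.write("id,location\n")
    fo.write('1,"São Paulo"\n')

with open(path, "rb") as fb:
    raw = fb.read()

print(raw[:3])  # b'\xef\xbb\xbf'
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so it does not end up glued to the first column name.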

【Question 03】

　　Tweet text can contain carriage returns and line breaks, double quotation marks and single quotation marks. These characters cause errors when creating the CSV file.

　　So we replace those characters with a space or remove them, like this:

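text = str(tweet["text"])
text = text.replace("\r", " ")   # carriage returns -> space
text = text.replace("\n", " ")   # line breaks -> space
text = text.replace("\"", "")    # drop double quotes
text = text.replace("\'", "")    # drop single quotes
fo.write("\"" + text + "\"")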
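　　The replacements above can be wrapped in a reusable helper so that every free-form field is cleaned the same way. A minimal sketch; `clean_text` and `quoted` are hypothetical names, and mapping a missing value (None) to an empty string is an assumption that differs from plain `str()`:

```python
def clean_text(value) -> str:
    """Hypothetical helper applying the replacements above.

    Line breaks become a single space and quotation marks are removed.
    Unlike plain str(), a missing value (None) becomes an empty string.
    """
    text = "" if value is None else str(value)
    # Replace Windows line endings first so "\r\n" becomes one space, not two
    for ch in ("\r\n", "\r", "\n"):
        text = text.replace(ch, " ")
    for ch in ('"', "'"):
        text = text.replace(ch, "")
    return text


def quoted(value) -> str:
    """Wrap a cleaned value in double quotes, as in Question 01."""
    return '"' + clean_text(value) + '"'


# Usage with a made-up tweet dict shaped like the fields used above
tweet = {"user": {"location": "Sydney, NSW"}, "text": 'He said "hi"\r\nit\'s late'}
row = ",".join([quoted(tweet["user"]["location"]), quoted(tweet["text"])]) + "\n"
print(row, end="")  # "Sydney, NSW","He said hi its late"
```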

　　Both tweet["user"]["location"] and tweet["text"] are free-form fields where users can write whatever they want, so these two attributes are the most likely to cause errors.

【Question 04】

　　After converting the tweets to a CSV file, I couldn't open it with pandas.read_csv(), so some of the data must be malformed. Since the file has more than 100,000 rows, how can I locate the line that causes the error?

　　The solution is to convert the first 10,000 rows and, if there are no errors, convert the next 10,000 rows. When an error occurs, narrow the range: for example, if the error occurs between rows 20,000 and 30,000, try rows 20,000 to 25,000. Repeating this a few times locates the error line and reveals the real problem. In this specific case, most problems came from text containing carriage returns, double quotation marks, etc.
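　　The range-narrowing idea above is essentially a binary search. A minimal sketch, assuming the CSV lines are already in memory and using a hypothetical check `parses_ok` built on Python's csv module (a stand-in for pandas.read_csv) that treats any record with the wrong number of fields as an error. It also assumes that once the bad line is included every longer prefix keeps failing, which holds for a stray comma but not always for an unbalanced quote that a later line happens to close:

```python
import csv
from typing import Callable, List, Optional


def parses_ok(lines: List[str]) -> bool:
    """Hypothetical check: every record has as many fields as the header."""
    try:
        rows = list(csv.reader(lines, strict=True))
    except csv.Error:
        return False
    if not rows:
        return True
    return all(len(row) == len(rows[0]) for row in rows)


def find_first_bad_line(lines: List[str], is_ok: Callable[[List[str]], bool]) -> Optional[int]:
    """Return the 0-based index of the first line whose inclusion makes
    is_ok fail, or None if the whole file passes."""
    if is_ok(lines):
        return None
    lo, hi = 0, len(lines)  # invariant: lines[:lo] passes, lines[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(lines[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1


lines = ["id,location\n", "1,Sydney\n", "2,Perth\n", "3,Sydney, NSW\n", "4,Hobart\n"]
print(find_first_bad_line(lines, parses_ok))  # 3
```

Each probe re-parses one prefix, so 100,000 rows need only about 17 probes instead of a manual sweep.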

　　The code looks like this:

...

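count = 0
for line in tweets_file:
    try:
        count += 1
        # skip rows before the start of the current range
        if count < 10000:
            continue
        # stop once the end of the current range has been passed
        if count > 20000:
            break
        ...
    except Exception:
        # skip lines that fail to convert
        continue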
...
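　　A different technique from the range narrowing above, sketched with Python's csv module: read the file once and report every record whose field count differs from the header's. `find_bad_rows` is a hypothetical helper; it only catches column-count problems and can be misled by an unbalanced quote that swallows the following lines:

```python
import csv
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for each record whose field count
    differs from the header's. Blank lines are reported too (0 fields).

    line_number is csv.reader's line_num: the count of physical lines read so
    far, so for a multi-line quoted record it points at the record's last line.
    """
    reader = csv.reader(lines)
    header = next(reader, [])
    bad = []
    for row in reader:
        if len(row) != len(header):
            bad.append((reader.line_num, len(row)))
    return bad


sample = ["id,location\n", "1,Sydney\n", "2,Sydney, NSW\n", "3,Perth\n"]
print(find_bad_rows(sample))  # [(3, 3)]
```

On a real file, pass the open file object directly, e.g. `with open(path, newline="", encoding="utf-8-sig") as f: print(find_bad_rows(f))`.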
