When I load the CSV below and output the tweets, I would like to extract the account name from rows that are retweets (RT @***) and put it in a separate column (test4).
The environment is Python installed on Windows.
■ CSV Contents
test1, test2, test3, test4
RT@senti: What language do you want to learn?
■ What I want to implement
test1, test4
RT @senti: What language do you want to learn?, @senti
caseis,
RT @sancho: What about today?, @sancho
ocho,
RT @sacamuchi: Fun, @sacamuchi
I found the code below by searching the web, but I can't get the RT account out of the test1 column of the CSV.
How should I write it so that only the RT accounts are extracted?
I'm sorry to trouble you, but could you tell me?
import pandas as pd
import csv
pd.set_option('display.max_rows',12000)
pd.set_option('display.width',12000)
pd.set_option("display.max_colwidth", 12000)
df=pd.read_csv(r'/Users/catuti/Desktop/tweets_2019.csv', encoding='cp932', names=["test1", "RT@"], usecols=[0,1], skiprows=[0], skipfooter=0, engine='python')
df = df.replace({'\n':'<br>'}, regex = True)
df = df.replace({'\r': ''}, regex=True)
df = df.query('test1.str.contains("RT@") or content.str.contains("RT@")')
df.to_html(r'C:/Users/catuti/Desktop/Tweets_20191.csv',escape=False)
You can use a regular expression to match the username.
https://stackoverflow.com/questions/8650007/regular-expression-for-twitter-username
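# Assume the loaded df already exists
import re

# "RT", optional whitespace, then "@" followed by 1 to 15 word characters
regex = re.compile(r"^RT\s*(@\w{1,15})")

def get_username(text):
    try:
        return regex.match(text).group(1)
    except (AttributeError, TypeError):
        # No match (not a retweet) or a non-string cell such as NaN
        return None

df["test4"] = [get_username(x) for x in df["test1"]]
df.to_csv("test_out.csv", index=False)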
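As a quick check of the pattern without pandas, here is a minimal sketch that applies a similar regular expression to sample rows from the question. The names `RT_PATTERN` and `extract_rt_account` are hypothetical, and the pattern assumes a retweet starts with `RT`, optional whitespace, then `@` and 1 to 15 word characters.

```python
import re

# Assumed pattern: "RT", optional whitespace, then "@" plus 1 to 15 word characters.
RT_PATTERN = re.compile(r"^RT\s*(@\w{1,15})")

def extract_rt_account(text):
    """Return the retweeted account (e.g. "@senti"), or None if text is not a retweet."""
    if not isinstance(text, str):  # e.g. NaN from an empty CSV cell
        return None
    match = RT_PATTERN.match(text)
    return match.group(1) if match else None

# Sample rows taken from the question's test1 column.
samples = [
    "RT @senti: What language do you want to learn?",
    "RT@senti: What language do you want to learn?",
    "caseis",
    "RT @sacamuchi: Fun",
]
results = [extract_rt_account(s) for s in samples]
print(results)
```

If the data is already in a DataFrame, `df["test1"].str.extract(r"^RT\s*(@\w{1,15})", expand=False)` should produce the same column in one call, with NaN for rows that are not retweets.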