I want to extract only certain strings from the csv file

Asked 2 years ago, Updated 2 years ago, 32 views

When loading the CSV below and printing the tweets, I would like to extract only the account names from tweets that are retweets (those starting with the string RT @***) and put them in a separate column (test4).

The environment is Python installed on Windows.

■ CSV Contents

test1, test2, test3, test4
RT@senti: What language do you want to learn?

What I want to implement

test1, test4
RT @senti: What language do you want to learn?, @senti
caseis,
RT @sancho: What about today?, @sancho
ocho,
RT @sacamuchi: Fun, @sacamuchi

I found the code below while searching online, but I can't get the RT accounts from the tweet column (test1) of the CSV.

How do I set it up to extract RT accounts only?
I'm sorry to trouble you, but could you tell me?

import pandas as pd
import csv
pd.set_option('display.max_rows',12000)
pd.set_option('display.width',12000)
pd.set_option("display.max_colwidth", 12000)

df=pd.read_csv(r'/Users/catuti/Desktop/tweets_2019.csv', encoding='cp932', names=["test1", "RT@"], usecols=[0,1], skiprows=[0], skipfooter=0, engine='python')
df = df.replace({'\n':'<br>'}, regex = True)
df = df.replace({'\r':''}, regex = True)
df = df.query('test1.str.contains("RT@") or content.str.contains("RT@")')
df.to_html(r'C:/Users/catuti/Desktop/Tweets_20191.csv',escape=False)

python python3

2022-09-30 14:08

1 Answer

Use a regular expression to match the username.
https://stackoverflow.com/questions/8650007/regular-expression-for-twitter-username

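# Assumes the DataFrame df has already been loaded
import re

# Matches "RT", optional whitespace, then an @username of 1 to 15 word characters
regex = re.compile(r"^RT\s*(@\w{1,15})")

def get_username(text):
    try:
        return regex.match(text).group(1)
    except (AttributeError, TypeError):
        # No match (AttributeError) or a non-string cell such as NaN (TypeError)
        return None

df["test4"] = [get_username(x) for x in df["test1"]]
df.to_csv("test_out.csv", index=False)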
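
To check the pattern before running it on the real file, here is a minimal sketch that uses only the standard library and the sample rows from the question. The names RT_PATTERN, extract_rt_account and sample_rows are made up for this illustration, and the pattern assumes a retweet always starts with "RT" followed by optional whitespace and an @handle of 1 to 15 word characters.

```python
import re

# Assumed pattern: "RT", optional whitespace, then an @handle (1-15 word characters).
RT_PATTERN = re.compile(r"^RT\s*(@\w{1,15})")

def extract_rt_account(text):
    """Return the retweeted @handle, or None if text is not a retweet."""
    if not isinstance(text, str):
        # Empty CSV cells may arrive as NaN (a float) rather than a string.
        return None
    match = RT_PATTERN.match(text)
    return match.group(1) if match else None

# Sample rows taken from the question (both "RT @" and "RT@" spellings).
sample_rows = [
    "RT @senti: What language do you want to learn?",
    "caseis",
    "RT @sancho: What about today?",
    "ocho",
    "RT @sacamuchi: Fun",
    "RT@senti: What language do you want to learn?",
]

accounts = [extract_rt_account(row) for row in sample_rows]
for row, account in zip(sample_rows, accounts):
    print(repr(row), "->", account)
```

If the pattern behaves as expected, the same regular expression should also work column-wise in pandas via df["test1"].str.extract(pattern, expand=False), which avoids the Python-level loop.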


2022-09-30 14:08



© 2024 OneMinuteCode. All rights reserved.