I want to count how often each word occurs in my data.
The data is currently stored in a data frame.
Each row holds hashtag data in square brackets, in the form [Today, Today's Weather, Hungry].
To count the frequency of each word across all the crawled data, I need to convert the column to a list.
import numpy as np
instagram_tags=list(np.array(data['clean_tag'].tolist()))
When I convert it to a list, the whole value of each row is treated as a single element:
['[]',
'[]',
"[210517', 'climbing', 'bouldering', 'indoor rock wall', 'cleaning', 'rock climbing', 'climbing', 'bouldering', 'door climbing', 'today's exercise', 'ankle', 'weekendworkout', 'cleaning', 'weekly exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekly', 'hobby', 'DOL', 'park', 'DOL]
"[Seoul Forest Climbing]",
'[]',
"[20210519B], 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpoint climbingcrew', 'redpoint', 'climbing', 'sport climbing', 'gate1', 'gate'', 'scarpin'279', 'Osteen79]
The data is bundled row by row, as shown above.
Is there a way to merge everything into one list, without the quotation marks and the [ ]?
hashtag crawling dataframe list
Try literal_eval from the ast module.
Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> l = ['[]',
'[]',
"[210517', 'climbing', 'bouldering', 'indoor rock wall', 'cleaning', 'rock climbing', 'climbing', 'bouldering', 'door climbing', 'today's exercise', 'ankle', 'weekendworkout', 'cleaning', 'weekly exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekly', 'hobby', 'DOL', 'park', 'DOL]
"[Seoul Forest Climbing]",
'[]',
"[20210519B], 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpoint climbingcrew', 'redpoint', 'climbing', 'sport climbing', 'gate1', 'gate'', 'scarpin'279', 'Osteen79]
>>> from ast import literal_eval
>>> l = [ literal_eval(e) for e in l ]
>>> l
[[], [210517], "Climbing", "Bouldering", "Indoor Rock Wall", "Clean", "Climbing", "Climbing", "Bouldering", "Indoor Climbing", "Today's Exercise", "Ankle", "Weekendworkout", "Weekly Exercise", "Weekly Exercise", "Weekly Exercise", "Weekly Exercise", "Weekly", "Weekly", "Bouldering", "Indoor Climbing", "Indoor Climbing", "Indoor", "DOL", "10", "Climbing", "Climbing", "Climbing",', ', 'bouldering', 'sportsclimbing', 'gate1', 'scarpa', 'instinct', 'fiveten', 'hiangle', '167592836O79']]
>>> from pprint import pprint
>>> pprint(l)
[[],
[],
['210517',
"Climbing"
"Bouldering"
"Indoor Rock Wall"
"Clean,"
"Rock climbing"
'climbing',
'bouldering',
'indoorclimbing',
"Today's Exercise"
"Ankle"
'weekendworkout',
"Clean,"
"Working out on weekdays"
"Weekend Exercise"
"Hobby"
"A bunch of rocks"
'DOLMOORI',
"Climbing Park"
"Climbing Park Hanti Branch",
["Seoul Forest Climbing",
[],
['20210519B',
"August Climbing"
"August Climbing Gym"
"Guardi Station"
"Exit 5"
'gramicci',
'patagonia',
'redpointclimbingcrew',
'redpoint',
'climbing',
'bouldering',
'sportsclimbing',
'gate1',
'scarpa',
'instinct',
'fiveten',
'hiangle',
'167592836O79']]
>>>
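Once each row has been parsed, the lists still need to be merged and counted to get the word frequencies the question asks for. The following is a minimal sketch of one way to do that, not the only one. The sample `rows` and the helper `parse_tags` are made up for illustration; in the real case `rows` would be something like `data['clean_tag'].tolist()`. Skipping rows that fail to parse is an assumption about how malformed data should be handled.

```python
from ast import literal_eval
from collections import Counter
from itertools import chain

# Hypothetical sample rows: each one is the string form of a list,
# standing in for the values of data['clean_tag'].
rows = [
    "[]",
    "['climbing', 'bouldering', 'climbing']",
    "['Seoul Forest Climbing']",
    "['climbing', 'gate1']",
]

def parse_tags(text):
    """Turn "['a', 'b']" into ['a', 'b']; return [] if the string is malformed."""
    try:
        value = literal_eval(text)
    except (ValueError, SyntaxError):
        # Assumption: rows that cannot be parsed are simply ignored.
        return []
    return list(value) if isinstance(value, (list, tuple)) else []

# Parse every row, then flatten the list of lists into one flat list of tags.
all_tags = list(chain.from_iterable(parse_tags(r) for r in rows))

# Count how often each tag occurs.
tag_counts = Counter(all_tags)
print(tag_counts.most_common(3))
```

`all_tags` is the single flat list without the nested brackets, and `tag_counts` maps each tag to the number of times it appears. `most_common(n)` returns the n most frequent tags.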