How to convert a data frame column to a list

I want to count the frequency of each word in my data.

The data is currently stored in a data frame.

Each row holds hashtag data in square brackets, in the form [Today, Today's Weather, Hungry].

To count the frequency of each word across all of the crawled data, I need to convert it to a list.

import numpy as np  
instagram_tags=list(np.array(data['clean_tag'].tolist()))

When I convert it to a list, the whole value of each row comes through as a single item:

['[]',
 '[]',
 "['210517', 'Climbing', 'Bouldering', 'Indoor Rock Wall', 'Clean', 'Rock climbing', 'climbing', 'bouldering', 'indoorclimbing', \"Today's Exercise\", 'Ankle', 'weekendworkout', 'Clean', 'Working out on weekdays', 'Weekend Exercise', 'Hobby', 'A bunch of rocks', 'DOLMOORI', 'Climbing Park', 'Climbing Park Hanti Branch']",
 "['Seoul Forest Climbing']",
 '[]',
 "['20210519B', 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpointclimbingcrew', 'redpoint', 'climbing', 'bouldering', 'sportsclimbing', 'gate1', 'scarpa', 'instinct', 'fiveten', 'hiangle', '167592836O79']"]

The data is bundled together as shown above.

Is there a way to combine everything into one list without the quotation marks and [ ]?

hashtag crawling dataframe list

2022-09-20 16:03

1 Answer

Try literal_eval from the ast module.

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> l = ['[]',
 '[]',
 "['210517', 'Climbing', 'Bouldering', 'Indoor Rock Wall', 'Clean', 'Rock climbing', 'climbing', 'bouldering', 'indoorclimbing', \"Today's Exercise\", 'Ankle', 'weekendworkout', 'Clean', 'Working out on weekdays', 'Weekend Exercise', 'Hobby', 'A bunch of rocks', 'DOLMOORI', 'Climbing Park', 'Climbing Park Hanti Branch']",
 "['Seoul Forest Climbing']",
 '[]',
 "['20210519B', 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpointclimbingcrew', 'redpoint', 'climbing', 'bouldering', 'sportsclimbing', 'gate1', 'scarpa', 'instinct', 'fiveten', 'hiangle', '167592836O79']"]
>>> from ast import literal_eval
>>> l = [ literal_eval(e) for e in l ]
>>> l
[[], [], ['210517', 'Climbing', 'Bouldering', 'Indoor Rock Wall', 'Clean', 'Rock climbing', 'climbing', 'bouldering', 'indoorclimbing', "Today's Exercise", 'Ankle', 'weekendworkout', 'Clean', 'Working out on weekdays', 'Weekend Exercise', 'Hobby', 'A bunch of rocks', 'DOLMOORI', 'Climbing Park', 'Climbing Park Hanti Branch'], ['Seoul Forest Climbing'], [], ['20210519B', 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpointclimbingcrew', 'redpoint', 'climbing', 'bouldering', 'sportsclimbing', 'gate1', 'scarpa', 'instinct', 'fiveten', 'hiangle', '167592836O79']]
>>> from pprint import pprint
>>> pprint(l)
[[],
 [],
 ['210517',
  'Climbing',
  'Bouldering',
  'Indoor Rock Wall',
  'Clean',
  'Rock climbing',
  'climbing',
  'bouldering',
  'indoorclimbing',
  "Today's Exercise",
  'Ankle',
  'weekendworkout',
  'Clean',
  'Working out on weekdays',
  'Weekend Exercise',
  'Hobby',
  'A bunch of rocks',
  'DOLMOORI',
  'Climbing Park',
  'Climbing Park Hanti Branch'],
 ['Seoul Forest Climbing'],
 [],
 ['20210519B',
  'August Climbing',
  'August Climbing Gym',
  'Guardi Station',
  'Exit 5',
  'gramicci',
  'patagonia',
  'redpointclimbingcrew',
  'redpoint',
  'climbing',
  'bouldering',
  'sportsclimbing',
  'gate1',
  'scarpa',
  'instinct',
  'fiveten',
  'hiangle',
  '167592836O79']]
>>>
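
To get from the parsed lists to the word counts the question asks for, one possible follow-up is to flatten them into a single list and pass that to collections.Counter. This is only a sketch: raw_cells is a made-up stand-in for what data['clean_tag'].tolist() might return, and the names tag_lists, all_tags and tag_counts are illustrative, not from the original post.

```python
from ast import literal_eval
from collections import Counter
from itertools import chain

# Hypothetical stand-in for data['clean_tag'].tolist():
# each cell is the *string* form of a list, not a real list.
raw_cells = [
    "[]",
    "['climbing', 'bouldering', 'gate1']",
    "['Seoul Forest Climbing']",
    "['climbing', 'patagonia']",
]

# Step 1: turn every string back into a real Python list.
tag_lists = [literal_eval(cell) for cell in raw_cells]

# Step 2: flatten the list of lists into one flat list of tags.
all_tags = list(chain.from_iterable(tag_lists))

# Step 3: count how often each tag appears.
tag_counts = Counter(all_tags)

print(all_tags)
print(tag_counts["climbing"])
```

Note that literal_eval raises an exception (ValueError or SyntaxError) when a cell is not a valid Python literal, so cells with unbalanced quotes or brackets would need to be cleaned up or skipped first.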

2022-09-20 16:03
