Converting a DataFrame column of hashtag lists into one list

Asked 1 year ago, updated 1 year ago, 75 views

I want to count how often each word appears in my data.

The data is currently stored in a DataFrame.

Each row of the clean_tag column holds its hashtags inside [ ], in the form [Today, Today's Weather, Hungry].

To count the frequency of each word across all the crawled data, I first need to convert the column to a list:
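Once the tags are in one flat list of strings, counting them is straightforward with collections.Counter. A minimal sketch, assuming a hypothetical flat `tags` list standing in for the crawled data:

```python
from collections import Counter

# Hypothetical flat list of hashtags; in practice this would come from the crawled data
tags = ["climbing", "bouldering", "climbing", "hobby", "climbing", "bouldering"]

# Counter maps each tag to the number of times it appears
tag_counts = Counter(tags)

# most_common() returns (tag, count) pairs sorted by count, highest first
for tag, count in tag_counts.most_common():
    print(tag, count)
```

The catch is that Counter needs individual tags; if each element is still a whole string like "['a', 'b']", it counts those strings (or, if iterated, their characters) instead of words.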

import numpy as np
instagram_tags = list(np.array(data['clean_tag'].tolist()))

But after the conversion, each row's value comes through as a single string that looks like a list, instead of as separate words:

['[]',
 '[]',
 "['210517', 'climbing', 'bouldering', 'indoor rock wall', 'cleaning', 'rock climbing', 'climbing', 'bouldering', 'door climbing', 'today's exercise', 'ankle', 'weekendworkout', 'cleaning', 'weekly exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekly', 'hobby', 'DOL', 'park', 'DOL']",
 "['Seoul Forest Climbing']",
 '[]',
 "['20210519B', 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpoint climbingcrew', 'redpoint', 'climbing', 'sport climbing', 'gate1', 'gate', 'scarpin', '279', 'Osteen79']"]

As shown above, all of a row's tags are bundled together into one string.

Is there a way to combine all the tags into one list, without the quotation marks and [ ]?
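One way to do this by hand, which also copes with rows that are not valid Python list literals (for example, a tag written without quotes, like [Seoul Forest Climbing]), is to strip the brackets and quote marks and split on commas. This is a minimal sketch, assuming no single tag contains a comma; `split_tag_string` and the sample `rows` are hypothetical stand-ins for the clean_tag column.

```python
from collections import Counter
from itertools import chain


def split_tag_string(raw):
    """Turn a string like "['a', 'b']" or "[a, b]" into ['a', 'b'].

    Assumes no individual tag contains a comma.
    """
    inner = raw.strip().strip("[]")  # drop the surrounding brackets
    tags = []
    for piece in inner.split(","):
        # drop surrounding whitespace and quote marks; quotes inside a tag are kept
        tag = piece.strip().strip("'\"").strip()
        if tag:  # skip empty pieces, e.g. from "[]"
            tags.append(tag)
    return tags


# Hypothetical rows shaped like the clean_tag column
rows = ["[]", "['climbing', 'bouldering']", "[Seoul Forest Climbing]", "['climbing', 'hobby']"]

# Parse every row, then flatten the per-row lists into one list of tags
all_tags = list(chain.from_iterable(split_tag_string(r) for r in rows))
print(all_tags)
print(Counter(all_tags).most_common())
```

Because quotes are only stripped from the ends of each piece, a tag such as today's exercise keeps its apostrophe.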

hashtag crawling dataframe list

2022-09-20 16:03

1 Answer

Try literal_eval from the ast module.

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> l = ['[]',
 '[]',
 "['210517', 'climbing', 'bouldering', 'indoor rock wall', 'cleaning', 'rock climbing', 'climbing', 'bouldering', 'door climbing', 'today's exercise', 'ankle', 'weekendworkout', 'cleaning', 'weekly exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekend exercise', 'weekly', 'hobby', 'DOL', 'park', 'DOL']",
 "['Seoul Forest Climbing']",
 '[]',
 "['20210519B', 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpoint climbingcrew', 'redpoint', 'climbing', 'sport climbing', 'gate1', 'gate', 'scarpin', '279', 'Osteen79']"]
>>> from ast import literal_eval
>>> l = [ literal_eval(e) for e in l ]
>>> l
[[], [], ['210517', 'Climbing', 'Bouldering', 'Indoor Rock Wall', 'Clean', 'Rock climbing', 'climbing', 'bouldering', 'indoorclimbing', "Today's Exercise", 'Ankle', 'weekendworkout', 'Clean', 'Working out on weekdays', 'Weekend Exercise', 'Hobby', 'A bunch of rocks', 'DOLMOORI', 'Climbing Park', 'Climbing Park Hanti Branch'], ['Seoul Forest Climbing'], [], ['20210519B', 'August Climbing', 'August Climbing Gym', 'Guardi Station', 'Exit 5', 'gramicci', 'patagonia', 'redpointclimbingcrew', 'redpoint', 'climbing', 'bouldering', 'sportsclimbing', 'gate1', 'scarpa', 'instinct', 'fiveten', 'hiangle', '167592836O79']]
>>> from pprint import pprint
>>> pprint(l)
[[],
 [],
 ['210517',
  'Climbing',
  'Bouldering',
  'Indoor Rock Wall',
  'Clean',
  'Rock climbing',
  'climbing',
  'bouldering',
  'indoorclimbing',
  "Today's Exercise",
  'Ankle',
  'weekendworkout',
  'Clean',
  'Working out on weekdays',
  'Weekend Exercise',
  'Hobby',
  'A bunch of rocks',
  'DOLMOORI',
  'Climbing Park',
  'Climbing Park Hanti Branch'],
 ['Seoul Forest Climbing'],
 [],
 ['20210519B',
  'August Climbing',
  'August Climbing Gym',
  'Guardi Station',
  'Exit 5',
  'gramicci',
  'patagonia',
  'redpointclimbingcrew',
  'redpoint',
  'climbing',
  'bouldering',
  'sportsclimbing',
  'gate1',
  'scarpa',
  'instinct',
  'fiveten',
  'hiangle',
  '167592836O79']]
>>> 
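literal_eval turns each string into a real list, but the result is still a list of lists. To get the word frequencies the question asks for, the per-row lists can be flattened with itertools.chain.from_iterable and counted with collections.Counter. This is a sketch under a few assumptions: `parse_row` is a hypothetical helper, the sample `rows` stand in for data['clean_tag'], and rows that literal_eval cannot parse are skipped rather than repaired.

```python
from ast import literal_eval
from collections import Counter
from itertools import chain


def parse_row(raw):
    """Parse one clean_tag string with literal_eval; return [] if it is not a list literal."""
    try:
        value = literal_eval(raw)
    except (ValueError, SyntaxError):
        return []  # malformed row, e.g. tags written without quotes
    if not isinstance(value, list):
        return []
    # str() keeps the output uniform if a tag was stored as a bare number
    return [str(v) for v in value]


# Hypothetical rows shaped like the clean_tag column
rows = ["[]", "['210517', 'climbing', 'bouldering']", "['Seoul Forest Climbing']", "['climbing']"]

parsed = [parse_row(r) for r in rows]     # list of lists, like l above
flat = list(chain.from_iterable(parsed))  # one flat list of tags
counts = Counter(flat)
print(counts.most_common(3))
```

With pandas, the same helper can be applied to the column directly, e.g. data['clean_tag'].apply(parse_row). Skipping malformed rows silently keeps the sketch short, but in real data you may prefer to log them so no tags go missing unnoticed.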


2022-09-20 16:03



© 2024 OneMinuteCode. All rights reserved.