Question about Python functions: creating a program to extract strings.
My attempt failed because I don't understand the basics of Python functions.
Environment
Python 3.5.2
import re

list1 = [
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
]

def get_content(self, list1):
    for content in list1:
        pass

def parse_content():
    pass
list1 contains a list of fictional events.
I'd like to use the get_content function to take elements out of the list one at a time, and the parse_content function to analyze each element with regular expressions.
I can write the regular expressions themselves, but I can't put the program together because I don't seem to have a grasp of the basics of Python functions.
Please let me know how to make this work as intended.
If you only need to extract the elements, the built-in string methods split and replace are enough, as described in the comments below.
list1 = [
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
]

def get_content(l):
    # Initialize the output list
    li = list()
    for contest in l:
        # Split on ':'
        date, city, name = contest.split(':')
        # Split the date into month and day
        month, day = date.split('/')
        # Drop the words "town" and "convention" and trim the surrounding spaces
        city_sur = city.replace('town', '').strip()
        name_sur = name.replace('convention', '').strip()
        # Add the elements to the output list
        li.append([month, day, city_sur, name_sur])
    return li

get_content(list1)
# =>
# [['5', '1', 'hogehoge', 'hogehoge'],
#  ['5', '2', 'hogehoge', 'hogehoge'],
#  ['5', '3', 'hogehoge', 'hogehoge'],
#  ['5', '4', 'hogehoge', 'hogehoge'],
#  ['5', '5', 'hogehoge', 'hogehoge']]
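The question asked for a get_content function that hands each element to a parse_content function built on regular expressions. A minimal sketch of that two-function structure follows. The pattern and its group names are assumptions made for illustration (they assume every line looks like "<month>/<day>: <city> town: <name> convention"), and the result has the same shape as the split/replace version, so it is only one possible way to arrange it.

```python
import re

list1 = [
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
]

# Assumed line format: "<month>/<day>: <city> town: <name> convention"
PATTERN = re.compile(
    r'(?P<month>\d+)/(?P<day>\d+):\s*'
    r'(?P<city>.+?)\s+town:\s*'
    r'(?P<name>.+?)\s+convention$'
)

def parse_content(content):
    """Parse one line; return [month, day, city, name], or None if it does not match."""
    match = PATTERN.match(content)
    if match is None:
        return None
    return [match.group('month'), match.group('day'),
            match.group('city'), match.group('name')]

def get_content(contents):
    """Pass each element to parse_content and collect the results that matched."""
    results = []
    for content in contents:
        parsed = parse_content(content)
        if parsed is not None:
            results.append(parsed)
    return results

parsed_list = get_content(list1)
print(parsed_list[0])  # ['5', '1', 'hogehoge', 'hogehoge']
```

The key point about functions here is that each one receives its input as an argument and gives its result back with return. Plain functions defined outside a class do not take self.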
Once the list is parsed, you can also analyze its contents. For example, collections.Counter is a convenient way to count the host towns.
from collections import Counter
parsed_list = get_content(list1)
# Take the element at index 2 (the town) from each nested list
cities = [i[2] for i in parsed_list]
# =>
# ['hogehoge', 'hogehoge', 'hogehoge', 'hogehoge', 'hogehoge']
Counter(cities)
# =>
# Counter({'hogehoge': 5})
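Because every sample row has the same town, the count above is not very revealing. As a small sketch with hypothetical town names (alpha, beta, and gamma are invented here purely for illustration), this is how Counter behaves when the values differ:

```python
from collections import Counter

# Hypothetical town names, used only to show counts that differ
cities = ['alpha', 'beta', 'alpha', 'gamma', 'alpha', 'beta']

counts = Counter(cities)

# most_common(n) returns the n most frequent items as (item, count) pairs
top_two = counts.most_common(2)
print(top_two)  # [('alpha', 3), ('beta', 2)]

# A Counter behaves like a dict, and a missing key counts as 0
print(counts['delta'])  # 0
```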
If you want to analyze the data further, pandas and NumPy, which are widely used by data scientists, are useful.
import pandas as pd
import numpy as np
import re

list1 = [
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
]
# Read the data and split on ':'
df = pd.Series(list1).str.split(':', expand=True)
# Convert the date to datetime type
df['Date'] = pd.to_datetime('2018/' + df[0])
# Remove a trailing "city" or "town" using a regular expression
regex_pat = re.compile(r'\s*(city|town)$')
df['Municipality'] = df[1].str.replace(regex_pat, '', regex=True).str.strip()
# A plain string replace also works
df['Convention'] = df[2].str.replace('convention', '').str.strip()
Your data is now ready.
df
     0              1                    2       Date Municipality Convention
0  5/1  hogehoge town  hogehoge convention 2018-05-01     hogehoge   hogehoge
1  5/2  hogehoge town  hogehoge convention 2018-05-02     hogehoge   hogehoge
2  5/3  hogehoge town  hogehoge convention 2018-05-03     hogehoge   hogehoge
3  5/4  hogehoge town  hogehoge convention 2018-05-04     hogehoge   hogehoge
4  5/5  hogehoge town  hogehoge convention 2018-05-05     hogehoge   hogehoge
pandas and NumPy have many useful features for data analysis.
If you want to take out only the date, municipality, and convention columns:
df[['Date', 'Municipality', 'Convention']]
If you want to know the number of events per municipality:
df['Municipality'].value_counts()
If you want to export the data to an Excel file:
df.to_excel('data.xlsx')
pandas supports vectorized operations, so you can avoid explicit loops and write the program more directly.
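As a rough illustration of working without a loop, the sketch below parses every row in one call with Series.str.extract. The pattern and its group names are assumptions chosen for this example (they assume each line has the form "<month>/<day>: <city> town: <name> convention"), and pandas must be installed for it to run.

```python
import pandas as pd

list1 = [
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
]

# Assumed pattern; each named group becomes a column of the result
pattern = (
    r'(?P<month>\d+)/(?P<day>\d+):\s*'
    r'(?P<city>.+?)\s+town:\s*'
    r'(?P<name>.+?)\s+convention$'
)

# str.extract applies the pattern to every row at once, with no explicit loop
parsed = pd.Series(list1).str.extract(pattern)

print(parsed.columns.tolist())  # ['month', 'day', 'city', 'name']
print(parsed.loc[0].tolist())   # ['5', '1', 'hogehoge', 'hogehoge']
```

Rows that do not match the pattern come back as missing values (NaN) rather than raising an error, so it is worth checking for them before further analysis.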
Also, you can use Jupyter Notebook to perform the analysis while keeping a record of the results.
I have uploaded the same code to Google Colaboratory. With a Google account, Colaboratory lets you use Jupyter Notebook for free.
https://colab.research.google.com/drive/16yKmNo3Z13rbkH_w2f6Gcd0TvME3ykpg