Python Function Questions

Asked 2 years ago, Updated 2 years ago, 44 views

I have a question about Python functions. I am writing a program to extract strings,
but I got stuck because I don't understand the basics of Python functions.

Environment
Python 3.5.2

import re

list1 = [
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
    ]

def get_content(self, list1):
    for content in list1:
        pass

def parse_content():
    pass

list1 contains a list of fictional events.
I'd like to use the get_content function to take the elements out of the list one by one, and the parse_content function to parse each element with regular expressions.
I can write the regular expressions themselves, but I can't put the program together because I don't seem to understand the basics of Python functions.
Please let me know how to make this work as intended.

python python3

2022-09-30 21:31

2 Answers

If you only need to extract the elements, the built-in string methods split and replace are enough, as mentioned in the comments.

list1=[
    '5/1: hogehoge town: hogehoge convention',
    '5/2: hogehoge town: hogehoge convention',
    '5/3: hogehoge town: hogehoge convention',
    '5/4: hogehoge town: hogehoge convention',
    '5/5: hogehoge town: hogehoge convention',
    ]



def get_content(l):
    # Initialize the output list
    li = list()

    for contest in l:
        # Split on ':' and trim the whitespace around each part
        date, city, name = [part.strip() for part in contest.split(':')]

        # Split the date into month and day
        month, day = date.split('/')

        # Drop the words "town" and "convention" from the town and convention names
        city_sur = city.replace('town', '').strip()
        name_sur = name.replace('convention', '').strip()

        # Add the parsed fields to the output list
        li.append([month, day, city_sur, name_sur])

    return li


get_content(list1)
# =>
# [['5', '1', 'hogehoge', 'hogehoge'],
#  ['5', '2', 'hogehoge', 'hogehoge'],
#  ['5', '3', 'hogehoge', 'hogehoge'],
#  ['5', '4', 'hogehoge', 'hogehoge'],
#  ['5', '5', 'hogehoge', 'hogehoge']]
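
For the regular-expression approach described in the question, here is a minimal sketch. It assumes every entry has the form 'month/day: town: event'. The pattern, the group names, and the dictionary result are illustrative choices, not part of the original code, and this get_content is a separate variant of the one above.

```python
import re

# Assumed entry format: '<month>/<day>: <town> town: <event> convention'
ENTRY_PATTERN = re.compile(
    r'^(?P<month>\d{1,2})/(?P<day>\d{1,2}):\s*'
    r'(?P<city>.+?)\s+town:\s*'
    r'(?P<name>.+?)\s+convention$'
)


def parse_content(content):
    """Parse one entry into a dict, or return None if it does not match."""
    match = ENTRY_PATTERN.match(content)
    if match is None:
        return None
    return match.groupdict()


def get_content(entries):
    """Pass the entries to parse_content one by one and collect the results."""
    return [parse_content(entry) for entry in entries]


events = ['5/1: hogehoge town: hogehoge convention',
          '5/2: hogehoge town: hogehoge convention']
parsed = get_content(events)
print(parsed[0])
```

Both are plain functions, so neither takes self. That parameter is only needed for methods defined inside a class.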

Once the list is parsed you can also analyze its contents. For example, collections.Counter is a convenient way to count events per host town.

from collections import Counter

parsed_list = get_content(list1)

# Take the element at index 2 (the town) from each nested list
cities = [i[2] for i in parsed_list]
# =>
# ['hogehoge', 'hogehoge', 'hogehoge', 'hogehoge', 'hogehoge']

Counter(cities)
# =>
# Counter({'hogehoge': 5})
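
Because every placeholder name is identical, the result above has a single key. The sketch below uses made-up town names (purely hypothetical sample data) to show what Counter returns when the towns differ, and how most_common orders them.

```python
from collections import Counter

# Hypothetical parsed rows in the [month, day, town, event] layout
sample_rows = [
    ['5', '1', 'alpha', 'spring'],
    ['5', '2', 'beta', 'spring'],
    ['5', '3', 'alpha', 'summer'],
    ['5', '4', 'alpha', 'autumn'],
    ['5', '5', 'beta', 'winter'],
]

# Count how many events each town hosts
town_counts = Counter(row[2] for row in sample_rows)

# most_common() returns (town, count) pairs, highest count first
print(town_counts.most_common())
# => [('alpha', 3), ('beta', 2)]
```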


2022-09-30 21:31

If you want to analyze the data, Pandas and NumPy, which are widely used by data scientists, are useful.

import pandas as pd
import numpy as np
import re

list1 = [
'5/1: hogehoge town: hogehoge convention',
'5/2: hogehoge town: hogehoge convention',
'5/3: hogehoge town: hogehoge convention',
'5/4: hogehoge town: hogehoge convention',
'5/5: hogehoge town: hogehoge convention',
]

# Read the data and split each entry on ':'
df = pd.Series(list1).str.split(':', expand=True)
# Convert the date to datetime type (the year 2018 is filled in)
df['Date'] = pd.to_datetime('2018/' + df[0])
# Remove a trailing "city" or "town" with a regular expression
regex_pat = re.compile(r'\s*(?:city|town)\s*$')
df['Municipality'] = df[1].str.replace(regex_pat, '', regex=True).str.strip()
# A plain string replace works here too.
df['Convention'] = df[2].str.replace('convention', '', regex=False).str.strip()

Your data is now ready.

df
     0               1                     2       Date Municipality Convention
0  5/1   hogehoge town   hogehoge convention 2018-05-01     hogehoge   hogehoge
1  5/2   hogehoge town   hogehoge convention 2018-05-02     hogehoge   hogehoge
2  5/3   hogehoge town   hogehoge convention 2018-05-03     hogehoge   hogehoge
3  5/4   hogehoge town   hogehoge convention 2018-05-04     hogehoge   hogehoge
4  5/5   hogehoge town   hogehoge convention 2018-05-05     hogehoge   hogehoge

Pandas and NumPy have many useful features for data analysis.
To select only the date, municipality, and convention columns:

df[['Date', 'Municipality', 'Convention']]

To count the number of events per municipality:

df['Municipality'].value_counts()

To export the data to an Excel file:

df.to_excel('data.xlsx')

Pandas supports vectorized operations, so you can avoid explicit loops and write programs more directly.
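
As a rough illustration of that point, the sketch below does the same cleanup twice: once with an explicit loop and once with the vectorized string methods of pandas. The sample values and variable names are made up for the example, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical raw values, with the stray spaces that splitting on ':' leaves behind
towns = pd.Series([' alpha town', ' beta town', ' alpha town'])

# Loop version: process one value at a time
looped = []
for value in towns:
    looped.append(value.strip().replace(' town', ''))

# Vectorized version: the .str accessor applies each step to the whole Series
vectorized = towns.str.strip().str.replace(' town', '', regex=False)

print(looped == vectorized.tolist())
# => True
```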

Also, with Jupyter Notebook you can run the analysis while keeping a record of the results.
I have uploaded the same code to Google Colaboratory. Google Colaboratory lets you use Jupyter Notebook for free if you have a Google account.

https://colab.research.google.com/drive/16yKmNo3Z13rbkH_w2f6Gcd0TvME3ykpg


2022-09-30 21:31



© 2024 OneMinuteCode. All rights reserved.