Reading and writing CSV with Python pandas

In the Excel file, one cell looks like this:

https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/

Cells like the one marked with ↑ continue down the column.

There is a column in which these two types of links are mixed in each cell.
Next to it, I would like to create a column that contains only the latter link (the tabelog one).
I have looked at various sites, but I still don't understand how to do even simple CSV reading and writing with pandas, so please help.
Is it possible to create a column named tabelog and put the extracted URLs into that column in order?

import csv
import pandas as pd
x = pd.read_csv('output.csv')
y = []
for z in x:
    y = x[x.find("https://tabelog.com"):]
df = df.append(y)
df.to_csv('output.csv', columns=['tabelog'])
print("finished")

python pandas csv

2022-09-30 11:50

2 Answers

Right after

x = pd.read_csv('output.csv')

add

print(x.shape)

to check whether x is a DataFrame with two columns (the number of rows should match the number of data rows in Excel).
I cannot find a comma (,) anywhere in the Excel screenshot in the question, so I suspect that read_csv gives you a single-column DataFrame.
read_csv splits each line into columns at a delimiter (the default is ","), so if the delimiter never appears in the data, the result should be a DataFrame with just one column.
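
As a rough illustration of that check, the sketch below reads two small in-memory CSV texts and prints their shapes. The sample texts and variable names are assumptions standing in for output.csv, which is not available here.

```python
import io
import pandas as pd

# Hypothetical stand-in for output.csv: one URL per line, no commas anywhere.
no_comma_text = (
    "https://www.leafkyoto.net/special/parfait/\n"
    "https://tabelog.com/kyoto/A2601/A260503/26001772/\n"
)
# header=None because this sample has no column-name line.
x_single = pd.read_csv(io.StringIO(no_comma_text), header=None)
print(x_single.shape)  # (number of rows, number of columns)

# The same two URLs on one line, separated by a comma.
comma_text = (
    "https://www.leafkyoto.net/special/parfait/,"
    "https://tabelog.com/kyoto/A2601/A260503/26001772/\n"
)
x_double = pd.read_csv(io.StringIO(comma_text), header=None)
print(x_double.shape)
```

Without a comma, each line becomes one row of a single column; with a comma, the line is split into two columns.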

I can't tell from the screenshot what separates the two links within one cell, but if they are separated by a newline character ("\n", character code 0x0A), then reading the file with

x = pd.read_csv('output.csv', sep="\n")

should give x as a DataFrame with two columns, where the second column contains the latter link (the one containing tabelog.com).

==
The code below in the question specifies that only the data in the column named 'tabelog' should be written.

df.to_csv('output.csv', columns=['tabelog'])

However, the Excel file shown in the question does not appear to have the column name 'tabelog' on its first line (where the list of column names should be). If the first line of the file does not contain the column name 'tabelog', the DataFrame returned by read_csv will not have a 'tabelog' column either.
You wrote "I think the split itself is done, but I can't output it." My guess is that what actually happened is this: the CSV file you read has no 'tabelog' column, so when you specified columns=['tabelog'] there was nothing to output.

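A minimal sketch of that situation, assuming a CSV whose only column is named URL (a hypothetical name, since the question does not show a header line). Checking df.columns before writing shows whether 'tabelog' exists; once the column has been created, to_csv with columns=['tabelog'] has something to write. The output goes to an in-memory buffer rather than a real file.

```python
import io
import pandas as pd

# Hypothetical CSV text: a header line "URL" and one quoted multi-line cell.
csv_text = (
    "URL\n"
    '"https://www.leafkyoto.net/special/parfait/\n'
    'https://tabelog.com/kyoto/A2601/A260503/26001772/"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

# The file has no 'tabelog' column, so the DataFrame has none either.
had_tabelog = "tabelog" in df.columns
print(list(df.columns), had_tabelog)

# Create the column first: take the last whitespace-separated token of each cell.
df["tabelog"] = df["URL"].str.split().str[-1]

# Only now restrict the output to that column.
out = io.StringIO()
df.to_csv(out, columns=["tabelog"], index=False)
print(out.getvalue())
```
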
==

1) On the first line of the Excel file (CSV file), write the column names

'mae'
'tabelog'

separated by the same string that separates the links.

2) Load the Excel file (CSV file) with the correct delimiter:

x = pd.read_csv('output.csv', sep="\n")

- A side note -
When people hear "Excel file", they imagine a file with a .xls or .xlsx extension.
A CSV file is a kind of text file, so I think you should avoid calling it an Excel file (it can cause misunderstanding).

2022-09-30 11:50

I checked: when a single cell contains a multi-line string, as in the question, and the Excel sheet is exported as CSV, the output looks like this:

"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26018731/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260202/26032128/"

So, for that column, you can do

df['URL'].apply(lambda s: pd.Series(s.split()))

That is, use Series.apply() to run str.split() on each cell and convert the result into a Series, which expands each URL into its own column.

Below is a working sample (assuming the CSV has a header line).

import pandas as pd
import io

# CSV text as exported from Excel: each quoted cell spans two lines
data = """
URL
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26018731/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260202/26032128/"
"""

df = pd.read_csv(io.StringIO(data))
# Split each cell on whitespace (the newline) and expand the parts into columns
new_df = df['URL'].apply(lambda s: pd.Series(s.split()))
#                                            0                                                  1
#0 https://www.leafkyoto.net/special/parfait/ https://tabelog.com/kyoto/A2601/A260503/26001772/
#1 https://www.leafkyoto.net/special/parfait/ https://tabelog.com/kyoto/A2601/A260503/26018731/
#2 https://www.leafkyoto.net/special/parfait/ https://tabelog.com/kyoto/A2601/A260202/26032128/

You may also want to rename the column as needed, extract only the second column, or combine it with the original DataFrame.
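
One way to do that last step, sketched under the assumption that the column is named URL as in the sample and that the tabelog link is always the second URL in the cell: attach the second split column to the original DataFrame as 'tabelog' and write the result out. The output is written to an in-memory buffer here; passing a file name such as 'output_with_tabelog.csv' (a made-up name) to to_csv would write a file instead.

```python
import io
import pandas as pd

# Shortened copy of the sample data, with a header line "URL".
data = """URL
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26018731/"
"""

df = pd.read_csv(io.StringIO(data))

# Split each cell on whitespace (the newline inside the cell) into columns 0 and 1.
new_df = df["URL"].apply(lambda s: pd.Series(s.split()))

# Keep only the second column and attach it to the original DataFrame.
df["tabelog"] = new_df[1]

# Write both columns; index=False leaves out the row numbers.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(df["tabelog"].tolist())
```

If the position of the tabelog link within a cell can vary, selecting by content (for example with Series.str.extract and a pattern matching https://tabelog.com) may be more robust than selecting by position.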

2022-09-30 11:50
