Reading and Writing CSV with Python Pandas

Asked 2 years ago, Updated 2 years ago, 82 views


In the Excel file, each cell looks like this:


https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/

Cells like the one above (↑) continue down the column.

There is a column in which the two kinds of links are mixed in each cell.
Next to it, I would like to create a column that contains only the latter (tabelog) link.
I have looked at various sites, but I still don't understand how to read and write a simple CSV with pandas, so please help.
Is it possible to create a column named tabelog and put the extracted URLs into it in order?

import csv
import pandas as pd
x = pd.read_csv('output.csv')
y = []
for z in x:
    y = x[x.find("https://tabelog.com"):]
df = df.append(y)
df.to_csv('output.csv', columns=['tabelog'])
print("finished")

python pandas csv

2022-09-30 11:50

2 Answers

After

x = pd.read_csv('output.csv')

add

print(x.shape)

to check whether x is a DataFrame with two columns (the number of rows should match the number of data rows in Excel).
I can't find a comma (,) anywhere in the Excel screenshot in the question, so I suspect that read_csv returns x as a DataFrame with only one column.
read_csv splits each line on a delimiter (the default is ","); if the data contains no delimiter, the result should be a DataFrame with a single column.
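As a rough illustration of the shape check described above, the sketch below reads a small in-memory CSV instead of output.csv. The sample text and the column name URL are assumptions made up for this example.

```python
import io
import pandas as pd

# Hypothetical sample: one header line and two data lines, with no comma anywhere
text = (
    "URL\n"
    "https://tabelog.com/kyoto/A2601/A260503/26001772/\n"
    "https://tabelog.com/kyoto/A2601/A260503/26018731/\n"
)

x = pd.read_csv(io.StringIO(text))

# shape is (number of rows, number of columns)
print(x.shape)          # (2, 1): no delimiter found, so there is a single column
print(list(x.columns))  # ['URL']
```

If the second number printed is 1, the links have not been split into separate columns yet.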

I can't tell from the screenshot what separates the two links within a cell, but if they are separated by a newline character ("\n", character code 0x0A), then with

x = pd.read_csv('output.csv', sep="\n")

x should become a DataFrame with two columns, and the second column should contain the latter link (the one with tabelog.com). Note that recent pandas versions reject "\n" as a separator with a ValueError, so this may only work on older releases.

==
The following line in the question specifies that only the column named 'tabelog' should be written:

df.to_csv('output.csv', columns=['tabelog'])

However, the file shown in the question does not appear to have the column name 'tabelog' on its first line (where the column names should be listed). If the first line of the file does not contain 'tabelog', the DataFrame returned by read_csv has no column with that name either.
I take "I think the split itself is done, but I can't output it." to mean: "The CSV file I read has no 'tabelog' column, so when I specified that column name there was nothing to output."
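A minimal sketch of the point above: check whether the column exists, and create it, before asking to_csv to write it. The sample DataFrame and its column name URL are assumptions for illustration, and the output goes to an in-memory buffer rather than output.csv.

```python
import io
import pandas as pd

# Hypothetical DataFrame as it might look right after read_csv: no 'tabelog' column yet
df = pd.DataFrame({
    "URL": [
        "https://www.leafkyoto.net/special/parfait/\n"
        "https://tabelog.com/kyoto/A2601/A260503/26001772/",
    ]
})

has_tabelog_before = "tabelog" in df.columns  # False: there is nothing to write yet

# Create the column first; here it takes the last whitespace-separated URL in each cell
df["tabelog"] = df["URL"].str.split().str[-1]

# Now the requested column exists, so to_csv can select it
buffer = io.StringIO()
df.to_csv(buffer, columns=["tabelog"], index=False)
print(buffer.getvalue())
```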

==

1)

On the first line of the CSV file, write the column names, separated by the same string that separates the links, like this:

'mae'
'tabelog'

2)

Load the CSV file with the correct delimiter, like this:

x = pd.read_csv('output.csv', sep="\n")

- An aside -
"Excel file" normally suggests a file with an .xls or .xlsx extension.
A CSV file is a plain text file, so I think it is better not to call it an Excel file, because that can cause misunderstanding.


2022-09-30 11:50

I checked: when a cell contains a multi-line string, as in the question, and the Excel sheet is exported as CSV, the output looks like this:

"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26018731/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260202/26032128/"

So, for that column, you can call Series.apply() as follows:

df['URL'].apply(lambda s: pd.Series(s.split()))

This runs str.split() on each cell and turns the result into a Series, which spreads the URLs of each cell across separate columns.

Below is a sample run (assuming the file has a header line):

import pandas as pd
import io

data = """
URL
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26018731/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260202/26032128/"
"""

df = pd.read_csv(io.StringIO(data))
# Split each cell on whitespace (including the embedded newline) into separate columns
new_df = df['URL'].apply(lambda s: pd.Series(s.split()))
print(new_df)
#                                            0                                                  1
#0 https://www.leafkyoto.net/special/parfait/ https://tabelog.com/kyoto/A2601/A260503/26001772/
#1 https://www.leafkyoto.net/special/parfait/ https://tabelog.com/kyoto/A2601/A260503/26018731/
#2 https://www.leafkyoto.net/special/parfait/ https://tabelog.com/kyoto/A2601/A260202/26032128/

From there you can rename the columns as needed, extract only the second column, or combine the result with the original DataFrame.
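One possible way to carry out those follow-up steps is sketched below. It assumes the column is named URL and that each cell holds exactly two whitespace-separated links with the tabelog link second. It writes to an in-memory buffer; pass a file path such as output.csv to to_csv for real use.

```python
import io
import pandas as pd

data = """URL
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26001772/"
"https://www.leafkyoto.net/special/parfait/
https://tabelog.com/kyoto/A2601/A260503/26018731/"
"""

df = pd.read_csv(io.StringIO(data))

# Split each cell on whitespace into columns 0 and 1
parts = df["URL"].str.split(expand=True)

# Keep only the second link and attach it to the original DataFrame
df["tabelog"] = parts[1]

# Alternative that does not depend on position: pull out the tabelog URL by pattern
by_pattern = df["URL"].str.extract(r"(https://tabelog\.com\S*)", expand=False)

# Write the original column plus the new 'tabelog' column
out = io.StringIO()
df.to_csv(out, index=False)
print(df["tabelog"].tolist())
```

The pattern-based variant may be the safer choice if some cells hold the links in a different order or contain only one of them.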


2022-09-30 11:50



© 2024 OneMinuteCode. All rights reserved.