I have a question about handling raw data in Python!

Asked 2 years ago, Updated 2 years ago, 99 views

First of all, the file is a water temperature txt file. The contents are raw data, and in Excel they all sit in a single row (1 row, 9 columns).

 Filming materials
Filming station = xx
Installation Location = xxxxxxxxxxxxxxxx
Observation item = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Observation time, water temperature (℃), salt (PSU), electrical conductivity (ms/m)
2015-04-01 00:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:30:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 01:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K

The data contents are shown above.

I ran this in Jupyter.

The code is as follows.

import pandas as pd
f= pd.read_csv('test.txt',sep='delimiter',header=None, encoding='cp949', engine='python')
f2= f.drop([0,1,2,3,4],axis=0)

I loaded the data into f with pandas, then created f2 containing only the data rows by dropping the leading rows. Next, I want to split each row into columns, so that 'Observation time' holds 2015-04-01 00:15:00, 2015-04-01 00:30:00, 2015-04-01 00:45:00, 2015-04-01 01:00:00 and 'Water temperature (℃)' holds 20.6, 20.6, 20.6, 20.6, and so on. To do that, I tried:

f3= f2.split(' , ')
or
f3 = f2 ['Filming data'] = f2 ['Observation time', 'Water temperature', 'Salt', 'Electric conductivity'].str.split(' ',3)

The first split raised AttributeError: 'DataFrame' object has no attribute 'split'. The second attempt, where I tried to select the items as columns, raised KeyError: ('Observation time', 'Water temperature', 'Salt', 'Electric conductivity').

What I want to do is split each line on commas, arrange the pieces into columns, drop the RB and K entries, and combine the rest into a single dataset with one value per column in each row.
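A minimal sketch of the split described above, assuming f2 is a one-column DataFrame (column label 0) in which each cell holds one whole comma-separated line, as the read_csv call with sep='delimiter' would produce. The sample frame and the output column names (time, water_temp_c, salinity_psu, conductivity_ms_m) are made up for illustration. The point is that .str.split belongs to a single column (a Series), not to the DataFrame itself, which is why f2.split(...) fails.

```python
import pandas as pd

# Stand-in for f2: one column (label 0), each cell is a whole raw line.
f2 = pd.DataFrame({0: [
    "2015-04-01 00:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K",
    "2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K",
]})

# .str.split is a Series method; expand=True spreads the pieces
# into separate columns numbered 0..9.
parts = f2[0].str.split(",", expand=True)

# Positions 0, 1, 4 and 7 hold the time and the three measurements;
# the other positions are the RB / K flags.
f3 = parts[[0, 1, 4, 7]].copy()
f3.columns = ["time", "water_temp_c", "salinity_psu", "conductivity_ms_m"]

# Every piece is still text after the split, so convert the types.
f3["time"] = pd.to_datetime(f3["time"])
value_cols = ["water_temp_c", "salinity_psu", "conductivity_ms_m"]
f3[value_cols] = f3[value_cols].astype(float)

print(f3)
```

Selecting by position avoids the KeyError from the second attempt, because f2 has no columns named 'Observation time' and so on until you assign them yourself.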

python pandas jupyter-notebook

2022-09-20 20:47

2 Answers

Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> text = '''Filming materials
Filming station = xx
Installation Location = xxxxxxxxxxxxxxxx
Observation item = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Observation time, water temperature (℃), salt (PSU), electrical conductivity (ms/m)
2015-04-01 00:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:30:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 01:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K'''
>>> from io import StringIO

>>> with open("temp.csv", "wt", encoding="cp949") as f:
    f.write(text)


305

>>> with open("temp.csv", "rt", encoding="cp949") as f:
    text = f.read()
    print(text)
    print('-'*10)
    text = text.replace(",RB,K", "")
    print(text)


Filming materials
Filming station = xx
Installation Location = xxxxxxxxxxxxxxxx
Observation item = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Observation time, water temperature (℃), salt (PSU), electrical conductivity (ms/m)
2015-04-01 00:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:30:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 01:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
----------
Filming materials
Filming station = xx
Installation Location = xxxxxxxxxxxxxxxx
Observation item = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Observation time, water temperature (℃), salt (PSU), electrical conductivity (ms/m)
2015-04-01 00:00:00,20.6,17.31,25.7
2015-04-01 00:15:00,20.6,17.31,25.7
2015-04-01 00:30:00,20.6,17.31,25.7
2015-04-01 01:00:00,20.6,17.31,25.7

>>> import pandas as pd

>>> df = pd.read_csv(StringIO(text), skiprows=4)
>>> df
      Observation time  Water temperature (℃)  Salt (PSU)  Electrical conductivity (ms/m)
0  2015-04-01 00:00:00                   20.6       17.31                            25.7
1  2015-04-01 00:15:00                   20.6       17.31                            25.7
2  2015-04-01 00:30:00                   20.6       17.31                            25.7
3  2015-04-01 01:00:00                   20.6       17.31                            25.7
>>> 
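As a possible alternative to editing the text first, read_csv can skip the leading lines and keep only the wanted positions in one call. This is a sketch under assumptions: the shortened sample text stands in for the real file, and the column names are hypothetical. With the real file you would pass the file path and encoding="cp949" instead of StringIO(raw).

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for the real file: 4 metadata lines, 1 header line, data.
raw = """Filming materials
Filming station = xx
Installation Location = xxxx
Observation item = xxxx
Observation time, water temperature (℃), salt (PSU), electrical conductivity (ms/m)
2015-04-01 00:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
"""

df = pd.read_csv(
    StringIO(raw),
    skiprows=5,            # skip the 4 metadata lines and the header line
    header=None,           # the remaining lines are all data
    usecols=[0, 1, 4, 7],  # time and the three measurements; RB / K are skipped
)

# Hypothetical column names, assigned after reading.
df.columns = ["time", "water_temp_c", "salinity_psu", "conductivity_ms_m"]
df["time"] = pd.to_datetime(df["time"])

print(df)
```

The header line is skipped here because it lists 4 names while each data line has 10 fields, so naming the columns by hand is less surprising than letting pandas align them.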


2022-09-20 20:47

'''test.txt
TIME,TEMPER,rb,k,PSU,rb,k,MS/M,rb,k
2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:30:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 01:00:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
'''
import pandas as pd
# Use the first line as the header (the default), so the column names below exist.
f = pd.read_csv('test.txt', sep=',', encoding='utf8', engine='python')
print(f)
'''
                  TIME  TEMPER  rb  k    PSU rb.1 k.1  MS/M rb.2 k.2
0  2015-04-01 00:15:00    20.6  RB  K  17.31   RB   K  25.7   RB   K
1  2015-04-01 00:30:00    20.6  RB  K  17.31   RB   K  25.7   RB   K
2  2015-04-01 01:00:00    20.6  RB  K  17.31   RB   K  25.7   RB   K
'''
f = f.drop(['rb','k','rb.1','k.1','rb.2','k.2'], axis=1)
print(f)
'''
                  TIME  TEMPER    PSU  MS/M
0  2015-04-01 00:15:00    20.6  17.31  25.7
1  2015-04-01 00:30:00    20.6  17.31  25.7
2  2015-04-01 01:00:00    20.6  17.31  25.7
'''
#REF.https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Please refer to the official documentation.
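If listing every flag column by hand gets tedious, one option is to match the names by pattern. This is a sketch, assuming the header uses rb and k for the flag columns as in the sample above; pandas renames the duplicated headers to rb.1, k.1, rb.2, k.2, and the regular expression covers those suffixes. The inline sample text stands in for test.txt.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for test.txt.
sample = """TIME,TEMPER,rb,k,PSU,rb,k,MS/M,rb,k
2015-04-01 00:15:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
2015-04-01 00:30:00,20.6,RB,K,17.31,RB,K,25.7,RB,K
"""

df = pd.read_csv(StringIO(sample))

# True for rb, k and their de-duplicated variants (rb.1, k.2, ...).
is_flag = df.columns.str.match(r"(rb|k)(\.\d+)?$")

# Keep only the columns that are not flags.
clean = df.loc[:, ~is_flag]

print(clean)
```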


2022-09-20 20:47



© 2024 OneMinuteCode. All rights reserved.