I want to plot a large amount of data, but the process is killed partway through.

Asked 2 years ago, Updated 2 years ago, 54 views

I have one data file per hour (for example, 2018010100, 2018010101, 20180102...) and I want to read about a month of them to create a graph, but the program terminates partway through. I think it's probably because there is too much data to read. Is there a good way to handle this?
The x-axis is time and the y-axis is the value. The code sample below uses 1 Hz data, but the real data is a 100 Hz waveform.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime, timedelta

startymdh='2018070100'
endymdh='2018070223'

ymdh = startymdh

# Graphing
fig, ax = plt.subplots()

# x-axis range
ax.set_xlim(datetime.strptime(startymdh, '%Y%m%d%H'),
            datetime.strptime(endymdh, '%Y%m%d%H') + timedelta(hours=1))

ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))

# Load files every hour
while ymdh<=endymdh:

    # x data: 3600 timestamps, one per second, starting at this file's hour
    x = pd.date_range(datetime.strptime(ymdh, '%Y%m%d%H'),
                      datetime.strptime(ymdh, '%Y%m%d%H') + timedelta(hours=1) - timedelta(seconds=1),
                      freq='s')

    # y data: 3600 random integers from 0 up to the hour of day (the upper bound hour + 1 is exclusive)
    print(ymdh[8:10])
    y = np.random.randint(0, int(ymdh[8:10]) + 1, 3600)

    # Plot
    ax.plot(x, y, color='C0')
    # set the variable to an hour later
    ymdhtmp = datetime.strptime(ymdh, '%Y%m%d%H') + timedelta(hours=1)
    ymdh = ymdhtmp.strftime('%Y%m%d%H')
# Output File Name
plt.savefig('test.png')
plt.close()

python python3 matplotlib

2022-09-30 21:33

1 Answer

The code in the question works as written for the sample, so with enough memory you could create the graph. However, the graph would contain far too many points (100 × 60 × 60 × 24 × 30 ≈ 260 million) and would end up solid black (blue here, because color='C0').
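As a rough check of that figure, here is a minimal sketch of the arithmetic. It assumes a 30-day month and that each value is held as an 8-byte float; both are assumptions for illustration, not details from the question.

```python
# Rough size of one month of 100 Hz data (30 days assumed).
SAMPLE_RATE_HZ = 100
SECONDS_PER_DAY = 60 * 60 * 24
DAYS = 30

n_points = SAMPLE_RATE_HZ * SECONDS_PER_DAY * DAYS

# Assumption: each y value is stored as an 8-byte float64.
y_bytes = n_points * 8

print(n_points)                       # 259200000
print(round(y_bytes / 1024**3, 2))    # 1.93 (GiB, for the y values alone)
```

The timestamps for the x-axis need a similar amount again, before counting whatever matplotlib keeps for each plotted line.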

Also, I don't think matplotlib has a feature that works like "read data → draw → release data → read the next data → draw over the existing plot → release data".

The horizontal resolution of a PC display is on the order of 1000 pixels. If you graph one month of data, that leaves only about 40 pixels per day, so you would normally plot hourly data. Even if you go as far as printing on large paper, 10-minute data is about the finest you can show.
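The same estimate can be written out as a short calculation. The 1000-pixel width and the 30-day month are round numbers assumed for illustration.

```python
# Assumptions: a 1000-pixel-wide plot and 30 days of 100 Hz data.
WIDTH_PX = 1000
DAYS = 30
SAMPLE_RATE_HZ = 100

px_per_day = WIDTH_PX / DAYS                                    # pixels available per day
minutes_per_px = DAYS * 24 * 60 / WIDTH_PX                      # time covered by one pixel column
samples_per_px = SAMPLE_RATE_HZ * DAYS * 24 * 3600 // WIDTH_PX  # raw samples per pixel column

print(round(px_per_day, 1), minutes_per_px, samples_per_px)     # 33.3 43.2 259200
```

Each pixel column covers roughly 43 minutes, which is why hourly values are already about as fine as the screen can show.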

Therefore, if you want to graph time series data, I think the starting point is to calculate basic statistics for every hour, 10 minutes, 1 minute, and so on, and graph those.

The main basic statistics are listed below; choose the ones you need and graph them. The name in parentheses is the corresponding method of pandas' Series.

Number of valid data points (count)
Mean (mean)
Mean absolute deviation (mad)
Standard deviation (std)
Unbiased variance (var)
Median (median)
Minimum (min)
Maximum (max)
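A minimal sketch of this approach follows: process one hourly file at a time, reduce it to 10-minute statistics, and keep only the summary. `load_hour` is a hypothetical stand-in that generates random 100 Hz data in the same way as the question's sample, because the real file format is not shown; replace it with your own reader. The names `summarize` and `rule` are also just illustrative. `mad` is left out because `Series.mad` was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

STATS = ['count', 'mean', 'std', 'var', 'median', 'min', 'max']


def load_hour(ymdh):
    """Hypothetical loader: stands in for reading one hourly file.

    Returns one hour of 100 Hz samples (360,000 values) indexed by time.
    """
    start = pd.to_datetime(ymdh, format='%Y%m%d%H')
    index = pd.date_range(start, periods=360_000, freq='10ms')
    rng = np.random.default_rng(int(ymdh))
    values = rng.integers(0, start.hour + 1, size=len(index))
    return pd.Series(values, index=index)


def summarize(startymdh, endymdh, rule='10min'):
    """Reduce each hourly file to per-`rule` statistics and concatenate them."""
    hours = pd.date_range(pd.to_datetime(startymdh, format='%Y%m%d%H'),
                          pd.to_datetime(endymdh, format='%Y%m%d%H'),
                          freq='h')
    parts = []
    for hour in hours:
        raw = load_hour(hour.strftime('%Y%m%d%H'))
        # Only the small summary is kept; `raw` is replaced on the next iteration.
        parts.append(raw.resample(rule).agg(STATS))
    return pd.concat(parts)


# Three hours of data become 18 rows (6 ten-minute bins per hour).
stats = summarize('2018070100', '2018070102')
print(stats.shape)
```

At 10-minute resolution a 30-day month is about 4,320 rows, which matplotlib can draw without trouble, for example with `ax.plot(stats.index, stats['mean'])` and `ax.fill_between(stats.index, stats['min'], stats['max'], alpha=0.3)` to show the range around the mean.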


2022-09-30 21:33
