Python
def make_dataset(data, seq_length=480, target_delay=24, strides=5,
                 mode='train', train_mean=None, train_std=None):
    assert mode in ['train', 'val', 'test']
    if mode is not 'train':
        if train_mean is None or train_std is None:
            print('Current mode is {}'.format(mode))
            print('This mode needs mean and std of train data')
            assert False
    # Normalization
    ## Code start ##
    if mode is 'train':
        mean = train_data.mean(axis=0) #1-1
        std = train_data.std(axis=0) #1-1
    else:
        mean = train_mean #1-2
        std = train_std #1-2
    data = (data - mean) / std #1-3
    ## Code End ##
    # input, generating target data
    sequence = []
    target = []
    for index in range(len(data) - seq_length - target_delay):
        if index % strides == 0:
            ## Code start ##
            sequence.append(index - seq_length) #2-1
            target.append(index + seq_length + target_delay) #2-2
            ## Code End ##
    if mode is 'train':
        print(np.array(sequence).shape, np.array(target).shape)
        return np.array(sequence), np.array(target), mean, std
    else:
        print(np.array(sequence).shape, np.array(target).shape)
        return np.array(sequence), np.array(target)
Please write only the part between "## code start ##" and "## code end ##".
1-1: In train mode, compute the mean and std of the dataset in the train directory (train_data). Pay attention to the axis argument when calling the functions.
1-2: In val or test mode, use train_mean and train_std.
1-3: Write the normalization code. (The formula is (x - E(x)) / sqrt(Var(x)).)
E(x) is the mean of the data and sqrt(Var(x)) is the standard deviation of the data (std).
Loop over every time step (range(len(data) - seq_length - target_delay)) with a for statement, and save a sample to the training dataset only when the index (variable name index) is divisible by strides.
2-1: If strides=3, data is added to sequence and target only when the index is 0, 3, 6, 9, ... That is, the index is the first index of each sample. Therefore, slice seq_length (480 hours) rows out of the data variable, starting at the index, and append them to sequence. This becomes the input data passed to the model.
2-2: Append to target the temperature (column 0) at the position that lies a further target_delay (i.e., 24 hours) beyond the point seq_length past the index.
The shape of sequence should be (N, seq_length, 9) and the shape of target should be (N,) once both are converted to NumPy arrays. N is the total number of samples.
I wrote the code between start and end myself. However, when I run the code above, the shape of sequence does not come out as expected.
Please help me correct the part between "## Code start ##" and "## Code End ##" so that I get the right result. Thank you for reading this long post.
The data in the data variable is sliced from the index for seq_length (480 hours) rows and appended to sequence:
--> data[index:index+seq_length]
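Putting that together, here is a minimal sketch of what the two marked regions could look like. It assumes data is a 2-D NumPy array of shape (time steps, 9) with the temperature in column 0, as the question describes. The helper name normalize_and_window and the random demo array are made up for illustration and are not part of the assignment. Two things differ from the posted attempt: in train mode the statistics are taken from data itself (train_data is not defined inside the function), and slices of data are appended instead of index arithmetic, which is why the original sequence came out with shape (N,).

```python
import numpy as np


def normalize_and_window(data, seq_length=480, target_delay=24, strides=5,
                         mean=None, std=None):
    """Hypothetical helper mirroring the two '## Code start ##' regions."""
    # 1-1 / 1-2: per-column statistics; axis=0 reduces over the time steps.
    # If mean/std are passed in (val/test case), they are used as given.
    if mean is None or std is None:
        mean = data.mean(axis=0)
        std = data.std(axis=0)
    # 1-3: normalize every column.
    data = (data - mean) / std

    sequence, target = [], []
    for index in range(len(data) - seq_length - target_delay):
        if index % strides == 0:
            # 2-1: seq_length rows starting at index -> shape (seq_length, 9)
            sequence.append(data[index:index + seq_length])
            # 2-2: temperature (column 0) at index + seq_length + target_delay
            target.append(data[index + seq_length + target_delay, 0])
    return np.array(sequence), np.array(target), mean, std


# Demo on random data standing in for the real dataset (an assumption).
rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 9))
seq, tgt, mean, std = normalize_and_window(demo)
print(seq.shape, tgt.shape)  # (100, 480, 9) (100,)
```

With 1000 rows, the loop runs over range(496) and keeps every fifth index, which gives N = 100 samples. In the original function, the same two append lines would go between the second pair of markers, and data.mean(axis=0) / data.std(axis=0) would replace the train_data calls.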