Deep learning in Python using TensorFlow
I am currently using tf.data.experimental.CsvDataset to load the training dataset from a CSV file. Each CSV file is created before TensorFlow starts, but each file is larger than 100 GB, which causes problems with disk space and data transfer. So instead of pre-creating the CSV file, I'm looking for a way to build a list containing the same information inside the program and convert that list into a training dataset.
Below is the relevant part of the code I am currently using.
import tensorflow as tf
import os
import sys

os.environ['CUDA_VISIBLE_DEVICES'] = "0"  # Specified because I am using a PC with multiple GPUs.

outfn = "stackoverflow.csv"
dataset = tf.data.experimental.CsvDataset(outfn, [tf.string, tf.string])

# The following is for confirmation.
print(dataset)
for element in dataset.as_numpy_iterator():
    print(element)
    break
The stackoverflow.csv file contains two string columns separated by a comma.
Example:
aaaaaaaaaa1,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaa1,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaa2, ccccccccccc1
(In reality, each string is about 1,000 characters long, and each file has between 1 million and 100 million lines.)
Ideally, without using tf.data.experimental.CsvDataset as above, I'm looking for a way to build the stackoverflow.csv information directly on my PC and store it in a dataset.
If anyone knows how to do this, I would appreciate it if you could let me know.
Thank you for your cooperation.
Sorry, I solved it myself.
I used from_tensor_slices.
As a note to myself, the steps are listed below.
The solution was not List -> tensorflow.Dataset directly; it was List -> pandas.DataFrame -> tensorflow.Dataset.
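The List -> pandas.DataFrame step is not shown below, so here is a minimal sketch, assuming the data built in the program is a list of (input, target) string tuples; the list name rows and its contents are hypothetical placeholders:

import pandas as pd

# Hypothetical in-memory data built by the program instead of being
# written out to a CSV file first.
rows = [
    ("aaaaaaaaaa1", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"),
    ("aaaaaaaaaaa2", "ccccccccccc1"),
]
df = pd.DataFrame(rows, columns=['input', 'target'])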
For example, the pandas.DataFrame -> tensorflow.Dataset part is:
import pandas as pd
import tensorflow as tf

df = pd.read_csv("stackoverflow.csv", header=None)
df.columns = ['input', 'target']
input = df["input"]
target = df["target"]
dataset2 = tf.data.Dataset.from_tensor_slices((input.values, target.values))
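For confirmation, the same check as in the question works on dataset2:

print(dataset2)
for element in dataset2.as_numpy_iterator():
    print(element)
    break

Note that from_tensor_slices holds the entire dataset in memory, so for the 100-million-line case the data still has to fit in RAM.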