How to create your own TensorFlow dataset from a list

Asked 2 years ago, Updated 2 years ago, 50 views

Deep learning in Python using TensorFlow

I am currently using tf.data.experimental.CsvDataset to load the training dataset from CSV files. Each CSV file is created before TensorFlow starts, but each file is larger than 100 GB, which causes problems with disk space and data transfer. So instead of creating the CSV files in advance, I am looking for a way to build a list containing the same information inside the program and convert that list into a training dataset.

Below is part of the code I am currently using.

import os
import sys
# Set before importing TensorFlow so the GPU selection is in place when it initializes.
os.environ['CUDA_VISIBLE_DEVICES'] = "0"  # Specified because I am using a PC with multiple GPUs.
import tensorflow as tf

outfn = "stackoverflow.csv"
dataset = tf.data.experimental.CsvDataset(outfn, [tf.string, tf.string])
# The following is for confirmation.
print(dataset)
for element in dataset.as_numpy_iterator():
  print(element)
  break

stackoverflow.csv contains two columns of strings separated by commas.
Example:

aaaaaaaaaa1,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaa1,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaa2, ccccccccccc1

(In practice, each string is about 1,000 characters long, and the file has between 1 million and 100 million lines.)

Ideally, I would like to build the same information as stackoverflow.csv in memory on my PC and store it in dataset, without going through tf.data.experimental.CsvDataset.

If anyone knows how to do this, I would appreciate it if you could let me know.
Thank you for your cooperation.

python tensorflow

2022-09-30 13:50

1 Answer

Sorry, I solved it myself.
I used from_tensor_slices.
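
As a minimal sketch of what tf.data.Dataset.from_tensor_slices does with plain Python lists: the two lists below are hypothetical stand-ins for the two CSV columns, and the names inputs, targets and list_dataset are assumptions made for illustration, not taken from the original program.

```python
import tensorflow as tf

# Hypothetical in-memory data standing in for the two CSV columns.
inputs = ["aaaaaaaaaa1", "aaaaaaaaaa1", "aaaaaaaaaa2"]
targets = ["bbbbbbbbbb1", "bbbbbbbbbb2", "cccccccccc1"]

# Each dataset element is an (input, target) pair of scalar string tensors,
# sliced along the first dimension of the two lists.
list_dataset = tf.data.Dataset.from_tensor_slices((inputs, targets))

# Confirm the first element, as in the question's code.
for element in list_dataset.as_numpy_iterator():
    print(element)
    break
```

The two lists must have the same length, because each position becomes one training pair.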

As a note for future reference, the steps are listed below.

The solution was to go List -> pandas.DataFrame -> tf.data.Dataset, rather than List -> tf.data.Dataset directly.
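
The full List -> pandas.DataFrame -> tf.data.Dataset path, with no CSV file involved, might look roughly like the sketch below. The rows list and the names frame and frame_dataset are hypothetical, and to_numpy(dtype=object) is used here on the assumption that handing TensorFlow an object array of Python strings is the safest conversion across pandas versions.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical rows built inside the program instead of being written to a CSV file.
rows = [
    ("aaaaaaaaaa1", "bbbbbbbbbb1"),
    ("aaaaaaaaaa1", "bbbbbbbbbb2"),
    ("aaaaaaaaaa2", "cccccccccc1"),
]

# List -> DataFrame, using the same column names as the answer.
frame = pd.DataFrame(rows, columns=["input", "target"])

# DataFrame -> Dataset of (input, target) string pairs.
frame_dataset = tf.data.Dataset.from_tensor_slices(
    (
        frame["input"].to_numpy(dtype=object),
        frame["target"].to_numpy(dtype=object),
    )
)

for element in frame_dataset.as_numpy_iterator():
    print(element)
    break
```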

For example, the pandas.DataFrame -> tf.data.Dataset part looks like this:

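import pandas as pd
import tensorflow as tf

# Load the two-column CSV (no header row) and name the columns.
df = pd.read_csv("stackoverflow.csv", header=None)
df.columns = ['input', 'target']
inputs = df["input"]    # renamed from `input` to avoid shadowing the built-in
targets = df["target"]

# Build a dataset of (input, target) pairs from the column values.
dataset2 = tf.data.Dataset.from_tensor_slices((inputs.values, targets.values))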
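
from_tensor_slices needs the whole list in memory, which may be a problem at the sizes mentioned in the question. One possible alternative, which is not what was used above, is tf.data.Dataset.from_generator, which produces rows on demand. The sketch below assumes a hypothetical generate_pairs function in place of whatever logic used to write the CSV file; the name streamed_dataset is also an assumption.

```python
import tensorflow as tf


def generate_pairs():
    """Hypothetical generator; replace the body with the logic that produced the CSV rows."""
    for i in range(5):
        yield f"input-{i}", f"target-{i}"


# output_signature tells TensorFlow that each element is a pair of scalar strings.
streamed_dataset = tf.data.Dataset.from_generator(
    generate_pairs,
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.string),
        tf.TensorSpec(shape=(), dtype=tf.string),
    ),
)

for element in streamed_dataset.as_numpy_iterator():
    print(element)
    break
```

Rows are created only as the dataset is iterated, so neither a CSV file nor a full in-memory list is required. The trade-off is that the generator runs in Python, which can be slower than reading preprocessed data.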


2022-09-30 13:50



© 2024 OneMinuteCode. All rights reserved.