TF.data reborn from the ashes

Photo credit: pexels.com

TensorFlow data: FINALLY, YOU GOT IT RIGHT!!!

  • tf.data.Iterator provides the main way to extract elements from a dataset. The operation returned by Iterator.get_next() yields the next element of a Dataset when executed, and typically acts as the interface between input-pipeline code and your model. The simplest iterator is a “one-shot iterator”, which is associated with a particular Dataset and iterates through it once. For more sophisticated uses, the Iterator.initializer operation enables you to reinitialize and parameterize an iterator with different datasets, so that you can, for example, iterate over training and validation data multiple times in the same program.
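To make this concrete, here is a minimal sketch of the one-shot iterator pattern described above. It uses the TF 1.x-style graph API (available as tf.compat.v1 in TF 2.x); in TF 2.x eager mode you can simply loop over the dataset with a plain Python for loop instead.

import tensorflow as tf

# Sketch of the one-shot iterator pattern, assuming the TF 1.x-style graph API.
tf.compat.v1.disable_eager_execution()

dataset = tf.data.Dataset.range(5)

# A one-shot iterator is tied to this particular Dataset and walks through it once.
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
next_element = iterator.get_next()  # op that yields the next element when executed

with tf.compat.v1.Session() as sess:
    for _ in range(5):
        print(sess.run(next_element))  # 0, 1, 2, 3, 4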

Inner workings

  • .shuffle() — shuffles the order of the elements in the dataset.
  • .batch() — groups elements into batches; if you have 1,000 images in total, instead of feeding all of them to your model at once you can feed it, say, 64 images at a time, as in the sketch below.
  • .repeat() — repeats the elements in the dataset (e.g. for multiple epochs).
  • .prefetch() — fetches the next batch of elements while the previous batch is still being processed.
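Putting the four methods together, here is a minimal pipeline sketch, assuming TF 2.x eager execution; the 1,000 random "images" are just toy stand-ins for real data.

import tensorflow as tf

# Toy in-memory dataset: 1,000 random "images" with integer labels.
images = tf.random.uniform([1000, 28, 28, 1])
labels = tf.random.uniform([1000], maxval=10, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(buffer_size=1000)                 # shuffle the dataset order
    .batch(64)                                 # feed the model 64 images at a time
    .repeat()                                  # repeat for multiple epochs
    .prefetch(tf.data.experimental.AUTOTUNE)   # prepare the next batch in the background
)

for batch_images, batch_labels in dataset.take(2):
    print(batch_images.shape, batch_labels.shape)  # (64, 28, 28, 1) (64,)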

Datasets v1.0.1

And now, the long-awaited tensorflow_datasets (tfds) module for tf.data:
tfds.list_builders()
mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN)
Every dataset ships with a builder that knows:

  • what the dataset looks like (i.e. its features);
  • how the data should be split (e.g. TRAIN and TEST);
  • and the individual records in the dataset (all inspectable as in the sketch below).
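A minimal loading sketch, assuming tensorflow_datasets is installed and MNIST can be downloaded automatically:

import tensorflow_datasets as tfds

# Load the train split of MNIST together with its metadata.
mnist_train, info = tfds.load(name="mnist", split=tfds.Split.TRAIN, with_info=True)

print(info.features)   # what the dataset looks like (its features)
print(info.splits)     # how the data is split (e.g. TRAIN and TEST)

for example in mnist_train.take(1):   # the individual records in the dataset
    image, label = example["image"], example["label"]
    print(image.shape, label.numpy())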

Adding a Dataset

DatasetBuilder exposes two abstract methods:

  • _download_and_prepare: downloads and serializes the source data to disk
  • _as_dataset: produces a tf.data.Dataset from the serialized data

Most datasets instead subclass GeneratorBasedBuilder, which implements those two methods for you and only asks for the two methods sketched below:

  • _split_generators: downloads the source data and defines the dataset splits
  • _generate_examples: yields examples in the dataset from the source data
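Below is a minimal, hypothetical builder sketch. The class name, download URL, feature layout, and the load_annotations helper are all made up for illustration, and the exact signatures may differ slightly between TFDS versions.

import tensorflow_datasets as tfds

class MyDataset(tfds.core.GeneratorBasedBuilder):
    """A toy image-classification dataset (hypothetical)."""

    VERSION = tfds.core.Version("1.0.0")

    def _info(self):
        # What the dataset looks like: its features.
        return tfds.core.DatasetInfo(
            builder=self,
            description="A toy dataset of images and labels.",
            features=tfds.features.FeaturesDict({
                "image": tfds.features.Image(),
                "label": tfds.features.ClassLabel(num_classes=10),
            }),
        )

    def _split_generators(self, dl_manager):
        # Downloads the source data and defines the dataset splits.
        path = dl_manager.download_and_extract("https://example.com/data.zip")
        return [
            tfds.core.SplitGenerator(
                name=tfds.Split.TRAIN,
                gen_kwargs={"data_dir": path},
            ),
        ]

    def _generate_examples(self, data_dir):
        # Yields the individual examples from the source data.
        # load_annotations is a hypothetical helper returning (image_path, label) pairs.
        for key, (image_path, label) in enumerate(load_annotations(data_dir)):
            yield key, {"image": image_path, "label": label}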

Manual download and extraction

For source data that cannot be downloaded automatically (for example, because it requires a login), the user must manually download the source data and place it in manual_dir, which you can access with dl_manager.manual_dir (defaults to ~/tensorflow_datasets/manual/my_dataset).
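A hypothetical sketch of how this looks inside _split_generators; the class name and archive file name are made up, and the rest of the builder is omitted for brevity.

import os
import tensorflow_datasets as tfds

class MyManualDataset(tfds.core.GeneratorBasedBuilder):
    # _info and _generate_examples omitted; see the builder sketch above.

    def _split_generators(self, dl_manager):
        # The user has already placed the archive in manual_dir
        # (defaults to ~/tensorflow_datasets/manual/my_dataset).
        archive_path = os.path.join(dl_manager.manual_dir, "archive.zip")
        extracted_path = dl_manager.extract(archive_path)
        return [
            tfds.core.SplitGenerator(
                name=tfds.Split.TRAIN,
                gen_kwargs={"data_dir": extracted_path},
            ),
        ]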

Summary

Things to remember:

  • By subclassing DatasetBuilder or GeneratorBasedBuilder you can create your own custom dataset and submit it to TensorFlow's datasets collection, where it becomes available to everyone.

This article is part of a series in which I share the highlights of the TF Dev Summit ’19, plus a surprise spin-off article about a new AI race.

Prince Canuma

If you want to learn about AI/ML 🦾 and MLOps, follow me! I’m a Data Scientist 👨🏽‍💻 and Developer Advocate at http://neptune.ai.