r/MachineLearning

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

Hello everyone.

The new dataset is named MONET. It is Apache 2.0-licensed and available on Hugging Face:

https://huggingface.co/datasets/jasperai/monet

MONET is an open, Apache 2.0-licensed image–text dataset. It was built from a pool of 2.9 billion images, refined down to 104.9 million high-quality samples.
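
If you want to peek at the data without downloading all of it, here is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The repo id comes from the link above. The `train` split name is an assumption (check the dataset card, which may also require a config name), and `peek` is just a hypothetical helper that returns raw records without assuming any column names. The snippet also works out the filtering ratio from the two numbers quoted above.

```python
from itertools import islice

REPO_ID = "jasperai/monet"  # from the Hugging Face link in the post

# Figures quoted in the post.
SOURCE_IMAGES = 2_900_000_000
KEPT_SAMPLES = 104_900_000


def retention_rate(kept: int, source: int) -> float:
    """Fraction of the source pool that survived curation."""
    return kept / source


def peek(n: int = 3, split: str = "train"):
    """Return the first n records, streamed so nothing is fully downloaded.

    The split name "train" is an assumption; see the dataset card.
    """
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(ds, n))
```

Calling `peek()` needs network access. `retention_rate(KEPT_SAMPLES, SOURCE_IMAGES)` comes out to roughly 3.6%, so only about 1 image in 28 from the source pool was kept.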

We are also publishing a paper that explains how the dataset was created, if you are curious, along with 3 companion projects.

Hope this will be useful!

submitted by /u/dh7net