The following is a non-exhaustive list of datasets that are relevant for the fashionXrecsys workshop. Participants presenting work on any of these datasets will automatically be part of the workshop's challenge track. If there is a public dataset that you think should be added to the list, please contact the organizing committee.
- Clothing Fit Dataset for Size Recommendation:
https://www.kaggle.com/rmisra/clothing-fit-dataset-for-size-recommendation
Description: Product size recommendation and fit prediction are critical for improving customers’ shopping experiences and reducing product return rates. However, modeling customers’ fit feedback is challenging because of its subtle semantics, which arise from the subjective evaluation of products, and because of the imbalanced label distribution (most feedback is "Fit"). These datasets, collected from ModCloth and RentTheRunWay, are the only fit-related datasets publicly available at this time, and they could be used to address these challenges and improve the recommendation process.
- Large-scale Fashion (DeepFashion):
http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
Description: DeepFashion is a large-scale clothes database that contains over 800,000 diverse fashion images, ranging from well-posed shop images to unconstrained consumer photos. DeepFashion is annotated with rich information about clothing items. Each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, a bounding box and clothing landmarks. DeepFashion also contains over 300,000 cross-pose/cross-domain image pairs.
- DeepFashion2 dataset:
https://github.com/switchablenorms/DeepFashion2
Description: DeepFashion2 is a comprehensive fashion dataset. It contains 491K diverse images of 13 popular clothing categories from both commercial shopping stores and consumers. It has 801K clothing items in total, where each item in an image is labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks and per-pixel mask. There are also 873K Commercial-Consumer clothes pairs.
- Street2Shop:
http://www.tamaraberg.com/street2shop/
Description: Street2Shop has 20,357 labeled images of clothing worn by people in the real world, and 404,683 images of clothing from shopping websites. The dataset contains 39,479 pairs of exactly matching items worn in street photos and shown in shop images.
- Fashionista:
http://vision.is.tohoku.ac.jp/~kyamagu/research/clothing_parsing/
Description: Fashionista is a novel dataset to study clothes parsing, containing 158,235 fashion photos with associated text annotations.
- Paperdoll:
http://vision.is.tohoku.ac.jp/~kyamagu/research/paperdoll/
Description: The Paper Doll dataset is a large collection of tagged fashion pictures with no manual annotation. It contains over 1 million pictures from chictopia.com with associated metadata tags denoting characteristics such as color, clothing item, or occasion.
- Fashion MNIST:
https://github.com/zalandoresearch/fashion-mnist
Description: Fashion-MNIST is a dataset of Zalando’s article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
- ModaNet dataset:
https://github.com/eBay/modanet
Description: ModaNet is a street-fashion image dataset consisting of annotations for RGB images. It provides multiple polygon annotations for each image.
- iMaterialist-Fashion:
https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6
Description: The dataset contains over 50K clothing images labeled for fine-grained segmentation.
- Women's E-Commerce dataset:
https://github.com/NadimKawwa/WomeneCommerce
Description: This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supporting features make it possible to analyze the review text along multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”.
- Amazon Reviews dataset:
http://jmcauley.ucsd.edu/data/amazon/links.html
Description: This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014. It includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).
- Fashion Product Images Dataset:
https://www.kaggle.com/paramaggarwal/fashion-product-images-dataset
Description: In addition to professionally shot high-resolution product images, the dataset contains multiple label attributes describing each product, which were manually entered during cataloging. The dataset also contains descriptive text that comments on the product characteristics.
- Brazilian E-Commerce Public Dataset by Olist:
https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_products_dataset.csv
Description: The dataset has information on 100k orders made from 2016 to 2018 at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and finally reviews written by customers. Because the dataset contains real commercial data, it has been anonymised, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.
- Flipkart products dataset:
https://www.kaggle.com/PromptCloudHQ/flipkart-products
Description: This is a pre-crawled dataset, taken as a subset of a bigger dataset (more than 5.8 million products) that was created by extracting data from Flipkart.com, a leading Indian eCommerce store.
- Fashion Takes Shape:
https://www.groundai.com/project/fashion-is-taking-shape-understanding-clothing-preference-based-on-body-shape-from-online-sources/1
Description: The dataset includes more than 18,000 images with metadata including clothing category, and a manual shape annotation indicating whether the person’s shape is above average or average. The data comprises 181 different users from chictopia. Using a multi-photo method, the authors estimated the shape of each user, which allowed them to study the relationship between clothing categories and body shape. In particular, they compute the conditional distribution of clothing category conditioned on body shape parameters.
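As a rough illustration of the conditional distribution mentioned in the last entry, the sketch below estimates P(category | shape group) by counting. It is not the authors' method: it conditions on the binary shape annotation (average vs. above average) rather than on estimated body shape parameters, and the records, the group names and the `conditional_distribution` helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (shape_group, clothing_category).
# These values are invented for illustration, not taken from the dataset.
records = [
    ("average", "dress"),
    ("average", "dress"),
    ("average", "jeans"),
    ("average", "skirt"),
    ("above_average", "dress"),
    ("above_average", "cardigan"),
    ("above_average", "cardigan"),
    ("above_average", "jeans"),
]


def conditional_distribution(pairs):
    """Return {shape_group: {category: P(category | shape_group)}}."""
    counts = defaultdict(Counter)
    for shape, category in pairs:
        counts[shape][category] += 1

    dist = {}
    for shape, cat_counts in counts.items():
        total = sum(cat_counts.values())
        # Normalize the counts within each shape group so they sum to 1.
        dist[shape] = {cat: n / total for cat, n in cat_counts.items()}
    return dist


dist = conditional_distribution(records)
for shape, probs in sorted(dist.items()):
    for cat, p in sorted(probs.items()):
        print(f"P({cat} | {shape}) = {p:.2f}")
```

Comparing the rows for the two groups shows which categories are over- or under-represented for a given body shape; with real data, the same counting would run over the per-image metadata.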