
Dataset I for Popularity Prediction

Micro-video dataset I was crawled from Vine, one of the most prominent micro-video sharing social networks. Besides the historically uploaded micro-videos, Vine also archives users' profiles and their social connections. The dataset can be downloaded here.

Dataset II for Venue Category Estimation

We crawled the micro-videos from Vine through its public API. In particular, we first manually chose a small set of active users as seeds. We then expanded the user set by incrementally gathering the seed users’ followers. With this user set, we crawled the published videos, their descriptions, and, where available, their venue information. From the overall crawled set, we picked out about 24,000 micro-videos containing Foursquare check-in information. After removing duplicate venue IDs, we further expanded our video set by crawling all videos under each venue ID through the API. This yielded a dataset of 276,264 videos distributed over 442 Foursquare venue categories, with the corresponding IDs serving as the ground truth. We observed that the category distribution is heavily unbalanced. In particular, several categories contain too few micro-videos to train a robust classifier. We hence removed the leaf categories with fewer than 50 micro-videos. In the end, we obtained 270,145 micro-videos distributed over 188 Foursquare venue categories. The dataset can be downloaded here.
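
The last filtering step, removing leaf categories with fewer than 50 micro-videos, could be sketched roughly as follows. This is only an illustration and not the original preprocessing code: the function name `filter_rare_categories` and the video-to-category dictionary are assumptions made for this example, and the released files may be organized differently.

```python
from collections import Counter


def filter_rare_categories(video_to_category, min_videos=50):
    """Drop videos whose category has fewer than `min_videos` videos.

    `video_to_category` is a hypothetical dict mapping a video ID to its
    leaf venue category; it is not the dataset's actual file format.
    """
    # Count how many videos fall into each category.
    counts = Counter(video_to_category.values())
    # Keep only the categories that reach the threshold.
    kept = {cat for cat, n in counts.items() if n >= min_videos}
    return {vid: cat for vid, cat in video_to_category.items() if cat in kept}


# Toy example: 60 videos in one category, 3 in another.
toy = {f"v{i}": "Park" for i in range(60)}
toy.update({f"w{i}": "Pier" for i in range(3)})
filtered = filter_rare_categories(toy)
```

In the toy example, only the videos of the larger category survive the filter, which mirrors how the dataset shrinks from 442 to 188 categories while losing comparatively few videos.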


Dataset III for Micro-video Routing


Dataset III-1. The first dataset was released by the Kuaishou Competition at the ChinaMM 2018 conference, which aims to infer users' click probabilities for new micro-videos. This dataset contains multiple types of interactions between users and micro-videos, such as click, not click, like, and follow. The dataset can be downloaded here.

Dataset III-2. The second dataset is constructed for micro-video click-through prediction. It consists of 10,986 users, 1,704,880 micro-videos, and 12,737,619 interactions. In this dataset, each micro-video is represented by a 512-d visual embedding extracted from its thumbnail and is associated with a category label, and each user's behavior is linked with a processed timestamp. The dataset can be downloaded here.
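
To make the description of Dataset III-2 more concrete, the following sketch shows one way its records might be modeled and how sparse the interaction data is. The class and field names (`MicroVideo`, `Interaction`, `video_id`, and so on) are assumptions chosen for illustration and do not reflect the released file format; only the 512-d embedding size and the three counts come from the description above.

```python
from dataclasses import dataclass
from typing import List

EMBEDDING_DIM = 512  # embedding size stated in the dataset description


@dataclass
class MicroVideo:
    video_id: int                  # hypothetical field name
    category: int                  # category label of the micro-video
    visual_embedding: List[float]  # embedding extracted from the thumbnail

    def __post_init__(self):
        # Reject embeddings that do not match the stated dimensionality.
        if len(self.visual_embedding) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} values")


@dataclass
class Interaction:
    user_id: int    # hypothetical field name
    video_id: int   # hypothetical field name
    timestamp: int  # processed timestamp; the exact encoding is not specified


# Counts taken from the dataset description.
NUM_USERS = 10_986
NUM_VIDEOS = 1_704_880
NUM_INTERACTIONS = 12_737_619

# Average number of interactions per user.
interactions_per_user = NUM_INTERACTIONS / NUM_USERS
# Upper bound on the fraction of user-video pairs with an interaction,
# assuming every interaction involves a distinct user-video pair.
density = NUM_INTERACTIONS / (NUM_USERS * NUM_VIDEOS)
```

Under these assumptions, each user has on average somewhat more than a thousand interactions, yet well under 0.1% of all possible user-video pairs are observed.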
