Yahoo! Labs


Reminder

Early registration to KDD'11 ends on June 30th. We hope to see you there at the
KDD-Cup workshop and the winners presentation.


Questions?

Visit our Yahoo! Group.


Tasks

The competition is divided into two tracks:

Track1: Learning to predict users' ratings of musical items. Items can be tracks, albums, artists and genres. Items form a hierarchy, such that each track belongs to an album, albums belong to artists, and together they are tagged by genres.

Track2: Learning to separate tracks scored highly by specific users from tracks not scored by them. In Track2 the test set includes six items per user (all are tracks), three of which were rated highly (score 80 or higher) by the user and three of which were not rated by the user. The three unrated items are sampled with probability proportional to the number of high (>=80) ratings they received. The task is to classify each item as either rated or not rated by the user (1 or 0 respectively). A hierarchy of items similar to the one used in Track1 is also given for Track2. However, timestamps of the ratings, which are given in Track1, are withheld for Track2.
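The sampling of unrated items described for Track2 can be illustrated with a minimal sketch. It is not the organizers' code: the function name, the (user, track, score) tuple layout, and the choice to sample without replacement are all assumptions made for illustration.

```python
import random
from collections import Counter

HIGH_SCORE = 80  # "rated highly" threshold from the task description


def sample_unrated_tracks(ratings, user, k=3, rng=random):
    """Pick up to k tracks the user never rated, weighted by high-rating counts.

    ratings: iterable of (user_id, track_id, score) tuples (hypothetical layout).
    Sampling without replacement is an assumption; the task text does not say.
    """
    ratings = list(ratings)
    # Number of ratings >= 80 that each track received from any user.
    high_counts = Counter(t for _, t, s in ratings if s >= HIGH_SCORE)
    rated_by_user = {t for u, t, _ in ratings if u == user}
    candidates = {t: c for t, c in high_counts.items() if t not in rated_by_user}

    chosen = []
    for _ in range(min(k, len(candidates))):
        tracks = list(candidates)
        weights = [candidates[t] for t in tracks]
        pick = rng.choices(tracks, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # never pick the same track twice
    return chosen
```

A consequence of this scheme is that popular tracks appear often among the unrated items, so popularity alone is a weak signal for telling the two classes apart.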

You can compete in one or both tracks. The targets on test sets are not given. The goal is to train a model on the training set and to predict the relevant targets for each item on the test set.

Evaluation

The test sets for Track1 and Track2 are each divided into two disjoint sets of equal size: Test1 and Test2. Examples in Test1 are used for calculating the scores shown on the Leaderboard: RMSE for Track1 and Error rate for Track2. The examples in Test2 are reserved for choosing the winners of the competition. Hence, team rankings on the Leaderboard may differ from the final results, which will be calculated on Test2.

Submissions for Track1 will be evaluated using Root Mean Square Error (RMSE) and submissions of Track2 will be evaluated using Error rate (fraction of misclassifications).
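The two metrics can be written out as a short sketch. The function names are illustrative and are not part of any official evaluation tool.

```python
import math


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and true ratings (Track1)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def error_rate(predicted, actual):
    """Fraction of misclassified items, with labels 0 or 1 (Track2)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)
```

Both quantities measure error, so lower values indicate better predictions.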

Teams

You may participate in the competition as an individual or as part of a team of up to 10 participants. You may only be involved in one "team" in the competition (i.e. either as an individual or as a member of a single team, but not both). For team registrations, the team must select a team leader, who will provide the team name, and the names, email addresses and affiliations of all members of the team during registration. Team members are required to have a Yahoo! login, which will be used to log into the competition site. The names and e-mail addresses of team members will not be publicly displayed during the competition. If you are an individual winner or member of a winning team, your name will be announced following the competition.

Submission Formats

For each of Track1 and Track2, you will be required to upload one file containing the predictions for the respective test set. The submission formats are designed to keep file sizes small.

Track1: Item scores are restricted to 256 values, encoded in the range [0, 255]. Each score should be translated to its corresponding unsigned byte-code. Since predictions are limited to the range [0, 100], they are encoded by multiplying them by 2.55. Hence, a predicted rating of 100 would be encoded as 255 (0xFF), and a predicted rating of 80 would be encoded as 204 (0xCC). All byte-codes that correspond to test set examples should be written to the submission file consecutively, i.e. with no separator between byte-codes. The order of the byte-codes must correspond to the order specified in the test set. We provide a Tool for converting textual predictions into the binary submission format. The tool also performs several sanity checks on the validity of the submission, so we recommend using it.
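The Track1 encoding can be sketched as follows. This is not the provided Tool: the function name is hypothetical, and rounding to the nearest integer is an assumption, since the description only says that predictions are multiplied by 2.55.

```python
def encode_track1(predictions):
    """Encode ratings in [0, 100] as one unsigned byte each, in test set order."""
    encoded = bytearray()
    for p in predictions:
        if not 0 <= p <= 100:
            raise ValueError(f"prediction out of range [0, 100]: {p!r}")
        # p * 255 / 100 equals p * 2.55; rounding to nearest is an assumption.
        encoded.append(round(p * 255 / 100))
    return bytes(encoded)
```

The result would be written in binary mode with no separators, for example `open("track1_submission.bin", "wb").write(encode_track1(preds))`, where the file name and `preds` are placeholders.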

Track2: Item scores are restricted to '0' (0x30) and '1' (0x31), standing for unrated and rated highly by the user, respectively. All scores that correspond to the test set should be written to the submission file consecutively, i.e. with no separator or newline between item scores. The order of the predictions must correspond to their order in the test set. For each user, exactly three of the predicted ratings must equal '1', while the other three must be set to '0'.
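One way to satisfy the three-per-user constraint is to mark the three highest-scoring items in each group of six. The sketch below assumes a flat list of real-valued model scores in test set order, six consecutive entries per user; the function name and this input layout are illustrative only.

```python
ITEMS_PER_USER = 6
POSITIVES_PER_USER = 3


def encode_track2(scores):
    """Turn per-item scores into the '0'/'1' byte string for Track2."""
    if len(scores) % ITEMS_PER_USER != 0:
        raise ValueError("expected six scores per user")
    out = bytearray()
    for start in range(0, len(scores), ITEMS_PER_USER):
        group = scores[start:start + ITEMS_PER_USER]
        # Positions of the three highest scores; ties keep test set order.
        ranked = sorted(range(ITEMS_PER_USER), key=lambda i: group[i], reverse=True)
        top = set(ranked[:POSITIVES_PER_USER])
        # 0x31 is '1' (rated highly), 0x30 is '0' (unrated).
        out.extend(0x31 if i in top else 0x30 for i in range(ITEMS_PER_USER))
    return bytes(out)
```

Thresholding each score independently could yield two or four positives for some users, which the format does not allow; ranking within each user's six items avoids that.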

Participants can upload multiple submissions over the course of the Challenge, but at most one submission every 8 hours for each of Track1 and Track2.

The performance of the submission on a fixed subset of the test set (aka Test1) will be posted on the Leaderboard; but the results on Test2, which will be used to choose the winners, will not be disclosed until the end of the competition.

Final Submission

For both Track1 and Track2, only the last submission counts toward determining the winners of the competition. The rules below apply to Track1 and Track2 independently.

At the end of the competition, the final submission of each participant will be ranked by its score (defined above) computed on Test2. Since both RMSE and Error rate measure error, lower values rank higher. The top three ranks will receive the following cash prizes:

  • 1st place: $5,000
  • 2nd place: $2,000
  • 3rd place: $1,000

In the event that the first, second or third rank is obtained by more than one team, the corresponding prize will be divided equally among these teams.

As a condition for receiving any prize, each prospective winning team is required to submit a manuscript describing its algorithm and the methods used to generate its output. In addition, the winners of both Track1 and Track2 will be invited to give a brief talk describing their winning methods at the KDD Cup workshop at the upcoming KDD conference, on August 21, 2011 in San Diego (http://www.kdd.org/kdd2011/kddcup.shtml). Winners will be responsible for their own transportation, lodging and workshop registration, as these costs are not included in the prize packages.

To view the full Official Rules of the Contest click here.