Shared Task 2: Commonsense inference in news articles

The second track uses the ReCoRD dataset (Zhang et al., 2018). ReCoRD mines its prompts, questions, and answers from news articles in order to collect data efficiently and reduce human elicitation bias. Named entities in the text are annotated, and each prompt comes with brief bullet points that summarize it. The task is cloze-style: a system must fill in the blank in a query sentence related to the news article.

Example (from Zhang et al.):

Prompt: (CNN) -- A lawsuit has been filed claiming that the iconic Led Zeppelin song "Stairway to Heaven" was far from original. The suit, filed on May 31 in the United States District Court Eastern District of Pennsylvania, was brought by the estate of the late musician Randy California against the surviving members of Led Zeppelin and their record label. The copyright infringement case alleges that the Zeppelin song was taken from the single "Taurus" by the 1960s band Spirit, for whom California served as lead guitarist. "Late in 1968, a then new band named Led Zeppelin began touring in the United States, opening for Spirit," the suit states. "It was during this time that Jimmy Page, Led Zeppelin's guitarist, grew familiar with 'Taurus' and the rest of Spirit's catalog. Page stated in interviews that he found Spirit to be 'very good' and that the band's performances struck him 'on an emotional level.' "

Summary points:

Query: According to claims in the suit, "Parts of 'Stairway to Heaven,' instantly recognizable to the music fans across the world, sound almost identical to significant portions of '________.'"
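
To make the cloze format concrete, the following Python sketch fills the blank in the query above with one candidate entity at a time. It is only an illustration: the blank marker string, the function name fill_cloze, and the hand-picked candidate list are assumptions made for this example, and the released data files may encode the blank and the annotated entities differently (see the readme).

```python
# Illustrative sketch only. BLANK, fill_cloze and the candidate list are
# assumptions for this example, not the format of the released data.
BLANK = "________"


def fill_cloze(query: str, candidate: str, blank: str = BLANK) -> str:
    """Return the query with its first blank replaced by the candidate."""
    if blank not in query:
        raise ValueError("query has no blank to fill")
    return query.replace(blank, candidate, 1)


query = (
    "According to claims in the suit, \"Parts of 'Stairway to Heaven,' "
    "instantly recognizable to the music fans across the world, sound almost "
    "identical to significant portions of '________.'\""
)

# Hand-picked entities mentioned in the prompt (hypothetical candidate set).
candidates = ["Taurus", "Spirit", "Led Zeppelin", "Jimmy Page", "Randy California"]

# One filled-in statement per candidate; a system would score each of these
# against the prompt and return the candidate it finds most plausible.
filled = [fill_cloze(query, c) for c in candidates]

for statement in filled:
    print(statement)
```

A system for this task would then rank the filled-in statements by how well the prompt supports them.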

Training and development data for the task are released here and here.

A readme describing the data structure is here.

Submissions are handled via CodaLab Worksheets at this link.

Note that the test data will not be published while the shared tasks are running. Submissions during the practice phase will be displayed on the dev data leaderboard. Submissions during the eval phase will not be displayed on a leaderboard. The test data leaderboard will only be published after the eval phase.

Rank  Model              EM     F1
      Human Performance  91.31  91.69
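
The EM and F1 columns can be read as exact match and token-overlap F1 between a predicted answer and the reference answers. The Python sketch below assumes SQuAD-style definitions (lowercasing, stripping ASCII punctuation and English articles, then comparing tokens) and takes the best score over all reference answers. This is an assumption for illustration; the official evaluation script may normalize or aggregate differently, and the function names here are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, golds: list[str]) -> tuple[float, float]:
    """Best EM and best F1 of one prediction over all reference answers."""
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(token_f1(prediction, g) for g in golds)
    return em, f1


# Usage: one prediction checked against a list of acceptable answers.
print(score("Taurus", ["Taurus"]))
print(score("Led Zeppelin", ["Zeppelin"]))
```

Leaderboard-style numbers would then be the averages of these per-query scores over the evaluation set, multiplied by 100.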