

INTRODUCTION

Named Entity Recognition (NER) plays a crucial role in various Natural Language Processing (NLP) applications. This process involves identifying mentions of named entities in unstructured text and categorizing them into predefined classes, such as PERSON, ORGANIZATION, GPE, LOCATION, EVENT, and DATE. Given the relative scarcity of resources for Arabic NLP, research in Arabic NER has predominantly concentrated on "flat" entities and has been limited to a few "coarse-grained" entity types, namely PERSON, ORGANIZATION, and LOCATION. To address this limitation, the WojoodNER shared task series was initiated (Jarrar et al., 2023). It aims to enrich Arabic NER research by introducing Wojood and Wojood-Fine, nested and fine-grained Arabic NER corpora.

DATASET

In this year's shared task (WojoodNER 2024), a new version of the Wojood corpus, called Wojood-Fine, will be released. Wojood-Fine enhances the original Wojood corpus by offering entity types that are more granular than the data provided in WojoodNER 2023. For instance, GPE is now divided into 7 subtypes (COUNTRY, STATE-OR-PROVINCE, TOWN, NEIGHBORHOOD, CAMP, GPE_ORG, and SPORT). Similarly, LOCATION, ORGANIZATION, and FACILITY are also divided into subtypes. The corpus contains 550K tokens, with 75K entity mentions covering the parent types and 47K subtype entity mentions. It is also important to highlight that Wojood-Fine is a full re-annotation of Wojood using new annotation guidelines. This means the Wojood dataset cannot be (re-)used in this shared task. More details about the Wojood-Fine corpus can be found in our paper (Liqreina et al., 2023).
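As an illustration of the new granularity, the GPE subtypes listed above can be represented as a simple parent-to-subtype mapping. This is a minimal sketch: the dictionary layout and the helper `is_gpe_subtype` are our own, and the LOC, ORG, and FAC subtypes are omitted (see Liqreina et al., 2023 for the full hierarchy).

```python
# Partial sketch of the Wojood-Fine type hierarchy: only the GPE
# subtypes named in the task description are included here.
GPE_SUBTYPES = {
    "GPE": [
        "COUNTRY", "STATE-OR-PROVINCE", "TOWN",
        "NEIGHBORHOOD", "CAMP", "GPE_ORG", "SPORT",
    ],
}

def is_gpe_subtype(tag: str) -> bool:
    """Check whether a fine-grained tag refines the GPE parent type."""
    entity_type = tag.split("-", 1)[-1]  # strip a B-/I- prefix if present
    return entity_type in GPE_SUBTYPES["GPE"]

print(is_gpe_subtype("B-TOWN"))  # True
print(is_gpe_subtype("B-ORG"))   # False
```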

WojoodNER 2024 SHARED TASK

WojoodNER 2024 expands on the scope of WojoodNER 2023, venturing beyond traditional NER tasks. This year, we introduce three subtasks, all centered on Arabic Fine-Grained NER. Among these, the "open track" subtask stands out by allowing participants to develop or utilize external datasets and leverage external tools to craft innovative systems.
While participation in any individual subtask is encouraged, we especially hope that teams will engage in all three, bringing a comprehensive approach to the competition. The introduction of multiple subtasks is designed to foster a diverse range of methodologies and machine-learning architectures. These could encompass multi-task learning systems as well as advanced sequence-to-sequence models, such as those based on Transformer architectures and Large Language Models (LLMs).
We believe that this variety will not only challenge but also inspire participants to explore a wide array of approaches. From leveraging existing models to pioneering new techniques, the possibilities are vast. As we delve into the specifics of these subtasks, we eagerly anticipate the creative solutions that participants will contribute to Arabic NLP research in addressing the nuanced demands of Arabic Fine-Grained NER.
Subtask-1 (Closed-Track Flat Fine-Grain NER): In this subtask, we provide the Wojood-Fine flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). The flat NER dataset uses the same train/dev/test split as the nested NER dataset. The only difference is that in flat NER each token is assigned one tag: the first high-level tag assigned to that token in the nested NER dataset. This subtask is a closed track; in other words, participants are not allowed to use external datasets beyond the ones we provide to train their systems. It is also important to note that the Wojood dataset shared last year cannot be used, due to its different annotation guidelines.
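The flat-from-nested derivation described above can be sketched as follows. This is an illustrative helper (`nested_to_flat` is our own name, not the organizers' conversion script) and it assumes the tag columns for each token are ordered with the first high-level tag first.

```python
# Sketch of deriving the single flat tag for a token from its nested
# tag columns, as described in Subtask-1. Assumes the columns are
# ordered so the first non-O tag is the first high-level tag.
def nested_to_flat(tag_columns):
    """Return the first non-O tag among the token's nested columns,
    or O if the token is outside all entities."""
    for tag in tag_columns:
        if tag != "O":
            return tag
    return "O"

print(nested_to_flat(["O", "B-GPE", "O"]))  # B-GPE
print(nested_to_flat(["O", "O", "O"]))      # O
```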
Subtask-2 (Closed-Track Nested Fine-Grain NER): This subtask is similar to Subtask-1: we provide the Wojood-Fine nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). This subtask is also a closed track.
Subtask-3 (Open-Track NER - Gaza War): In this subtask, we aim to let participants reflect on the utility of NER in the context of real-world events, allow them to use external resources, and encourage them to use generative models in different ways (fine-tuning, zero-shot learning, in-context learning, etc.). The goal of focusing on generative models in this particular subtask is to help the Arabic NLP research community better understand the capabilities and performance gaps of LLMs in information extraction, an area that is currently understudied.
We provide development and test data related to the current War on Gaza. This is motivated by the assumption that discourse about recent global events will involve mentions from a different data distribution. For this subtask, we include data from five different news domains related to the War on Gaza, but we keep the names of the domains hidden. Participants will be given a development dataset (10K tokens, 2K from each of the five domains) and a testing dataset (50K tokens, 10K from each domain). Both development and testing sets are manually annotated with fine-grain named entities using the same annotation guidelines used in Subtask-1 and Subtask-2 (also described in Liqreina et al., 2023).

METRICS

The evaluation metrics will include precision, recall, and F1-score. However, our official metric will be the micro F1-score. The shared-task evaluations are hosted on CODALAB.
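For local development, the official micro F1-score can be approximated by decoding IOB2 tags into entity spans and scoring exact span matches. This is a sketch under our own assumptions (function names are ours, and it is not the official CODALAB scorer):

```python
# Sketch of micro-averaged precision/recall/F1 over exact entity-span
# matches, with spans decoded from IOB2 tags.
def iob2_spans(tags):
    """Extract (type, start, end) spans from an IOB2 tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.append((etype, start, i))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

def micro_prf(gold_tags, pred_tags):
    """Precision, recall, and F1 for one gold/predicted tag sequence pair."""
    gold, pred = set(iob2_spans(gold_tags)), set(iob2_spans(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = ["B-ORG", "I-ORG", "O", "B-DATE"]
pred = ["B-ORG", "I-ORG", "O", "O"]
print(micro_prf(gold, pred))  # (1.0, 0.5, 0.666...)
```

For corpus-level micro averaging, accumulate the true-positive, gold-span, and predicted-span counts across all segments before computing the ratios, rather than averaging per-segment scores.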
- CODALAB link for Subtask 1 (Flat Fine-Grain NER): CodaLab Link
- CODALAB link for Subtask 2 (Nested Fine-Grain NER): CodaLab Link
- CODALAB link for Subtask 3A (Open Track Flat NER): CodaLab Link
- CODALAB link for Subtask 3B (Open Track Nested NER): CodaLab Link

BASELINES

Two baseline models trained on Wojood-Fine (flat and nested) are provided (see Liqreina et al., 2023). The code used to produce these baselines is available on GitHub.

Subtask                            Precision  Recall  Micro-F1
Flat Fine-Grain NER (Subtask 1)    0.8870     0.8966  0.8917
Nested Fine-Grain NER (Subtask 2)  0.9179     0.9279  0.9229

GOOGLE COLAB NOTEBOOKS

To allow you to experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat Fine-Grain NER: This notebook can be used to train our ArabicNER model on the flat fine-grain NER task using the sample Wojood-Fine data.
[2] Evaluate Flat Fine-Grain NER: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
[3] Train Nested Fine-Grain NER: This notebook can be used to train our ArabicNER model on the nested fine-grain NER task using the sample Wojood-Fine data.
[4] Evaluate Nested Fine-Grain NER: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.

SUBMISSION DATA FORMAT

Your submission to each subtask will consist of one file: your model's predictions in the CoNLL format. If you submit to multiple subtasks, you will have one file per subtask. The CoNLL file should contain multiple space-separated columns. The IOB2 scheme should be used for the submission, which is the same format used in the Wojood dataset. Do not include a header in your submitted file. Segments should be separated by a blank line, as in the sample data found in the repository.
Note that we will validate your submission to verify that the number of segments, and the number of tokens within each segment, match the test dataset. We will also verify that the token on each line maps to the same token in the test dataset.
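The validation checks described above can be reproduced locally with a short script. This is a sketch under our own assumptions (the function names are ours, and the official validator may differ):

```python
# Local sketch of the submission checks: segment count, per-segment
# token count, and token identity on every line.
def read_segments(path):
    """Read a CoNLL file into segments (lists of whitespace-split rows).
    Segments are separated by blank lines."""
    segments, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                if current:
                    segments.append(current)
                    current = []
            else:
                current.append(line.split())
    if current:
        segments.append(current)
    return segments

def validate(submission_path, test_path):
    """Raise AssertionError if the submission's segments or tokens
    do not line up with the test dataset."""
    sub, ref = read_segments(submission_path), read_segments(test_path)
    assert len(sub) == len(ref), "segment count mismatch"
    for i, (s, r) in enumerate(zip(sub, ref)):
        assert len(s) == len(r), f"token count mismatch in segment {i}"
        for j, (srow, rrow) in enumerate(zip(s, r)):
            assert srow[0] == rrow[0], f"token mismatch at segment {i}, line {j}"
```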
Flat NER: the first column is the token (word) and the second column is the tag (entity name), as shown in Table 1.

Nested NER: the first column is the token (word), followed by the tag columns, one per entity type. The tag columns must follow the order shown in Table 2.

Example data file

nested_prediction.txt
جريدة O O O O O O O O O O O O O B-ORG O O O O O O O
فلسطين O O O O O B-GPE O O O O O O O I-ORG O O O O O O O
/ O O O O O O O O O O O O O O O O O O O O O
نيسان O O B-DATE O O O O O O O O O O O O O O O O O O
( O O I-DATE O O O O O O O O O O O O O O O O O O
26 O O I-DATE O O O O O O O O O O O O O O O O O O
/ O O I-DATE O O O O O O O O O O O O O O O O O O
4 O O I-DATE O O O O O O O O O O O O O O O O O O
/ O O I-DATE O O O O O O O O O O O O O O O O O O
1947 O O I-DATE O O O O O O O O O O O O O O O O O O
) O O I-DATE O O O O O O O O O O O O O O O O O O
. O O O O O O O O O O O O O O O O O O O O O
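A nested submission like the sample above can be checked by decoding each tag column independently. This is an illustrative sketch: `column_entities` is our own helper, and it assumes well-formed IOB2 tags in every column.

```python
# Decode each tag column of a nested submission into entity spans.
def column_entities(rows):
    """Given rows of [token, tag_1, ..., tag_N], return one list of
    (entity_type, (start, end)) token spans per tag column."""
    n_cols = len(rows[0]) - 1
    results = []
    for c in range(1, n_cols + 1):
        tags = [row[c] for row in rows] + ["O"]  # sentinel flushes the last span
        spans, start, etype = [], None, None
        for i, tag in enumerate(tags):
            inside = tag.startswith("I-") and tag[2:] == etype
            if not inside:
                if start is not None:
                    spans.append((etype, (start, i)))
                start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        results.append(spans)
    return results

rows = [
    ["جريدة", "B-ORG", "O"],
    ["فلسطين", "I-ORG", "B-GPE"],
]
print(column_entities(rows))  # [[('ORG', (0, 2))], [('GPE', (1, 2))]]
```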

REGISTRATION

Participants need to register via this form (NERSharedTask 2024). Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the outputs of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report results on the development and test sets (after results are announced) in their write-ups.

FAQ

For any questions related to this task, please check our Frequently Asked Questions.

IMPORTANT DATES

- February 25, 2024: Shared task announcement.
- March 10, 2024: Release of training data, development sets, scoring script, and Codalab links.
- April 10, 2024: Registration deadline.
- April 26, 2024: Test set made available.
- May 3, 2024: Codalab Test system submission deadline.
- May 10, 2024: Shared task system paper submissions due.
- June 17, 2024: Notification of acceptance.
- July 1, 2024: Camera-ready version.
- August 16, 2024: ArabicNLP 2024 conference in Thailand.

CONTACT

For any questions related to this task, please contact the organizers directly using the following email address: NERSharedtask@gmail.com .

ORGANIZERS

- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University

EXAMPLES

Column #  Value
1         Token
2         Tag
Table 1. Column order for flat NER submission

Example data file

flat_prediction.txt
جريدة B-ORG
فلسطين I-ORG
/ O
نيسان B-DATE
( I-DATE
26 I-DATE
/ I-DATE
4 I-DATE
/ I-DATE
1947 I-DATE
) I-DATE
. O
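A quick way to sanity-check a flat prediction file like the one above is to verify the IOB2 constraint that every I-X tag continues a span of the same type X. This is a sketch; `check_iob2` is our own helper.

```python
# Verify a flat tag sequence follows the IOB2 scheme: every entity
# starts with B-, and I-X only continues a span of the same type X.
def check_iob2(tags):
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            if not (prev.startswith(("B-", "I-")) and prev[2:] == tag[2:]):
                return False
        prev = tag
    return True

# The flat example above is valid IOB2:
print(check_iob2(["B-ORG", "I-ORG", "O", "B-DATE", "I-DATE", "I-DATE", "O"]))  # True
print(check_iob2(["I-ORG"]))  # False: I- without a preceding B-
```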


Column #  Value
1         Token
2         AIRPORT or O
3         BOUNDARY or O
4         BUILDING-OR-GROUNDS or O
5         CAMP or O
6         CARDINAL or O
7         CELESTIAL or O
8         CLUSTER or O
9         COM or O
10        CONTINENT or O
11        COUNTRY or O
12        CURR or O
13        DATE or O
14        EDU or O
15        ENT or O
16        EVENT or O
17        FAC or O
18        GOV or O
19        GPE or O
20        GPE_ORG or O
21        LAND or O
22        LAND-REGION-NATURAL or O
23        LANGUAGE or O
24        LAW or O
25        LOC or O
26        MED or O
27        MONEY or O
28        NEIGHBORHOOD or O
29        NONGOV or O
30        NORP or O
31        OCC or O
32        ORDINAL or O
33        ORG or O
34        ORG_FAC or O
35        PATH or O
36        PERCENT or O
37        PERS or O
38        PLANT or O
39        PRODUCT or O
40        QUANTITY or O
41        REGION-GENERAL or O
42        REGION-INTERNATIONAL or O
43        REL or O
44        SCI or O
45        SPO or O
46        SPORT or O
47        STATE-OR-PROVINCE or O
48        SUBAREA-FACILITY or O
49        TIME or O
50        TOWN or O
51        UNIT or O
52        WATER-BODY or O
53        WEBSITE or O
Table 2. Column order for nested NER submission