Language Modeling

Title Published in Github Paper Citation

ChatGPT for Arabic Grammatical Error Correction ArXiv 2023 @article{kwon2023chatgpt,
title={ChatGPT for Arabic Grammatical Error Correction},
author={Kwon, Sang Yun and Bhatia, Gagan and Nagoudi, El Moatez Billah and Abdul-Mageed, Muhammad},
journal={arXiv preprint arXiv:2308.04492},
year={2023}}
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts ArXiv 2023 @article{jawahar2023mixture,
title={Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts},
author={Jawahar, Ganesh and Yang, Haichuan and Xiong, Yunyang and Liu, Zechun and Wang, Dilin and Sun, Fei and Li, Meng and Pappu, Aasish and Oguz, Barlas and Abdul-Mageed, Muhammad and others},
journal={arXiv preprint arXiv:2306.04845},
year={2023}}
GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP EMNLP 2023 @article{khondaker2023gptaraeval,
title={GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP},
author={Khondaker, Md Tawkat Islam and Waheed, Abdul and Nagoudi, El Moatez Billah and Abdul-Mageed, Muhammad},
journal={arXiv preprint arXiv:2305.14976},
year={2023} }
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG EMNLP 2023 @article{nagoudi2023dolphin,
title={Dolphin: A Challenging and Diverse Benchmark for Arabic NLG},
author={Nagoudi, El Moatez Billah and El-Shangiti, Ahmed and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad},
journal={arXiv preprint arXiv:2305.14989},
year={2023} }
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions ArXiv 2023 @article{wu2023lamini,
title={LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
author={Wu, Minghao and Waheed, Abdul and Zhang, Chiyu and Abdul-Mageed, Muhammad and Aji, Alham Fikri},
journal={arXiv preprint arXiv:2304.14402},
year={2023} }
SERENGETI: Massively Multilingual Language Models for Africa ACL 2023 @article{adebara2022serengeti,
title={SERENGETI: Massively Multilingual Language Models for Africa},
author={Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Inciarte, Alcides Alcoba},
journal={arXiv preprint arXiv:2212.10785},
year={2022} }
ORCA: A Challenging Benchmark for Arabic Language Understanding ACL 2023 @article{elmadany2022orca,
title={ORCA: A Challenging Benchmark for Arabic Language Understanding},
author={Elmadany, AbdelRahim and Nagoudi, El Moatez Billah and Abdul-Mageed, Muhammad},
journal={arXiv preprint arXiv:2212.10758},
year={2022} }
JASMINE: Arabic GPT Models for Few-Shot Learning EMNLP 2023
Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning ArXiv 2022
Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints SustaiNLP 2023
A Benchmark Study of Contrastive Learning for Arabic Social Meaning ArXiv 2022
AraT5: Text-to-Text Transformers for Arabic Language Generation ACL 2022
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ACL 2021
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages AmericasNLP 2021
Learning Subjective Language: Feature Engineered vs. Deep Models LREC 2018
Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space WANLP

Machine Translation

Title Published in Github Paper Citation

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties ArabicNLP 2023
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation ACL 2023
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ArXiv 2022
Linguistically-motivated Yorùbá-English machine translation ACL 2022
Findings of the Second AmericasNLP Competition on Speech-to-Text Translation PMLR
TURJUMAN: A public toolkit for neural Arabic machine translation OSACT 2022
Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning ArXiv 2022
Machine Translation of Low-Resource Indo-European Languages ArXiv 2021
Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing CALCS 2021
Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation CALCS 2021
Exploring Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ACL 2021
Improving Similar Language Translation With Transfer Learning WMT 2021
Translating the Unseen? Yorùbá → English MT in Low-Resource, Morphologically-Unmarked Settings AfricaNLP 2021
Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation WNGT 2020
Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers WMT 2020
Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation ACL 2019

Social Media

Title Published in Github Paper Citation

Contrastive Learning of Sociopragmatic Meaning in Social Media ACL 2023
Decay No More: A Persistent Twitter Dataset for Learning Social Meaning NEATCLasS 2022
Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning WASSA 2022
InfoDCL: A Distantly Supervised Contrastive Learning Framework for Social Meaning ArXiv 2022
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 EACL 2021
AraNet: A Deep Learning Toolkit For Arabic Social Media OSACT 2020
Leveraging affective bidirectional transformers for offensive language detection OSACT 2020
Understanding and detecting dangerous speech in social media ArXiv 2020
Emoji use in Twitter white nationalism communication CSCW 2019
Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media ArXiv 2019
Multi-task bidirectional transformer representations for irony detection FIRE 2019
BERT-based Arabic social media author profiling FIRE 2019
Ensemble Learning of Offensive Content With Enhanced Training Data SemEval-2019
Hyperpartisan News Detection With Attention-Based Bi-LSTMs SemEval-2019
Happy Together: Learning and Understanding Appraisal From Natural Language ArXiv 2019
Learning implicit emotion with an ensemble of language models IEST 2019
Think Before Your Click: Data and Models for Adult Content in Arabic Twitter TA-COS-2018

Low Resource Languages

Title Published in Github Paper Citation

Zero-Shot Slot and Intent Detection in Low-Resource Languages VarDial 2023
UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis ACL 2023
Improving African Language Identification with Multi-task Learning AfricaNLP 2023
AfroLID: A Neural Language Identification Tool for African Languages EMNLP 2022
NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task ArXiv 2022
Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go ACL 2022
Dim Wihl Gat Tun: The case for linguistic expertise in NLP for under-documented languages ACL Findings 2022
One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes. ComputEL-5 2022
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task WANLP 2021
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages AmericasNLP 2021
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings WANLP 2021
Toward micro-dialect identification in diaglossic and code-switched environments WANLP 2021
I Trust AI, el Nuevo Proyecto de investigación de InterPARES (I Trust AI, the New InterPARES Research Project) Anuario: Escuela de Archivología
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task WANLP 2020
DiaNet: BERT and Hierarchical Attention Multi-Task Learning of Fine-Grained Dialect ArXiv 2020
No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects WANLP 2019

Misinformation

Title Published in Github Paper Citation

Automatic Detection of Entity-Manipulated Text using Factual Knowledge ACL 2022
Automatic Detection of Machine Generated Text: A Critical Survey COLING 2020
Machine generation and detection of Arabic manipulated and fake news WANLP 2020

Speech Processing

Title Published in Github Paper Citation

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition INTERSPEECH 2023
On the Robustness of Arabic Speech Dialect Identification ArXiv 2023
Improving automatic speech recognition for non-native English with transfer learning and language model decoding Book Chapter II
Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English ICNLSP 2021
Automatic Detection of Cannabis Intoxication from Speech CAA