UBC - Deep Learning and Natural Language Processing Lab

Language Modeling

Title	Published in	Github	Paper	Citation

ChatGPT for Arabic Grammatical Error Correction	ArXiv 2023			@article{kwon2023chatgpt, title={ChatGPT for Arabic Grammatical Error Correction}, author={Kwon, Sang Yun and Bhatia, Gagan and Nagoudi, El Moatez Billah and Abdul-Mageed, Muhammad}, journal={arXiv preprint arXiv:2308.04492}, year={2023}}
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts	ArXiv 2023			@article{jawahar2023mixture, title={Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts}, author={Jawahar, Ganesh and Yang, Haichuan and Xiong, Yunyang and Liu, Zechun and Wang, Dilin and Sun, Fei and Li, Meng and Pappu, Aasish and Oguz, Barlas and Abdul-Mageed, Muhammad and others}, journal={arXiv preprint arXiv:2306.04845}, year={2023}}
GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP	EMNLP 2023			@article{khondaker2023gptaraeval, title={GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP}, author={Khondaker, Md Tawkat Islam and Waheed, Abdul and Nagoudi, El Moatez Billah and Abdul-Mageed, Muhammad}, journal={arXiv preprint arXiv:2305.14976}, year={2023} }
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG	EMNLP 2023			@article{nagoudi2023dolphin, title={Dolphin: A Challenging and Diverse Benchmark for Arabic NLG}, author={Nagoudi, El Moatez Billah and El-Shangiti, Ahmed and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad}, journal={arXiv preprint arXiv:2305.14989}, year={2023} }
Lamini-lm: A diverse herd of distilled models from large-scale instructions	ArXiv 2023			@article{wu2023lamini, title={Lamini-lm: A diverse herd of distilled models from large-scale instructions}, author={Wu, Minghao and Waheed, Abdul and Zhang, Chiyu and Abdul-Mageed, Muhammad and Aji, Alham Fikri}, journal={arXiv preprint arXiv:2304.14402}, year={2023} }
SERENGETI: Massively Multilingual Language Models for Africa	ACL 2023			@article{adebara2022serengeti, title={SERENGETI: Massively Multilingual Language Models for Africa}, author={Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Inciarte, Alcides Alcoba}, journal={arXiv preprint arXiv:2212.10785}, year={2022} }
ORCA: A Challenging Benchmark for Arabic Language Understanding	ACL 2023			@article{elmadany2022orca, title={ORCA: A Challenging Benchmark for Arabic Language Understanding}, author={Elmadany, AbdelRahim and Nagoudi, El Moatez Billah and Abdul-Mageed, Muhammad}, journal={arXiv preprint arXiv:2212.10758}, year={2022} }
JASMINE: Arabic GPT Models for Few-Shot Learning	EMNLP 2023
Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning	ArXiv 2022
Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints	SustaiNLP 2023
A Benchmark Study of Contrastive Learning for Arabic Social Meaning	ArXiv 2022
AraT5: Text-to-Text Transformers for Arabic Language Generation	ACL 2022
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic	ACL 2021
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages	AmericasNLP 2021
Learning Subjective Language: Feature Engineered vs. Deep Models	LREC 2018
Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space	WANLP

Machine Translation

Title	Published in	Github	Paper	Citation

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties	ArabicNLP 2023
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation	ACL 2023
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers	ArXiv 2022
Linguistically-motivated Yorùbá-English machine translation	ACL 2022
Findings of the Second AmericasNLP Competition on Speech-to-Text Translation	PMLR
TURJUMAN: A public toolkit for neural Arabic machine translation	OSACT 2022
Improving Neural Machine Translation of Indigenous Languages with Multilingual Transfer Learning	ArXiv 2022
Machine Translation of Low-Resource Indo-European Languages	ArXiv 2021
Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing	CALCS 2021
Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation	CALCS 2021
Exploring Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing	ACL 2021
Improving Similar Language Translation With Transfer Learning	WMT 2021
Translating The Unseen? Yorùbá → English MT in Low-Resource, Morphologically-Unmarked Settings	AfricanNLP 2021
Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation	WNGT 2020
Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers	WMT 2020
Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation	ACL 2019

Social Media

Title	Published in	Github	Paper	Citation

Contrastive Learning of Sociopragmatic Meaning in Social Media	ACL 2023
Decay No More: A Persistent Twitter Dataset for Learning Social Meaning	NEATCLasS 2022
Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning	WASSA 2022
InfoDCL: A Distantly Supervised Contrastive Learning Framework for Social Meaning	ArXiv 2022
Mega-cov: A Billion-Scale Dataset of 100+ Languages for Covid-19	EACL 2021
AraNet: A Deep Learning Toolkit For Arabic Social Media	OSACT 2020
Leveraging affective bidirectional transformers for offensive language detection	OSACT 2020
Understanding and detecting dangerous speech in social media	ArXiv 2020
Emoji use in Twitter white nationalism communication	CSCW 2019
Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media	ArXiv 2019
Multi-task bidirectional transformer representations for irony detection	FIRE 2019
BERT-based Arabic social media author profiling	FIRE 2019
Ensemble Learning of Offensive Content With Enhanced Training Data	SemEval-2019
Hyperpartisan News Detection With Attention-Based Bi-LSTMs	SemEval-2019
Happy Together: Learning and Understanding Appraisal From Natural Language	ArXiv 2019
Learning implicit emotion with an ensemble of language models	IEST 2019
Think Before Your Click: Data and Models for Adult Content in Arabic Twitter	TA-COS-2018

Low Resource Languages

Title	Published in	Github	Paper	Citation

Zero-Shot Slot and Intent Detection in Low-Resource Languages	VarDial 2023
UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis	ACL 2023
Improving African Language Identification with Multi-task Learning	AfricaNLP 2023
Afrolid: A neural language identification tool for african languages	EMNLP 2022
NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task	ArXiv 2022
Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go	ACL 2022
Dim Wihl Gat Tun: The case for linguistic expertise in NLP for under-documented languages	ACL Findings 2022
One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes	ComputEL-5 2022
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task	WANLP 2021
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages	AmericasNLP 2021
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings	WANLP 2021
Toward micro-dialect identification in diaglossic and code-switched environments	WANLP 2021
I Trust AI, el Nuevo Proyecto de investigación de InterPARES	Anuario: Escuela de Archivología
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task	WANLP 2020
DiaNet: Bert and Hierarchical Attention Multi-task Learning of Fine-Grained Dialect	ArXiv 2020
No Army, No Navy: Bert Semi-Supervised Learning of Arabic Dialects	WANLP 2019

Misinformation

Title	Published in	Github	Paper	Citation

Automatic Detection of Entity-Manipulated Text using Factual Knowledge	ACL 2022
Automatic Detection of Machine Generated Text: A Critical Survey	COLING 2020
Machine generation and detection of Arabic manipulated and fake news	WANLP 2020

Speech Processing

Title	Published in	Github	Paper	Citation

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition	INTERSPEECH 2023
On the Robustness of Arabic Speech Dialect Identification	ArXiv 2023
Improving automatic speech recognition for non-native english with transfer learning and language model decoding	Book Chapter II
Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English	ICNLSP 2021
Automatic Detection of Cannabis Intoxication from Speech	CAA