Machine Translation

Title    Published in
Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation CALCS 2021
Exploring Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ACL 2021
Improving Similar Language Translation With Transfer Learning WMT 2021
Translating the Unseen? Yoruba → English MT in Low-Resource, Morphologically-Unmarked Settings AfricanNLP 2021
Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation WNGT 2020
Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers WMT 2020

                   Social Media

Title    Published in
Decay No More: A Persistent Twitter Dataset for Learning Social Meaning NEATCLasS 2022
Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning WASSA 2022
InfoDCL: A Distantly Supervised Contrastive Learning Framework for Social Meaning arXiv 2022
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 EACL 2021
AraNet: A Deep Learning Toolkit for Arabic Social Media OSACT 2020
Leveraging affective bidirectional transformers for offensive language detection OSACT 2020
Emoji use in Twitter white nationalism communication CSCW 2019
Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media arXiv 2019
Multi-task bidirectional transformer representations for irony detection FIRE 2019
BERT-based Arabic social media author profiling FIRE 2019

                   Low Resource Languages

Title    Published in
Dim Wihl Gat Tun: The case for linguistic expertise in NLP for under-documented languages ACL Findings 2022
One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes ComputEL-5 2022
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task WANLP 2021
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages AmericasNLP 2021
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings WANLP 2021
Toward micro-dialect identification in diaglossic and code-switched environments WANLP 2021
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task WANLP 2020
DiaNet: BERT and Hierarchical Attention Multi-Task Learning of Fine-Grained Dialect arXiv 2020
No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects WANLP 2019

                   Misinformation

Title    Published in
Automatic Detection of Entity-Manipulated Text using Factual Knowledge ACL 2022
Automatic Detection of Machine Generated Text: A Critical Survey COLING 2020
Machine generation and detection of Arabic manipulated and fake news WANLP 2020

                   Language Modeling

Title    Published in
AraT5: Text-to-Text Transformers for Arabic Language Generation ACL 2022
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ACL 2021
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages AmericasNLP 2021

                   Speech Processing

Title    Published in
Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English ICNLSP 2021