Engineers building Arabic-facing systems.

ML engineers and product engineers shipping Arabic search, chatbots, document intelligence, or translation.

Five-day syllabus

Build it the right way.

DAY 01

Arabic linguistic foundations for engineers

Script properties: ligatures, shaping, diacritics, kashida — and what your text editor lies about
Morphology: roots, patterns, clitics — why English-style stemming destroys Arabic
MSA vs Gulf, Levantine, Egyptian, Maghrebi — and the dialect-detection problem
Unicode pitfalls: Alef variants, Yaa/Alef Maqsura, Hamza positions, BiDi

Lab: Build a robust Arabic text normaliser: NFC, dediacritisation, Alef/Yaa unification, tatweel removal, with property-based tests against edge cases.
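
As a rough illustration of the kind of normaliser this lab targets, here is a minimal sketch using only the Python standard library. The function name `normalise_arabic` and the exact character sets are assumptions made for this example, not the course's reference solution; a production normaliser would likely need a wider mark inventory and task-specific choices (for instance, some retrieval systems keep Alef Maqsura distinct).

```python
import re
import unicodedata

# Assumed diacritic set: fathatan..sukun (U+064B-U+0652) plus superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # kashida / elongation character

# Map Alef variants to bare Alef (U+0627).
_ALEF_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0671": "\u0627",  # alef wasla
})

# Map Alef Maqsura (U+0649) to Yaa (U+064A).
_YAA_MAP = str.maketrans({"\u0649": "\u064A"})


def normalise_arabic(text: str) -> str:
    """Apply NFC, strip diacritics and tatweel, unify Alef and Yaa forms."""
    text = unicodedata.normalize("NFC", text)  # compose e.g. alef + maddah
    text = _DIACRITICS.sub("", text)           # dediacritisation
    text = text.replace(_TATWEEL, "")          # tatweel removal
    text = text.translate(_ALEF_MAP)           # Alef unification
    text = text.translate(_YAA_MAP)            # Yaa unification
    return text
```

One property worth testing is idempotence: normalising already-normalised text should change nothing. Running NFC first matters because a decomposed Alef followed by a combining maddah is composed into a single code point before the Alef mapping is applied.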

DAY 02

Tokenisation & embeddings

BPE, WordPiece, SentencePiece on Arabic — token explosion and what to do about it
Arabic-aware tokenisers: AraBERT, MARBERT, AraGPT2, Jais — strengths and gotchas
Cross-lingual embeddings: when bilingual retrieval works, when it fails
Code-switching (Arabizi, Franco-Arabic) and Latin-script Arabic

Lab: Train a SentencePiece tokeniser on a provided 2 GB Arabic corpus. Compare token efficiency against three off-the-shelf tokenisers on news, dialect, and code-switched samples.
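
One common way to express token efficiency is fertility: the average number of tokens produced per whitespace-delimited word. The sketch below is an assumption about how the comparison could be set up, not the lab's prescribed metric. The `fertility` helper is hypothetical and takes any callable that maps a string to a list of tokens, so the same function can wrap a trained SentencePiece model or an off-the-shelf tokeniser.

```python
from typing import Callable, Iterable, List


def fertility(tokenise: Callable[[str], List[str]], sentences: Iterable[str]) -> float:
    """Average tokens per whitespace-delimited word over a sample."""
    n_tokens = 0
    n_words = 0
    for sentence in sentences:
        n_tokens += len(tokenise(sentence))
        n_words += len(sentence.split())
    if n_words == 0:
        raise ValueError("sample contains no words")
    return n_tokens / n_words


def char_tokenise(text: str) -> List[str]:
    """Toy worst-case tokeniser: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    sample = ["ذهب الولد الى المدرسة", "yalla 3ala el beit"]
    print("whitespace:", fertility(str.split, sample))
    print("characters:", fertility(char_tokenise, sample))
```

A whitespace tokeniser scores exactly 1.0 and a character tokeniser gives an upper bound, so real subword tokenisers should land between the two. Reporting the figure separately for news, dialect, and code-switched samples shows where a vocabulary breaks down.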

DAY 03

Classification, NER & retrieval

Fine-tuning Arabic encoders for classification — sentiment, topic, intent
NER for Arabic: named entities, place names, mixed-script handling
Semantic search with Arabic embedding models, hybrid + reranking
Cross-lingual retrieval: Arabic query → English documents and vice versa

Lab: Build an NER pipeline for Arabic news. Annotate 200 sentences as a golden set; fine-tune a model; report per-entity F1 and analyse failures by dialect.
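
Per-entity F1 can be computed in several ways. The sketch below assumes strict span-level matching, where a prediction counts only if the sentence, boundaries, and entity type all agree with the golden set. The `Span` tuple layout and the `per_entity_f1` name are hypothetical choices for this example; the lab may use a different matching scheme or an existing evaluation library.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Assumed span layout: (sentence_id, start_token, end_token, entity_type)
Span = Tuple[int, int, int, str]


def per_entity_f1(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """Strict span-level F1 for each entity type seen in gold or pred."""
    tp: Dict[str, int] = defaultdict(int)
    fp: Dict[str, int] = defaultdict(int)
    fn: Dict[str, int] = defaultdict(int)

    for span in pred:
        if span in gold:
            tp[span[3]] += 1
        else:
            fp[span[3]] += 1
    for span in gold:
        if span not in pred:
            fn[span[3]] += 1

    scores: Dict[str, float] = {}
    for etype in set(tp) | set(fp) | set(fn):
        precision = tp[etype] / (tp[etype] + fp[etype]) if tp[etype] + fp[etype] else 0.0
        recall = tp[etype] / (tp[etype] + fn[etype]) if tp[etype] + fn[etype] else 0.0
        total = precision + recall
        scores[etype] = 2 * precision * recall / total if total else 0.0
    return scores
```

To analyse failures by dialect, the same function can be called once per dialect subset of the golden set, which makes gaps between MSA and dialectal sentences visible per entity type.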

DAY 04

Generation, translation & RTL UX

Arabic generative models — Jais, AceGPT, and prompting in Arabic
Translation pipelines: NLLB, custom MT, post-editing patterns
Bidirectional UI: number bidi, embedded English, mixed punctuation
Right-to-left rendering across web, PDF, and chat surfaces
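
To make the embedded-English problem concrete, here is a small sketch of one mitigation: wrapping an inserted fragment in Unicode directional isolates (FSI, U+2068, and PDI, U+2069) so that its direction does not disturb the surrounding Arabic sentence. The helper names `base_direction` and `isolate` are hypothetical, and the first-strong scan is a simplification of the paragraph-level rule in the Unicode Bidirectional Algorithm rather than a full implementation.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def base_direction(text: str) -> str:
    """Guess direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "neutral"  # digits, punctuation, and spaces carry no strong direction


def isolate(fragment: str) -> str:
    """Wrap a fragment so a renderer treats it as a directionally isolated run."""
    return f"{FSI}{fragment}{PDI}"


if __name__ == "__main__":
    order_id = "ORD-2291 (UK)"
    message = "تم شحن الطلب " + isolate(order_id) + " اليوم"
    print(base_direction(message), repr(message))
```

On the web, the `dir="auto"` attribute and the `<bdi>` element serve a similar purpose in markup; the isolate characters are mainly useful for plain-text surfaces such as chat messages and notifications.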

Lab: Build an Arabic chatbot over a domain corpus, tested against an MSA-only evaluation set and a Gulf-dialect evaluation set. Quantify the dialect gap.

DAY 05

Evaluation, deployment, and capstone

Arabic-specific evaluation: BLEU/chrF caveats, COMET-style judges, human review protocols
Bias and dialect coverage — the failures regulators will eventually ask about
Deployment patterns specific to Arabic-heavy products
Data licensing and Arabic corpus realities — what you can and cannot use

Capstone: Take a real Arabic NLP problem from your team and present an end-to-end design covering the normaliser, tokeniser, model, evaluation set, and a one-page risk register.

Lead instructor

Shady Ali

Co-Director · Sadiqoon Technologies

Has built Arabic NLP systems in production for archive, banking, and healthcare clients. Native Arabic speaker.

What's included

Inclusions.

35 hours of live instructor-led tuition
Lab GPU environment + Arabic corpora pack
Bilingual workbook (English + Arabic) + code repository
Daily lunch & refreshments; cohort dinner Day 4
UK visa support letter on confirmed registration
30 days of post-course Q&A access
Sadiqoon Institute Certificate of Completion
Curated Arabic dataset list (open-licence) for ongoing work

Arabic NLP.
Practical.

Reserve your seat.