Duration: 5 days · 35 contact hours
Dates: 9–13 Nov 2026
Location: London, United Kingdom
Fee: £3,950 + VAT

Engineers building Arabic-facing systems.

ML engineers and product engineers shipping Arabic search, chatbots, document intelligence, or translation.

Build it the right way.

DAY 01

Arabic linguistic foundations for engineers

  • Script properties: ligatures, shaping, diacritics, kashida — and what your text editor lies about
  • Morphology: roots, patterns, clitics — why English-style stemming destroys Arabic
  • Modern Standard Arabic (MSA) vs Gulf, Levantine, Egyptian, and Maghrebi dialects, and the dialect-detection problem
  • Unicode pitfalls: Alef variants, Yaa/Alef Maqsura, Hamza positions, BiDi
Lab: Build a robust Arabic text normaliser (NFC, dediacritisation, Alef/Yaa unification, tatweel removal), with property-based tests against edge cases.
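To make the lab's scope concrete, here is a minimal sketch of such a normaliser using only the Python standard library. The function name `normalise_arabic` and the `unify_yaa` flag are illustrative assumptions, not the course's reference solution, and the diacritic range covers only the common harakat, tanween, shadda, sukun, and superscript Alef.

```python
import re
import unicodedata

# Common Arabic diacritics: tanween, harakat, shadda, sukun (U+064B..U+0652)
# plus superscript Alef (U+0670). This range is an assumption; extend as needed.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # kashida / elongation character

# Alef with madda, hamza above, hamza below -> bare Alef (U+0627).
_ALEF_VARIANTS = str.maketrans({"\u0622": "\u0627", "\u0623": "\u0627", "\u0625": "\u0627"})
# Alef Maqsura (U+0649) -> Yaa (U+064A). Lossy, so it is optional.
_MAQSURA_TO_YAA = str.maketrans({"\u0649": "\u064A"})


def normalise_arabic(text: str, unify_yaa: bool = True) -> str:
    """Apply NFC, strip diacritics and tatweel, and unify Alef (and optionally Yaa) forms."""
    text = unicodedata.normalize("NFC", text)  # compose e.g. Alef + combining hamza
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_ALEF_VARIANTS)
    if unify_yaa:
        text = text.translate(_MAQSURA_TO_YAA)
    return text
```

A useful property to test is idempotence: normalising already-normalised text should change nothing. Whether Yaa unification helps depends on the task, since it discards a real spelling distinction.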
DAY 02

Tokenisation & embeddings

  • BPE, WordPiece, SentencePiece on Arabic — token explosion and what to do about it
  • Arabic-aware tokenisers: AraBERT, MARBERT, AraGPT2, Jais, with their strengths and gotchas
  • Cross-lingual embeddings: when bilingual retrieval works, when it fails
  • Code-switching (Arabizi, Franco-Arabic) and Latin-script Arabic
Lab: Train a SentencePiece tokeniser on a provided 2 GB Arabic corpus. Compare token efficiency against three off-the-shelf tokenisers on news, dialect, and code-switched samples.
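One common way to compare token efficiency is fertility: the average number of tokens produced per whitespace-delimited word. The sketch below assumes that metric (the course may use a different one) and takes any tokenising callable, so the two toy tokenisers are stand-ins for a trained SentencePiece model or an off-the-shelf tokeniser.

```python
from typing import Callable, Iterable, List


def fertility(tokenise: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average number of tokens per whitespace-delimited word across the sample."""
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_tokens += len(tokenise(text))
        n_words += len(text.split())
    if n_words == 0:
        raise ValueError("sample contains no words")
    return n_tokens / n_words


# Toy stand-ins (hypothetical): the upper and lower bounds a real tokeniser sits between.
def char_tokenise(text: str) -> List[str]:
    return [c for c in text if not c.isspace()]


def word_tokenise(text: str) -> List[str]:
    return text.split()
```

Running `fertility` separately on the news, dialect, and code-switched samples shows where a tokeniser fragments words most heavily. Whitespace words are a rough unit for Arabic because clitics attach to the word, so treat the number as a relative comparison between tokenisers rather than an absolute measure.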
DAY 03

Classification, NER & retrieval

  • Fine-tuning Arabic encoders for classification — sentiment, topic, intent
  • NER for Arabic: named entities, place names, mixed-script handling
  • Semantic search with Arabic embedding models, hybrid + reranking
  • Cross-lingual retrieval: Arabic query → English documents and vice versa
Lab: Build an NER pipeline for Arabic news. Annotate 200 sentences as a golden set; fine-tune a model; report per-entity F1 and analyse failures by dialect.
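Per-entity F1 can be computed directly from BIO-tagged sequences. The sketch below assumes the BIO scheme and exact span matching (entity type and both boundaries must agree); the function names are illustrative, and stray `I-` tags with no preceding `B-` are simply ignored.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def per_entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Dict[str, float]:
    """Exact-match F1 per entity type, pooled over all sentences."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_spans(g_tags), bio_spans(p_tags)
        for etype, _, _ in g & p:
            tp[etype] += 1
        for etype, _, _ in p - g:
            fp[etype] += 1
        for etype, _, _ in g - p:
            fn[etype] += 1
    scores: Dict[str, float] = {}
    for etype in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[etype] + fp[etype] + fn[etype]
        scores[etype] = 2 * tp[etype] / denom if denom else 0.0
    return scores
```

For the failure analysis by dialect, the same function can be called once per dialect subset of the golden set and the resulting dictionaries compared.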
DAY 04

Generation, translation & RTL UX

  • Arabic generative models — Jais, AceGPT, and prompting in Arabic
  • Translation pipelines: NLLB, custom MT, post-editing patterns
  • Bidirectional UI: number direction, embedded English, mixed punctuation
  • Right-to-left rendering across web, PDF, and chat surfaces
Lab: Build an Arabic chatbot over a domain corpus, tested against an MSA-only evaluation set and a Gulf-dialect evaluation set. Quantify the dialect gap.
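For the embedded-English problem in the bullets above, one widely used approach is to wrap each left-to-right run in Unicode directional isolates (U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE) so that surrounding Arabic text does not reorder its punctuation. The sketch below is an assumption-laden illustration: the regular expression only recognises ASCII letters and digits with a few internal separators, and the function names are hypothetical.

```python
import re

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# A run starts and ends with an ASCII letter or digit and may contain
# spaces and a few separators in between (an assumed, simplified definition).
_LTR_RUN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9 .,:/@+\-]*[A-Za-z0-9])?")


def isolate_ltr_runs(text: str) -> str:
    """Wrap each ASCII run in LRI...PDI so it renders as one left-to-right unit."""
    return _LTR_RUN.sub(lambda m: f"{LRI}{m.group(0)}{PDI}", text)


def strip_isolates(text: str) -> str:
    """Remove the isolate characters, e.g. before indexing or comparing text."""
    return text.replace(LRI, "").replace(PDI, "")
```

Isolates are invisible formatting characters, so they should be added at render time and stripped before search indexing or evaluation. Applying `isolate_ltr_runs` twice nests the isolates, so call it once on clean text.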
DAY 05

Evaluation, deployment, and capstone

  • Arabic-specific evaluation: BLEU/chrF caveats, COMET-style judges, human review protocols
  • Bias and dialect coverage — the failures regulators will eventually ask about
  • Deployment patterns specific to Arabic-heavy products
  • Data licensing and Arabic corpus realities — what you can and cannot use
Capstone: Take a real Arabic NLP problem from your team and present an end-to-end design including normaliser, tokeniser, model, evaluation set, and a one-page risk register.
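As an illustration of the chrF caveats listed for Day 5, the sketch below implements a simplified character n-gram F-score. It is an approximation written for this example, not the reference chrF implementation, so its numbers will not match standard tooling; `simple_chrf` and its defaults (n-grams up to 6, beta of 2) are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-like score in [0, 1]: F-beta over averaged n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # skip orders longer than either string
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Scoring a fully diacritised hypothesis against an undiacritised reference of the same word gives a score well below 1, even though a human reader would judge them identical. That is one reason to normalise both sides consistently before computing character-level metrics.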

Shady Ali

Co-Director · Sadiqoon Technologies

Has built Arabic NLP systems in production for archive, banking, and healthcare clients. Native Arabic speaker.

Inclusions.

  • 35 hours of live instructor-led tuition
  • Lab GPU environment + Arabic corpora pack
  • Bilingual workbook (English + Arabic) + code repository
  • Daily lunch & refreshments; cohort dinner on Day 4
  • UK visa support letter on confirmed registration
  • 30 days of post-course Q&A access
  • Sadiqoon Institute Certificate of Completion
  • Curated Arabic dataset list (open-licence) for ongoing work

Reserve your seat.

Opens your email client → [email protected]