NLP COURSE · #1 COHORT · 18.3/20

Algerian Legal GraphRAG Assistant

Bilingual (FR/AR) legal AI for Algerian law. GraphRAG with hybrid retrieval: vector + BM25 + knowledge graph traversal over the full legal corpus.

System Architecture & Overview

The Algerian Legal GraphRAG Assistant was developed as the primary course project for the NLP module at ENSIA, ultimately securing the #1 rank in the academic cohort with a grade of 18.3/20.

Algerian jurisprudence is complex and highly structured, and because it is written in both French and Arabic, it introduces significant cross-lingual retrieval gaps. Conventional vector-only RAG pipelines struggle to follow the relational links between legal codes, amendments, and executive decrees.

To solve this, we architected a hybrid retrieval pipeline that blends semantic vector embeddings (representing local meaning) with a structured knowledge graph that preserves logical cross-references, hierarchy, and legal dependencies between documents.
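As an illustration of how several retrievers can be blended, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is an assumption made for this example: the project description does not say which fusion method was used. The chunk IDs and the constant k=60 are likewise hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so chunks found by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical retriever outputs, best match first.
vector_hits = ["art_12", "art_40", "decree_7"]
bm25_hits = ["art_40", "art_12", "art_3"]
graph_hits = ["decree_7", "art_40"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
```

Rank-based fusion avoids having to put cosine similarities, BM25 scores, and graph distances on one scale, which is why it is a common default when the individual scores are not comparable.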

Key Deliverables & Capabilities

Bilingual Search: Dual-index semantic vector mapping for seamless Arabic and French queries.
Hybrid Retrieval: Combined FAISS vector similarities, lexical BM25 matching, and structured NetworkX knowledge graph traversals.
Logical Document Splitting: Custom recursive chunking built around article, chapter, and section boundaries rather than raw character counts.
Entity Relation Parsing: Automatic extraction of legal references ('Article X refers to Decree Y') to continuously map new laws into the graph.
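The boundary-aware chunking listed above can be sketched as follows. This is a single-level simplification that splits only on article headings. The heading pattern and the sample text are assumptions for illustration, and the project's recursive handling of chapters and sections is not reproduced here.

```python
import re

# Assumed heading pattern: "Article 12", "Art. 12", or Arabic "المادة 12"
# at the start of a line. Real legal texts likely need more variants.
ARTICLE_HEADING = re.compile(r"^(?:Article|Art\.|المادة)\s+(\d+)", re.MULTILINE)


def split_by_article(text):
    """Return one chunk per article, tagged with its article number."""
    matches = list(ARTICLE_HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # A chunk runs from its heading to the next heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({
            "article": int(match.group(1)),
            "text": text[match.start():end].strip(),
        })
    return chunks


sample = (
    "Article 1\nLa présente loi fixe les règles générales.\n"
    "Article 2\nLes dispositions de l'article 1 s'appliquent.\n"
)
chunks = split_by_article(sample)
```

Because the pattern is anchored to the start of a line and is case-sensitive, the in-text mention "l'article 1" inside Article 2 does not trigger a split.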

Critical Challenge & Pivot

Structuring the bilingual Arabic-French knowledge graph was exceptionally hard due to spelling variations and complex relational syntax. We solved this by implementing a customized Arabic NLP preprocessing pipeline utilizing CAMeL Tools and regularized morphological parsers.
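To show the kind of spelling variation involved, here is a standard-library sketch of Arabic orthographic normalization. It is a stand-in rather than the project's pipeline: the actual CAMeL Tools calls are not shown in the source, and the specific rules below (diacritic stripping, alef unification, alef maksura and teh marbuta folding) are assumptions commonly used in retrieval settings.

```python
import re
import unicodedata

# Arabic diacritics (tanwin, harakat, shadda, sukun), dagger alef, and tatweel.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Alef with madda, hamza above, hamza below, and alef wasla.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625\u0671]")


def normalize_arabic(text):
    """Fold common orthographic variants so equivalent spellings match."""
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef
    text = text.replace("\u0649", "\u064A")    # alef maksura -> ya
    text = text.replace("\u0629", "\u0647")    # teh marbuta -> heh
    return text


# A vocalized and an unvocalized spelling of the same word ("the rulings").
same_key = normalize_arabic("الأَحْكَام") == normalize_arabic("الاحكام")
```

Folding teh marbuta into heh is aggressive and can merge distinct words, so it suits index keys and entity matching better than text shown to users.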

System Benchmarks & Outcomes

Ranked #1 in the ENSIA academic cohort with a score of 18.3/20. The system showed a ~40% reduction in document retrieval latency and outperformed a classic dense-vector-only baseline by more than 15% in response accuracy and answer relevance.

Engineering Stack

FastAPI

Chosen as the high-performance backend web framework for its native asynchronous execution and rapid serialization.

NetworkX

Used to model and build the in-memory graph of legal relations and to run path-traversal algorithms across it.
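A rough sketch of the idea: extract citations from each document, store them as directed edges, then walk outward from a retrieved article to collect what it depends on. A plain adjacency dict is used here in place of NetworkX so the example has no dependencies. The citation pattern, document IDs, and sample texts are all hypothetical.

```python
import re
from collections import deque

# Assumed French citation pattern, e.g. "article 9" or "décret n° 20-442".
REFERENCE = re.compile(
    r"\b(article|décret|loi)\s+(?:n°\s*)?(\d+(?:-\d+)*)", re.IGNORECASE
)


def extract_references(text):
    """Return node IDs such as 'article_9' for each citation found."""
    return [f"{kind.lower()}_{number}" for kind, number in REFERENCE.findall(text)]


def build_graph(documents):
    """Map each document ID to the list of node IDs it cites."""
    graph = {doc_id: [] for doc_id in documents}
    for doc_id, text in documents.items():
        for target in extract_references(text):
            graph.setdefault(target, [])  # cited texts become nodes too
            graph[doc_id].append(target)
    return graph


def related_within(graph, start, max_hops):
    """Breadth-first search: nodes reachable in at most max_hops citations."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


documents = {
    "article_5": "Sous réserve de l'article 9 et du décret n° 20-442.",
    "article_9": "Conformément à la loi n° 08-09.",
}
graph = build_graph(documents)
context = related_within(graph, "article_5", max_hops=2)
```

Limiting the number of hops keeps the extra context focused: direct citations are usually relevant, while distant ones quickly add noise to the prompt.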

FAISS

Deployed for lightning-fast, high-dimensional vector search to execute dense semantic retrievals on Arabic and French document chunks.
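For intuition, dense retrieval boils down to ranking chunk embeddings by similarity to the query embedding. The dependency-free sketch below does this by brute force with cosine similarity. FAISS's flat inner-product index (`IndexFlatIP`) performs exact search of this kind over normalized vectors, though which index type the project used is not stated. The three-dimensional vectors and chunk IDs are toy values.

```python
import math


def unit(vector):
    """Scale a vector to length 1 so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k(query, vectors, k):
    """Return the k (chunk_id, similarity) pairs closest to the query."""
    q = unit(query)
    scored = [
        (sum(a * b for a, b in zip(q, unit(vec))), chunk_id)
        for chunk_id, vec in vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Toy embeddings; a multilingual encoder would place the French and Arabic
# versions of the same article close together.
embeddings = {
    "fr_art_12": [0.9, 0.1, 0.0],
    "ar_art_12": [0.8, 0.2, 0.1],
    "fr_art_40": [0.0, 0.1, 0.9],
}
hits = top_k([1.0, 0.0, 0.0], embeddings, k=2)
```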

LangChain

Orchestrated the modular agent logic, handling retrieval-augmented generation and abstract prompt chains seamlessly.

Llama 3.3 70B

Served as the primary generative foundation layer, providing strong multilingual legal reasoning capabilities under strict context structures.

Specifications

Deployment Stage: Production Ready

Access Level: Open Source / MIT

Testing Coverage: > 90% Pass