
Analytics & Insights

Overview

Analytics & Insights covers the data and visualization tools that instructors use to understand learner behavior — engagement rates, video watch times, problem attempts, completion patterns, and course-level aggregations.

Open edX analytics has gone through a major transition: from the in-house edX Insights stack (backed by a Hadoop/Hive data pipeline) to the community-developed Aspects platform (ClickHouse + Apache Superset). Most active deployments are migrating to Aspects.

Current State (2026)

  • Aspects: The current-generation analytics platform. Event data flows from the LMS to ClickHouse via event-routing-backends, and Apache Superset dashboards give instructors rich, near-real-time views
  • Legacy Insights: The old edX Insights product (Python + Hadoop + Hive + Django) is effectively deprecated for the community, though it may still run in some older deployments
  • LMS instructor tab: The legacy instructor analytics tab in the LMS provides basic aggregate stats (enrollment count, grade distribution); it is still available but no longer being enhanced
  • Event tracking: event-tracking captures browser and server events; event-routing-backends routes them to ClickHouse and other backends
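
The capture-then-route idea in the last bullet can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `TrackingEvent`, `Router`, and the in-memory stores are illustrative stand-ins, not the actual event-tracking or event-routing-backends APIs, and the event names are only examples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TrackingEvent:
    """Hypothetical, simplified event; real payloads carry more fields."""
    name: str
    username: str
    course_id: str
    data: Dict[str, object] = field(default_factory=dict)


Backend = Callable[[TrackingEvent], None]
Filter = Callable[[TrackingEvent], bool]


class Router:
    """Fan each event out to every backend whose filter accepts it."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Filter, Backend]] = []

    def register(self, backend: Backend, accepts: Filter = lambda event: True) -> None:
        self._routes.append((accepts, backend))

    def route(self, event: TrackingEvent) -> int:
        """Deliver the event and return how many backends received it."""
        delivered = 0
        for accepts, backend in self._routes:
            if accepts(event):
                backend(event)
                delivered += 1
        return delivered


# In-memory lists stand in for ClickHouse and for a second, video-only backend.
analytics_store: List[TrackingEvent] = []
video_store: List[TrackingEvent] = []

router = Router()
router.register(analytics_store.append)
router.register(video_store.append, accepts=lambda e: e.name.endswith("_video"))

router.route(TrackingEvent("play_video", "learner_1", "course-v1:Demo+X+2026"))
router.route(TrackingEvent("problem_check", "learner_1", "course-v1:Demo+X+2026"))
```

After these two calls, `analytics_store` holds both events while `video_store` holds only the video event. The point of the sketch is the separation of concerns: emitters do not need to know which backends exist, and each backend declares which events it wants.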

Architecture

  • Event pipeline: Browser/server → event-tracking → openedx-platform Celery → event-routing-backends → ClickHouse (Aspects)
  • Aspects stack: Tutor plugin (openedx-aspects) installs ClickHouse + Superset; aspects-dbt transforms raw events into analytics models
  • Superset dashboards: Pre-built dashboards for enrollment, engagement, completion, and assessment analytics; instructors access them via embedded Superset
  • Real-time vs. batch: ClickHouse provides near-real-time analytics (seconds to minutes), whereas the old Hadoop pipeline took hours to days
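
To make the "raw events into analytics models" step more concrete, here is a rough Python analogy of the kind of aggregation a transformation layer performs. It is only a sketch: dbt models are written in SQL and run inside the warehouse, and the field names (`course_id`, `username`, `event_type`), the function `summarize_engagement`, and the sample rows are all assumptions made for illustration, not the Aspects schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def summarize_engagement(events: Iterable[Dict[str, str]]) -> Dict[str, dict]:
    """Roll raw event rows up into one summary row per course."""
    learners: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    for event in events:
        course = event["course_id"]
        learners[course].add(event["username"])      # distinct learners
        counts[course][event["event_type"]] += 1     # events per type

    return {
        course: {
            "active_learners": len(learners[course]),
            "events_by_type": dict(counts[course]),
        }
        for course in learners
    }


# Hypothetical raw rows, as they might look after landing in the event store.
raw_events = [
    {"course_id": "course-v1:Demo+X+2026", "username": "learner_1", "event_type": "play_video"},
    {"course_id": "course-v1:Demo+X+2026", "username": "learner_1", "event_type": "problem_check"},
    {"course_id": "course-v1:Demo+X+2026", "username": "learner_2", "event_type": "play_video"},
    {"course_id": "course-v1:Other+Y+2026", "username": "learner_1", "event_type": "play_video"},
]

summary = summarize_engagement(raw_events)
```

A dashboard would then read from summary tables like this rather than scanning raw events on every page load, which is one reason a columnar store plus a transformation layer can keep instructor views close to real time.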

Relevant Repositories

| Repository | Role in This Feature | Activity Level | Notes |
|---|---|---|---|
| openedx/openedx-aspects | Aspects analytics Tutor plugin (ClickHouse + Superset) | High | Current analytics platform |
| openedx/aspects-dbt | dbt models for Aspects data warehouse | High | Data transformations |
| openedx/event-tracking | Event emission framework | Medium | Browser + server events |
| openedx/event-routing-backends | Routes events to ClickHouse and other backends | Medium | Event pipeline |
| openedx/openedx-platform | LMS instructor tab, legacy analytics APIs | High | Legacy analytics surface |

Recent Changes

  • Aspects (ClickHouse + Superset) becoming the community standard for analytics
  • Legacy Insights effectively deprecated

History

Origin

  • Year introduced: ~2013 (basic analytics from early edX)
  • Initial implementation: Django-rendered analytics pages in the LMS instructor tab; basic enrollment and grade stats
  • Context: Instructors needed visibility into how learners were engaging with their courses; data was also used by edX researchers for learning science

Key Milestones

| Year | Milestone | Teams / People Involved |
|---|---|---|
| ~2013 | Basic instructor analytics tab in LMS | Unknown |
| ~2014–2015 | edX Insights launched (Hadoop/Hive pipeline) | Unknown |
| ~2020 | edX Insights begins to stagnate post-2U acquisition | Unknown |
| ~2022–2023 | Aspects project initiated by community | Unknown |
| ~2024 | Aspects becomes the recommended analytics approach | Unknown |

People Who Shaped This Area

  • Engineering: Unknown — open question for interview
  • Product: Unknown — open question for interview
  • Design: Unknown — open question for interview

Open Questions

  • [ ] Who built edX Insights and what was the original data pipeline architecture?
  • [ ] What drove the decision to build a Hadoop/Hive pipeline rather than using simpler approaches?
  • [ ] Who initiated the Aspects project and what was the community process?
  • [ ] How does the event schema in event-tracking compare to industry standards (xAPI, Caliper)?
  • [ ] What analytics questions do instructors most commonly ask that the platform struggles to answer?
  • [ ] How was learning analytics research (Learning Sciences) connected to the platform's analytics infrastructure?

Schema Education — Internal Research