Analytics & Insights
Overview
Analytics & Insights covers the data and visualization tools that instructors use to understand learner behavior — engagement rates, video watch times, problem attempts, completion patterns, and course-level aggregations.
Open edX has undergone a major analytics evolution: from an in-house analytics stack (edX Insights, backed by a Hadoop/Hive data pipeline) to the community-developed Aspects platform (ClickHouse + Apache Superset). Most active deployments are migrating to Aspects.
Current State (2026)
- Aspects: The current-generation analytics platform. Event data flows from the LMS to ClickHouse via `event-routing-backends`; Apache Superset dashboards give instructors rich, near-real-time views
- Legacy Insights: The old edX Insights product (Python + Hadoop + Hive + Django) is effectively deprecated for the community; it may still run in some older deployments
- LMS instructor tab: The legacy instructor analytics tab in the LMS provides basic aggregate stats (enrollment count, grade distribution) — still available but not being enhanced
- Event tracking: `event-tracking` captures browser and server events; `event-routing-backends` routes them to ClickHouse and other backends
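The capture-then-route split described above can be illustrated with a minimal, standard-library-only sketch. The `Route` and `Router` classes, the allow-list, and the in-memory sinks are hypothetical stand-ins for illustration; they are not the `event-tracking` or `event-routing-backends` APIs, and the real routing configuration may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins: these names are illustrative and are NOT the
# event-tracking or event-routing-backends APIs.


@dataclass
class Route:
    name: str                                  # label for the destination backend
    accepts: Callable[[dict], bool]            # filter deciding which events are sent
    sink: list = field(default_factory=list)   # in-memory stand-in for the backend


@dataclass
class Router:
    routes: list = field(default_factory=list)

    def emit(self, event: dict) -> list:
        """Deliver the event to every route whose filter accepts it."""
        delivered = []
        for route in self.routes:
            if route.accepts(event):
                route.sink.append(event)
                delivered.append(route.name)
        return delivered


# Assumed routing policy: only an allow-list of event names reaches the
# analytics store, while every event is kept in a plain log.
ANALYTICS_EVENTS = {"edx.course.enrollment.activated", "play_video"}

analytics = Route("clickhouse", lambda e: e["name"] in ANALYTICS_EVENTS)
audit_log = Route("log", lambda e: True)
router = Router([analytics, audit_log])

server_event = {"name": "edx.course.enrollment.activated", "event_source": "server"}
browser_event = {"name": "page_close", "event_source": "browser"}

server_routes = router.emit(server_event)
browser_routes = router.emit(browser_event)
print(server_routes)
print(browser_routes)
```

Under this assumed policy the enrollment event reaches both sinks, while the `page_close` browser event is only logged. Keeping the filter per route is one simple way to let several backends consume the same event stream independently.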
Architecture
- Event pipeline: Browser/server → `event-tracking` → `openedx-platform` Celery → `event-routing-backends` → ClickHouse (Aspects)
- Aspects stack: Tutor plugin (`openedx-aspects`) installs ClickHouse + Superset; `aspects-dbt` transforms raw events into analytics models
- Superset dashboards: Pre-built dashboards for enrollment, engagement, completion, assessment analytics; instructors access via embedded Superset
- Real-time vs. batch: ClickHouse provides near-real-time analytics (seconds to minutes) vs. old Hadoop pipeline (hours to days)
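To make the "raw events into analytics models" step concrete, here is a small sketch in plain Python. It is an assumption-laden illustration only: the verb mapping is simplified, the flattened record is not a full xAPI statement, the sample events and course IDs are made up, and the real transformations live in `event-routing-backends` and in SQL models in `aspects-dbt`, not in code like this.

```python
from collections import Counter

# Illustrative mapping from tracking-event names to xAPI verb IRIs.
# A simplified assumption, not the mapping shipped in event-routing-backends.
VERB_IRIS = {
    "edx.course.enrollment.activated": "http://adlnet.gov/expapi/verbs/registered",
    "play_video": "https://w3id.org/xapi/video/verbs/played",
}
REGISTERED = VERB_IRIS["edx.course.enrollment.activated"]


def to_statement(event):
    """Flatten a raw tracking event into a minimal xAPI-like record, or None if unmapped."""
    verb = VERB_IRIS.get(event["name"])
    if verb is None:
        return None
    return {
        "actor": event["context"]["user_id"],
        "verb": verb,
        "course": event["context"]["course_id"],
        "timestamp": event["time"],
    }


def enrollments_per_course(statements):
    """Count 'registered' statements per course, as a warehouse model might."""
    counts = Counter(s["course"] for s in statements if s["verb"] == REGISTERED)
    return dict(counts)


# Hypothetical raw events; user IDs and course IDs are invented for the example.
CS101 = "course-v1:DemoX+CS101+2026"
MA201 = "course-v1:DemoX+MA201+2026"
raw_events = [
    {"name": "edx.course.enrollment.activated", "time": "2026-01-05T10:00:00Z",
     "context": {"user_id": 1, "course_id": CS101}},
    {"name": "edx.course.enrollment.activated", "time": "2026-01-05T10:02:00Z",
     "context": {"user_id": 2, "course_id": CS101}},
    {"name": "edx.course.enrollment.activated", "time": "2026-01-05T10:03:00Z",
     "context": {"user_id": 2, "course_id": MA201}},
    {"name": "play_video", "time": "2026-01-05T10:05:00Z",
     "context": {"user_id": 1, "course_id": CS101}},
    {"name": "page_close", "time": "2026-01-05T10:06:00Z",
     "context": {"user_id": 1, "course_id": CS101}},
]

statements = [s for s in map(to_statement, raw_events) if s is not None]
enrollments = enrollments_per_course(statements)
print(enrollments)
```

The unmapped `page_close` event is dropped at the transform step, and only "registered" statements contribute to the per-course enrollment counts. In a ClickHouse-backed deployment the equivalent aggregation would run in the database, which is what makes the seconds-to-minutes latency plausible compared with a batch job.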
Relevant Repositories
| Repository | Role in This Feature | Activity Level | Notes |
|---|---|---|---|
| openedx/openedx-aspects | Aspects analytics Tutor plugin (ClickHouse + Superset) | High | Current analytics platform |
| openedx/aspects-dbt | dbt models for Aspects data warehouse | High | Data transformations |
| openedx/event-tracking | Event emission framework | Medium | Browser + server events |
| openedx/event-routing-backends | Routes events to ClickHouse and other backends | Medium | Event pipeline |
| openedx/openedx-platform | LMS instructor tab, legacy analytics APIs | High | Legacy analytics surface |
Recent Changes
- Aspects (ClickHouse + Superset) becoming the community standard for analytics
- Legacy Insights effectively deprecated
History
Origin
- Year introduced: ~2013 (basic analytics from early edX)
- Initial implementation: Django-rendered analytics pages in the LMS instructor tab; basic enrollment and grade stats
- Context: Instructors needed visibility into how learners were engaging with their courses; data was also used by edX researchers for learning science
Key Milestones
| Year | Milestone | Teams / People Involved |
|---|---|---|
| ~2013 | Basic instructor analytics tab in LMS | Unknown |
| ~2014–2015 | edX Insights launched (Hadoop/Hive pipeline) | Unknown |
| ~2021 | edX Insights begins to stagnate after the 2U acquisition | Unknown |
| ~2022–2023 | Aspects project initiated by community | Unknown |
| ~2024 | Aspects becomes the recommended analytics approach | Unknown |
People Who Shaped This Area
- Engineering: Unknown — open question for interview
- Product: Unknown — open question for interview
- Design: Unknown — open question for interview
Open Questions
- [ ] Who built edX Insights and what was the original data pipeline architecture?
- [ ] What drove the decision to build a Hadoop/Hive pipeline rather than using simpler approaches?
- [ ] Who initiated the Aspects project and what was the community process?
- [ ] How does the event schema in `event-tracking` compare to industry standards (xAPI, Caliper)?
- [ ] What analytics questions do instructors most commonly ask that the platform struggles to answer?
- [ ] How was learning analytics research (Learning Sciences) connected to the platform's analytics infrastructure?