Project Context: ASU+GSV Summit 2026
What is this project?
This directory contains analysis of participants from the ASU+GSV Summit 2026, the world's most important EdTech and education event (often called the "Davos of Education"). The event is co-organized by GSV and Arizona State University, takes place in San Diego from April 12–15, 2026, and brings together ~7,000 leaders from education, EdTech, government, and enterprise.
The work done here aims to identify which summit participants use Open edX as their learning platform, in support of commercial strategy or partnership development.
Files in this folder
Source files (not generated)
Download_Participant_Preview.pdf
- Official ASU+GSV Summit 2026 participant preview document.
- 15 pages, all image-based (no selectable text).
- Contains: event cover, attendance statistics, and an alphabetical list of ~2,300 participating organizations.
- Reading it requires page rendering with Quartz (macOS) + OCR with the Vision framework.
Organizationsnotion.csv
- Export of a Notion database called "Organizations".
- Contains the organizations the team works with (clients, partners, prospects).
- Relevant columns: Name, Industry, Stage, Members, Website.
- Exported directly from Notion (the workspace is admin-managed and does not allow creating API integrations).
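As a minimal sketch of how the export could be read for cross-referencing, the snippet below pulls the Name column with Python's csv module. The helper name load_org_names and the sample rows are hypothetical. For the real file, opening it with encoding="utf-8-sig" is an assumption that guards against a leading byte-order mark and is harmless if none is present.

```python
import csv
import io

def load_org_names(csv_file):
    """Return the non-empty values of the Name column from a CSV export."""
    reader = csv.DictReader(csv_file)
    # `or ""` covers short rows, where DictReader fills missing fields with None.
    return [(row.get("Name") or "").strip()
            for row in reader
            if (row.get("Name") or "").strip()]

# In-memory sample standing in for Organizationsnotion.csv (made-up rows).
sample = io.StringIO(
    "Name,Industry,Stage,Members,Website\n"
    "Example Org,EdTech,Prospect,,https://example.org\n"
    ",,,,\n"
)
names = load_org_names(sample)

# With the real export (path taken from this folder):
# with open("Organizationsnotion.csv", newline="", encoding="utf-8-sig") as f:
#     names = load_org_names(f)
```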
Sites+Powered+by+the+Open+edX+Platform.doc
- Export of the Confluence page openedx.atlassian.net/wiki/spaces/COMM/pages/162245773.
- Despite the .doc extension, this is an MHTML file (HTML embedded in MIME), which is the standard Confluence export format.
- Contains a table with 383 sites using Open edX, with site name and URL of their instance.
- This is the most reliable source for confirming whether an organization uses Open edX.
- To parse it, use Python's email and re modules; do not use python-docx, which will fail.
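The parsing advice above can be sketched roughly as follows, assuming the export contains a text/html MIME part (typical for MHTML, but not verified against this file). The function name extract_html and the tiny sample message are hypothetical. Only standard-library calls are used.

```python
import email

def extract_html(mhtml_bytes):
    """Return the first text/html part of an MHTML document, or None."""
    msg = email.message_from_bytes(mhtml_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes quoted-printable or base64 transfer encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None

# Made-up miniature MHTML message used only to exercise the function.
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="----=_Part_0"\r\n'
    b"\r\n"
    b"------=_Part_0\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>Example =3D site</p>\r\n"
    b"------=_Part_0--\r\n"
)
html_text = extract_html(sample)

# With the real export:
# from pathlib import Path
# html_text = extract_html(Path("Sites+Powered+by+the+Open+edX+Platform.doc").read_bytes())
```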
Generated files
participants_asu_gsv_2026.txt
- Clean list of ~2,297 organizations participating in the ASU+GSV Summit 2026.
- Generated by extracting and cleaning text from Download_Participant_Preview.pdf.
- Process: PDF → PNG images (Quartz) → OCR (macOS Vision framework) → artifact cleanup → deduplication → alphabetical sort.
- Format: plain text, one organization per line, with # comment lines at the top.
- Optimized for LLM-based searches: minimal token overhead, easy to include in context windows.
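Given the format described above, a loader might look like the sketch below. The function name load_participants and the sample lines are hypothetical. It assumes that no real organization name begins with "#".

```python
def load_participants(lines):
    """Yield organization names, skipping blank lines and '#' comment lines."""
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            yield name

# Made-up sample in the described format.
sample = [
    "# ASU+GSV Summit 2026 participants\n",
    "# one organization per line\n",
    "2U\n",
    "\n",
    "Pearson\n",
]
participants = list(load_participants(sample))

# With the real file:
# with open("participants_asu_gsv_2026.txt", encoding="utf-8") as f:
#     participants = list(load_participants(f))
```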
open_edx_users_asu_gsv_2026.txt
- List of 20 organizations that are both ASU+GSV Summit 2026 participants and Open edX users.
- Generated in two steps:
  - Web research on known Open edX adopters (openedx.org, case studies, GitHub wiki, impact reports).
  - Cross-reference against participants_asu_gsv_2026.txt using exact and substring matching.
- Additional validation by cross-referencing with Sites+Powered+by+the+Open+edX+Platform.doc.
- Organizations confirmed by the official Confluence list: Arizona State University, Australian National University, George Washington University, Harvard University, Western Governors University.
- The remaining organizations are publicly documented Open edX users but do not appear in the Confluence directory (which is voluntary/self-reported).
openedx_sites.md
- Markdown table with all 383 sites from Sites+Powered+by+the+Open+edX+Platform.doc.
- Columns: site name | URL of their Open edX instance.
- Generated by parsing the MHTML file with Python (email + re).
- Useful for quick lookups and cross-referencing against other lists.
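One way the HTML-to-Markdown step could work is sketched below with re. The real Confluence markup was not inspected, so the assumed row shape (two td cells per tr, name first, URL second) is a guess. The function name table_to_markdown and the sample HTML are hypothetical.

```python
import html
import re

ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.S | re.I)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")

def table_to_markdown(html_text):
    """Convert rows of <td>name</td><td>url</td> into a two-column Markdown table."""
    lines = ["| Site | URL |", "| --- | --- |"]
    for row in ROW_RE.findall(html_text):
        # Strip inner tags (such as <a>) and decode entities like &amp;.
        cells = [html.unescape(TAG_RE.sub("", cell)).strip()
                 for cell in CELL_RE.findall(row)]
        if len(cells) >= 2 and cells[0] and cells[1]:
            lines.append(f"| {cells[0]} | {cells[1]} |")
    return "\n".join(lines)

# Made-up sample; header rows using <th> are skipped because they have no <td>.
sample = (
    "<table><tr><th>Site</th><th>URL</th></tr>"
    "<tr><td>Example Site</td>"
    "<td><a href='https://example.org'>https://example.org</a></td></tr>"
    "</table>"
)
markdown = table_to_markdown(sample)
```

Names containing a literal | would need escaping before being written into the table; the sketch does not handle that.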
List matching methodology
To compare organizations across files, normalization + substring matching is used:
    import re

    def normalize(s):
        # Lowercase, drop punctuation, and collapse runs of whitespace.
        s = s.lower()
        s = re.sub(r'[^a-z0-9\s]', '', s)
        return re.sub(r'\s+', ' ', s).strip()

Match criteria:
- Exact: normalized names are identical.
- Approximate: one name is a substring of the other, with overlap ≥ 8 characters and ≥ 2 words (to avoid false positives from generic words like "Learning", "Technology", etc.).
Generic words excluded from matching: business, technology, development, university, college, education, institute, academy, school, international, national, online, group.
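The criteria above can be sketched as a single function. This is one possible reading, not necessarily the code that was run: the overlap is taken to be the shorter normalized name, and the generic-word list is applied by rejecting overlaps made up only of generic words. The name match_kind is hypothetical.

```python
import re

GENERIC = {
    "business", "technology", "development", "university", "college",
    "education", "institute", "academy", "school", "international",
    "national", "online", "group",
}

def normalize(s):
    s = s.lower()
    s = re.sub(r"[^a-z0-9\s]", "", s)
    return re.sub(r"\s+", " ", s).strip()

def match_kind(a, b):
    """Return 'exact', 'approximate', or None for two organization names."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return None
    if na == nb:
        return "exact"
    shorter, longer = sorted((na, nb), key=len)
    if shorter not in longer:
        return None
    words = shorter.split()
    # Overlap must be at least 8 characters and at least 2 words.
    if len(shorter) < 8 or len(words) < 2:
        return None
    # Assumed reading of the exclusion list: all-generic overlaps do not count.
    if all(word in GENERIC for word in words):
        return None
    return "approximate"

exact = match_kind("Arizona State University", "arizona state university.")
approx = match_kind("Curriculum Associates", "Curriculum Associates, LLC")
too_generic = match_kind("National University", "National University Group")
single_word = match_kind("Learning", "Learning Technology Group")
```

Because the substring test is character-based rather than word-based, a stricter variant could require the overlap to start and end on word boundaries.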
Key findings
- Out of ~2,297 organizations participating in the ASU+GSV Summit 2026, at least 20 use Open edX.
- Of the 46 organizations in the team's Notion database, 9 are also attending the ASU+GSV Summit: Axim Collaborative, 2U, Pearson, DeweyLearn, Unicon, Curriculum Associates, Harvard, QuestionWell, 1EdTech Foundation.
Technical notes
- The original PDF has no selectable text (image-based). OCR is required.
- OCR with the macOS Vision framework produces ~5% artifacts due to narrow table columns. Artifacts are filtered by discarding fragments that don't start with a capital letter or are fewer than 4 characters.
- The Confluence .doc export is MHTML, not binary Word or DOCX. Using python-docx fails. Use Python's email.message_from_bytes() instead, which parses it correctly.
- Homebrew on this machine has an architecture conflict (x86_64 vs ARM64) in base dependencies (openssl, sqlite, etc.), which prevented installing poppler. This was resolved by using native macOS frameworks via pyobjc.
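The OCR artifact filter described in these notes can be sketched as below, applying the stated rule literally. The function name is_probable_artifact and the sample fragments are hypothetical.

```python
def is_probable_artifact(fragment):
    """Stated rule: fewer than 4 characters, or not starting with a capital letter."""
    text = fragment.strip()
    return len(text) < 4 or not text[0].isupper()

# Made-up OCR output lines.
ocr_lines = ["Arizona State University", "ity", "of Learning", "Pearson", "  "]
kept = [line.strip() for line in ocr_lines if not is_probable_artifact(line)]
```

Read literally, the rule would also discard short or digit-led names such as 2U or 1EdTech Foundation, so the actual cleanup presumably treated those separately; how it did so is not documented here.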