Beyond the Export: Solving PDF Accessibility on a Chromebook
If you work in digital accessibility, you know the "PDF Problem." If you work on a Chromebook, that problem becomes a wall. Without access to Adobe Acrobat Pro or Microsoft Word’s native Windows export, generating a truly tagged, PDF/UA-compliant document is nearly impossible.
I recently decided to see if I could bridge this gap by building a web-based remediation tool. The goal: take a "clean" but untagged export from Google Docs and use AI to inject a proper tag tree.
The "tagged" illusion
Google Docs is a fantastic authoring tool, but its PDF export has a secret: it’s technically "untagged."
While the export preserves visual structure, it fails to set the critical Marked: true flag, which lives in the MarkInfo dictionary of the PDF's document catalog. When you open these files in Adobe Acrobat, they are flagged as having no structure. More importantly, they lack a StructTreeRoot, the roadmap screen readers use to navigate headings, lists, and tables.
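To make those two markers concrete, here is a minimal Node.js sketch that scans a PDF's raw bytes for them. It is a heuristic, not a validator: it assumes the document catalog is stored uncompressed, so a file that packs its objects into compressed object streams can be tagged and still fail this check. The function name inspectTagging and the sample bytes are made up for illustration.

```javascript
// Heuristic scan for the two tagging markers discussed above.
// Assumption: the catalog is stored uncompressed. PDFs that use compressed
// object streams can hide these entries, so treat a negative result as
// "not found", not as proof that the file is untagged.
function inspectTagging(pdfBytes) {
  // latin1 maps each byte to one character, so binary data survives intact.
  const text = Buffer.from(pdfBytes).toString("latin1");

  // In PDF syntax the flag appears as "/Marked true" inside the MarkInfo dictionary.
  const marked = /\/Marked\s+true\b/.test(text);

  // The catalog points at the tag tree through a /StructTreeRoot entry.
  const hasStructTreeRoot = /\/StructTreeRoot\b/.test(text);

  return { marked, hasStructTreeRoot, looksTagged: marked && hasStructTreeRoot };
}

// Hypothetical fragment of an untagged file: a catalog with no tagging entries.
const untaggedSample = Buffer.from(
  "%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
  "latin1"
);
console.log(inspectTagging(untaggedSample).looksTagged); // false
```

For a real file, you could pass in the result of fs.readFileSync(path) instead of the sample buffer.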
Building a web-based solution
Using the Adobe PDF Services API (powered by Adobe Sensei), I built a lightweight Node.js application to automate the remediation process.
The workflow is simple:
- Upload an untagged PDF.
- Process via the Auto-Tag API to identify headings, paragraphs, and lists.
- Download a modified PDF with a fully realized tag tree.
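The three steps above can be sketched as a single function. This is not the code from the repository, and it does not call the Adobe SDK: autoTag is a hypothetical function you would pass in to wrap the real Auto-Tag request, and remediatePdf and isPdf are names invented for this example.

```javascript
// A well-formed PDF begins with the "%PDF-" header.
function isPdf(bytes) {
  return bytes.subarray(0, 5).toString("latin1") === "%PDF-";
}

// Sketch of the upload -> auto-tag -> download flow.
// `autoTag` is a hypothetical stand-in for the real Auto-Tag call:
// it receives the uploaded bytes and must resolve to the tagged PDF bytes.
async function remediatePdf(pdfBytes, autoTag) {
  // Step 1: accept the upload and reject anything that is not a PDF.
  const input = Buffer.from(pdfBytes);
  if (!isPdf(input)) {
    throw new Error("Upload does not look like a PDF");
  }

  // Step 2: hand the bytes to the tagging service.
  const tagged = Buffer.from(await autoTag(input));

  // Step 3: only offer the result for download if it is still a PDF.
  if (!isPdf(tagged)) {
    throw new Error("Tagging service did not return a PDF");
  }
  return tagged;
}
```

Keeping the service call behind a plain function argument means the flow can be exercised with a stub before any credentials are configured.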
You can find the source code for this MVP over on my GitHub: svinkle/pdf-web-autotag.
The ChromeVox trap
During testing, I noticed something fascinating. When I ran ChromeVox on my "untagged" Google Doc export, it actually announced headings and links correctly. This creates a dangerous "false positive" for auditors.
Chrome, like other modern browsers, uses visual heuristics to guess the structure of a PDF when tags are missing. It sees big, bold text and tells the screen reader, "I think this is a heading."
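To show why this kind of guessing is fragile, here is a toy heuristic in the same spirit. It is not Chrome's actual algorithm, which I have not inspected. The guessRole function, the shape of the run object, and the 1.2x and 1.5x thresholds are all assumptions made up for illustration.

```javascript
// Toy visual heuristic: classify a text run purely by how it looks.
// `run` is a hypothetical shape: { text, fontSize, bold }.
// `bodySize` is the font size assumed for ordinary paragraph text.
function guessRole(run, bodySize) {
  const ratio = run.fontSize / bodySize;
  // Much larger than body text, or somewhat larger and bold: call it a heading.
  if (ratio >= 1.5 || (ratio >= 1.2 && run.bold)) {
    return "heading";
  }
  return "paragraph";
}

// A real section title is caught...
console.log(guessRole({ text: "Getting started", fontSize: 24, bold: true }, 11)); // heading

// ...but so is a large pull quote, which is not a heading at all.
console.log(guessRole({ text: "It just works!", fontSize: 18, bold: false }, 11)); // heading
```

The second call is the problem in miniature: the heuristic has no way to tell a heading from anything else that happens to be set in large type.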
However, "guessing" doesn’t make something accessible. If that same file is opened in a dedicated screen reader like NVDA or JAWS—or used in a legal/regulatory context under the ADA or European Accessibility Act (EAA)—the lack of a hardcoded tag tree constitutes a failure. True accessibility shouldn't rely on a browser's ability to guess your intent.
What’s next?
Automated AI tagging gets us about 80% of the way there. But as I found in my tests, AI still struggles with nuanced structures—sometimes flattening lists or misinterpreting heading levels (like the "everything is an H2" phenomenon).
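Heading mistakes like these are easy to flag automatically once the tags exist. The sketch below assumes you already have the tag names as a flat list in reading order (extracting them from the PDF is not shown). The auditHeadings function and its two rules are illustrative, not a full PDF/UA check.

```javascript
// Flag two common auto-tagging symptoms in a flat, reading-order list of
// tag names such as ["H1", "P", "H2", "L"]. Illustrative rules only.
function auditHeadings(tags) {
  // Keep only H1..H6 and convert them to numeric levels.
  const levels = tags
    .map((tag) => /^H([1-6])$/.exec(tag))
    .filter(Boolean)
    .map((match) => Number(match[1]));

  const issues = [];

  // Symptom 1: every heading has the same level ("everything is an H2").
  if (levels.length > 1 && new Set(levels).size === 1) {
    issues.push(`all ${levels.length} headings are H${levels[0]}`);
  }

  // Symptom 2: a heading jumps more than one level deeper than the previous one.
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) {
      issues.push(`H${levels[i - 1]} is followed by H${levels[i]}`);
    }
  }

  return issues;
}

console.log(auditHeadings(["H2", "P", "H2", "L", "H2"])); // [ 'all 3 headings are H2' ]
console.log(auditHeadings(["H1", "P", "H3"]));            // [ 'H1 is followed by H3' ]
```

A report like this does not fix anything on its own, but it tells a human reviewer where to look first.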
The next phase for this project is moving toward a hybrid remediation approach: using the API to generate the initial structure, but providing a web-based UI to manually "touch up" the tags.
Accessibility is often a manual craft, but with the right web-based tools, we can at least make that craft available to everyone, regardless of their operating system.