Why you may need to archive your platform
For compliance or legal reasons, you may need to archive your Go Vocal platform and store a copy of its content. This guide explains how to configure your archiving tool to work correctly with Go Vocal.
Why archiving requires extra configuration
Go Vocal's frontend is a single-page application (SPA) built with JavaScript. Most web archiving tools use a scraper that visits every page and saves the HTML. However, many of these scrapers don't execute JavaScript, or don't execute it fully enough, so they end up saving blank pages.
If your archiving software has this problem, Go Vocal provides a prerendering service that serves fully rendered, static HTML to tools that identify themselves as crawlers. When the platform detects that a request comes from an archiving tool, it:
Serves a pure HTML version of the page instead of the JavaScript-based SPA.
Bypasses the cookie consent manager, so the actual page content is visible without user interaction.
How to configure your archiving tool
A. Obtaining prerendered pages
You can trigger the prerendered HTML version of the platform in one of two ways.
Option 1: Add the _escaped_fragment_ URL parameter
Configure your archiving tool to append _escaped_fragment_ as a query parameter to all your requests. For example:
<https://yourplatform.govocal.com/projects?_escaped_fragment_=>
When the platform detects this parameter, it will serve a fully rendered HTML version of the page.
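If your archiving tool accepts a list of URLs or a small pre-processing script, Option 1 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the helper name with_escaped_fragment is hypothetical and not part of Go Vocal, and the URL is the example address from above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter name taken from this guide; it is sent with an empty value.
ESCAPED_FRAGMENT = "_escaped_fragment_"


def with_escaped_fragment(url: str) -> str:
    """Return the URL with the _escaped_fragment_ query parameter appended.

    Hypothetical helper for illustration only. Existing query parameters
    are preserved, and the parameter is not added a second time.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == ESCAPED_FRAGMENT for key, _ in query):
        query.append((ESCAPED_FRAGMENT, ""))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_escaped_fragment("https://yourplatform.govocal.com/projects"))
```

The helper keeps any query parameters that are already present, so it should be safe to apply to every URL in a crawl list.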
Option 2: Use a recognized User-Agent string
Make sure your archiving tool sends a User-Agent header that is recognized by Go Vocal. The following archiving tools are currently supported:
Common Crawl (CCBot)
Heritrix (also used by Archive-It and many national libraries)
PageFreezer
British Library web archive
Bibliothèque nationale de France web archive
dip Webarchief
Capsis
User-Agent matching is case-insensitive.
💡 Tip: If your archiving software allows you to set a custom User-Agent string, setting it to include the name of one of the supported tools above is the easiest way to get it working.
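For tools that are scripted, or for a quick manual check, setting a custom User-Agent might look like the sketch below. The User-Agent value is a made-up example that merely contains the name CCBot from the list above; the exact strings Go Vocal matches against are not documented here, so treat the value as an assumption and verify that the response contains rendered HTML.

```python
from urllib.request import Request

# Assumption: an illustrative User-Agent that includes "CCBot", one of the
# supported tool names listed in this guide. The product name
# "ExampleArchiver" is hypothetical.
ARCHIVER_USER_AGENT = "ExampleArchiver/1.0 (compatible; CCBot)"


def build_archive_request(url: str, user_agent: str = ARCHIVER_USER_AGENT) -> Request:
    """Build a request object that sends the given User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Usage (performs a real network request, so it is left commented out):
#   from urllib.request import urlopen
#   request = build_archive_request("https://yourplatform.govocal.com/projects")
#   with urlopen(request) as response:
#       html = response.read().decode("utf-8")
```

Because matching is case-insensitive, the capitalization of the tool name inside the string should not matter.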
What if my tool isn't recognized?
If your archiving tool uses a User-Agent string that is not in the list above and you cannot change it, use Option 1 instead: append _escaped_fragment_ to your request URLs.
If neither option works for your setup, please contact support and we can look into adding your tool's User-Agent to our recognized list.
B. Covering all public-facing content using /sitemap.xml
The platform contains some dynamic content, such as the “Load more” buttons in longer lists of projects or inputs. If your web archiving tool acts only as a spider, meaning it explores the platform purely by following internal links, it will not discover the content behind these dynamic elements, because the buttons are not functional in the prerendered pages.
To solve this, make sure your archiving tool uses the /sitemap.xml file served from your platform. The sitemap lists all pages, so crawling from it makes the archive complete. Most scraping software does this out of the box, but it’s worth checking.
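If your tool cannot read sitemaps itself, you could generate a list of seed URLs from the sitemap and feed that list to the tool. The sketch below assumes that /sitemap.xml follows the standard sitemaps.org format (url and loc elements in the usual namespace); that assumption is not confirmed by this guide, and the function name extract_sitemap_urls is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_data):
    """Return every <loc> value found in a sitemap document.

    Accepts the sitemap content as bytes or str. Also picks up <loc>
    entries of a sitemap index, which point to further sitemap files.
    """
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


# Usage (performs a real network request, so it is left commented out):
#   from urllib.request import urlopen
#   with urlopen("https://yourplatform.govocal.com/sitemap.xml") as response:
#       seed_urls = extract_sitemap_urls(response.read())
```

The resulting list can then be combined with either prerendering option above, so that each listed page is fetched as static HTML.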
