Overview
This is a guide to using the AjaxSnapshots service to provide HTML snapshots of your website to search engines like Google and Bing.
We have included detailed instructions, with examples, for configuring popular web servers.
AjaxSnapshots implements the Crawlable AJAX Specification created by Google and supported by Bing and Yandex. You don't have to know much about this to use our service. All you have to do is tell the search engines that you are making HTML snapshots available and then pass their snapshot requests on to us.
Here's a checklist you can work through to get up and running.
Sign up for AjaxSnapshots and get an API Key.
Tell search engines that snapshots are available for your pages.
Configure your web server. Instructions for Apache, Nginx, IIS, ExpressJS (on node) and generic servers are provided below.
That's all.
Search engines will only ask for a page snapshot if you tell them one is available. There are two alternative ways of doing this. You must implement one of these:
Add a <meta> tag to pages you want snapshots for.
This is by far the easiest way to get started. Unless the dynamic part of your pages is identified by the fragment part of your page's URLs (the bit after the #), this is what you should do.
All you have to do is add the following tag to the <head> block of each page you want snapshots for.
<meta name="fragment" content="!">
This method is the one to use if you use PushState-based navigation or any other approach where the generated content is independent of the fragment part of the URL (the bit after the #).
Use #! URLs.
You should only use this method if you are using URLs where the page content is identified by the fragment part of the URL (the bit after the #). The reason this needs special handling is that browsers and other HTTP clients do not send this part of the URL to the server, so as far as the server is concerned, requests to index http://mysite.com#page1 and http://mysite.com#page2 are identical to requests to index http://mysite.com.
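To make the point about fragments concrete, here is a minimal sketch using the standard URL class available in Node and browsers. The helper name requestTarget is hypothetical and exists only for this illustration; it returns the path and query string, which is the part of the URL an HTTP client puts on the request line.

```javascript
// Returns the part of a URL that is actually sent to the server:
// the path plus the query string. The fragment (hash) is dropped.
// requestTarget is a hypothetical helper used only for illustration.
function requestTarget(url) {
  const parsed = new URL(url);
  return parsed.pathname + parsed.search;
}

const page1 = requestTarget('http://mysite.com#page1');
const page2 = requestTarget('http://mysite.com#page2');
const home = requestTarget('http://mysite.com');

// All three URLs produce the same request target, so the server
// cannot tell them apart.
console.log(page1 === page2 && page2 === home); // true
```

The fragment is still available to JavaScript running in the browser (as location.hash), which is why it works for client-side navigation but is invisible to the server.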
This method requires that you modify your URLs. Instead of having the form
http://www.example.com/ajax.html?someQuery#page-identifer
the # is replaced by #! so that the URL has the form
http://www.example.com/ajax.html?someQuery#!page-identifer
This acts as a signal to Google, telling it that the part after the hash matters and that you are implementing the Crawlable AJAX Specification.
If you have configured your server to use AjaxSnapshots that's all you need to do. However, a brief explanation follows for those who want a deeper understanding of what's going on.
When Google encounters a hashbang URL like the one above it transforms it into one like this:
http://www.example.com/ajax.html?someQuery&_escaped_fragment_=page-identifer
This URL no longer contains a hash, so the page identifier will now reach your server. At this point it's up to your server to interpret this and serve a snapshot of the HTML that would be on the original hashbang page after the JavaScript that creates it has run. That's what AjaxSnapshots does for you.
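The transformation described above can be sketched in a few lines of JavaScript. This is an illustration only, not code from AjaxSnapshots or from any search engine: the function names are hypothetical, the set of escaped characters is a simplified subset of what the Crawlable AJAX Specification requires, and the reverse function assumes that _escaped_fragment_ is the last query parameter.

```javascript
// Convert a hashbang URL into the _escaped_fragment_ form a crawler requests.
// Simplified: only %, #, &, + and space are escaped in the fragment.
function toEscapedFragmentUrl(url) {
  const index = url.indexOf('#!');
  if (index === -1) return url; // not a hashbang URL, leave unchanged
  const base = url.slice(0, index);
  const fragment = url.slice(index + 2);
  const separator = base.includes('?') ? '&' : '?';
  const escaped = fragment.replace(/[%#&+ ]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
  return base + separator + '_escaped_fragment_=' + escaped;
}

// Convert an _escaped_fragment_ URL back into the original hashbang URL.
// Assumes _escaped_fragment_ is the last query parameter.
function fromEscapedFragmentUrl(url) {
  const match = url.match(/^(.*?)[?&]_escaped_fragment_=([^&]*)$/);
  if (!match) return url;
  return match[1] + '#!' + decodeURIComponent(match[2]);
}

console.log(toEscapedFragmentUrl('http://www.example.com/ajax.html?someQuery#!page-identifer'));
// http://www.example.com/ajax.html?someQuery&_escaped_fragment_=page-identifer
```

The reverse direction is what a snapshot service needs in order to know which page to render in its headless browser.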
URLs containing _escaped_fragment_ are for search engine use only. If you are using AjaxSnapshots you don't have to deal with them at all. If you have a sitemap you should list all hashbang URLs on your site there (not hash-only URLs and not _escaped_fragment_ URLs).
In order to use AjaxSnapshots you have to forward all requests for snapshots on to us, and then return the snapshot we generate to the search engine. This is easy to do for popular web servers. We provide examples for these below.
If you need assistance getting started, just send us an email and we'll provide advice and, if necessary, hands-on assistance to help you get set up.
The goal here is to forward all snapshot requests on to the AjaxSnapshots service. We achieve this using a proxy to the service and a conditional rewrite that determines when the proxy should be used.
This approach uses Apache's mod_proxy and mod_rewrite modules. These are included with almost all distributions of Apache and usually enabled by default.
The two examples below are based on Apache 2.4. If you are using an earlier version, the REQUEST_SCHEME variable will not be available. In that case, just fill in the protocol (http or https) that is preferred for your site.
Some hosting providers only provide access to Apache's .htaccess file, not the full server configuration. We provide configuration examples for both situations.
Use this example if you have full access to Apache's configuration files. If you only have access to the .htaccess file then you'll need to use the .htaccess configuration example instead (don't use both).
#this configuration goes in httpd.conf or another .conf file you are using via an Include

#create a proxy to the AjaxSnapshots service
<IfModule mod_proxy_http.c>
    ProxyPass /makeSnapshot http://api.ajaxsnapshots.com/makeSnapshot
    ProxyPassReverse /makeSnapshot http://api.ajaxsnapshots.com/makeSnapshot
</IfModule>

<Directory "/var/www/html">
    <IfModule mod_rewrite.c>
        #we will rewrite snapshot requests so that they go to the makeSnapshot proxy
        RewriteEngine On

        #prevent loops
        RewriteCond %{HTTP:X-AJS-CALLTYPE} ^$

        #excluded suffixes (expected to be static)
        RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg|dat)$ [NC]
        RewriteCond %{REQUEST_URI} !\.(ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)$ [NC]

        #serve snapshots to crawlers that don't follow the crawlable AJAX
        #spec or only follow it partially
        RewriteCond %{HTTP_USER_AGENT} FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider [NC,OR]

        #only rewrite snapshot requests
        RewriteCond %{QUERY_STRING} ^(.*_escaped_fragment_=.*)

        RequestHeader set X-AJS-APIKEY "<< Your API Key >>"
        #RequestHeader set X-AJS-SNAP-TIME "5000"
        #RequestHeader set X-AJS-REMOVE-HIDDEN "true"
        #RequestHeader set X-AJS-REMOVE-SELECTOR ""
        #RequestHeader set X-AJS-DEVICE-WIDTH "1280"
        #RequestHeader set X-AJS-DEVICE-HEIGHT "800"

        #*** READ THIS ***
        #If you are using Apache 2.2 or earlier the %{REQUEST_SCHEME} variable (below) won't exist so you should
        #replace it with whichever of http and https is appropriate for your site
        <IfModule mod_proxy_http.c>
            RewriteRule ^.*$ /makeSnapshot?url=%{REQUEST_SCHEME}://%{HTTP_HOST}%{REQUEST_URI}?%1 [B,P,L]
        </IfModule>
    </IfModule>

    #your normal processing for this Directory goes here
</Directory>
The commented out headers are extra options you can use to take more control of the snapshotting process. The meaning of each is explained in our API Documentation and some of them are discussed elsewhere on this page.
Don't forget that you also need to tell search engines that you are now serving snapshots.
Use this example if your hosting provider only gives you access to a .htaccess file. If you have access to your full Apache configuration then the httpd.conf configuration example is a better choice (don't use both).
#this configuration goes in .htaccess
<IfModule mod_rewrite.c>
    RewriteEngine On

    #prevent loops
    RewriteCond %{HTTP:X-AJS-CALLTYPE} ^$

    #excluded suffixes (expected to be static)
    RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg|dat)$ [NC]
    RewriteCond %{REQUEST_URI} !\.(ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)$ [NC]

    #serve snapshots to crawlers that don't follow the crawlable AJAX
    #spec or only follow it partially
    RewriteCond %{HTTP_USER_AGENT} FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider [NC,OR]

    #only rewrite snapshot requests
    RewriteCond %{QUERY_STRING} ^(.*_escaped_fragment_=.*)

    RequestHeader set X-AJS-APIKEY "<< Your API Key >>"
    #RequestHeader set X-AJS-SNAP-TIME "5000"
    #RequestHeader set X-AJS-REMOVE-HIDDEN "true"
    #RequestHeader set X-AJS-REMOVE-SELECTOR ""
    #RequestHeader set X-AJS-DEVICE-WIDTH "1280"
    #RequestHeader set X-AJS-DEVICE-HEIGHT "800"

    <IfModule mod_proxy_http.c>
        #*** READ THIS ***
        #If you are using Apache 2.2 or earlier the %{REQUEST_SCHEME} variable (below) won't exist so you should
        #replace it with whichever of http and https is appropriate for your site
        RewriteRule ^.*$ http://api.ajaxsnapshots.com/makeSnapshot?url=%{REQUEST_SCHEME}://%{HTTP_HOST}%{REQUEST_URI}?%1 [B,P,L]
    </IfModule>
</IfModule>
The commented out headers are extra options you can use to take more control of the snapshotting process. The meaning of each is explained in our API Documentation and some of them are discussed elsewhere on this page.
Don't forget that you also need to tell search engines that you are now serving snapshots.
The goal here is to forward all snapshot requests on to the AjaxSnapshots service.
For this configuration to work, your Nginx server needs to be able to resolve the DNS name of our service. So, if you don't have a DNS resolver configured already, you'll need to add one. This is easy to do: just add the following line to the http { ... } section of your Nginx configuration:
resolver 8.8.8.8;
The line above uses Google's DNS server at IP address 8.8.8.8. You can change this to something else if you prefer.
Next we create a proxy to our service and use a conditional rewrite to send all snapshot requests to this proxy.
#this configuration goes in nginx.conf or another .conf file you are using via an include.
location / {
    set $snapshot 0;

    #rewrite snapshot requests to the ajaxsnapshots location
    if ($args ~* "(^|.*&)_escaped_fragment_=.*") {
        set $snapshot 1;
    }

    #also serve snapshots to social sharing bots
    if ($http_user_agent ~* "FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider") {
        set $snapshot 1;
    }

    #excluded suffixes (assumed static). Broken into two conditionals for readability
    if ($uri ~ "\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg)") {
        set $snapshot 0;
    }
    if ($uri ~ "\.(dat|ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)") {
        set $snapshot 0;
    }

    #prevent loops
    if ( $http_x_ajs_calltype ) {
        set $snapshot 0;
    }

    if ( $snapshot = 1) {
        rewrite ^ /ajaxsnapshots/ last;
    }

    #your normal request processing goes here
}

#this location creates and configures a proxy to the ajaxsnapshots service
location ~* /ajaxsnapshots/ {
    proxy_set_header X-AJS-APIKEY "<< Your API Key >>";
    #proxy_set_header X-AJS-SNAP-TIME "5000";
    #proxy_set_header X-AJS-REMOVE-HIDDEN "true";
    #proxy_set_header X-AJS-REMOVE-SELECTOR "";
    #proxy_set_header X-AJS-DEVICE-WIDTH "1280";
    #proxy_set_header X-AJS-DEVICE-HEIGHT "800";
    proxy_pass https://api.ajaxsnapshots.com/makeSnapshot?url=$scheme://$host:$server_port$request_uri;
}
The commented out headers are extra options you can use to take more control of the snapshotting process. The meaning of each is explained in our API Documentation and some of them are discussed elsewhere on this page.
Don't forget that you also need to tell search engines that you are now serving snapshots.
The goal here is to forward all snapshot requests on to the AjaxSnapshots service. We achieve this using a proxy to the service and a conditional rewrite that determines when the proxy should be used.
This approach requires IIS's Application Request Routing (ARR) Extension. If you haven't installed it already you can download ARR 3.0 here.
The example below is based on IIS 7.5, but should be easy to adapt to IIS 7, 8 or 8.5. We're going to edit IIS's configuration files directly, but the same results can be achieved using the IIS Manager GUI.
We're going to be using some HTTP headers to configure the AjaxSnapshots service. IIS requires all HTTP headers to be declared in advance as "Allowed Server Variables". The easiest way to do this is to add an <allowedServerVariables/> section to your web server's <location/> in the applicationHost.config file. The full path to this file is %WINDIR%\System32\inetsrv\config\applicationHost.config. Here's an example. You can copy the <rewrite/> section directly into your configuration file.
<location path="Default Web Site">
    <system.webServer>
        <rewrite>
            <allowedServerVariables>
                <add name="HTTP_X_AJS_URL" />
                <add name="HTTP_X_AJS_APIKEY" />
                <add name="HTTP_X_AJS_SNAP_TIME" />
                <add name="HTTP_X_AJS_REMOVE_HIDDEN" />
                <add name="HTTP_X_AJS_REMOVE_SELECTOR" />
                <add name="HTTP_X_AJS_DEVICE_WIDTH" />
                <add name="HTTP_X_AJS_DEVICE_HEIGHT" />
            </allowedServerVariables>
        </rewrite>
    </system.webServer>
</location>
Next we need to make sure that ARR has proxying enabled. To do this we need to add the following to the <system.webServer/> section of the same applicationHost.config file.
<proxy enabled="true" />
Note that this <proxy/> element may already be present. If it is, leave it alone if its enabled attribute is already true, or change the attribute to true if it is currently false.
Next we need to configure the AjaxSnapshots proxy and forward snapshot requests to it. This can be done by adding an appropriate rewrite rule to your web site's top level web.config file, e.g. the one in \inetpub\wwwroot. Here's an example. You can copy the <rewrite/> section directly into your configuration file, but don't forget to insert your API key where we've indicated.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <rule name="AjaxSnapshotsProxy" stopProcessing="true">
                    <!-- test all requests -->
                    <match url="(.*)" />
                    <conditions trackAllCaptures="true">
                        <!-- only proxy requests with an _escaped_fragment_ query parameter -->
                        <add input="{QUERY_STRING}" pattern="(.*_escaped_fragment_=.*)" />
                        <!-- used to capture the scheme/protocol for use in the rewrite below -->
                        <add input="{CACHE_URL}" pattern="^(https?://)" />
                    </conditions>
                    <serverVariables>
                        <!-- replace YOUR-API-KEY with your API key -->
                        <set name="HTTP_X_AJS_APIKEY" value="YOUR-API-KEY" replace="false" />
                        <!-- You can set other AjaxSnapshots HTTP headers here.
                             They all need to be prefixed with HTTP_ as above -->
                    </serverVariables>
                    <!-- send the request to the AjaxSnapshots service -->
                    <action type="Rewrite" url="http://api.ajaxsnapshots.com/makeSnapshot?url={C:2}{HTTP_HOST}:{SERVER_PORT}{UNENCODED_URL}" logRewrittenUrl="true" appendQueryString="false" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>
That's all. If you're more comfortable using IIS Manager than editing configuration files you can do that instead. Just read off the necessary values from the config examples above.
Don't forget that you also need to tell search engines that you are now serving snapshots.
The goal here is to forward all snapshot requests on to the AjaxSnapshots service. We achieve this by installing some custom ExpressJS middleware. This intercepts snapshot requests and sends them to AjaxSnapshots' snapshotting servers.
The AjaxSnapshots middleware for ExpressJS is distributed as an npm module called ajs-express, so to install it all you have to do is run the following from the root of your ExpressJS project:
npm install ajs-express --save
Then in your ExpressJS code import the module, configure it with your API Key (from your account page) and use the middleware:
var ajs = require('ajs-express');

//set api key
ajs.set('apikey','put-your-apikey-here');

//use the middleware (add this to your app early to make sure
//everything that should be snapshotted is)
app.use(ajs);
That's all!
All configuration options are set using the set method. This can take a key-value pair or a configuration object as follows:
var ajs = require('ajs-express');

//key-value based configuration
ajs.set('foo','bar');
ajs.set('baz','elf');

//equivalent fluent/chained configuration
ajs.set('foo','bar').set('baz','elf');

//equivalent config-object based configuration
ajs.set({
    foo:'bar',
    baz:'elf'
});
The available configuration options are:
apikey (mandatory) Your API Key (it's on your account page)
snap-time (default: 5000) This lets you specify how long, in milliseconds, we should wait after the page's onload event fires before we take the snapshot. Note that the snapshot will be taken earlier than this if either our on-page JavaScript API is used to specify an exact time for the snapshot or 40 seconds have elapsed since we started loading your page.
remove-hidden (default: true) If true then all hidden elements in the page body except for scripts and stylesheets will be removed before returning the snapshot. The term hidden is defined as per the jQuery 2.0 :hidden selector, except that we do not remove head, meta, link, style or title elements.
remove-selector (default: undefined) If set, this should be a valid jQuery 2.0 selector. All matching elements on your page will be removed before returning the snapshot.
device-width (default: 1280) Sets the width in pixels of the headless browser used to render your page. Setting this can be important when you are using responsive pages that show different content at different page sizes.
device-height (default: 800) Sets the height in pixels of the headless browser used to render your page. Setting this can be important when you are using responsive pages that show different content at different page sizes.
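The option names above line up with the X-AJS-* request headers used in the Apache and Nginx examples. The sketch below shows one way that correspondence could be expressed. It is not taken from the ajs-express source: the optionsToHeaders helper is hypothetical, and the one-to-one mapping between option names and header names is an assumption based on the matching names and default values.

```javascript
// Assumed mapping from configuration option names to request headers.
// The header names come from the server examples on this page; pairing
// them with the options is an assumption made for illustration.
const OPTION_HEADERS = {
  'apikey': 'X-AJS-APIKEY',
  'snap-time': 'X-AJS-SNAP-TIME',
  'remove-hidden': 'X-AJS-REMOVE-HIDDEN',
  'remove-selector': 'X-AJS-REMOVE-SELECTOR',
  'device-width': 'X-AJS-DEVICE-WIDTH',
  'device-height': 'X-AJS-DEVICE-HEIGHT'
};

// Hypothetical helper: turn an options object into a headers object.
function optionsToHeaders(options) {
  if (!options.apikey) {
    throw new Error('apikey is mandatory');
  }
  const headers = {};
  for (const [key, value] of Object.entries(options)) {
    const header = OPTION_HEADERS[key];
    if (!header) {
      throw new Error('Unknown option: ' + key);
    }
    // Skip options that were left undefined; header values are strings.
    if (value !== undefined) {
      headers[header] = String(value);
    }
  }
  return headers;
}

console.log(optionsToHeaders({
  'apikey': 'put-your-apikey-here',
  'snap-time': 3000,
  'device-width': 1024
}));
```

Rejecting unknown option names early is a design choice for the sketch: a misspelled option would otherwise be silently ignored and the default would apply.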
For more information see the project's GitHub page.
You can make a snapshot using the makeSnapshot function from our HTTP API.
All of our other examples work by checking for the _escaped_fragment_ HTTP parameter name and forwarding matching requests on to makeSnapshot. All you have to do is make sure that your server does the same.
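As a rough guide for other servers, the decision logic shared by the Apache and Nginx examples can be sketched in JavaScript as follows. This is a minimal sketch, not an official client: the function names are hypothetical, the static suffix list is shortened, and the request object is assumed to look like Node's http.IncomingMessage (a url string plus a headers object with lower-cased names). The makeSnapshot URL is built in the same shape as in the .htaccess example; check the HTTP API documentation before relying on it.

```javascript
// Suffixes treated as static files (a shortened version of the lists
// in the server examples).
const STATIC_SUFFIX = /\.(jpe?g|png|gif|ico|css|less|js|pdf|zip)$/i;

// Crawlers that don't follow the crawlable AJAX spec, as listed in the
// server examples.
const SOCIAL_BOTS = /FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider/i;

// Hypothetical helper: should this request be answered with a snapshot?
function shouldSnapshot(req) {
  const queryStart = req.url.indexOf('?');
  const path = queryStart === -1 ? req.url : req.url.slice(0, queryStart);
  const query = queryStart === -1 ? '' : req.url.slice(queryStart + 1);

  // Prevent loops: the server examples skip requests carrying this header.
  if (req.headers['x-ajs-calltype']) return false;
  // Static files never need a snapshot.
  if (STATIC_SUFFIX.test(path)) return false;

  const hasEscapedFragment = /(^|&)_escaped_fragment_=/.test(query);
  const isSocialBot = SOCIAL_BOTS.test(req.headers['user-agent'] || '');
  return hasEscapedFragment || isSocialBot;
}

// Hypothetical helper: build the makeSnapshot URL for the original request.
function makeSnapshotUrl(scheme, host, originalUrl) {
  return 'http://api.ajaxsnapshots.com/makeSnapshot?url=' +
    scheme + '://' + host + originalUrl;
}
```

A server would call shouldSnapshot for each incoming request and, when it returns true, fetch makeSnapshotUrl(...) with the X-AJS-APIKEY header set and return the response body to the crawler.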
Our goal is to make AjaxSnapshots easy to use for everyone, so if you need help getting set up just send us an email and we'll be happy to help. On the other hand, if you've worked out how to configure a server that's not in our list, or can see a way to improve on our samples, tell us about it and we'll make this list better.
There are three factors that influence the timing of the snapshot.
The snap time. This is how long, in milliseconds, we wait after the page's onload event has fired. The default is 5000ms, but you can override this using our HTTP API.
The maximum request time. This is the maximum time we allow after the page starts to load before we take a snapshot. This is currently 40 seconds. If your pages take longer than this to load then this is a problem you should deal with before configuring AjaxSnapshots.
The takeSnapshot function from our JavaScript API. If the takeSnapshot API function is called before the maximum request time or snap time is reached then it triggers the snapshot. Otherwise the snapshot is taken as soon as the earlier of the maximum request time and snap time is reached.
When we return a snapshot we add an X-AJS-TRIGGER HTTP header to the response. The value of this header tells you which of the factors above triggered the snapshot. See our HTTP API for details.
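The interaction between the three factors can be modelled with a small function. This is only a model of the rules described above, not the service's implementation. All times are in milliseconds measured from when the page starts loading, and the trigger labels are hypothetical names chosen for this sketch; they are not the actual X-AJS-TRIGGER header values.

```javascript
// Model of which factor triggers the snapshot, and when.
// onloadAt:       when the page's onload event fires (Infinity if it never does)
// snapTime:       wait after onload (default 5000)
// takeSnapshotAt: when the page calls takeSnapshot (Infinity if it never does)
// maxRequestTime: hard limit after the page starts loading (40 seconds)
function snapshotTrigger(timing) {
  const onloadAt = timing.onloadAt;
  const snapTime = timing.snapTime === undefined ? 5000 : timing.snapTime;
  const takeSnapshotAt = timing.takeSnapshotAt === undefined ? Infinity : timing.takeSnapshotAt;
  const maxRequestTime = 40000;

  // The trigger names below are hypothetical labels for this sketch.
  const candidates = [
    { trigger: 'takeSnapshot', at: takeSnapshotAt },
    { trigger: 'snap-time', at: onloadAt + snapTime },
    { trigger: 'max-request-time', at: maxRequestTime }
  ];

  // Whichever candidate happens first wins.
  return candidates.reduce(function (earliest, candidate) {
    return candidate.at < earliest.at ? candidate : earliest;
  });
}

console.log(snapshotTrigger({ onloadAt: 1000 }));
// { trigger: 'snap-time', at: 6000 }
console.log(snapshotTrigger({ onloadAt: 1000, takeSnapshotAt: 2500 }));
// { trigger: 'takeSnapshot', at: 2500 }
console.log(snapshotTrigger({ onloadAt: 38000 }));
// { trigger: 'max-request-time', at: 40000 }
```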
As well as getting the timing of the snapshot right there are other aspects of the snapshot that it can be useful to control.
There are several legitimate reasons for removing content from the DOM before serving it to search engines:
For example, if content is present in the DOM but not shown to users you probably don't want search engines to index it either. Similarly, if some of your content isn't essential to the page, such as a popup that you only show to first-time visitors, then you probably don't want it to be indexed by search engines either.
Warning: You should only remove or change content in the snapshot to improve its representation of your pages, never with the intention of gaming search engines by misrepresenting your content. If search engines interpret your actions as cheating then your site's appearance in search results could be penalized.
The default behavior of AjaxSnapshots is to remove any hidden DOM elements, where hidden means that they match jQuery's :hidden selector, except that we do not remove head, meta, style, link or title elements. You can override and customize this using the remove-hidden and remove-selector parameters to makeSnapshot in our HTTP API.
If you need finer-grained control over your snapshots you can also manipulate the DOM using the beforeSnapshot callback function from our JavaScript API.
If you are using responsive web pages then the content you display may vary depending on the page size. Some important content might be missing at smaller screen sizes, and you might add some less essential content when the screen is very large. In cases like this there is probably some optimal page size for your site.
You can control the page size of our headless browser using the device-width and device-height parameters to makeSnapshot in our HTTP API.