SEO Enable Your Javascript Website

Overview

This is a guide to using the AjaxSnapshots service to provide HTML snapshots of your website to search engines like Google and Bing.

We have included detailed instructions, with examples, for configuring popular web servers.

Configuring your site

AjaxSnapshots implements the Crawlable AJAX Specification created by Google and supported by Bing and Yandex. You don't have to know much about this to use our service. All you have to do is tell the search engines that you are making HTML snapshots available and then pass their snapshot requests on to us.

Here's a checklist you can work through to get up and running.

  1. Sign up for AjaxSnapshots and get an API Key.

  2. Tell search engines that snapshots are available for your pages.

  3. Configure your web server. Instructions for Apache, Nginx, IIS, ExpressJS (on Node) and generic servers are provided below.

  4. That's all.

Telling search engines you provide snapshots

Search engines will only ask for a page snapshot if you tell them one is available. There are two alternative ways of doing this. You must implement one of these:

1) Add an HTML <meta> tag to pages you want snapshots for.

This is by far the easiest way to get started. Unless the dynamic part of your pages is identified by the fragment part of your page's urls (the bit after the #) this is what you should do.

All you have to do is add the following tag to the <head> block of each page you want snapshots for.

<meta name="fragment" content="!">

This method is the one to use if you use pushState-based navigation or any other approach where the generated content is independent of the fragment part of the URL (the bit after the #).

2) Use hashbang #! urls.

You should only use this method if you are using URLs where the page content is identified by the fragment part of the URL (the bit after the #). The reason this needs special handling is that browsers and other HTTP clients do not send this part of the URL to the server, so as far as the server is concerned requests to index http://mysite.com#page1 and http://mysite.com#page2 are identical to requests to index http://mysite.com.
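To see why the server cannot tell these URLs apart, the short sketch below uses the standard URL class (available globally in Node.js and in browsers) to work out what an HTTP client actually sends. The helper name requestTarget is ours and is used for illustration only.

```javascript
// Returns the part of a URL that an HTTP client sends to the server:
// the path plus the query string. The fragment (#...) is never included.
function requestTarget(url) {
  var parsed = new URL(url);
  return parsed.pathname + parsed.search;
}

console.log(requestTarget('http://mysite.com#page1')); // "/"
console.log(requestTarget('http://mysite.com#page2')); // "/"
console.log(requestTarget('http://mysite.com'));       // "/"
```

All three calls produce the same request target, which is why fragment-based pages need the hashbang convention described here.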

This method requires that you modify your urls so that instead of having the form

http://www.example.com/ajax.html?someQuery#page-identifier

the # is replaced by a #! so that it has the form

http://www.example.com/ajax.html?someQuery#!page-identifier

This acts as a signal to Google, telling it that the part after the hash matters and that you are implementing the Crawlable AJAX Specification.

If you have configured your server to use AjaxSnapshots that's all you need to do. However, a brief explanation follows for those who want a deeper understanding of what's going on.

When Google encounters a hashbang URL like the one above it transforms it into one like this:

http://www.example.com/ajax.html?someQuery&_escaped_fragment_=page-identifier

This URL doesn't have a hash in any more so the page identifier will now reach your server. At this point it's up to your server to interpret this and serve a snapshot of the HTML that would be on the original hashbang page after the Javascript that creates it has run. That's what AjaxSnapshots does for you.
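The transformation described above can be sketched in a few lines of JavaScript. This is a simplified illustration and not code from AjaxSnapshots: the function name toEscapedFragmentUrl is hypothetical, and the set of characters it escapes (control characters and space, #, %, &, + and anything from 0x7F upwards) is our reading of the Crawlable AJAX Specification, so check it against the specification before relying on it.

```javascript
// Converts a hashbang URL into the _escaped_fragment_ form a crawler would request.
// URLs without "#!" are returned unchanged.
function toEscapedFragmentUrl(url) {
  var index = url.indexOf('#!');
  if (index === -1) {
    return url;
  }
  var base = url.slice(0, index);
  var fragment = url.slice(index + 2);

  // Percent-encode only the characters that would be ambiguous in a query string.
  var escaped = fragment.replace(/[\u0000-\u0020#%&+\u007f-\u{10ffff}]/gu, function (ch) {
    return encodeURIComponent(ch);
  });

  // Append to an existing query string with "&", otherwise start one with "?".
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('http://www.example.com/ajax.html?someQuery#!page1'));
// http://www.example.com/ajax.html?someQuery&_escaped_fragment_=page1
```

You never need to run this yourself when using AjaxSnapshots; it only shows what the search engine does before it contacts your server.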

URLs containing _escaped_fragment_ are for search engine use only. If you are using AjaxSnapshots you don't have to deal with them at all. If you have a sitemap you should list all hashbang URLs on your site there (not hash-only URLs and not _escaped_fragment_ URLs).

Configuring Web Servers

Overview

In order to use AjaxSnapshots you have to forward all requests for snapshots on to us, and then return the snapshot we generate to the search engine. This is easy to do for popular web servers. We provide examples for these below.
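Every server configuration on this page makes the same decision about which requests to forward. As a rough, server-independent sketch of that decision, the JavaScript below mirrors the conditions used in the Apache and Nginx examples: the _escaped_fragment_ parameter, the list of social sharing bots, the excluded static suffixes and the X-AJS-CALLTYPE loop check. The function name shouldServeSnapshot and its argument shapes are assumptions made for illustration.

```javascript
// Suffixes that are expected to be static files and are never snapshotted.
var STATIC_SUFFIX = /\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg|dat|ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)$/i;

// Crawlers that don't follow the Crawlable AJAX Specification, or only partially.
var BOT_USER_AGENT = /FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider/i;

// path:    request path without the query string, e.g. "/ajax.html"
// query:   query string without the leading "?", e.g. "a=1&_escaped_fragment_=page1"
// headers: request headers keyed by lower-case name (as Node's http module provides)
function shouldServeSnapshot(path, query, headers) {
  if (headers['x-ajs-calltype']) {
    return false; // prevent loops: this request already carries the AjaxSnapshots call-type header
  }
  if (STATIC_SUFFIX.test(path)) {
    return false; // static assets are served as normal
  }
  var hasEscapedFragment = /(^|&)_escaped_fragment_=/.test(query);
  var isKnownBot = BOT_USER_AGENT.test(headers['user-agent'] || '');
  return hasEscapedFragment || isKnownBot;
}

console.log(shouldServeSnapshot('/ajax.html', 'someQuery&_escaped_fragment_=page1', {})); // true
console.log(shouldServeSnapshot('/logo.png', '_escaped_fragment_=', {}));                 // false
```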

If you need assistance getting started just send us an email and we'll provide advice and if necessary hands on assistance to help you get set up.

Configuring Apache

The goal here is to forward all snapshot requests on to the AjaxSnapshots service. We achieve this as follows using a proxy to the service and a conditional rewrite that determines when it should be used.

This approach uses Apache's mod_proxy and mod_rewrite modules. These are included with almost all distributions of Apache and usually enabled by default.

The two examples below are based on Apache 2.4. If you are using an earlier version the REQUEST_SCHEME variable will not be available. In this case you should just fill in the protocol (http or https) which is preferred for your site.

Some hosting providers only provide access to Apache's .htaccess file, not the full server configuration. We provide configuration examples for both situations.

httpd.conf configuration example

Use this example if you have full access to Apache's configuration files. If you only have access to the .htaccess file then you'll need to use the .htaccess configuration example instead (don't use both).

#this configuration goes in httpd.conf or another .conf file you are using via an Include

#create a proxy to the AjaxSnapshots service
<IfModule mod_proxy_http.c>
  ProxyPass /makeSnapshot http://api.ajaxsnapshots.com/makeSnapshot
  ProxyPassReverse /makeSnapshot http://api.ajaxsnapshots.com/makeSnapshot
</IfModule>

<Directory "/var/www/html">

  <IfModule mod_rewrite.c>
    #we will rewrite snapshot requests so that they go to the makeSnapshot proxy
    RewriteEngine On
    
    #prevent loops
    RewriteCond %{HTTP:X-AJS-CALLTYPE} ^$
    
    #excluded suffixes (expected to be static)
    RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg|dat)$ [NC]
    RewriteCond %{REQUEST_URI} !\.(ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)$ [NC]
    
    #serve snapshots to crawlers that don't follow the crawlable AJAX
    #spec or only follow it partially
    RewriteCond  %{HTTP_USER_AGENT} FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider [NC,OR]
    #only rewrite snapshot requests
    RewriteCond %{QUERY_STRING} ^(.*_escaped_fragment_=.*)
    
    RequestHeader set X-AJS-APIKEY "<< Your API Key >>"
    #RequestHeader set X-AJS-SNAP-TIME "5000"
    #RequestHeader set X-AJS-REMOVE-HIDDEN  "true"
    #RequestHeader set X-AJS-REMOVE-SELECTOR ""
    #RequestHeader set X-AJS-DEVICE-WIDTH  "1280"
    #RequestHeader set X-AJS-DEVICE-HEIGHT  "800"
    
    #*** READ THIS ***
    #If you are using Apache 2.2 or earlier the %{REQUEST_SCHEME} variable (below) won't exist so you should
    #replace it with whichever of http and https is appropriate for your site
    <IfModule mod_proxy_http.c>
      RewriteRule ^.*$ /makeSnapshot?url=%{REQUEST_SCHEME}://%{HTTP_HOST}%{REQUEST_URI}?%1 [B,P,L]
    </IfModule>
  </IfModule>
  #your normal processing for this Directory goes here

</Directory>

The commented out headers are extra options you can use to take more control of the snapshotting process. The meaning of each is explained in our API Documentation and some of them are discussed elsewhere on this page.

Don't forget that you also need to tell search engines that you are now serving snapshots.

.htaccess configuration example

Use this example if your hosting provider only gives you access to a .htaccess file. If you have access to your full Apache configuration then the full server configuration example above is a better choice (don't use both).

#this configuration goes in .htaccess

<IfModule mod_rewrite.c>
  RewriteEngine On

  #prevent loops
  RewriteCond %{HTTP:X-AJS-CALLTYPE} ^$  
  
  #excluded suffixes (expected to be static)
  RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg|dat)$ [NC]
  RewriteCond %{REQUEST_URI} !\.(ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)$ [NC]
  
  #serve snapshots to crawlers that don't follow the crawlable AJAX
  #spec or only follow it partially
  RewriteCond  %{HTTP_USER_AGENT} FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider [NC,OR]
  #only rewrite snapshot requests
  RewriteCond %{QUERY_STRING} ^(.*_escaped_fragment_=.*)

  RequestHeader set X-AJS-APIKEY "<< Your API Key >>"
  #RequestHeader set X-AJS-SNAP-TIME "5000"
  #RequestHeader set X-AJS-REMOVE-HIDDEN  "true"
  #RequestHeader set X-AJS-REMOVE-SELECTOR ""
  #RequestHeader set X-AJS-DEVICE-WIDTH  "1280"
  #RequestHeader set X-AJS-DEVICE-HEIGHT  "800"

  <IfModule mod_proxy_http.c>
    #*** READ THIS ***
    #If you are using Apache 2.2 or earlier the %{REQUEST_SCHEME} variable (below) won't exist so you should
    #replace it with whichever of http and https is appropriate for your site
    RewriteRule ^.*$ http://api.ajaxsnapshots.com/makeSnapshot?url=%{REQUEST_SCHEME}://%{HTTP_HOST}%{REQUEST_URI}?%1 [B,P,L]
  </IfModule>
</IfModule>

The commented out headers are extra options you can use to take more control of the snapshotting process. The meaning of each is explained in our API Documentation and some of them are discussed elsewhere on this page.

Don't forget that you also need to tell search engines that you are now serving snapshots.

Configuring Nginx

The goal here is to forward all snapshot requests on to the AjaxSnapshots service.

For this configuration to work your Nginx server needs to be able to resolve the DNS name of our service. So, if you don't have a DNS resolver configured already you'll need to add one. This is easy to do: just add the following line to the http { ... } section of your Nginx configuration:

resolver 8.8.8.8;

The line above uses Google's DNS server at IP address 8.8.8.8. You can change this to something else if you prefer.

Next we create a proxy to our service and use a conditional rewrite to send all snapshot requests to this proxy.

#this configuration goes in nginx.conf or another .conf file you are using via an include.

location / {

  set $snapshot 0;

  #rewrite snapshot requests to the ajaxsnapshots location
  if ($args ~* "(^|.*&)_escaped_fragment_=.*") {
    set $snapshot 1;
  }
  
  #also serve snapshots to social sharing bots
  if ($http_user_agent ~* "FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider") {
    set $snapshot 1;
  }
  
  #excluded suffixes (assumed static). Broken into two conditionals for readability.
  #The patterns are case-insensitive and anchored to the end of the URI, matching the Apache examples.
  if ($uri ~* "\.(jpe?g|png|gif|ico|tiff?|css|less|js|doc|zip|rar|exe|iso|dmg)$") {
    set $snapshot 0;
  }
  if ($uri ~* "\.(dat|ppt|psd|pdf|xls|mp3|mp4|m4a|wav|avi|mpe?g|swf|flv|mkv|torrent)$") {
    set $snapshot 0;
  }
  
  #prevent loops
  if ( $http_x_ajs_calltype ) {
    set $snapshot 0;
  } 
  
  if ( $snapshot = 1) {
    rewrite ^ /ajaxsnapshots/ last;
  }
  
  #your normal request processing goes here
}


#this location creates and configures a proxy to the ajaxsnapshots service
location ~* /ajaxsnapshots/ {

  proxy_set_header X-AJS-APIKEY  "<< Your API Key >>";
  #proxy_set_header X-AJS-SNAP-TIME "5000";
  #proxy_set_header X-AJS-REMOVE-HIDDEN  "true";
  #proxy_set_header X-AJS-REMOVE-SELECTOR "";
  #proxy_set_header X-AJS-DEVICE-WIDTH  "1280";
  #proxy_set_header X-AJS-DEVICE-HEIGHT  "800";

  proxy_pass https://api.ajaxsnapshots.com/makeSnapshot?url=$scheme://$host:$server_port$request_uri;
}

The commented out headers are extra options you can use to take more control of the snapshotting process. The meaning of each is explained in our API Documentation and some of them are discussed elsewhere on this page.

Don't forget that you also need to tell search engines that you are now serving snapshots.

Configuring IIS

The goal here is to forward all snapshot requests on to the AjaxSnapshots service. We achieve this as follows using a proxy to the service and a conditional rewrite that determines when it should be used.

This approach requires IIS's Application Request Routing (ARR) Extension. If you haven't installed it already you can download ARR 3.0 here.

The example below is based on IIS 7.5, but should be easy to adapt to IIS 7, 8 or 8.5. We're going to edit IIS's configuration files directly, but the same results can be achieved using the IIS Manager GUI.

We're going to be using some HTTP headers to configure the AjaxSnapshots service. IIS requires all HTTP headers to be declared in advance as "Allowed Server Variables". The easiest way to do this is to add an <allowedServerVariables/> section to your web server's <location/> in the applicationHost.config file. The full path to this file is %WINDIR%\System32\inetsrv\config\applicationHost.config. Here's an example. You can copy the <rewrite/> section directly into your configuration file.

<location path="Default Web Site">
    <system.webServer>
        <rewrite>
            <allowedServerVariables>
                <add name="HTTP_X_AJS_URL" />
                <add name="HTTP_X_AJS_APIKEY" />
                <add name="HTTP_X_AJS_SNAP_TIME" />
                <add name="HTTP_X_AJS_REMOVE_HIDDEN" />
                <add name="HTTP_X_AJS_REMOVE_SELECTOR" />
                <add name="HTTP_X_AJS_DEVICE_WIDTH" />
                <add name="HTTP_X_AJS_DEVICE_HEIGHT" />
            </allowedServerVariables>
        </rewrite>
    </system.webServer>
</location>

Next we need to make sure that ARR has proxying enabled. To do this we need to add the following to the <system.webServer/> section of the same applicationHost.config file.

<proxy enabled="true" />

Note that this <proxy/> element may already be present. If it is, leave it alone when enabled is already true, or change the attribute's value to true if it is currently false.

Next we need to configure the AjaxSnapshots proxy and forward snapshot requests to it. This can be done by adding an appropriate rewrite rule to your web site's top level web.config file, e.g. the one in \inetpub\wwwroot. Here's an example. You can copy the <rewrite/> section directly into your configuration file, but don't forget to insert your API key where we've indicated.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <rule name="AjaxSnapshotsProxy" stopProcessing="true">
                    <!-- test all requests -->
                    <match url="(.*)" />
                    <conditions trackAllCaptures="true">
                        <!-- only proxy requests with an _escaped_fragment_ query parameter -->
                        <add input="{QUERY_STRING}" pattern="(.*_escaped_fragment_=.*)" />
                        <!-- used to capture the scheme/protocol for use in the rewrite below --> 
                        <add input="{CACHE_URL}" pattern="^(https?://)" />
                    </conditions>
                    <serverVariables>
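                        <!-- replace YOUR-API-KEY with your own key (a literal < is not allowed inside an XML attribute value) -->
                        <set name="HTTP_X_AJS_APIKEY" value="YOUR-API-KEY" replace="false" />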
                        <!-- You can set other AjaxSnapshots HTTP headers here
                           They all need to be prefixed with HTTP_ as above
                        -->
                    </serverVariables>
                    <!-- send the request to the AjaxSnapshots service -->
                    <action type="Rewrite" 
                    url="http://api.ajaxsnapshots.com/makeSnapshot?url={C:2}{HTTP_HOST}:{SERVER_PORT}{UNENCODED_URL}" 
                    logRewrittenUrl="true" appendQueryString="false" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>

That's all. If you're more comfortable using IIS Manager than editing configuration files you can do that instead. Just read off the necessary values from the config examples above.

Don't forget that you also need to tell search engines that you are now serving snapshots.

Configuring ExpressJS (Node)

The goal here is to forward all snapshot requests on to the AjaxSnapshots service. We achieve this by installing some custom ExpressJS middleware. This intercepts snapshot requests and sends them to AjaxSnapshots' snapshotting servers.

The AjaxSnapshots middleware for ExpressJS is distributed as an npm module, called ajs-express, so to install it all you have to do is run the following from the root of your ExpressJS project:

npm install ajs-express --save

Then in your ExpressJS code import the module, configure it with your API Key (from your account page) and use the middleware:

var express = require('express');
var ajs = require('ajs-express');

//your existing ExpressJS app
var app = express();

//set api key
ajs.set('apikey','put-your-apikey-here');

//use the middleware (add this to your app early to make sure 
//everything that should be snapshotted is)
app.use(ajs);

That's all!

Configuration Options

All configuration options are set using the set method. This can take a key-value pair or a configuration object as follows:

var ajs = require('ajs-express');

//key-value based configuration
ajs.set('foo','bar');
ajs.set('baz','elf');

//equivalent fluent/chained configuration
ajs.set('foo','bar').set('baz','elf');

//equivalent config-object based configuration
ajs.set({
  foo:'bar',
  baz:'elf'
});

The available configuration options are:

  • apikey (mandatory) Your API Key (it's on your account page)

  • snap-time (default: 5000) This lets you specify how long in milliseconds we should wait after the page's onload event fires before we take the snapshot. Note that the snapshot will be taken earlier than this if either our on-page Javascript API is used to specify an exact time for the snapshot or 40 seconds has elapsed since we started loading your page.

  • remove-hidden (default: true) If true then all hidden elements in the page body except for scripts and stylesheets will be removed before returning the snapshot. The term hidden is defined as per the :hidden jQuery 2.0 selector, except that we do not remove head, meta, link, style or title elements.

  • remove-selector (default: undefined) If set, this should be a valid jQuery 2.0 selector. All matching elements on your page will be removed before returning the snapshot.

  • device-width (default: 1280) Sets the width in pixels of the headless browser used to render your page. Setting this can be important when you are using responsive pages that show different content at different page sizes.

  • device-height (default: 800) Sets the height in pixels of the headless browser used to render your page. Setting this can be important when you are using responsive pages that show different content at different page sizes.
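The option names above line up with the X-AJS-* request headers that appear in the Apache, Nginx and IIS examples on this page. The sketch below spells out that naming correspondence. It is an inference from those examples, not a documented contract of ajs-express, and optionsToHeaders is a hypothetical helper written for illustration.

```javascript
// Hypothetical helper: turns option names such as "snap-time" into
// header names such as "X-AJS-SNAP-TIME", with every value sent as a string.
function optionsToHeaders(options) {
  var headers = {};
  Object.keys(options).forEach(function (name) {
    headers['X-AJS-' + name.toUpperCase()] = String(options[name]);
  });
  return headers;
}

var headers = optionsToHeaders({
  'apikey': 'put-your-apikey-here',
  'snap-time': 5000,
  'device-width': 1280
});
console.log(headers['X-AJS-SNAP-TIME']); // "5000"
```

This can be handy when you move between the ExpressJS middleware and one of the header-based server configurations and want to carry the same settings across.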

For more information see the project's GitHub page.

Configuring Other Web Servers

You can make a snapshot using the makeSnapshot function from our HTTP API.

All of our other examples work by checking for the _escaped_fragment_ HTTP parameter name and forwarding matching requests on to makeSnapshot. All you have to do is make sure that your server does the same.
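For a server that isn't covered here, the forwarding step comes down to building one outgoing request. The sketch below shows its shape, using the makeSnapshot endpoint and the X-AJS-APIKEY header from the examples on this page. The function name buildSnapshotRequest is hypothetical. The page URL is appended unencoded, as in the Nginx and IIS examples; whether makeSnapshot expects it to be percent-encoded is not stated here, so check the HTTP API documentation.

```javascript
var SNAPSHOT_ENDPOINT = 'http://api.ajaxsnapshots.com/makeSnapshot';

// scheme:     "http" or "https", whichever your site uses
// host:       the Host header of the incoming request, e.g. "www.example.com"
// requestUri: the incoming path and query string, including _escaped_fragment_
// apiKey:     your AjaxSnapshots API key
function buildSnapshotRequest(scheme, host, requestUri, apiKey) {
  return {
    url: SNAPSHOT_ENDPOINT + '?url=' + scheme + '://' + host + requestUri,
    headers: { 'X-AJS-APIKEY': apiKey }
  };
}

var request = buildSnapshotRequest(
  'http',
  'www.example.com',
  '/ajax.html?someQuery&_escaped_fragment_=page1',
  'put-your-apikey-here'
);
console.log(request.url);
// http://api.ajaxsnapshots.com/makeSnapshot?url=http://www.example.com/ajax.html?someQuery&_escaped_fragment_=page1
```

Your server would then issue this request with whatever HTTP client it has available and return the response body to the search engine.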

Our goal is to make AjaxSnapshots easy to use for everyone, so if you need help getting set up just send us an email and we'll be happy to help. On the other hand, if you've worked out how to configure a server that's not in our list, or can see a way to improve on our samples, tell us about it and we'll make this list better.

When is the snapshot taken

There are 3 factors that influence the timing of the snapshot.

  1. The snap time. This is how long, in milliseconds, we wait after the page's onload event has fired before taking the snapshot. The default is 5000ms, but you can override this using our HTTP API.

  2. The maximum request time. This is the maximum time we allow after the page starts to load before we take a snapshot. This is currently 40 seconds. If your pages take longer than this to load then this is a problem you should deal with before configuring AjaxSnapshots.

  3. The takeSnapshot function from our Javascript API.

If the takeSnapshot API function is called before the maximum request time or snap time is reached then it triggers the snapshot. Otherwise the snapshot is taken when the earlier of the maximum request time and snap time is reached.

When we return a snapshot we add an X-AJS-TRIGGER HTTP header to the response. The value of this header tells you which of the factors above triggered the snapshot. See our HTTP API for details.
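The timing rule amounts to "whichever of the three happens first". The sketch below models it under stated assumptions: all times are measured in milliseconds from the moment the page starts loading, the function name resolveSnapshot is hypothetical, and the trigger labels are placeholders of our own, not the actual values of the X-AJS-TRIGGER header.

```javascript
var MAX_REQUEST_TIME_MS = 40000; // maximum request time: 40 seconds
var DEFAULT_SNAP_TIME_MS = 5000; // default snap time after onload

// timing.onloadAtMs:       when the onload event fired (omit if it never fired)
// timing.takeSnapshotAtMs: when takeSnapshot was called (omit if it never was)
// timing.snapTimeMs:       snap time override (omit to use the default)
function resolveSnapshot(timing) {
  var snapTime = timing.snapTimeMs === undefined ? DEFAULT_SNAP_TIME_MS : timing.snapTimeMs;
  var candidates = [{ trigger: 'max-request-time', atMs: MAX_REQUEST_TIME_MS }];

  if (typeof timing.onloadAtMs === 'number') {
    candidates.push({ trigger: 'snap-time', atMs: timing.onloadAtMs + snapTime });
  }
  if (typeof timing.takeSnapshotAtMs === 'number') {
    candidates.push({ trigger: 'takeSnapshot', atMs: timing.takeSnapshotAtMs });
  }

  // The earliest candidate wins.
  return candidates.reduce(function (earliest, candidate) {
    return candidate.atMs < earliest.atMs ? candidate : earliest;
  });
}

console.log(resolveSnapshot({ onloadAtMs: 1000 }));
// { trigger: 'snap-time', atMs: 6000 }
console.log(resolveSnapshot({ onloadAtMs: 1000, takeSnapshotAtMs: 2500 }));
// { trigger: 'takeSnapshot', atMs: 2500 }
```

A page whose onload event fires at 38 seconds would be snapshotted at 40 seconds by the maximum request time, because the default snap time would not elapse until 43 seconds.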

Controlling Snapshot Content

As well as getting the timing of the snapshot right there are other aspects of the snapshot that it can be useful to control.

Removal of content

There are several legitimate reasons for removing content from the DOM before serving it to search engines.

For example, if content is present in the DOM but not shown to users you probably don't want search engines to index it either. Similarly, if some of your content isn't essential to the page, such as a popup that you only show to first-time visitors, then you probably don't want it to be indexed by search engines either.

Warning: You should only remove or change content in the snapshot to improve its representation of your pages, never with the intention of gaming search engines by misrepresenting your content. If search engines interpret your actions as cheating then your site's appearance in search results could be penalized.

The default behavior of AjaxSnapshots is to remove any hidden DOM elements, where hidden means that they match jQuery's :hidden selector, except that we do not remove head, meta, style, link or title elements. You can override this and customize it using the remove-hidden and remove-selector parameters to makeSnapshot in our HTTP API.

If you need finer grained control over your snapshots you can also manipulate the DOM using the beforeSnapshot callback function from our Javascript API.

Device Dimensions

If you are using responsive web pages then the content you display may vary depending on the page size. Some important content might be missing at smaller screen sizes, and you might add some less essential content when the screen is very large. In cases like this there is probably some optimal page size for your site.

You can control the page size of our headless browser using the device-width and device-height parameters to makeSnapshot in our HTTP API.
