What you need to know about Angular SEO

Search engines such as Google and Bing are engineered to crawl static web pages, not JavaScript-heavy, client-side apps. A typical search bot does not execute JavaScript while it crawls, so it only sees the HTML that the server initially sends.

Our JavaScript-heavy apps need a JavaScript runtime to produce their content: an engine such as V8, usually inside a browser or a headless browser such as PhantomJS. Web crawlers typically load a web page without using a JavaScript interpreter.

Search engines leave JavaScript interpreters out of their crawlers for good reason: they rarely need one, and running one slows the crawler down and makes it less efficient at crawling the web.

Are we out of luck for providing good SEO for our Angular apps? This article will show you exactly what you need to know to get your app indexed today.

Getting angular apps indexed

There are several ways to tell Google how to index our app. The first, and more common, approach is to use a backend to serve our Angular app. This has the advantage of being simple to implement without much duplication of code.

A second approach is to render all of the content delivered by our Angular app inside a <noscript> tag in our HTML. We’re not going to cover the second approach.

In this article, we’re going to walk through how you can build an SEO toolchain for your workflow, presenting a bunch of different options for you to choose what works best for you.

How modern search engines work with client-side apps

Google and other advanced search engines support the hashbang URL format, which identifies the page currently being viewed inside a single-page app. Because the part of a URL after the # is never sent to the server, these search engines transform the hashbang URL into a custom URL format that the server can see.

The search engine visits the transformed URL and expects to receive the fully rendered HTML that a browser would display. For instance, Google will turn the hashbang URL from:

http://www.ng-newsletter.com/#!/signup/page

Into the URL:

http://www.ng-newsletter.com/?_escaped_fragment_=/signup/page
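
As a rough illustration of that mapping, the function below rewrites a hashbang URL the way the crawler would. It is a simplified sketch, not Google’s implementation: the name toEscapedFragmentUrl is made up for this example, and only the few special characters listed in the comment are escaped.

```javascript
// Hypothetical helper: models how a crawler maps a hashbang URL to the
// _escaped_fragment_ form. Simplified sketch, not Google's actual code.
function toEscapedFragmentUrl(uri) {
  var idx = uri.indexOf('#!');
  // No hashbang: nothing to transform
  if (idx < 0) return uri;

  var base = uri.substring(0, idx);
  var fragment = uri.substring(idx + 2);

  // Escape the characters that would be ambiguous inside a query
  // string: '%', '#', '&', '+' and space
  var escaped = fragment.replace(/[%#&+ ]/g, function(ch) {
    return encodeURIComponent(ch);
  });

  // Append to an existing query string if there is one
  var separator = base.indexOf('?') < 0 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('http://www.ng-newsletter.com/#!/signup/page'));
// http://www.ng-newsletter.com/?_escaped_fragment_=/signup/page
```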

Within our Angular app, we will need to tell Google to handle our site slightly differently depending upon which URL style we use.

Hashbang syntax

Google’s Ajax crawling specification was originally written for URLs that use the hashbang syntax, an early method of creating permalinks for JS applications.

We’ll need to configure a hashPrefix of ! in our routing, which uses hash URLs by default. This will turn our routes from simply using the /#/ syntax to the /#!/ syntax:

angular.module('myApp', [])
.config(['$locationProvider', function($locationProvider) {
  // Config blocks can only inject providers, not the $location service
  $locationProvider.hashPrefix('!');
}]);

HTML5 routing mode

The new HTML5 pushState routing works differently: it modifies the browser’s URL path and history directly, so there is no hashbang for the crawler to detect. To get the search bot to treat our Angular app as crawlable anyway, we can add a simple element to the page’s <head>:

<meta name="fragment" content="!">

This tells the Google spider to use the new crawling spec to crawl our site. When it encounters this tag, instead of crawling our site like normal, it will revisit the page with an empty ?_escaped_fragment_= query parameter appended to the URL.

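This assumes that we’re using HTML5 mode, which we enable through $locationProvider:

angular.module('myApp', [])
.config(['$locationProvider', function($locationProvider) {
  // Config blocks can only inject providers, not the $location service
  $locationProvider.html5Mode(true);
}]);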
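
To make the crawler’s side of this concrete, here is a small model of the decision it makes when a page carries the meta tag. The function name and the regular expression are assumptions for illustration only; a real crawler parses the HTML properly.

```javascript
// Matches <meta name="fragment" content="!"> (simplified: assumes the
// name attribute comes before the content attribute)
var metaFragmentRegex = /<meta\s+name=["']fragment["']\s+content=["']!["']\s*\/?>/i;

// Hypothetical model of the crawler: returns the URL it would request
function crawlerRequestUrl(pageUrl, html) {
  // Without the meta tag the page is crawled as-is
  if (!metaFragmentRegex.test(html)) return pageUrl;

  // With the meta tag, an empty _escaped_fragment_ is appended
  var separator = pageUrl.indexOf('?') < 0 ? '?' : '&';
  return pageUrl + separator + '_escaped_fragment_=';
}

var page = '<head><meta name="fragment" content="!"></head>';
console.log(crawlerRequestUrl('http://example.com/about', page));
// http://example.com/about?_escaped_fragment_=
```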

With the _escaped_fragment_ in the query string, our backend can detect that a request comes from a crawler and serve static HTML instead of our client-side Angular app, so the crawler can crawl our site as though it were a static site.
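
One detail worth spelling out is where the route ends up in each URL style. With hashbang URLs the route arrives as the value of _escaped_fragment_, while in HTML5 mode the value is empty and the route is the request path itself. The sketch below shows one way to recover the route in both cases; crawlerRoute is a hypothetical helper name, and the http://localhost base is only there so that relative request URLs can be parsed.

```javascript
// Hypothetical helper: returns the app route a crawler is asking for,
// or null when the request did not come through the crawling scheme.
function crawlerRoute(requestUrl) {
  var parsed = new URL(requestUrl, 'http://localhost');
  var fragment = parsed.searchParams.get('_escaped_fragment_');

  // Parameter absent: a regular visitor
  if (fragment === null) return null;

  // HTML5 mode: the fragment is empty and the path carries the route
  if (fragment === '') return parsed.pathname;

  // Hashbang mode: the fragment carries the route
  return fragment.charAt(0) === '/' ? fragment : '/' + fragment;
}

console.log(crawlerRoute('/?_escaped_fragment_=/signup/page')); // /signup/page
console.log(crawlerRoute('/about?_escaped_fragment_='));        // /about
console.log(crawlerRoute('/about'));                            // null
```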

Options for handling SEO from the server-side

We have a number of different options available to us to make our site SEO-friendly. We’ll walk through three different ways to deliver our apps from the server-side:

Using node/express middleware

To deliver static HTML using NodeJS and Express (the web application framework for NodeJS), we’ll add some middleware that will look for the _escaped_fragment_ in our query parameters.

This middleware only acts when _escaped_fragment_ exists as a query parameter; otherwise it passes the request along the middleware chain.

// In our app.js configuration
app.use(function(req, res, next) {
  var fragment = req.query._escaped_fragment_;

  // If there is no fragment in the query params
  // then we're not serving a crawler. Check the type rather than
  // truthiness: an empty fragment ('') is still a crawler request.
  if (typeof fragment !== 'string') return next();

  // If the fragment is empty, serve the
  // index page
  if (fragment === "" || fragment === "/")
    fragment = "/index.html";

  // If fragment does not start with '/'
  // prepend it to our fragment
  if (fragment.charAt(0) !== "/")
    fragment = '/' + fragment;

  // If fragment does not end with '.html'
  // append it to the fragment
  if (!/\.html$/.test(fragment))
    fragment += ".html";

  // Serve the static html snapshot. sendFile is asynchronous, so
  // errors arrive in the callback rather than through try/catch.
  // The root option keeps lookups inside the snapshots directory.
  // (res.sendFile needs Express 4.8+; older versions used res.sendfile.)
  res.sendFile(fragment, { root: __dirname + '/snapshots' }, function(err) {
    if (err) res.status(404).end();
  });
});

This middleware expects our snapshots to exist in a top-level directory called ‘/snapshots’ and serves files based upon the fragment in the request.

For instance, it will serve a fragment of / as index.html, while it will serve a fragment of /about as about.html in the snapshots directory.
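
The mapping from fragment to snapshot file can be pulled out into a small function, which makes it easy to check a few cases by hand. This is a sketch under our own assumptions: snapshotPathFor is a hypothetical name, and the guard against '..' segments is an addition of ours, since the fragment comes straight from the query string.

```javascript
// Hypothetical helper: maps an _escaped_fragment_ value to a path
// inside the snapshots directory, or null if the value looks unsafe.
function snapshotPathFor(fragment) {
  // Refuse anything that tries to climb out of the snapshots directory
  if (fragment.split('/').indexOf('..') >= 0) return null;

  // An empty fragment means the index page
  if (fragment === '' || fragment === '/') return '/index.html';

  // Normalize to a leading slash and an .html extension
  if (fragment.charAt(0) !== '/') fragment = '/' + fragment;
  if (!/\.html$/.test(fragment)) fragment += '.html';
  return fragment;
}

console.log(snapshotPathFor(''));             // /index.html
console.log(snapshotPathFor('/about'));       // /about.html
console.log(snapshotPathFor('signup/page'));  // /signup/page.html
console.log(snapshotPathFor('/../secret'));   // null
```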

Use Apache to rewrite URLs

If we’re using the Apache server to deliver our Angular app, we can add a few lines to our configuration that will serve snapshots instead of our JavaScript app.

We can use the mod_rewrite module to detect whether the requested route includes the _escaped_fragment_ query parameter. If it does, we’ll rewrite the request to point to the static version in the /snapshots directory.

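In order to set the rewrite in motion, we’ll need to enable the appropriate modules. The rules below rely on mod_rewrite; the proxy modules are only needed if we proxy requests on to another server:

$ a2enmod rewrite
$ a2enmod proxy
$ a2enmod proxy_http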

Then we’ll need to reload the apache config:

$ sudo /etc/init.d/apache2 reload

We can set the rewrite rules either in the virtualhost configuration for the site or the .htaccess file that sits at the root of the server directory.

RewriteEngine On
Options +FollowSymLinks
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
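
To see what the query-string condition captures, we can model it in JavaScript. This is only a model for reasoning about the pattern (Apache does the real work), and rewriteTarget is a hypothetical name. In the rule above, %1 refers to the first capture group of the last matched RewriteCond.

```javascript
// Same pattern as the RewriteCond on %{QUERY_STRING}
var escapedFragmentCond = /^_escaped_fragment_=\/?(.*)$/;

// Hypothetical model: returns the path the RewriteRule would produce,
// or null when the condition does not match
function rewriteTarget(queryString) {
  var match = escapedFragmentCond.exec(queryString);
  if (!match) return null;
  return '/snapshots/' + match[1];
}

console.log(rewriteTarget('_escaped_fragment_=/about')); // /snapshots/about
console.log(rewriteTarget('page=2'));                    // null
```

Note that the target has no .html extension. If our snapshots are saved as about.html, the rule would presumably need the extension added (for instance /snapshots/%1.html); that is worth verifying against your own setup.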

Use nginx to proxy URLs

If we’re using nginx to serve our Angular app, we can add some configuration to serve snapshots of our app if there is an _escaped_fragment_ parameter in the query string.

Unlike Apache, nginx does not require us to enable a module, so we can simply update our configuration to serve the requested snapshot file instead.

In our nginx configuration file (for instance, /etc/nginx/nginx.conf), ensure your configuration looks like this:

server {
  listen 80;
  server_name example;

  if ($args ~ "_escaped_fragment_=/?(.+)") {
    set $path $1;
    rewrite ^ /snapshots/$path last;
  }

  location / {
    root /web/example/current/;
    # Comment out if using hash urls
    if (!-e $request_filename) {
      rewrite ^(.*)$ /index.html break;
    }
    index index.html;
  }
}

Once this is complete, we’re good to reload our configuration:

sudo /etc/init.d/nginx reload

Taking snapshots

We can take snapshots of our rendered app and deliver them from our backend, using a tool like PhantomJS or zombie.js to render our pages. When Google requests a page with the _escaped_fragment_ query parameter, we simply return the matching snapshot.

We’ll discuss two methods to take snapshots, using zombie.js and using a grunt tool. We’re not covering using the fantastic PhantomJS tool as there are plenty of great resources that demonstrate it.

Using Zombie to homebrew snapshots

To set up zombie.js, we’ll need to install the npm package zombie:

$ npm install zombie

Now, we’ll use NodeJS to save our file using zombie. First, a few helper methods we’ll use in the process:

var Browser = require('zombie'),
    url     = require('url'),
    fs      = require('fs'),
    saveDir = __dirname + '/snapshots';

var scriptTagRegex = /<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi;

var stripScriptTags = function(html) {
  return html.replace(scriptTagRegex, '');
};

var browserOpts = {
  waitFor: 2000,
  loadCSS: false,
  runScripts: true
};

var saveSnapshot = function(uri, body) {
  var path;
  var hashIdx = uri.lastIndexOf('#');
  if (hashIdx < 0) {
    // If we're using html5mode
    path = url.parse(uri).pathname;
  } else {
    // If we're using hash or hashbang mode, drop the '#'
    // and the optional '!' so '#!/about' becomes '/about'
    path = uri.substring(hashIdx + 1).replace(/^!/, '');
  }
  if (!path || path === '/') path = "/index.html";
  if (path.charAt(0) !== '/') path = '/' + path;

  if (path.indexOf('.html') == -1)
    path += ".html";

  var filename = saveDir + path;
  // writeFile opens, writes and closes the file in one call.
  // Nested routes need their directories to exist beforehand.
  // Script tags are stripped so the crawler gets plain HTML.
  fs.writeFile(filename, stripScriptTags(body), function(err) {
    if (err) console.error('Could not save ' + filename, err);
  });
};
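
As a quick check of what the stripScriptTags helper does, the snippet below runs it on a small piece of markup. The sample HTML is made up for this example; the regular expression is the one from the helpers above.

```javascript
var scriptTagRegex = /<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi;

var stripScriptTags = function(html) {
  return html.replace(scriptTagRegex, '');
};

// Made-up rendered output with an external and an inline script
var rendered = '<div ng-view><h1>About</h1></div>' +
               '<script src="app.js"></script>' +
               '<script>var ready = 1 < 2;</script>';

console.log(stripScriptTags(rendered));
// <div ng-view><h1>About</h1></div>
```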

Now all we need to do is run through our pages, turn every link from a relative link into an absolute link (so the crawler can follow them), and save the resulting html.

We’re setting a relatively high waitFor in the browser options above. This will cover 90% of the cases we care about. If we want to get more precise on how and when we take a snapshot, instead of waiting the 2 seconds we’ll need to modify our angular app to fire an event and listen for the event in our zombie browser.

Since we like to automate as much as possible and prefer not to muck with our angular code, we prefer to set our timeout relatively high to attempt to let the code settle down.
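
For those who do want the more precise approach, one way to sketch it is to have the Angular app set a flag once it has finished rendering and have the snapshot script poll for that flag. Everything named here is hypothetical: the snapshotReady flag, the waitUntil helper, and the fakeWindow object standing in for the browser’s window are assumptions for illustration, not part of zombie.js or Angular.

```javascript
// Polls check() every intervalMs until it returns true, or gives up
// after timeoutMs. Calls done(null) on success, done(error) on timeout.
function waitUntil(check, intervalMs, timeoutMs, done) {
  var started = Date.now();
  (function poll() {
    if (check()) return done(null);
    if (Date.now() - started >= timeoutMs) {
      return done(new Error('Timed out waiting for the page to settle'));
    }
    setTimeout(poll, intervalMs);
  })();
}

// Stand-in for the browser window; in the Angular app we would set
// window.snapshotReady = true once rendering has finished
var fakeWindow = {};
setTimeout(function() { fakeWindow.snapshotReady = true; }, 50);

waitUntil(function() {
  return fakeWindow.snapshotReady === true;
}, 10, 1000, function(err) {
  console.log(err ? err.message : 'ready to snapshot');
});
```

The timeout keeps the crawl moving if a page never signals that it is ready.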

Our crawlPage() function:

var crawlPage = function(idx, arr) {
  if (idx < arr.length) {
    var uri = arr[idx];
    var host = url.parse(uri).host;
    var browser = new Browser(browserOpts);
    browser.visit(uri)
    .then(function() {
      // Turn links into absolute links
      // and queue them, if we need to
      // and we haven't already crawled them
      var links = browser.queryAll('a');
      links.forEach(function(link) {
        var href = link.getAttribute('href');
        // Skip anchors without an href
        if (!href) return;
        var absUrl = url.resolve(uri, href);
        link.setAttribute('href', absUrl);
        // Only follow links on our own host, otherwise
        // the crawl would wander off to external sites
        if (url.parse(absUrl).host === host &&
            arr.indexOf(absUrl) < 0) {
          arr.push(absUrl);
        }
      });

      // Save
      saveSnapshot(uri, browser.html());
      // Call again on the next iteration
      crawlPage(idx+1, arr);
    });
  }
};

Now we can simply call the method on our first page:

crawlPage(0, ["http://localhost:9000"]);
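
The link handling is the part of the crawl most worth testing on its own, so here is a standalone sketch of it that runs without zombie. The helper name resolveSameOriginLinks is hypothetical, and it uses Node’s global URL class rather than the url module; it resolves each href against the page URL, drops links to other origins, and removes duplicates.

```javascript
// Hypothetical helper: turns the hrefs found on a page into a list of
// unique absolute URLs on the same origin as the page.
function resolveSameOriginLinks(pageUri, hrefs) {
  var origin = new URL(pageUri).origin;
  var result = [];

  hrefs.forEach(function(href) {
    // Anchors without an href have nothing to follow
    if (!href) return;

    var abs;
    try {
      abs = new URL(href, pageUri);
    } catch (e) {
      return; // Not a parseable URL
    }

    // Stay on our own site, and skip links we already have
    if (abs.origin !== origin) return;
    if (result.indexOf(abs.href) < 0) result.push(abs.href);
  });

  return result;
}

console.log(resolveSameOriginLinks('http://localhost:9000/', [
  '/about', '#!/signup', null, 'http://example.com/', '/about'
]));
// [ 'http://localhost:9000/about', 'http://localhost:9000/#!/signup' ]
```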

Using grunt-html-snapshot

Our preferred method of taking snapshots is the grunt tool grunt-html-snapshot. Since we use yeoman, grunt is already part of our build process, so we set up this task to run after we make a release of our apps.

To install grunt-html-snapshot, we’ll use npm like so:

npm install grunt-html-snapshot --save-dev

If we’re not using yeoman, we’ll need to include this task as a grunt task in our Gruntfile.js:

grunt.loadNpmTasks('grunt-html-snapshot');

Once this is set, we’ll set some configuration about our site. To set up configuration, we’ll create a new config block in our Gruntfile.js that looks like:

htmlSnapshot: {
  debug: {
    options: {}
  },
  prod: {
    options: {}
  }
}

Now we simply get to add our different options for the different stages:

htmlSnapshot: {
  debug: {
    options: {
      snapshotPath: 'snapshots/',
      sitePath: 'http://127.0.0.1:9000/',
      msWaitForPages: 1000,
      urls: [
        '/',
        '/about'
      ]
    }
  },
  prod: {
    options: {}
  }
}

To see the full list of available configuration options, check out the documentation page at https://github.com/cburgdorf/grunt-html-snapshot.
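
Put together, a minimal Gruntfile.js might look like the sketch below. It only uses the options shown above; the snapshot task name and the configureGrunt function name are our own choices for this example.

```javascript
// Gruntfile.js (sketch). The 'snapshot' task name is our own choice.
function configureGrunt(grunt) {
  grunt.initConfig({
    htmlSnapshot: {
      debug: {
        options: {
          snapshotPath: 'snapshots/',
          sitePath: 'http://127.0.0.1:9000/',
          msWaitForPages: 1000,
          urls: ['/', '/about']
        }
      }
    }
  });

  // Make the htmlSnapshot task available
  grunt.loadNpmTasks('grunt-html-snapshot');

  // Running `grunt snapshot` runs the debug target
  grunt.registerTask('snapshot', ['htmlSnapshot:debug']);
}

module.exports = configureGrunt;
```

With this in place, running grunt snapshot while the app is being served on port 9000 should write the snapshots into the snapshots/ directory.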

Prerender.io

Alternatively, we can use an open-source tool such as Prerender.io, which includes a node server that renders our site on the fly and an express middleware that asks that server for prerendered HTML whenever a crawler visits.

Essentially, prerender.io takes a URL and returns the rendered HTML (with no script tags). The prerender server we deploy will be called from our app like so:

GET http://our-prerenderserver.com/http://localhost:9000/#!/about

This GET will return the rendered content of our #!/about page.

Setting up a prerender cluster is actually pretty easy to do. We’ll also show you how to integrate your own prerender server into your node app. Prerender.io is also available for Ruby on Rails through a gem, but we won’t cover how to set it up.

To run our own server, simply run npm install to install the dependencies and start it through either node or foreman:

npm install
node index.js
# Or through foreman
foreman start

The prerender library is also convenient to run on heroku:

$ git clone https://github.com/collectiveip/prerender.git
$ heroku create
$ git push heroku master

We store our rendered HTML in S3, so we recommend you use the built-in S3 cache. The prerender documentation explains how to set this up.

After our server is running, we just need to integrate the fetching through our app. In express, this is very easy using the node library prerender-node.

To install prerender-node, we’ll use npm:

$ npm install --save prerender-node

After this is installed, we’ll tell our express app to use this middleware:

var prerender = require('prerender-node').set('prerenderServiceUrl', 'http://our-prerenderserver.com/');

app.use(prerender);

And that is it! This tells our express app that if it sees a crawler request (identified by the _escaped_fragment_ parameter or by the user agent string), it should make a GET request to our prerender service at the appropriate URL and return the prerendered HTML for the page.
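
The crawler check itself can be sketched in a few lines. To be clear, this is not prerender-node’s actual code: shouldPrerender and the short crawlerUserAgents list are assumptions made for illustration, and the real middleware’s rules may differ.

```javascript
// Illustrative, deliberately short list of crawler user agent fragments
var crawlerUserAgents = ['googlebot', 'bingbot', 'yandex', 'baiduspider'];

// Hypothetical check: should this request get prerendered HTML?
function shouldPrerender(req) {
  // Only page loads are worth prerendering
  if (req.method !== 'GET') return false;

  // The crawling scheme always marks its requests with this parameter
  var query = new URL(req.url, 'http://localhost').searchParams;
  if (query.has('_escaped_fragment_')) return true;

  // Otherwise fall back to sniffing the user agent
  var userAgent = (req.headers['user-agent'] || '').toLowerCase();
  return crawlerUserAgents.some(function(bot) {
    return userAgent.indexOf(bot) >= 0;
  });
}

console.log(shouldPrerender({
  method: 'GET',
  url: '/about?_escaped_fragment_=',
  headers: {}
}));
// true
```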

Professional alternatives

Although we present relatively easy methods of making our apps SEO-friendly, they do take work to set up and maintain. There are professional services that offer SEO as a service.

We recommend one of our sponsors, brombone, which offers fantastic service.

Other professional SEO services include:
