Enabling Vanity URLs with Adobe Experience Manager

tadreeves

When migrating from one version of a site to an all-new website, or from an old CMS to Adobe Experience Manager (AEM), users often struggle with two things: rewriting legacy URL redirects and handling vanity URLs.

A migration can create mountains of legacy URLs that need rewriting. You need to handle these URLs somehow, whether they’re due to an SEO-driven shift in URL patterns, legacy licensed content that you can no longer access or just old articles that need to be redirected to new locations.

Tables of these URLs are usually generated when moving from one site architecture to another, and developers and operations staff are generally tasked with routing them appropriately. I’ve worked on sites with upward of 100,000 legacy URLs. At the behest of SEO teams, these lists have sometimes been extended by using Splunk to cull the 404s that users and crawlers are hitting on the site, and then working out rewrite rules to route those requests to appropriate current content.
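
The article culls these 404s with Splunk. As a rough stand-in, here is a minimal Python sketch that does the same job over raw Apache access logs. It assumes the common or combined log format; the name `top_404s` and the choice to strip query strings are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache common/combined log line,
# e.g. ... "GET /old/page.html HTTP/1.1" 404 209  (assumed log format).
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})\b')


def top_404s(lines, limit=20):
    """Hypothetical helper: most frequently requested paths that returned 404."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "404":
            # Drop the query string so /a.html?x=1 and /a.html are counted together.
            path = match.group("path").split("?", 1)[0]
            counts[path] += 1
    return counts.most_common(limit)
```

Feeding it the lines of an access log (for example, an open file object) yields `(path, hits)` pairs that can go to the SEO team as a starting list for new rewrite rules.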

According to WordStream, a vanity URL is “a unique web address that is branded for marketing purposes … a custom URL that exists to help users remember and find a specific page of your website.”

Marketing needs are what drive these URLs, and they frequently change outside of software release cycles. Non-technical people generally specify where they want URLs to go, so getting these folks to update configuration files directly in Apache is highly undesirable, if not impossible.

You can generally solve both problems with either a redirect, 301 (permanent) or 302 (temporary), or a pass-through rewrite, where the URL in the browser doesn't change but the backend serves different content. There are many different ways to handle redirects, which tends to produce an array of wild and wacky schemes for maintaining and updating them all.
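
To make the distinction concrete, here is a toy Python model of what the browser sees in each case. It is only an illustration: `resolve` and its two dictionaries are hypothetical, and a real site does this work in Apache or the dispatcher, not in Python.

```python
from typing import Optional, Tuple

# (status sent to the browser, URL shown in the address bar, path the backend renders)
Result = Tuple[int, str, Optional[str]]


def resolve(path: str, redirects: dict, passthroughs: dict) -> Result:
    """Hypothetical toy model of redirect vs. pass-through handling.

    redirects:    path -> (301 or 302, target URL)
    passthroughs: path -> backend path whose content should be served
    """
    if path in redirects:
        code, target = redirects[path]
        # A redirect only sends a Location header; the browser then requests
        # `target` itself, so the address bar changes and nothing is rendered yet.
        return code, target, None
    if path in passthroughs:
        # A pass-through rewrite happens server-side: the user keeps seeing `path`
        # while the content comes from another backend path (assumed to exist).
        return 200, path, passthroughs[path]
    # No rule matched: serve the path as requested (200 assumed for simplicity).
    return 200, path, path
```

With the article's own example, `resolve("/news.html", {}, {"/news.html": "/news/2016/current/index.html"})` keeps `/news.html` in the browser while rendering the 2016 index page.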

Handling Bulk Legacy Rewrites
For the most part, legacy URLs are best handled within Apache, and you can generally use Apache RewriteMaps to handle bulk rewrites for legacy URL patterns, old 404s, etc. I’ve successfully stuffed over 100,000 rewrites into an indexed DBM hash file, and even tiny two-CPU cloud servers were able to handle it with absolutely no problem. The basic sequence is:

1. Have your marketing, content and SEO teams give you a list (preferably in Excel) of all the URL redirects they want. Have them break it up into:
   • Redirects they want to be permanent (e.g., a 301 redirect)
   • Rewrites they want to pass through (e.g., /news.html pulls content from /news/2016/current/index.html but still shows /news.html in the user's browser)

2. Export these lists from Excel into tab-delimited text files.
3. Store this textual map in version control, so that you can track changes to it.
4. Encode this text map as a DBM file with:

httxt2dbm -i mapfile.txt -o mapfile.map

5. Add a directive in your HTTPD config with:

RewriteMap mapname "dbm:/etc/apache/mapfile.map"

6. Then add RewriteRule directives for each map, depending on how you want its URLs handled (permanent redirect or pass-through).
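
Steps 2 through 6 can be sketched in Python as a CI pre-check. The sketch below rejects malformed lines from the Excel export before httxt2dbm ever sees them, then prints Apache directives for each map. The map names, the DBM path, and the lookup pattern (look up %{REQUEST_URI} in the map and act only when a value comes back) are assumptions for illustration, not the author's exact configuration. RewriteMap is only valid in server or virtual-host context, so the output belongs there rather than in .htaccess.

```python
# Hypothetical flag choices per map type (illustrative, not from the article).
FLAGS = {
    "permanent": "R=301,L",   # browser is redirected, address bar changes
    "passthrough": "PT,L",    # internal rewrite, address bar keeps the old URL
}


def parse_map(text: str, source: str = "<map>") -> dict:
    """Hypothetical helper: parse 'old<TAB>new' lines from an Excel export, failing loudly."""
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 2 or not all(fields):
            raise ValueError(f"{source}:{lineno}: expected 'old<TAB>new'")
        key, value = fields
        # Apache's text map format separates key and value by whitespace,
        # so a space inside a URL would silently corrupt the entry.
        if any(ch.isspace() for ch in key + value):
            raise ValueError(f"{source}:{lineno}: whitespace inside a URL")
        if not key.startswith("/"):
            raise ValueError(f"{source}:{lineno}: key must start with '/'")
        if key in entries:
            raise ValueError(f"{source}:{lineno}: duplicate key {key}")
        entries[key] = value
    return entries


def render_directives(maps: list) -> str:
    """Hypothetical helper: render directives for a list of (name, dbm_path, kind)."""
    lines = ["RewriteEngine On"]
    for name, dbm_path, kind in maps:
        lines += [
            f'RewriteMap {name} "dbm:{dbm_path}"',
            # Matches only when the map returns a non-empty value for this path.
            f"RewriteCond ${{{name}:%{{REQUEST_URI}}}} ^(.+)$",
            # %1 is the value captured by the RewriteCond above.
            f"RewriteRule ^ %1 [{FLAGS[kind]}]",
        ]
    return "\n".join(lines)
```

Because each rule carries the `L` flag, the order of maps matters if a path appears in more than one: the first matching map wins.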

Ideally, your CI server runs the httxt2dbm commands, so that whenever you update your RewriteMaps, it can pull the updated maps from version control, convert them to DBM, push them onto Apache and restart the server.
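
A CI job for this might look like the following Python sketch. It assumes the .txt maps sit in a checked-out repository and that the job runs on the Apache host (the push step is environment-specific and left out); `build_commands` and `run` are hypothetical names. It uses only the `httxt2dbm -i ... -o ...` invocation shown above plus `apachectl -k graceful`, which restarts Apache without dropping in-flight requests.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(txt_maps: list, map_dir: str) -> list:
    """Hypothetical helper: one httxt2dbm call per text map, then a graceful restart."""
    commands = []
    for txt in txt_maps:
        # maps/legacy301.txt -> <map_dir>/legacy301.map, matching the RewriteMap path.
        dbm = PurePosixPath(map_dir) / (PurePosixPath(txt).stem + ".map")
        commands.append(["httxt2dbm", "-i", str(txt), "-o", str(dbm)])
    commands.append(["apachectl", "-k", "graceful"])
    return commands


def run(commands: list, dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Keeping `dry_run=True` as the default lets the same job print its plan on pull requests and only execute on the deploy branch.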

Handling Vanity URLs Using the AEM Dispatcher Module
Before recent updates, there was no clean way to enable marketing teams to update vanity URLs on their own. I worked with teams that had written their own Java app to give marketing and SEO teams a UI for inputting rewrites; the output was fed automatically into Apache.

Other teams tried to handle all the rewrites in Sling, but that required developers to parse and load Sling rewrite maps. Others still went directly to /crx/de in AEM and manually edited the /etc/map nodes on every server, a process that left considerable room for manual error and for drift between the DEV, STG and PRD tiers of the application.

Thankfully, recent updates to the AEM Dispatcher module (version 4.1.9 and later) let authors control vanity URLs directly from the Author UI. These vanities are automatically pushed out to the publishers, which then expose them to the dispatchers. This works extremely well: it takes the work of updating and maintaining vanity rewrite rules entirely out of the hands of IT and puts it into the hands of marketing, where it belongs.


Here’s how to make this work:

  • On your AEM Publish nodes, download and install the VanityURLS-Components package from Adobe Package Share, or pull it down and install it manually in /crx/packmgr.
  • Go to /useradmin on your Publish instance and grant “Read” permission on /libs/granite/dispatcher/content/vanityUrls to the “Everyone” group. To do this, double-click the “Anonymous” user, then go to “Permissions” and check the “Read” column for the above path.
  • If you don’t have an allow-all rule in your dispatcher configs, add a filter rule to the dispatcher’s /filter section so that the vanity URL list can be requested from the Publish instance:

/0100 { /type "allow" /url "/libs/granite/dispatcher/content/vanityUrls.html" }

  • Add a rule to the /cache /rules section to prevent this URL from being cached:

/0001 { /type "deny" /glob "/libs/granite/dispatcher/content/vanityUrls.html" }

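  • Add a /vanity_urls section to the farm, with /url set to /libs/granite/dispatcher/content/vanityUrls.html, /file set to the local path where the dispatcher should store the fetched list, and /delay set to the refresh interval in seconds.
  • Restart Apache.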

The file named in the /file setting is not created or refreshed automatically every /delay seconds. The dispatcher only touches it when a request fails your dispatcher’s /filter rules. When that happens, it checks whether the file exists. If it does not, the dispatcher fetches /libs/granite/dispatcher/content/vanityUrls.html from the Publish instance, writes the file and uses it. If the file exists and is no older than /delay seconds, the dispatcher uses it as is. If it is older than /delay seconds, the dispatcher refreshes it from the Publish instance first and then uses it.
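
The lookup behavior described above can be modeled in a few lines. This Python sketch is a simulation, not dispatcher code: `VanityUrlCache`, `fetch` and `clock` are hypothetical names, an in-memory string stands in for the /file on disk, and it assumes the vanityUrls.html response lists one vanity path per line. It also assumes, as the dispatcher does, that a request denied by the filter is let through when its path is on the vanity list.

```python
import time
from typing import Callable, Optional


class VanityUrlCache:
    """Hypothetical simulation of the /file + /delay handling described above."""

    def __init__(self, delay: float, fetch: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic):
        self.delay = delay        # mirrors the /delay setting, in seconds
        self.fetch = fetch        # stands in for requesting vanityUrls.html from Publish
        self.clock = clock
        self._body: Optional[str] = None  # stands in for the /file on disk
        self._written_at = 0.0

    def _vanity_paths(self) -> set:
        now = self.clock()
        # Missing, or older than /delay seconds: fetch from Publish before using it.
        if self._body is None or now - self._written_at > self.delay:
            self._body = self.fetch()
            self._written_at = now
        # Assumed format: one vanity path per line.
        return {line.strip() for line in self._body.splitlines() if line.strip()}

    def allow(self, path: str, passes_filter: bool) -> bool:
        if passes_filter:
            return True  # the vanity list is neither consulted nor refreshed
        return path in self._vanity_paths()
```

One consequence worth noting: on a quiet site the stored list can be much older than /delay until the next denied request arrives, because nothing refreshes it in the background.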