By dave | March 26, 2017

TL;DR: In Joomla there's often more than one way to get to the same page. If you're moving off Joomla, this needs to be considered, *preferably before moving any pages*. This article discusses the cases that I found [during my recent move to Hugo](https://www.thecoderscorner.com/team-blog/web-design/cms/moving-to-hugo-from-joomla/).

URL Mappings in Joomla: the background

Recently I decided to move a Joomla site over to Hugo; I had search-engine-friendly URLs turned on and therefore assumed the mapping would be easy. Whatever the path was in Joomla should be the same path in Hugo (insert CMS of choice), right?

Wrong, most definitely not. Joomla has many ways of accessing an article: by reference to the article component, via a module, by category, or by menu. At the end of the day the URL mapping is done using mod_rewrite in Apache, and not always in the way one might expect. Below are some examples, each followed by a sketch of a possible fix.

Example 1 - article link via a category

  • Wrong: team-blog/12-java-and-jvm/11-reading-a-gzip-file-using-java
  • Correct: team-blog/java-and-jvm/11-reading-a-gzip-file-using-java
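
Where the pattern is regular, as it is here, a single mod_rewrite rule can redirect the whole family of URLs. Below is a minimal .htaccess sketch, assuming the only difference is the numeric category ID that Joomla prepends to the category segment:

RewriteEngine On
# Drop the "12-" style category ID prefix from the second path segment,
# e.g. /team-blog/12-java-and-jvm/... becomes /team-blog/java-and-jvm/...
RewriteRule ^team-blog/[0-9]+-([^/]+)/(.*)$ /team-blog/$1/$2 [R=301,L]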

Example 2 - article link using the article component

  • Wrong: component/content/article?id=48:building-a-holder-for-my-arduino-board
  • Correct: /electronics/microcontrollers/48-building-a-holder-for-my-arduino-board/
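
Here the article ID sits in the query string, so a rule needs a RewriteCond against QUERY_STRING; the trailing ? on the target discards the old query string (on Apache 2.4+ the QSD flag does the same). A sketch for this one article:

# Match component/content/article?id=48:... and send it to the real location.
RewriteCond %{QUERY_STRING} (^|&)id=48(:|&|$)
RewriteRule ^component/content/article$ /electronics/microcontrollers/48-building-a-holder-for-my-arduino-board/? [R=301,L]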

Example 3 - third party plugin creates yet another mapping

  • Wrong: index.php?option=com_content&view=article&id=66&catid=29&Itemid=136
  • Correct: not easily determined from the above!
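
These non-SEF URLs can still be redirected one article at a time, keyed on the id parameter, but each target has to be looked up by hand. A sketch, with a placeholder target path:

RewriteCond %{QUERY_STRING} (^|&)id=66(&|$)
# Placeholder target - substitute wherever article 66 now lives.
RewriteRule ^index\.php$ /replace/with/real/article/path/? [R=301,L]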

Summary

Unfortunately it's often the case that we don't look too closely at this until the time comes to move off Joomla, like my recent move to Hugo. It's only at that point that the full extent of the situation becomes apparent.

There really is no magic wand that's going to fix this. It's a case of working out all the different combinations from the Apache access logs and webmaster tools, then mapping over as many as possible. Some URLs with lots of parameters may be more trouble than they are worth, in which case a good 404 page is a better option than fixing every link.
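
For old URLs that are plain paths, Hugo can generate the redirects itself: list them under aliases in the new page's front matter and Hugo writes a redirect stub at each old location. A minimal sketch, assuming TOML front matter:

+++
title = "Reading a GZIP file using Java"
# Old Joomla path for this article; Hugo emits a redirect page here.
aliases = [ "/team-blog/12-java-and-jvm/11-reading-a-gzip-file-using-java" ]
+++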

Below are two Linux shell commands that may be useful; the first extracts GET requests and the second POST. Both match the status code with surrounding spaces (so URLs that merely contain 404 don't slip in), exclude robots.txt, PHP and PNG files, and produce a counted, de-duplicated list of 404s (the sort matters, since uniq only collapses adjacent duplicates):

grep ' 404 ' access_log* | grep -v '\.png' | grep -v '\.php' | grep -v 'robots\.txt' | sed -En 's/.*"(GET[^"]*).*/\1/p' | sort | uniq -c

grep ' 404 ' access_log* | grep -v '\.png' | grep -v '\.php' | grep -v 'robots\.txt' | sed -En 's/.*"(POST[^"]*).*/\1/p' | sort | uniq -c
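
When there are a lot of distinct 404s, ranking them by hit count shows which redirects are worth writing first, for example:

grep ' 404 ' access_log* | grep -v '\.png' | grep -v '\.php' | grep -v 'robots\.txt' | sed -En 's/.*"(GET[^"]*).*/\1/p' | sort | uniq -c | sort -rn | head -20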
