Web Hosting Forum | Lunarpages

Author Topic: Calling all .htaccess experts!  (Read 10313 times)

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6230
Calling all .htaccess experts!
« on: January 02, 2017, 10:31:11 AM »
Okey dokey, I've had enough with fighting .htaccess over redirection and rewriting. More problems may show up after I fix this one, but for now, all I'm trying to do is insert www in front of my domain name. It works in every case EXCEPT when the URI starts with real directory names /dir1, /dir2, /dir3, or /dir4. THEN it simply adds a trailing /, and the www-addition RewriteRule never fires. According to Chrome's Network tool, the first thing that happens is a 301 redirect to add that /. I've tried three browsers (Firefox, Chrome, IE11) and have emptied their caches many times. All I can figure is that either LP or my ISP has a cache somewhere with a very long expiration time, but all other changes to .htaccess have immediate effect. Here is my cut-down /.htaccess:

Code:
# whenever changing production/development trees, hit UPDATE entries
#  also appl(ication)__top.php, ../good.* lists, forumN/Settings.php, shopN/...
# when force https, need to correct some hard coded http:

# explicit Error Document required by LP changes 10/04/07
ErrorDocument 400 /400.shtml
ErrorDocument 401 /401.shtml
ErrorDocument 403 /403.shtml
ErrorDocument 404 /404.shtml
ErrorDocument 406 /406.shtml
ErrorDocument 500 /500.shtml

# protect php.ini against being listed by snoopers. .htaccess already protected
<Files php.ini>
order allow,deny
deny from all
</Files>
# point to php.ini
suPHP_ConfigPath  /home/*****/public_html

# block PHP backup files
<Files ~ "\.php~$">
Order allow,deny
Deny from all
</Files>

# order to look for files. if no index.* found, display public_html/nopeek.php
# this is apparently done AFTER all .htaccess processing
DirectoryIndex index.html index.htm index.php /nopeek.php

# block by IP address
# logged hack attempts on web site
order allow,deny
# ...
allow from all

RewriteEngine On
Options +FollowSymlinks
RewriteBase /

# ---- any 403s do first
# hotlink protection and allowed list       <<< works OK
# don't forget to add https: for any with SSL
## uncomment following line to PERMIT direct browser access of image files
#RewriteCond %{HTTP_REFERER} !^$ 
RewriteCond %{HTTP_REFERER} !^http://(www\.)?catskilltech\.com(/)?.*$     [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?lunarforums\.com(/)?.*$      [NC]
RewriteRule \.(jpg|jpeg|gif|png|bmp|pdf)$ - [F,NC]

# ---- any 301 redirects do next, so other rewrites don't get displayed
# requests to catskilltech.com divert to www.catskilltech.com
# be careful not to snag subdomains
# NOTE: the rule below does NOT fire if the URI starts with /dirN. otherwise it DOES
RewriteCond %{HTTP_HOST} ^catskilltech\.com$  [NC]
RewriteRule ^(.*)$ http://www.catskilltech.com/$1 [R=301,L]

# recently relocated material       <<< works OK
RewriteCond  %{REQUEST_URI} mountaingallery\.html$  [NC]
RewriteRule  ^(.*)/mountaingallery\.html$ /$1/CatskillsGallery.html [R=301,L]

# ---- any global 200s (rewrites) do last
# ---- any "/shop" redirect to current production shop (/shopN)  UPDATE
RewriteCond %{REQUEST_URI} ^/shop(/|$) [NC]
RewriteRule  ^shop/?(.*)$  /shop1/$1  [NC,L]
# ---- any "/forum" redirect to current production forum (/forumN)  UPDATE
RewriteCond %{REQUEST_URI} ^/forum(/|$) [NC]
RewriteRule  ^forum/?(.*)$  /forum3/$1  [NC,L]

# if empty request www.catskilltech.com/ add index.html       WHY DO I NEED THIS? if I have DirectoryIndex command
RewriteCond %{REQUEST_URI} ^(/)?$
RewriteRule  ^(/)?$  /index.html [L]

# ---- list of things to NOT rewrite. anything else gets current production
#      directory (/dirN) stuck in front, unless there is already explicitly one
# unless certain directories or (root) files, DO insert the default subtree /dir3
# ... more special cases NOT to insert /dir3
RewriteCond  %{REQUEST_URI}  !^/dir[0-9](/|$)

RewriteRule  ^(.*)$  /dir3/$1 [L]

# what's left? for some reason, .html -> .php has to be done in SOME deeper .htaccess files  WHY not here, why SOME?
#              SEF fake paths -> URL Query String

Both /dir3/.htaccess and /dir4/.htaccess contain:

Code:
RewriteEngine On

# ---- .html -> .php
RewriteRule  ^(.*)\.html$  $1.php

If I enter catskilltech.com, the address changes to www.catskilltech.com, and /dir3 is invisibly inserted to go to the production branch. My PHP code is such that if I explicitly give /dir4, it will use that in all links as my override to the development branch. These /dir3 and /dir4 directories have been around for a long time (years).

If I enter catskilltech.com/dir4 (or dir1/2/3 for that matter), the address changes to catskilltech.com/dir4/. No www is ever added (it should be). I even tried changing the redirect (RewriteRule) to a fail ([F] flag), and it didn't trigger.

Support merely tells me that I must have messed up my rewrite rules. Can anyone see where I went wrong? The only way I can explain this is that someone has catskilltech.com/dirN/ cached, and is using that (returning a 301 to my browser). Of course, I've been staring at this for so long that I might have overlooked something!

There are other odd things going on, such as .html->.php not working in /.htaccess. I moved it to /dir4/.htaccess, but in a few cases (deeper directories) I had to add more .htaccess files. Again, caching is the first thing to come to mind.

Thanks much and eternal fame and fortune to whoever can figure this out!
-= From the ashes shall rise a sooty tern =-

Offline Pete

  • Alien Anomaly
  • Senior Moderator
  • Professor in Nanotechnology
  • *****
  • Posts: 4257
    • X-Visions Website Design
Re: Calling all .htaccess experts!
« Reply #1 on: January 04, 2017, 10:27:20 AM »
I'm sure you've googled it yourself, but in case you haven't tripped over this: http://stackoverflow.com/questions/21417263/htaccess-add-remove-trailing-slash-from-url

Any good?
x-visions.com


As I'm always saying.. (But nobody listens)
"Take a step back.. Take a deep breath and see if there's a simple solution there that's hiding" lol  :D

Offline MrPhil

Re: Calling all .htaccess experts!
« Reply #2 on: January 05, 2017, 05:58:19 AM »
Hi @Pete,

I don't have anything in my current .htaccess files (root or deeper) adding or removing trailing /'s. A few months ago, I did have such a 301 redirect in the root (actually, in /redirector.php, which no longer exists, not .htaccess). This is why I'm wondering if someone, somewhere, has cached my URLs. Have you ever heard of anyone except the browser doing caching? I've cleared the browsers' caches many times. Support swears that they don't cache. Chrome reports that the first operation is a 301 to add the /.

If anyone has come across a good .htaccess URL rewriting reference, I'd love to know about it. It takes so much trial and error to figure out just what's going on, as most references are very vague or skimpy -- they seem to assume either that the reader already knows a lot about the subject, or they limit themselves to very simple cookbook applications.

Just to rule out a hidden browser cache somewhere, when you type in http://catskilltech.com/dir4, do you get a trailing / added (and no www.), while http://catskilltech.com simply adds the www. (as desired)? If it behaves for you that way, there must be a server-level cache.

Offline Pete

Re: Calling all .htaccess experts!
« Reply #3 on: January 07, 2017, 06:17:40 AM »
Both links behave just as you describe: a trailing / is added on the first, and www. on the second.

Offline MrPhil

Re: Calling all .htaccess experts!
« Reply #4 on: January 07, 2017, 06:37:33 AM »
The first one (trailing / only) is broken. That you see it would suggest something being cached at the server level, but I don't know what or for how long. Why would a URI starting with /dirx be treated differently than other URIs? That would include a null URI, and other real directories. The only way that a /dirx is treated differently is that after the 301 redirects, there is a 200 rewrite that inserts a /dir3 as the default set of code to use (3 is production, 4 is current development, and then I'll wrap around to 1 again). If there is already an explicit /dirx, I don't insert it. That allows me to test run /dir4 by only entering that directory on the initial URL. If /dir3 is being inserted (by default), the www add works properly -- how could that rewrite be canceling the www redirect, and adding the trailing /?

This URL rewriting really is voodoo!

Offline MrPhil

Re: Calling all .htaccess experts!
« Reply #5 on: January 20, 2017, 07:05:57 AM »
I'm still trying to figure this out, and would appreciate input from experienced URL rewriters. catskilltech.com/dir4 is still getting a 301 redirect (according to the Chrome browser's Network analyzer tool) to add a trailing /. I can't find that anywhere in my .htaccess files! It's also failing to add the www. The same goes for any other real dirN directory (N=1..4). The RewriteCond test
Code:
RewriteCond  %{HTTP_HOST}  ^catskilltech\.com$  [NC]
is failing to fire if the URI is /dirN, and only /dirN. I tried replacing the RewriteRule with
Code:
RewriteRule  ^ - [F]
which should have given a 403 error, but nothing happened, which suggests the RewriteCond is failing. Can anyone spot the problem? My understanding is that %{HTTP_HOST} should return catskilltech.com for this URL, and nothing else. I tried removing the $, and there was no change in the behavior.
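
One way to test whether /.htaccess is being read at all for a /dirN request, independently of mod_rewrite, might be to have each file stamp a response header. This is only a sketch: it assumes mod_headers is loaded and that the host's AllowOverride setting permits Header in .htaccess, and the X-Htaccess-* header names are made up for illustration.

```apache
# In /.htaccess (hypothetical marker header, not part of the original config)
<IfModule mod_headers.c>
    # "always" also stamps non-2xx responses, such as redirects
    Header always set X-Htaccess-Root "read"
</IfModule>

# In /dir4/.htaccess (second hypothetical marker)
<IfModule mod_headers.c>
    Header always set X-Htaccess-Dir4 "read"
</IfModule>
```

If a request for /dir4/ comes back carrying both headers (visible in Chrome's Network tool), both files were read, and the question becomes why the rewrite rules in particular are not being applied.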

All RewriteRules are guarded by RewriteConds, so it shouldn't be doing strange things on the second or third pass through, although with URL rewriting, anything is possible. I tried removing the Last flag from the 301 redirects, and other things then broke. Removing the RewriteBase command had no effect. I have no header() calls in my PHP.

This is getting extremely frustrating. LP swears they aren't caching anything, and I've repeatedly cleared my browser cache. I even installed a new browser which has never seen this URL, and it behaved the same way. While my site still is functional, it's not good for SEO to get both catskilltech.com and www.catskilltech.com indexed. Plus, according to Chrome, there are multiple 301 redirects within the page for images, JS files, CSS files, etc. which would go away if the address was changed to www. up top! Help!

Offline MrPhil

Re: Calling all .htaccess experts!
« Reply #6 on: January 27, 2017, 09:54:51 AM »
Yesterday I received this response from support:
Quote
Please note that you are using custom .htaccess files for your sub-folders as following:

/home/****/public_html/dir1/.htaccess

So, http://catskilltech.com/dir1 is not redirecting to http://www.catskilltech.com/dir1 . By definition, .htaccess applies to the resident directory and all sub-directories. If you use a custom .htaccess file for a sub-folder, then that .htaccess file becomes parent .htaccess file for that folder and cannot inherit the redirection to www . So, you need to add a new www. redirect for that specific folder where you use custom .htaccess.

Can someone translate this to normal English? The way .htaccess is supposed to work is that all the .htaccess files down the chain from / (root) to the final directory where the page in question resides, are processed in order. The revised URL and any settings changes should be inherited by deeper layers. At least, that's how I understand it... can anyone offer a correction? Are there documented exceptions to this?
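
Read literally, support seems to be suggesting that the www redirect be repeated in each subdirectory .htaccess that has its own rewrite rules. A minimal sketch for /dir4/.htaccess follows, assuming the rest of that file stays as posted earlier in the thread. Using %{REQUEST_URI} instead of a captured $1 is a choice made here, because in a subdirectory .htaccess the pattern is matched against the path relative to that directory, so $1 would lose the /dir4 prefix.

```apache
# /dir4/.htaccess -- sketch only, untested on this server
RewriteEngine On

# add www. (repeated here per support's suggestion)
RewriteCond %{HTTP_HOST} ^catskilltech\.com$ [NC]
# %{REQUEST_URI} is the full URL path, including /dir4
RewriteRule ^ http://www.catskilltech.com%{REQUEST_URI} [R=301,L]

# ---- .html -> .php (existing rule)
RewriteRule  ^(.*)\.html$  $1.php
```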

I have asked for clarification -- is my server configured to skip over intermediate .htaccess files, and go directly to the target directory and its .htaccess (assuming the URI starts with a real directory), or are all .htaccess files processed, but some things don't get inherited down the chain, like URLs? Either way is very non-standard behavior, if they're doing it that way!
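
For what it's worth, the second possibility matches how the Apache documentation describes mod_rewrite: rewrite rules in a subdirectory's .htaccess are not merged with the parent's by default, but replace them, unless RewriteOptions Inherit is given. Whether that is what is happening on this particular server is an assumption. A sketch of /dir4/.htaccess with inheritance switched on:

```apache
# /dir4/.htaccess -- sketch only, untested on this server
RewriteEngine On

# Copy the parent directory's RewriteCond/RewriteRule set into this scope.
# With Inherit, the parent's rules run after the rules in this file.
RewriteOptions Inherit

# ---- .html -> .php (existing rule)
RewriteRule  ^(.*)\.html$  $1.php
```

As I understand it, inherited rules are evaluated as if they had been written in /dir4/.htaccess, so their patterns see paths relative to /dir4/ and may need adjusting before they behave the same as in the root.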

If they are in fact skipping down the chain to the last "real" directory in the URI, I think I could get around the problem by using a virtual (fake) directory in the URI root (/d4 instead of /dir4). The last step in /.htaccess would be to 200 rewrite /d4 to /dir4. That's an awful thing to have to do, and I will have to seriously consider leaving Lunarpages if they're going to configure their servers in such an odd manner... or am I misunderstanding how .htaccess is supposed to work?

By the way, the trailing / being added is apparently being done by mod_dir. I can live with it.

Offline MrPhil

Re: Calling all .htaccess experts!
« Reply #7 on: January 31, 2017, 05:50:26 AM »
OK.......

I finally got to the bottom of this: Lunarpages servers (or at least, mine) are misconfigured. If your URI starts with a real directory, it skips the .htaccess files in higher level directories, and goes directly to that directory. So, it never goes through /.htaccess, where most of my stuff (including IP deny lists, hotlink protection, and most of the URL rewriting) lives. I got around it by introducing virtual (fake) directories at the top level of all my URIs, and these are translated (rewritten) to real directories in /.htaccess (which everything goes through now). I'm still checking it out, but it seems to be working properly now, except that I still need to check if there are any cases where lower level .htaccess files are still being skipped.
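
The workaround described above might look roughly like the following in /.htaccess. The names are assumptions: /d4 is the virtual directory mentioned earlier in the thread and has no counterpart on disk, while /dir4 is the real one. The exact rules actually used were not posted.

```apache
# /.htaccess -- sketch of the virtual-directory workaround
RewriteEngine On
RewriteBase /

# (www redirect, hotlink protection, etc. come first, as before)

# Internally map the virtual /d4 prefix onto the real /dir4 tree.
# No R flag, so this is a rewrite, not a redirect: the browser keeps showing /d4.
RewriteRule ^d4(/(.*))?$ /dir4/$2 [L]
```

Because /d4 does not exist as a directory, a request for it has no subdirectory .htaccess of its own to take over, so the rules in /.htaccess get to run first.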

I'm very disappointed with LP for making me go through all of this nonsense, all the while insisting that I must be doing something wrong in my .htaccess files. They clearly don't know what they're doing. I hope this experience will be helpful to others who are having problems getting their .htaccess files to work right -- consider that they may be skipped over if your URI starts with a real directory!