hairboy
« on: September 15, 2008, 04:16:08 AM »
Hi guys, I'm running an online game on my dedicated server. Plesk tells me we're only sitting at about 15-25% load with plenty of RAM unused. However, many of my gamers are reporting the occasional "The connection has timed out. The server at www.trackking.org is taking too long to respond." Refreshing the page seems to correct it immediately, so I'm confused. Any suggestions as to what might be causing this, and how I might be able to fix it? It seems more prevalent for those not in the US, but that's not always the case.
« Last Edit: September 15, 2008, 04:21:12 AM by hairboy »

perestrelka
« Reply #1 on: September 15, 2008, 05:59:26 AM »
Hi Hairboy,
From the problem description, it is more than likely that your Apache is hitting the MaxClients limit, which specifies the maximum number of concurrent connections Apache will serve. Since your server has plenty of free resources, you could try increasing that parameter in httpd.conf. If this does not help, it would be useful to review the Apache error logs in /var/log/httpd to get more ideas on the issue.
Kind Regards, Vlad Artamonov

hairboy
« Reply #2 on: September 15, 2008, 06:26:35 PM »
Thanks for the starting point. I'm an Apache newb, so this is all good learning. I'd really appreciate it if someone could read my "theories" below and confirm or deny lol

OK, having had a look through the logs, there are no errors that seem in any way related. I discovered a new unix command, netstat -t -n, which is amazingly informative! I have a massive number of connections sitting in the TIME_WAIT state, which I think means that a request was completed and is now waiting to be recycled by Apache? So the "Timeout" value of 120 is obviously too high... Thinking I might reduce that to about 15?

Also, I find that KeepAlive is set to Off. From my readings and what I think I understand about servers (?!), this should probably be set to On? I assume this would allow each user/script to "re-use" their connection for the couple of seconds specified by KeepAliveTimeout? If that's the case, I would imagine that KeepAlive would be a good thing. Now the tricky part - working out how long to keep the connections alive. I'm thinking a timeout of 3-5 seconds should be plenty to allow a page to build and then close the connection?

I haven't changed MaxClients for now, because I'm pretty sure that I'm not hitting it with active connections just yet - only filling it with the inactive ones...? MaxClients is set at 256 for the prefork MPM and 150 for the worker MPM. Can anyone tell me which of these is the applicable one?

perestrelka
« Reply #3 on: September 17, 2008, 10:21:03 AM »
Hi,
Let me try to address your inquiries:

Quote: "I have a massive amount of connections sitting in the "Time_wait" state which I think means that a request was completed and is now waiting to be recycled by apache? So the "Timeout" value of 120 is obviously too high...Thinking I might reduce that to about 15?"

Netstat shows something lower-level than Apache's own status. Timeout controls how long Apache waits on an active connection, whereas TIME_WAIT is the TCP state of a connection that has already been closed, so the first has nothing to do with the second. Anyway, my opinion is to reduce the Timeout value from the default of 120 to something like 10 or 20, so that Apache does not spend time and resources on lost or very slow links, which are very rare nowadays.

Quote: "Also, I find that Keepalive is set to off. From my readings and what I think I understand about servers (?!) This should probably be set to on? I assume this would allow each user/script to "re-use" their connection for the couple of seconds specified by keepalivetimeout?? So, if that's the case I would imagine that keepalive would be a good thing. Now the tricky part - to work out how long to keep the connections alive. I'm thinking a timeout of 3-5 seconds should be plenty to allow a page to build and then close the connection?"

That's correct, it is advisable to have KeepAlive enabled. A KeepAliveTimeout of 5-15 seconds looks good for it.

Quote: "I haven't changed the maxclients for now, because I'm pretty sure that I'm not hitting it with active connections just yet - only filling it with the inactive ones...?? MaxClients is set at 256 for the prefork MPM and 150 for the worker MPM.... can anyone tell me which of these is the applicable one?"

By default, Apache 2 is set up with the prefork MPM, which mirrors the way Apache 1 works, so the prefork value should be the one that applies.

I hope this helps.
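
To put a number on how many connections sit in each TCP state, the output of netstat -t -n can be tallied by its State column. The following is only a minimal Python sketch: the function name count_tcp_states and the sample addresses (taken from the documentation IP ranges) are made up for illustration, and it assumes the usual Linux net-tools column layout with the state in the sixth column.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count connections per TCP state in `netstat -t -n` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with "tcp" or "tcp6" and carry the state in
        # the sixth column; the two header lines do not match and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample output, not taken from the server discussed here.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.8:40022      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.9:40100      TIME_WAIT
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

A large TIME_WAIT count on its own is normal for a busy web server with KeepAlive off; the ESTABLISHED count is the closer indicator of how near Apache is to MaxClients.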
Kind Regards, Vlad Artamonov

hairboy
« Reply #4 on: September 25, 2008, 06:08:26 PM »
Thank you so much for the information.
I have been puzzling this one through after making the updates mentioned above, as I'm still seeing connection timeout issues.
I've had another thought though - I have database replication operating from my dedicated plan to a VPS. Is it possible that the replication is in any way responsible for the connection timeouts? Is there something I could/should do to the Apache confs to support this replication process?

perestrelka
« Reply #5 on: September 27, 2008, 08:50:01 PM »
Quote: "I've had another thought though - I have database replication operating from my dedicated plan to a VPS..... Is it possible that the replication is in any way responsible for connection timeouts?? Is there something I could/should do to apache confs to support this replication process?"

More than likely not, although there may be some relationship. The best way to check is to disable it for a while and see how things go without it. Still nothing useful in the Apache error logs?
Kind Regards, Vlad Artamonov

hairboy
« Reply #6 on: October 06, 2008, 11:51:03 PM »
No, still nothing useful in the Apache error logs. In fact they're basically empty. I tried a support ticket... but I'm coming back to ask all of my questions in the forums now, as you are soooooooooo much more helpful than any of the techos. Man, submitting a ticket for support is like pulling teeth. I don't think any helpdesk staff EVER read the history of the ticket they are dealing with - it just goes round and round in circles with me asking the same questions and never getting answers lol!!

So... here's an interesting one for you. I was asked to try a tracert whenever I had a connection timeout. When I did, it showed that the tracert was trying to connect to a DIFFERENT domain!! Someone had left their old PTR records pointing to their domain (with my IP!)... so I'm wondering - could that have caused this connection timeout at all?

The support staff say they've deleted the PTR now, but when I tracert to my IP I see "Tracing route to dmvb00246.lunarservers.com" instead of what I'd expect, which is "Tracing route to trackking.org". Is this OK? What on Earth is dmvb00246.lunarservers.com? Why is it at my IP?

hairboy
« Reply #7 on: October 07, 2008, 07:46:08 AM »
OK, well, here's a new twist. I had another connection problem, so I ran a tracert again... and I find that when starting the tracert from anywhere outside of America I'm experiencing problems with the #9 hop shown below. I currently can't access my site at all (connection timeout), yet downorjustme.com tells me that the site is up. I can tracert successfully from the USA - it resolves very quickly in almost no hops. So... how can I work this through? Who is responsible for that hop and how do I get them to correct it?

traceroute to 74.50.5.53 (74.50.5.53), 30 hops max, 40 byte packets
 1 vlan250.lon-service6.Melbourne.telstra.net (203.50.2.177) 0.44 ms 0.355 ms 0.237 ms
 2 TenGigabitEthernet0-12-0-2.exi-core1.Melbourne.telstra.net (203.50.80.1) 0.484 ms 0.35 ms 0.39 ms
 3 Bundle-POS1.chw-core2.Sydney.telstra.net (203.50.6.13) 14.849 ms 14.922 ms 15.452 ms
 4 Bundle-Ether1.oxf-gw2.Sydney.telstra.net (203.50.6.90) 14.963 ms 15.035 ms 14.916 ms
 5 TenGigabitEthernet6-0.sydo-core01.Sydney.reach.com (203.50.13.38) 15.606 ms 16.595 ms 15.538 ms
 6 i-12-0.paix-core02.net.reach.com (202.84.140.22) 164.006 ms 164.03 ms 164.099 ms
 7 i-3-2.paix05.net.reach.com (202.84.251.70) 164.205 ms 164.214 ms 164.076 ms
 8 * * *
 9 nwstdsrj02-ge710.rd.lv.cox.net (68.1.0.91) 186.266 ms 186.23 ms 186.338 ms
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
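
When comparing several traces like the one above, it helps to pull out the last hop that actually answered. Below is a rough Python sketch of that idea; the function name last_responding_hop is hypothetical, and the sample is an abbreviated copy of the trace in this post. Note that a single "* * *" row does not by itself prove a fault, since some routers simply do not reply to traceroute probes; an unbroken run of them up to the hop limit is the more telling sign.

```python
import re


def last_responding_hop(traceroute_output: str):
    """Return (hop_number, host) for the last hop that replied, or None."""
    last = None
    for line in traceroute_output.splitlines():
        # Hop rows begin with the hop number followed by a hostname or "*".
        match = re.match(r"\s*(\d+)\s+(\S+)", line)
        if not match:
            continue  # header line or blank line
        hop, host = int(match.group(1)), match.group(2)
        if host != "*":
            last = (hop, host)
    return last


# Abbreviated copy of the trace posted above.
SAMPLE = """\
traceroute to 74.50.5.53 (74.50.5.53), 30 hops max, 40 byte packets
 7 i-3-2.paix05.net.reach.com (202.84.251.70) 164.205 ms 164.214 ms 164.076 ms
 8 * * *
 9 nwstdsrj02-ge710.rd.lv.cox.net (68.1.0.91) 186.266 ms 186.23 ms 186.338 ms
10 * * *
11 * * *
"""

if __name__ == "__main__":
    print(last_responding_hop(SAMPLE))
```

If traces from several locations all stop answering after the same hop, the problem most likely sits at or just beyond that hop rather than on the web server itself.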

Mitch
« Reply #8 on: October 07, 2008, 08:13:17 AM »
Looks like there may have been some issue on the network side that is being looked into. Your URLs are resolving now for me, so that is a good sign. 

hairboy
« Reply #9 on: October 22, 2008, 06:22:11 AM »
I have made a whole load of "streamlining" improvements to the site so that it makes more efficient use of MySQL resources, and all was smooth for a couple of weeks. Now my server is still showing very low load (lower than it has for the past month or two), yet connection timeouts are creeping back in, and tracert shows the same IPs as being problematic:

 7 i-3-2.paix05.net.reach.com (202.84.251.70)
 8 * * *
 9 nwstdsrj02-ge710.rd.lv.cox.net (68.1.0.91)
10 * * *

So, is that inside Lunarpages' control? Or should I be contacting someone specific about the issue? Are there any "formalities" I need to go through, or can I just search for the owner of the IP address and try to "bombard" them with email requests until they respond?

perestrelka
« Reply #10 on: October 24, 2008, 11:56:57 PM »
Hi Hairboy,
Before contacting anybody, I would recommend gathering some more information on the issue: ping statistics to the destination both when the timeouts are occurring and when they aren't, and traceroutes to the server from a few different locations and ISPs. This can help to localize the issue more accurately.
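
One way to gather comparable numbers during good and bad periods is sketched below in Python. It is not a replacement for ping: real ping uses ICMP, which needs raw sockets, so this sketch times plain TCP connections to the web port instead. The names probe_tcp and summarize and all default values are assumptions chosen for illustration.

```python
import socket
import time


def probe_tcp(host: str, port: int = 80, attempts: int = 5, timeout: float = 5.0):
    """Time several TCP connects; each entry is seconds taken, or None on failure."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # The connection is closed as soon as the handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # covers timeouts, refusals and resolution errors
            results.append(None)
    return results


def summarize(results):
    """Reduce probe results to attempt count, failure count and mean connect time."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "avg_seconds": sum(ok) / len(ok) if ok else None,
    }
```

Running something like summarize(probe_tcp("www.trackking.org")) from a few different networks, once while the timeouts are happening and once while they are not, gives figures that can be put side by side before approaching a network operator.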
Kind Regards, Vlad Artamonov

hairboy
« Reply #11 on: March 06, 2009, 10:31:12 PM »
Just to "resolve" this issue, and give some ideas to anyone else who might see this post and be experiencing similar issues....
After getting curious about DNS settings and Nameservers etc, I discovered that LP had incorrectly configured mine when setting up my server.
One of my nameservers pointed to an IP address that had not been added to my account - and NONE of my nameservers pointed to my primary IP!!
Fixed both those issues, and now my site is blisteringly fast compared to the same time last week, even though the CPU load has not altered.
Moral of the story: When something is configured for you, don't assume that the person doing it actually did it correctly. Always take the time to learn and double-check things yourself.
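
For anyone wanting to run the same double-check, here is a small Python sketch of the comparison described above: resolve each nameserver hostname and see whether the address is one that belongs to the account. Everything named here is hypothetical (the function check_nameservers, the example.com hostnames and the documentation-range IPs in the usage note); the list of nameserver hostnames has to come from the registrar or from a tool such as dig, since the Python standard library cannot query NS records directly.

```python
import socket


def check_nameservers(ns_hosts, account_ips, resolve=socket.gethostbyname):
    """Return {hostname: (resolved_ip_or_None, ip_is_on_account)} for each nameserver."""
    report = {}
    for host in ns_hosts:
        try:
            ip = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            ip = None
        report[host] = (ip, ip in account_ips)
    return report


def primary_ip_served(report, primary_ip):
    """True if at least one nameserver resolves to the primary IP."""
    return any(ip == primary_ip for ip, _ in report.values())
```

For example, check_nameservers(["ns1.example.com", "ns2.example.com"], {"192.0.2.10", "192.0.2.11"}) would flag any nameserver whose address is missing from the account, and primary_ip_served(report, "192.0.2.10") covers the second problem described above, where none of the nameservers pointed to the primary IP.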

perestrelka
« Reply #12 on: March 09, 2009, 01:30:02 AM »
Hi Hairboy,
Please accept my apologies for this misconfiguration on behalf of the whole LP team. And thanks for sharing this on the forums so that others can check for similar issues.
Kind Regards, Vlad Artamonov

firefoxonline
« Reply #13 on: March 19, 2009, 10:38:00 PM »
Have you tried pinging it and so on? The connection could be unstable.