Netscraps

A windsurfing, CSS-grudging, IE-hating, web-developing, gigantic-machine-puzzling blog

Month: October 2011

404 Not Found Abuse: oggiPlayerLoader.htm

In a refreshingly proactive turn of events, one Amazon AWS abuser replied to me directly. The oggiPlayerLoader.htm 404 errors detailed in my previous web abuse post were courtesy of Oggifinogi, a rich media provider based out of Bellevue, WA.

Director of Technology Paul Grinchenko emailed me back with a friendly explanation:

We are just looking for our IFRAME buster. You were running at least 1 of our ads in an IFRAME.

No surprise there. We have no prior relationship with Oggifinogi, so I figured their ads had been served through one of the 3rd party ad networks we use (turns out it was ValueClick).

Luckily the issue is simpler than that — Amazon AWS prohibits them from 404-bombing our servers at “an excessive or disruptive rate”. My reply to Paul:

As you probably saw from the “comments” I provided, my complaint was your service’s excessive HEAD requests to the same 6 non-existent files. Judging from the excessively long-term & repetitive 404 errors, it seems your service does nothing useful with the “not found” status code returned by our servers each time. Oggifinogi would be better off using a more responsible system: monitor HTTP response codes to your iframe buster requests, & use that information to limit requests when the files clearly don’t exist. By the way, I somewhat appreciate your service’s HEAD requests versus a full GET, but it’s a bandaid.

Also I urge you to consider Amazon’s advice: We would strongly recommend that you customize your UserAgent string to include a contact email address so as to allow target sites being crawled to be able to contact you directly. (…although most responsible web services I’ve come across put a URL in their user agent, rather than an email address…)

A few hours later, Paul replied that Oggifinogi does indeed cache iframe buster file presence for a short period, so their requests should not exceed 75 per hour. That fits the profile I saw — no real strain on the web server, but very annoying when tailing error logs.
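For the curious, here is roughly what I mean by using the status code to limit requests. This is only a sketch of a presence cache, not Oggifinogi's actual system: the class name, the TTL values, and the choice to treat only 404/410 as "missing" are all my own assumptions.

```python
import time


class PresenceCache:
    """Remember whether a URL exists so repeat checks can be skipped."""

    def __init__(self, found_ttl=3600.0, missing_ttl=86400.0, clock=time.monotonic):
        self.found_ttl = found_ttl      # seconds to trust a 2xx answer (assumed value)
        self.missing_ttl = missing_ttl  # seconds to trust a 404/410 answer (assumed value)
        self.clock = clock              # injectable so the cache can be tested without sleeping
        self._entries = {}              # url -> (exists, expires_at)

    def record(self, url, status):
        """Store the outcome of a check, keyed by HTTP status code."""
        if 200 <= status < 300:
            self._entries[url] = (True, self.clock() + self.found_ttl)
        elif status in (404, 410):
            self._entries[url] = (False, self.clock() + self.missing_ttl)
        # Anything else (redirects, 5xx) is inconclusive, so it is not cached.

    def lookup(self, url):
        """Return True/False if a fresh answer is cached, else None."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        exists, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[url]
            return None
        return exists

    def should_request(self, url):
        """Only hit the publisher's server when nothing fresh is cached."""
        return self.lookup(url) is None
```

Caching a "not found" answer for longer than a "found" one is the whole point: a file that was missing five minutes ago is very likely still missing.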

The good news is Paul agreed to start using an Oggifinogi user agent, hopefully with a help/explanation page URL too.
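A self-identifying HEAD request is not much work. Here is a minimal sketch using Python's standard library; the product token and help URL are made up for illustration and have nothing to do with whatever Oggifinogi ends up sending.

```python
import urllib.request

# Hypothetical values: neither the product token nor the URL comes from Oggifinogi.
USER_AGENT = "ExampleAdChecker/1.0 (+http://www.example.com/crawler-help)"


def build_head_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a HEAD request that identifies its sender."""
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": user_agent},
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=10)` would send it; a 404 comes back as a `urllib.error.HTTPError`, which is exactly the signal worth acting on.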

Paul also sent me the oggiPlayerLoader.htm instructions. Now Oggifinogi can bust our iframes at will, rather than continuing the 404 war. In case anyone else out there wants to join the peace process:

Instructions for Publishers:

  1. Please download and unpack oggiPlayerLoader.zip – External link for Pubs to Download oggiPlayerLoader.zip
  2. Make sure that unpacked version is called oggiPlayerLoader.htm
  3. Copy oggiPlayerLoader.htm to just one of the following locations – single location is enough:

Please make sure that resulting location is accessible from outside. Location shouldn’t be protected. For example you should be able to open in the browser URL http://www.yoursite.com/oggiPlayerLoader.htm without entering any credentials.

… Working to improve the Interweb, one 404 error at a time.

Web crawl abuse from Amazon AWS-hosted projects

I’ve been keeping an eye on the CarComplaints.com error log lately, watching for phishing attempts, misbehaving bots/scripts, & other random stupidity. Turns out the major offenders have something in common — they’re hosted on Amazon’s AWS platform.

One Amazon AWS customer was crawling pages in bursts at up to 100 per minute, but referencing our mixed-case URLs in all lowercase — racking up several hundred thousand 404 errors over several weeks. Luckily they had a “Ruby” user agent (Ruby script’s HTTP request?) … bye bye Ruby, at least until you change user agents.
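If you want to spot this kind of offender without staring at a tailing log, a quick tally of 404s per user agent does the job. The sketch below assumes Apache's "combined" access log format; the function name and regex are mine, not part of any tool mentioned here.

```python
import re
from collections import Counter

# Assumes the Apache "combined" format:
#   ... "METHOD path PROTOCOL" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_404s_by_agent(lines):
    """Count 404 responses per user agent, ignoring lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("agent")] += 1
    return counts
```

Feed it an open log file (file objects iterate line by line) and call `.most_common(10)` on the result to see who is responsible for most of the noise.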

Another Amazon AWS customer was requesting oggiPlayerLoader.htm in various locations. Anyone know what this “Frame Booster” is part of? (UPDATE: see my followup about Oggifinogi). Luckily they use a HEAD request, so those got banned too along with some other esoteric request methods suggested by Perishable Press.

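# Match any user agent containing "Ruby", or any of the listed request methods (both case-insensitive).
RewriteCond %{HTTP_USER_AGENT} "Ruby" [NC,OR]
RewriteCond %{REQUEST_METHOD} ^(delete|head|trace|track) [NC]
# Answer every matching request with 403 Forbidden and stop processing rules.
RewriteRule ^(.*)$ - [F,L]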
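Before dropping rules like these on a live server, it helps to check which requests they would actually catch. Here is a small Python mirror of the two conditions, written as a sanity check rather than a faithful mod_rewrite emulator; the function name is my own.

```python
import re

# First RewriteCond: "Ruby" anywhere in the user agent, case-insensitive ([NC]).
AGENT_RE = re.compile(r"Ruby", re.IGNORECASE)
# Second RewriteCond: request method starting with one of these, case-insensitive ([NC]).
METHOD_RE = re.compile(r"^(delete|head|trace|track)", re.IGNORECASE)


def is_forbidden(user_agent, method):
    """True when either condition matches, mirroring the [OR] flag."""
    return bool(AGENT_RE.search(user_agent) or METHOD_RE.search(method))
```

Note that the user agent pattern is unanchored, so anything with "ruby" in its name gets the boot, and that forbidding HEAD also turns away well-behaved link checkers. For my purposes that was an acceptable trade.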

I cheerily reported both cases of AWS abuse to Amazon via their web abuse form. Turns out the abuse form is there only to mess with your head. Some form data has to be space-separated while other data must be comma-separated. Fields where you list IPs & URLs barely fit a single entry, much less multiple items. And good luck cutting your access log snippet down to their 2000 character limit. Amazon just launched their Cloud Drive — zillions of decaquintillobytes of storage space — but can they handle processing a few hundred lines of server logs? Nope.

The kicker is if they do accept, verify, & pass on your complaint to their AWS customer, Amazon won’t provide any details about the offender so that you could, oh I don’t know, blog mean things about them. You’ll need a subpoena for that.

Moving on to abuse not related to AWS — people are referencing themes/default/style.css all over the place. The requests look legitimate, from various random IPs & user agents, so I’m guessing it’s a misbehaving browser plugin. Searching Google indicates it could be something called OpenScape, which I didn’t have time to research. Anyone know what that’s all about? Those got forbidden…

RewriteRule themes?/default/style\.css$ - [F,L]

And finally there’s Microsoft. For about a year, MSNBot has managed to take legitimate page URLs & tack JavaScript onto the end, as in /Kia/Sephia/2001/engine/this.options[this.selectedIndex].value; Only Microsoft could manage that.
