Porn, porn, porn


I’m at the bottom of the kind of rabbit hole where you stand knee-deep in shit and wonder how the fuck you got here. Two of my last three articles were on the subjects of the Apache log file and regular expressions. Then this morning a plagiarized article popped up on Reddit.

The plagiarism was so bad that it hotlinked to the assets of the original article. Whoever ripped off the article had no fucks to give. Now, I know it isn’t hard to detect a hotlinked image and then block it (or redirect it to horse porn), but as a rule I don’t care. Still, I was curious about who hotlinks to my site. I attacked my one-million-line log file because I wanted the information without clicking around a website.

I went through a few iterations of searches until I hit on this query:

grep -v -P '(bhalash|google|baidu|yahoo|bing|yandex)' access.log | sed 's/^.*http/http/' | awk '{print $1}' | sed 's/)\?"//g' | sort | uniq -c | grep -P '(porn|sex|x{2,5}|adult|video)'

You will already know Google, Yahoo! and Bing, but Yandex are a big Russian search portal, and Baidu are a Chinese counterpart. I also excluded referrals from my own site (the bhalash in the pattern) because I link between posts. Once I had excluded the search engines, what I had left was porn, lots of porn. Teen porn, amateur porn, video porn, fetish porn and fattie porn. There are hundreds of backlinks to my site from spammy sites that want to boost their search engine rank.
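
For anyone who wants to pick the query apart, here is a rough sketch of the same pipeline split into stages. It assumes the log is in Apache's combined format, where the referrer is the first lowercase http on the line once bot user agents are filtered out. The function name referrer_counts is made up for illustration, and I have swapped grep -P for grep -E (and the sed \? for sed -E) because nothing in these patterns needs Perl syntax. Treat it as a sketch, not the definitive way to do it.

```shell
#!/bin/sh
# Sketch only: the one-line query, one stage per line.
# referrer_counts is a hypothetical name; $1 is the path to an access log
# that is assumed to be in Apache combined format.
#
# Stages, in order:
#   1. Drop every line that mentions my own site or a big search engine.
#   2. Cut everything before the last lowercase "http" on the line, so the
#      line now starts with a URL (normally the referrer).
#   3. Keep only the first whitespace-separated field, i.e. that URL.
#   4. Strip the trailing quote, and the ')' some user agents leave behind.
#   5. Sort so that identical URLs sit next to each other.
#   6. Collapse duplicates and prefix each URL with its count.
#   7. Keep only URLs that contain one of the spammy keywords.
referrer_counts() {
    grep -v -E '(bhalash|google|baidu|yahoo|bing|yandex)' "$1" |
        sed 's/^.*http/http/' |
        awk '{print $1}' |
        sed -E 's/\)?"//g' |
        sort |
        uniq -c |
        grep -E '(porn|sex|x{2,5}|adult|video)'
}
```

Calling referrer_counts access.log prints one line per matching referrer, with the hit count in front. Pipe it through sort -rn if you want the worst offenders first.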

The best worst part is that I can’t even list these sites without either giving them traffic or winding up on a spam blacklist.

March 20

