When we started working on “Gone Phishing“, I anticipated that I’d get some questions, so I’ve been keeping a running list of things that I expect to be FAQs.
Q: What’s unique about your study?
A: As far as we know, no one’s done a public study that directly compares multiple products against a meaningful number of URLs. Most of the evaluations that have been put out there are anecdotal and only used a few URLs.
Q: What did you test?
A: We took 8 anti-phishing products (including the Netcraft toolbar, IE 7’s Phishing Filter, Google’s Safe Browsing for Firefox, Netscape 8.1, GeoTrust TrustWatch, McAfee SiteAdvisor, the eBay toolbar, and EarthLink’s ScamBlocker) and ran two sets of tests: one to determine how good each technology was at catching known phish, and one to see how many mistakes each made on known-good URLs.
Q: Who won?
A: IE 7 came out best overall, with a score of 172 of a possible 200. Netcraft was a very close second, scoring 168/200. For the rest of the scoring, see the report.
Q: Microsoft commissioned the study. Isn’t it biased?
A: No. 3Sharp, not Microsoft, designed the methodology, picked the URLs, and ran the tests. The report includes a complete discussion of how we did this, and even lists of the URLs we tested. We believe our methodology is sound and we’re being 100% transparent about how we got the results we did so that others can duplicate the results if they like.
Q: How’d you decide who won?
A: We calculated a composite accuracy score for each technology. This score combined the product’s performance at blocking or warning phish with its accuracy in not blocking or warning on legitimate URLs. Each technology earned points for correct blocks/warns and lost points for bogus blocks/warns. (See p10 of the report for the full scoring formula). A product that blocked all 100 phish and none of the 500 good URLs would score a perfect 200; a product that didn’t block anything (e.g. IE 6, Safari, Firefox 1.5, Opera, etc.) would score 0.
Q: 200? I thought there were only 100 phish.
A: We used 100 live phish and 500 known good URLs for the test. However, our scoring formula counts 2 points for a block and 1 point for a warning– so if product X blocked all 100 phish, it would score 200.
Q: Why’d you decide that a block should score twice as much as a warn?
A: Users have increasingly become conditioned to ignoring security warnings. In our view, stopping someone from going to a potentially dangerous site is better than suggesting that they not do it.
Q: What URLs did you use?
A: We gathered 100 phish for the tests; we did this by using several data feeds, scanning them using regular expressions, and then manually culling out the real phish. We tested each phish by hand to make sure that it was still live before running our tests, then we manually tested each phish in each technology and scored the results. Each phish was tested within 48 hours of its arrival to make sure it was fresh (or is that “phresh”?) See appendices A and B of the report for a complete list. For the known-good URLs, we took a set of 500 randomly selected URLs from our data feeds, then manually checked them to make sure they weren’t 404.
Q: Why didn’t you test <my favorite product>?
A: We had to take a snapshot of available products at a point in time. We couldn’t test all of the products, and we couldn’t go back and re-do the tests every time one of the technologies got updated. For example, EarthLink released an update to ScamBlocker during our test period, Mozilla released Firefox 2.0 (which includes anti-phishing features) recently, and Microsoft has updated IE 7 twice since the tests. Because phish have such a short lifetime, we couldn’t go back and re-run the tests.