Not finding code -- “False Negatives” -- is the Problem
When you scan a code base to discover unknown open source, the difference between finding a match and missing it can have significant legal ramifications: re-using open source code without complying with its license obligations has been the basis of many lawsuits. Whether you find the code depends on the database used to identify matches (similar to matching fingerprints against a database) and on the techniques applied, including advanced string search and fuzzy matching, for a comprehensive OSS discovery process. If a match is missed because the database doesn’t have the code, or because the discovery techniques lack sophistication, a “false negative” results. A false negative gives you a false sense of security: you may think your code is clean when it’s not.
Ideally the database contains all known open source code so that matches can indeed be made. It follows that the database must be constantly updated and must also have a long history of crawling forges, because older sites may disappear while their code continues to be shared. It must also include data from crawling the thousands of sites around the Internet that host code. If the code is not in the database, or if the techniques for discovering it don’t exist, it can’t be identified as part of your company’s audit. Simple concept, right?
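To make the point concrete, here is a minimal sketch of fingerprint-style matching. It is an illustration only, not how any particular scanner works: the function names, the whitespace-normalized SHA-256 fingerprint, and the sample projects (`gpl-lib`, `mit-lib`) are all assumptions made up for this example.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Hash the code after collapsing whitespace, so reformatting alone
    does not change the fingerprint (an assumed, simplified scheme)."""
    normalized = " ".join(source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_database(known_sources: dict[str, str]) -> dict[str, str]:
    """Map fingerprint -> project name for every known piece of code."""
    return {fingerprint(code): project for project, code in known_sources.items()}


def scan(files: dict[str, str], database: dict[str, str]) -> dict[str, str]:
    """Return path -> matched project. Files with no match are simply
    absent from the result: the scan cannot tell 'clean' from 'unknown'."""
    matches = {}
    for path, code in files.items():
        project = database.get(fingerprint(code))
        if project is not None:
            matches[path] = project
    return matches


# Hypothetical open source snippets and a code base that reuses both.
GPL_SNIPPET = "int add(int a, int b) { return a + b; }"
MIT_SNIPPET = "int sub(int a, int b) { return a - b; }"

full_db = build_database({"gpl-lib": GPL_SNIPPET, "mit-lib": MIT_SNIPPET})
thin_db = build_database({"mit-lib": MIT_SNIPPET})  # never crawled gpl-lib

codebase = {
    "src/add.c": "int add(int a, int b) {\n    return a + b;\n}",
    "src/sub.c": MIT_SNIPPET,
}

full_result = scan(codebase, full_db)
thin_result = scan(codebase, thin_db)
```

With the fuller database both files are attributed. With the thinner one, `src/add.c` just does not appear in the result, and nothing in the output signals that a match was missed.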
Some vendors with lesser databases than the Black Duck® KnowledgeBase™ try to distract customers from the real issue of “false negatives” by suggesting that finding fewer matches is a benefit! They say their solution produces “fewer false positives.” How can a positive be false? There is no such thing as a false positive – they are all matches. And if you don’t make a match because your database doesn’t have the code, you won’t even know it.
False negatives are a lawsuit waiting to happen, and they result from a thin knowledge base. For example, the Black Duck KnowledgeBase has 10x the source code, for an equivalent number of FOSS projects, that competitors publicly report. So what are the odds that their scanners, which were invented more recently and are populated with only 10% of known code, will miss something? How will you know you had a “false negative” result?
Black Duck’s comprehensive KnowledgeBase, combined with our advanced scanning technology, discovers open source code with a high degree of precision. Our solution can find more code than competitors’ solutions: whole files, of course, but also the snippets that developers copy and paste. We also provide precision filtering capabilities that let customers dial the level of sensitivity up or down to suit their needs and risk tolerance.
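The idea of dialing sensitivity up or down can be sketched with a simple similarity threshold. This is a generic illustration, not Black Duck’s filtering mechanism: the token-set Jaccard measure, the `find_matches` function, and the `threshold` parameter are assumptions chosen for brevity.

```python
def token_set(source: str) -> set[str]:
    """Split code on whitespace into a set of tokens (a crude stand-in
    for real snippet analysis)."""
    return set(source.split())


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets, from 0.0 to 1.0."""
    tokens_a, tokens_b = token_set(a), token_set(b)
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_matches(code: str, database: dict[str, str], threshold: float) -> list[str]:
    """Return the projects whose known code is at least `threshold` similar.
    A lower threshold is more sensitive and reports more candidates."""
    return sorted(
        project
        for project, known in database.items()
        if similarity(code, known) >= threshold
    )


# Hypothetical known project and a copy-pasted, lightly edited snippet.
database = {"example-lib": "def add(a, b): return a + b"}
pasted = "def add(x, b): return x + b"

strict = find_matches(pasted, database, threshold=0.9)   # exact-ish matches only
lenient = find_matches(pasted, database, threshold=0.5)  # tolerates small edits
```

Here the renamed variable drops the similarity to 5/9, so the strict setting reports nothing while the lenient setting still flags `example-lib` for review. Which setting is right depends on how much risk of a missed match you are willing to accept.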
We think you’d rather be careful than careless. And to be careful you need to know what’s in your code. A match is a match -- the real issue is false negatives.