Using CodeQL variant analysis to find format string vulnerabilities - Part 2 ( Taint Analysis )

In our previous post we have seen examples on how we can perform simple analysis with codeql to detect format string vulnerabilities. There are couple of issues with the previous queries we wrote. Le us take an example where the data that is passed to a printf() call is hardcoded, hence the attacker has no control over it. In that case we would end up with too much of false positives. So this is where taint analysis will come useful. Here is the code that we are going to analyze here. 

 


Also let us try to exploit and see which are the scenarios in which they are vulnerable



So as per my input %08x.%08x , there are 3 cases where my input will cause a format string attack. So my model should be able to find these 3 paths that are potentially exploitable.
 
Next let us try to model taint flow of source to sink in which my source will be the user's input variable and my sink will be a flow where the user's input variable will be passed as the first parameter to the printf(user's input variable )


To write this query, I reused a part of the code from a stackoverflow question , 

https://stackoverflow.com/questions/58164464/semmle-ql-tainttracking-hasflow-problem-with-sources-that-taint-their-argumen

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph


predicate checkUserInput(Expr expr) {
    exists(FunctionCall fc, int i | verifythe2ndArgument(fc, i) and expr = fc.getArgument(i))
}
 
predicate verifythe2ndArgument(FunctionCall functionCall, int arg) {
    functionCall.getTarget().hasGlobalName("scanf") and
    arg = 1 // example, scanf("%s",userinput), userinput is being tainted here
}

class Config extends TaintTracking::Configuration {
    Config() { this = "Finding format string vulnerabilities" }
   
    override predicate isSource(DataFlow::Node source) {
        checkUserInput(source.asExpr())
        or
        checkUserInput(source.asDefiningArgument())
    }
    override predicate isSink(DataFlow::Node sink) {
        exists (FunctionCall fc | fc.getArgument(0)=sink.asExpr() and fc.getTarget().hasName("printf"))
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select source.getNode().getLocation(), sink.getNode().getLocation(), "userinput flow to printf"


The query helps me to identify the same 3 potential location of format string vulnerability which we discussed above. The column [0] being the SOURCE and [1] being the SINK



Again this is not 100% fail proof as we only considered taking the tainted data from scanf input variable but not other possible input, or functions, etc. I will continue to improve the codeql query as I discover and learn more about it. Please let me know if you find any mistakes.

References

  • https://stackoverflow.com/questions/58164464/semmle-ql-tainttracking-hasflow-problem-with-sources-that-taint-their-argumen
  • https://help.semmle.com/QL/ql-training/cpp/data-flow-cpp.html#5