Using CodeQL variant analysis to find format string vulnerabilities - Part 1

Code review and static code analysis are things I really enjoy doing, for fun and sometimes for bread and butter. CodeQL is used for variant analysis, which works like searching a codebase for a modelled code pattern. In this blog post I take a small C example (format_string.c) and play around with CodeQL to find the expressions in it that are vulnerable to format string attacks.
Running the program with the input "%08x.%08x" shows 2 scenarios where the format string vulnerability is successfully exploited: instead of echoing the input, the program prints hexadecimal values.

Enter your name:%08x.%08x
Executing function 1
%08x.%08x
Executing function 2
1a8571fe
Executing function 3
64ed32a0.00000000  ====> Exploited 
Executing function 4
1a8571fe.1a857180  ====> Exploited
Executing function 1
I am hardcoded
Executing function 3
I am hardcoded but safe

So my objective is to play around with CodeQL and write a few queries that detect the code pattern where the input is used directly as the format string. The queries should flag the lines in func3 (Line 16) and func4 (Lines 23 and 24) as potentially exploitable, because the user-controlled string is passed directly to printf() and sprintf() without a fixed format string.
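To make the target pattern concrete, here is a minimal C sketch of the unsafe and the safe way to handle the input. It is not the code of format_string.c: the names render_unsafe and render_safe are hypothetical, and I use snprintf into a buffer (instead of printf and sprintf) only so that the result is easy to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* UNSAFE (hypothetical helper): the caller's string is the format itself,
 * so any conversion specifier inside it, such as %08x, gets interpreted. */
int render_unsafe(char *out, size_t size, const char *user_input) {
    assert(out != NULL && size > 0);
    return snprintf(out, size, user_input);
}

/* SAFE (hypothetical helper): the format is a fixed literal and the
 * caller's string is treated purely as data. */
int render_safe(char *out, size_t size, const char *user_input) {
    assert(out != NULL && size > 0);
    return snprintf(out, size, "%s", user_input);
}
```

Given "%08x.%08x", render_safe copies the text verbatim, while render_unsafe would interpret both %08x specifiers and read arguments that were never passed. Compilers usually warn about the call in render_unsafe, which is the same pattern the queries are meant to find.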

  • First, let us create the database: codeql database create ./database/vulnc --language="cpp" --command="gcc format_string.c -o format.o" --source-root="./app/"
  • Next, I set up the workspace by installing VS Code and the vscode-codeql extension.
  • Then I imported the generated database into the workspace.
  • Now I am ready to begin.

Trial 1 - Find all printf & sprintf calls

import cpp

// regexpMatch has to match the whole name, so only printf and sprintf qualify
from FunctionCall f
where f.getTarget().getName().regexpMatch("(printf|sprintf)")
select f

All occurrences of printf and sprintf are found.



I don't want the results for Line 17 and Line 25, as they only print a hardcoded newline and pose no threat.

Trial 2 - Exclude the calls that only print a newline

import cpp

// getValue() is the constant value of the argument; drop calls whose first argument is just "\n"
from FunctionCall fc
where fc.getTarget().getName().regexpMatch("(printf|sprintf)")
and not fc.getArgument(0).getValue() = "\n"
select fc

That got rid of the newline calls, and the results are down to 6.



Trial 3 - Find all printf & sprintf calls where the first argument is a variable, so that no printf with a hardcoded string is detected

import cpp

// keep only the calls whose first argument is a variable access rather than a literal
from FunctionCall fc, VariableAccess var
where fc.getTarget().getName().regexpMatch("(printf|sprintf)")
and fc.getArgument(0) = var
select fc



With this query I could detect all the locations where a potential format string vulnerability could exist.


Testing the model - Let's try another variant of the format string bug and check whether our model can detect it.

I added a new function to the existing code and rebuilt the database to verify that the model also works for other forms of the printf format string call.

void func5(char *str){
    printf(str,"AAA");
}
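It may help to see why func5 is still worth flagging even though an extra argument is passed. The sketch below mirrors its shape; render_with_extra is a hypothetical name, and snprintf stands in for printf only so that the output can be captured in a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same shape as func5: the caller's string is the format and "AAA" is a
 * fixed extra argument. The caller decides how that argument is used. */
int render_with_extra(char *out, size_t size, const char *str) {
    assert(out != NULL && size > 0);
    return snprintf(out, size, str, "AAA");
}
```

With the input "%s" the buffer ends up holding AAA instead of the text that was typed, and with more specifiers than arguments the call reads values that were never supplied. The extra argument does not make the call safe, because the format is still user controlled.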



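While this CodeQL query looks flawless for my code, it does not fit many real cases. It only inspects the first argument, whereas for sprintf the format string is the second argument, so a sprintf call is matched whenever its destination buffer is a variable, whatever its format is. I also read some CodeQL articles on typical printf and sprintf scenarios that need flow analysis to trace where the format string comes from. The query also does not cover the whole printf family of functions. So in the next post we will talk about flow analysis. Let me know if you spot any mistakes or improvements that can be made.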
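A few C shapes illustrate these gaps. This is only a sketch with hypothetical function names, and the comments describe how I would expect the Trial 3 query to behave, not results I have run against a database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Likely reported by Trial 3 because argument 0 is a variable, yet harmless:
 * fmt only ever holds a constant string. Flow analysis could rule it out. */
int greet_constant(const char *name) {
    const char *fmt = "Hello, %s\n";
    return printf(fmt, name);
}

/* Vulnerable but missed: the name fprintf does not match (printf|sprintf),
 * and the format is the second argument (index 1). */
int log_to_stream(FILE *stream, const char *user_input) {
    assert(stream != NULL);
    return fprintf(stream, user_input);
}

/* Vulnerable but missed: snprintf takes the format as its third argument
 * (index 2). */
int render_bounded(char *out, size_t size, const char *user_input) {
    assert(out != NULL && size > 0);
    return snprintf(out, size, user_input);
}
```

A more general query would need to know, per function, which argument holds the format, and whether that argument can be reached from user input.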


References:
https://help.semmle.com/QL/ql-support/ql-training/
https://codeql.github.com/docs/ql-language-reference/formulas/
https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-cpp/
