I am working on a Python CLI replacement for DirBuster. As part of this, I needed to learn the basics of Python Threading/Queueing. Below you will find a brief explanation of what Threads and Queues are, and some sample code that I will explain.
What is a Thread?
To answer that, we must first understand that processes are "independent execution units that contain their own state information, use their own address spaces, and only interact with each other via interprocess communication mechanisms (generally managed by the operating system)." A thread, on the other hand, is "a coding construct that doesn't affect the architecture of an application. A single process might contain multiple threads; all threads within a process share the same state and same memory space, and can communicate with each other directly, because they share the same variables." (See: http://www.cafeaulait.org/course/week11/02.html)
What is a queue?
A queue is a First In First Out (FIFO) data structure. Python's Queue module provides one that is especially useful "when information must be exchanged safely between multiple threads." (See: http://docs.python.org/library/queue.html, http://en.wikipedia.org/wiki/Queue_(data_structure)) (The module also offers a LIFO queue, Queue.LifoQueue, but this is technically a stack.)
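To make the FIFO/LIFO difference concrete, here is a minimal sketch. It assumes Python 3, where the module is named queue rather than Queue; the drain helper and the directory names are made up for illustration.

```python
import queue  # Python 3 name; the Python 2 module used in this post is `Queue`


def drain(q):
    """Remove and return every item currently in the queue, in retrieval order."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items


fifo = queue.Queue()      # first in, first out
lifo = queue.LifoQueue()  # last in, first out (behaves like a stack)

for name in ["admin", "images", "backup"]:
    fifo.put(name)
    lifo.put(name)

fifo_order = drain(fifo)
lifo_order = drain(lifo)

print(fifo_order)  # ['admin', 'images', 'backup']
print(lifo_order)  # ['backup', 'images', 'admin']
```

Checking empty() like this is only safe here because a single thread is using the queues; with several threads you would rely on get() blocking instead.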
What is your example?
In our example below, we are going to find directories that exist on a webserver by brute force. To do this, we request a URL like "http://www.example.com/foo/" and see if it responds with a 200 OK. If it does, then we know that a directory exists at that address. If we were to make these requests one at a time, it could take quite a while. But with 10 threads all making requests at the same time, we can speed things up significantly.
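As a rough sketch of the "request a URL like http://www.example.com/foo/" step, this is how a wordlist could be turned into candidate URLs. The function name candidate_urls and the sample lines are hypothetical and not part of simple-bust.py; it makes no network requests.

```python
def candidate_urls(host, wordlist_lines):
    """Return one directory URL per usable wordlist entry."""
    urls = []
    for line in wordlist_lines:
        name = line.strip()
        # Skip blank lines and '#' comment lines
        if not name or name.startswith("#"):
            continue
        # Same '<host>/<dir>/' shape as the example URL above
        urls.append("%s/%s/" % (host.rstrip("/"), name))
    return urls


# Hypothetical wordlist content, as it would come from reading a file
sample_lines = ["# comment line\n", "foo\n", "\n", "images\n"]
urls = candidate_urls("http://www.example.com", sample_lines)
print(urls)
```

Each of these URLs is what a thread would then request and check for a successful response.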
Let's go through some sample code bit by bit and discuss how I use threads and queues. (All code is from simple-bust.py, found at the bottom of the page.) I am only going to point out the structure for threading and queueing. If you have any questions regarding urllib2 or any other area of the script, please refer to the Python documentation.
First, import our modules. We need sys to exit, urllib2 to make the requests, signal to catch ctrl+c, and Queue and threading for making and using the threads.
import Queue
import signal
import sys
import threading
import urllib2
Next we make our thread class, which inherits from threading.Thread. You'll notice that all the action is inside the run method. Everything that you want a thread to do needs to be contained in run.
class ThreadDir(threading.Thread):
    """Thread to request a directory and print to screen if 200 received"""

    def __init__(self, host, dir_queue, user_agent):
        threading.Thread.__init__(self)
        self.host = host
        self.dir_queue = dir_queue
        self.user_agent = user_agent

    def run(self):
        opener = urllib2.build_opener()
        opener.addheaders = [('User-agent', self.user_agent)]
        while True:
            # Get dir from dir_queue (blocks until one is available)
            dir = self.dir_queue.get()
            # Form the URL from the host this thread was given
            url = '%s/%s/' % (self.host, dir)
            # Request the URL, or set request to None on failure.
            # HTTPError (4XX/5XX) is a subclass of URLError, so this
            # also covers connection errors.
            try:
                request = opener.open(url)
            except urllib2.URLError:
                request = None
            # Print the URL and status code if the request succeeded
            if request is not None:
                print '%-70s-- %s' % (url, request.getcode())
            # Signal to queue that the job is done
            self.dir_queue.task_done()
Two important thread/queue items to point out from the code above are "self.dir_queue.get()" and "self.dir_queue.task_done()". The first requests from the queue the next bit of information that needs to be processed (like a directory name for us to brute force). The second tells the queue that we have finished processing the last bit of information it gave us. Pretty simple, huh?
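Here is the get()/task_done() pairing in isolation, as a minimal sketch rather than the code from simple-bust.py. It assumes Python 3 (queue instead of Queue), swaps the HTTP request for a stand-in that upper-cases the name, and uses a None sentinel to stop the worker, which the script above does not do. Wrapping the work in try/finally is my own choice here: it makes sure task_done() is called even if the work raises.

```python
import queue
import threading


def worker(jobs, results):
    """Process names from the queue until a None sentinel arrives."""
    while True:
        name = jobs.get()          # blocks until an item is available
        try:
            if name is None:       # sentinel: no more work
                break
            results.append(name.upper())  # stand-in for the HTTP request
        finally:
            jobs.task_done()       # one task_done() for every get()


jobs = queue.Queue()
results = []

t = threading.Thread(target=worker, args=(jobs, results))
t.start()

for name in ["admin", "images"]:
    jobs.put(name)
jobs.put(None)   # tell the worker to stop

jobs.join()      # returns once every item has been marked done
t.join()         # the worker has left its loop

print(results)
```

If task_done() were skipped for even one item, jobs.join() would never return.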
Setting up the threads and the queue is also very simple. You will see below that we just instantiate Queue.Queue. In the example we are giving it a maxsize because of some issues I was having with threads being very noisy if the script was interrupted. If you do not pass any options to Queue.Queue it gives you a FIFO queue with no max size. Really easy. We will talk about how we add things to the queue in just a bit.
def main(host, dirs, user_agent, threads):
    signal.signal(signal.SIGINT, signal_handler)
    dir_queue = Queue.Queue(maxsize=threads - 1)
    # Spawn all threads and pass host and dir_queue
    for i in range(threads):
        t = ThreadDir(host, dir_queue, user_agent)
        t.daemon = True
        t.start()
    # Add all dirs to the queue, skipping blank lines and comments
    for dir in dirs:
        dir = dir.strip()
        if dir and not dir.startswith('#'):
            dir_queue.put(dir)
    # Wait for this queue to finish processing
    dir_queue.join()
The threads are just as easy. We already made the class, so now we just pass it the info it needs, mark it as a daemon, and start it. Piece of cake. Setting a thread's daemon attribute to True means the program will not wait for that thread when the main thread exits. Since each thread's run method loops forever, the script would otherwise never exit. Calling start() is what actually begins the thread's execution.
Now to finish up the queue. To add things to the queue, we just pass info into dir_queue.put(). It doesn't get any simpler than that! In this example we just start throwing directories into the queue until it fills up. Once the queue is full, put() blocks until a thread takes an item out, and then the next directory goes in. When we finish putting directories into the queue, we don't want our application to exit, which would kill all our daemonized threads. To prevent this we call "dir_queue.join()". This blocks the program until every item that was put into the queue has been retrieved and marked with task_done(). Handy.
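The way put() and join() behave on a queue with a maxsize can be seen in a small sketch. This assumes Python 3 (queue instead of Queue) and uses a hypothetical slow consumer in place of the request threads; the numbers 2 and 10 are arbitrary.

```python
import queue
import threading
import time


def consume(jobs, done):
    """Slow consumer: take an item, pretend to work, mark it done."""
    while True:
        item = jobs.get()
        time.sleep(0.01)      # stand-in for a slow HTTP request
        done.append(item)
        jobs.task_done()


jobs = queue.Queue(maxsize=2)   # at most two items may wait in the queue
done = []

# daemon=True so this endless loop does not keep the program alive
threading.Thread(target=consume, args=(jobs, done), daemon=True).start()

largest_backlog = 0
for n in range(10):
    jobs.put(n)   # blocks while the queue already holds two items
    largest_backlog = max(largest_backlog, jobs.qsize())

jobs.join()       # waits for task_done() on all ten items

print(largest_backlog, done)
```

The producer loop can never get more than two items ahead of the consumer, and join() only returns after the last item has been processed, not merely removed from the queue.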
Okay, now for that signal.signal thing up there at the top of the function. This just sets Python up to call our handler when ctrl+c is pressed on the command line. Without this and the accompanying function below, ctrl+c wouldn't be recognized and we couldn't stop our scan if we wanted to. (And usually we don't want to wait 24 hours for our dirbuster to finish.) If you have any questions about how this works, consult the Python documentation. (http://docs.python.org/library/signal.html)
def signal_handler(signum, frame):
    # signum is the signal number; named so it does not shadow the signal module
    print '\nScan aborted'
    sys.exit(0)
Below you will find the code that actually executes our script. We pass it a host, a DirBuster directory list, Googlebot's User-Agent, and the number of threads we want to start.
if __name__ == '__main__':
    host = 'http://scanme.nmap.org'
    dirs = open('small.txt', 'r')
    user_agent = \
        'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
    threads = 10
    main(host, dirs, user_agent, threads)
Does it work? Well, I timed it on my machine: about 400 requests took roughly 120 seconds with a single-threaded script, while this script made 400 requests in about 12 seconds. Sounds about right for 10 threads.
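The same kind of comparison can be tried without touching a real server. The sketch below assumes Python 3 and replaces the HTTP request with a hypothetical fake_request that just sleeps for a fixed delay; run_scan, DELAY, and the directory names are all made up for illustration, and the times you see will depend on your machine.

```python
import queue
import threading
import time

DELAY = 0.02  # hypothetical per-request latency in seconds


def fake_request(name):
    """Stand-in for an HTTP request: wait, then report success."""
    time.sleep(DELAY)
    return 200


def run_scan(names, thread_count):
    """Process every name with thread_count workers; return elapsed seconds."""
    jobs = queue.Queue()

    def worker():
        while True:
            name = jobs.get()
            try:
                fake_request(name)
            finally:
                jobs.task_done()

    for _ in range(thread_count):
        threading.Thread(target=worker, daemon=True).start()

    start = time.perf_counter()
    for name in names:
        jobs.put(name)
    jobs.join()  # wait until every name has been processed
    return time.perf_counter() - start


names = ["dir%d" % i for i in range(20)]
single = run_scan(names, 1)
pooled = run_scan(names, 10)
print("1 thread: %.2fs, 10 threads: %.2fs" % (single, pooled))
```

Because the workers spend their time waiting rather than computing, ten of them can wait in parallel, which is the same reason the real scan speeds up.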
Although this has been a very basic example, you can see that using threads and queues in Python is quite simple. You didn't even need to use Twisted. For another example, including one that uses multiple queues, see this wonderful article that started me off: http://www.ibm.com/developerworks/aix/library/au-threadingpython/ .
Stay tuned for the release of a more feature-rich, complete Python DirBuster replacement tool!
Complete Code --
#!/usr/bin/env python
# simple-bust.py - A very simple directory bruteforcer
# Copyright (C) 2011 Michael Monsivais
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

import Queue
import signal
import sys
import threading
import urllib2


class ThreadDir(threading.Thread):
    """Thread to request a directory and print to screen if 200 received"""

    def __init__(self, host, dir_queue, user_agent):
        threading.Thread.__init__(self)
        self.host = host
        self.dir_queue = dir_queue
        self.user_agent = user_agent

    def run(self):
        opener = urllib2.build_opener()
        opener.addheaders = [('User-agent', self.user_agent)]
        while True:
            # Get dir from dir_queue (blocks until one is available)
            dir = self.dir_queue.get()
            # Form the URL from the host this thread was given
            url = '%s/%s/' % (self.host, dir)
            # Request the URL, or set request to None on failure.
            # HTTPError (4XX/5XX) is a subclass of URLError, so this
            # also covers connection errors.
            try:
                request = opener.open(url)
            except urllib2.URLError:
                request = None
            # Print the URL and status code if the request succeeded
            if request is not None:
                print '%-70s-- %s' % (url, request.getcode())
            # Signal to queue that the job is done
            self.dir_queue.task_done()


def main(host, dirs, user_agent, threads):
    dir_queue = Queue.Queue(maxsize=threads - 1)
    # Spawn all threads and pass host and dir_queue
    for i in range(threads):
        t = ThreadDir(host, dir_queue, user_agent)
        t.daemon = True
        t.start()
    # Add all dirs to the queue, skipping blank lines and comments
    for dir in dirs:
        dir = dir.strip()
        if dir and not dir.startswith('#'):
            dir_queue.put(dir)
    # Wait for this queue to finish processing
    dir_queue.join()


def signal_handler(signum, frame):
    # signum is the signal number; named so it does not shadow the signal module
    print '\nScan aborted'
    sys.exit(0)


if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    host = 'http://scanme.nmap.org'
    dirs = open('small.txt', 'r')
    user_agent = \
        'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
    threads = 10
    main(host, dirs, user_agent, threads)