Tuesday, October 12, 2010

Scanning Memory Objects with Yara

Below are some Python inserts that enable Yara scanning of in-memory objects while parsing a file such as a PDF. This particular example adds Yara signature scanning of parsed, filtered PDF objects to Didier Stevens' pdf-parser.

[...imports...]

import yara
import mmap
rules = yara.compile('path/rulefile')

[...parsing code...]

############################ yara insert around line 558
############################ just before
######### print ' %s' % FormatOutput(filtered, options.raw)

memmap=mmap.mmap(-1,len(filtered))
memmap.write(filtered)
memmap.seek(0)
matches = rules.match(data=memmap.read(len(filtered)))
memmap.close()
for m in matches:
    print ' yara: %s' % (m)

##################################

[...resume Didier's code...]

print ' %s' % FormatOutput(filtered, options.raw)
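
The mmap round trip in the insert hands Yara the same bytes that `filtered` already holds, so the buffer can also be passed to `match(data=...)` directly. Here is a minimal sketch of that idea, written for Python 3 bytes rather than the Python 2 strings used above. The names `roundtrip_through_mmap` and `scan_buffer` are hypothetical helpers, not part of pdf-parser or yara, and `rules` is assumed to behave like the object returned by `yara.compile` (anything with a `match(data=...)` method will do).

```python
import mmap


def roundtrip_through_mmap(data):
    """Copy data into an anonymous map and read it back, as the insert does."""
    buf = mmap.mmap(-1, len(data))
    try:
        buf.write(data)
        buf.seek(0)
        return buf.read(len(data))
    finally:
        buf.close()


def scan_buffer(rules, data):
    """Scan an in-memory buffer and return one report line per match.

    Assumption: `rules` behaves like the object returned by yara.compile(),
    i.e. it offers match(data=...) and returns an iterable of matches.
    """
    if not data:
        # Empty objects have nothing to scan, so report no matches.
        return []
    # No intermediate copy: the bytes go straight to the scanner.
    return [" yara: %s" % m for m in rules.match(data=data)]
```

Since `roundtrip_through_mmap(data)` returns the bytes it was given, `scan_buffer(rules, filtered)` should report the same matches as the mmap version for non-empty input. Skipping empty objects also sidesteps the zero-length map the insert would otherwise request.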

1 comment:

pinowudi said...

I codified it and made a couple of other fixes. Here is the patch to enable yara scanning in Didier's PDF Parser v0.3.7:


35a36
> 2010/10/12: Added YARA bindings
51a53,54
> import yara
> import mmap
68a72,76
> rules = yara.compile(filepaths={
>     'sig1':'embedded_object_signature_set',
>     'sig2':'document_structure_signature_set'
> })
>
556a565,566
> if options.yara:
>     YaraScan(filtered)
558a569,570
> if options.yara:
>     YaraScan(object.Stream(False))
564a577,588
> def YaraScan(data):
>     memmap=mmap.mmap(-1,len(data))
>     memmap.write(data)
>     memmap.seek(0)
>     matches = rules.match(data=memmap.read(len(data)))
>     memmap.close()
>     if matches != []:
>         print "<< YARA RESULTS"
>         for m in matches:
>             print ' yara: %s' % (m)
>         print ">>"
>
621c645,649
< return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>'))
---
> try:
>     return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>'))
> except:
>     ### Added to fix unhexlify failure on miscoded malware
>     return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>\x00'))
744a773
> oParser.add_option('-y', '--yara', action='store_true', default=False, help='scan with yara')
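
The hex-decoding hunk (621c645,649) retries with trailing NUL bytes stripped when the first decode fails. Here is a small standalone sketch of that fallback, assuming Python 3 bytes instead of the Python 2 strings in the patch. `ascii_hex_decode` is a hypothetical name chosen for illustration, and the sketch catches only decode errors rather than using the bare `except` from the patch.

```python
import binascii

WHITESPACE = b" \t\n\r"


def ascii_hex_decode(data):
    """Decode a hex stream, tolerating NUL padding after the '>' terminator."""
    # Drop whitespace first, as the patched line does with its list comprehension.
    compact = data.translate(None, WHITESPACE)
    try:
        return binascii.unhexlify(compact.rstrip(b">"))
    except (binascii.Error, TypeError):
        # Miscoded samples may end in '>' followed by NULs; strip both and retry.
        return binascii.unhexlify(compact.rstrip(b">\x00"))
```

For example, `ascii_hex_decode(b"48 65 6c\n6c 6f>")` and `ascii_hex_decode(b"48656c6c6f>\x00\x00")` both decode to `b"Hello"`. Input that is still not valid hex after stripping raises from the second call, just as the patched code would.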