PIN versus Self-Checking and Self-Modifying Code

I’m still playing with PIN. I thought it was cool that it could follow the flow of packed programs, and I wanted to push it a bit more. So here are a few more tests to check how PIN reacts to self-checking and self-modifying code.

A Self-Checking Function (using C function pointers)

#include <stdio.h>

void dump(void *addr_to_dump, int number_of_bytes) {
  int i;
  printf("dumping %d bytes, starting at loc_%06lX", number_of_bytes, (unsigned long)addr_to_dump);
  for(i=0; i<number_of_bytes; i++) {
    if(i%8 == 0)
      printf("\n");
    printf("%02X ", *((unsigned char*)addr_to_dump+i));
  }
  printf("\n");
}

int main() {
  dump(main, 32);
  return 0;
}

Result:

reynaudd@lhs-2:~/test/packed$ ./schello
dumping 32 bytes, starting at loc_400598
55 48 89 E5 BE 20 00 00
00 BF 98 05 40 00 E8 71
FF FF FF B8 00 00 00 00
C9 C3 90 90 90 90 90 90
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./schello
dumping 32 bytes, starting at loc_400598
55 48 89 E5 BE 20 00 00
00 BF 98 05 40 00 E8 71
FF FF FF B8 00 00 00 00
C9 C3 90 90 90 90 90 90

The output is the same in the normal program and in the instrumented program. The compiler replaces the address of main() with an immediate value (0x400598, see the mov at main+9 in the following disassembly), so dump() always reads the original bytes of main(). PIN does not instrument main() in place: it executes an instrumented copy and leaves the original untouched. Checksumming the original main() is therefore pointless as a way to detect instrumentation, because the original is never executed and never modified.

reynaudd@lhs-2:~/test/packed$ gdb ./schello
GNU gdb 6.8-debian […]
This GDB was configured as "x86_64-linux-gnu"...
(gdb) disas main
Dump of assembler code for function main:
0x0000000000400598 <main+0>:    push   %rbp
0x0000000000400599 <main+1>:    mov    %rsp,%rbp
0x000000000040059c <main+4>:    mov    $0x20,%esi
0x00000000004005a1 <main+9>:    mov    $0x400598,%edi
0x00000000004005a6 <main+14>:    callq  0x40051c <dump>
0x00000000004005ab <main+19>:    mov    $0x0,%eax
0x00000000004005b0 <main+24>:    leaveq
0x00000000004005b1 <main+25>:    retq
End of assembler dump.
(gdb)

A Self-Checking Function (using a call/pop to get the current address)

So the next step is to get the address of the instrumented function that is really executed. I tried to do that with a call followed by a pop to get the address of the pop instruction (thanks to joe from joebox.org for giving me a hand on this one):

#include <stdio.h>

void foo() {
  int var = -1;
  asm("call bar\n\t"
      "bar: pop %%rax\n\t"
      : "=a"(var));   // "a" constraint: read the result from rax/eax
  printf("0x%x\n", var);
}

int main() {
  foo();
}

Result:

reynaudd@lhs-2:~/test/packed$ ./schello2
0x4004e0
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./schello2
0x4004e0

I was pretty surprised to see the same value both in the normal program and in the instrumented program. It means that PIN has sufficient control over the program to give it the value it would normally see. So, we still can’t get the address of the instrumented function that way.

A Self-Modifying Function

Now that we know PIN gracefully handles dynamically generated code and self-checking code, we still have to check how it handles self-modification. It is functionally equivalent to dynamic code generation, but the technical difference (the code is overwritten in place, possibly after it has already been translated) might trip up PIN's code cache.

#include <stdio.h>

void foo() {
  int var = -1;

  asm ("call bar\n\t"
      "bar: pop %%rax\n\t"
      "movl $0xcafebabe, 10(%%eax)\n\t" // this is an attempt to replace 0xffffffff
                                        // with 0xcafebabe in the next instruction
      "movl $0xffffffff,%%eax\n\t"
      : "=a"(var));   // "a" constraint: read the result from rax/eax
  printf("0x%x\n", var);
}
int main() {
  foo();
}

Since this program modifies itself, we must set the permissions of the .text section and of the corresponding program header to RWX instead of just RX, otherwise it will segfault. Matthieu Kaczmarek pointed me to HT editor, a very handy hex editor for this purpose.

[Screenshot: HT editor]

Result:

reynaudd@lhs-2:~/test/packed$ ./smhello
0xcafebabe
reynaudd@lhs-2:~/test/packed$ pin -t ../pin-2.5-23100-gcc.4.0.0-ia32_intel64-linux/source/tools/ManualExamples/obj-intel64/itrace.so -- ./smhello
0xffffffff

Finally, we get two different results, which gives us a way to detect PIN, at least in this setup. In the normal program, the first mov really overwrites the immediate of the second mov before it executes, so we see the modified value. In the instrumented program, the write lands on the original code in memory, which is never executed, and not on the copy in PIN's code cache, so we see the unmodified value. I guess the proper way to handle this situation would be to invalidate PIN's cache when a memory write hits the code section.
