Fixing Old Bugs

There has been a bug that has bothered me for the last 10 years. I first encountered this bug on 10.4 and it involves the man page view "info". If you want to follow along at home this is a very simple crash to reproduce. Open a new terminal window, type in 'info whois' (or the name of any other program that has a man page), then resize the terminal window to be very small. If you are to then expand the window back out to a normal size you will see that info has crashed.

Reproducing these steps with lldb attached produces this:

* thread #1: tid = 0x96b4bb, 0x000000010af487c7 info`___lldb_unnamed_function7$$info + 1291, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010af487c7 info`___lldb_unnamed_function7$$info + 1291
info`___lldb_unnamed_function7$$info + 1291:
 -> 0x10af487c7:  movq   (%r15), %rdi
    0x10af487ca:  movl   $0x1b, %esi
    0x10af487cf:  callq  0x10af62a36               ; symbol stub for: strchr
    0x10af487d4:  testq  %rax, %rax

This snippet of assembly can be translated into the following C code:

strchr(%r15, '\033')

This is trying to scan a cstring for any occurrences of the escape character '\033' (hex representation: 0x1b).

On Mavericks (10.9), the shipped version of info is 4.8, which was published in 2004. I downloaded the old source code to this to see if I could find the exact crash a bit easier.

After building it from source I repeated the steps to cause the crash and this time lldb greeted me with the following:

* thread #1: tid = 0x96d28e, 0x000000010c4f4a7d ginfo`display_update_one_window(win=0x00007f8391d03d40) + 1149 at display.c:303, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010c4f4a7d ginfo`display_update_one_window(win=0x00007f8391d03d40) + 1149 at display.c:303
    300                  happen if the window is shrunk very small.)  */
    301               if ((entry && entry->inverse)
    302               /* Need to erase the line if it has escape sequences.  */
 -> 303               || (raw_escapes_p && strchr (entry->text, '\033') != 0))
    304                 {
    305                   terminal_goto_xy (0, line_index + win->first_row);
    306                   terminal_clear_to_eol ();
    

This matches the assembly given by the first crash using the version shipped by Apple. The issue is glaring, this conditional statement has two parts that are OR'd but are dependent on each other to be successful.

  1. Checking if the local variable entry is not NULL, and then if the entry is inverted.
  2. Checking if there are escapes and if the entry text contains any escape characters.

Since they are OR'd, even if the first part of the conditional fails on "entry" not being NULL, it will still try to check the text contents of the entry and result in a NULL dereference. To verify this, I looked at entry in lldb and sure enough:

(lldb) p entry
(DISPLAY_LINE *) $124 = 0x0000000000000000

There is the null pointer which will be dereferenced and cause our crash. Now this seems like a straight forward fix, by separating out the conditional logic to evaluate the entry value before any other part we avoid the dereference and crash entirely. However, since 4.8 is from 2004, it might be better to check the latest shipped version for this crash instead and then file a bug report against apple to update the binary to stop this crash.

The latest official version is 5.2, which was published in September 2013. I downloaded the source code to that to see if this was fixed. After repeating the steps I found something new:

* thread #1: tid = 0x970058, 0x000000010a717db5 ginfo`window_scan_line(win=0x00007fb6f9503db0, line=0, phys=0, fun=0x000000010a718000, closure=0x00007fb6f9503db0) + 69 at window.c:1788, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010a717db5 ginfo`window_scan_line(win=0x00007fb6f9503db0, line=0, phys=0, fun=0x000000010a718000, closure=0x00007fb6f9503db0) + 69 at window.c:1788
    1785              void *closure)
    1786    {
    1787      mbi_iterator_t iter;
 -> 1788      long cpos = win->line_starts[line] - win->node->contents;
    1789      int delim = 0;
    1790      char *endp;
    1791      
    

Same crash reason, but caused by different code this time. It is possible that since it had gone through a major version update that it could have entirely new code. Another bad memory access, so first thing to check is the variable win.

(lldb) p win
(WINDOW *) $125 = 0x00007fb6f9503db0

That looks ok, but to make sure we are looking at a valid pointer, we should print the dereferenced value in the debugger to be sure, then check what we are accessing on that line.

(lldb) p *win
(WINDOW) $126 = {
    next = 0x0000000000000000
    prev = 0x0000000000000000
    width = 97
    height = 0
    first_row = 0
    goal_column = 18446744073709551615
    keymap = 0x00007fb6fa01d200
    node = 0x00007fb6f9700370
    pagetop = 27
    point = 1000
    line_map = {
        node = 0x00007fb6f9700370
        nline = 0
        size = 80
        used = 0
        map = 0x00007fb6f96025c0
    }
    modeline = 0x00007fb6f9700990 "-----Info: (*manpages*)whois, 144 lines --18%----------------------------------------------------"
    line_starts = 0x0000000000000000
    line_count = 0
    log_line_no = 0x00007fb6fb003600
    flags = 5
}

Again, another null-dereference, this time from the member line_starts. Now we are in the most recent version of code fixes can be added to stop these crashes from happening. By adding a new conditional check to the start of this function:

####Bug #1

Original:

int
window_scan_line (WINDOW *win, int line, int phys,
      void (*fun) (void *closure, long cpos, size_t replen),
      void *closure)
{
    mbi_iterator_t iter;
    long cpos = win->line_starts[line] - win->node->contents;
    int delim = 0;
    char *endp;
    /*
    ...
    */
    return cpos;
}

Patch:

int
window_scan_line (WINDOW *win, int line, int phys,
      void (*fun) (void *closure, long cpos, size_t replen),
      void *closure)
{
    mbi_iterator_t iter;
    long cpos = 0;
    if (win->line_starts != NULL) { // the new check to stop null-dereference
        cpos = win->line_starts[line] - win->node->contents;
        /*
        ...
        */
    }
    return cpos;
}

This has fixed the first bug. To verify that it has been successfully patched, we are going to run through the steps again.

* thread #1: tid = 0x9723b1, 0x00007fff8aeec866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff8aeec866 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
 -> 0x7fff8aeec866:  jae    0x7fff8aeec870            ; __pthread_kill + 20
    0x7fff8aeec868:  movq   %rax, %rdi
    0x7fff8aeec86b:  jmp    0x7fff8aee9175            ; cerror_nocancel
    0x7fff8aeec870:  retq   

Well, this time it is crashing outside of the scope of the process. Running a backtrace on the crash should reveal more information:

(lldb) bt
* thread #1: tid = 0x9723b1, 0x00007fff8aeec866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff8aeec866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8e70035c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff8ad44b1a libsystem_c.dylib`abort + 125
    frame #3: 0x00007fff92fa607f libsystem_malloc.dylib`free + 411
    frame #4: 0x000000010bf2d043 ginfo`window_new_screen_size(width=0, height=0) + 323 at window.c:141
    frame #5: 0x000000010bf2aba3 ginfo`reset_info_window_sizes + 83 at signals.c:174
    frame #6: 0x000000010bf2aa42 ginfo`info_signal_proc(sig=<unavailable>) + 258 at signals.c:276
    frame #7: 0x00007fff8e5ca5aa libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x00007fff8aeed9f1 libsystem_kernel.dylib`read + 9
    frame #9: 0x00007fff784e9420 libsystem_c.dylib`__strerror_ebuf + 16
    frame #10: 0x000000010bf229ac ginfo`info_read_and_dispatch + 204 at session.c:215
    frame #11: 0x000000010bf22887 ginfo`display_startup_message_and_start [inlined] info_session + 31 at session.c:175
    frame #12: 0x000000010bf22868 ginfo`display_startup_message_and_start + 40 at session.c:166
    frame #13: 0x000000010bf1a56e ginfo`single_file(argc=<unavailable>, filename=<unavailable>, argv=<unavailable>) + 798 at info.c:325
    frame #14: 0x000000010bf19ef5 ginfo`main(argc=<unavailable>, argv=0x00007fff53cefba0) + 2037 at info.c:727
    frame #15: 0x00007fff8e8c15fd libdyld.dylib`start + 1
    

The backtrace shows that the problem is being caused in frame 4, and the resulting frames hint at an over-release problem. Jumping back to frame 4 in the debugger seals it.

(lldb) frame select 4
frame #4: 0x000000010bf2d043 ginfo`window_new_screen_size(width=0, height=0) + 323 at window.c:141
    139               windows->height = 0;
    140               free (windows->line_starts);
 -> 141               free (windows->log_line_no);
    142               windows->line_starts = NULL;
    143               windows->line_count = 0;
    144               break;
    

By checking the "windows" variable in that scope:

(lldb) p *windows
(WINDOW) $128 = {
    next = 0x0000000000000000
    prev = 0x0000000000000000
    width = 105
    height = 0
    first_row = 0
    goal_column = 18446744073709551615
    keymap = 0x00007f8355006a00
    node = 0x00007f8353d00370
    pagetop = 22
    point = 33214047251857408
    line_map = {
        node = 0x00007f8353d00370
        nline = 0
        size = 80
        used = 0
        map = 0x00007f8353d058d0
    }
    modeline = 0x00007f8353e00410 "-----Info: (*manpages*)whois, 144 lines --15%------------------------------------------------------------"
    line_starts = 0x0000000000000000
    line_count = 0
    log_line_no = 0x00007f8355800600
    flags = 5
}

When windows->line_starts is NULL, then the value assigned to windows->log_line_no seems to not be allocated, just assigned a reference. Attempting to free a non-allocated pointer is a bad idea, so it looks like there is some more NULL checks that must be added.

####Bug #2

Original:

windows->height = 0;
free (windows->line_starts);
free (windows->log_line_no);
windows->line_starts = NULL;
windows->line_count = 0;
break;

Patch:

windows->height = 0;
if (windows->line_starts) {
    free (windows->line_starts);
    windows->line_starts = NULL;
    free (windows->log_line_no);
}
windows->log_line_no = NULL;
windows->line_count = 0;
break;

This code fixes two bugs that are basically the same root cause. The crash is caused when windows->line_starts is NULL, the windows->log_line_no isn't allocated memory (just a reference to zero) and will cause a crash when freed.

Running it again with these patches causes yet another set of crashes:

Crash #1:

* thread #1: tid = 0x98265e, 0x0000000104338d91 ginfo`display_node_text(closure=<unavailable>, pline_index=6, lline_index=6, src_line=0x00007f9b60f019b0, printed_line=0x00007f9b60c04640, pl_index=75, pl_count=<unavailable>) + 81 at display.c:138, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000104338d91 ginfo`display_node_text(closure=<unavailable>, pline_index=6, lline_index=6, src_line=0x00007f9b60f019b0, printed_line=0x00007f9b60c04640, pl_index=75, pl_count=<unavailable>) + 81 at display.c:138
    135          the line from the screen first.  Why, I don't know.
    136          (But don't do this if we have no visible entries, as can
    137          happen if the window is shrunk very small.)  */
 -> 138           if (entry->inverse
    139           /* Need to erase the line if it has escape sequences.  */
    140           || (raw_escapes_p && mbschr (entry->text, '\033') != 0))
    141         {
    

Crash #2:

* thread #1: tid = 0x982d2a, 0x0000000106974dd1 ginfo`display_node_text(closure=<unavailable>, pline_index=6, lline_index=140735211775544, src_line=0x0000000000000000, printed_line=0x00007fd56b6007c0, pl_index=0, pl_count=<unavailable>) + 145 at display.c:145, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000106974dd1 ginfo`display_node_text(closure=<unavailable>, pline_index=6, lline_index=140735211775544, src_line=0x0000000000000000, printed_line=0x00007fd56b6007c0, pl_index=0, pl_count=<unavailable>) + 145 at display.c:145
    142           terminal_goto_xy (0, win->first_row + pline_index);
    143           terminal_clear_to_eol ();
    144           entry->inverse = 0;
 -> 145           entry->text[0] = '\0';
    146           entry->textlen = 0;
    147         }
    

This is the result of a fall-through case in entry->inverse. The struct DISPLAY_LINE is defined as:

typedef struct {
    char *text;         /* Text of the line as it appears. */
    int textlen;            /* Printable Length of TEXT. */
    int inverse;            /* Non-zero means this line is inverse. */
} DISPLAY_LINE;

However the member inverse is only ever set to 0 or 1, never any other number. The check on line 138 in the first crash is a bad access of entry->inverse, and the second crash is a result of the first conditional expression of that if statement passing due to the value of entry->inverse being something other than zero. An explicit check against the value of inverse will mitigate these crashes.

####Bug #3

Original:

/* If the screen line is inversed, then we have to clear
 the line from the screen first.  Why, I don't know.
 (But don't do this if we have no visible entries, as can
 happen if the window is shrunk very small.)  */
  if (entry->inverse
  /* Need to erase the line if it has escape sequences.  */
  || (raw_escapes_p && mbschr (entry->text, '\033') != 0))
{

Patch:

/* If the screen line is inversed, then we have to clear
 the line from the screen first.  Why, I don't know.
 (But don't do this if we have no visible entries, as can
 happen if the window is shrunk very small.)  */
  if (entry->inverse == 1
  /* Need to erase the line if it has escape sequences.  */
  || (raw_escapes_p && mbschr (entry->text, '\033') != 0))
{
    

With the fourth bug patched it is directly onto the next crasher:

* thread #1: tid = 0x98caa1, 0x00007fff8e5caa46 libsystem_platform.dylib`_platform_strchr + 38, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x00007fff8e5caa46 libsystem_platform.dylib`_platform_strchr + 38
libsystem_platform.dylib`_platform_strchr + 38:
 -> 0x7fff8e5caa46:  movdqa (%rdi), %xmm2
    0x7fff8e5caa4a:  pcmpeqb %xmm2, %xmm1
    0x7fff8e5caa4e:  pcmpeqb %xmm0, %xmm2
    0x7fff8e5caa52:  por    %xmm1, %xmm2
(lldb) bt
* thread #1: tid = 0x98caa1, 0x00007fff8e5caa46 libsystem_platform.dylib`_platform_strchr + 38, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    * frame #0: 0x00007fff8e5caa46 libsystem_platform.dylib`_platform_strchr + 38
      frame #1: 0x000000010a6818d8 ginfo`mbschr(string=0x20612d2020202020, c=<unavailable>) + 616 at mbschr.c:51
      frame #2: 0x000000010a662db2 ginfo`display_node_text(closure=<unavailable>, pline_index=6, lline_index=<unavailable>, src_line=<unavailable>, printed_line=0x00007f8eda504680, pl_index=0, pl_count=<unavailable>) + 114 at display.c:140
      frame #3: 0x000000010a67f75b ginfo`process_node_text(win=0x00007f8eda403750, start=<unavailable>, do_tags=1, fun=0x000000010a662d40, closure=0x00007fff5559e420) + 1259 at window.c:1646
      frame #4: 0x000000010a662c44 ginfo`display_update_one_window(win=0x00007f8eda403750) + 164 at display.c:238
      frame #5: 0x000000010a662b75 ginfo`display_update_display(window=<unavailable>) + 69 at display.c:82
      frame #6: 0x000000010a67bba3 ginfo`reset_info_window_sizes [inlined] redisplay_after_signal + 48 at signals.c:160
      frame #7: 0x000000010a67bb73 ginfo`reset_info_window_sizes + 83 at signals.c:175
      frame #8: 0x000000010a67ba12 ginfo`info_signal_proc(sig=<unavailable>) + 258 at signals.c:276
      frame #9: 0x00007fff8e5ca5aa libsystem_platform.dylib`_sigtramp + 26
      frame #10: 0x00007fff8aeed9f1 libsystem_kernel.dylib`read + 9
      frame #11: 0x00007fff784e9420 libsystem_c.dylib`__strerror_ebuf + 16
      frame #12: 0x000000010a67397c ginfo`info_read_and_dispatch + 204 at session.c:215
      frame #13: 0x000000010a673857 ginfo`display_startup_message_and_start [inlined] info_session + 31 at session.c:175
      frame #14: 0x000000010a673838 ginfo`display_startup_message_and_start + 40 at session.c:166
      frame #15: 0x000000010a66b53e ginfo`single_file(argc=<unavailable>, filename=<unavailable>, argv=<unavailable>) + 798 at info.c:325
      frame #16: 0x000000010a66aec5 ginfo`main(argc=<unavailable>, argv=0x00007fff5559eba0) + 2037 at info.c:727
      frame #17: 0x00007fff8e8c15fd libdyld.dylib`start + 1
      

Now back to the original bug with using strchr to check for escape characters in the text. This time instead of a NULL dereference the cstring pointer being handed to it is an invalid memory address. Walking back through the frames reveals some more info:

(lldb) frame select 1
frame #1: 0x000000010a6818d8 ginfo`mbschr(string=0x20612d2020202020, c=<unavailable>) + 616 at mbschr.c:51
    48            return NULL;
    49          }
    50        else
 -> 51          return strchr (string, c);
    52      }
(lldb) frame select 2
frame #2: 0x000000010a662db2 ginfo`display_node_text(closure=<unavailable>, pline_index=8, lline_index=<unavailable>, src_line=<unavailable>, printed_line=0x00007f8eda504680, pl_index=0, pl_count=<unavailable>) + 114 at display.c:140
    137          happen if the window is shrunk very small.)  */
    138           if (entry->inverse == 1
    139           /* Need to erase the line if it has escape sequences.  */
 -> 140           || (raw_escapes_p && mbschr (entry->text, '\033') != 0))
    141         {
    142           terminal_goto_xy (0, win->first_row + pline_index);
    143           terminal_clear_to_eol ();
(lldb) frame select 3
frame #3: 0x000000010a67f75b ginfo`process_node_text(win=0x00007f8eda403750, start=<unavailable>, do_tags=1, fun=0x000000010a662d40, closure=0x00007fff5559e420) + 1259 at window.c:1646
    1645        
 -> 1646          rc = fun (closure, line_index, logline_index,
    1647                mbi_cur_ptr (iter) - in_index,
    1648                printed_line, pl_index, pl_count);
    1649                
    

Inside of frame 2, the variable entry is defined as:

struct display_node_closure *dn = closure;
WINDOW *win = dn->win;
DISPLAY_LINE **display = dn->display;
DISPLAY_LINE *entry = display[win->first_row + pline_index];

In the case of this crash, win->first_row == 0 and pline_index == 8. This would be the contents of closure in frame 2, corresponding to the struct:

(struct display_node_closure *)closure =>

struct display_node_closure {
    WINDOW *win;
    DISPLAY_LINE **display;
};

The contents of the member display is an array of pointers to instances of DISPLAY_LINE:

typedef struct {
    char *text;         /* Text of the line as it appears. */
    int textlen;            /* Printable Length of TEXT. */
    int inverse;            /* Non-zero means this line is inverse. */
} DISPLAY_LINE;

When looking at these contents in memory:

[index] (DISPLAY_LINE **)
[0]     40 00 E0 4A AD 7F 00 00 => 0x00007FAD4AE00040
[1]     C0 00 E0 4A AD 7F 00 00 => 0x00007FAD4AE000C0
[2]     40 01 E0 4A AD 7F 00 00 => 0x00007FAD4AE00140
[3]     C0 01 E0 4A AD 7F 00 00 => 0x00007FAD4AE001C0
[4]     40 02 E0 4A AD 7F 00 00 => 0x00007FAD4AE00240
[5]     C0 02 E0 4A AD 7F 00 00 => 0x00007FAD4AE002C0
[6]     40 03 E0 4A AD 7F 00 00 => 0x00007FAD4AE00340
[7]     00 00 00 00 00 00 00 00 => 0x0, signifying the end of the array

Index 0: (Offset 0x00007FAD4AE00040)
50 00 E0 4A AD 7F 00 00 => 0x00007FAD4AE00050 (offset of the text member)
4D 00 00 00 => textlen == 77
00 00 00 00 => inverse == 0
20 20 20 20 20 20 20 20 20 20 20 20 20 4E 4F 54 45 21 20 20 54 68 65 20 72 65 67 69 73 74 72 61 74 69 6F 6E 20 6F 66 20 74 68 65 73 65 20 64 6F 6D 61 69 6E 73 20 69 73 20 6E 6F 77 20 64 6F 6E 65 20 62 79 20 61 20 6E 75 6D 62 65 72 00 
    => text == "             NOTE!  The registration of these domains is now done by a number"

Index 1: (Offset 0x00007FAD4AE000C0)
D0 00 E0 4A AD 7F 00 00 => 0x00007FAD4AE000D0 (offset of the text member)
4D 00 00 00 => textlen == 77
00 00 00 00 => inverse == 0
20 20 20 20 20 20 20 20 20 20 20 20 20 6F 66 20 69 6E 64 65 70 65 6E 64 65 6E 74 20 61 6E 64 20 63 6F 6D 70 65 74 69 6E 67 20 72 65 67 69 73 74 72 61 72 73 2E 20 20 54 68 69 73 20 64 61 74 61 62 61 73 65 20 68 6F 6C 64 73 20 6E 6F 00 
    => text == "             of independent and competing registrars.  This database holds no"

Index 2: (Offset 0x00007FAD4AE00140)
50 01 E0 4A AD 7F 00 00 => 0x00007FAD4AE00150 (offset of the text member)
4A 00 00 00 => textlen == 74
00 00 00 00 => inverse == 0
20 20 20 20 20 20 20 20 20 20 20 20 20 69 6E 66 6F 72 6D 61 74 69 6F 6E 20 6F 6E 20 64 6F 6D 61 69 6E 73 20 72 65 67 69 73 74 65 72 65 64 20 62 79 20 6F 72 67 61 6E 69 7A 61 74 69 6F 6E 73 20 6F 74 68 65 72 20 74 68 61 6E 00
    => text == "             information on domains registered by organizations other than"

Index 3: (Offset 0x00007FAD4AE001C0)
D0 01 E0 4A AD 7F 00 00 => 0x00007FAD4AE001D0 (offset of the text member)
4B 00 00 00 => textlen == 75
00 00 00 00 => inverse == 0
20 20 20 20 20 20 20 20 20 20 20 20 20 4E 65 74 77 6F 72 6B 20 53 6F 6C 75 74 69 6F 6E 73 2C 20 49 6E 63 2E 20 20 41 6C 73 6F 2C 20 6E 6F 74 65 20 74 68 61 74 20 74 68 65 20 49 6E 74 65 72 4E 49 43 20 64 61 74 61 62 61 73 65 00
    => text == "             Network Solutions, Inc.  Also, note that the InterNIC database"

Index 4: (Offset 0x00007FAD4AE00240)
50 02 E0 4A AD 7F 00 00 => 0x00007FAD4AE00250 (offset of the text member)
4C 00 00 00 => textlen == 76
00 00 00 00 => inverse == 0
20 20 20 20 20 20 20 20 20 20 20 20 20 28 77 68 6F 69 73 2E 69 6E 74 65 72 6E 69 63 2E 6E 65 74 29 20 69 73 20 6E 6F 20 6C 6F 6E 67 65 72 20 68 61 6E 64 6C 65 64 20 62 79 20 4E 65 74 77 6F 72 6B 20 53 6F 6C 75 74 69 6F 6E 73 2C 00
    => text == "             (whois.internic.net) is no longer handled by Network Solutions,"

Index 5: (Offset 0x00007FAD4AE002C0)
D0 02 E0 4A AD 7F 00 00 => 0x00007FAD4AE002D0 (offset of the text member)
3D 00 00 00 => textlen == 61
00 00 00 00 => inverse == 0
20 20 20 20 20 20 20 20 20 20 20 20 20 49 6E 63 2E 20 20 46 6F 72 20 64 65 74 61 69 6C 73 2C 20 73 65 65 20 68 74 74 70 3A 2F 2F 77 77 77 2E 69 6E 74 65 72 6E 69 63 2E 6E 65 74 2F 2E 00
    => text == "             Inc.  For details, see http://www.internic.net/."

Index 6: (Offset 0x00007FAD4AE00340)
50 03 E0 4A AD 7F 00 00 => 0x00007FAD4AE00350 (offset of the text member)
00 00 00 00 => textlen == 0
00 00 00 00 => inverse == 0
00
    => text == "" (empty string)
    

Referencing how we access the current entry:

DISPLAY_LINE *entry = display[win->first_row + pline_index]; // => display[8];

This puts the DISPLAY_LINE* entry beyond the bounds of the array and instead, accessing the pointer to the first string entry contents 0x00007FAD4AE00050. This results in a valid pointer but of the wrong type, resulting in a bad memory access when it performs strchr against the first 8 bytes the string rather than accessing the 8 bytes that would be the pointer to the string contents.

####Bug #4

Original:

struct display_node_closure *dn = closure;
WINDOW *win = dn->win;
DISPLAY_LINE **display = dn->display;
DISPLAY_LINE *entry = display[win->first_row + pline_index];

Patch:

struct display_node_closure *dn = closure;
WINDOW *win = dn->win;
DISPLAY_LINE **display = dn->display;
int index_count = 0;
while (display[index_count] != NULL) {
    index_count++;
}
int entry_index = win->first_row + pline_index;
if (entry_index > index_count) {
    return 0;
}

DISPLAY_LINE *entry = display[entry_index];

This patch is a serious hack, but lacking any way to reliably check the number of indexed entries in the array it is the only thing I can come up with to ensure the valid indexing.

Repeating the steps turns up another crash:

* thread #1: tid = 0x99ec07, 0x0000000108031c62 ginfo`display_update_one_window(win=0x00007faa20d03ce0) + 210 at display.c:264, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000108031c62 ginfo`display_update_one_window(win=0x00007faa20d03ce0) + 210 at display.c:264
    261             DISPLAY_LINE *entry = display[win->first_row + line_index];
    262     
    263             /* If this line has text on it then make it go away. */
 -> 264             if (entry && entry->textlen)
    265             {
    266                 entry->textlen = 0;
    267                 entry->text[0] = '\0';

This is another invalid indexing bug.

####Bug #5

Original:

for (; line_index < win->height; line_index++)
{
    DISPLAY_LINE *entry = display[win->first_row + line_index];

    /* If this line has text on it then make it go away. */
    if (entry && entry->textlen)
    {
        entry->textlen = 0;
        entry->text[0] = '\0';

        terminal_goto_xy (0, win->first_row + line_index);
        terminal_clear_to_eol ();
    }
}

Patch:

for (; line_index < win->height; line_index++)
{
    int index_count = 0;
    while (display[index_count] != NULL) {
        index_count++;
    }
    int entry_index = win->first_row + line_index;
    if (entry_index > index_count) {
        break;
    }
    DISPLAY_LINE *entry = display[entry_index];

    /* If this line has text on it then make it go away. */
    if (entry && entry->textlen)
    {
        entry->textlen = 0;
        entry->text[0] = '\0';

        terminal_goto_xy (0, win->first_row + line_index);
        terminal_clear_to_eol ();
    }
}

In this case the value stored in win->height could be -1. This results in a very large unsigned number, causing invalid indexes and bad pointer dereferences to take place. Implementing the same hacked solution as before results in no more invalid indexing.

Another Crash:

* thread #1: tid = 0x9a72ee, 0x000000010bfbbca7 ginfo`display_update_one_window(win=0x00007fe43a5036e0) + 311 at display.c:292, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xc)
frame #0: 0x000000010bfbbca7 ginfo`display_update_one_window(win=0x00007fe43a5036e0) + 311 at display.c:292
    289     
    290             /* This display line must both be in inverse, and have the same
    291             contents. */
 -> 292             if ((!display[line_index]->inverse) ||
    293                 (strcmp (display[line_index]->text, win->modeline) != 0))
    294             {
    295                 terminal_goto_xy (0, line_index);
    

Again another NULL dereference to fix:

####Bug #6

Original:

if ((!display[line_index]->inverse) ||
    (strcmp (display[line_index]->text, win->modeline) != 0))

Patch:

if (display[line_index] != NULL && (display[line_index]->inverse == 0 ||
    (strcmp (display[line_index]->text, win->modeline) != 0)))

These invalid indexing bugs are caused by the window height using an incorrect number. There are a lot of patches to find the cause of this.

####Bug #7

Original:

next->height--;

Patch:

if (next->height != 0) {
    next->height--;
}

Original:

prev->height--;

Patch:

if (prev->height != 0) {
    prev->height--;
}

Original:

active_window->height = the_screen->height - 1 - the_echo_area->height;

Patch:

active_window->height = the_screen->height - (the_screen->height > 2 ? 1 - the_echo_area->height : 0);

Original:

 if (win->height == delta_each)
    win->height -= (1 + the_echo_area->height);
 

Patch:

if (win->height == delta_each && win->height > 2)
    win->height -= (1 + the_echo_area->height);

Original:

if (win->height <= 0 || win->width <= 0 || display_inhibited)
    return;
    

Patch:

if (win->height <= 0 || win->height > INT32_MAX || win->width <= 0 || display_inhibited)
    return;
    

For final thoughts on this, the comment just above the past line of patched code reads:

/* If the window has no height, or display is inhibited, quit now.
Strictly speaking, it should only be necessary to test if the
values are equal to zero, since window_new_screen_size should
ensure that the window height/width never becomes negative, but
since historically this has often been the culprit for crashes, do
our best to be doubly safe.  */

At this point there is still a lingering a crash or two that I have yet to be able to trigger while info is attached in lldb. However now the mere act if resizing the window does not trigger a fatal crash immediately.

###Conclusion:

I don't feel like I have truly fixed anything from this. I have exposed a number of bugs related to poor string handling and made a poor attempt at holding back a tide of undefined behavior due to unindexed arrays. These bug fixes may actually cause more harm by being introduced than was caused by the original crashing bug. What started as a simple endeavor turned into an exercise in yak shaving and proof of murphy's law.

What made this all the more difficult wasn't the code syntax/formatting, or language, or even the age of the code itself. Lack of any architectural understanding was made this extremely difficult to fix. Most of these errors dealt with storing a negative value as the window height and resulting undefined behavior from that because it was using a unsigned integer to store a signed integer value. The window height value is used to compute the number of stored lines of text to display, which might not be accurate to the actual number of indexed values in the array. No bounds-checking on this results in more problems due to accessing incorrect memory addresses and crashes. Simply put, many of these issues could have been avoided entirely by implementing safe coding practices in the first place.



If this blog post was helpful to you, please consider donating to keep this blog alive, thank you!

donate to support this blog