Android稳定性之native crash —— 之debuggerd(tombstoned in Android P)|Gaozhipeng's Blog

Android稳定性之native crash —— 之debuggerd(tombstoned in Android P)

前言：

说到Android中的native crash，肯定是绕不开debuggerd守护进程的，但在Android P上，Android移除了debuggerd的守护进程，改为使用tombstoned来进行socket通信。不过大致流程是没有变化的，只是对其中的一些位置进行了重构。
针对进程出现的不同状态，Linux Kernel会发送对应的signal给异常进程，捕获signal并对其做对应的处理.

程序分为两种：
1.静态链接: 由链接器在链接时将库的内容添加到可执行程序中的做法，最大缺点是生成的可执行文件太大，需要更多的系统资源，在装入内存也会消耗更多的时间。
2.动态链接: 将独立的模块先编译成动态库，程序运行有需要它们时才加载，最大缺点是可执行程序依赖分别存储的库文件才能正确执行。
在Android中，大部分都是动态链接，而只有init等少部分是静态链接，因此native程序也是动态链接程序，动态链接程序是需要链接器才能跑起来，liner就是Android的链接器。
liner也是在程序的进程空间内，当内核将应用程序加载起来后，并不是先跑应用程序代码，而是先跑liner。linker负责将应用程序所需的库加载到进程空间内，之后才跑应用程序代码。

tombstoned的启动

init在启动的时候，读取tombstoned.rc创建对应的3个socket来等待crashdump的接入

//system/core/debuggerd/tombstoned/tombstoned.rc
service tombstoned /system/bin/tombstoned
    user tombstoned
    group system

    # Don't start tombstoned until after the real /data is mounted.
    class late_start

    socket tombstoned_crash seqpacket 0666 system system
    socket tombstoned_intercept seqpacket 0666 system system
    socket tombstoned_java_trace seqpacket 0666 system system
    writepid /dev/cpuset/system-background/tasks

而具体的启动socket代码，可以查看tombstoned.cpp，代码量不大也不深。

action的注册

在Android p之后，应用启动还是需要从linker中开始，所以默认注册信号处理函数的代码位置改为了bionic/linker/liner_main.cpp其中调用debuggerd_init方法注册，代码如下：

// linker_main.cpp
static ElfW(Addr) linker_main(KernelArgumentBlock& args, const char* exe_to_load) {
    #ifdef __ANDROID__
  debuggerd_callbacks_t callbacks = {
    .get_abort_message = []() {
      return __libc_shared_globals()->abort_msg;
    },
    .post_dump = &notify_gdb_of_libraries,
  };
  debuggerd_init(&callbacks);
#endif
}

debuggerd_init方法会执行信号处理函数的注册：

//system/core/debuggerd/handler/debuggerd_handler.cpp
void debuggerd_init(debuggerd_callbacks_t* callbacks) {
    ...
  struct sigaction action;
  memset(&action, 0, sizeof(action));
  sigfillset(&action.sa_mask);
  action.sa_sigaction = debuggerd_signal_handler;
  action.sa_flags = SA_RESTART | SA_SIGINFO;

  action.sa_flags |= SA_ONSTACK;
  debuggerd_register_handlers(&action);
}

可以看到对应的处理handler是debuggerd_signal_handler,注册的信号是在register_handlers中

//system/core/debuggerd/include/debuggerd/handler.h
static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
  sigaction(SIGABRT, action, nullptr);
  sigaction(SIGBUS, action, nullptr);
  sigaction(SIGFPE, action, nullptr);
  sigaction(SIGILL, action, nullptr);
  sigaction(SIGSEGV, action, nullptr);
#if defined(SIGSTKFLT)
  sigaction(SIGSTKFLT, action, nullptr);
#endif
  sigaction(SIGSYS, action, nullptr);
  sigaction(SIGTRAP, action, nullptr);
  sigaction(DEBUGGER_SIGNAL, action, nullptr);
}

可以看到，通过sigaction方法，注册的接受信号有如上。

debuggerd handler的调用

所以在发生错误的时候，处理流程：
1.应用的默认信号处理函数debuggerd_signal_handler被调用，执行线程是出问题的当前线程。

// system/core/debuggerd/handler/debuggerd_handler.cpp
// Handler that does crash dumping by forking and doing the processing in the child.
// Do this by ptracing the relevant thread, and then execing debuggerd to do the actual dump.
static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
    // 打印fatal日志
      log_signal_summary(info);
    // clone 子进程
  pid_t child_pid =
    clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
          CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
          &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
      // Wait for the child to start...
  futex_wait(&thread_info.pseudothread_tid, -1);

  // and then wait for it to terminate.
  futex_wait(&thread_info.pseudothread_tid, child_pid);
}

从注释中也可以看到，在clone出子进程后，需要等待子进程执行结束才能开始继续。

debuggerd_dispatch_pseudothread

//system/core/debuggerd/handler/debuggerd_handler.cpp
static int debuggerd_dispatch_pseudothread(void* arg) {
    pid_t crash_dump_pid = __fork();
    execle(CRASH_DUMP_PATH, CRASH_DUMP_NAME, main_tid, pseudothread_tid, debuggerd_dump_type,
           nullptr, nullptr);  
}

通过clone出来的子进程，执行debuggerd_dispatch_pseudothread方法，通过execle函数，执行crash_dump程序。

crash_dump

//system/core/debuggerd/crash_dump.cpp
int main(int argc, char** argv) {
    DefuseSignalHandlers(); // 取消对应的signal的捕捉，避免dump自己
    pid_t forkpid = fork();
    if (forkpid == -1) {
        PLOG(FATAL) << "fork failed";
    } else if (forkpid == 0) {
        fork_exit_read.reset();
    } else {
        fork_exit_write.reset();
        char buf;
        TEMP_FAILURE_RETRY(read(fork_exit_read.get(), &buf, sizeof(buf)));
        _exit(0);
    }
    // 解析传递过来的目标进程pid等参数
    Initialize(argv);
    ParseArgs(argc, argv, &pseudothread_tid, &dump_type);
    ...
        for (pid_t thread : threads) {
            // ptrace注入
            if (!ptrace_seize_thread(target_proc_fd, thread, &error)) {
                bool fatal = thread == g_target_thread;
                LOG(fatal ? FATAL : WARNING) << error;
            }
            if (thread == g_target_thread) {
                // 读取应用的寄存器等信息，最终汇总所有异常信息，包括机型版本，ABI，信号，寄存器，backtrace等，
                ReadCrashInfo(input_pipe, &siginfo, &info.registers, &abort_msg_address,
                            &fdsan_table_address);
                info.siginfo = &siginfo;
                info.signo = info.siginfo->si_signo;
            } else {
                info.registers.reset(unwindstack::Regs::RemoteGet(thread));
                if (!info.registers) {
                PLOG(WARNING) << "failed to fetch registers for thread " << thread;
                ptrace(PTRACE_DETACH, thread, 0, 0);
                continue;
                }
            }

            {
                // 连接到tombstone进程 输出文件
                g_tombstoned_connected =
                    tombstoned_connect(g_target_thread, &g_tombstoned_socket, &g_output_fd, dump_type);
            }

            if (fatal_signal) {
                if (thread_info[target_process].thread_name != "system_server") {
                    activity_manager_notify(target_process, signo, amfd_data);
                }
            }
        }
}

crashdump中所做的操作：
1.关闭当前crashdump线程的sigcatch
2.通过ptrace的方式注入到目标进程
3.获取并收集到对应的crashinfo
4.连接到tombstone socket输出信息 5.通过socket通知ams进行对应的app crash的操作

总结：

如图，debuggerd的action的注册如下。因为程序是采用动态链接的方式，在启动的时候，就已经通过linker的方式注册了对应的native crash的signal处理handler，流程如下：应用程序发生了崩溃之后，注册的handler会处理对应的注册的signal，处理方式也分为2步：
1.抓取log日志和backtrace
2.通知ams进行后续的app层操作 debuggerd_handler