<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Category: mozilla | Aaron Klotz's Software Blog]]></title>
  <link href="https://dblohm7.ca/blog/categories/mozilla/atom.xml" rel="self"/>
  <link href="https://dblohm7.ca/"/>
  <updated>2023-06-30T14:17:29-06:00</updated>
  <id>https://dblohm7.ca/</id>
  <author>
    <name><![CDATA[Aaron Klotz]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[All Good Things...]]></title>
    <link href="https://dblohm7.ca/blog/2021/08/13/all-good-things/"/>
    <updated>2021-08-13T14:00:00-06:00</updated>
    <id>https://dblohm7.ca/blog/2021/08/13/all-good-things</id>
    <content type="html"><![CDATA[<p>Today is my final day as an employee of Mozilla Corporation.</p>

<p>My first patch landed in Firefox 19, and my final patch as an employee has
landed in Nightly for Firefox 93.</p>

<p>I&rsquo;ll be moving on to something new in a few weeks&#8217; time, but for now, I&rsquo;d just
like to say this:</p>

<p>My time at Mozilla has made me into a better software developer, a better
leader, and more importantly, a better person.</p>

<p>I&rsquo;d like to thank all the Mozillians whom I have interacted with over the years
for their contributions to making that happen.</p>

<p>I will continue to update this blog with catch-up posts describing my Mozilla
work, though I am unsure what content I will be able to contribute beyond that.
Time will tell!</p>

<p>Until next time&hellip;</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[2019 Roundup: Part 1 - Porting the DLL Interceptor to AArch64]]></title>
    <link href="https://dblohm7.ca/blog/2021/03/01/2019-roundup-part-1/"/>
    <updated>2021-03-01T12:50:00-07:00</updated>
    <id>https://dblohm7.ca/blog/2021/03/01/2019-roundup-part-1</id>
    <content type="html"><![CDATA[<p>In my continuing efforts to get caught up on discussing my work, I am now
commencing a roundup for 2019. I think I am going to structure this one
slightly differently from the last one: I am going to try to segment this
roundup by project.</p>

<p>Here is an index of all the entries in this series:</p>

<ul>
<li><a href="https://dblohm7.ca/blog/2021/03/01/2019-roundup-part-1/">Part 1 - Porting the DLL Interceptor to AArch64</a> (this post)</li>
</ul>


<h2>Porting the DLL Interceptor to AArch64</h2>

<p>During early 2019, Mozilla was working to port Firefox to run on the new
AArch64 builds of Windows. At our December 2018 all-hands, I brought up the
necessity of including the DLL Interceptor in our porting efforts. Since no good
deed goes unpunished, I was put in charge of doing the work! [<em>I&rsquo;m actually kidding
here; this project was right up my alley and I was happy to do it! &ndash; Aaron</em>]</p>

<p>Before continuing, you might want to review my <a href="https://dblohm7.ca/blog/2019/01/23/2018-roundup-q2-part1/">previous entry</a>
describing the Great Interceptor Refactoring of 2018, as this post revisits some
of the concepts introduced there.</p>

<p>Let us review some DLL Interceptor terminology:</p>

<ul>
<li>The <em>target function</em> is the function we want to hook (Note that this is a
distinct concept from a <em>branch target</em>, which is also discussed in this post);</li>
<li>The <em>hook function</em> is our function that we want the intercepted target function
to invoke;</li>
<li>The <em>trampoline</em> is a small chunk of executable code generated by the DLL
interceptor that facilitates calling the target function&rsquo;s original implementation.</li>
</ul>


<p>On more than one occasion I had to field questions about why this work was
even necessary for AArch64: there aren&rsquo;t going to be many injected DLLs in a
Win32 ecosystem running on a shiny new processor architecture! In fact, the DLL
Interceptor is used for more than just facilitating the blocking of injected
DLLs; we also use it for other purposes.</p>

<p>Not all of this work was done in one bug: some tasks were more urgent than
others. I began this project by enumerating our extant uses of the interceptor to
determine which instances were relevant to the new AArch64 port. I threw a record
of each instance into a colour-coded spreadsheet, which proved to be very useful
for tracking progress: Reds were &ldquo;must fix&rdquo; instances, yellows were &ldquo;nice to have&rdquo;
instances, and greens were &ldquo;fixed&rdquo; instances. Coordinating with the milestones
laid out by program management, I was able to assign each instance to a bucket
which would help determine a total ordering for the various fixes. I landed the
first set of changes in <a title="nsWindowsDllInterceptor porting to aarch64" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1526383">bug 1526383</a>, and the second set in <a title="ARM64: nsWindowsDllInterceptor support for Milestone 4 (accessibility)" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1532470">bug 1532470</a>.</p>

<p>It was now time to sit down, download some AArch64 programming manuals, and
take a look at what I was dealing with. While I have been messing around with
x86 assembly since I was a teenager, my first exposure to RISC architectures was
via the <a href="https://en.wikipedia.org/wiki/DLX">DLX architecture</a> introduced by
Hennessy and Patterson in their textbooks. While DLX was crafted specifically
for educational purposes, it served for me as a great point of reference. When
I was a student taking CS 241 at the University of Waterloo, we had to write a
toy compiler that generated DLX code. That experience ended up saving me a lot
of time when looking into AArch64! While the latter is definitely more
sophisticated, I could clearly recognize analogs between the two architectures.</p>

<p>In some ways, targeting a RISC architecture greatly simplifies things: The
DLL Interceptor only needs to concern itself with a small subset of the AArch64
instruction set: loads and branches. In fact, the DLL Interceptor&rsquo;s AArch64
disassembler only looks for <a href="https://searchfox.org/mozilla-central/rev/362676fcadac37f9f585141a244a9a640948794a/mozglue/misc/interceptor/Arm64.cpp#53">nine distinct instructions</a>!
As a bonus, since the instruction length is fixed, we can easily copy over
verbatim any instructions that are not loads or branches!</p>

<p>On the other hand, one thing that <em>increased</em> complexity of the port is that
some branch instructions to relative addresses have maximum offsets. If we must
branch farther than that maximum, we must take alternate measures. For example,
in AArch64, an unconditional branch with an immediate offset must land in the
range of &plusmn;128 MiB from the current program counter.</p>
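
<p>To make that limit concrete, here is a minimal Python sketch (not code from the DLL Interceptor) of the range check and of the encoding of an immediate <code>B</code> instruction. The function names are hypothetical. The encoding comes from the Arm architecture: the opcode occupies the top six bits, and the low 26 bits hold a signed offset counted in 4-byte words, which is where the 128 MiB figure comes from.</p>

```python
B_OPCODE = 0x14000000   # B (immediate): bits 31..26 are 0b000101
B_RANGE = 1 << 27       # signed 26-bit word offset: 2**25 words * 4 bytes = 128 MiB


def can_branch_directly(pc: int, target: int) -> bool:
    """True if an immediate B located at `pc` can reach `target`."""
    offset = target - pc
    return offset % 4 == 0 and -B_RANGE <= offset < B_RANGE


def encode_b(pc: int, target: int) -> int:
    """Return the 32-bit encoding of `B target` when placed at `pc`."""
    if not can_branch_directly(pc, target):
        raise ValueError("branch target is out of range for an immediate B")
    imm26 = ((target - pc) >> 2) & 0x03FFFFFF   # two's-complement word offset
    return B_OPCODE | imm26
```

<p>Note that the reachable window is slightly asymmetric: the offset may be as low as -128 MiB but only as high as 128 MiB minus one instruction.</p>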

<p>Why is this a problem, you ask? Well, Detours-style interception must overwrite
the first several instructions of the target function. To write an absolute jump,
we require at least 16 bytes: 4 for an <code>LDR</code> instruction, 4 for a <code>BR</code>
instruction, and another 8 for the 64-bit absolute branch target address.</p>

<p>Unfortunately, target functions may be <em>really short</em>! Some of the target
functions that we need to patch consist only of a single 4-byte instruction!</p>

<p>In this case, our only option for patching the target is to use an immediate <code>B</code>
instruction, but that only works if our hook function falls within that &plusmn;128 MiB
limit. If it does not, we need to construct a <em>veneer</em>. A veneer is a special
trampoline whose location falls within the target range of a branch instruction.
Its sole purpose is to provide an unconditional jump to the &ldquo;real&rdquo; desired
branch target that lies outside of the range of the original branch. Using
veneers, we can successfully hook a target function even if it is only one
instruction (i.e., 4 bytes) in length, and the hook function lies more than 128 MiB
away from it. The AArch64 Procedure Call Standard specifies <code>X16</code> as a volatile
register that is explicitly intended for use by veneers: veneers load an
absolute target address into <code>X16</code> (without needing to worry about whether or
not they&rsquo;re clobbering anything), and then unconditionally jump to it.</p>
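
<p>The decision described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the interceptor&rsquo;s actual logic: the function names and the three strategy labels are hypothetical, and the order of the checks is just my reading of the preceding paragraphs. The two instruction words are the standard encodings of <code>LDR X16, &lt;literal 8 bytes ahead&gt;</code> and <code>BR X16</code>.</p>

```python
import struct

LDR_X16_PC_PLUS_8 = 0x58000050   # LDR X16, [PC + 8]: load the literal two words ahead
BR_X16 = 0xD61F0200              # BR X16
B_RANGE = 1 << 27                # reach of an immediate B, in bytes


def build_veneer(hook_address: int) -> bytes:
    """16 bytes: LDR X16 from the literal, BR X16, then the 64-bit address."""
    return struct.pack("<IIQ", LDR_X16_PC_PLUS_8, BR_X16, hook_address)


def choose_patch(available_bytes: int, target: int, hook: int) -> str:
    """Pick a (hypothetical) patching strategy for a target function."""
    if available_bytes >= 16:
        return "absolute"        # room for LDR + BR + 8-byte address
    if available_bytes < 4:
        raise ValueError("no room for even a single instruction")
    if -B_RANGE <= hook - target < B_RANGE:
        return "branch"          # a single immediate B reaches the hook
    return "veneer"              # B to a nearby veneer, which jumps to the hook
```

<p>For a one-instruction target whose hook lives far away, <code>choose_patch</code> returns <code>"veneer"</code>, and the bytes from <code>build_veneer</code> would be written somewhere within reach of the patched <code>B</code>.</p>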

<h3>Measuring Target Function Instruction Length</h3>

<p>To determine how many instructions the target function has for us to work with,
we make two passes over the target function&rsquo;s code. The first pass simply counts
how many instructions are available for patching (up to the 4-instruction
maximum needed for absolute branches; we don&rsquo;t really care beyond that).</p>

<p>The second pass actually populates the trampoline, builds the veneer (if
necessary), and patches the target function.</p>

<h3>Veneer Support</h3>

<p>Since the DLL interceptor is already well-equipped to build trampolines, it did
not take much effort to add support for <a href="https://searchfox.org/mozilla-central/rev/362676fcadac37f9f585141a244a9a640948794a/mozglue/misc/interceptor/Arm64.h#193">constructing veneers</a>.
However, <em>where</em> to write out a veneer is just as important as <em>what</em> to write
to a veneer.</p>

<p>Recall that we need our veneer to reside within &plusmn;128 MiB of an immediate
branch. Therefore, we need to be able to exercise some control over where
the trampoline memory for veneers is allocated. Until this point, our trampoline
allocator had no need to care about this; I had to add this capability.</p>

<h4>Adding Range-Aware VM Allocation</h4>

<p>Firstly, I needed to make the <code>MMPolicy</code> classes range-aware: we need to be able
to allocate trampoline space within acceptable distances from branch instructions.</p>

<p>Consider that, as described above, a branch instruction may have limits on the
extents of its target. As data, this is easily formatted as a <em>pivot</em> (i.e., the
PC at the location where the branch instruction is encountered), and a maximum
<em>distance</em> in either direction from that pivot.</p>

<p>On the other hand, range-constrained memory allocation tends to work in terms
of lower and upper bounds. I wrote a conversion method, <code>MMPolicyBase::SpanFromPivotAndDistance</code>,
to convert between the two formats. In addition to format conversion, this method
also constrains resulting bounds such that they are above the 1MiB mark of the
process&#8217; address space (to avoid reserving memory in VM regions that are
sensitive to compatibility concerns), as well as below the maximum allowable
user-mode VM address.</p>
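
<p>As a rough Python sketch of that conversion (a simplification, not the real <code>SpanFromPivotAndDistance</code>): the function name is borrowed from the method above, but the signature is hypothetical, and the default maximum user-mode address is an assumed value for illustration. Real code would ask the OS for that limit.</p>

```python
from typing import Optional, Tuple

MIN_ALLOWED_ADDRESS = 1 << 20                 # stay above the 1 MiB mark
ASSUMED_MAX_USER_ADDRESS = 0x7FFF_FFFE_FFFF   # assumption; query the OS in real code


def span_from_pivot_and_distance(
    pivot: int,
    max_distance: int,
    max_user_address: int = ASSUMED_MAX_USER_ADDRESS,
) -> Optional[Tuple[int, int]]:
    """Convert (pivot, distance) into inclusive (lower, upper) bounds."""
    lower = max(pivot - max_distance, MIN_ALLOWED_ADDRESS)
    upper = min(pivot + max_distance, max_user_address)
    if lower > upper:
        return None   # nothing usable remains after clamping
    return lower, upper
```

<p>For a pivot in the middle of the address space the result is simply the pivot plus or minus the distance; the clamping only matters near either end.</p>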

<p>Another issue with range-aware VM allocation is determining the location, within
the allowable range, for the actual VM reservation. Ideally we would like the
kernel&rsquo;s memory manager to choose the best location for us: its holistic view of
existing VM layout (not to mention ASLR) across all processes will provide
superior VM reservations. On the other hand, the Win32 APIs that facilitate this
are specific to Windows 10. When available, <code>MMPolicyInProcess</code> uses <a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc2"><code>VirtualAlloc2</code></a>
and <code>MMPolicyOutOfProcess</code> uses <a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffile3"><code>MapViewOfFile3</code></a>.
When we&rsquo;re running on Windows versions where those APIs are not yet available,
we need to fall back to finding and reserving our own range. The
<code>MMPolicyBase::FindRegion</code> method handles this for us.</p>

<p>All of this logic is wrapped up in the <code>MMPolicyBase::Reserve</code> method. In
addition to the desired VM size and range, the method also accepts two functors
that wrap the OS APIs for reserving VM. <code>Reserve</code> uses those functors when
available, otherwise it falls back to <code>FindRegion</code> to manually locate a suitable
reservation.</p>
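
<p>The shape of that fallback can be sketched in Python. Everything here is hypothetical scaffolding: the sketch takes a single optional callable where the real method accepts two functors, the list of free regions stands in for whatever the real <code>FindRegion</code> learns by querying the address space, and 64 KiB is the usual Windows allocation granularity rather than a value read from the system. Here <code>upper</code> is treated as an exclusive bound.</p>

```python
from typing import Callable, List, Optional, Tuple

GRANULARITY = 0x10000   # typical Windows allocation granularity (64 KiB)


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def find_region(free_regions: List[Tuple[int, int]], size: int,
                lower: int, upper: int) -> Optional[int]:
    """Return the first aligned address in [lower, upper) with `size` free bytes."""
    for base, length in free_regions:
        start = align_up(max(base, lower), GRANULARITY)
        if start + size <= min(base + length, upper):
            return start
    return None


def reserve(size: int, lower: int, upper: int,
            free_regions: List[Tuple[int, int]],
            os_reserve_within: Optional[Callable[[int, int, int], Optional[int]]] = None,
            ) -> Optional[int]:
    """Prefer the OS-provided ranged reservation; otherwise search manually."""
    if os_reserve_within is not None:
        return os_reserve_within(size, lower, upper)
    return find_region(free_regions, size, lower, upper)
```

<p>The design point is that callers never need to know which path was taken: they ask for a size and a range and get back an address or nothing.</p>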

<p>Now that our memory management primitives were range-aware, I needed to shift my
focus over to our VM sharing policies.</p>

<p>One impetus for the Great Interceptor Refactoring was to enable separate
Interceptor instances to share a unified pool of VM for trampoline memory.
To make this range-aware, I needed to make some additional changes to
<code>VMSharingPolicyShared</code>. It would no longer be sufficient to assume that we
could just share a single block of trampoline VM &mdash; we now needed to make the
shared VM policy capable of potentially allocating multiple blocks of VM.</p>

<p><code>VMSharingPolicyShared</code> now contains a mapping of ranges to VM blocks. If we
request a reservation which an existing block satisfies, we re-use that block.
On the other hand, if we require a range that is yet unsatisfied, then we need to
allocate a new one. I admit that I kind of half-assed the implementation of the
data structure we use for the mapping; I was too lazy to implement a fully-fledged
interval tree. The current implementation is probably &ldquo;good enough,&rdquo; however
it&rsquo;s probably worth fixing at some point.</p>
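
<p>To illustrate the idea (and only the idea), here is a Python sketch that uses a plain linear scan in place of an interval tree. The class and its callback are hypothetical and are not the data structure used in <code>VMSharingPolicyShared</code>; the sketch also ignores how much free space remains in a block.</p>

```python
from typing import Callable, List, Tuple


class SharedPools:
    """Re-use an existing VM block when it lies inside the requested range."""

    def __init__(self, allocate: Callable[[int, int, int], int]):
        self._allocate = allocate                  # (size, lower, upper) -> base address
        self._blocks: List[Tuple[int, int]] = []   # (base, size) of each reserved block

    def reserve(self, size: int, lower: int, upper: int) -> int:
        # Linear scan: fine for a handful of blocks, which is the
        # "good enough" trade-off compared with an interval tree.
        for base, block_size in self._blocks:
            if lower <= base and base + block_size <= upper:
                return base
        base = self._allocate(size, lower, upper)
        self._blocks.append((base, size))
        return base
```

<p>Two requests whose ranges both cover the same block share it; a request for a range far away from every existing block triggers a fresh allocation.</p>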

<p>Finally, I added a new generic class, <code>TrampolinePool</code>, that acts as an
abstraction of a reserved block of VM address space. The main interceptor code
requests a pool by calling the VM sharing policy&rsquo;s <code>Reserve</code> method, then it
uses the pool to retrieve new <code>Trampoline</code> instances to be populated.</p>

<h3>AArch64 Trampolines</h3>

<p>It is much simpler to generate trampolines for AArch64 than it is for x86(-64).
The most noteworthy addition to the <code>Trampoline</code> class is the <code>WriteLoadLiteral</code>
method, which writes an absolute address into the trampoline&rsquo;s literal pool,
followed by writing an <code>LDR</code> instruction referencing that literal into the
trampoline.</p>

<hr />

<p>Thanks for reading! Coming up next time: My Untrusted Modules Opus.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[2018 Roundup: H2 - Preparing to Enable the Launcher Process by Default]]></title>
    <link href="https://dblohm7.ca/blog/2021/02/24/2018-roundup-h2/"/>
    <updated>2021-02-24T17:30:00-07:00</updated>
    <id>https://dblohm7.ca/blog/2021/02/24/2018-roundup-h2</id>
    <content type="html"><![CDATA[<p><em>This is the fifth post in my &ldquo;2018 Roundup&rdquo; series. For an index of all entries, please see my
blog entry for <a href="https://dblohm7.ca/blog/2019/01/18/2018-roundup-q1/">Q1</a>.</em></p>

<p>Yes, you are reading the dates correctly: I am posting this over two years after I began this series.
I am trying to get caught up on documenting my past work!</p>

<h3>CI and Developer Tooling</h3>

<p>Given that the launcher process completely changes how our Win32 Firefox builds
start, I needed to update both our CI harnesses and the launcher process
itself. I didn&rsquo;t do much that was particularly noteworthy from a technical
standpoint, but I will mention some important points:</p>

<p>During normal use, the launcher process usually exits immediately after the
browser process is confirmed to have started. This was a deliberate design
decision that I made. Having the launcher process wait for the browser process
to terminate would not do any harm; however, I did not want the launcher process
hanging around in Task Manager and being misunderstood by users who are checking
their browser&rsquo;s resource usage.</p>

<p>On the other hand, such a design completely breaks scripts that expect to start
Firefox and be able to synchronously wait for the browser to exit before
continuing! Clearly I needed to provide an opt-in for the latter case, so I added
the <code>--wait-for-browser</code> command-line option. The launcher process also implicitly
enables this mode under a few <a href="https://searchfox.org/mozilla-central/rev/31a3457890b5698af1277413ee9d9bd6c5955183/browser/app/winlauncher/LauncherProcessWin.cpp#92">other scenarios</a>.</p>
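
<p>For a script that needs the old blocking behaviour, the opt-in might look something like the following Python sketch. The helper names are hypothetical; only the <code>--wait-for-browser</code> flag comes from the text above.</p>

```python
import subprocess
from typing import List


def firefox_command(firefox_path: str, *extra_args: str) -> List[str]:
    """Build a command line that asks the launcher to wait for the browser."""
    return [firefox_path, "--wait-for-browser", *extra_args]


def run_firefox_and_wait(firefox_path: str, *extra_args: str) -> int:
    """Start Firefox and block until the browser process itself has exited."""
    return subprocess.run(firefox_command(firefox_path, *extra_args)).returncode
```

<p>Without the flag, <code>subprocess.run</code> would typically return as soon as the launcher process exits, which is long before the browser does.</p>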

<p>Secondly, there is the issue of debugging. Developers were previously used to
attaching to the first <code>firefox.exe</code> process they see and expecting to be debugging
the browser process. With the launcher process enabled by default, this is no
longer the case.</p>

<p>There are a few options here:</p>

<ul>
<li>Visual Studio users may install the <a href="https://devblogs.microsoft.com/devops/introducing-the-child-process-debugging-power-tool/">Child Process Debugging Power Tool</a>,
which enables the VS debugger to attach to child processes;</li>
<li>WinDbg users may start their debugger with the <code>-o</code> command-line flag,
or use the <code>Debug child processes also</code> checkbox in the GUI;</li>
<li>I added support for a <code>MOZ_DEBUG_BROWSER_PAUSE</code> environment variable, which
allows developers to set a timeout (in seconds) for the browser process to
print its pid to <code>stdout</code> and wait for a debugger attachment.</li>
</ul>


<h3>Performance Testing</h3>

<p>As I have alluded to in previous posts, I needed to measure the effect of adding
an additional process to the critical path of Firefox startup. Since in-process
testing will not work in this case, I needed to use something that could provide
a holistic view across both launcher and browser processes. I decided to enhance
our existing <code>xperf</code> suite in Talos to support my use case.</p>

<p>I already had prior experience with <code>xperf</code>; I spent a significant part of 2013
working with Joel Maher to put the <code>xperf</code> Talos suite into production. I also
knew that the existing code was not sufficiently generic to be able to handle my
use case.</p>

<p>I threw together a rudimentary <a href="https://github.com/dblohm7/xperf">analysis framework</a>
for working with CSV-exported xperf data. Then, after Joel&rsquo;s review, I vendored
it into <code>mozilla-central</code> and used it to construct an analysis for startup time.
[<em>While a more thorough discussion of this framework is definitely warranted, I
also feel that it is tangential to the discussion at hand; I&rsquo;ll write a dedicated
blog entry about this topic in the future. &ndash; Aaron</em>]</p>

<p>In essence, the analysis considers the following facts when processing an xperf recording:</p>

<ul>
<li>The launcher process will be the first <code>firefox.exe</code> process that runs;</li>
<li>The browser process will be started by the launcher process;</li>
<li>The browser process will fire a <a href="https://searchfox.org/mozilla-central/source/toolkit/components/startup/mozprofilerprobe.mof">session store window restored</a> event.</li>
</ul>


<p>For our analysis, we needed to do the following:</p>

<ol>
<li>Find the event showing the first <code>firefox.exe</code> process being created;</li>
<li>Find the session store window restored event from the second <code>firefox.exe</code> process;</li>
<li>Output the time interval between the two events.</li>
</ol>


<p><a href="https://searchfox.org/mozilla-central/rev/31a3457890b5698af1277413ee9d9bd6c5955183/testing/talos/talos/xtalos/parse_xperf.py#36">This block of code</a>
demonstrates how that analysis is specified using my analyzer framework.</p>
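
<p>Independently of that framework, the three steps can be sketched in plain Python. The <code>Event</code> record and its field values are hypothetical stand-ins; they are neither the xperf CSV format nor the API of my analyzer framework.</p>

```python
from typing import Iterable, NamedTuple, Optional


class Event(NamedTuple):
    timestamp_us: int    # time since the start of the recording
    kind: str            # hypothetical: "process_start" or "session_restored"
    image: str           # executable name
    pid: int
    parent_pid: int = 0


def startup_interval_us(events: Iterable[Event]) -> Optional[int]:
    """Time from launcher creation to the browser's session-restored event."""
    launcher = None
    browser_pid = None
    for ev in sorted(events, key=lambda e: e.timestamp_us):
        if ev.kind == "process_start" and ev.image == "firefox.exe":
            if launcher is None:
                launcher = ev              # step 1: first firefox.exe
            elif browser_pid is None and ev.parent_pid == launcher.pid:
                browser_pid = ev.pid       # the process started by the launcher
        elif (ev.kind == "session_restored" and browser_pid is not None
              and ev.pid == browser_pid):
            return ev.timestamp_us - launcher.timestamp_us   # steps 2 and 3
    return None
```

<p>Returning <code>None</code> when either event is missing lets the caller distinguish an incomplete recording from a genuinely fast startup.</p>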

<p>Overall, these test results were quite positive. We saw a very slight,
imperceptible increase in startup time on machines with solid-state drives;
however, the security benefits from the launcher process outweigh this very small
regression.</p>

<p>Most interestingly, we saw a significant <em>improvement</em> in startup time on Windows
10 machines with magnetic hard disks! As I mentioned in Q2 Part 3, I believe
this improvement is due to reduced hard disk seeking thanks to the launcher
process forcing <code>\windows\system32</code> to the front of the dynamic linker&rsquo;s search
path.</p>

<h3>Error and Experimentation Readiness</h3>

<p>By Q3 I had the launcher process in a state where it was built by default into
Firefox, but it was still opt-in. As I have written previously, we needed the
launcher process to gracefully fail even without having the benefit of various
Gecko services such as preferences and the crash reporter.</p>

<h4>Error Propagation</h4>

<p>Firstly, I created a new class, <a href="https://searchfox.org/mozilla-central/rev/31a3457890b5698af1277413ee9d9bd6c5955183/widget/windows/WinHeaderOnlyUtils.h#73"><code>WindowsError</code></a>,
that encapsulates all types of Windows error codes. As an aside, I would strongly
encourage all Gecko developers who are writing new code that invokes Windows APIs
to use this class in your error handling.</p>

<p><code>WindowsError</code> is currently able to store Win32 <code>DWORD</code> error codes, <code>NTSTATUS</code>
error codes, and <code>HRESULT</code> error codes. Internally the code is stored as an
<code>HRESULT</code>, since that type has encodings to support the other two. <code>WindowsError</code>
also provides a method to convert its error code to a localized string for
human-readable output.</p>
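
<p>For readers unfamiliar with those encodings, here is a small Python sketch of the arithmetic behind the Windows SDK&rsquo;s <code>HRESULT_FROM_WIN32</code> and <code>HRESULT_FROM_NT</code> macros. The Python function names are mine, and I am assuming here (rather than quoting the <code>WindowsError</code> source) that the class relies on equivalent conversions.</p>

```python
FACILITY_WIN32 = 7
FACILITY_NT_BIT = 0x10000000
SEVERITY_ERROR = 0x80000000


def hresult_from_win32(code: int) -> int:
    """Mirror HRESULT_FROM_WIN32: wrap a Win32 error code in FACILITY_WIN32."""
    code &= 0xFFFFFFFF
    if code == 0 or code & SEVERITY_ERROR:
        return code   # success, or already an HRESULT-style failure value
    return (code & 0xFFFF) | (FACILITY_WIN32 << 16) | SEVERITY_ERROR


def hresult_from_nt(status: int) -> int:
    """Mirror HRESULT_FROM_NT: tag an NTSTATUS with the NT facility bit."""
    return (status | FACILITY_NT_BIT) & 0xFFFFFFFF
```

<p>For example, <code>ERROR_ACCESS_DENIED</code> (5) becomes <code>0x80070005</code>, and <code>STATUS_ACCESS_DENIED</code> (<code>0xC0000022</code>) becomes <code>0xD0000022</code>.</p>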

<p>As for the launcher process itself, nearly every function in the launcher
process returns a <code>mozilla::Result</code>-based type. In case of error, we return a
<code>LauncherResult</code>, which [<em>as of 2018; this has changed more recently &ndash; Aaron</em>]
is a structure containing the error&rsquo;s source file, line number, and <code>WindowsError</code>
describing the failure.</p>

<h4>Detecting Browser Process Failures</h4>

<p>While all <code>Result</code>s in the launcher process may be indicating a successful
start, we may not yet be out of the woods! Consider the possibility that the
various interventions taken by the launcher process might have somehow impaired
the browser process&#8217; ability to start!</p>

<p>To deal with this situation, the launcher process and the browser process share
code that tracks whether both processes successfully started in sequence.</p>

<p>When the launcher process is started, it checks information recorded about the
previous run. If the browser process previously failed to start correctly, the
launcher process disables itself and proceeds to start the browser process
without any of its typical interventions.</p>

<p>Once the browser has successfully started, it reflects the launcher process
state into telemetry, preferences, and <code>about:support</code>.</p>

<p>Future attempts to start Firefox will bypass the launcher process until the
next time the installation&rsquo;s binaries are updated, at which point we reset and
attempt once again to start with the launcher process. We do this in the hope
that whatever was failing in version <em>n</em> might be fixed in version <em>n + 1</em>.</p>
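
<p>The bookkeeping described in the last few paragraphs amounts to a small state machine. Here is a Python sketch of it; the dictionary keys and function names are hypothetical, and where the real state is persisted is deliberately left out.</p>

```python
def fresh_state(build_id: str) -> dict:
    return {"build_id": build_id, "launcher_started": False,
            "browser_started": False, "disabled": False}


def on_launcher_start(state: dict, build_id: str) -> bool:
    """Return True if the launcher should apply its interventions this run."""
    if state.get("build_id") != build_id:
        # The binaries were updated: forget past failures and try again.
        state.clear()
        state.update(fresh_state(build_id))
    if state["launcher_started"] and not state["browser_started"]:
        # The previous run got as far as the launcher but the browser never
        # confirmed a successful start.
        state["disabled"] = True
    if state["disabled"]:
        return False
    state["launcher_started"] = True
    state["browser_started"] = False
    return True


def on_browser_started(state: dict) -> None:
    """Called by the browser process once it has started successfully."""
    state["browser_started"] = True
```

<p>Note that nothing in this sketch can set <code>disabled</code> permanently: a new build identifier always resets it, which matches the update behaviour described above.</p>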

<p>Note that this update behaviour implies that there is no way to forcibly and
permanently disable the launcher process. This is by design: the error detection
feature is designed to prevent the browser from becoming unusable, not to provide
configurability. The launcher process is a security feature and not something
that we should want users adjusting any more than we would want users to be
disabling the capability system or some other important security mitigation. In
fact, my original roadmap for InjectEject called for eventually removing the
failure detection code if the launcher failure rate ever reached zero.</p>

<h4>Experimentation and Emergency</h4>

<p>The pref reflection built into the failure detection system is bi-directional.
This allowed us to ship a release where we ran a study with a fraction of users
running with the launcher process enabled by default.</p>

<p>Once we rolled out the launcher process at 100%, this pref also served as a
useful &ldquo;emergency kill switch&rdquo; that we could have flipped if necessary.</p>

<p>Fortunately our experiments were successful and we rolled the launcher process
out to release at 100% without ever needing the kill switch!</p>

<p>At this point, this pref should probably be removed, as we neither need nor
want to control launcher process deployment in this way.</p>

<h4>Error Reporting</h4>

<p>When telemetry is enabled, the launcher process is able to convert its
<code>LauncherResult</code> into a ping which is sent in the background by <code>ping-sender</code>.
When telemetry is disabled, we perform a last-ditch effort to surface the error
by logging details about the <code>LauncherResult</code> failure in the Windows Event Log.</p>

<h3>In Conclusion</h3>

<p>Thanks for reading! This concludes my 2018 Roundup series! There is so much more
work from 2018 that I did for this project that I wish I could discuss, but for
security reasons I must refrain. Nonetheless, I hope you enjoyed this series.
Stay tuned for more roundups in the future!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[2018 Roundup: Q2, Part 3 - Fleshing Out the Launcher Process]]></title>
    <link href="https://dblohm7.ca/blog/2021/01/05/2018-roundup-q2-part3/"/>
    <updated>2021-01-05T09:45:00-07:00</updated>
    <id>https://dblohm7.ca/blog/2021/01/05/2018-roundup-q2-part3</id>
    <content type="html"><![CDATA[<p><em>This is the fourth post in my &ldquo;2018 Roundup&rdquo; series. For an index of all entries, please see my
blog entry for <a href="https://dblohm7.ca/blog/2019/01/18/2018-roundup-q1/">Q1</a>.</em></p>

<p>Yes, you are reading the dates correctly: I am posting this nearly two years after I began this series.
I am trying to get caught up on documenting my past work!</p>

<p>Once I had landed the <a href="https://dblohm7.ca/blog/2021/01/04/2018-roundup-q2-part2/">skeletal implementation</a>
of the launcher process, it was time to start making it do useful things.</p>

<h3>Ensuring Medium Integrity</h3>

<p>[<em>For an overview of Windows integrity levels, check out <a href="https://docs.microsoft.com/en-us/windows/win32/secauthz/mandatory-integrity-control">this MSDN page</a> &ndash; Aaron</em>]</p>

<p>Since Windows Vista, security tokens for standard users have run at a medium integrity level (IL) by default.
When UAC is enabled, members of the <code>Administrators</code> group also run as a standard user with a medium IL, with
the additional ability of being able to &ldquo;elevate&rdquo; themselves to a high IL. When UAC is disabled, an administrator
receives a token that always runs at the high integrity level.</p>

<p>Running a process at a high IL is something that is not to be taken lightly: at that level, the process may
alter system settings and access files that would otherwise be restricted by the OS.</p>

<p>While our sandboxed content processes always run at a low IL, I believed that defense-in-depth called for ensuring
that the browser process did not run at a high IL. In particular, I was concerned about cases where elevation
might be accidental. Consider, for example, a hypothetical scenario where a system administrator is running two
open command prompts, one elevated and one not, and they accidentally start Firefox from the one that is elevated.</p>

<p>This was a perfect use case for the launcher process: it detects whether it is running at high IL, and if so,
it launches the browser with medium integrity.</p>

<p>Unfortunately some users prefer to configure their accounts to run at all times as <code>Administrator</code> with high integrity!
This is a <em>terrible</em> idea from a security perspective, but it is what it is; in my experience, most users who
run with this configuration do so deliberately, and they have no interest in being lectured about it.</p>

<p>Unfortunately, users running under this account configuration will experience side-effects of the Firefox browser
process running at medium IL. Specifically, a medium IL process is unable to initiate IPC connections with a process
running at a higher IL. This will break features such as drag-and-drop, since even the administrator&rsquo;s shell processes are running
at a higher IL than Firefox.</p>

<p>Being acutely aware of this issue, I included an escape hatch for these users: I implemented a command line option
that prevents the launcher process from de-elevating when running with a high IL. I hate that I needed to do this,
but moral suasion was not going to be an effective technique for solving this problem.</p>
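
<p>Reduced to its decision logic, the behaviour looks like the Python sketch below. The RID constants are the mandatory-label values from the Windows SDK; the function and the <code>keep_elevated</code> parameter are hypothetical, with the latter standing in for the command line option mentioned above. The real launcher inspects its own token and creates the browser with a medium-integrity token, none of which is modelled here.</p>

```python
SECURITY_MANDATORY_LOW_RID = 0x1000
SECURITY_MANDATORY_MEDIUM_RID = 0x2000
SECURITY_MANDATORY_HIGH_RID = 0x3000


def browser_integrity_level(launcher_il: int, keep_elevated: bool = False) -> int:
    """Choose the integrity level at which to start the browser process."""
    if launcher_il >= SECURITY_MANDATORY_HIGH_RID and not keep_elevated:
        return SECURITY_MANDATORY_MEDIUM_RID   # de-elevate by default
    return launcher_il                         # medium stays medium; opt-outs stay high
```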

<h3>Process Mitigation Policies</h3>

<p>Another tool that the launcher process enables us to utilize is process mitigation options. Introduced in Windows 8,
the kernel provides several opt-in flags that allow us to add prophylactic policies to our processes in an effort to
harden them against attacks.</p>

<p>Additional flags have been added over time, so we must be careful to only set flags that are supported by the version
of Windows on which we&rsquo;re running.</p>

<p>We could have set some of these policies by calling the
<a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setprocessmitigationpolicy"><code>SetProcessMitigationPolicy</code></a> API.
Unfortunately this API is designed for a process to use on itself once it is already running. This implies that there
is a window of time between process creation and the time that the process enables its mitigations where an attack could occur.</p>

<p>Fortunately, Windows provides a second avenue for setting process mitigation flags: These flags may be set as part of
an attribute list in the <a href="https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-startupinfoexw"><code>STARTUPINFOEX</code></a>
structure that we pass into <code>CreateProcess</code>.</p>

<p>Perhaps you can now see where I am going with this: The launcher process enables us to specify process mitigation flags
for the browser process <em>at the time of browser process creation</em>, thus preventing the aforementioned window of opportunity
for attacks to occur!</p>

<p>While there are other flags that we could support in the future, the initial mitigation policy that I added was the
<a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute"><code>PROCESS_CREATION_MITIGATION_POLICY_IMAGE_LOAD_PREFER_SYSTEM32_ALWAYS_ON</code></a>
flag. [<em>Note that I am only discussing flags applied to the browser process; sandboxed processes receive additional mitigations. &ndash; Aaron</em>]
This flag forces the Windows loader to always use the Windows <code>system32</code> directory as the first directory in its search path,
which prevents library preload attacks. Using this mitigation also gave us an unexpected performance gain on devices with
magnetic hard drives: most of our DLL dependencies are either loaded using absolute paths, or reside in <code>system32</code>. With
<code>system32</code> at the front of the loader&rsquo;s search path, the resulting reduction in hard disk seek times produced a slight but
meaningful decrease in browser startup time! How I made these measurements is addressed in a future post.</p>

<h3>Next Time</h3>

<p>This concludes the Q2 topics that I wanted to discuss. Thanks for reading! Coming up in <a href="https://dblohm7.ca/blog/2021/02/24/2018-roundup-h2/">H2</a>: Preparing to Enable the Launcher Process by Default.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[2018 Roundup: Q2, Part 2 - Implementing a Skeletal Launcher Process]]></title>
    <link href="https://dblohm7.ca/blog/2021/01/04/2018-roundup-q2-part2/"/>
    <updated>2021-01-04T15:45:00-07:00</updated>
    <id>https://dblohm7.ca/blog/2021/01/04/2018-roundup-q2-part2</id>
    <content type="html"><![CDATA[<p><em>This is the third post in my &ldquo;2018 Roundup&rdquo; series. For an index of all entries, please see my
blog entry for <a href="https://dblohm7.ca/blog/2019/01/18/2018-roundup-q1/">Q1</a>.</em></p>

<p>Yes, you are reading the dates correctly: I am posting this nearly two years after I began this series.
I am trying to get caught up on documenting my past work!</p>

<p>One of the things I added to Firefox for Windows was a new process called the &ldquo;launcher process.&rdquo;
&ldquo;Bootstrap process&rdquo; would be a better name, but we already used the term &ldquo;bootstrap&rdquo;
for our XPCOM initialization code. Instead of overloading that term and adding potential confusion,
I opted for using &ldquo;launcher process&rdquo; instead.</p>

<p>The launcher process is intended to be the first process that runs when the user starts
Firefox. Its sole purpose is to create the &ldquo;real&rdquo; browser process in a suspended state, set various
attributes on the browser process, resume the browser process, and then self-terminate.</p>
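
<p>That sequence can be sketched as plain control flow. This is a minimal sketch under stated assumptions, not the actual Firefox code: <code>ProcessOps</code> and <code>RunLauncher</code> are hypothetical names, the operating system calls are hidden behind callbacks, and terminating the child on failure is an assumed cleanup policy. On Windows, the callbacks would plausibly wrap <code>CreateProcess</code> with <code>CREATE_SUSPENDED</code>, <code>ResumeThread</code>, and <code>TerminateProcess</code>.</p>

```cpp
#include <functional>

// Hypothetical seam around the OS calls, so that the ordering of the steps
// can be shown (and exercised) without creating a real process.
struct ProcessOps {
  std::function<bool()> createSuspended;  // e.g. CreateProcess + CREATE_SUSPENDED
  std::function<bool()> applyAttributes;  // configure the still-suspended child
  std::function<bool()> resume;           // e.g. ResumeThread on the main thread
  std::function<void()> terminateChild;   // e.g. TerminateProcess on failure
};

// Returns the exit code that the launcher itself would exit with.
int RunLauncher(const ProcessOps& ops) {
  if (!ops.createSuspended()) {
    return 1;  // nothing was created, so there is nothing to clean up
  }
  if (!ops.applyAttributes() || !ops.resume()) {
    // Assumed policy: do not leave a half-configured, suspended child behind.
    ops.terminateChild();
    return 1;
  }
  return 0;  // the browser is running; the launcher can now exit
}
```

<p>Keeping the child suspended until every attribute is applied is the point of the ordering: the browser never executes a single instruction in an unconfigured state.</p>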

<p>In <a title="Skeletal bootstrap process" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1454745">bug 1454745</a> I landed an initial skeletal (and opt-in) implementation of the
launcher process.</p>

<p>This seems like pretty straightforward code, right? Na&iuml;vely, one could just rip a <code>CreateProcess</code>
sample off of MSDN and call it a day. The actual launcher process implementation is more complicated than
that, for reasons that I will outline in the following sections.</p>

<h3>Built into <code>firefox.exe</code></h3>

<p>I wanted the launcher process to exist as a special &ldquo;mode&rdquo; of <code>firefox.exe</code>, as opposed to a distinct
executable.</p>

<h3>Performance</h3>

<p>By definition, the launcher process lies on the critical path to browser startup. I needed to be very
conscious of how we affect overall browser startup time.</p>

<p>Since the launcher process is built into <code>firefox.exe</code>, I needed to examine that executable&rsquo;s existing
dependencies to ensure that it is not loading any dependent libraries that are not actually needed
by the launcher process. Other than the essential Win32 DLLs <code>kernel32.dll</code> and <code>advapi32.dll</code> (and their
dependencies), I did not want anything else to load. In particular, I wanted to avoid loading <code>user32.dll</code>
and/or <code>gdi32.dll</code>, as this would trigger the initialization of Windows&#8217; GUI facilities, which would be a
huge performance killer. For that reason, most browser-mode library dependencies of <code>firefox.exe</code>
are either delay-loaded or are explicitly loaded via <code>LoadLibrary</code>.</p>

<h3>Safe Mode</h3>

<p>We wanted the launcher process to both respect Firefox&rsquo;s safe mode and alter its behaviour
as necessary when safe mode is requested.</p>

<p>There are multiple mechanisms used by Firefox to detect safe mode. The launcher process detects
all of them except for one: Testing whether the user is holding the shift key. Retrieving keyboard
state would trigger loading of <code>user32.dll</code>, which would harm performance as I described above.</p>

<p>This is not too severe an issue in practice: The browser process itself would still detect the
shift key. Furthermore, while the launcher process may in theory alter its behaviour depending on
whether or not safe mode is requested, none of its behaviour changes are significant enough to
materially affect the browser&rsquo;s ability to start in safe mode.</p>

<p>Also note that, for serious cases where the browser is repeatedly unable to start,
the browser triggers a restart in safe mode via environment variable, which <em>is</em> a mechanism that
the launcher process honours.</p>
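
<p>A launcher-side check along these lines might look like the following. This is a hedged sketch, not the actual Firefox code: <code>MatchesFlag</code> and this <code>IsSafeModeRequested</code> signature are hypothetical, and the flag name <code>safe-mode</code> and the environment variable <code>MOZ_SAFE_MODE_RESTART</code> mentioned below are not stated in this post, so treat them as assumptions. The shift-key test is deliberately absent.</p>

```cpp
#include <cstring>

// Returns true if `arg` is `-name` or `--name`.
bool MatchesFlag(const char* arg, const char* name) {
  if (!arg || arg[0] != '-') {
    return false;
  }
  const char* rest = (arg[1] == '-') ? arg + 2 : arg + 1;
  return std::strcmp(rest, name) == 0;
}

// Launcher-side safe mode check. restartEnvValue is the value of the restart
// environment variable, or nullptr if it is unset. Keyboard state is never
// queried here, because doing so would pull in user32.dll.
bool IsSafeModeRequested(int argc, const char* const argv[],
                         const char* restartEnvValue) {
  // argv[0] is the program name, so start at index 1.
  for (int i = 1; i < argc; ++i) {
    if (MatchesFlag(argv[i], "safe-mode")) {
      return true;
    }
  }
  // An empty value is treated the same as an unset variable.
  return restartEnvValue && restartEnvValue[0] != '\0';
}
```

<p>A caller would pass something like <code>std::getenv("MOZ_SAFE_MODE_RESTART")</code> as the third argument; taking the value as a parameter keeps the function free of global state.</p>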

<h3>Testing and Automation</h3>

<p>We wanted the launcher process to behave well with respect to automated testing.</p>

<p>The skeletal launcher process that I landed in Q2 included code to pass its console handles
on to the browser process, but there was more work necessary to completely handle this case.
The missing pieces were not yet an issue because the launcher process was opt-in at the time.</p>

<h3>Error Recovery</h3>

<p>We wanted the launcher process to gracefully handle failures even though, also by definition, it does not
have access to facilities that internal Gecko code has, such as preferences and the crash reporter.</p>

<p>The skeletal launcher process that I landed in Q2 did not yet utilize any special error handling
code, but this was also not yet an issue because the launcher process was opt-in at this point.</p>

<h3>Next Time</h3>

<p>Thanks for reading! Coming up in <a href="https://dblohm7.ca/blog/2021/01/05/2018-roundup-q2-part3/">Q2, Part 3</a>: Fleshing Out the Launcher Process</p>
]]></content>
  </entry>
  
</feed>
