Monday, April 20, 2020

Setting Up a Windows Development Environment for the Selenium .NET Language Bindings

One of the most common challenges I come across with users who want to work on the .NET language bindings is setting up an environment on their machine that can actually build them. When I and other committers facilitate the "Fix a Bug, Become a Committer" workshop at various Selenium Conferences, the vast majority of issues we see are people getting their development environment and build tools working. Over the last couple of years, the Selenium project has put a fair amount of time and energy into unifying the build system across language bindings, but this is especially challenging for Windows users, as most of the tooling for open source software projects treats Windows as a second-class citizen. I've had to build and rebuild development environments often enough that I understand the gotchas involved, but I've never documented the process outside of my own head. This post is an attempt to provide a more definitive set of steps for setting up such an environment for folks interested in building the Selenium .NET bindings on their own.

A Couple of Preliminary Notes

First, these instructions are written for Windows 10. They may work on other versions of Windows, but you should not expect the screenshots included later in this document to exactly match your experience.

Second, we will be running several commands from the command line. This seems like it should be table stakes for most open source developers, but there are many, many effective developers who rely solely on graphical tools to do their work, especially on Windows. If you are not comfortable opening a command prompt and executing commands within it, you will need to become more comfortable doing so in order to work with the Selenium code base.

Third, you will need administrative privileges on your machine to configure the environment. Unfortunately, there is no way around this. I recognize this is a barrier to entry for some people, but a properly configured environment requires several tools that would not be on a typical IT department's install list.

Finally, to install the tooling in this document, we will be using Chocolatey (https://chocolatey.org/). It's possible to accomplish all of these installs manually, and if you want to do that, you're welcome to, but you'll need to make sure all of the post-install steps happen, like environment variables and paths getting updated so that the tools can be run from a command line. Installing the tooling manually is outside the scope of this document.

Setting Up Windows Features

You will need to have certain features enabled in Windows 10 to be able to successfully build Selenium code. Specifically, you will need to turn on "Developer Mode." To do this, open the Windows 10 Settings app (click the gear icon on the Start Menu), and choose "Update and Security."


In the Update and Security section, choose "For developers," and turn on "Developer mode." Windows will install the feature for Developer mode.


Installing the Tools

To work in the Selenium .NET code base on Windows, we are going to need the following tools:
  • Chocolatey, which we will use for installation of other tools.
  • Git, which will be used for getting and updating the Selenium source code.
  • A Java Development Kit (JDK), which will be used to build some components needed for testing the .NET bindings. Note that a JDK is different from a Java Runtime Environment (JRE), which is what most people install when they "install Java." We will be using OpenJDK, version 11.
  • Python, which some of the build rules used by the Selenium build process require. We will install the latest Python 3.x version.
  • Visual Studio 2019, for modifying and building the C# code that makes up the .NET bindings. It is perfectly acceptable to install the Community Edition to work with the code base.
  • MSYS2, which is an alternative shell environment that provides Unix-like commands. While the normal build process using Visual Studio does not require this, the command-line cross-language, cross-platform build tool used to produce the .NET bindings assemblies does require it.
  • Bazel, the command-line build tool used to build components of the Selenium project in multiple supported languages, including C#.

Install Chocolatey

To install Chocolatey, you'll need to run an install script from within PowerShell, an enhanced command line available in Windows. We also need to run PowerShell as an administrator so that the subsequent packages being installed can be run without further elevation prompts. To open PowerShell as an administrator, the easiest thing to do is search for it using the Windows Start Menu search feature, and choose the option to "Run as Administrator."


In the resulting PowerShell window, type the following command:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))

When this is done, you should be able to type choco into the PowerShell command prompt and receive an informational message that includes the version of Chocolatey installed.

Install Git

The Selenium project uses Git (https://git-scm.com/) as its source control system. To get the source code on your machine and to keep it updated, you will need to have Git installed. To install it using Chocolatey, type the following command in the PowerShell window, and answer the prompts:

choco install git

When this command completes, you should be able to open a regular command prompt and type git, and receive an informational message about the usage of Git.

Install OpenJDK 11

The .NET language bindings of the Selenium project share the same test pages as the Java language bindings. To run the .NET bindings' test suite, we need to be able to build the Java language bindings as well, so that we can build the web server that serves those pages to the browser during the tests. The Selenium project requires Java 11 to build the Java components, and OpenJDK is preferred. To install OpenJDK 11 using Chocolatey, type the following command in the PowerShell window, and answer the prompts.

choco install openjdk11

When this command completes, you should be able to open a regular command prompt and type javac, and receive an informational message about the use of the Java compiler. If you can type java and receive an informational message, but typing javac yields a "command not recognized" error, you've installed a Java Runtime Environment (JRE), and not a Java Development Kit (JDK). You must have a JDK.

Install Python

Some of the Bazel rules used by the Selenium project require Python (https://www.python.org/) to execute. To install Python using Chocolatey, type the following command in the PowerShell window, and answer the prompts:

choco install python

When this command completes, you should be able to open a regular command prompt, type python, and enter the Python interactive interpreter (also called a REPL). To exit the REPL, enter quit().

Install Visual Studio 2019 Community Edition

The Selenium project uses Bazel as its cross-language build tool, which means that to build the .NET bindings, you need the prerequisites Bazel requires to build C# code, including Visual Studio 2019; the project will build successfully using the free Community Edition (https://visualstudio.microsoft.com/downloads/). Among those prerequisites is a C++ compiler. Even though you may not be building any C++ code as part of the Selenium project, Bazel still requires access to a C++ compiler to build tools used to compile other language elements. Additionally, after you install Visual Studio, you may need to add optional components that are not automatically selected during the install. To install Visual Studio 2019 Community Edition using Chocolatey, type the following command in the PowerShell window and answer the prompts.

choco install visualstudio2019community

When the command completes, you should have a working installation of Visual Studio on your machine. You may need to reboot to fully complete the installation. After installing Visual Studio, you will need to make sure the correct optional components are installed as well. To do this, launch the Visual Studio Installer from the Windows Start Menu.


Once the installer launches, click the "Modify" button.


In the ensuing component selection dialog, make sure that the ".NET desktop development" workload is installed, and that the following optional components are installed as part of that workload:

  • .NET Core development
  • .NET Core 2.1 LTS Runtime
  • .NET Framework 4-4.6 development tools
  • .NET Framework 4.6.1 development tools
  • .NET Framework 4.6.2 development tools
  • .NET Framework 4.7 development tools
  • .NET Framework 4.7.1 development tools
  • .NET Framework 4.8 development tools


You will also need to make sure the "Desktop development with C++" workload is installed, and then you can click the "Modify" button to install the optional components.


Once the installation is complete, you can launch Visual Studio to make sure the IDE runs properly.

Install MSYS2

Some of the build rules that the Bazel build tool uses in the Selenium build process are not completely ported to work seamlessly with Windows, and hard-code the use of Unix-like shell commands. To support this as part of the build process, Bazel will require the installation of the MSYS2 subsystem (https://www.msys2.org/). Because of other configuration requirements, it is important to know the directory where the MSYS2 system is installed. This example will use C:\tools\msys64 as the install location, but you can substitute another location if you wish. To install MSYS2 to a specific location using Chocolatey, type the following command in the PowerShell window and answer the prompts.

choco install msys2 --params "/InstallDir=C:\tools\msys64"

Once the installation is complete, you should see the MSYS2 tools installed at the location you specified. So that Bazel can find those tools at that location, you will need to set up an environment variable for Bazel to use. The easiest way to do this is to search for the environment variable editor using the Windows Start Menu search feature, and choose the option to "Open."


In the Environment Variables dialog, click the "New..." button under System variables.


In the New System Variable dialog, enter BAZEL_SH as the variable name, and the path to bash.exe in your MSYS2 installation as the variable value. The bash.exe executable will be located in the usr\bin directory of your MSYS2 installation. Using our example path for installation, the path to add would be C:\tools\msys64\usr\bin\bash.exe. If you chose a different installation path, your path to enter would be different.


Once you have entered those values, click OK on the New System Variable and the Environment Variables dialogs.
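
If you prefer the command line to the dialogs, running the following from an elevated command prompt accomplishes the same thing (this assumes the example install path used above; adjust it if you installed MSYS2 elsewhere):

setx BAZEL_SH "C:\tools\msys64\usr\bin\bash.exe" /M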

Install Bazel

You can now install the final tool in the development environment tool chain, Bazel (https://bazel.build/). Bazel is a cross-language build environment that creates repeatable builds and uses caching to avoid building components unnecessarily, dramatically decreasing build times. To install Bazel using Chocolatey, type the following command in the PowerShell window and answer the prompts.

choco install bazel

When this command completes, you should be able to open a regular command prompt and type bazel, and receive an informational message about the use of the command.

Getting and Building Selenium Code

After running all of the installers, it's probably a good idea to reboot your machine so that registry changes made by the installers can take effect. Once the reboot is complete, and all of the tooling needed to build Selenium artifacts is in place, you can fetch the code from the source control repository on GitHub. To make and submit changes to the project, you will need to fork the project on GitHub and submit pull requests; performing those tasks is beyond the scope of this document. To get the Selenium code base copied to your machine, open a Developer Command Prompt for Visual Studio 2019 from the Windows Start Menu.


In the resulting command prompt window, navigate to an appropriate directory, and clone the project. I usually keep my projects in a directory called C:\Projects, but you can choose whatever directory you like. If you use a different directory structure, keep in mind that you'll need to use those paths when entering commands. To clone the project in your desired location, enter the following commands in your command prompt.

cd \Projects
git clone https://github.com/SeleniumHQ/selenium

A quick warning: the Selenium Git repository is very, very large. This is due to the use of a monorepo, where dependencies are checked into the source tree so that anyone can build without needing a network connection to download them. If you don't need the full source control history, or are in a hurry, you can add the --depth=1 switch to the clone command (e.g., git clone --depth=1 https://github.com/SeleniumHQ/selenium) to get only the tip of the tree and build from there.

When this command finishes, the repo should be cloned to your local machine in the selenium subdirectory. To make sure that you can build from the command line, enter the following commands in your command prompt window.

cd selenium
bazel build //dotnet/test/common:chrome

This will build everything the .NET language bindings depend on, including the JavaScript automation atoms, the Java test web server, and the .NET bindings assemblies as well as the NUnit test assemblies, but will stop just short of running the tests. Be aware that this will take a while the first time you attempt it. Many of the targets will not require rebuilding once they have been built the first time.


Summary

At this point, you should be able to successfully build the .NET bindings from the command line using Bazel. You should also be able to open the WebDriver.NET.sln file in the project's source tree and modify, build, and test using Visual Studio. As a reminder, here is a summary of the steps, assuming you are using the paths that are listed in the preceding text; your paths may need to be different if you've chosen different locations.

In an elevated PowerShell command window, run the following commands:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
choco install git
choco install openjdk11
choco install python
choco install visualstudio2019community
choco install msys2 --params "/InstallDir=C:\tools\msys64"
choco install bazel

Update the Visual Studio installation through the installer to include the ".NET desktop development" and "Desktop development with C++" workloads with the appropriate optional components, and add the BAZEL_SH environment variable pointing to the MSYS2 Bash shell executable. After installing these packages, you'll probably want to reboot your machine.

In a Developer Command Prompt for Visual Studio 2019, run the following commands:

cd \Projects
git clone https://github.com/SeleniumHQ/selenium
cd selenium
bazel build //dotnet/test/common:chrome

To run the .NET tests from the command line, a few additional configuration options are required, but we will cover that in a separate blog post. If you run into issues with these steps, feel free to contact me on the Selenium IRC (#selenium at freenode.net) or Slack channels.

Monday, July 8, 2019

Announcing Selenium .NET Bindings 4.0 alpha 2

I'm very proud to announce the 4.0-alpha2 release of the Selenium .NET bindings! There are several exciting things to look forward to in this release. The first is the fixing of several issues that have cropped up because, beginning with version 75, Chrome (and ChromeDriver) use the W3C WebDriver Specification as the default protocol dialect for communication between Selenium and the driver. This led to a few issues, like the loggingPrefs capability being renamed and the legacy logging APIs (driver.Manage().Logs) no longer working. That functionality should be restored in this release.

By far, the biggest addition to this release of the .NET bindings is integration with the Chrome DevTools Protocol (CDP). We are extremely excited to be able to bring this feature to users of Selenium. It allows users, via their existing WebDriver instance, to instantiate and use a CDP session, including two-way communication of events. The .NET API for using CDP is still experimental, and may change between releases until the alpha/beta period is over. But to whet your appetite, here's what the code to use it looks like in 4.0.0-alpha02:

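Roughly speaking, the shape of the code is as follows. Treat this as an illustrative sketch only: the DevTools API is experimental, and the exact type and member names have shifted between alpha releases, so the ones below are approximations rather than verbatim API.

using System;
using System.Threading.Tasks;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.DevTools;

class DevToolsExample
{
    public static async Task Main()
    {
        // A driver for a Chromium-based browser can expose a DevTools
        // session alongside the normal WebDriver commands.
        ChromeDriver driver = new ChromeDriver();
        try
        {
            // Illustrative only: create a CDP session from the existing
            // WebDriver instance (member names approximate the alpha API).
            IDevTools devTools = driver as IDevTools;
            DevToolsSession session = devTools.CreateDevToolsSession();

            // Two-way communication: subscribe to a CDP event, then enable
            // the domain so the browser starts sending those events.
            session.Network.RequestWillBeSent += (sender, e) =>
            {
                Console.WriteLine("Request sent: " + e.Request.Url);
            };
            await session.Network.Enable();

            driver.Navigate().GoToUrl("https://www.selenium.dev");
        }
        finally
        {
            driver.Quit();
        }
    }
}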

The API uses the .NET System.Net.WebSockets.ClientWebSocket implementation for communication with Chrome, which means it's limited to use with Windows 8.1 and above. This is a limitation of the WebSocket implementation, so complaints about that should be directed toward Microsoft. Accordingly, most of the CDP API is async, though the remainder of the WebDriver API is not.

Also, for the moment, the .NET bindings do not implement domains marked as "experimental" in the protocol definition. One thing we really do not want is for Selenium to be tied down to specific versions of Chrome. Since the DevTools Protocol is not subject to any standard, and can be changed at the whim of the Chromium developers, implementing those experimental domains seems like a potentially suboptimal choice.

The CDP integration is something we'd really like to get feedback on, so give it a whirl, and let us know what you think.

Tuesday, June 4, 2019

Handling Authentication Requests with Selenium: Epilogue

I hope the preceding series of posts has been useful. To wrap things up, I want to share a GitHub repository that contains sample code for each of the items we've discussed. It includes an ASP.NET Core demo web site that implements Basic, Digest, and NTLM authentication. It also includes sample Selenium code using BenderProxy (version 1.1.2 or later) and PassedBall (version 1.2.0 or later) to automate the site. The Selenium code runs in a console application, which waits for you to press the Enter key before shutting down the proxy and quitting the browser. This allows you to see the state of the browser before everything quits. Other features of the sample repo include working factory classes for Selenium sessions and the demo cases themselves.

To make the demo in the source repo work properly, you must run it on Windows, because we are enabling NTLM authentication. Also, you will need administrative access on your Windows machine, which is unfortunate, but there is no other way to get the development web server to listen on a host name other than "localhost". If you change the test to navigate to the site on "localhost", the browser will likely bypass the proxy, because most browsers bypass proxies for localhost unless you take other configuration steps. By default, the demo project uses www.seleniumhq-test.test and port 5000, but you can use whatever you want. Here's how to configure your test environment so that the demo app will work properly:

From an elevated ("Run as Administrator") command prompt, edit your hosts file to contain a mapped entry for the host you wish to use. The hosts file can be edited in any text editor, including Notepad, so the following command will open it:

notepad.exe %WinDir%\System32\drivers\etc\hosts

Once open, add the following line:

127.0.0.1 <host name>

Be sure to substitute your preferred host name for <host name>. Save and close the hosts file. As an aside, this is a very useful technique for Selenium code to simulate navigation to external sites without actually having to navigate outside one's local machine.

Also in the elevated command prompt, execute the following command:

netsh http add urlacl url="http://<host name>:<port>/" user=everyone

Be sure to substitute your preferred host name and port for <host name> and <port> respectively. You should see a message that the URL reservation was successfully added. Now, this is a dangerous command, because it does open up a URL reservation for everyone, so you don't want to leave this permanently in place. You can remove it at any time after you're done using the sample by using another elevated command prompt to execute:

netsh http remove urlacl url="http://<host name>:<port>/"

Once you've added the hosts file entry and the URL ACL, you're ready to load and run the authentication tests. Open the solution in Visual Studio 2019, and you should be able to build and run. When running, the solution runs a console application that will launch the test web app, start the proxy server, start a browser configured to use the proxy with Selenium, navigate to a protected URL for a specific authentication scheme, and then wait for the Enter key to be pressed. This will let you examine the browser to validate that, yes, the authentication succeeded. You can also examine the diagnostic output written to the console by the test code, which describes the WWW-Authenticate and Authorization headers being used. Once you've validated to your satisfaction that the browser really did authenticate using Selenium, without prompting the user, you can press Enter, which will quit the browser, stop the proxy server, and shut down the test web app. As an extra validation step, you can also start the test web app from Visual Studio and manually navigate to the URLs to validate that they really do prompt for credentials when browsed to.

Here's the Main method of the test app:

As you can see, you can change the browser being used (line 5 in the listing above), and the authentication type (line 9 in the listing above) being tested by changing the commented lines in the main method. If you decided to use a different host name or port, you can also change that by uncommenting and changing the appropriate lines (lines 15 and 16 in the listing above, respectively).

Hopefully, this series has given you some insights into how browsers perform authentication, and how it's possible to automate this using Selenium, without resorting to other UI automation tools. Happy coding!

Monday, June 3, 2019

Handling Authentication Requests with Selenium - Part 4: NTLM Authentication

Now that we've built a thorough intellectual framework for handling authentication requests using Selenium in combination with a web proxy, and, thanks to our last post, can handle more than Basic authentication, let's take things a step further and see how you can use Selenium to automate pages secured with NTLM authentication. Before we can do that, though, we need to understand how NTLM authentication differs from the types of authentication we've used before.

NTLM authentication is a Microsoft-developed technology, originally implemented in the company's IIS web server product. It's not widely used on the public internet, but it does integrate nicely with things like Active Directory, so it can be quite useful for web applications on company intranets that require security based on Active Directory credentials. This means that to provide sample code, we'll need a few things in place. First, we'll need a test website that we can run locally, on a server that implements NTLM authentication. Since we're working in C# in this series, we can create an ASP.NET Core web project to do that.

Second, we'll also need to host the application using Windows. Even though the ASP.NET Core project can run against .NET Core, and that can run on platforms other than Windows, we'll need to actually run on Windows to take advantage of NTLM authentication, unless we want to introduce a ton of complexity with Active Directory domains and the like (which we don't for this post).

Finally, most browsers bypass the use of a proxy when running strictly on localhost. This means that if you're running things all on the same system, you'll need to either configure the browser not to do this, or trick it into thinking the site the browser is connecting to isn't the local machine. The latter is far easier, since it only involves adding a line to the Windows hosts file (located at %WinDir%\System32\drivers\etc\hosts). On my test system, I've redirected www.seleniumhq-test.test to 127.0.0.1 by using the hosts file, and the sample code will reflect this.

NTLM authentication is a challenge-response based authentication scheme, and it differs from other HTTP authentication schemes in that it authenticates a connection, not an individual request. This means that the browser and server must support so-called "keep-alive," or persistent TCP connections between them. It also means that our proxy has to support persistent TCP connections, and must allow us to use that exact connection for making the requests. Fortunately, the proxy we've been using so far, BenderProxy, does support this.

The challenge-response mechanism used is complicated. Very complicated. So again, we'll be using the PassedBall library to parse authentication headers and generate authorization responses. NTLM also requires multiple request/response round trips to perform the authentication handshake. Here's the implementation code for handling the NTLM authentication challenge for a sample site hosted on our local machine:

Note carefully that the initial 401 Unauthorized response may contain multiple WWW-Authenticate headers, so one may need to make sure the proper one is being used to interpret the response. Browsers, when faced with this, will usually choose what they perceive to be the "strongest" authentication method. In our case, we need to do that determination for ourselves.
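
To make the multi-step handshake concrete, here is roughly what the exchange looks like on the wire. This trace is illustrative only: the request path is a placeholder, most headers are omitted, and the NTLM message payloads are abbreviated. Note the two WWW-Authenticate headers in the first response, and that everything after the first 401 happens over the same kept-alive connection.

Browser sends:
GET http://www.seleniumhq-test.test:5000/protected HTTP/1.1
Host: www.seleniumhq-test.test:5000
Connection: keep-alive

Browser receives back:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
Connection: keep-alive

Browser sends (same connection):
GET http://www.seleniumhq-test.test:5000/protected HTTP/1.1
Host: www.seleniumhq-test.test:5000
Connection: keep-alive
Authorization: NTLM TlRMTVNTUAABAAAA... (Type 1 negotiation message)

Browser receives back (same connection):
HTTP/1.1 401 Unauthorized
WWW-Authenticate: NTLM TlRMTVNTUAACAAAA... (Type 2 server challenge)
Connection: keep-alive

Browser sends (same connection):
GET http://www.seleniumhq-test.test:5000/protected HTTP/1.1
Host: www.seleniumhq-test.test:5000
Connection: keep-alive
Authorization: NTLM TlRMTVNTUAADAAAA... (Type 3 response to the challenge)

Browser receives back:
HTTP/1.1 200 OK

<body of HTML page here>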

We'll wrap up this series with one more post, summing everything up.

Friday, May 31, 2019

Handling Authentication Requests with Selenium - Part 3: Beyond Basic Authentication

In the last post in this series, we saw the general procedure for handling authentication requests with Selenium and a web proxy:
  • Start the programmable proxy
  • Start a Selenium session configuring the browser to use the proxy
  • Wire up a method to intercept the 401 Unauthorized response
  • Use the method to resend the request with the correct Authorization header value

As we noted previously, the use of the Basic HTTP authentication scheme is rather weak. There are other authentication schemes that don't require sending a password in plain text over the wire. One such scheme is HTTP Digest authentication. Let's see what that looks like. First, let's navigate to a page that implements Digest authentication and examine what we see. As before, we'll use the hosted version of The Internet at http://the-internet.herokuapp.com/.

Browser sends:
GET http://the-internet.herokuapp.com/digest_auth HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1


Browser receives back:
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Type: text/plain
Content-Length: 0
Www-Authenticate: Digest realm="Protected Area", nonce="MTU1ODkwNDI2MyBkYjYzMTA0ZTY0NmZjNmZhNDljNzQ2ZGY0ZTc3NDM4OA==", opaque="610a2ee688cda9e724885e23cd2cfdee", qop="auth"
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Sun, 26 May 2019 20:57:43 GMT
Via: 1.1 vegur


Note the value of the WWW-Authenticate header, which is considerably more complex than in the Basic authentication case. The algorithm for figuring out the correct value for the Authorization header is likewise much more complex. In the simplest case, it involves getting the MD5 hash of the string "userName:realm:password", then the MD5 hash of the HTTP verb and the URL of the resource being requested, and then the MD5 hash of those two hashes joined with the "nonce" value sent in the WWW-Authenticate header.

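To make that concrete, here is a minimal sketch of the calculation for that simplest (no qop) case, written against plain .NET classes. When qop="auth" is present, as in the response above, the final hash also folds in a nonce count, a client nonce, and the qop value; that extra bookkeeping is exactly the kind of thing we'd rather hand off to a library.

using System;
using System.Security.Cryptography;
using System.Text;

static class DigestCalculation
{
    // Hex-encoded MD5 of a string, the basic building block of the Digest scheme.
    private static string Md5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
        }
    }

    // Computes the "response" value for the simplest (no qop) Digest case.
    public static string GetDigestResponse(string userName, string realm, string password, string httpVerb, string url, string nonce)
    {
        string ha1 = Md5Hex(userName + ":" + realm + ":" + password); // MD5 of "userName:realm:password"
        string ha2 = Md5Hex(httpVerb + ":" + url);                    // MD5 of the verb and URL
        return Md5Hex(ha1 + ":" + nonce + ":" + ha2);                 // the value sent back to the server
    }
}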

Whew. That's an awful lot to keep straight, and probably a little too complicated to post the code for resolving all of its nuances within this blog post. So it's time to introduce a new library to add to our toolbox for calculating the authorization header value for any of a variety of authentication methods. That library is called PassedBall, and it's available both on GitHub and as a NuGet package. Since PassedBall supports Digest authentication, and using the same process as in our previous post, here's the implementation of the method to intercept and resend the HTTP request:

Now that we have a library and a generic framework for generating responses for arbitrary authentication schemes, we'll look at one last approach, one that uses connection semantics for authentication: NTLM authentication.

Thursday, May 30, 2019

Handling Authentication Requests with Selenium - Part 2: Using a Web Proxy for Basic Authentication

As I mentioned in the immediately prior post in this series, the way to avoid having the browser prompt for credentials during a Selenium test is to supply the correct information in the Authorization header. Since Selenium's focus is automating the browser as closely as possible to the way a user uses it, there's not a built-in way to examine or modify the headers. However, Selenium does make it very easy to configure the browser being automated to use a web proxy. A web proxy is a piece of software that stands between your browser and any request made of a web server, and it can be made to examine, modify, or even block requests based on any number of rules. When configured to use a proxy, every request made by your browser flows through the proxy. Many businesses use proxies to ensure that only authorized resources are being accessed via business computers, to make sure that requests only come from authorized computers, or for any number of other legitimate business purposes.

How do you configure your browser to use a proxy with Selenium? The code looks something like this:

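For example, using Firefox, and using a placeholder address and port for wherever your proxy happens to be listening, a minimal version might be:

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

// Route the browser's HTTP and HTTPS traffic through the proxy.
// Replace 127.0.0.1:8080 with the address and port your proxy is listening on.
Proxy proxy = new Proxy();
proxy.HttpProxy = "127.0.0.1:8080";
proxy.SslProxy = "127.0.0.1:8080";

FirefoxOptions options = new FirefoxOptions();
options.Proxy = proxy;

IWebDriver driver = new FirefoxDriver(options);
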
Since we're Selenium users, we'll be using a proxy that we can start and stop programmatically, hook into the request/response chain from our code, and use to modify the results so that we can interpret and replace the headers as needed. Any number of proxies could be used for this. Many Selenium users have had great success using BrowserMob Proxy, and there are commercial options like Fiddler. Since I personally prefer FOSS options, and don't want to leave the .NET ecosystem, for our examples here we'll be using BenderProxy. Here's the code for setting that up.

Now, how do we wire up the proper processing to mimic the browser's handling of an authentication prompt? We need to add an Authorization header that provides the correct value for the authentication scheme requested by the server. BenderProxy's OnResponseReceived handler fires after the response has been received from the web server, but before it's forwarded along to the browser for rendering. That gives us the opportunity to examine the response, and to resend the request with the proper credentials in the proper format. We're using the Basic authentication scheme in this example, and once again using The Internet sample application. Here's the code for the method:

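The heart of that method is the decision logic sketched below: given the response's status code and its challenge header, work out the Authorization value to attach when the request is resent. The wiring that actually rewrites the request and resends it over the proxy's connection to the server is specific to BenderProxy and omitted from this sketch.

using System;
using System.Text;

// Core of the interception logic: if the server answered 401 with a Basic
// challenge, compute the Authorization header value to use when resending.
static string GetBasicAuthorizationValue(int statusCode, string authenticateHeader, string userName, string password)
{
    if (statusCode != 401 || authenticateHeader == null ||
        !authenticateHeader.StartsWith("Basic", StringComparison.OrdinalIgnoreCase))
    {
        return null; // not a Basic challenge; let the response pass through untouched
    }

    // Basic authentication is simply the Base64 encoding of "userName:password".
    string credentials = userName + ":" + password;
    return "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(credentials));
}
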
Running the code, we'll see that when the Selenium code is run, the browser shows the authorized page, as we intended. As you can tell from the implementation, Basic authentication is pretty simple, sending the Base64 encoding of "userName:password". Its simplicity is also one reason it's not used very often, as it sends the credentials across the wire, essentially in clear text. There are other, more secure authentication schemes available, and they can be automated in similar ways. The trick is knowing how to specify the value for the Authorization header. In the next post in the series, we'll look at another authentication mechanism, and how to handle something a little more complicated.

Wednesday, May 29, 2019

Handling Authentication Requests with Selenium - Part 1: How Does Browser Authentication Work Anyway?

In order to understand how to use browser technologies to automate pages that use some form of authentication, it is useful to know what happens when you browse to such a page. What's actually happening when your browser prompts you for some form of credentials, usually a user name and password, before it will let you access a given resource on the web?

At the risk of dropping down to a ridiculously low level, let's talk about how browsers transfer data when browsing websites. First, an obligatory disclaimer. I'm going to deliberately gloss over pages served via secure HTTP ("https"), and I'm going to ignore mostly-binary protocols like HTTP/2 for this series. Those items, while important, and while they may impact the outcomes you see here, are beyond the scope of this series.

Most of the time, a browser is using the Hypertext Transfer Protocol (or HTTP) to communicate with a given web server. When you type in a URL in your browser's address bar, your browser sends off an HTTP request (that's what the "http://" means at the beginning of the URL), and receives a response from the server. For the following examples, we'll be using Dave Haeffner's excellent Selenium-focused testing site, The Internet, which is designed to provide examples of challenging things a user might encounter when automating web pages with Selenium, and a hosted version of which is available at http://the-internet.herokuapp.com. Here's what a typical exchange might look like:

Browser sends:
GET http://the-internet.herokuapp.com/checkboxes HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1


Browser receives back:
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Content-Length: 2008
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Thu, 23 May 2019 23:44:54 GMT
Via: 1.1 vegur

<body of HTML page here>


This is what happens virtually every time a browser makes a request for a resource. The important thing to note is in that first line of the response. The "200 OK" bit means that the server had the resource and was sending it in response to the request. Now let's look at a request for a resource that is protected by authentication:

Browser sends:
GET http://the-internet.herokuapp.com/basic_auth HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1


Browser receives back:
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Www-Authenticate: Basic realm="Restricted Area"
Content-Length: 15
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Thu, 23 May 2019 23:52:24 GMT
Via: 1.1 vegur


Note the all-important first line of the response, which says "401 Unauthorized". That tells us that we have a page that requires authentication. If you had asked your browser to browse to the page http://the-internet.herokuapp.com/basic_auth, you would have been prompted for a user name and password. Also note the line in the response that says Www-Authenticate: Basic realm="Restricted Area". That tells the browser that the "Basic" authentication scheme is expected, and that the user's user name and password are required, so the browser prompts you, and then re-sends the request to the server with an additional header. If you used the proper credentials for the aforementioned URL (user name: admin, password: admin), you'd see something like the following:

Browser sends:
GET http://the-internet.herokuapp.com/basic_auth HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Authorization: Basic YWRtaW46YWRtaW4=


Browser receives back:
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Content-Length: 1643
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Thu, 23 May 2019 23:59:31 GMT
Via: 1.1 vegur

<body of HTML page here>


Clearly, that additional header that says Authorization: Basic YWRtaW46YWRtaW4= tells us that the browser must've done something with those credentials we gave it. If only we had a way to intercept the unauthorized response, calculate what needs to go into that authorization header, and resend the request before the browser had the chance to prompt us for credentials, we'd be golden. As luck (and technology) would have it, we do have exactly that ability, by using a web proxy. Every browser supports proxies, and Selenium makes it incredibly easy to use them with browsers being automated by it. The next post in this series will outline how to get that set up and working with Selenium.
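
As a closing aside, that "something" the browser did is easy to reproduce for the Basic scheme: the header value is nothing more than the Base64 encoding of "userName:password", as a couple of lines of C# will confirm.

using System;
using System.Text;

// Base64-encoding "admin:admin" yields exactly the value seen in the request above.
string headerValue = "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes("admin:admin"));
Console.WriteLine(headerValue); // prints: Basic YWRtaW46YWRtaW4=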