Thursday, October 25, 2012

Automating Website Authentication

Recently I had to implement automated login to a website. In my case it was Yahoo.com, so the code snippets are specific to that site, but it should not be hard to adapt them for other purposes. I developed two separate ways to achieve this. The first one needs more code and is more complex (it has to subscribe to two events and make more logical checks), but it is the one I figured out first. It makes use of the WebBrowser class.

Create an instance of the WebBrowser and subscribe to its Navigated and DocumentCompleted events:

_browser = new WebBrowser();
_browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
_browser.Navigated += new WebBrowserNavigatedEventHandler(browser_Navigated);

On the timeline, the first meaningful event to be caught is browser_DocumentCompleted on login.yahoo.com. The code then analyses the controls on the page. For this to work, I need to know the actual ids of the login and password input elements. I find the elements by id and set their values to the actual login and password. Then I simulate a click on the login button.

The next meaningful event is browser_Navigated on the my.yahoo.com page (see below).

After that, I point the browser to the URL of the document I want to read or download. I catch browser_DocumentCompleted again on that page and read the contents using WebBrowser.Document.Body.InnerText (end of the code snippet).

void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
 //loaded the Yahoo login page
 if (_browser.Url.AbsoluteUri.Contains(LoginUrl))
 {
  if (_browser.Document != null)
  {
   //Find and fill the "username" textbox
   HtmlElementCollection collection = _browser.Document.GetElementsByTagName("input");
   foreach (HtmlElement element in collection)
   {
    string name = element.GetAttribute("id");
    if (name == "username")
    {
     element.SetAttribute("value", _login);
     break;
    }
   }

   //Find and fill the "password" field
   foreach (HtmlElement element in collection)
   {
    string name = element.GetAttribute("id");
    if (name == "passwd")
    {
     element.SetAttribute("value", _password);
     break;
    }
   }

   //Submit the form
   collection = _browser.Document.GetElementsByTagName("button");
   foreach (HtmlElement element in collection)
   {
    string name = element.GetAttribute("id");
    if (name == ".save")
    {
     element.InvokeMember("click");
     break;
    }
   }
  }
 }
 
 //downloaded "quote.csv"
 if(_browser.Url.AbsoluteUri.Contains(".csv"))
 {
  if (_browser.Document != null && _browser.Document.Body != null)
  {
   string s = _browser.Document.Body.InnerText;
  }
 }
}

Here I actually copy the cookies, but that is not necessary: the WebBrowser keeps them internally and uses them. The purpose of this code is to check whether the browser has been redirected to "my.yahoo.com", which indicates a successful login. Further logic may be applied from here.

void browser_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
 //Successful login takes to "my.yahoo.com"
 if (_browser.Url.AbsoluteUri.Contains(MyYahoo))
 {
  if (_browser.Document != null && !String.IsNullOrEmpty(_browser.Document.Cookie))
  {
   _cookies = _browser.Document.Cookie;
  }
 }
}
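
The URL checks spread across the two handlers can also be gathered in one place. The following is only a sketch of that idea, not the code I actually used: the LoginStage enum, the NavigationStages class and the host values are assumptions made for illustration (the post does not show the values of LoginUrl and MyYahoo). It compares the host instead of using Contains, so that a login URL which carries "my.yahoo.com" in its query string is not mistaken for a successful login.

```csharp
using System;

// Hypothetical names, introduced only for this sketch.
public enum LoginStage { Other, LoginPage, LoggedIn, DataFile }

public static class NavigationStages
{
    // Assumed values; the hosts are named in the text above,
    // but the actual constants are not shown.
    public const string LoginHost = "login.yahoo.com";
    public const string MyYahooHost = "my.yahoo.com";

    // Decides which step of the login sequence a URL belongs to.
    public static LoginStage Classify(Uri url)
    {
        if (url == null)
        {
            return LoginStage.Other;
        }

        // Compare the host only, so that a redirect target inside
        // the query string cannot cause a false match.
        if (String.Equals(url.Host, LoginHost, StringComparison.OrdinalIgnoreCase))
        {
            return LoginStage.LoginPage;
        }
        if (String.Equals(url.Host, MyYahooHost, StringComparison.OrdinalIgnoreCase))
        {
            return LoginStage.LoggedIn;
        }

        // AbsolutePath excludes the query string, so "quote.csv?s=MSFT" still matches.
        if (url.AbsolutePath.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
        {
            return LoginStage.DataFile;
        }
        return LoginStage.Other;
    }
}
```

Each handler could then call NavigationStages.Classify(_browser.Url) and switch on the result.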

The second approach is shorter, but it took me longer to figure out. Here I have to explicitly use a CookieContainer to save the cookies "harvested" by the HttpWebRequest that does the login, and then use them in the HttpWebRequest that asks for the file after authentication. Of course, I still need to know the names of the login and password elements, because I'm sending the values in the POST data.

Step one - authentication

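// URL-encode the values so that characters such as '&' or '=' do not break the form data.
string strPostData = String.Format("login={0}&passwd={1}", Uri.EscapeDataString(_login), Uri.EscapeDataString(_password));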

// Setup the http request.
HttpWebRequest wrWebRequest = WebRequest.Create(LoginUrl) as HttpWebRequest;
wrWebRequest.Method = "POST";
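// ContentLength is measured in bytes, not characters; StreamWriter encodes as UTF-8 by default.
wrWebRequest.ContentLength = System.Text.Encoding.UTF8.GetByteCount(strPostData);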
wrWebRequest.ContentType = "application/x-www-form-urlencoded";
_yahooContainer = new CookieContainer();
wrWebRequest.CookieContainer = _yahooContainer;

// Post to the login form.
using (StreamWriter swRequestWriter = new StreamWriter(wrWebRequest.GetRequestStream()))
{
 swRequestWriter.Write(strPostData);
}

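// Get the response. Any cookies it sets are stored in _yahooContainer.
HttpWebResponse hwrWebResponse = (HttpWebResponse)wrWebRequest.GetResponse();
// Close the response to release the connection; the cookies stay in the container.
hwrWebResponse.Close();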
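
If the login form has more than two fields, the POST body from step one can be assembled by a small helper. This is a minimal sketch; the FormData class and its Build method are hypothetical names, not part of any library. It relies on Uri.EscapeDataString, which encodes a space as %20 rather than the '+' that browsers usually send for forms; servers generally accept both, but I have not verified that for every site.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, introduced only for this sketch.
public static class FormData
{
    // Joins name/value pairs into an application/x-www-form-urlencoded string.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            // Escape both the name and the value; a null value is sent as empty.
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value ?? String.Empty));
        }
        return sb.ToString();
    }

    // Returns the bytes to write to the request stream; their count is the ContentLength.
    public static byte[] ToBytes(string formData)
    {
        return Encoding.UTF8.GetBytes(formData);
    }
}
```

The result of FormData.Build can take the place of strPostData above, and the length of the array returned by FormData.ToBytes is the value to assign to ContentLength.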

Step two - accessing data using the cookies.

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(_downloadUrl);
req.CookieContainer = _yahooContainer;
// Dispose both the response and the reader once the content has been read.
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader streamReader = new StreamReader(resp.GetResponseStream()))
{
 string t = streamReader.ReadToEnd();
}
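
To see why sharing the CookieContainer between the two requests is enough, it helps to look at what the container hands out for a given URL. The sketch below works without any network access: it puts a made-up cookie into a container, the way a login response would, and asks which Cookie header a later request to a URL would carry. The CookieReuse class, the example.com hosts and the cookie name and value are all assumptions made for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper, introduced only for this sketch.
public static class CookieReuse
{
    // Simulates what a login response does: stores a session cookie
    // for the host that issued it (made-up host and cookie).
    public static CookieContainer CreateContainerWithSessionCookie()
    {
        CookieContainer container = new CookieContainer();
        container.Add(new Uri("https://login.example.com/"), new Cookie("session", "abc123"));
        return container;
    }

    // Returns the Cookie header value the container would attach
    // to a request for the given URL (empty if no cookie applies).
    public static string CookieHeaderFor(CookieContainer container, string url)
    {
        return container.GetCookieHeader(new Uri(url));
    }
}
```

Calling CookieHeaderFor with a URL on the same host shows the stored cookie, while a URL on an unrelated host gets nothing. That is why the second request only needs the same container assigned to its CookieContainer property.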

