Introduction
This article is a guide to programmatically performing actions on web sites. I use the submission of links to DZone as the example. These days there is a lot of emphasis on sharing your posts, links, and articles with the whole community, so DZone link submission seemed like a good illustration. At the heart of the implementation are the HttpWebRequest and HttpWebResponse objects. I mention these classes because I am using the .NET Framework, but behind the scenes it is as simple as sending an HTTP request and analyzing the response, so you can use whatever tool you have at hand. I will explain each step that I followed to arrive at the solution. These steps work for most kinds of applications.
Use Web Site to Perform the Action
First, analyze the action you want to perform: how the web site sends its request and what kind of response comes back. These two pieces of analysis drive the whole solution. Take the example of submitting a new link on the DZone site. You click "Add a new link" and are taken to a page that asks you to log in. After you log in, you are sent to a page where you supply values for URL, Title, Description, Tags, etc. Then you click the "Submit" button and you are done. Based on this, the following are the steps that you need to perform programmatically.
- Submit request to add link
- Catch redirect to login page
- Perform login into site
- Send request to add page after successful login
To see what the browser does to perform all these actions, fire up a tool like Fiddler and monitor the requests and responses. If you can mimic them, you are good to go. Now let's see how to perform this action programmatically.
Submit Add Request
You will use the HttpWebRequest object to send a request to http://www.dzone.com/links/add.html. At this point you do not have to worry about other parameters such as Title or URL, because the request is not going to go through: you are not logged in to the site. In technical terms, you have not established an authenticated session with the site.
Catch Redirect To Login Page
When you send an unauthorized request to add a link, the site redirects you to the login page. That is, when you send an HTTP request for the add.html page, the server sends an HTTP response with status code 302, which means the request is being redirected, and it supplies the redirection target in the Location response header. So programmatically you need to submit a request, check the response status code, and read the Location header. The code is shown below:
// Requires: using System; using System.Diagnostics; using System.Net;
static string GetLoginUrl(CookieContainer cookies, string targetUrl)
{
    const int maxRedirects = 20;
    int hops = 1;
    bool foundIt = false;
    string loginUrl = targetUrl;
    do
    {
        HttpWebRequest webReq = (HttpWebRequest)WebRequest.Create(loginUrl);
        webReq.CookieContainer = cookies;   // collect cookies across hops
        webReq.AllowAutoRedirect = false;   // follow redirects by hand
        Debug.WriteLine(string.Format("Hop[{0}] - {1}", hops++, loginUrl));
        // Read the status and headers before the response is disposed.
        using (HttpWebResponse webResp = (HttpWebResponse)webReq.GetResponse())
        {
            if (webResp.StatusCode == HttpStatusCode.Found)
            {
                // Location may be relative, so resolve it against the current URL.
                loginUrl = new Uri(new Uri(loginUrl), webResp.Headers["Location"]).AbsoluteUri;
            }
            else
            {
                foundIt = (webResp.StatusCode == HttpStatusCode.OK);
                break;
            }
        }
    } while (hops <= maxRedirects);
    return foundIt ? loginUrl : string.Empty;
}
Notice that the code is in a do-while loop. The reason is that some sites redirect you through a couple of pages before sending you to the final login page, so I have limited the loop to 20 hops.
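The loop above treats only 302 as a redirect. Servers may also answer with 301, 303, 307, or 308, and the Location value is allowed to be a relative path. The following is a minimal sketch of two helpers that cover those cases. The names RedirectHelper, IsRedirect, and ResolveLocation are illustrative assumptions and are not part of the original sample.

```csharp
using System;
using System.Net;

static class RedirectHelper
{
    // True for the status codes that normally carry a Location header to follow.
    public static bool IsRedirect(HttpStatusCode status)
    {
        int code = (int)status;
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Resolves a Location value (absolute or relative) against the URL just requested.
    public static string ResolveLocation(string currentUrl, string location)
    {
        return new Uri(new Uri(currentUrl), location).AbsoluteUri;
    }
}
```

In a hop-following loop, IsRedirect could replace the comparison with HttpStatusCode.Found, and ResolveLocation could be applied to the header value before the next request is created.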
Cookies
This is the most important part of the whole implementation. When you start a session with a site, it sends some cookies in the response, and it expects some of those cookies back in subsequent requests to confirm that you have an authorized session open. If you look at the code above, I attached a CookieContainer object to the request so that all the cookies sent in the response are collected. The same container can then be attached to subsequent requests.
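To see what a CookieContainer would send back for a given URL, you can ask it for the Cookie header directly. The sketch below seeds a container by hand with a stand-in session cookie. The cookie name and value are made up for illustration, and in the real flow the server's responses fill the container for you. CookieDemo and its methods are hypothetical names.

```csharp
using System;
using System.Net;

static class CookieDemo
{
    // Builds a container holding one made-up session cookie for www.dzone.com.
    public static CookieContainer SampleContainer()
    {
        var cookies = new CookieContainer();
        // Stand-in for a cookie the server would set; name and value are assumptions.
        cookies.Add(new Cookie("JSESSIONID", "abc123", "/", "www.dzone.com"));
        return cookies;
    }

    // Returns the Cookie header value the container would attach to a request for url.
    public static string CookiesFor(CookieContainer cookies, string url)
    {
        return cookies.GetCookieHeader(new Uri(url));
    }
}
```

Calling CookiesFor with http://www.dzone.com/links/add.html returns a header that includes the stored cookie, while a URL on another host gets nothing. This is why the same container instance has to be reused for every request in the session.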
Perform Login
When you log in on the site, the browser submits a FORM to the server with key-value pairs that contain the data required to validate the user. You can use the Internet Explorer Toolbar, Firebug, or any other tool to inspect the HTML of the page and locate the FORM tag and the values that need to be sent. I used Firebug to inspect that section and find the values I needed. The following images show the result:
You can see that there is a FORM that uses the POST method, with its action pointing to /links/j_acegi_security_check. It has two text boxes, with element names j_username and j_password, that take the login information and are submitted with the POST request. These are the pieces of information you need to perform the login action. The following code shows how this is accomplished:
RequestAttributes reqAttribs = new RequestAttributes();
reqAttribs.OverrideConfigurationSettings = true;
reqAttribs.AllowSecureSiteCrawl = true;
reqAttribs.AutoRediectEnabled = false;
reqAttribs.MaxRedirects = 100;
reqAttribs.IsPost = true;
reqAttribs.RequestUrl = "http://www.dzone.com/links/j_acegi_security_check";
reqAttribs.CookieContainer = container;
reqAttribs.RequestParameters.Add("j_username", "xxxxxx");
reqAttribs.RequestParameters.Add("j_password", "xxxxxx");
HttpProtocol obHttp = new HttpProtocol(reqAttribs);
HttpProtocolOutput obOutput = obHttp.GetProtocolOutput();
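RequestAttributes and HttpProtocol come from the sample project. If you do not have those helpers, roughly the same login POST can be sketched with HttpWebRequest alone. The class FormPoster and its methods are assumptions made for illustration. The sketch encodes spaces as %20 rather than +, which servers generally accept in form data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class FormPoster
{
    // Encodes key-value pairs as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }

    // Sends the POST with auto-redirect off so the caller can inspect a 302 itself.
    public static HttpWebResponse Post(string url, string body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;
        request.AllowAutoRedirect = false;
        byte[] payload = Encoding.UTF8.GetBytes(body);
        request.ContentLength = payload.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }
        return (HttpWebResponse)request.GetResponse();
    }
}
```

For the DZone login, the fields would be j_username and j_password, and the URL would be http://www.dzone.com/links/j_acegi_security_check, with the same cookie container that was used while following the redirects.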
Did Login Succeed?
After you execute the above request and get the response back, the big question is how to check whether the login succeeded. You can't rely on the status code of the response alone, because 200 only means that the request succeeded, not that the login did. There are a couple of things you can check. Some sites redirect you to a landing page after login, so you can check whether you got a 302 response code. A surer way is to parse the response and see whether there is a login box on the page. For example, in the case of the DZone.com site, you can check whether the page contains a markup node whose name attribute has the value j_username, or any other markup that is unique to the login page. If you find that node, the login did not work. Here is some sample code that I used for my application.
static bool CheckLoginStatus(HttpProtocolOutput loginRespOutput)
{
    // Wrap the raw response bytes so the HTML parser can read them.
    ParserStream obStream = new ParserStream(
        new System.IO.MemoryStream(loginRespOutput.Content.ContentData));
    Source obSource = new InputStreamSource(
        obStream, null, loginRespOutput.Content.ContentData.Length);
    Page obPage = new Page(obSource);
    obPage.Url = "http://www.dzone.com/links/j_acegi_security_check";
    Lexer obLexer = new Lexer(obPage);
    Parser obParser = new Parser(obLexer);
    // Look for the user name field of the login form.
    HasAttributeFilter filter = new HasAttributeFilter("name", "j_username");
    NodeList oNodes = obParser.ExtractAllNodesThatMatch(filter);
    // No login field on the page means the login succeeded.
    return (oNodes.Count == 0);
}
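If you do not want to pull in an HTML parser, a rough version of the same check can be done with a regular expression over the response text. This is only a sketch: a regex is less robust than real parsing, and LoginCheck and LoginSucceeded are hypothetical names that are not part of the sample project.

```csharp
using System.Text.RegularExpressions;

static class LoginCheck
{
    // Matches name="j_username" or name='j_username', ignoring case and extra spacing.
    static readonly Regex LoginField = new Regex(
        "name\\s*=\\s*[\"']j_username[\"']", RegexOptions.IgnoreCase);

    // Returns true when the login field is absent, i.e. the login appears to have worked.
    public static bool LoginSucceeded(string html)
    {
        return !LoginField.IsMatch(html);
    }
}
```

The method takes the response body as a string, so the caller has to read and decode the response stream first.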
Submit New Request with Authorized Session
Throughout the login and redirection process, keep the cookie container around so that it keeps collecting all the cookies. You need this cookie container to send the request that submits your link. Now you just need to send a new POST request to the target URL with the appropriate FORM parameters, such as title, URL, and description.
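A minimal sketch of preparing that final request follows. The form field names title, url, and description are placeholders, since the article does not list the real ones. Confirm them with Fiddler or by inspecting the FORM markup of the add page. LinkSubmit and its methods are likewise assumed names.

```csharp
using System;
using System.Net;

static class LinkSubmit
{
    // Field names here are placeholders; check the real ones in the page's FORM.
    public static string BuildBody(string title, string url, string description)
    {
        return "title=" + Uri.EscapeDataString(title)
             + "&url=" + Uri.EscapeDataString(url)
             + "&description=" + Uri.EscapeDataString(description);
    }

    // Prepares (but does not send) the POST, reusing the cookies gathered during login.
    public static HttpWebRequest CreateRequest(string targetUrl, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(targetUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;
        return request;
    }
}
```

To send it, write the bytes of the body to request.GetRequestStream() and then call request.GetResponse(). The important part is that cookies is the same container instance that went through the login step.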
Sample Project
A sample project and other pre-requisites for the code shown here are available at ByteBlocks.