
IntelliLink - An Alternative Windows Version to Online Link Managers

6 Sep 2024 · GPL3 · 6 min read
A look at the URLDownloadToFile function and architecture of IntelliLink
The easiest way to check the content of a web page is to download it locally (to a file) and search through it. To accomplish this, we look at the URLDownloadToFile function. The article then looks at the architecture of IntelliLink.

Introduction

Have you ever owned a website? Have you done sysadmin work for somebody else? Have you taken part in a link exchange? If so, you probably want to monitor the internal and external links to and from your site.

Background

The easiest way to check the contents of a web page is to download it locally (to a file) and search through it. To accomplish this, we use the URLDownloadToFile function, which has the following syntax:

C++
HRESULT URLDownloadToFile(
	LPUNKNOWN pCaller,
	LPCTSTR szURL,
	LPCTSTR szFileName,
	_Reserved_  DWORD dwReserved,
	LPBINDSTATUSCALLBACK lpfnCB);

Parameters

  • pCaller: A pointer to the controlling IUnknown interface of the calling ActiveX component, if the caller is an ActiveX component. If the calling application is not an ActiveX component, this value can be set to NULL. Otherwise, the caller is a COM object that is contained in another component, such as an ActiveX control in the context of an HTML page. This parameter represents the outermost IUnknown of the calling component. The function attempts the download in the context of the ActiveX client framework, and allows the caller container to receive callbacks on the progress of the download.
  • szURL: A pointer to a string value that contains the URL to download. Cannot be set to NULL. If the URL is invalid, INET_E_DOWNLOAD_FAILURE is returned.
  • szFileName: A pointer to a string value containing the name or full path of the file to create for the download. If szFileName includes a path, the target directory must already exist.
  • dwReserved: Reserved. Must be set to 0.
  • lpfnCB: A pointer to the IBindStatusCallback interface of the caller. By using IBindStatusCallback::OnProgress, a caller can receive download status. URLDownloadToFile calls the IBindStatusCallback::OnProgress and IBindStatusCallback::OnDataAvailable methods as data is received. The download operation can be cancelled by returning E_ABORT from any callback. This parameter can be set to NULL if status is not required.

Return Value

This function can return one of these values:

  • S_OK: The download started successfully.
  • E_OUTOFMEMORY: The buffer length is invalid, or there is insufficient memory to complete the operation.
  • INET_E_DOWNLOAD_FAILURE: The specified resource or callback interface was invalid.

Our implementation, built on the above function, is:

C++
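// Scans the downloaded file line by line for href="<strTargetURL>" followed,
// on the same line, by the link text >strURLName<. Deletes the file when done.
// Note: strSourceURL is not used by this function.
BOOL ProcessHTML(CString strFileName, CString strSourceURL, 
                 CString strTargetURL, CString strURLName)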
{
   CString strURL;
   CString strFileLine;
   CString strLineMark;
   BOOL bRetVal = FALSE;

   try
   {
      CStdioFile pInputFile(strFileName, CFile::modeRead | CFile::typeText);
      while (pInputFile.ReadString(strFileLine))
      {
         int nIndex = strFileLine.Find(_T("href="), 0);
         while (nIndex >= 0)
         {
            const int nFirst = strFileLine.Find(_T('\"'), nIndex);
            if (nFirst >= 0)
            {
               const int nLast = strFileLine.Find(_T('\"'), nFirst + 1);
               if (nLast >= 0)
               {
                  strURL = strFileLine.Mid(nFirst + 1, nLast - nFirst - 1);
                  if (strURL.CompareNoCase(strTargetURL) == 0)
                  {
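                     // CString arguments passed to variadic functions get an explicit LPCTSTR cast
                     TRACE(_T("URL found - %s\n"), static_cast<LPCTSTR>(strTargetURL));
                     strLineMark.Format(_T(">%s<"), static_cast<LPCTSTR>(strURLName));
                     if (strFileLine.Find(strLineMark, nLast + 1) >= 0)
                     {
                        TRACE(_T("Name found - %s\n"), static_cast<LPCTSTR>(strURLName));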
                        bRetVal = TRUE;
                     }
                  }
               }
            }
            nIndex = (nFirst == -1) ? -1 : strFileLine.Find(_T("href="), nFirst + 1);
         }
      }
      pInputFile.Close();
   }
   catch (CFileException* pFileException)
   {
      TCHAR lpszError[MAX_STR_LENGTH] = { 0 };
      pFileException->GetErrorMessage(lpszError, MAX_STR_LENGTH);
      pFileException->Delete();
      OutputDebugString(lpszError);
      bRetVal = FALSE;
   }
   VERIFY(DeleteFile(strFileName));
   return bRetVal;
}

BOOL CLinkData::IsValidLink()
{
   BOOL bRetVal = TRUE;
   TCHAR lpszTempPath[MAX_STR_LENGTH] = { 0 };
   TCHAR lpszTempFile[MAX_STR_LENGTH] = { 0 };

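   // GetTempPath returns 0 on failure, or the required buffer size if the buffer is too small.
   const DWORD dwTempPath = GetTempPath(MAX_STR_LENGTH, lpszTempPath);
   if ((dwTempPath == 0) || (dwTempPath >= MAX_STR_LENGTH))
   {
      TRACE(_T("GetTempPath has failed\n"));
      return FALSE;
   }
   // With uUnique set to 0, GetTempFileName creates an empty file with a unique name.
   if (GetTempFileName(lpszTempPath, _T("html"), 0, lpszTempFile) != 0)
   {
      TRACE(_T("URLDownloadToFile(%s)...\n"), static_cast<LPCTSTR>(GetSourceURL()));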
      if (URLDownloadToFile(NULL, GetSourceURL(), lpszTempFile, 0, NULL) == S_OK)
      {
         if (!ProcessHTML(lpszTempFile, GetSourceURL(), GetTargetURL(), GetURLName()))
         {
            TRACE(_T("ProcessHTML(%s) has failed\n"), lpszTempFile);
            bRetVal = FALSE;
         }
      }
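      else
      {
         TRACE(_T("URLDownloadToFile has failed\n"));
         // GetTempFileName already created the file, so remove it here;
         // on the success path ProcessHTML deletes it.
         DeleteFile(lpszTempFile);
         bRetVal = FALSE;
      }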
   }
   else
   {
      TRACE(_T("GetTempFileName has failed\n"));
      bRetVal = FALSE;
   }
   return bRetVal;
}

The Architecture

What do Source URL, Target URL, and URL Name mean in the above piece of code?

  • Source URL = what web page to check
  • Target URL = what link should be on the above web page
  • URL Name = what name should be for the above link
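To make the three terms concrete, here is a minimal, portable sketch of the per-line check that ProcessHTML performs. It is not the article's code: std::string stands in for CString, and the names EqualsNoCase and LineHasLink are hypothetical helpers introduced only for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive comparison, standing in for CString::CompareNoCase (hypothetical helper).
bool EqualsNoCase(const std::string& a, const std::string& b)
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y)
                      { return std::tolower(x) == std::tolower(y); });
}

// Returns true if 'line' contains href="targetUrl" and, later on the same
// line, the link text >urlName< (hypothetical helper mirroring ProcessHTML).
bool LineHasLink(const std::string& line, const std::string& targetUrl,
                 const std::string& urlName)
{
    const std::string mark = ">" + urlName + "<";
    std::string::size_type index = line.find("href=");
    while (index != std::string::npos)
    {
        const auto first = line.find('"', index);
        if (first == std::string::npos)
            break;                       // no opening quote: nothing more to parse
        const auto last = line.find('"', first + 1);
        if (last != std::string::npos)
        {
            const std::string url = line.substr(first + 1, last - first - 1);
            if (EqualsNoCase(url, targetUrl) &&
                line.find(mark, last + 1) != std::string::npos)
                return true;             // Target URL and URL Name both found
        }
        index = line.find("href=", first + 1);
    }
    return false;
}
```

For a Source URL whose page contains the line `<a href="https://example.com/">Example</a>`, a Target URL of `https://example.com/` and a URL Name of `Example` would pass this check. Like the original, the sketch only recognises double-quoted href values and requires the link text to sit on the same line as the href.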

Each URL definition is contained in a CLinkData class, with the following interface:

  • DWORD GetLinkID(); - Gets ID of the current URL definition
  • void SetLinkID(DWORD dwLinkID); - Sets ID for the current URL definition
  • CString GetSourceURL(); - Gets Source URL for current URL definition
  • void SetSourceURL(CString strSourceURL); - Sets Source URL for current URL definition
  • CString GetTargetURL(); - Gets Target URL for current URL definition
  • void SetTargetURL(CString strTargetURL); - Sets Target URL for current URL definition
  • CString GetURLName(); - Gets URL Name for current URL definition
  • void SetURLName(CString strURLName); - Sets URL Name for current URL definition
  • int GetPageRank(); - Currently not implemented
  • void SetPageRank(int nPageRank); - Currently not implemented
  • BOOL GetStatus(); - Gets status for current URL definition
  • void SetStatus(BOOL bStatus); - Sets status for current URL definition

Then, we define CLinkList as typedef CArray<CLinkData*> CLinkList;.

This list is managed inside the CLinkSnapshot class, with the following interface:

  • BOOL RemoveAll(); - Removes all URL definitions from list
  • int GetSize(); - Gets the size of URL definition list
  • CLinkData* GetAt(int nIndex); - Gets a URL definition from list
  • BOOL Refresh(); - Updates the status for each URL definition from list
  • CLinkData* SelectLink(DWORD dwLinkID); - Searches for a URL definition by its ID
  • DWORD InsertLink(CString strSourceURL, CString strTargetURL, CString strURLName, int nPageRank, BOOL bStatus); - Inserts a URL definition into list
  • BOOL DeleteLink(DWORD dwLinkID); - Removes a URL definition from list
  • BOOL LoadConfig(); - Loads the URL definition list from XML file
  • BOOL SaveConfig(); - Saves the URL definition list to XML file
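The insert/select/delete part of this interface can be sketched in portable C++ as follows. This is an illustration under stated assumptions, not the actual CLinkSnapshot implementation: LinkRecord and LinkSnapshotSketch are hypothetical names, std::vector with std::unique_ptr replaces CArray<CLinkData*>, and the sequential ID scheme is a guess at how IDs might be assigned.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for CLinkData (hypothetical; plain fields instead of accessors).
struct LinkRecord
{
    std::uint32_t id = 0;
    std::string sourceUrl;
    std::string targetUrl;
    std::string urlName;
    bool status = false;
};

class LinkSnapshotSketch
{
public:
    // Inserts a URL definition and returns its newly assigned ID.
    std::uint32_t InsertLink(std::string sourceUrl, std::string targetUrl,
                             std::string urlName, bool status)
    {
        auto link = std::make_unique<LinkRecord>();
        link->id = m_nextId++;           // assumption: IDs are handed out sequentially
        link->sourceUrl = std::move(sourceUrl);
        link->targetUrl = std::move(targetUrl);
        link->urlName = std::move(urlName);
        link->status = status;
        m_links.push_back(std::move(link));
        return m_links.back()->id;
    }

    // Returns the definition with the given ID, or nullptr if there is none.
    LinkRecord* SelectLink(std::uint32_t id) const
    {
        for (const auto& link : m_links)
            if (link->id == id)
                return link.get();
        return nullptr;
    }

    // Removes the definition with the given ID; returns false if it was not found.
    bool DeleteLink(std::uint32_t id)
    {
        const auto it = std::find_if(m_links.begin(), m_links.end(),
            [id](const std::unique_ptr<LinkRecord>& link) { return link->id == id; });
        if (it == m_links.end())
            return false;
        m_links.erase(it);               // unique_ptr frees the record
        return true;
    }

    int GetSize() const { return static_cast<int>(m_links.size()); }

private:
    std::vector<std::unique_ptr<LinkRecord>> m_links;
    std::uint32_t m_nextId = 1;
};
```

Owning the records through std::unique_ptr means erasing an entry frees it automatically, whereas a CArray of raw CLinkData* pointers has to delete each element explicitly in RemoveAll and DeleteLink.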

The Good, the Bad, and the Ugly

The good thing is that I learned to use Windows ribbons. The bad thing is that I still don't know how to get a web page's PageRank value. The ugly thing is that the processing (i.e., checking link validity) currently runs on the calling thread; it should be done in a separate worker thread, and I am planning this change for the next release. Stay tuned!
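One possible shape for that worker-thread change, sketched with the standard library rather than MFC threading. Everything here is an assumption for illustration: CheckLinksAsync is a hypothetical name, and the check callback stands in for whatever per-link validation the application performs.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Runs 'check' for every URL on a background thread and returns a future
// holding one result per URL, in the same order as the input.
std::future<std::vector<bool>> CheckLinksAsync(
    std::vector<std::string> urls,
    std::function<bool(const std::string&)> check)
{
    return std::async(std::launch::async,
        [urls = std::move(urls), check = std::move(check)]()
        {
            std::vector<bool> results;
            results.reserve(urls.size());
            for (const auto& url : urls)
                results.push_back(check(url));   // executes off the calling thread
            return results;
        });
}
```

The caller keeps the returned future and collects the results with get() once they are needed. In a GUI application the worker would more likely notify the main window when it finishes, so that the interface thread never blocks.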

Final Words

The IntelliLink application uses many components that have been published on CodeProject. Many thanks to:

  • My CMFCListView form view (see source code)
  • Lee Thomason for his TinyXML2 class
  • PJ Naughter for his CTrayNotifyIcon class
  • PJ Naughter for his CVersionInfo class

Further plans: I would like to add support for Google's PageRank as soon as possible.

History

  • Version 1.04 (November 9th, 2014): Initial release
  • Moved source code from CodeProject to GitLab (April 10th, 2020)
  • Moved source code from GitLab to GitHub (February 23rd, 2022)
  • Version 1.05 (April 28th, 2022): Added setup project
  • Version 1.06 (May 23rd, 2022): Added program to Startup Apps
  • Version 1.07 (August 20th, 2022): Updated font size of About dialog
  • Version 1.08 (August 26th, 2022): Removed program from Startup Apps
  • Version 1.09 (September 9th, 2022): Added Contributors hyperlink to AboutBox dialog
  • Version 1.10 (January 23rd, 2023): Updated PJ Naughter's CVersionInfo library to the latest version available:
    • Updated the code to use C++ uniform initialization for all variable declarations
  • Version 1.11 (January 24th, 2023): Updated PJ Naughter's CInstanceChecker library to the latest version available:
    • Updated the code to use C++ uniform initialization for all variable declarations
    • Replaced NULL throughout the codebase with nullptr
    • Replaced BOOL throughout the codebase with bool
    • This means that the minimum requirement for the application is now Microsoft Visual C++ 2010
  • Version 1.12 (May 27th, 2023): Updated About dialog with GPLv3 notice
  • Version 1.13 (June 16th, 2023): Made the interface's column widths persistent
  • Version 1.14 (June 24th, 2023): Updated PJ Naughter's CTrayNotifyIcon library to the latest version available
  • Version 1.15 (July 20th, 2023): Extended application's functionality with two new buttons: Website Review and Webmaster Tools
  • Version 1.16 (August 20th, 2023):
    • Changed article's download link. Updated the About dialog (email & website)
    • Added social media links: Twitter, LinkedIn, Facebook, and Instagram
    • Added shortcuts to GitHub repository's Issues, Discussions, and Wiki
  • Version 1.17 (October 29th, 2023): Updated PJ Naughter's CTrayNotifyIcon library to the latest version available:
    • Fixed an issue where the CTrayNotifyIcon::OnTrayNotification callback method would not work correctly if the m_NotifyIconData.uTimeout member variable gets updated during runtime of client applications. This can occur when you call CTrayNotifyIcon::SetBalloonDetails. Thanks to Maisala Tuomo for reporting this bug.
  • Version 1.18 (January 27th, 2024): Added ReleaseNotes.html and SoftwareContentRegister.html to GitHub repo
  • Version 1.19 (February 21st, 2024): Switched MFC application's theme back to native Windows
  • Version 1.20 (September 6th, 2024):
    • Replaced old XML library from CodeProject with Lee Thomason's TinyXML2 library.
    • Implemented User Manual option into Help menu.
    • Implemented Check for updates... option into Help menu.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)