Web scraping example in C#

Web Scraping Made Easy with C#

Hey there! Have you ever had to find the website for a long list of companies but only had their names? It’s definitely a tedious task, especially when dealing with hundreds of companies. That’s why I decided to build a program using the Selenium WebDriver in C# to automate this process. With just a simple list of company names, the program will search for their website on Google and return the results. It’s a real time-saver and makes the task much more enjoyable. So, let’s sit back, relax, and see how this program can make your life easier!


The Program

The program we’ll be looking at today is a C# script that automates the search for the website URLs of a list of companies. I wrote it to avoid looking up website addresses by hand for a large number of companies. The program starts by defining a list of company names, then uses a foreach loop to iterate through the list and run a Google search for each company. The Selenium WebDriver library drives a Chrome browser, and the program extracts the URL of the first search result returned by Google. Finally, it prints each company and its corresponding website URL to the console. You could also export the data to Excel if you need to.
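To make that flow concrete, here is a minimal sketch of the outer loop. It is not the original program: `FindWebsite` is a hypothetical stand-in that returns a placeholder URL so the sketch runs without a browser, and the company names and the `CsvField` and `BuildReport` helpers are made up for illustration. The output is written as comma-separated lines, which is one simple way to get the results into Excel.

```csharp
using System;
using System.Collections.Generic;

public static class CompanySiteReport
{
    // Hypothetical stand-in for the Selenium-driven Google lookup.
    // It only builds a placeholder URL so this sketch runs without a browser.
    public static string FindWebsite(string companyName)
    {
        return "https://example.com/?company=" + Uri.EscapeDataString(companyName);
    }

    // Quote a field if it contains a comma, a quote, or a line break,
    // so that the line can be opened as CSV in Excel.
    public static string CsvField(string value)
    {
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0;
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }

    // Iterate over the company names and build one "company,url" line per entry.
    public static List<string> BuildReport(IEnumerable<string> companies,
                                           Func<string, string> findWebsite)
    {
        var lines = new List<string> { "Company,Website" };
        foreach (string company in companies)
        {
            string url = findWebsite(company);
            lines.Add(CsvField(company) + "," + CsvField(url));
        }
        return lines;
    }

    public static void Main()
    {
        // Example company names (assumed, not from the original program).
        var companies = new List<string> { "Contoso", "Fabrikam, Inc." };

        foreach (string line in BuildReport(companies, FindWebsite))
        {
            Console.WriteLine(line);
        }
    }
}
```

Passing the lookup in as a `Func<string, string>` keeps the loop independent of Selenium, so the real browser-based search can be swapped in later without touching the reporting code.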

Web scraping

After initializing the Selenium WebDriver, the program navigates to the Google search page and uses the driver’s FindElement method to locate the search box by its name attribute. It then types the search query with the SendKeys method, concatenating the company name with the string “site internet” (French for “website”), and submits the search by calling the Submit method on the search box. Once the results page has loaded, the program calls FindElement again to locate the anchor tag inside the first search result container, then calls GetAttribute on that anchor to read its href attribute, which is the URL of the first result.
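Those steps can be sketched roughly as follows. Treat this as an illustration under assumptions, not the original code: it assumes the Selenium.WebDriver NuGet package and a local Chrome installation, the search box name "q" and the CSS selector "div.g a" are guesses at Google's markup that change over time, and the helper names `BuildQuery` and `FindFirstResultUrl` are mine. Google may also show a consent page or block automated traffic, which this sketch does not handle.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class FirstResultLookup
{
    // Build the search query: company name followed by "site internet".
    public static string BuildQuery(string companyName)
    {
        return companyName.Trim() + " site internet";
    }

    // Search Google for the company and return the href of the first result link.
    public static string FindFirstResultUrl(IWebDriver driver, string companyName)
    {
        driver.Navigate().GoToUrl("https://www.google.com");

        // ASSUMPTION: the search box is named "q".
        IWebElement searchBox = driver.FindElement(By.Name("q"));
        searchBox.SendKeys(BuildQuery(companyName));
        searchBox.Submit();

        // ASSUMPTION: "div.g a" matches the first link in the first result
        // container; adjust the selector to the markup you actually see.
        IWebElement firstLink = driver.FindElement(By.CssSelector("div.g a"));
        return firstLink.GetAttribute("href");
    }

    public static void Main()
    {
        // Disposing the driver closes the browser when the block ends.
        using (IWebDriver driver = new ChromeDriver())
        {
            // Let FindElement retry for a few seconds while pages load.
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(5);

            Console.WriteLine(FindFirstResultUrl(driver, "Contoso"));
        }
    }
}
```

To process a whole list, call `FindFirstResultUrl` inside the foreach loop and reuse the same driver instead of opening a new browser for every company.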

Selenium WebDriver

While Selenium WebDriver is a powerful tool for automating web tasks, it also presents some challenges. One common issue is the ElementNotInteractableException, which is thrown when the program tries to interact with an element that is not yet fully loaded or is not visible on the page. To avoid this, the program can use an explicit wait (the WebDriverWait class in the C# bindings) to pause until the element is visible before interacting with it. Hidden elements trigger the same exception, so the wait condition should check that the element is actually displayed, not just that it exists in the page.
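Here is one way such a wait could look. It is a sketch under assumptions: it relies on the Selenium.WebDriver and Selenium.Support NuGet packages, the helper name `WaitUntilInteractable` and the 10-second timeout are my own choices, and the "q" search box name is again a guess at Google's markup.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class WaitHelpers
{
    // Poll until the element exists, is displayed, and is enabled,
    // or throw WebDriverTimeoutException once the timeout expires.
    public static IWebElement WaitUntilInteractable(IWebDriver driver, By locator, TimeSpan timeout)
    {
        var wait = new WebDriverWait(driver, timeout);

        // WebDriverWait ignores "element not found" errors by default and
        // keeps polling while the lambda returns null.
        return wait.Until(d =>
        {
            IWebElement element = d.FindElement(locator);
            return (element.Displayed && element.Enabled) ? element : null;
        });
    }

    public static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("https://www.google.com");

            // ASSUMPTION: the search box is named "q".
            IWebElement searchBox = WaitUntilInteractable(
                driver, By.Name("q"), TimeSpan.FromSeconds(10));

            searchBox.SendKeys("Contoso site internet");
            searchBox.Submit();
        }
    }
}
```

Checking `Displayed` and `Enabled` inside the wait covers both cases from the paragraph above: an element that has not finished loading and an element that is present but hidden.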

Conclusion

Using a tool like the Selenium WebDriver, developers can easily write programs that interact with web pages and extract data in a structured format. This can save a lot of time and effort compared to manual data collection, especially when dealing with large amounts of data. By leveraging the power of web scraping and automation, developers can enhance their workflows and gain new insights from the vast amounts of data available on the web.