Release V0.3

External Project Contribution

Background Knowledge

On October 19th, 2020, during my Release v0.2, I worked on a web scraper. Web scraping is the practice of extracting data from websites, such as Google. I found the issue while browsing GitHub, and I had used web data extraction before when I was harvesting URLs for a website. The repository owner for that week's project was looking to purchase some weights for at-home workouts during quarantine. He wanted a notification sent to his personal email when the scraper found weights in his price range. You can check out the implementation of SendGrid here.

This week, I thought it would be cool to try and implement a web scraper myself! While browsing GitHub, I found an issue in a repository called realpython/python-scripts. I always thought repositories like this were just about getting a commit merged for Hacktoberfest, also known as cheating. It turns out these repositories are actually pretty cool! The owner of the repository created an issue saying it would be nice to have a script that scrapes all the events from the list of Hacktoberfest events and stores them in a CSV file. I thought this would be a great challenge to test my abilities in Python.

During this pull request, I got some exposure to a library called Pandas. Pandas is a software library for data manipulation. Although I didn't get to work with Pandas very much, I got to use the DataFrame, one of the data structures that this library provides. BeautifulSoup is another library that I had the pleasure of gaining experience with. BeautifulSoup is a package for parsing HTML documents, and it lets you find HTML elements and attributes efficiently. With BeautifulSoup, I was able to find the HTML elements that I wanted to work with rather than waste time parsing text that I did not need. I simply appended this data to a list that I later used to create my comma-separated values (CSV) file.
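To give a rough idea of how these two libraries fit together, here is a minimal sketch of the approach described above: find the elements you care about with BeautifulSoup, append them to a list, and hand that list to a Pandas DataFrame to produce CSV. This is not the script from my pull request. The HTML snippet, the class names (event, name, location), and the function names are all made up for illustration, since the real events page has its own markup.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for an events page; the real page's
# structure and class names are assumptions made for this example.
SAMPLE_HTML = """
<ul>
  <li class="event"><span class="name">Intro to Open Source</span><span class="location">Toronto</span></li>
  <li class="event"><span class="name">Python Sprint</span><span class="location">Online</span></li>
  <li class="sponsor">Not an event</li>
</ul>
"""


def scrape_events(html):
    """Collect one dict per event element, skipping everything else."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    # find_all only returns the <li> elements tagged with the "event" class,
    # so the sponsor entry is never visited.
    for item in soup.find_all("li", class_="event"):
        events.append({
            "name": item.find("span", class_="name").get_text(strip=True),
            "location": item.find("span", class_="location").get_text(strip=True),
        })
    return events


def events_to_csv(events):
    """Turn the list of dicts into CSV text via a Pandas DataFrame."""
    frame = pd.DataFrame(events, columns=["name", "location"])
    # With no path argument, to_csv returns the CSV as a string.
    return frame.to_csv(index=False)


events = scrape_events(SAMPLE_HTML)
csv_text = events_to_csv(events)
print(csv_text)
```

To write a file instead of printing, you could pass a path to to_csv, for example frame.to_csv("events.csv", index=False). For a live page you would first download the HTML (for instance with the requests library) and pass the response text to scrape_events.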

 



