This project creates an end-to-end data pipeline that collects real estate price points in the Toronto area and visualizes them.
- Scraping realtor.ca for the starting data points
- Data enrichment by adding a postal code to each address. I did this by scraping search results on Google Maps.
- Data transformation and analysis
- Visualizing the price points
- Scraping realtor.ca using Python (Selenium and BeautifulSoup)
- Scraping search results for postal codes on Google Maps using Python (Selenium and BeautifulSoup)
- Pandas
- Selenium
- BeautifulSoup
- Seaborn
- Folium
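
A minimal sketch of the parsing half of the scraping step, assuming the page HTML has already been fetched (for example from Selenium's `driver.page_source`). The markup, class names and the `parse_listings` helper are hypothetical placeholders for illustration, not realtor.ca's actual page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names are placeholders, not realtor.ca's real ones.
SAMPLE_HTML = """
<div class="listing">
  <span class="price">$749,000</span>
  <span class="address">123 Example St, Toronto, ON</span>
  <span class="beds">2</span>
  <span class="baths">1</span>
</div>
<div class="listing">
  <span class="price">$1,250,000</span>
  <span class="address">456 Sample Ave, Toronto, ON</span>
  <span class="beds">3</span>
  <span class="baths">2</span>
</div>
"""


def parse_listings(html):
    """Turn listing cards into a list of dicts, one per listing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        price_text = card.select_one(".price").get_text(strip=True)
        rows.append({
            # Strip the currency symbol and thousands separators.
            "price": int(price_text.replace("$", "").replace(",", "")),
            "address": card.select_one(".address").get_text(strip=True),
            "bedrooms": int(card.select_one(".beds").get_text(strip=True)),
            "bathrooms": int(card.select_one(".baths").get_text(strip=True)),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(listings)` for the transformation step.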
I used code from this repo to draw the choropleth map for Toronto. A GeoJSON file for Toronto is not readily available, so huge thanks to A Gordon. Relevant links:
- Medium post explaining this: medium post
- GitHub repo: github_choropleth

Please check the links above.
Looked into the number of listings by number of bedrooms. One-bedroom apartments are the most common in the Toronto area.
Number of listings by number of bathrooms. One-bathroom apartments are the most common.
Created a histogram of price points. We can see that it is right-skewed. One reason for this shape is that the data mixes all types of homes, e.g. 2-, 3- and 4-bedroom listings.
This graph shows the number of listings per postal code. I kept only postal codes with more than 2 listings to focus on high-frequency postal codes. The filter also helped highlight some errors in the postal codes.
Plotted mean price points for different areas of Toronto by number of bedrooms.
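
One quick way to confirm right skew numerically is to check that the sample skewness is positive and that the mean sits above the median. This is only a sketch: the prices below are made-up illustrative values, not data from the scrape.

```python
import pandas as pd

# Made-up prices in CAD, for illustration only.
prices = pd.Series(
    [450_000, 520_000, 560_000, 610_000, 650_000,
     700_000, 780_000, 900_000, 1_400_000, 2_800_000],
    name="price",
)

skewness = prices.skew()        # positive for a right-skewed sample
mean_price = prices.mean()      # pulled upward by the expensive tail
median_price = prices.median()  # less sensitive to the tail

is_right_skewed = skewness > 0 and mean_price > median_price
```

Running the same check per bedroom count (for example with `groupby("bedrooms")`) would show how much of the skew comes from mixing home types.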
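
The frequency filter described above can be sketched with `value_counts`. The regex format check is an extra step I am assuming here, not necessarily how the errors were found in the project; it relies on Canadian postal codes following the letter-digit-letter digit-letter-digit pattern and Toronto codes starting with `M`. The sample data and the `is_valid_toronto_code` helper are hypothetical.

```python
import re

import pandas as pd

# Simplified format check: letter-digit-letter, optional space, digit-letter-digit.
POSTAL_RE = re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$")

# Made-up sample; the last entry is deliberately malformed.
df = pd.DataFrame({
    "postal_code": [
        "M5V 2T6", "M5V 2T6", "M5V 2T6",
        "M4C 1B5", "M4C 1B5", "M4C 1B5", "M4C 1B5",
        "M6K 3P6",
        "5MV 2T6",
    ]
})

# Number of listings per postal code, keeping only codes with more than 2 listings.
counts = df["postal_code"].value_counts()
frequent = counts[counts > 2]


def is_valid_toronto_code(code):
    """Return True if the code has the expected shape and starts with 'M'."""
    return bool(POSTAL_RE.match(code)) and code.startswith("M")


# Codes that fail the format check are candidates for manual review.
suspect = sorted(code for code in counts.index if not is_valid_toronto_code(code))
```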
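
A sketch of how mean price by area and bedroom count could be computed with pandas. It assumes "area" means the first three characters of the postal code (the forward sortation area); the column names and values are illustrative, not taken from the project data.

```python
import pandas as pd

# Made-up listings for illustration only.
df = pd.DataFrame({
    "postal_code": ["M5V 2T6", "M5V 1J1", "M5V 3A8", "M4C 1B5", "M4C 5K2"],
    "bedrooms": [1, 1, 2, 2, 2],
    "price": [500_000, 600_000, 800_000, 700_000, 900_000],
})

# Assumption: an "area" is the first three characters of the postal code.
df["area"] = df["postal_code"].str[:3]

# Rows are areas, columns are bedroom counts, cells are mean prices.
# Combinations with no listings come out as NaN.
mean_price = (
    df.groupby(["area", "bedrooms"])["price"]
    .mean()
    .unstack("bedrooms")
)
```

A table in this shape can be handed to `seaborn.heatmap(mean_price)` or melted back to long form for a grouped bar plot.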
Choropleth map of Toronto for all listings
Choropleth map of Toronto for all 2-bedroom real estate listings
Choropleth map of Toronto for all 3-bedroom real estate listings
I have also posted this project here