REAL ESTATE PRICE PREDICTION USING ADVERTISEMENT DATA
DOI:
https://doi.org/10.24867/01BE48Vidovic
Keywords:
data analysis, machine learning, regression analysis, neural networks
Abstract
This paper presents a real estate price prediction model based on advertisement data. Technical specifications, images and, where available, geographical coordinates of properties are extracted from advertisements collected from a real estate advertising website. The coordinates are used to form location ratings. The images are used to train a neural network that detects a property's equipment in images.
Three separate datasets were formed for training the price prediction models. The first dataset contains only the technical specifications of the properties, the second also contains location ratings, and the third contains location ratings and the objects detected in images. These datasets were used to train different regression models for predicting real estate prices. The models were compared by their achieved R² scores to establish which one performed best. The best-performing model was the GBT (Gradient Boosted Trees) model trained on the dataset with location ratings and detected objects, which achieved an R² score of 0.856.
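For readers unfamiliar with the metric, the coefficient of determination (the R2 score used above to compare models) can be sketched as follows. This is a minimal illustration of the standard definition, 1 - SS_res / SS_tot, and not the paper's own evaluation code; the function name `r2_score` and the sample prices are assumptions made up for the example.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean price.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical prices and predictions, for illustration only (not from the paper).
actual = [50.0, 80.0, 120.0, 200.0]
predicted = [55.0, 75.0, 130.0, 190.0]
score = r2_score(actual, predicted)
print(f"R2 = {score:.3f}")
```

A score of 1 means the predictions match the prices exactly, while a score of 0 means the model does no better than always predicting the mean price.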