Xindian District Real Estate Regression Analysis
The following is a regression model of real estate prices in Xindian District, New Taipei City, Taiwan. Xindian is an inner-city district in southern New Taipei City and is considered highly affluent. The Taipei Metro has five stations in this part of the city. The district had a population of 300,283 as of January 2016, with a population density of about 2,500/km² and a total area of 120.2255 km².
Data
The data for this model comes from the UCI Machine Learning Repository. It was donated to the UCI team by Prof. I-Cheng Yeh from the Department of Civil Engineering, Tamkang University, Taiwan.
The independent variables are:
- Transaction date
- House age (years)
- Distance to the nearest MRT (metro) station (meters)
- Number of convenience stores within walking distance (integer 1–10)
- Latitude
- Longitude
The dependent variable is:
- House price per unit area (1000 New Taiwan Dollars/Ping, where Ping is a local unit; 1 Ping = 3.3 square meters)
All regressions were calculated with the OLS function in the statsmodels Python library.
Regressions
Beta1 (Distance to Nearest MRT Station) = 0; Rejected


Distance to the nearest MRT station explains over half of the variation in unit price (R-squared value of 0.539). The fitted coefficient of MRT distance on unit price is -6.1853.
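A test of this kind can be sketched with the statsmodels formula interface. This is only an illustration under stated assumptions: the data below is synthetic rather than the UCI data, the column names `distance_m` and `price` are hypothetical, and the log2 transform is borrowed from the final model on the assumption that it was also used at this step.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (NOT the UCI values): price falls
# as the log2 distance to the MRT station grows, plus random noise.
rng = np.random.default_rng(0)
distance_m = rng.uniform(50, 6000, size=400)
price = 90 - 6.0 * np.log2(distance_m) + rng.normal(0, 8, size=400)
df = pd.DataFrame({"distance_m": distance_m, "price": price})

# Assumption: distance enters the model on a log2 scale.
df["log2_distance"] = np.log2(df["distance_m"])

# Ordinary least squares with an intercept and a single predictor.
fit = smf.ols("price ~ log2_distance", data=df).fit()

beta1 = fit.params["log2_distance"]
p_value = fit.pvalues["log2_distance"]
reject_null = p_value < 0.05  # test of H0: Beta1 = 0 at the 5% level

print(f"Beta1 = {beta1:.3f}, p-value = {p_value:.3g}, R-squared = {fit.rsquared:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

On the real data, `fit.summary()` would print the full regression table, including the coefficient, its standard error, and the R-squared value.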
Beta2 (Number of Convenience Stores) = 0; Failed to Reject

There is not strong enough evidence to use this categorical variable in the final model. A positive correlation appears to exist, but at the 95% confidence level one could not conclude that the coefficients for store counts 1, 2, 3, 7, and 10 were different from zero. Grouping the store counts into high traffic [7–10], medium traffic [4–6], and low traffic [1–3] did not yield better results. A final attempt to reduce the standard error by using only two categories, high traffic [6–10] and low traffic [1–5], also failed to produce statistically significant results.
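The categorical treatment and the two-category regrouping described above could look roughly like the following. It is a sketch on synthetic data, not the original analysis: the column names `stores`, `price`, and `traffic` are hypothetical, and the simulated effect sizes are arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (NOT the UCI values).
rng = np.random.default_rng(1)
stores = rng.integers(1, 11, size=300)  # store counts 1..10
price = 30 + 1.5 * stores + rng.normal(0, 10, size=300)
df = pd.DataFrame({"stores": stores, "price": price})

# One dummy variable per store count; the lowest count is the baseline level.
per_count = smf.ols("price ~ C(stores)", data=df).fit()

# Two-category grouping from the text: low traffic [1-5], high traffic [6-10].
df["traffic"] = pd.cut(df["stores"], bins=[0, 5, 10], labels=["low", "high"])
two_bin = smf.ols("price ~ C(traffic)", data=df).fit()

# 95% confidence intervals; an interval that contains 0 means the
# corresponding coefficient cannot be distinguished from zero.
ci = per_count.conf_int(alpha=0.05)
ci.columns = ["lower", "upper"]
ci["contains_zero"] = (ci["lower"] <= 0) & (ci["upper"] >= 0)

print(ci)
print(two_bin.params)
```

Each row of `ci` corresponds to one coefficient, so the `contains_zero` column shows directly which store-count levels fail the test.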
A Better Model Fit
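MRT station distance is the most significant variable, but it explains only about half of the price variation. Since the second null hypothesis could not be rejected, the convenience-store count does not close that gap, so other factors must be impacting price.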

House age appears to have a weak negative correlation with unit price. The northern part of the district also appears to be more expensive than the southern part.
To assess the effect of the sale date, seasonal price fluctuations are compared with the year-over-year price change.
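Splitting a sale date into a yearly component and a within-year component might be done as follows. This sketch assumes the transaction date is stored as a decimal year (for example, 2013.25), which the text above does not state. Both helper names are hypothetical, and the quarter is only a rough proxy for season.

```python
import math


def years_since_2012(transaction_date: float) -> int:
    """Whole years elapsed since 2012 for a decimal-year date such as 2013.25."""
    return math.floor(transaction_date) - 2012


def quarter_of_year(transaction_date: float) -> int:
    """Quarter (1-4) implied by the fractional part of a decimal-year date."""
    fraction = transaction_date - math.floor(transaction_date)
    return int(fraction * 4) + 1


# Example dates in the assumed decimal-year format.
for date in (2012.667, 2012.917, 2013.25):
    print(date, years_since_2012(date), quarter_of_year(date))
```

Grouping prices by the first value gives the yearly comparison, and grouping by the second gives the seasonal one.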


More houses were sold in the spring and summer, but the season had little effect on price. Prices did increase in 2013. With the categorical variable ‘Year’ added to the model, the final fine-tuned linear regression model is as follows.

Price-Unit = -7566.0012 - 4.9273 × log2(Meters from MRT Station) - 0.2324 × (Age of House) + 306.41 × (Latitude) + 3.3392 × (Years since 2012)
Note that in this equation, Latitude is confined to the range 24.93 to 25.02. Also, this data set contains only two years of real estate prices. The 3.34 Year coefficient does line up with approximate real-estate growth estimates for Taipei as a whole, but further examination is needed before extrapolating many years out.
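The fitted equation can be evaluated directly. The sketch below uses the coefficients reported above and enforces the stated latitude range. The function name and the example house are hypothetical, and the result is in the same price-per-unit-area units as the data.

```python
import math

# Coefficients of the final model as reported in the text.
INTERCEPT = -7566.0012
COEF_LOG2_DISTANCE = -4.9273
COEF_AGE = -0.2324
COEF_LATITUDE = 306.41
COEF_YEARS = 3.3392

# Latitude range within which the equation is stated to hold.
LAT_MIN, LAT_MAX = 24.93, 25.02


def predict_price_unit(distance_m, house_age, latitude, years_since_2012):
    """Evaluate the fitted linear equation for a single house."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive to take log2")
    if not LAT_MIN <= latitude <= LAT_MAX:
        raise ValueError("latitude is outside the range covered by the model")
    return (
        INTERCEPT
        + COEF_LOG2_DISTANCE * math.log2(distance_m)
        + COEF_AGE * house_age
        + COEF_LATITUDE * latitude
        + COEF_YEARS * years_since_2012
    )


# Hypothetical house: 512 m from a station, 10 years old, sold in 2013.
estimate = predict_price_unit(512, 10, 24.98, 1)
print(round(estimate, 2))
```

For this hypothetical house the equation gives about 44.79. Because distance enters on a log2 scale, each doubling of the distance to the station lowers the predicted price by about 4.93, all else being equal.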
The final adjusted R-squared value for the model is 0.687.
Future Exploration
This analysis and model are missing a couple of undeniable price drivers. About 1/3 of the variability in Xindian’s real estate prices is still unexplained by this model. Information about the type of apartment or house, the number of rooms and bathrooms, and whether the property has a garage could fill in the rest of the model. The model could also improve with newer data, especially given the global pandemic: housing markets in large cities around the world have been fluctuating because of new remote work possibilities.
https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set