GitHub
Projects
These are projects from my university coursework.

Data2002

Data analysis on Data2002 class survey
This is a survey conducted by Dr. Garth about the characteristics and demographic information of students.
1. Cleaning data: The tasks include handling missing values, unrealistic values and formatting issues.
2. Statistical tests: Using several tests in R, including the t-test, Welch test and Wilcoxon rank-sum test, to test the relationships between different variables.

Prediction of Birth Weight
This is the final group project, in which we build a model to identify plausible factors affecting birth weight.
1. Data Description: We first introduce some demographic information about the data set, then describe the data cleaning process.
2. Model Selection: We select the model that best predicts birth weight using AIC and p-values, with both backward and forward selection.
3. Assumption Checking: We check linearity, homoskedasticity and normality using QQ-plots and residual plots.
4. Model Evaluation: We calculate RMSE and MAE by cross-validation to evaluate our model against the full model.
Thanks to my team members, they are very helpful and responsible! ❤️

OLET1632

Childcare Centres in Sydney and Melbourne
This project analyses the data set relating to Child Care Benefit (CCB) approved child care services in Greater Sydney and Greater Melbourne. The analysis contains Initial Data Analysis, Data Cleaning and Research Questions.
1. Initial Data Analysis: This part covers the background to the report, including purpose, stakeholders and data (e.g. source, ethics, limitations).
2. Data Cleaning: Two variables in the data set use angle brackets to record 'less than' values, a limitation that makes some statistical analyses difficult to conduct. We handle this by assigning 2.5 and 5 to the ambiguous values 'less than 5' and 'less than 10' respectively, assuming a discrete uniform distribution.
3. Research Questions: (1) Which service type is most effective? (2) Is the average of the mean childcare fee the same across time? If it differs, what might be the reasons?
Packages and techniques used: ggplot2, tidyverse, one-way ANOVA test.

Ecmt2150

Analysis of fertility data using multiple regression
This project applies a range of econometric methods to analyse the effect of schooling or education on women's fertility. The analysis, carried out in Stata, contains five parts: Descriptive Statistics, Multiple Regression Model: Estimation and Testing, Check for Heteroskedasticity, Instrumental Variables, and Summary of Findings.
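As an illustration of the data-cleaning step in the Childcare Centres project, the sketch below shows one way the replacement values 2.5 and 5 can arise. It is written in Python rather than the R used in the project, and it rests on assumptions that are not stated in the project itself: that censored entries are stored as strings such as "<5" and "<10", and that the discrete uniform distribution runs over the whole numbers from 1 up to one below the bound. The function names are hypothetical.

```python
from statistics import mean


def censored_midpoint(upper_bound: int) -> float:
    """Mean of a discrete uniform distribution on 1, ..., upper_bound - 1.

    Assumption: the smallest possible count is 1, not 0.
    """
    return float(mean(range(1, upper_bound)))


def impute(value: str) -> float:
    """Convert a raw cell to a number, replacing '<k' with the midpoint.

    Assumption: censored cells look like '<5' or '<10'; anything else
    is treated as an ordinary number.
    """
    text = value.strip()
    if text.startswith("<"):
        return censored_midpoint(int(text[1:]))
    return float(text)


raw_column = ["12", "<5", "7", "<10"]  # made-up example values
cleaned = [impute(cell) for cell in raw_column]
print(cleaned)
```

Under these assumptions, 'less than 5' averages the values 1 to 4 and 'less than 10' averages the values 1 to 9, which reproduces the 2.5 and 5 quoted above. If zero were an allowed count, the means would instead be 2 and 4.5.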