GitHub



2023-02-19 19:40 | Source: compiled from the web

Projects

These are projects from my university coursework.

Data2002

Data analysis on Data2002 class survey

This survey, conducted by Dr. Garth, collected information about students' characteristics and demographics.

1. Data cleaning: Tasks include handling missing values, unrealistic values and formatting issues.

2. Statistical tests: Using R, we apply several tests, including the two-sample t-test, Welch's t-test and the Wilcoxon rank-sum test, to examine relationships between variables.
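The three tests named in step 2 differ mainly in their assumptions: the pooled t-test assumes equal variances, Welch's t-test drops that assumption, and the rank-sum test uses ranks instead of raw values. The sketch below computes only the test statistics in plain Python, assuming two numeric samples; the function names are hypothetical, the numbers are toy values rather than survey data, and p-values (which R's `t.test()` and `wilcox.test()` report) are left out.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def rank_sum(x, y):
    """Wilcoxon rank-sum statistic.

    Returns (R_x, W): R_x is the sum of the ranks of x in the pooled sample
    (tied values share their average rank) and W = R_x - n_x(n_x + 1)/2 is
    the Mann-Whitney form of the statistic.
    """
    pooled = sorted(list(x) + list(y))
    avg_rank = {}
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a block of tied values
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j) / 2 + 1  # 1-based average rank of the block
        i = j + 1
    r_x = sum(avg_rank[v] for v in x)
    return r_x, r_x - len(x) * (len(x) + 1) / 2


# Toy numbers only, not the survey data
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
print(student_t(group_a, group_b), welch_t(group_a, group_b), rank_sum(group_a, group_b))
```

With equal group sizes and equal sample variances, as in the toy data, the pooled and Welch statistics coincide; they diverge when the variances differ.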

Prediction of Birth Weight

This is the final group project, in which we build a model to identify plausible factors affecting birth weight.

1. Data description: We first introduce some demographic information about the data set, then describe the data cleaning process.

2. Model selection: We select the model that best predicts birth weight using both backward and forward selection, based on AIC and on p-values.

3. Assumption checking: We check linearity, homoskedasticity (constant error variance) and normality of the residuals using a QQ-plot and a residual plot.

4. Model evaluation: We compute RMSE and MAE by cross-validation to compare our model against the full model.
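The backward half of step 2 can be sketched in plain Python. This is a minimal illustration, not the team's R code: it assumes a Gaussian linear model with an intercept, scores each candidate model by n·log(RSS/n) + 2p (the Gaussian AIC up to an additive constant, with p the number of coefficients), and uses hypothetical predictors `x1` and `x2` on toy data built so that only `x1` matters. Forward selection mirrors it by adding, at each step, the predictor that lowers AIC the most.

```python
import math


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X.

    X is a list of rows; the normal equations (X'X) b = X'y are solved by
    Gaussian elimination with partial pivoting.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


def aic(rows, y, predictors):
    """Gaussian AIC up to a constant: n*log(RSS/n) + 2*(number of coefficients)."""
    X = [[1.0] + [row[name] for name in predictors] for row in rows]
    n = len(y)
    return n * math.log(ols_rss(X, y) / n) + 2 * (len(predictors) + 1)


def backward_aic(rows, y, predictors):
    """Drop one predictor at a time while doing so lowers the AIC."""
    current = list(predictors)
    best = aic(rows, y, current)
    while current:
        trials = [(aic(rows, y, [p for p in current if p != d]), d) for d in current]
        score, drop = min(trials)
        if score >= best:
            break
        best, current = score, [p for p in current if p != drop]
    return current, best


# Toy data with hypothetical predictors: y depends on x1 only.
x2_values = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [1, -1, -1, 1, -1, 1, 1, -1]
rows = [{"x1": float(i), "x2": float(v)} for i, v in zip(range(1, 9), x2_values)]
y = [2.0 * row["x1"] + e for row, e in zip(rows, noise)]
print(backward_aic(rows, y, ["x1", "x2"]))
```

Removing `x2` leaves the RSS unchanged but saves one parameter, so AIC drops by 2 and `x2` is removed; removing `x1` inflates the RSS, so it stays.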
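The cross-validated RMSE and MAE in step 4 can be sketched as follows. This assumes k-fold cross-validation with a fixed random seed. To stay short it uses a one-predictor least-squares line (`fit_line`) and an intercept-only baseline (`fit_mean`) in place of the project's actual models; both names are hypothetical. Any function that takes training data and returns a predictor can be plugged in.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns a prediction function."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_mean(xs, ys):
    """Baseline model that ignores x and predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def cv_rmse_mae(xs, ys, fit, k=5, seed=0):
    """Pool out-of-fold prediction errors over k folds and return (RMSE, MAE)."""
    errors = []
    for test in kfold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(ys[i] - predict(xs[i]) for i in test)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy data: compare the line against the baseline on the same folds.
xs = [float(i) for i in range(10)]
ys = [3.0 + 2.0 * x + (-1) ** i for i, x in enumerate(xs)]
print(cv_rmse_mae(xs, ys, fit_line), cv_rmse_mae(xs, ys, fit_mean))
```

This sketch pools the errors across folds before computing RMSE and MAE. Averaging per-fold values is another common convention. Using the same seed for both models means they are compared on identical folds.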
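For the normality check in step 3, a QQ-plot pairs sorted residuals with theoretical normal quantiles. Points lying close to the line y = x suggest roughly normal residuals. The sketch below computes those pairs with Python's standard `statistics.NormalDist`. It assumes the residuals are already available from a fitted model, standardizes them by their sample mean and standard deviation, and uses the common (i - 0.5)/n plotting positions. Plotting is left to any charting tool.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) pairs for a normal QQ-plot of residuals.

    Sample quantiles are the sorted standardized residuals; theoretical
    quantiles are N(0, 1) quantiles at plotting positions (i - 0.5) / n.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    theory = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theory, sample))


# Toy residuals, not from the birth-weight model
for theoretical, sample in qq_points([-2.0, -1.0, 0.0, 1.0, 2.0]):
    print(f"{theoretical:+.3f}  {sample:+.3f}")
```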

Thanks to my team members, who were very helpful and responsible! ❤️

OLET1632

Childcare Centres in Sydney and Melbourne

This project analyses a dataset of Child Care Benefit (CCB) approved child care services in Greater Sydney and Greater Melbourne.

The analysis contains Initial Data Analysis, Data cleaning and Research questions.

1. Initial Data Analysis: This part provides background to the report, including its purpose, stakeholders and data (e.g. source, ethics, limitations).

2. Data cleaning: Two variables in the data set use angle brackets to indicate 'less than' values. This is a limitation because it makes some statistical analyses difficult to conduct. We tackle the problem by assigning 2.5 to the ambiguous value 'less than 5' and 5 to 'less than 10', assuming a discrete uniform distribution.

3. Research questions: (1) Which service type is most effective? (2) Is the average of the mean childcare fee the same across time, and if not, what might explain the difference? Packages and techniques used: ggplot, tidyverse and the one-way ANOVA test.
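The one-way ANOVA behind the second research question compares the spread of the group means with the spread within each group. The sketch below computes the F statistic and its degrees of freedom in plain Python. It assumes each group is a list of fees for one time period; the grouping and the toy numbers are hypothetical. The p-value comes from the F distribution with (df_between, df_within) degrees of freedom, which R's `aov()` with `summary()` reports.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic for a list of samples.

    Returns (F, df_between, df_within); F is the ratio of the
    between-group mean square to the within-group mean square.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy fees for three hypothetical time periods
print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
```

A large F means the period means differ by more than the within-period variation would suggest, which is the evidence against "the average fee is the same across time".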

Ecmt2150

Analysis of fertility data using multiple regression

This project involves the application of a range of econometric methods in analysing the effect of schooling or education on women’s fertility.

The analysis, carried out in Stata, has five parts: descriptive statistics, estimation and testing of a multiple regression model, checks for heteroskedasticity, instrumental variables, and a summary of findings.
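Two of those steps can be illustrated with single-regressor versions in plain Python. This sketch is not the Stata analysis. It assumes one regressor x (for example, years of schooling), an outcome y (for example, number of children) and one hypothetical instrument z. The project's actual instrument and specification are not stated here. The IV slope is cov(z, y)/cov(z, x), which is consistent when z is correlated with x but not with the error term. The heteroskedasticity check is Koenker's n·R² version of the Breusch-Pagan test. Stata's `estat hettest` offers variants of this test.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_simple(x, y):
    """Intercept and slope of the OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


def iv_simple(z, x, y):
    """Simple IV estimate of y = a + b*x using a single instrument z."""
    slope = cov(z, y) / cov(z, x)
    return mean(y) - slope * mean(x), slope


def breusch_pagan_lm(x, y):
    """Koenker's n*R^2 form of the Breusch-Pagan test with one regressor.

    Squared OLS residuals are regressed on x; with a single regressor R^2 is
    their squared correlation. Under homoskedasticity LM is approximately
    chi-square(1), whose 5% critical value is about 3.84.
    """
    a, b = ols_simple(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    r = cov(x, e2) / (cov(x, x) * cov(e2, e2)) ** 0.5
    return len(x) * r ** 2


# Toy data only: y is exactly linear in x, so IV recovers intercept 1, slope 2.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * v for v in x]
print(iv_simple(z, x, y), ols_simple(x, y))
```

With real data the IV and OLS slopes differ when schooling is correlated with the error term, and that gap is what the instrumental-variables part examines.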
