--- title: "Math 50 Fall 2017, Homework #2" ---
__NOTE: For your homework download and use the template__ (https://math.dartmouth.edu/~m50f17/HW2.Rmd) __Read the green comments in the rmd file to see where your answers should go.__
## Question-1 (Sample) Given a fixed confidence interval percentage (say 95%) at what value of x does CI on the mean response achieve its minimum width? Write an R-chunk using the propellant data which computes the following. (a) Fit a simple linear regression model relating shear strength to age. (b) Plot scatter diagram. (c) Plot two curves (in blue color) that traces upper and lower limits of 95% confidence interval on $E(y|x_0)$ (d) Plot two curves (in red color) that traces upper and lower limits of 95% prediction interval for $y$ (e) Print the 95% quantile of the corresponding t distribution ### Answer: The width of the interval is $$ 2 t_{\alpha/2,n-2} \sqrt{MS_{Res} ((1/n) + (x_0 - \bar{x})^2 / S_{xx} } $$ and all terms inside the square root are positive. Therefore it is minimized when $x_0=\bar{x}$. ```{r} # Computation part of the answer : prop<-read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",") shearS<-prop$ShearS age<-prop$Age plot(age, shearS, xlab = "Propellant Age (weeks)", ylab = "Shear S. (psi)", main = "Rocket Propellant") fitted <- lm(shearS ~ age) ageList <- seq(0,25,0.5) cList <- predict(fitted, list(age = ageList), int = "c", level = 0.95) pList <- predict(fitted, list(age = ageList), int = "p", level = 0.95) matlines(ageList, pList, lty='solid' , col = "red") matlines(ageList, cList, lty = 'solid', col = "blue") # since n=20 we look at the t_18 distribution wantedQuantile <- qt( 0.95, 18) ; cat("95% quantile is of t_18 is : ", wantedQuantile ) ; ```
## Question-2 Plot the same graph as in Question-1 without using R function predict, but instead directly calculating the interval limits we discussed in class. In particular, what are the limits of 95% confidence interval on $E(y|x_0)$? ### Answer: ```{r} # Computation part of the answer : Sxx<-sum((age-mean(age))^2) Sxy<-sum((age-mean(age))*shearS) beta1Hat<-Sxy/Sxx beta0Hat<-mean(shearS)-Sxy/Sxx*mean(age) plot(age,shearS) abline(c(beta0Hat,beta1Hat), lwd=2, col='blue') N <- length(age) ; dof <- N - 2 ; muHat <- beta0Hat + beta1Hat * ageList ; yHat <- fitted$fitted.values ; SSres <- sum((shearS - yHat)^2); MSres <- SSres / dof ; xBar <- mean (age) tQuantile_975 <- qt( 1 - 0.05/2 , 18) ; upperCI <- muHat + tQuantile_975 * sqrt ( MSres * ( (1/N) + (ageList-xBar)^2 / Sxx ) ) lowerCI <- muHat - tQuantile_975 * sqrt ( MSres * ( (1/N) + (ageList-xBar)^2 / Sxx ) ) matlines(ageList, upperCI, lty = 'solid', col = "blue") matlines(ageList, lowerCI, lty = 'solid', col = "blue") yHatList <- beta0Hat + beta1Hat * ageList ; upperPI <- yHatList + tQuantile_975 * sqrt ( MSres * ( 1 + (1/N) + (ageList-xBar)^2 / Sxx ) ) lowerPI <- yHatList - tQuantile_975 * sqrt ( MSres * ( 1 + (1/N) + (ageList-xBar)^2 / Sxx ) ) matlines(ageList, upperPI, lty = 'solid', col = "red") matlines(ageList, lowerPI, lty = 'solid', col = "red") ```
## Question-3 Load the propellant data and fit a simple linear regression model relating shear strength to age. (a) Test the hypothesis $\beta_1 = -30$ using confidence level 97.5%. (b) Calculate the limits of 97.5% confidence interval for $\beta_0$ and $\beta_1$ (c) Is there any relation between the answers you find in part (a) and (b) ? (d) Calculate $R^2$ ### Answer: ```{r} # Computation part of the answer : seBeta1Hat = sqrt ( MSres / Sxx ) ; guess = -30 ; testStat = ( beta1Hat - guess ) / seBeta1Hat ; tQuantile_9875 <- qt( 1 - 0.025/2 , 18) ; cat ("Part (a) Absolute value of test statistic is " , abs(testStat) , " Since it is greater than t_quantile=" , tQuantile_9875 , " we REJECT the hypothesis \n\n") seBeta0Hat = sqrt ( MSres * ( (1/N) + (xBar)^2 / Sxx ) ) ; cat ("Part (b) Lower and upper bounds of 97.5% CI for beta0 are : " , beta0Hat - tQuantile_9875 * seBeta0Hat ," and ", beta0Hat + tQuantile_9875 * seBeta0Hat , " Lower and upper bounds of 97.5% CI for beta1 are : " , beta1Hat - tQuantile_9875 * seBeta1Hat ," and ", beta1Hat + tQuantile_9875 * seBeta1Hat, "\n\n") cat ("Part (c) Yes, t-test is equivalent to testing whether -30 is in the CI for beta1. We observe that -30 is out of the above confidence interval for beta1, thus hypothesis is rejected in part b \n\n") yBar <- mean (shearS) SSt <- sum((shearS - yBar)^2); SSr <- sum((yHat - yBar)^2); cat ("Part (d) R-square is : " , SSr / SSt ) ```
## Question-4 Load the propellant data. This time let us consider a relation between square of shear strength and propellant age. (a) Fit a simple linear regression model relating __square__ of shear strength to age. Plot scatter diagram and fitted line. (b) Using analysis-of-variance test for significance of regression (using the formulas we discussed in class) (c) Use t-test and check significance of regression (using the formulas we discussed in class) (d) Does the regression analysis predict a linear relationship between square of shear strength and propellant age ? ### Answer: ```{r} # Computation part of the answer : prop<-read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",") shearS<-prop$ShearS age<-prop$Age shear_SQ<- (prop$ShearS)^2 plot(age, shear_SQ, xlab = "Propellant Age (weeks)", ylab = "Shear S. Squared", main = "Rocket Propellant") fitted_SQ <- lm(shear_SQ ~ age) abline(fitted_SQ$coef, lwd=2, col='blue') yHat_SQ <- fitted_SQ$fitted.values ; beta1Hat_SQ <- fitted_SQ$coefficients[[2]] SSres_SQ <- sum((shear_SQ - yHat_SQ)^2); MSres_SQ <- SSres_SQ / dof ; Sxx<-sum((age-mean(age))^2) seBeta1Hat_SQ = sqrt ( MSres_SQ / Sxx ) ; testStat_SQ = ( beta1Hat_SQ - 0 ) / seBeta1Hat_SQ ; tQuantile_975 <- qt( 1 - 0.05/2 , 18) ; fQuantile95 <- qf(0.95,1,18) ; yBar_SQ <- mean(shear_SQ) SSr_SQ <- sum((yBar_SQ - yHat_SQ)^2); SSt_SQ <- sum((yBar_SQ - shear_SQ)^2); MSres_SQ <- SSres_SQ / dof ; MSr_SQ <- SSr_SQ / 1 ; fStat <- MSr_SQ / MSres_SQ ; cat ("Part (b) F statistic is " , fStat , " Since it is greater than f_quantile=" , fQuantile95, " analysis of variance test REJECTS the hypothesis of significance of regression \n\n") cat ("Part (c) Absolute value of test statistic is " , abs(testStat_SQ) , " Since it is greater than t_quantile=" , tQuantile_975, " we REJECT the hypothesis of significance of regression \n\n") cat ("Part (d) Yes, since rejection of beta1=0 hypothesis above suggests a linear relationship.") ```
## Question-5 Once again using propellant data fit a simple linear regression model between shear strength and propellant age. Consider the steps that we used to obtain t-test for hypothesis $\beta_1 = G_1$, and following similar steps in order to develop a test for $\beta_1 > G_1$ instead. Then (a) Test the ~~~hypothesis~~~ statement $$\beta_1 > -50$$ with confidence level 99.9%. (b) Find the smallest value $G_1$ such that the above ~~~hypothesis~~~ statement is rejected. (c) Similarly what is the smallest value $G_0$ such that the statement "$\beta_0 > G_0$ with probability 0.999" is rejected. ### Answer: First note that, with probability of 0.999 the value of $(\hat{\beta}_1 -\beta_1 )/se(\hat{\beta}_1)$ is smaller than $q_{0.999}$ ($0.999^{th}$ t-Quantile value). Thus, $\beta_1$ is greater than $\hat{\beta}_1 - q_{0.999} se(\hat{\beta}_1)$ with the same probability. ```{r} # Computation part of the answer : tQuantile_999 = qt( 0.999 , 18) ; cat ( "Using the note above, beta1 is greater than (beta1Hat - seBeta1Hat * tQuantile_999 ) = ", (beta1Hat - seBeta1Hat * tQuantile_999 ) ," with probability 0.999 \n\n" ) wantedQuantile <- ( beta1Hat - (-50) ) / seBeta1Hat ; cat ("Part (a) Test is not rejected since -50 is smaller than " , (beta1Hat - seBeta1Hat * tQuantile_999 ) , " Alternatively, you can look at the p-value for the test : p-value : " , pt(wantedQuantile, dof) , " which is larger than 0.999 \n\n") cat ("Part (b) For any value greater than " , (beta1Hat - seBeta1Hat * tQuantile_999 ) , " test is rejected \n\n" ) cat ("Part (c) \n Similar to above Go=(beta0Hat - seBeta0Hat * tQuantile_999 ) Go=", (beta0Hat - seBeta0Hat * tQuantile_999 ), "\n " ) ```