Project: Analysis of Risk Factors

The data file ncbirth.sav is a sample of birth records collected by the North Carolina State Center for Health and Environmental Statistics. The data set contains 1450 births that occurred in North Carolina. Of particular interest is the incidence of low infant birth weight, which has been associated with poorer development of characteristics such as intelligence, coordination, and strength. Low birth weight is commonly defined as less than 2500 grams (approximately 88.18 ounces).
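The 88.18-ounce cutoff is 2500 grams expressed in avoirdupois ounces. A minimal Python check, assuming the standard conversion of 28.349523125 grams per ounce (this constant is not given in the source):

```python
# Grams per avoirdupois ounce (exact by definition; assumed, not from the source).
GRAMS_PER_OUNCE = 28.349523125

def grams_to_ounces(grams: float) -> float:
    """Convert a weight in grams to avoirdupois ounces."""
    return grams / GRAMS_PER_OUNCE

cutoff_oz = grams_to_ounces(2500)
print(round(cutoff_oz, 2))  # 88.18
```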

The variables examined are:
sex: Sex of child (1=Male, 2=Female)
race: Race of child (0=other Nonwhite, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other)
Age of mother
mothed: Education level of mother
gest: Completed Weeks of Gestation
bwtgroup: Birth weight group in grams (0=500 or less, 1=500-1000, 2=1001-1500, 3=1501-2000, 4=2001-2500, 5=2501-3000, 6=3001-3500, 7=3501-4000, 8=4001-4500, 9=4501 and over)
marital: Marital status (1=married, 2=not married)
pounds: Number of pounds in actual birth weight 
ounces: Number of remaining ounces in actual birth weight
cigs: Average number of cigarettes daily (98=smokes an unknown amount)
drinks: Average number of alcoholic drinks weekly (98=drinks an unknown amount)
apgar1: Apgar score at 1 minute (0-10)
fas: Fetal Alcohol Syndrome (0=No, 1=Yes)
Number of children born of the pregnancy
totounc: Weight of child in total ounces
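The pounds, ounces, and totounc fields describe the same weight in different forms. A small sketch, assuming (the codebook does not state it explicitly) that totounc equals 16 * pounds + ounces; the helper name total_ounces is hypothetical:

```python
OUNCES_PER_POUND = 16

def total_ounces(pounds: int, ounces: float) -> float:
    """Combine the pounds and remaining-ounces fields into total ounces.

    Assumes totounc is defined as 16 * pounds + ounces (hypothetical helper).
    """
    return pounds * OUNCES_PER_POUND + ounces

# Hypothetical record: 5 lb 8 oz
print(total_ounces(5, 8))  # 88
```

Comparing this value with totounc for each record is a quick consistency check before dichotomizing.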
Answer the following questions and label each answer clearly. Use SPSS output to support your answers wherever needed; the SPSS output must also be clearly labeled and referenced in your answers, summaries, or conclusions. Be aware of the data value "98" in the variables cigs and drinks: it records an unknown amount, not 98 cigarettes or drinks. Find a reasonable way (or form) to include these cases in any analysis that uses those variables.
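One reasonable treatment of the code 98 is to keep it as evidence of use (the mother did smoke or drink) while excluding it from any quantitative summary of the amount. A minimal Python sketch of the second part, using hypothetical values; the helper names are illustrative only:

```python
from statistics import mean
from typing import Optional

UNKNOWN_AMOUNT = 98  # code used in cigs and drinks for "uses an unknown amount"

def amount_or_none(value: int) -> Optional[int]:
    """Return the reported amount, or None when the amount is unknown (code 98)."""
    return None if value == UNKNOWN_AMOUNT else value

def mean_known_amount(values: list[int]) -> float:
    """Average over mothers whose amount is known; code 98 is excluded, not averaged in."""
    known = [v for v in map(amount_or_none, values) if v is not None]
    return mean(known)

cigs = [0, 10, 98, 20, 0]  # hypothetical values, not from ncbirth.sav
print(mean_known_amount(cigs))  # 7.5
```

Averaging the raw codes instead would give 25.6 for the same list, because the 98 would be read as 98 cigarettes a day.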
  1. Dichotomize the totounc variable using 88.18 ounces as the cutoff and save it as btotounc, with value 1 if the infant is underweight (below 88.18 ounces) and 0 otherwise (88.18 ounces or more). Create an indicator variable for smoking status named bsmoke, with value 1 if the mother smoked and 0 if she did not. Create an indicator variable from drinks named bdrankal, with value 1 if the mother drank and 0 if she did not. Create an indicator variable, bmothed, for the mother's education level (0 if mothed ≤ 12, 1 if mothed > 12), which separates mothers with at most a high school education from those with more.
  2. The odds ratio for a risk factor can be calculated from a simple 2 by 2 contingency table and can also be estimated from a logistic regression model as in problem 3. Is there any difference between these two odds ratios? If so, explain the difference and the advantages and disadvantages of each method.
  3. What are the limitations of the analysis above?
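The coding rules in question 1 can be sketched for a single record as below. This is an illustration in Python under stated assumptions, not the required SPSS procedure (in SPSS the same logic would typically be written with COMPUTE or RECODE commands). It assumes all four inputs are non-missing, that a weight of exactly 88.18 ounces counts as not underweight, and that the code 98 (unknown amount) counts as "smoked" or "drank"; the function name derive_indicators is hypothetical.

```python
CUTOFF_OZ = 88.18  # low birth weight cutoff (about 2500 g)

def derive_indicators(totounc: float, cigs: int, drinks: int, mothed: int) -> dict[str, int]:
    """Build the four indicator variables from question 1 for one birth record."""
    return {
        # 1 = low birth weight (strictly below 88.18 oz), 0 otherwise
        "btotounc": int(totounc < CUTOFF_OZ),
        # Any positive count means the mother smoked; 98 (unknown amount) is > 0, so it counts as 1
        "bsmoke": int(cigs > 0),
        # Same rule for alcohol: 98 (unknown amount) counts as having drunk
        "bdrankal": int(drinks > 0),
        # 1 = education level above 12, 0 = 12 or below
        "bmothed": int(mothed > 12),
    }

# Hypothetical record: 80 oz, smokes an unknown amount, no alcohol, mothed = 12
print(derive_indicators(80.0, 98, 0, 12))
# {'btotounc': 1, 'bsmoke': 1, 'bdrankal': 0, 'bmothed': 0}
```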
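For question 2, it may help to see that the two odds ratios coincide when the logistic model contains only the single binary risk factor: the maximum-likelihood slope is then exactly ln(ad / bc). They generally differ once other covariates (for example bmothed or marital) enter the model, because the regression odds ratio becomes an adjusted one. The sketch below fits the one-predictor model by a hand-written Newton-Raphson loop in place of SPSS's LOGISTIC REGRESSION procedure; the cell counts are hypothetical and the function names are illustrative.

```python
import math

def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> float:
    """Crude odds ratio from a 2x2 table.

    a = exposed & low weight, b = exposed & normal weight,
    c = unexposed & low weight, d = unexposed & normal weight.
    """
    return (a * d) / (b * c)

def logistic_or_single_binary(a: int, b: int, c: int, d: int, iters: int = 50) -> float:
    """Fit logit P(low weight) = b0 + b1 * x by Newton-Raphson; return exp(b1)."""
    groups = [(1.0, a, a + b), (0.0, c, c + d)]  # (x, events, trials)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g) and information matrix (h) of the binomial log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve h * step = g for the 2x2 system
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < 1e-12 and abs(step1) < 1e-12:
            break
    return math.exp(b1)

# Hypothetical counts, not from ncbirth.sav
table = (30, 70, 60, 340)
print(round(odds_ratio_2x2(*table), 4), round(logistic_or_single_binary(*table), 4))
# 2.4286 2.4286
```

The logistic approach also yields a standard error and confidence interval for the slope directly, and extends to several risk factors at once; the 2x2 table is simpler but cannot adjust for confounders.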
The data were obtained from North Carolina Vital Statistics, Institute for Research in Social Science; visiting the web site may give you more insight into the data. The data for this project come from the 1995 birth registry at the North Carolina State Center for Health and Environmental Statistics. Use is permitted provided the above agency is cited.