An exploration of the BRFSS data
Fernando Montenegro - fsmontenegro@
Executive Summary
This document is the report from the final course project for the Introduction to Probability and Data course, part of the Duke/Coursera Statistics with R specialization. The project consisted of exploring a real-world dataset - CDC’s 2013 Behavioral Risk Factor Surveillance System - and creating a report on three student-chosen research questions.
The research questions chosen - and their respective results - were:
• Is a respondent’s opinion of their health status related to their Body Mass Index (BMI)? Is there any difference between genders?
    • Yes, there were noticeable relations between health perception and BMI, as well as gender-specific differences.
• How does being a parent of a young child affect the amount of sleep time reported? How is this reported differently between genders?
    • Being a parent of a young child resulted in less sleep being reported, including a difference between the genders.
• Are responses to general health perception related to the time of year the survey was conducted? How do any differences show up across states?
    • There were no significant differences between winter and non-winter responses at the national level, but there were indications of differences in per-state responses.
Setup
The initial phase consisted of loading the required packages and data. This was done as per the project instructions.
Load packages
library(ggplot2)
library(dplyr)
Load data
The data was loaded from a local copy of the file, as per course instructions.
load( )
dim(brfss2013)
## [1] 491775    330
As can be seen above, the dataset consisted of almost 500,000 observations of 330 variables. Not all observations included values for all variables, so data quality was handled individually for each question below.
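As a rough illustration of what per-question data quality handling can look like, the sketch below counts missing values per variable and keeps only the rows that are complete for the variables one question needs. It runs on a small made-up data frame rather than on brfss2013, and the column names and values are assumptions chosen for illustration, not taken from this report.

```r
# Toy stand-in for brfss2013; column names and values are illustrative only.
toy <- data.frame(
  genhlth  = c("Good", NA, "Fair", "Excellent"),
  sleptim1 = c(7, 6, NA, 8),
  children = c(0, 2, 1, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each variable.
na_counts <- colSums(is.na(toy))

# Keep only the rows that are complete for the variables a given
# question uses; other variables may still contain NA.
sleep_vars <- c("sleptim1", "children")
sleep_data <- toy[complete.cases(toy[, sleep_vars]), ]

print(na_counts)
print(nrow(sleep_data))
```

Filtering on only the variables each question uses, instead of dropping every incomplete row up front, avoids discarding observations that are usable for one question but incomplete for another.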
Part 1: Data
Background on the BRFSS
According to the CDC website, “The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.”
Methodology
According to the CDC, “BRFSS is a cross-sectional telephone survey that state health departments conduct monthly over landline telephones and cellular telephones with a standardized questionnaire and technical and methodological assistance from CDC. In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing.”
Observations on Generalizability, Causality, and Bias
The course material makes only brief references to more advanced statistical content such as causal inference. Within the author’s current knowledge of causality, the following statements can be made:
• On the topic of Generalizability: given the breadth of the survey - across all 50 states and other US territories, coordinated by the CDC with each state’s health agency, … - it does seem to capture enough of a random sample to be generalizable to the broad US population.
• On Causality: given that the BRFSS is an observational exercise - with no explicit random assignment to treatments - any relationships found may indicate association, but not causation.