Public Health Coursework Paper on Epidemiology and Biostatistics

Epidemiology and Biostatistics

W2: Compare and Contrast the Concepts of Causal Association and Chance

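Scientists have developed five critical criteria for distinguishing a causal association from chance. These are temporality, dose response, consistency, strength of association, and biological plausibility.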

Temporal relationship

For a causal relationship, the cause must precede the effect. Temporality is an especially important consideration for diseases that take a long time to develop, such as cancer. For instance, exposure to environmental chemicals in a given year is unlikely to explain an increase in cancer rates in that same year, because cancer usually develops many years after exposure (Bonita, Beaglehole, Kjellström & World Health Organization, 2006). Similarly, in testing whether air pollution causes lung cancer, people who already have lung cancer should be excluded from the study, and people exposed to air pollution should then be followed up to determine whether lung cancer develops.

Dose response

A dose-response relationship means that the likelihood or intensity of a biological effect is greater in people or animals with greater exposure to an agent than in those with lower exposure. The presence of a dose-response relationship tends to support causality. However, a dose-response relationship may also arise from a confounding factor that varies in intensity along with the exposure under investigation. For example, the frequency with which people carry matches could show a dose-response association with lung cancer simply because people who carry matches more often tend to be heavier smokers.

Consistency

An association is more likely to be causal if it is observed consistently by different researchers in different places. It is inappropriate to treat the results of a single study as evidence of causation, because a single study offers no basis for judging consistency. Associations produced by confounding factors are expected to vary between studies, whereas a genuine biological effect should be consistent across all of them.

Strength of association

Scientists are more confident in the causality of strong associations than of weak ones: the stronger an association, the more likely it is to be causal. This is true, for instance, of the relationship between cigarette smoking and lung cancer, for which the relative risk is at least 10. However, an association can also be strong because of a readily identifiable confounding factor, as the relationship between carrying matches and lung cancer demonstrates (Bonita et al., 2006). A weaker association, by contrast, could easily be produced by subtle confounding factors that are hard to identify.

Epidemiologists generally regard relative risks greater than 1 but less than 2 as weak to moderate, and hesitate to label them causal without further supporting evidence, because associations this weak could be produced by confounding that may never be identified. Small increases in relative risk can nevertheless be important public health matters, particularly for a common condition such as heart disease, where even a small relative increase affects many people.

Moreover, a causal relationship cannot be dismissed simply because an association is weak. For instance, exposure to the infectious agent of meningococcal meningitis, a bacterial disease whose symptoms include headache, stiff neck, nausea, and vomiting, produces relatively few clinical cases among those exposed, yet the agent is clearly the cause of the disease.
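The relative risk used above is the risk of disease in the exposed group divided by the risk in the unexposed group. The minimal sketch below computes it from hypothetical cohort counts (invented for illustration) and applies only the rough labels stated in the text: at least 10 as strong, as for smoking and lung cancer, and between 1 and 2 as weak. The function names are illustrative, not taken from any library.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk (incidence proportion) in the exposed divided by risk in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def describe_strength(rr):
    """Rough labels taken from the text; values in between are left to judgment."""
    if rr >= 10:
        return "strong"
    if 1 < rr < 2:
        return "weak"
    return "not classified by these rules"


# Hypothetical cohorts of 1,000 exposed and 1,000 unexposed people.
for cases_exposed, cases_unexposed in [(120, 10), (15, 10)]:
    rr = relative_risk(cases_exposed, 1000, cases_unexposed, 1000)
    print(f"RR = {rr:.1f} -> {describe_strength(rr)}")
```

As the text stresses, a label such as "strong" says nothing about confounding: the matches example could produce a strong relative risk without any causal link.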

Biological Plausibility

An association is more likely to be causal if it makes sense in terms of the current scientific understanding of the biology of the disease or health issue under consideration. For instance, an association between cigarette smoking and lung cancer makes biological sense, whereas an association between carrying matches and lung cancer does not. This criterion therefore requires that an association be biologically plausible in light of contemporary biological knowledge.

Role of chance

In assessing the findings of a study, the investigator needs to know whether the results could have occurred by chance alone. This can be assessed by hypothesis testing (significance testing and P-values) or by estimation and confidence intervals.

Hypothesis testing

This requires a clear statement of the hypothesis being tested and the formulation of an appropriate null hypothesis. The null hypothesis usually states that there is no relationship between exposure and outcome; the alternative hypothesis states that an association is present. For example, the null hypothesis might be that the odds ratio (in a case-control study) or the relative risk (in a cohort study) is equal to one, and the alternative hypothesis that it is either greater or smaller than one (Bonita et al., 2006). The null hypothesis can then be tested using an appropriate test of statistical significance.
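The odds ratio referred to in the null hypothesis is calculated from a case-control 2x2 table as (a x d) / (b x c), where a and b are the exposed cases and controls and c and d are the unexposed cases and controls. The sketch below uses invented counts; an odds ratio of exactly 1 corresponds to the null hypothesis of no association. The function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a case-control 2x2 table.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    return (a * d) / (b * c)


# Hypothetical study in which exposure is more common among cases than controls.
print(odds_ratio(40, 20, 60, 80))
# Hypothetical study consistent with the null hypothesis.
print(odds_ratio(30, 30, 70, 70))  # prints 1.0
```

Whether an observed odds ratio differs from 1 by more than chance would then be judged with a significance test or a confidence interval.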

These tests usually rest on the principle that a difference between groups can be measured and that the variance of that estimated difference can itself be estimated. The common form of the test statistic is the observed value minus the expected value, divided by the square root of the variance. Therefore, for a given variance, the larger the difference between groups, the less likely it is that the observed result arose by chance alone, and the more likely it is to be described as statistically significant. For a given difference, the variance of the estimate also affects the level of significance: the smaller the variance, the more statistically significant the observed difference. The standard error of the estimate (the square root of its variance) depends on the sample size, so larger studies produce smaller standard errors.
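The general form of the test statistic described above, (observed - expected) / sqrt(variance), can be sketched with the Python standard library. The two-sided p-value is taken from the standard normal distribution, which assumes the statistic is approximately normal under the null hypothesis; the numbers are hypothetical. The example shows the same observed difference becoming more significant when its variance is smaller, as happens in a larger study.

```python
import math


def z_statistic(observed, expected, variance):
    """(observed - expected) divided by the square root of the variance."""
    return (observed - expected) / math.sqrt(variance)


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed difference (0.05 against an expected 0), two different variances.
for variance in (0.0016, 0.0004):  # the smaller variance mimics a larger sample
    z = z_statistic(0.05, 0.0, variance)
    print(f"variance={variance}: z={z:.2f}, p={two_sided_p_value(z):.4f}")
```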

Estimation and confidence intervals

The contribution of chance can also be estimated by calculating confidence intervals. A confidence interval summarizes the data in the original units of measurement, as opposed to the dimensionless probability scale of significance testing. The standard error, which is central to calculating the confidence interval, is derived from the standard deviation and the sample size, and it represents the uncertainty in the sample statistic.
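As a minimal sketch of this calculation, the code below derives the standard error from the sample standard deviation and the sample size (SE = SD / sqrt(n)) and forms an approximate 95% confidence interval as mean +/- 1.96 x SE, in the original units of the data. The 1.96 multiplier assumes a normal approximation; for small samples a t-distribution multiplier would be more appropriate. The blood pressure values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, standard error, (lower, upper)) using a normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)         # standard error of the mean
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical systolic blood pressure readings (mmHg).
readings = [118, 122, 125, 130, 128, 135, 121, 127]
mean, se, (low, high) = mean_confidence_interval(readings)
print(f"mean = {mean:.1f} mmHg, SE = {se:.2f}, 95% CI {low:.1f} to {high:.1f} mmHg")
```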

Example of an association

Consider non-insulin-dependent diabetes, which appears to be occurring at an early age among young people living in the United States. Suppose an epidemiologist wants to study whether dietary sugar consumption is related to diabetes; two variables are then defined: the level of sugar consumption and the presence of diabetes. If there is no association between dietary sugar and diabetes, the presence of diabetes is independent of the amount of sugar consumed. A positive association would indicate that the occurrence of diabetes increases as sugar intake rises, whereas a negative association would mean that the occurrence of diabetes decreases as dietary sugar increases. A non-causal association between sugar intake and diabetes could arise from genetic predisposition: if people with a genetic predisposition to diabetes also prefer more sugar in their diet, diabetes will be more common among high sugar consumers even if sugar plays no causal role. In that case, the association between sugar consumption and diabetes is secondary to the genetic predisposition and is therefore non-causal.
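The genetic-predisposition scenario can be made concrete with invented counts. In the hypothetical data below, diabetes risk within each genetic stratum is the same whatever the sugar intake, but predisposed people eat more sugar. Pooling the strata produces a crude relative risk above 1, while the stratum-specific relative risks equal 1, which is the pattern expected when an association is produced by confounding. All names and numbers are illustrative.

```python
# Hypothetical (cases, people) counts by genetic stratum and sugar intake.
# Within each stratum, diabetes risk is identical for high and low sugar intake.
DATA = {
    "predisposed": {"high_sugar": (80, 400), "low_sugar": (20, 100)},
    "not_predisposed": {"high_sugar": (5, 100), "low_sugar": (20, 400)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, people) pairs."""
    (cases_1, n_1), (cases_0, n_0) = exposed, unexposed
    return (cases_1 / n_1) / (cases_0 / n_0)


def crude_risk_ratio(data):
    """Pool the strata, ignoring genetic predisposition."""
    def pooled(group):
        return (sum(s[group][0] for s in data.values()),
                sum(s[group][1] for s in data.values()))
    return risk_ratio(pooled("high_sugar"), pooled("low_sugar"))


def stratum_risk_ratios(data):
    """Risk ratio for sugar intake within each genetic stratum."""
    return {name: risk_ratio(s["high_sugar"], s["low_sugar"]) for name, s in data.items()}


print(f"crude RR: {crude_risk_ratio(DATA):.3f}")
for name, rr in stratum_risk_ratios(DATA).items():
    print(f"{name}: RR = {rr:.3f}")
```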

W3: What is Surveillance?

Surveillance is an approach that identifies health events from existing data, such as clinical or laboratory records, hospital discharge data, and death certificates. It requires the collection of essential information, and both the purpose and the time frame of data collection should be clearly specified. The data must then be analyzed to support interpretation and guide interventions.

Most surveillance programs concentrate on outcomes that can be ascertained with reasonable ease, fairly soon after the damage occurs, and in which the effects of changes in environmental hazards are likely to be detected. According to Labarthe (2011), “Many infection control programs establish surveillance systems due to recommendations from federal agencies.” Facilities that develop programs under such conditions may not have agreed on goals and priorities for surveillance. As a result, data collection becomes an end in itself. Unfortunately, such surveillance data have little influence on infection rates because their purpose and practical application are unclear. Conversely, healthcare epidemiology and infection control programs with defined objectives, goals, and administrative support can be highly effective, and their data can motivate clinicians to improve the quality of services.

Maintaining the confidentiality of surveillance data is particularly crucial. While reporting is often a requirement of the system, surveillance and subsequent disease control efforts work best when those who report and the public cooperate. Breaches of confidentiality may damage the trust between reporting providers, the public, and the public health department. When data from small geographic areas are analyzed, individuals may be identifiable from the small numbers in some table cells.
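One common safeguard against the small-cell problem is to suppress counts below a threshold before tables are released. The sketch below assumes a threshold of 5, a hypothetical choice (agencies set their own rules), and masks non-zero counts below it; the area names and counts are invented.

```python
def suppress_small_cells(counts, threshold=5):
    """Mask non-zero counts below `threshold` so individuals in small areas are harder to identify."""
    marker = f"<{threshold}"
    return {area: (marker if 0 < n < threshold else n) for area, n in counts.items()}


# Hypothetical case counts by small geographic area.
cases_by_area = {"Area A": 27, "Area B": 3, "Area C": 0, "Area D": 11}
print(suppress_small_cells(cases_by_area))
```

Masking single cells is not always enough: if row or column totals are also published, a suppressed value can be recovered by subtraction, so additional (complementary) cells may need to be suppressed as well.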

A successful surveillance model must meet several requirements. The first is a set of clear and specific primary objectives. When creating a new surveillance system or revising an existing one, staff must first define the priorities of the infection control program, which determine both the type of surveillance to conduct and the types of data to collect. After this analysis, the hospital staff can design a surveillance system customized for their facility (Labarthe, 2011). Hospital surveillance and infection control personnel should consider the characteristics of the institution, including its size and the type of hospital.

W4: A Healthy Case

Multiple studies evaluating the microbiological quality of wastewater have focused on occupational and recreational risks. Observational study methods were applied to assess the risks associated with existing practices, since it was not possible to introduce a wastewater treatment facility and assess its impact on health through an intervention study. Infections in people from farming families in direct contact with effluent from storage reservoirs or with raw wastewater were compared with infections in a control group of farming families engaged in rain-fed agriculture. The storage reservoir performs a “partial treatment” function, producing water of varying microbiological quality. Potential confounding factors in assessing wastewater exposure include socio-economic status, water supply, sanitation, and hygiene.

Some of the effluent from the first reservoir passes into a second reservoir, where it is retained further to improve its quality. The study will indicate that untreated wastewater contains high concentrations of fecal coliforms and nematode eggs. Retention in a single reservoir can reduce the number of nematode eggs considerably, to a mean of <1 egg/l, whereas fecal coliform levels fall to about 10^5/100 ml, with variation over the year depending on factors such as rainfall. The concentration of nematode eggs would remain below 1 egg/l even when a small amount of raw wastewater enters the effluent further downstream.

In the analysis, the estimated effects of exposure to raw wastewater and to reservoir water will be adjusted for all other variables that emerge as confounders. Exposure to effluent from one reservoir may be linked to an increased risk of Ascaris infection in young children, whereas exposure to the other reservoir may show no evidence of an effect. If the exposure effects were not also examined in the rainy season, the latter result could not be considered conclusive. The effect could be considerably larger in the dry season for all age groups (Melnick & Everitt, 2008).

Confidence intervals will be relevant in this study for expressing the precision of the estimated health effects. A more sensitive measure of exposure will involve assigning each individual a personal exposure status based on activities involving direct contact with the wastewater, as well as the frequency of that contact. Consequently, a more valid measure of exposure will be established, improving the assessment of the strength and biological plausibility of the association. The results could show that contact with effluent is associated with an increased prevalence of Ascaris infection in the general population compared with the control group.
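As an illustration of how a confidence interval could accompany the comparison of Ascaris prevalence between exposed and control groups, the sketch below computes a prevalence ratio and an approximate 95% interval on the log scale (the standard log method for a ratio of two proportions). The counts are hypothetical and are not results from the study.

```python
import math


def prevalence_ratio_ci(cases_exposed, n_exposed, cases_control, n_control, z=1.96):
    """Prevalence ratio with an approximate confidence interval computed on the log scale."""
    pr = (cases_exposed / n_exposed) / (cases_control / n_control)
    # Standard error of log(PR) for a ratio of two independent proportions.
    se_log = math.sqrt(1 / cases_exposed - 1 / n_exposed + 1 / cases_control - 1 / n_control)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, (lower, upper)


# Hypothetical: 60 of 200 effluent-exposed people infected vs 30 of 200 controls.
pr, (low, high) = prevalence_ratio_ci(60, 200, 30, 200)
print(f"prevalence ratio = {pr:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

An interval that excludes 1 would be consistent with a real difference in prevalence, although confounding still has to be addressed separately, as the analysis plan above describes.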

W5: Analysis of Variance (ANOVA)

Analysis of variance (ANOVA) is essentially a procedure for testing differences among several groups of data for homogeneity. The main concept of ANOVA is that the total variation in a set of data is broken down into two types: the amount that can be attributed to chance and the amount that can be attributed to specified causes. There may be variation between samples and also within sample items. ANOVA consists of splitting the variance for analytical purposes. It is therefore a method of partitioning the variance to which a response is subject into components corresponding to different sources of variation.
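The partition described above can be written out directly. This sketch computes a one-way ANOVA by hand for hypothetical scores in three groups: the between-group sum of squares (variation attributed to the grouping) and the within-group sum of squares (variation attributed to chance) add up to the total, and the ratio of their mean squares gives the F statistic.

```python
def one_way_anova(groups):
    """One-way ANOVA: split total variation into between-group and within-group parts."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean ("specified causes").
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean ("chance").
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
            "df": (df_between, df_within), "f": f_stat}


# Hypothetical outcome scores in three treatment groups.
result = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(result)
```

To obtain a p-value, the F statistic is compared with an F distribution on (df_between, df_within) degrees of freedom; in practice, `scipy.stats.f_oneway` returns both the F statistic and its p-value.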

Using the ANOVA technique, one can study multiple factors that are hypothesized to influence the dependent variable. According to Labarthe (2011), “One may examine the differences amongst various categories within each of the factors that may consist of large number of possible values.” ANOVA has several epidemiological applications. In medicine, for example, different drugs are manufactured to prevent or cure a given disease, and their performance can be judged from the relationship between the independent and dependent variables (Labarthe, 2011). If a team of medical researchers wants to test which antidepressant is more effective at reducing depressive symptoms, they need an adequate sample of patients with depressive symptoms, together with a control group for comparison. In this study, the effect of time would be revealed by the way in which the average depression score in each treatment group changes over time, and a repeated-measures ANOVA would show the interaction between treatment and time in their effects on average depression.

W6: Statistical Inference

Statistical inference is the process of drawing conclusions about a population from a representative sample of that population. It involves predicting what might be expected from further observations or studies, and quantifying the degree of certainty or uncertainty in the results obtained. Probability is used to express the reliability of the conclusions. For instance, measures such as the transmission probability, which is conditional on contact between infectives and susceptibles, lead to conditional statistical inferences. Measures of disease frequency that are not conditional on contact, such as the incidence rate and incidence proportion, lead to unconditional statistical inferences. As a result, estimates of association and causal effect will differ between the two kinds of inference.

The design of epidemiological studies needs to include clear statements about the degree of certainty desired in the results. Inference using confidence intervals is generally preferred to inference using p-values. However, statistical inference is not a suitable tool for assessing confounding. Instead, the investigator must evaluate confounding through data-based comparisons that address the expected outcomes. Moreover, when confounding is present, the degree of certainty in a statistical inference cannot be quantified, only argued. In an epidemiological study comparing exposed and non-exposed groups, valid statistical inference therefore requires that confounding factors be eliminated or controlled.

W7: Screening for Disease

A good screening test should be simple to prepare and perform. Moreover, a test that can be administered by non-physician medical personnel will be less expensive than one that requires years of medical training to administer. Depending on the medical condition, the screening test should also be quick to perform; occasionally, the time needed to screen an individual is directly related to the success of the program. For instance, a screening test that takes 10 minutes of a person’s time is likely to be perceived as more worthwhile than one that requires an hour or more. Furthermore, a test that gives immediate results is preferable to one that takes weeks or months to return them, since such delays postpone further workup and treatment.

Since the majority of people on whom screening is performed are healthy, or at least have no complaints referable to the disease for which they are being screened, the test must be safe. “It would be unacceptable to cause injury by screening activity itself, or by the workup of many false positive results created by screening, unless the possible injury is heavily outweighed by the benefits of the screening activity” (Labarthe, 2011).
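Because most people screened are healthy, even a fairly accurate test can produce many more false positives than true positives. The sketch below makes this arithmetic explicit using sensitivity (the proportion of diseased people who test positive) and specificity (the proportion of healthy people who test negative); the prevalence, sensitivity, and specificity values are hypothetical assumptions, not figures from the source.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected true and false positives and the positive predictive value (PPV)."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity
    false_pos = healthy * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)  # share of positives who actually have the disease
    return {"true_positives": true_pos, "false_positives": false_pos, "ppv": ppv}


# Hypothetical program: 100,000 people screened, 1 in 1,000 has the disease.
result = screening_outcomes(100_000, 0.001, sensitivity=0.90, specificity=0.95)
print(f"true positives:  {result['true_positives']:.0f}")
print(f"false positives: {result['false_positives']:.0f}")
print(f"PPV: {result['ppv']:.1%}")
```

Each false positive must be worked up, which is why the harm caused by those workups has to be weighed against the benefits of the program.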

A screening test should not expose the client to undue risk, and it should be acceptable to the target group. For instance, a method of screening for testicular cancer needs to be acceptable to the men who are its target group.

W8: Using a Scatterplot

A scatter plot is suitable for showing the relationship between two quantitative variables, that is, how much one variable is affected by the other. “Each point in the scatter plot represents two values, and one or more than one value can correspond to any value in the horizontal or vertical axis” (Labarthe, 2011). Suppose a study focuses on the relationship between birthweight and gestation period. The data are then “paired”, with two measurements for each mother-child pair: the birthweight and the gestation period. Both are continuous variables. The horizontal axis presents the independent variable, which is not influenced by the other measurement. In this case, the gestation period is the independent variable, and the birthweight is the dependent variable, displayed on the vertical axis. A scatterplot is an efficient way of presenting such a relationship, with each point representing a specific mother-child pair.
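To put a number on how much birthweight changes with gestation, a straight line can be fitted through the points of such a scatterplot by ordinary least squares. The sketch below does this with plain Python; the six mother-child pairs are invented, and a linear relationship is an assumption made for illustration.

```python
def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mother-child pairs: gestation (weeks, horizontal axis)
# and birthweight (grams, vertical axis).
weeks = [36, 37, 38, 39, 40, 41]
grams = [2650, 2950, 3050, 3350, 3450, 3750]

slope, intercept = least_squares_line(weeks, grams)
print(f"birthweight rises by about {slope:.0f} g per extra week of gestation")
```

The slope describes the average change in the dependent variable per unit of the independent variable within the observed range; on Python 3.10 or later, `statistics.linear_regression(weeks, grams)` returns the same slope and intercept.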

If an ancillary variable may account for heterogeneity across a set of studies, it will be useful to draw a scatterplot of the estimate from each study against this ancillary variable. If the ancillary variable is important in explaining variation in the response variable, this should appear as a pattern in the plot. Melnick & Everitt (2008) argue, “Instead of drawing a standard scatterplot, it is prudent to present these points in relative size to the weights of different studies.”

W9: Approach to Interpreting Epidemiologic Studies

Meta-analysis is a suitable approach for addressing the effects of exposures in the range that would be of concern for the general population. Melnick & Everitt (2008) state, “This approach has been utilized to summarize data for the steps of hazard identification and dose-response assessment.” Epidemiologists have applied this technique to summarize both experimental and observational studies. The approach can also provide a more informative interpretation of the evidence when sampling variation and small study sizes have obscured it. For example, meta-analysis can be used to characterize the relationship between indoor air pollution and lung cancer risk.

Applied to randomized clinical trials (RCTs), meta-analysis can be very effective and powerful. It is well suited to this setting because each RCT, through randomization, generates an unbiased estimate of the effect of an experimental treatment. For a sensibly selected and combined set of trials, differences in results between studies can then be attributed to sampling variation. Hence, the overall effect estimated by the meta-analysis provides an unbiased estimate of the treatment effect. In practice, pooling also improves precision, for example in examining food poisoning, and thereby reduces the probability of false-negative results. This, in turn, supports the timely introduction of treatment to control the effects of disease and the early identification of adverse effects.
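A common way to combine trial results, sketched below under the assumption that the trials share one true effect (a fixed-effect model), is to weight each study's estimate by the inverse of its variance. The three log relative risks and standard errors are hypothetical, and the function name is illustrative.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]  # more precise trials get more weight
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical log relative risks and standard errors from three trials.
log_rr = [0.20, 0.10, 0.30]
ses = [0.20, 0.10, 0.25]

pooled, pooled_se = fixed_effect_pool(log_rr, ses)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled RR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), pooled SE = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single trial, which is the gain in precision described above. The fixed-effect model treats between-trial differences as sampling variation only; when results are clearly heterogeneous, a random-effects model is the usual alternative.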

W10: Epidemiologic Knowledge

The application of epidemiological knowledge to health services management is broad, and it can make care delivery more efficient and effective. The insight gained from this course could help focus attention both on personal health and on general matters of health services management. The concepts of disease analysis, management, and prevention within the epidemiological framework show that there are different approaches to improving individuals' physical and mental health. As a result, health institutions should embrace new ways of promoting good health in the community.

The course also provides an expanded understanding of how to improve study methods, as well as of the risks of a poor choice of approach. Since most epidemiological research deals with tests of association, inferring causation is an uncertain process, and the degree of uncertainty can be reduced by considering the results of both epidemiological and experimental investigations. Causation is approached through a convergence of evidence, including evidence from experimental work. Convergence here means the tendency for different studies in different settings to support the same association. This knowledge could be essential in health promotion and disease prevention.

Lastly, this course suggests that epidemiological research differs from clinical research in the size of the study population. If errors are detected during such a study, repeating the entire process could be time-consuming and expensive. Hence, utmost care is required in conducting epidemiological studies.

References

Bonita, R., Beaglehole, R., Kjellström, T., & World Health Organization. (2006). Basic epidemiology. Geneva: World Health Organization.

Labarthe, D. (2011). Epidemiology and prevention of cardiovascular diseases: A global challenge. Sudbury, Mass: Jones and Bartlett Publishers.

Melnick, E. L., & Everitt, B. (2008). Encyclopedia of quantitative risk analysis and assessment. Chichester, West Sussex, England: John Wiley.