A considerable number of concepts and terms need to be understood to develop a rigorous research design. These concepts must be taken into consideration before moving on with the experimental phase of the project. This paper examines some of these concepts and compares their definitions in order to clearly elucidate their meaning.
According to Lee and Baskerville (2003), generalizability is the validity of a theory or hypothesis outside the setting in which it was tested. Without generalizability a hypothesis is of limited use; it cannot be applied beyond the sample in which it was tested (Lee & Baskerville, 2003). To help ensure generalizability, a large, representative sample of the population under study must be procured. Generalizability is important to research design because it enables researchers to extrapolate results to larger populations and draw conclusions. For example, generalizability may not be achieved in some medical studies because of the lack of randomized sampling. Clinicians cannot predict which patients will have a heart attack, go into a coma, or improve in condition. Subjects are often self-selected from a population because of their previous condition, and it is difficult to rule out confounding factors. This often limits the applicability of medical research to the larger general population (Slack & Draugalis, 2001).
A Type I error is an error associated with hypothesis testing and is important to understanding the validity of test results. There are two types of error in hypothesis testing: Type I and Type II. According to Peck and Devore (2012), a Type I error, or false positive, occurs when a researcher or test asserts that the alternative hypothesis is true when it is not, thus rejecting a true null hypothesis. The probability of a Type I error is also called the significance level of the test; setting the significance level can thus be used to limit the amount of Type I error. Limiting Type I error, however, risks increasing Type II error (Peck & Devore, 2012). Type I error is important because it can be limited by the significance level of the test; different studies require different significance levels for practical and ethical reasons. Practically, a significance level helps to control the amount of error and ensure validity without requiring extremely large sample sizes, which could be financially and logistically unrealistic (Wuensch, 1994).
A Type II error, also known as a false negative, occurs when a test fails to reject the null hypothesis when the alternative hypothesis is true. The probability of this type of error is symbolized by β and is inversely related to the power of the test (Type I and Type II Error, n.d.). Depending on circumstances, Type II errors may have worse consequences for the experiment. For example, if investigators wish to study a potential association between a drug and the prevalence of psychotic episodes, they must take both types of error into account. If the null hypothesis is that the drug is not associated with psychotic episodes and the alternative hypothesis states the reverse, a Type II error has the more serious consequences: it would be less damaging to believe that the drug is associated with psychotic episodes when it is not than to believe the reverse (Banerjee et al., 2009). The decision to limit Type I error should be made in light of these consequences.
Statistical power is the probability that a test will reject the null hypothesis when it is false. Alternatively, power measures a test's ability to detect an alternative hypothesis. It is thus inversely related to Type II error (The Power of a Statistical Test, n.d.). It is most useful for calculating the sample size required to detect an effect of a given size (Baguley, 2004).
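The link between power, effect size, and sample size can be sketched numerically. The following is a minimal illustration in Python (standard library only); the one-sided z-test, the 0.5 standardized effect size, and the 80% power target are illustrative assumptions rather than prescriptions from the sources cited above:

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p):
    # Inverse normal CDF by bisection (ample precision for this sketch)
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(effect_size, n, alpha=0.05):
    # Power of a one-sided z-test: P(reject H0 | true standardized effect)
    z_crit = norm_ppf(1.0 - alpha)
    return 1.0 - norm_cdf(z_crit - effect_size * math.sqrt(n))

def required_n(effect_size, target_power=0.80, alpha=0.05):
    # Smallest sample size whose power reaches the target
    n = 1
    while power(effect_size, n, alpha) < target_power:
        n += 1
    return n

# A medium standardized effect (0.5) at the 5% significance level:
print(required_n(0.5))  # 25 subjects for 80% power
```

Larger effects need fewer subjects; because the required n grows with the inverse square of the effect size, halving the effect roughly quadruples the sample needed.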
A hypothesis is a testable statement about a phenomenon under study. It must be simple, direct, and testable. In hypothesis testing, the hypothesis is defined in two different ways: the null hypothesis and the alternative hypothesis. The null hypothesis states a default, or null, state of circumstances; that is, there is no effect or difference in the sample. The alternative hypothesis states that there is an effect or difference in or between samples (Coolidge, 2012). Hypothesis testing is central to research design because it allows investigators to systematically test research variables without introducing confounding variables or bias to their results (Gravetter & Wallnau, 2013).
Often used in qualitative research, purposeful sampling is not random at all. Investigators select subjects based on the information they can add to the study, not on how well they represent a general population (Creswell, 2007). The prime importance of purposeful sampling is thus that it allows investigators to gain an in-depth understanding of a subject. This type of sampling is used in expert interviews: experts are chosen based on what they know, not how well they represent a population. Purposeful sampling is therefore suited to qualitative data acquisition rather than quantitative data analysis (Creswell, 2004).
In simple random sampling, each subject has an equal chance of being selected in the sample to be tested (Yates, Moore, & Starnes, 2008). For example, when sampling a batch of light bulbs for defectiveness, each light bulb in the population has an equal chance of being selected. Simple random sampling is important because it helps ensure that results can be extrapolated to a larger population; any results taken from our sample of light bulbs can thus be extrapolated to the population from which those bulbs were sampled. Furthermore, the mathematical theorems on which such testing is based do not hold for a non-random sample (Simple Random Sampling, n.d.). If we were to choose light bulbs non-randomly to test for defectiveness, we might not get an accurate picture of how well the light bulbs work, and any statistical conclusions drawn from such a sample would be invalid.
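The light-bulb example can be made concrete with a short sketch in Python (standard library only); the batch size and the 5% defect rate are hypothetical numbers chosen for illustration:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# A hypothetical production batch: 1,000 bulbs, 50 of them defective.
batch = ["defective"] * 50 + ["working"] * 950

# Simple random sampling: every bulb has an equal chance of selection,
# and bulbs are drawn without replacement.
sample = random.sample(batch, k=100)

# The sample proportion is our estimate of the batch defect rate.
defect_rate = sample.count("defective") / len(sample)
print(defect_rate)  # an estimate of the true 5% rate
```

Because every bulb had the same chance of selection, the sample proportion is an unbiased estimate of the batch defect rate, which is exactly the extrapolation step described above.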
When a population is divided into variable sub-populations, it is better to sample each sub-population independently; this is stratified random sampling. First, the researcher divides all possible sample units into relatively homogeneous subgroups, or strata. Then simple random sampling is employed within each group (Easton & McColl, n.d.). If this method were not employed in cases with variable sub-populations, it is likely that some subgroups would be better represented in the sample than others. This method reduces statistical error and thus improves the validity of results (Levy & Lemeshow, 2008).
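This two-step procedure can be sketched in Python (standard library only; the strata names and sizes are hypothetical):

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical population with recognizable subgroups (strata).
population = (
    [("urban", i) for i in range(600)]
    + [("suburban", i) for i in range(300)]
    + [("rural", i) for i in range(100)]
)

def stratified_sample(units, key, n_per_stratum):
    # Step 1: divide the units into relatively homogeneous strata.
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    # Step 2: simple random sampling within each stratum.
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

sample = stratified_sample(population, key=lambda u: u[0], n_per_stratum=20)
print(len(sample))  # 20 units from each of 3 strata = 60
```

A simple random sample of 60 from this population would contain only about six rural units on average; taking equal samples per stratum guarantees that the smallest subgroup is as well represented as the largest.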
Cluster sampling is a systematic sampling method whereby the population is divided into “natural groups.” These subgroups are randomly sampled, and individuals are then sampled from the groups included in the primary sampling stage (Sampath, 2005). Cluster sampling is an important alternative because it recognizes that there may be limitations in the logistics and costs of data acquisition (Som, 1996).
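A sketch of two-stage cluster sampling in Python (standard library only; the classrooms-and-students setup is a hypothetical example):

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical population organized into "natural groups":
# 10 classrooms of 30 students each.
clusters = {c: [f"student-{c}-{i}" for i in range(30)] for c in range(10)}

# Stage 1: randomly select a subset of the clusters themselves.
chosen = random.sample(sorted(clusters), k=3)

# Stage 2: simple random sampling of individuals within each chosen cluster.
sample = []
for c in chosen:
    sample.extend(random.sample(clusters[c], k=10))

print(len(sample))  # 3 clusters x 10 students = 30
```

Only the three chosen classrooms need to be visited; this logistical saving is what the method trades against the risk that the chosen clusters are unrepresentative.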
All results obtained through experiment or observation have at least a small chance of being caused by statistical accident rather than real processes. An effect is statistically significant when the probability of observing it by random chance alone falls below a chosen threshold; this threshold is also the limit that we place on Type I error (Schlotzhauer, 2007). Statistical significance is important to consider during research design because it allows researchers to control for certain types of error and thus gain confidence in their results. Different types of studies require different levels of statistical significance: the social and behavioral sciences generally allow for a 5% error rate, whereas medical studies must have lower error rates, such as 1% or 0.5% (McKillup, 2006). Importantly, note that statistical significance does not necessarily equal overall importance (Gelman & Stern, 2006).
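The role of the threshold can be seen in a small worked example in Python (standard library only; the measurements are hypothetical, and a z-test with the normal approximation is used for simplicity):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical measurements; H0: the true mean is zero.
data = [1.2, -0.4, 0.9, 0.3, 1.5, -0.2, 0.8, 0.1, 1.0, -0.2]
n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
se = math.sqrt(var / n)                             # standard error

z = mean / se                       # test statistic
p = 2.0 * (1.0 - norm_cdf(abs(z)))  # two-sided p-value

# The same result is "significant" or not depending on the chosen level:
for alpha in (0.05, 0.01):
    verdict = "significant" if p < alpha else "not significant"
    print(f"alpha = {alpha}: {verdict}")
```

With these numbers the p-value is about 0.017: significant at the 5% level common in the behavioral sciences, but not at the stricter 1% level a medical study might require.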
In experimental research, the independent variable is the factor that is manipulated or changed in order to induce and test responses in other factors, the dependent variable(s). In non-experimental research, the independent variable is believed to have some effect on other variables; this effect is tested and quantified. For example, in an experiment testing the effect of drug A on mouse mortality, the drug is the independent variable; the presence or absence of the drug represents different states of this variable (Independent Variable, n.d.). Explicitly defining the independent variable is important to research design because it allows researchers to focus on changing that variable alone while keeping all other factors constant (White & McBurney, 2013). Testing the independent variable in this way ensures that no other factor confounds the results of the study.
The dependent variable is responsive to the independent variable. In the example above, the dependent variable was mouse mortality, which depended on the presence or absence of drug A. Measuring the dependent variable is important because its responsiveness to the independent variable can tell us about their relationship (Punch, 2005). For example, if no mice died when exposed to drug A (given a large, representative sample of mice), then the researcher can assume, within a certain confidence level, that drug A does not cause death in mice. Independent and dependent variables are intrinsic to the question or questions that researchers wish to answer, and they must therefore be carefully defined to avoid ambiguity (Iversen & Gergen, 1997).
The intervening variable is a theoretical variable used to explain the relationship between two or more observable variables, such as the dependent and independent variables (Maccorquodale & Meehl, 1948). The intervening variable is important to define because the relationship between the dependent and independent variables may be difficult or impossible to explain without it. It is also important that, until physical evidence exists for the presence of intervening variables in an experiment, they be regarded as hypothetical constructs used to elucidate simpler theories of the relationship between two variables (Tolman, 1938).
Type I error, Type II error, and power are fundamentally different but related concepts in research design and statistical analysis. These concepts are related to the idea of statistical significance; that is, they are different parts of a methodology for ensuring statistical validity (Banerjee et al., 2009). In hypothesis testing, rigorous methods and definitions are employed to ensure the validity of experimental results. Researchers state clear, unambiguous null and alternative hypotheses and test them using experimental data. The null hypothesis is regarded as the state of no effect, whereas the alternative hypothesis asserts that there is an effect or change. There are four possible scenarios in hypothesis testing. In the first, there is no effect and the test correctly identifies the null hypothesis as true. In the second, the test detects an effect when there is none, erroneously rejecting the null hypothesis (a Type I error). In the third, the test detects an effect when there is one, correctly rejecting the null hypothesis (power). In the fourth, the test incorrectly states that the null hypothesis is true when in reality there is an effect (a Type II error) (Banerjee et al., 2009).
The probability of a Type I error can be limited by choosing a small α, or significance level; the significance level equals the probability of a Type I error. A lower probability of Type I error, however, means a higher probability of Type II error. The reasoning behind this inverse relationship is fairly straightforward: when the test decreases Type I error, it decreases its own ability to detect an effect, and this loss of sensitivity makes the test more prone to missing an effect that is actually present. To decrease Type II error, Type I error (the significance level) must be increased or, alternatively, power must be increased (Banerjee et al., 2009).
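This inverse relationship can be demonstrated by simulation. The sketch below (Python standard library only; the unit-variance z-test, the 0.4 effect size, and the trial counts are illustrative assumptions) estimates both error rates at two significance levels:

```python
import math
import random

random.seed(7)  # reproducible illustration

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def error_rate(alpha, true_mean, n=25, trials=2000):
    # One-sided z-test of H0: mean = 0 vs H1: mean > 0 (unit variance).
    # Returns the Type I rate when true_mean == 0 (H0 actually true),
    # and the Type II rate otherwise (H1 actually true).
    rejections = 0
    for _ in range(trials):
        xs = [random.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(xs) / n) * math.sqrt(n)
        if 1.0 - norm_cdf(z) < alpha:
            rejections += 1
    if true_mean == 0.0:
        return rejections / trials       # false positives / trials
    return 1.0 - rejections / trials     # missed effects / trials

# Shrinking alpha from 5% to 1% cuts false positives,
# but the miss rate for the same true effect (0.4) rises.
for alpha in (0.05, 0.01):
    print(alpha, error_rate(alpha, 0.0), error_rate(alpha, 0.4))
```

With these settings the estimated Type II rate roughly doubles (from about 0.36 to about 0.63) when α is tightened from 5% to 1%, mirroring the trade-off described above.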
Power is the test's ability to detect an effect when one is present. It is mathematically expressed as (1 − β), where β is the probability of a Type II error. For example, if β is 10%, then power is 90%, meaning that in 90 of 100 trials the test will detect a true effect. Power is increased by increasing sample size or sample representativeness. Increasing sample size raises power and thus decreases Type II error without raising Type I error, but it is not without its costs: larger sample sizes require more resources, so there is a trade-off between sample size and error rates. This becomes increasingly important in medical studies in which researchers must use observational data and cannot randomize their samples (Banerjee et al., 2009).
A common question among researchers is which significance level to use. This depends on the particular question the researcher wishes to answer and on the consequences the study may have. In some circumstances it may be more beneficial to limit Type I error, while in others Type II error should be limited. Suppose, for example, that a certain drug has potential benefits for regulating blood sugar but some chance of causing heart failure. The null hypothesis states that the drug is not associated with heart failure; the alternative hypothesis states that it is. A Type I error in this scenario would result in a rejected null hypothesis and thus the drug being deemed unsafe when it is, in fact, safe. A Type II error would result in an accepted null hypothesis and thus the drug being deemed safe when in fact it causes heart failure (Wuensch, 1994).
A consumer would more likely accept a Type I error over a Type II error in this case, since the risk of heart failure and death is associated with Type II error. However, what if the drug is very effective at regulating blood sugar? Should a small but significant risk of Type II error prevent patients from receiving effective treatment? What if this occurs in a population where drugs for treating blood sugar are not easily obtainable? Consumers in this population may wish to risk the small chance of heart failure. For pharmaceutical companies, Type I error may be the bigger threat, since it reduces profits by disallowing drugs that could be made available to consumers. This is also why higher rates of Type I and Type II error are allowed in social and behavioral studies but not in medical research (Wuensch, 1994). It is important for researchers to take all of these risks into account before assigning α and determining sample size.
Sampling strategies can be divided first based on intention. The purpose of data acquisition in purposeful sampling is to gain more in-depth knowledge of a subject. In this type of sampling, large sources of rich information are targeted without regard to the representativeness of such information. Examples of this information include interviews of elders who are the last to speak their language or an expert's opinion (Creswell, 2004).
In random sampling, the purpose and analysis of data acquisition are quite different from those in purposeful sampling. The goal of random sampling strategies (simple random, stratified, and cluster) is to gain a representative picture of a population without having to sample that entire population. The data are analyzed using the principles of random selection, and the findings are then extrapolated to the larger population. The differences between random sampling strategies depend on the nature of the population and the resources at hand for researchers (Yates, Moore, & Starnes, 2008).
Simple random sampling is ideal for most situations, except that complete randomization is difficult to achieve and it requires a well-mixed population. If simple random sampling is attempted on a population with recognizable subgroups, some of the subgroups may be better represented than others simply by chance. In such a situation it is better to use stratified random sampling, which allows the researcher to define the strata and then take equal random samples from each stratum, ensuring that each is equally represented. This has the additional advantage of allowing a researcher to test for separate effects, such as a subject's membership in a certain group (Levy & Lemeshow, 2008). Cluster sampling is different altogether. It is employed in instances where it may be logistically or financially impossible to sample from every sub-population. A random sample of the sub-populations is therefore taken, and a random sample of individuals within each selected sub-population is then taken. This is a more efficient and less time-consuming method, particularly when the time cost of the study is taken into account. However, if the first stage of sampling does not yield a representative set of sub-populations, it may lead to error in the results (Som, 1996).
Independent, dependent, and intervening variables are all part of the same mechanism used to test hypotheses. When a researcher wishes to test the effects of one variable upon another, such as the effects of a drug, therapy, or nutrition on human health, they are using the scientific method to see whether there is a correlational or causal relationship between them (White & McBurney, 2013). The independent variable is the variable that is manipulated or controlled to observe effects on a dependent variable. Across studies, such variables can switch roles, revealing the direction in which cause and effect moves. For example, a researcher may wish to see the effects of a certain pesticide on insect prevalence on a crop. In this case, the pesticide is the independent variable and the prevalence of insects is the dependent variable. However, if the researcher wishes to continue studying the pest, they may test the effects that the pest has on crop yield; in this second study, the former dependent variable has become an independent variable.
In experiments where randomization of subjects is feasible and all confounding variables have been eliminated, a causal link can be established between the independent variable and the dependent variable. This is because randomization eliminates the effects of other variables present and focuses researchers' attention on determining one possible cause. In observational studies (where an experiment may not be feasible or ethical), researchers can establish only a correlational link, not a causal one, since confounding variables cannot be eliminated from the sample (Iversen & Gergen, 1997).
An intervening variable is one that explains the link between an independent and a dependent variable. In Tolman's studies on rat behavior and learning, he introduced the term to describe the relationship between a stimulus (independent variable) and behavior (dependent variable) (Tolman, 1938).
References
Baguley, T. (2004). Understanding statistical power in the context of applied research. Applied Ergonomics, 35(2), 73-80.
Banerjee, A., Chitnis, U., Jadhav, S., Bhawalkar, J., & Chaudhury, S. (2009). Hypothesis testing, type I and type II errors. Industrial Psychiatry Journal, 18(2), 127.
Coolidge, F. L. (2012). Statistics: A gentle introduction (3rd ed.). London: SAGE Publications.
Creswell, J. W. (2004). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, N.J.: Pearson/Merrill Prentice Hall.
Creswell, J. W. (2007). Qualitative inquiry & research design: Choosing among five approaches (2nd ed.). Thousand Oaks: SAGE Publications.
Easton, V. J., & McColl, J. H. (n.d.). Stratified Random Sampling. Statistics Glossary v1.1. Retrieved from http://www.stats.gla.ac.uk/steps/glossary/sampling.html#stratsamp
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328-331.
Gravetter, F. J., & Wallnau, L. B. (2013). Introduction to hypothesis testing. In Statistics for the behavioral sciences (9th ed., pp. 1-38). New York: Cengage.
Independent Variable. (n.d.). Retrieved from http://www.ncsu.edu/labwrite/po/independentvar.htm
Iversen, G. R., & Gergen, M. (1997). Statistics the conceptual approach. New York, NY: Springer New York.
Lee, A. S., & Baskerville, R. L. (2003). Generalizing generalizability in information systems research. Information Systems Research, 14(3), 221-243.
Levy, P. S., & Lemeshow, S. (2008). Sampling of populations: Methods and applications (3rd ed.). New York: Wiley.
Maccorquodale, K., & Meehl, P. E. (1948). On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55(2), 95-107.
McKillup, S. (2006). Statistics explained: An introductory guide for life scientists. Cambridge, UK: Cambridge University Press.
Peck, R., & Devore, J. L. (2012). Statistics: The exploration and analysis of data. Boston: Brooks/Cole Cengage Learning. (Original work published 2005)
Punch, K. (2005). Introduction to social research: Quantitative and qualitative approaches (2nd ed.). London: SAGE Publications.
Sampath, S. (2005). Sampling theory and method (2nd ed.). Middlesex: Alpha Science International Ltd.
Schlotzhauer, S. D. (2007). Elementary statistics using JMP. Cary, NC: SAS Press.
Slack, M. K., & Draugalis, J. R. (2001). Establishing the internal and external validity of experimental studies. American Journal of Health System Pharmacy, 58(22), 2173-2181.
Som, R. K. (1996). Practical sampling techniques (2nd ed.). New York: M. Dekker.
The Power of a Statistical Test. (n.d.). University of Chicago, Statistics. Retrieved from http://statistics.uchicago.edu/~s220e/Lect/lec14.pdf
Tolman, E. C. (1938). The determiner of behavior at a choice point. Psychological Review, 45, 338-370.
Type I and Type II Error. (n.d.). STATC141. Retrieved from http://www.stat.berkeley.edu/users/hhuang/STAT141/Lecture-FDR.pdf
White, T., & McBurney, D. (2013). Research methods (9th ed.). Belmont, CA: Wadsworth Cengage Learning.
Why Is Random Sampling Important? (n.d.). Common mistakes in using statistics. Retrieved from https://www.ma.utexas.edu/users/mks/statmistakesRandomSampleImportance.html
Wuensch, K. L. (1994). Evaluating the relative seriousness of type I versus type II errors in classical hypothesis testing. In B. Brown (Ed.), Disseminations of the International Statistical Applications Institute: Vol. 1 (3rd ed., pp. 76-79). Wichita, KS: ACG Press.
Yates, D. S., Moore, D. S., & Starnes, D. S. (2008). The practice of statistics (3rd ed.). New York: W.H. Freeman.