Introduction
Measurement
The nominal level of measurement
The ordinal level of measurement
Interval Level of Measurement
Ratio Level of Measurement
Errors of measurement
Population, Sample, Variable
Dependent and independent variables
Hypothesis
Types of errors
Type I Error
Type II Error
Sampling
Probability Sampling or Random sampling
Nonprobability sampling
Sampling Error (Standard Error)
Descriptive statistics
As the context of health care is changing due to pharmaceutical services and technological advances, nurses and other health care professionals need to be prepared to respond in knowledgeable and practical ways. Health information is very often expressed in statistical terms to make it concise and understandable. Statistics plays a vitally important role in research: it helps to answer important research questions, and it is the answers to such questions that further our understanding of the field and provide material for academic study. The researcher needs to understand which statistical tools are suitable for a particular research study. It is essential for healthcare professionals to have a basic understanding of statistical concepts, as this enables them to read and evaluate reports and other literature and to undertake independent research investigations by selecting the most appropriate statistical test for their problems. The purpose of analyzing data in a study is to describe the data in meaningful terms.
Descriptive approach and inferential approach
Depending on the kinds of variables identified (nominal, ordinal, interval, and ratio) and the design of the particular study, a number of statistical techniques are available to analyze data. There are two approaches to the statistical analysis of data: the descriptive approach and the inferential approach. Descriptive statistics convert data into a picture of the information that is readily understandable. The inferential approach helps to decide whether the outcome of the study is a result of factors planned within the design of the study or determined by chance. The two approaches are often used sequentially: first, data are described with descriptive statistics, and then additional statistical manipulations are done to make inferences, through inferential statistics, about the likelihood that the outcome was due to chance. When the descriptive approach is used, terms like mean, median, mode, variation, and standard deviation are used to communicate the analysis of the data. When the inferential approach is used, probability values (P) are used to communicate the significance or lack of significance of the results (Streiner & Norman, 1996).
Measurement
Measurement is defined as the “assignment of numerals according to rules” (Tyler, 1963:7). Regardless of the variables under study, in order to make sense of the data collected, each variable must be measured in such a way that its magnitude or quantity can be clearly identified. The specific measurement strategy for a particular study depends upon the research problem, the sample under study, the availability of instruments, and the general feasibility of the project (Brockopp & Hastings-Tolsma, 2003). A variety of measurement methods are available for use in nursing research. Four measurement scales are used: nominal, ordinal, interval, and ratio.
The nominal level of measurement
The nominal level of measurement is the most primitive or lowest level of classifying information. Nominal variables comprise categories of people, events, and other phenomena that are named, exhaustive in nature, and mutually exclusive. These categories are discrete and noncontinuous. For nominal measurement, the admissible statistical operations are counting of frequency, percentage, proportion, mode, and the coefficient of contingency.
The ordinal level of measurement
The ordinal level of measurement is
second in terms of its refinement as a means of classifying
information. Ordinal implies that the values of variables can be
rank-ordered from highest to lowest.
Interval Level of Measurement
The interval level of measurement is quantitative in nature. The individual units are equidistant from one point to the other, but interval data do not have an absolute zero. For example, temperature is measured in Celsius or Fahrenheit. The interval level refers to the third level of measurement in relation to the complexity of statistical techniques that can be used to analyze data. Variables within this level of measurement are assessed incrementally, and the increments are equal.
Ratio Level of Measurement
The ratio level of measurement is characterized by variables that are assessed incrementally, with equal distances between the increments, on a scale that has an absolute zero. Ratio variables exhibit the characteristics of ordinal and interval measurement and can also be compared by describing one value as two or three times another, or as one-third, one-quarter, and so on. Variables like time, length, and weight are ratio scales; they can also be measured using nominal or ordinal scales.
The mathematical properties of interval and ratio scales are
very similar, so the statistical procedures are common for both the
scales.
Errors of measurement
When a variable is measured there is the potential for errors to occur. Some of the sources of error in measurement are instrument clarity, variations in administration, situational variations, response-set bias, transitory personal factors, response sampling, and instrument format.
Population, Sample, Variable
A population is defined as the entire collection of a set of objects, people, or events in a particular context: the entire group of persons or objects that is of interest to the investigator. In statistics, a population means any collection of individual items or units that is the subject of investigation, that is, the collection of all items upon which statements will be based. This might include all patients with schizophrenia in a particular hospital, or all depressed individuals in a certain community.
Characteristics of a population that differ from individual to individual are called variables. A variable is a concept (construct) that has been so specifically defined that precise observations, and therefore measurement, can be accomplished. Length, age, weight, temperature, and pulse rate are a few examples of variables.
The sample is a subset of the population selected by the investigator to participate in a research study; it is a subset of observations selected from the population. It would be unusual for an investigator to describe only the patients with schizophrenia in a particular hospital, and it is unlikely that an investigator will measure every depressed person in a community. As it is rarely practicable to obtain measures of a particular variable from all the units in a population, the investigator has to collect information from a smaller group or subset that represents the group as a whole. This subset is called a sample. Each unit in the sample provides a record, such as a measurement, which is called an observation. The sample represents the population with respect to the critical characteristics the investigator plans to study.
Dependent and independent variables
An independent variable is the presumed cause of the dependent variable, which is the presumed effect. The independent variable is the one that explains or accounts for variation in the dependent variable; a change in the independent variable results in a change in the other variable. In experiments, the independent variable is the variable manipulated by the experimenter. A dependent variable is one that changes in relationship to changes in another variable. A variable that is dependent in one study may be independent in another. An intervening variable is one that comes between the independent and dependent variables.
Hypothesis
A hypothesis is a statement or declaration of the expected outcome of a research study. It is based on a logical rationale and can be tested empirically. Hypotheses are formulated in experimental research; in some non-experimental correlational studies a hypothesis may also be developed. Normally, there are four elements in a hypothesis:
- (1) dependent and independent variables,
- (2) some type of relationship between independent and dependent variable,
- (3) the direction of the change, and
- (4) the subjects, i.e. the population being studied.
Standards in formulating a hypothesis (Ahuja, R. 2001):
- It should be empirically testable, whether it is right or wrong.
- It should be specific and precise.
- The statements in the hypothesis should not be contradictory.
- It should specify the variables between which the relationship is to be established.
- It should describe one issue only.
Characteristics of a hypothesis (Treece & Treece, 1989):
- It is testable
- It is logical
- It is directly related to the research problem
- It is factually or theoretically based
- It states a relationship between variables
- It is stated in such a form that it can be accepted or rejected
A directional hypothesis predicts an outcome in a particular direction, whereas a nondirectional hypothesis simply states that there will be a difference between the groups. There can be two hypotheses: the research hypothesis and the null hypothesis. The null hypothesis is formed for the statistical purpose of negating it. If the research hypothesis states that there is a positive correlation between smoking and cancer, the null hypothesis states that there is no relation between smoking and cancer. It is easier to negate a statement than to establish it.
The null hypothesis is a statistical statement that there is no difference between the groups under study. A statistical test is used to determine the probability that the null hypothesis is not true and can be rejected, i.e. inferential statistics are used in an effort to reject the null hypothesis, thereby showing that a difference does exist. The null hypothesis is a technical necessity when using inferential statistics, with statistical significance used as the criterion.
Types of errors
When the null hypothesis is rejected,
the observed differences between groups are deemed improbable by chance
alone. For example, if drug A is compared to a placebo for its effects
on depression and the null hypothesis is rejected, the investigator
concludes that the observed differences most likely are not explainable
simply by sampling error. The key word in these statements is probable.
When offering this conclusion, the investigator has the odds on his or
her side. However, what are the chances of the statement being
incorrect?
In statistical inference there is no way to
say with certainty that rejection or retention of the null hypothesis
was correct. There are two types of potential errors. A type I error
occurs when the null hypothesis is rejected when indeed it should have
been retained; a type II error occurs if the null hypothesis is
retained when indeed it should have been rejected.
Type I Error
Type I errors occur when the null hypothesis is rejected but should have been retained, such as when a researcher decides that two means are different. He or she might conclude that the treatment works, or that the groups are not sampled from the same population, whereas in reality the observed differences are attributable only to sampling error. In a conservative scientific setting, type I errors should be made rarely: there is a great disadvantage to advocating treatments that really do not work.
The probability of a type I error is denoted by the Greek letter alpha (α). Because of the desire to avoid type I errors, statistical models have been created so that the investigator has control over the probability of a type I error. At the .05 significance or alpha level, a type I error is expected to occur in 5 percent of all cases; at the .01 level, in 1 percent of all cases. Thus, at the .05 alpha level, one type I error is expected in each 20 independent tests, and at the .01 alpha level, one in each 100 independent tests.
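To make the "1 in 20" intuition concrete, the short simulation below (a sketch in Python, assuming NumPy and SciPy are available) draws two samples from the same population many times and counts how often a t-test rejects the null hypothesis at the .05 level; the rejection rate comes out close to alpha.

```python
# Sketch: simulate the type I error rate at alpha = .05.
# Both "groups" are drawn from the SAME population, so every
# rejection of the null hypothesis is a type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_tests = 2000
false_positives = 0

for _ in range(n_tests):
    a = rng.normal(loc=100, scale=15, size=30)
    b = rng.normal(loc=100, scale=15, size=30)  # same population as a
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

# Expected to be close to alpha, i.e. roughly 1 in 20 tests.
print(false_positives / n_tests)
```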
Type II Error
The motivation to avoid a type I error might increase the probability of making a second type of error, in which the null hypothesis is retained when it actually is wrong. For example, an investigator may reach the conclusion that a treatment does not work when actually it is efficacious. The decision not to reject the null hypothesis when in actuality the null hypothesis is false is a type II error, and its probability is symbolized by the Greek letter beta (β).
Statistical Power
There are several maneuvers that increase control over the probabilities of the different types of errors and of correct decisions. One type of correct decision is rejecting the null hypothesis when it is indeed false. Power is defined as the probability of rejecting the null hypothesis when it should be rejected. Ultimately, the statistical evaluation will be more meaningful if it has high power.
It is particularly important to have high
statistical power when the null hypothesis is retained. Retaining the
null hypothesis with high power gives the investigator more confidence
in stating that differences between groups were non-significant.
One factor that affects power is the sample size. As the sample size increases, power increases: the larger the sample, the greater the probability that a correct decision will be made in rejecting or retaining the null hypothesis.
Another factor that influences power is the significance level. As the significance level (alpha) increases, power increases. For instance, if the .05 level is selected rather than the .01 level, there will be a greater chance of rejecting the null hypothesis.
However, there will also be a higher probability of a type I error. By
reducing the chances of a type I error, the chances of correctly
identifying the real difference (power) are also reduced. Thus, the
safest manipulation to affect power without affecting the probability
of a type I error is to increase the sample size.
The third factor affecting power is
effect size. The larger the true differences between two groups, the
greater the power. Experiments attempting to detect a very strong
effect, such as the impact of a very potent treatment, might have
substantial power even with small sample sizes. The detection of subtle
effects may require very large samples in order to achieve reasonable
statistical power. It is worth noting that not all statistical tests
have equal power. The probability of correctly rejecting the null
hypothesis is higher with some statistical methods than with others.
For example, nonparametric statistics are typically less powerful than parametric statistics.
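As an illustration of how effect size, alpha, and sample size combine, the sketch below uses the statsmodels package (an assumed dependency, not mentioned in the text) to compute power for an independent-samples t-test and to solve for the sample size needed to reach 80 percent power; the numbers are illustrative only.

```python
# Sketch: how sample size, alpha and effect size interact with power,
# using statsmodels for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of detecting a medium effect (Cohen's d = 0.5) with 30 per group:
power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)
print(round(power, 2))

# Sample size per group needed for 80% power at alpha = .05:
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(round(n_needed))
```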
Sampling
The process of selecting a fraction of
the sampling unit (i.e. a collection with specified dimensions) of the
target population for inclusion in the study is called sampling.
Sampling can be probability sampling or non-probability sampling.
Probability Sampling or Random sampling
Probability sampling, also called random sampling, is a selection process that gives each participant the same probability of being selected. Probability sampling is the process of selecting samples based on probability theory, which quantifies the possibility that events occur by chance. Random sampling is the best method for ensuring that a sample is representative of the larger population. Random sampling can be simple random sampling, stratified random sampling, or cluster sampling.
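The sketch below illustrates simple random sampling and proportional stratified sampling in Python; the patient identifiers, ward names, and sample sizes are hypothetical.

```python
# Sketch: simple random sampling and a basic stratified sample, assuming
# the "population" is just a list of identifiers held in memory.
import random

random.seed(1)
population = [f"patient_{i}" for i in range(1, 501)]

# Simple random sampling: every unit has the same chance of selection.
simple_sample = random.sample(population, k=50)

# Stratified random sampling: sample within predefined strata (here,
# two hypothetical wards) in proportion to their size.
strata = {
    "ward_A": population[:300],
    "ward_B": population[300:],
}
stratified_sample = []
for name, units in strata.items():
    k = round(50 * len(units) / len(population))   # proportional allocation
    stratified_sample.extend(random.sample(units, k=k))

print(len(simple_sample), len(stratified_sample))
```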
Nonprobability sampling
Nonprobability sampling is a selection process in which the probability that any one individual or subject is selected is not equal to the probability that another individual or subject may be chosen. The probability of inclusion and the degree to which the sample represents the population are unknown. The major problem with nonprobability sampling is that sampling bias can occur. Nonprobability sampling can be convenience sampling, purposive sampling, or quota sampling.
Sampling Error (Standard Error)
Sampling error refers to the discrepancies that inevitably occur when a small group (sample) is selected to represent the characteristics of a larger group (population). It is defined as the difference between a parameter and an estimate of that parameter which is derived from a sample (Lindquist, 1968:8). The means and standard deviations calculated from the data collected on a given sample will not be the same as those calculated from data collected on the entire population. It is this discrepancy between the characteristics of the sample and the population that constitutes sampling error.
Descriptive statistics
Descriptive statistics are
techniques which help the investigator to organize, summarize and
describe measures of a sample. Here no predictions or inferences are
made regarding population parameters. Descriptive statistics are used
to summarize observations and to place these observations within
context. The most common descriptive statistics include measures of
central tendency and measures of variability.
Central tendency or “measures of the middle”
Three measures of central tendency are commonly used: the mean, the median, and the mode, calculated to identify the average, the most typical, and the most common values, respectively, among the data collected. The mean is the arithmetic average, the median is the point representing the 50th percentile in a distribution, and the mode is the most common score. Sometimes each of these measures is the same; on other occasions the mean, the median, and the mode can be different. The mean, median, and mode are the same when the distribution of scores is normal; under most circumstances they will not be exactly the same. The mode is most likely to misrepresent the underlying distribution and is rarely used in statistical analysis. The mean and the median are the most commonly reported measures of central tendency.
The major consideration in choosing between them is how much weight should be given to extreme scores. The mean takes into account each score in the distribution; the median finds only the halfway point. Because the mean best represents all subjects and has desirable mathematical properties, it is typically favored in statistical analysis. Despite the advantages of the mean, there are also some advantages to the median. In particular, the median disregards outlier cases, whereas the mean moves further in the direction of the outliers. Thus, the median is often used when the investigator does not want scores in the extreme of the distribution to have a strong impact. The median is also valuable for summarizing data for a measure that might be insensitive toward the higher ranges of the scale. For instance, a very easy test may have a ceiling effect and fail to show the true ability of some test-takers. A ceiling effect occurs when the test is too easy to measure the true ability of the best students. Thus, if some scores stack up at the extreme, the median may be more accurate than the mean: if the high scores had not been bounded by the highest obtainable score, the mean might actually have been higher.
The mean, median, and mode are exactly
the same in a normal distribution. However, not all distributions of
scores have a normal or bell-shaped appearance. The highest point in a
distribution of scores is called the modal peak. A distribution with
the modal peak off to one side or the other is described as skewed. The
word skew literally means "slanted."
The direction of skew is determined by the location of the tail or flat area of the distribution. Positive skew occurs when the tail goes off to the right of the distribution. Negative skew occurs when the tail or low point is on the left side of the distribution. The mode is the most frequent score in the distribution. In a skewed distribution, the mode remains at the peak whereas the mean and the median shift away from the mode in the direction of the skewness. The mean moves furthest in the direction of the skewness, and the median typically falls between the mean and the mode. Mode is the best measure of central tendency when nominal variables are used. Median is the best measure of central tendency when ordinal variables are used. Mean is the best measure of central tendency when interval or ratio scales are used.
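The following minimal Python example, using made-up scores with a long right tail, shows the three measures side by side and how the mean is pulled toward the skew while the mode stays at the peak.

```python
# Sketch: mean, median and mode for a small, positively skewed set of
# hypothetical scores; note how the mean is pulled toward the long tail.
import statistics

scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 15]   # made-up data with a right tail

print(statistics.mean(scores))    # pulled toward the high outliers
print(statistics.median(scores))  # the 50th percentile
print(statistics.mode(scores))    # the most frequent score
```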
Measures of Variability
If there were no variability within populations there would be no need for statistics: a single item or sampling unit would tell us all that is needed to know about the population as a whole. Three indices are used to measure variation or dispersion among scores: (1) the range, (2) the variance, and (3) the standard deviation (Cozby, 2000). The range describes the difference between the largest and smallest observations made; the variance and standard deviation are based on the average difference, or deviation, of observations from the mean.
Measures of central tendency, such as
the mean and median, are used to summarize information. They are
important because they provide information about the average score in
the distribution. Knowing the average score, however, does not provide
all the information required to describe a group of scores. In
addition, measures of variability are required. The simplest method of
describing variability is the range, which is simply the difference
between the highest score and lowest score.
Another statistic, known as the interquartile range, describes the interval of scores bounded by the 25th and 75th percentile ranks; that is, the range of scores that represents the middle 50 percent of the distribution. In contrast to ranges, which are used infrequently in statistical analysis, the variance and standard deviation are used commonly. Since the mean is the average score in a distribution, the sum of the deviations around the mean will always equal zero. Yet, in order to understand the characteristics of a distribution of scores, some estimate of deviation around the mean is important. Because the simple deviations sum to zero, the squared deviations around the mean are used instead, and these yield a meaningful index. The variance is the sum of the squared deviations around the mean divided by the number of cases.
Range
The range is the simplest method of examining variation among scores and refers to the difference between the highest and lowest values produced. It shows how wide the distribution is over which the measurements are spread. For continuous variables, the range is the arithmetic difference between the highest and lowest observations in the sample. In the case of counts or measurements, 1 should be added to the difference because the range is inclusive of the extreme observations. The range takes account of only the most extreme observations. It is therefore limited in its usefulness, because it gives no information about how observations are distributed. The interquartile range is the area between the lowest quartile and the highest quartile, i.e. the middle 50% of the scores.
Variance
The variance is a very useful statistic and is commonly employed in data analysis. However, its calculation requires finding the squared deviations around the mean rather than the simple or absolute deviations around the mean. Thus, when the variance is calculated, the result is expressed in squared units of the original measurement. Taking the square root of the variance puts the observations back into their original metric; the square root of the variance is known as the standard deviation. The standard deviation is an approximation of the average deviation around the mean. Although the standard deviation is not technically equal to the average deviation, it gives an approximation of how much the average score deviates from the mean. One method for calculating the variance is to first calculate the deviation scores; the sum of the set of deviation scores equals zero. The variance is the square of the standard deviation; conversely, the standard deviation is the square root of the variance. The deviations of a distribution of scores can then be used to calculate the variance.
Standard Deviation
The standard deviation is the most widely applied measure of variability. When observations have been obtained from every item or sampling unit in a population, the symbol for the standard deviation is σ (lowercase sigma); this is a parameter of the population. When it is calculated from a sample it is symbolized by s. The standard deviation of a distribution of scores is the square root of the variance. Large standard deviations suggest that scores do not cluster around the mean: they are probably widely scattered. Similarly, small standard deviations suggest that there is very little difference among scores.
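The short Python sketch below (NumPy assumed available) works through these definitions on made-up scores: the deviations sum to zero, the variance is the mean of the squared deviations, and the standard deviation is its square root.

```python
# Sketch: variance and standard deviation computed from deviations around
# the mean, alongside NumPy's built-in versions.
import numpy as np

scores = np.array([4, 8, 6, 5, 3, 7, 9, 6], dtype=float)
mean = scores.mean()

deviations = scores - mean
print(deviations.sum())                 # always (near) zero

variance = (deviations ** 2).sum() / len(scores)   # population variance
std_dev = variance ** 0.5                          # square root of variance

print(variance, std_dev)
print(np.var(scores), np.std(scores))   # same results (ddof=0 by default)
# For a sample estimate, divide by n - 1 instead: np.var(scores, ddof=1)
```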
Normal Distribution
The normal distribution is a mathematical construct which suggests that naturally occurring observations follow a given pattern. The pattern is the normal curve, which places most observations at the mean and progressively fewer observations toward either extreme. This curve, or bell-shaped distribution, reflects the tendency of observations of a specific variable to cluster in a particular manner.
The normal curve can be described for any set of data given the mean and standard deviation of the data and the assumption that the characteristic under study is normally distributed within the population. A normal distribution of the data implies that about 68% of observations fall within one standard deviation of the mean, 95% fall within two standard deviations of the mean, and 99.7% fall within three standard deviations of the mean. Theoretically, the range of the curve is unlimited.
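The rule of thumb above can be checked against the standard normal curve; the sketch below uses SciPy (an assumed dependency) to compute the area within one, two, and three standard deviations of the mean.

```python
# Sketch: checking the 68-95-99.7 rule with the standard normal
# distribution in SciPy.
from scipy.stats import norm

for k in (1, 2, 3):
    # Area between -k and +k standard deviations from the mean.
    area = norm.cdf(k) - norm.cdf(-k)
    print(k, round(area * 100, 2))
# Prints roughly 68.27, 95.45 and 99.73 percent.
```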
Standard Scores
One of the problems with means and standard deviations is that their meanings are not independent of context. For example, a mean of 45.6 means little unless the scale and spread of the scores are known. The Z-score is a transformation into standardized units that provides a context for the interpretation of scores. The Z-score is the difference between a score and the mean, divided by the standard deviation. To make comparisons between groups, standard scores rather than raw scores can be used. Standard scores enable the investigator to examine the position of a given score by measuring its deviation from the mean of all scores.
Most often, the units on the x axis of
the normal distribution are in Z-units. Any variable transformed into
Z-units will have a mean of 0 and a standard deviation of 1.
Translation of Z-scores into percentile ranks is accomplished using a
table for the standard normal distribution. Certain Z-scores are of
particular interest in statistics and psychological testing. The
Z-score 1.96 represents the 97.5th percentile in a distribution whereas
-1.96 represents the 2.5th percentile. A Z-score of less than -1.96 or
greater than +1.96 falls outside of a 95 percent interval bounding the
mean of the Z-distribution. Some statistical definitions of
abnormality view these defined deviations as cutoff points. Thus, a
person who is more than 1.96 Z-scores from the mean on some attribute
might be regarded as abnormal. In addition to the interval bounded by
95 percent of the cases, the interval including 99 percent of all cases
is also commonly used in statistics.
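As a small worked example, the Python sketch below (SciPy assumed available) converts a hypothetical score into a Z-score and a percentile rank, and recovers the ±1.96 and ±2.58 cut-offs that bound the middle 95 and 99 percent of the distribution.

```python
# Sketch: Z-scores and percentile ranks, assuming hypothetical test scores
# with a known mean and standard deviation.
from scipy.stats import norm

mean, sd = 50.0, 10.0
score = 69.6

z = (score - mean) / sd              # Z = 1.96
percentile = norm.cdf(z) * 100       # ~97.5th percentile
print(round(z, 2), round(percentile, 1))

# The Z-values that bound the middle 95% and 99% of the distribution:
print(norm.ppf([0.025, 0.975]))      # approx. -1.96 and +1.96
print(norm.ppf([0.005, 0.995]))      # approx. -2.58 and +2.58
```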
Confidence Intervals
In most statistical inference problems
the sample mean is used to estimate the population mean. Each sample
mean is considered to be an unbiased estimate of the population mean.
Although the sample mean is unlikely to be exactly the same as the
population mean, repeated random samples will form a sampling
distribution of sample means. The mean of the sampling distribution is
an unbiased estimate of the population mean. However, taking repeated
random samples from the population is also difficult and expensive.
Instead, it is necessary to estimate the population mean based on a
single sample; this is done by creating an interval around the sample
mean.
The first step in creating this
interval is finding the standard error of the mean. The standard error
of the mean is the standard deviation divided by the square root of the
sample size. Statistical inference is used to estimate the probability
that the population mean will fall within some defined interval.
Because sample means are distributed normally around the population
mean, the sample mean is most probably near the population value.
However, it is possible that the sample mean is an overestimate or an
underestimate of the population mean. Using information about the
standard error of the mean, it is possible to put a single observation
of a mean into context.
The ranges that are likely to capture
the population mean are called confidence intervals. Confidence
intervals are bounded by confidence limits. The confidence interval is
defined as a range of values with a specified probability of including
the population mean. A confidence interval is typically associated with
a certain probability level. For example, the 95 percent confidence
interval has a 95 percent chance of including the population mean. A 99
percent confidence interval is expected to capture the true mean in 99
of each 100 cases. The confidence limits are defined as the values for
points that bound the confidence interval. Creating a confidence
interval requires a mean, a standard error of the mean, and the Z-value
associated with the interval.
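A minimal worked example follows, in Python with made-up data: the mean and standard error are computed from a single sample and a 95 percent confidence interval is built around the mean using the Z-value 1.96.

```python
# Sketch: a 95% confidence interval for a mean from a single sample,
# built from the standard error and the Z-value 1.96 (hypothetical data).
import math
import statistics

sample = [72, 75, 68, 80, 77, 74, 69, 71, 78, 76, 73, 70]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)                 # sample standard deviation
sem = sd / math.sqrt(len(sample))             # standard error of the mean

z = 1.96                                      # bounds the middle 95%
lower, upper = mean - z * sem, mean + z * sem
print(round(mean, 1), (round(lower, 1), round(upper, 1)))
# For small samples, a t-value with n - 1 degrees of freedom is often
# used in place of 1.96.
```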
Inferential statistics
Inferential statistics are
mathematical procedures which help the investigator to predict or infer
population parameters from sample measures. This is done by a process
of inductive reasoning based on the mathematical theory of probability
(Fowler, J., Jarvis, P. & Chevannes M. 2002).
Probability
The idea of probability is basic to inferential statistics. The goal of inferential statistical techniques is the same: to determine as precisely as possible the probability of an occurrence. Probability can be regarded as quantifying the chance that a stated outcome of an event will take place; in research it refers to the likelihood that the differences between groups under study are the result of chance. Probability theory deals with the chance of any given event occurring out of all possible outcomes, and the probabilities of a set of mutually exclusive and exhaustive outcomes add up to one. When a coin is tossed there are two outcomes, head or tail, i.e. a 0.5 chance for head and a 0.5 chance for tail; when these two chances are added they give 1. Similarly, in a class of fifty students, the chance of any one student being ranked first is 1 in 50 (i.e. 0.02). By convention, probability values fall on a scale between 0 (impossibility) and 1 (certainty), but they are sometimes expressed as percentages, so the probability scale has much in common with the proportion scale. The chance of committing a type I error is controlled by testing the hypothesis against a chosen probability value. In the behavioural sciences, < .05 is usually taken as the alpha value for testing the hypothesis; when more stringent criteria are required, < .01 or < .001 is taken as the alpha value or p value.
Statistical Significance (alpha level)
The level of significance (or alpha level) is set to identify the probability that the difference between the groups has occurred by chance rather than in response to the manipulation of variables. The decision of whether the null hypothesis should be rejected depends on the level of error that can be tolerated. This tolerance level of error is expressed as a level of significance, or alpha level. The usual level of significance or alpha level is 0.05, although levels of 0.01 or 0.001 may be used when a high level of accuracy is required. In testing the significance of the obtained statistic, if the investigator rejects the null hypothesis when, in fact, it is true, he or she commits a type I error (alpha error); if the investigator accepts the null hypothesis when, in fact, it is false, he or she commits a type II error (beta error) (Singh AK, 2002).
Parametric and Non-parametric Tests
Parametric and non-parametric tests are commonly employed in behavioral research.
Parametric Tests
A parametric test is one which
specifies certain conditions about the parameter of the population from
which a sample is taken. Such statistical tests are considered to be
more powerful than non-parametric tests and should be used if their
basic requirements or assumptions are met. Assumptions for using
parametric tests:
- The observations must be independent.
- The observations must be drawn from a normally distributed population.
- The samples drawn from the populations must have equal variances (homogeneity of variance); this condition is more important if the sample sizes are particularly small.
- The variables must be expressed on interval or ratio scales.
- The variables under study should be continuous.
Examples of parametric tests are t-test, z-test and F-test.
Non-parametric tests
A non-parametric test is one that does not specify any conditions about the parameters of the population from which the sample is drawn. These tests are also called distribution-free statistics. For non-parametric tests, the variables under study should be continuous and the observations should be independent. Requisites for using a non-parametric statistical test are:
- The shape of the distribution of the population from which the sample is drawn is not known to be a normal curve.
- The variables have been quantified on the basis of nominal measures (or frequency counts).
- The variables have been quantified on the basis of ordinal measures or ranking.
- A non-parametric test should be used only when parametric assumptions cannot be met.
Common non-parametric tests
- Chi-square test
- Mann-Whitney U test
- Rank difference methods (Spearman's rho and Kendall's tau)
- Coefficient of concordance (W)
- Median test
- Kruskal-Wallis test
- Friedman test
Tips on using appropriate tests in experimental design
Two unmatched (unrelated) groups, experimental and control (e.g. patients receiving a planned therapeutic intervention for depression and a control group of patients on routine care):
- See whether the distribution is normal or non-normal.
- If normal, use a parametric test (independent t-test).
- If non-normal, use a nonparametric test (Mann-Whitney U test), or make the data normal through natural log transformation or z-transformation.
Two matched (related) groups, pre-post design (the same group is rated before the intervention and again after the period of intervention, i.e. two ratings in the same or related group):
- See whether the distribution is normal or non-normal.
- If normal, use the parametric paired t-test.
- If non-normal, use the nonparametric Wilcoxon signed-rank test.
More than two unmatched (unrelated) groups (for example, three groups: schizophrenia, bipolar disorder, and control):
- See whether the distribution is normal or non-normal.
- If normally distributed, use parametric one-way ANOVA.
- If non-normal, use the nonparametric Kruskal-Wallis test.
More than two matched (related) groups (for example, in an ongoing intervention, ratings at different times: t1, t2, t3, t4 …):
- See whether the distribution is normal or non-normal.
- If the data are normal, use parametric repeated-measures ANOVA.
- If the data are non-normal, use the nonparametric Friedman test.
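The decision rules in the lists above can be collected into one small sketch; the function below is illustrative only (its name and the way normality and relatedness are passed in are assumptions, not a standard API).

```python
# Rough sketch of the decision rules listed above. The function name and
# the normality flag are illustrative assumptions, not a standard API.
def choose_test(n_groups: int, related: bool, normal: bool) -> str:
    if n_groups == 2 and not related:
        return "independent t-test" if normal else "Mann-Whitney U test"
    if n_groups == 2 and related:
        return "paired t-test" if normal else "Wilcoxon signed-rank test"
    if n_groups > 2 and not related:
        return "one-way ANOVA" if normal else "Kruskal-Wallis test"
    if n_groups > 2 and related:
        return "repeated-measures ANOVA" if normal else "Friedman test"
    raise ValueError("at least two groups are needed")

print(choose_test(n_groups=2, related=False, normal=True))   # independent t-test
print(choose_test(n_groups=3, related=True, normal=False))   # Friedman test
```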
Matched (related) and unmatched (unrelated) observations
When analyzing bivariate data such as correlations, a single sample unit gives a pair of observations representing two different variables. The observations comprising a pair are uniquely linked and are said to be matched or paired. For example, the systolic blood pressures of 10 patients and the measurements of another 10 patients after administration of a drug are unmatched; however, measurements on the same 10 patients before and after administration of the drug are matched. More sensitive analysis is possible when the observations are matched.
Common Statistical tests
Chi-square (χ2) Test (analyzing frequencies)
The chi-square test is one of the important non-parametric tests; Guilford (1956) called it the ‘general-purpose statistic’. Chi-square tests are widely referred to as tests of homogeneity, randomness, association, independence, and goodness of fit. The chi-square test is used when the data are expressed in terms of frequencies, proportions, or percentages. The test applies only to discrete data, but continuous data can be reduced to categories in such a way that they can be treated as discrete. The chi-square statistic is used to evaluate the relative frequency or proportion of events in a population that fall into well-defined categories. For each category there is an expected frequency, obtained from knowledge of the population or from some other theoretical perspective, and an observed frequency, obtained from the observations made by the investigator. The chi-square statistic expresses the discrepancy between the observed and the expected frequencies.
There are several uses of the chi-square test:
1. It can be used as a test of the equal-probability hypothesis (the hypothesis that the frequencies in all the given categories are equal).
2. It can test the significance of the independence hypothesis (the hypothesis that one variable is not affected by, or related to, another variable, and hence that the two variables are independent).
3. It can be used to test a hypothesis regarding the normal shape of a frequency distribution (goodness of fit).
4. It is used to test the significance of several statistics, such as the phi coefficient, the coefficient of concordance, and the coefficient of contingency.
5. In the chi-square test, the frequencies observed are compared with those expected on the basis of some null hypothesis. If the discrepancy between the observed and expected frequencies is great, the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom, and the null hypothesis is rejected in favor of some alternative. Mastery of the method lies not so much in the computation of the test statistic itself as in the calculation of the expected frequencies.
6. The chi-square statistic does not give any information regarding the strength of a relationship; it only conveys the existence or non-existence of a relationship between the variables investigated. To establish the extent and nature of the relationship, additional statistics such as phi, Cramer’s V, or the contingency coefficient can be used (Brockopp & Hastings-Tolsma, 2003).
Tips on analyzing frequencies
- All versions of the chi-square test compare the agreement between a set of observed frequencies and those expected if some null hypothesis is true.
- All objects are counted; categories on the nominal scale, or unambiguous intervals on a continuous scale (such as successive days or months), may be used for the application of the test.
- Apply Yates's correction in the chi-square test when there is only one degree of freedom, i.e. in a ‘one-way’ test with two categories or in a 2×2 contingency table.
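A brief illustration of both common uses of the test follows, in Python with SciPy (an assumed dependency) and made-up frequency data: a goodness-of-fit test against equal expected frequencies and a test of independence on a 2×2 contingency table.

```python
# Sketch: chi-square tests with SciPy, using made-up frequency data.
from scipy.stats import chisquare, chi2_contingency

# Goodness of fit: are 60/40 observed counts consistent with a 50/50 split?
stat, p = chisquare(f_obs=[60, 40], f_exp=[50, 50])
print(round(stat, 2), round(p, 3))

# Test of independence on a 2x2 contingency table
# (rows: treated / untreated, columns: improved / not improved).
table = [[30, 10],
         [18, 22]]
stat, p, df, expected = chi2_contingency(table)  # Yates correction applied for 2x2 by default
print(round(stat, 2), df, round(p, 3))
```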
Testing normality of a data
Parametric statistical techniques depend upon the mathematical properties of the normal curve: they usually assume that samples are drawn from populations that are normally distributed. Before adopting a statistical test, it is therefore essential to determine whether the data are normal or non-normal. The normality of data can be checked in two ways: by plotting the data to see whether they look normal, or by using formal statistical procedures. The most common normality tests are the Kolmogorov-Smirnov test and the Shapiro-Wilk test, the latter being generally preferred for smaller samples. If the test is non-significant (P > .05), the data can be treated as approximately normal and a parametric test can be used for analysis; if it is significant (P < .05), a non-parametric test should be used. Statistical packages like SPSS can be used to perform these tests.
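The sketch below shows what this looks like in Python with SciPy (an assumed dependency) on made-up blood-pressure values; the same tests are available through SPSS menus.

```python
# Sketch: checking normality with the Shapiro-Wilk and Kolmogorov-Smirnov
# tests in SciPy, on made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=120, scale=15, size=40)   # hypothetical systolic BP values

# Shapiro-Wilk, commonly preferred for small to moderate samples.
w, p_sw = stats.shapiro(data)

# Kolmogorov-Smirnov against a normal distribution with the sample's own
# mean and SD (the Lilliefors variant corrects for this estimation step).
d, p_ks = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

print(round(p_sw, 3), round(p_ks, 3))
# p > .05: treat the data as approximately normal, consider a parametric test;
# p < .05: prefer a non-parametric test.
```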
t-test and z-test (comparing means)
In experimental sciences,
comparisons between groups are very common. Usually, one group is the
treatment, or experimental group, while the other group is the
untreated, or control group. If patients are randomly assigned to these
two groups, it is assumed that they differ only by chance prior to
treatment. Differences between groups after the treatment are usually
used to estimate treatment effect. The task of the statistician is to
determine whether any observed differences between the groups following
treatment should be attributed to chance or to the treatment. The
t-test is commonly used for this purpose. There are actually several
different types of t-tests.
Types of t-Tests
- Comparison of a sample mean with a hypothetical population mean.
- Comparison between two scores in the same group of individuals.
- Comparison between observations made on two independent groups.
The t-test and z-test are parametric inferential statistical techniques used when a comparison of two means is required; they test the null hypothesis that there is no difference in means between the two groups. The reporting of the results of a t-test generally includes the df, the t-value, and the probability level. A t-test can be one-tailed or two-tailed: if the hypothesis is directional, a one-tailed test is generally used, and if the hypothesis is non-directional, a two-tailed test is used. The t-test is used when the sample size is less than 30, and the z-test is used when the sample size is more than 30.
There are dependent and independent t-tests. The formula to calculate a t-test
can differ depending on whether the samples involved are dependent or
independent. Samples are independent when there are two groups such as
an experimental and a control group. Samples are dependent when the
participants from two groups are paired in some manner. The form of the
t-test that is used with a dependent sample may be termed as
paired, dependent, matched, or correlated (Brockopp &
Hastings-Tolsma, 2003).
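The following minimal example runs both forms in Python with SciPy (an assumed dependency); the depression scores are made up for illustration.

```python
# Sketch: independent and paired t-tests with SciPy, using made-up scores.
from scipy import stats

treatment = [12, 9, 11, 8, 10, 7, 9, 11, 8, 10]
control   = [14, 13, 15, 12, 16, 13, 14, 15, 12, 13]

# Independent (unrelated) groups:
t, p = stats.ttest_ind(treatment, control)
print(round(t, 2), round(p, 4))

# Dependent (paired) design: the same subjects before and after treatment.
before = [15, 14, 16, 13, 15, 17, 14, 16]
after  = [12, 13, 13, 12, 14, 15, 12, 14]
t, p = stats.ttest_rel(before, after)
print(round(t, 2), round(p, 4))
```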
Degree of freedom (df)
Degrees of freedom (df) is a mathematical concept that describes the number of events or observations that are free to vary; for each statistical test there is a formula for calculating the appropriate degrees of freedom (for example, n - 1 for a single sample).
Mann-Whitney U-test
The Mann-Whitney U test is a non-parametric substitute for the parametric t-test, for comparing the medians of two unmatched pairs. For application of U test data must be obtained on ordinal or interval scale. We can use Mann-Whitney U-test
to compare the median time undertaken to perform the task by a sample
of subjects who had not drunk with that of another sample who had drunk
a standardized volume of alcohol. This test is used to see group
difference, when the data is non-normal and the groups are independent.
The test can be applied in groups with unequal or equal size.
Some key points about using Mann-Whitney U-test are:
- This test can be applied to interval data (measurements), to count of things, derived variable (proportions and indices) and to ordinal data (rank scales, etc.)
- Unlike some test statistics, the calculated value of U has to be smaller than the tabulated critical value in order to reject null hypothesis.
- The test is for a difference in medians. It is a common error to record a statement like ‘the Mann-Whitney U-test showed a significant difference in means’. There is, however, no need to calculate the medians of each sample to perform the test.
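A minimal example follows (Python with SciPy assumed, made-up task times), echoing the drinking example above; note that the groups need not be of equal size.

```python
# Sketch: Mann-Whitney U test on made-up task-completion times (seconds)
# for an alcohol group versus a no-alcohol group.
from scipy.stats import mannwhitneyu

no_alcohol = [34, 29, 41, 36, 30, 33, 38, 31]
alcohol    = [45, 39, 52, 47, 43, 50, 41, 48, 44]   # unequal group sizes are fine

u, p = mannwhitneyu(no_alcohol, alcohol, alternative="two-sided")
print(u, round(p, 4))
```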
Wilcoxon test -matched pairs
The Wilcoxon test for matched pairs is a non-parametric test for comparing the medians of two matched samples. It calls for a test statistic T whose probability distribution is known. The observations must be drawn on an interval scale; it is not possible to use this test on purely ordinal measurements. The Wilcoxon test is used with matched-pair samples. It tests for a difference in medians and assumes that the samples have been drawn from parent populations that are symmetrically, though not necessarily normally, distributed.
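A matching sketch for the paired case is given below (Python with SciPy assumed, hypothetical before-and-after ratings).

```python
# Sketch: Wilcoxon test on matched pairs (the same patients rated before
# and after an intervention).
from scipy.stats import wilcoxon

before = [62, 58, 71, 65, 60, 68, 64, 70, 59, 66]
after  = [55, 57, 63, 60, 58, 61, 59, 64, 57, 60]

w, p = wilcoxon(before, after)
print(w, round(p, 4))
```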
Pearson Product-Moment Correlation Coefficient
The Pearson product-moment correlation is a parametric test and a common method of assessing the association between two variables under study. In this test an estimation of at least one parameter is involved, measurement is at the interval level, and it is assumed that the variables under study are normally distributed within the population.
Spearman Rank correlation Coefficient
Spearman’s r is a nonparametric test and is the nonparametric equivalent of the parametric Pearson r. Spearman’s rank correlation technique is used when the conditions for the product-moment correlation coefficient do not apply. This test is widely used by health scientists; it uses the ranks of the x and y observations, and the raw data themselves are discarded.
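The sketch below computes both coefficients on the same made-up paired data using SciPy (an assumed dependency), which makes the parametric and rank-based versions easy to compare.

```python
# Sketch: Pearson and Spearman correlation coefficients on made-up paired
# observations (x = anxiety score, y = pulse rate).
from scipy.stats import pearsonr, spearmanr

x = [21, 34, 28, 40, 25, 36, 30, 45, 27, 38]
y = [72, 85, 78, 92, 74, 88, 80, 97, 77, 90]

r, p = pearsonr(x, y)          # parametric, interval data, bivariate normal
rho, p_rho = spearmanr(x, y)   # nonparametric, works on ranks

print(round(r, 2), round(p, 4))
print(round(rho, 2), round(p_rho, 4))
```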
Tips on using correlation tests
- When observations of one or both variables are on an ordinal scale, or are proportions, percentages, indices, or counts of things, use Spearman’s rank correlation coefficient. The number of units in the sample, i.e. the number of paired observations, should be between 7 and 30.
- When observations are measured on an interval scale, the product-moment correlation coefficient should be considered. Sample units must be obtained randomly, and the data (x and y) should be bivariate normal.
- The relationship between the variables should be rectilinear (a straight line), not curved. Certain mathematical transformations (e.g. logarithmic transformation) will ‘straighten up’ curved relationships.
- A strong and significant correlation does not necessarily mean that one variable is the cause of the other. It is possible that some additional, unidentified factor is the underlying source of variability in both variables.
- Correlations measured in samples estimate correlations in the populations. A correlation in a sample is not ‘improved’ or strengthened by obtaining more observations; however, larger samples may be required to confirm the statistical significance of weaker correlations.
Regression Analysis
Regression analysis is often used to
predict the value of one variable given information about another
variable. The procedure can describe how two continuous variables are
related. Regression analysis is used to examine relationships among
continuous variables and is most appropriate for data that can be
plotted on a graph. Data are usually plotted, so that the independent
variable is seen on the horizontal (x) axis and the dependent variable
on the vertical (y) axis. The statistical procedure for regression
analysis includes a test for the significance of the relationship
between two variables. Given a significant relationship between two
variables, knowledge of the value of the independent variable permits a
prediction of the value of the dependent variable.
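A minimal illustration follows, fitting a straight line to made-up data with SciPy (an assumed dependency) and using the fitted slope and intercept to predict a new value.

```python
# Sketch: simple linear regression, predicting a dependent variable (y)
# from an independent variable (x); the data are made up.
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6, 7, 8]          # e.g. weeks of treatment
y = [20, 19, 17, 16, 14, 13, 11, 10]  # e.g. symptom score

result = linregress(x, y)
print(round(result.slope, 2), round(result.intercept, 2))
print(round(result.rvalue, 2), round(result.pvalue, 5))

# Predicted score at week 9, under this fitted line:
print(round(result.slope * 9 + result.intercept, 1))
```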
One-Way Analysis of Variance (ANOVA)
When there are three or more samples, and the data from each sample are thought to be distributed normally, analysis of variance (ANOVA) may be the technique of choice. One-way analysis of variance, developed by R. A. Fisher, is a parametric inferential statistical test that enables investigators to compare two or more group means. The reporting of the results includes the df, the F value, and the probability level. ANOVA is of two types: simple analysis of variance, and complex or two-way analysis of variance. One-way analysis of variance is an extension of the t-test that permits the investigator to compare more than two means simultaneously.
Researchers studying two or more groups can use ANOVA to determine whether there are differences among the groups. For example, nurse investigators who want to assess the levels of helplessness among three groups of patients (long-term, acute care, and outpatient) can administer an instrument designed to measure levels of helplessness and then calculate an F ratio. If the F ratio is sufficiently large, the conclusion can be drawn that there is a difference between at least two of the means.
The larger the F-ratio, the more likely it is that the null hypothesis can be rejected. Other tests, called post hoc comparisons, can be used to determine which of the means differ significantly. Fisher’s LSD, Duncan’s new multiple range test, the Newman-Keuls test, Tukey’s HSD, and Scheffé’s test are the post hoc comparison tests most frequently used following ANOVA. In some instances a post hoc comparison is not necessary because the means of the groups under consideration readily convey the differences between the groups (Brockopp & Hastings-Tolsma, 2003).
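The helplessness example above might be sketched as follows in Python with SciPy (an assumed dependency) and made-up scores; a post hoc procedure such as Tukey's HSD (available, for example, through statsmodels) would follow a significant F ratio.

```python
# Sketch: one-way ANOVA on made-up helplessness scores for three groups
# (long-term, acute care, outpatient).
from scipy.stats import f_oneway

long_term  = [28, 31, 27, 30, 29, 33, 32]
acute_care = [24, 26, 22, 25, 27, 23, 26]
outpatient = [18, 20, 17, 21, 19, 16, 20]

f, p = f_oneway(long_term, acute_care, outpatient)
print(round(f, 2), round(p, 5))
# A significant F ratio would normally be followed by post hoc comparisons
# (e.g. Tukey's HSD in a separate package such as statsmodels).
```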
Kruskal-Wallis test-more than two samples
The Kruskal-Wallis test is a
simple non-parametric test to compare the medians of three or more
samples. Observations may be interval measurements, counts of things,
derived variables, or ordinal ranks. If there are only three samples,
then there must be at least five observations in each sample. Samples do
not have to be of equal sizes. The statistic K is used to indicate the test value.
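For the non-normal case, the same three-group comparison can be run with the Kruskal-Wallis test; the sketch below uses SciPy (an assumed dependency) and made-up data.

```python
# Sketch: Kruskal-Wallis test for three independent samples when the data
# cannot be assumed normal.
from scipy.stats import kruskal

group1 = [7, 9, 6, 8, 10, 7]
group2 = [12, 11, 14, 10, 13, 12]
group3 = [18, 16, 17, 19, 15, 20]

stat, p = kruskal(group1, group2, group3)
print(round(stat, 2), round(p, 4))
```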
Two-way or Factorial Analysis of Variance
Factorial analysis of variance
permits the investigator to analyze the effects of two or more
independent variables on the dependent variable (one-way ANOVA is used
with one independent variable and one dependent variable). The term
factor is interchangeable with independent variable and factorial ANOVA
therefore refers to the idea that data having two or more independent
variables can be analyzed using this technique.
Analysis of Covariance (ANCOVA)
ANCOVA is an inferential statistical test that enables investigators to adjust statistically for group differences that may interfere with obtaining results that relate specifically to the effects of the independent variable(s) on the dependent variable(s).
Multivariate Analysis
Multivariate analysis refers to a
group of inferential statistical tests that enable the investigator to
examine multiple variables simultaneously. Unlike other statistical
techniques, these tests permit the investigator to examine several
dependent and independent variables simultaneously.
Choosing the appropriate test
If the data fulfill the parametric assumptions, any of the parametric tests which suit the purpose can be used. On the other hand, if the data do not fulfill the parametric requirements, any of the non-parametric statistical tests which suit the purpose can be selected. Other factors which decide the selection of the appropriate statistical test are the number of independent and dependent variables and the nature of the variables (whether nominal, ordinal, interval, or ratio). When both the independent and dependent variables are interval measures and there is more than one of each, multiple correlation is the most appropriate statistic; when they are interval measures and there is only one of each, Pearson r may be used. With ordinal and nominal measures, non-parametric statistics are the common choice.
Computer Aided Analysis
The availability of computer
software has greatly facilitated the execution of most statistical
techniques. Many statistical packages run on different types of
platforms or computer configurations. For general data analysis the
Statistical Package for the Social Sciences (SPSS), the BMDP series,
and the Statistical Analysis System (SAS) are recommended. These are
general-purpose statistical packages that perform essentially all the
analyses common to biomedical research. In addition, a variety of other
packages have emerged.
SYSTAT runs on both IBM-compatible and
Macintosh systems and performs most of the analyses commonly used in
biomedical research. The popular SAS program has been redeveloped for
Macintosh systems and is sold under the name JMP. Other commonly used
programs include Stata, which is excellent for the IBM-compatible
computers. The developers of Stata release a regular newsletter
providing updates, which makes the package very attractive. StatView is
a general-purpose program for the Macintosh computer.
Newer versions of StatView include an
additional program called Super ANOVA, which is an excellent set of
ANOVA routines. StatView is user-friendly and also has superb graphics.
For users interested in epidemiological analyses, Epilog is a
relatively low-cost program that runs on the IBM-compatible platforms.
It is particularly valuable for rate calculations, analysis of
disease-clustering patterns, and survival analysis. GB-STAT is a low-cost, multipurpose package that is very comprehensive.
SPSS (Statistical Package for the Social Sciences) is one of the popular computer programs for data analysis. This software provides a comprehensive set of flexible tools that can be used to accomplish a wide variety of data analysis tasks (Einspruch, 1998). SPSS is available on a variety of platforms. The latest product information and free tutorial are available at www.spss.com.
Computer software programs that
provide easy access to highly sophisticated statistical methodologies
represent both opportunities and dangers. On the positive side, no
serious researcher need be concerned about being unable to utilize
precisely the statistical technique that best suits his or her purpose,
and to do so with the kind of speed and economy that was inconceivable
just two decades ago. The danger is that some investigators may be
tempted to employ after-the-fact statistical manipulations to salvage a
study that was flawed to start with, or to extract significant
findings through use of progressively more sophisticated multivariate
techniques.
References & Bibliography
- Ahuja R (2001). Research Methods. Rawat Publications, New Delhi. pp. 71-72.
- Brockopp D Y & Hastings-Tolsma M (2003). Fundamentals of Nursing Research, 3rd Edition. Jones and Bartlett, Boston.
- Cozby P C (2000). Methods in Behavioral Research, 7th Edition. Mayfield Publishing Co., Toronto.
- Kerr A W, Hall H K, Kozub S A (2002). Doing Statistics with SPSS. Sage Publications, London.
- Einspruch E L (1998). An Introductory Guide to SPSS for Windows. Sage Publications, Calif.
- Fowler J, Jarvis P & Chevannes M (2002). Practical Statistics for Nursing and Health Care. John Wiley & Sons, England.
- Guilford J P (1956). Fundamental Statistics in Psychology and Education. McGraw-Hill Book Co., New York.
- Lindquist E F (1968). Statistical Analysis in Educational Research. Oxford and IBH Publishing Co., New Delhi.
- Singh A K (2002). Tests, Measurements and Research Methods in Behavioural Sciences. Bharati Bhawan, New Delhi.
- Singleton, Royce A. & Straits, Bruce (1999). Approaches to Social Research, 3rd Edition. Oxford University Press, New York.
- Streiner D & Norman G (1996). PDQ Epidemiology, 2nd Edition. Mosby, St. Louis.
- Baker, Therese L (1988). Doing Social Research. McGraw-Hill Book Co., New York.
- Treece E W & Treece J H (1989). Elements of Research in Nursing. The C.V. Mosby Co., St. Louis.
- Tyler L E (1963). Tests and Measurements. Prentice Hall, Englewood Cliffs, New Jersey.
- Chalmers TC, Celano P, Sacks H, Smith H (1983). Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358.
- Cohen J (1988). Statistical Power Analysis for the Behavioral Sciences. Erlbaum, Hillsdale, NJ.
- Cook TD, Campbell DT (1979). Quasi-experimentation: Design and Analysis Issues for Field Studies. Rand-McNally, Chicago.
- Daniel WW (1995). Biostatistics: A Foundation for Analysis in the Health Sciences, 6th Edition. Wiley, New York.
- Daniel WW (1990). Applied Nonparametric Statistics, 2nd Edition. PWS-Kent, Boston.
- Dawson-Saunders B, Trapp RG (1994). Basic and Clinical Biostatistics, 2nd Edition. Appleton & Lange, Norwalk, CT.
- Edwards LK, editor (1993). Applied Analysis of Variance in Behavioral Science. Marcel Dekker, New York.
- Efron B, Tibshirani R (1991). Statistical data analysis in the computer age. Science 253:390.
- Jaccard J, Becker MA (1997). Statistics for the Behavioral Sciences, 3rd Edition. Brooks/Cole Publishing Co., Pacific Grove, CA.
- Keppel G (1991). Design and Analysis. Prentice-Hall, Englewood Cliffs, NJ.
- Kaplan RM, Grant I (2000). Statistics and experimental design. In: Kaplan & Sadock's Comprehensive Textbook of Psychiatry, 7th Edition.
- McCall R (1994). Fundamental Statistics for Psychology, 6th Edition. Harcourt Brace & Jovanovich, New York.
- Pett MA (1997). Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions. Sage Publications, Thousand Oaks, CA.
- Sacks H, Chalmers TC, Smith H (1982). Randomized versus historical controls for clinical trials. Am J Med 72:233.
- Ware ME, Brewer CL, editors (1999). Handbook for Teaching Statistics and Research Methods, 2nd Edition. Erlbaum, Mahwah, NJ.