Content validity and test–retest reliability with principal component analysis of the translated Malay four-item version of Paffenbarger physical activity questionnaire

Abstract

Purpose

This study aimed to establish the construct validity of the Malay version of the Paffenbarger physical activity questionnaire (PPAQ) by adapting the original questionnaire to suit the local context.

Design/methodology/approach

The PPAQ was adapted, translated into the Malay language and modified until good content agreement was reached among a panel of experts. A total of 65 participants aged 22–55 years, fluent and literate in the Malay language, were selected. Principal component analysis (PCA) was used to investigate construct validity. Reliability of the adapted instrument was analyzed according to the type of variable.

Findings

The panel of experts reached a consensus that the final four items chosen in the adapted Malay version of the PPAQ were valid, supported by a good content validity index (CVI). Two domains consonant with the operational domain definitions were identified by PCA. Based on scores for intensity and duration of exercise, the study further divided the group into those who were physically active and those who engaged in unstructured physical activity. Relative reliability after a 14-day interval demonstrated moderate strength of agreement with an acceptable range of measurement error.

Research limitations/implications

The PPAQ has been used worldwide but is less familiar in the local context. The Malay four-item PPAQ provides a locally validated physical activity questionnaire. In addition, the authors improved the original PPAQ by dividing the question items into two distinct domains that effectively identify those who are physically active and those who engage in unplanned exercise. Nevertheless, further research is recommended in larger, more heterogeneous samples along with additional reliability tests.

Practical implications

To the authors' knowledge, this is the first study to assess the internal structure of the four-item version of the PPAQ. The analysis identified two components with eigenvalues greater than one in the Malay four-item PPAQ. On this basis, the authors were able to separate the population into two groups: the physically active and those engaged in unplanned (unstructured) exercise. The ability of the validated questionnaire to divide the population by intensity of physical activity is novel and may be useful in public health studies, where higher intensity of physical activity, and hence greater energy expenditure, is associated with increased longevity, better health and improved cognitive function.

Social implications

In addition, the items of the second domain, "unplanned exercise", were successfully grouped together. The unplanned exercise component identifies the segment of the population with an awareness of an active lifestyle who choose unstructured exercise instead of vigorous, formal exercise. Even though the intensity and duration of such incidental exercise may not reach public health recommendations, a preferred healthier lifestyle has been shown to be positively associated with better cognition in later life.

Originality/value

The adapted Malay version of the PPAQ has sound psychometric properties and could assist in differentiating population groups based on their physical activity.

Keywords

  • Paffenbarger physical activity
  • Malay-translated physical activity
  • Test–retest reliability
  • Content validity
  • Principal component analysis

Citation

Ghazali, F.B., Ramlee, S.N.S., Alwi, N. and Hizan, H. (2021), "Content validity and test–retest reliability with principal component analysis of the translated Malay four-item version of Paffenbarger physical activity questionnaire", Journal of Health Research, Vol. 35 No. 6, pp. 493-505. https://doi.org/10.1108/JHR-11-2019-0269

Publisher

Emerald Publishing Limited

Copyright © 2020, Fazlisham Binti Ghazali, Siti Nurhafizah Saleeza Ramlee, Najib Alwi and Hazuan Hizan

License

Published in Journal of Health Research. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

Introduction

Producing an accurate measurement of physical activity is essential for detecting important health associations or effects. Moreover, the choice of an appropriate physical activity measurement tool depends on the application for which it is intended [1]. We aimed to develop a reliable physical activity measurement tool adapted to primary care in the Malaysian setting.

The Paffenbarger physical activity questionnaire (PPAQ) has been developed to suit the changing terms and guidelines for physical health. The PPAQ was developed by Dr. Ralph Seal Paffenbarger to assess physical activity via questionnaires [2]. Since then, it has been extensively tested for its reliability and validity in large population studies. The current format of PPAQ consists of eight questions that measure not only sedentary lifestyle but also energy expenditure through a physical activity index [3]. A recent study showed that PPAQ is more adept at capturing vigorous activity as it uses more descriptive terms and proper physiological definitions of physical activity intensity [4].

This study aimed to translate and validate the PPAQ, which has been used in the Common Cold Project [5], to provide a reliable questionnaire for measuring the level of physical activity adapted to the local primary care setting.

Methodology

Study design

To validate the questionnaire, a cross-sectional study was conducted in selected private hospitals in the Hulu Langat district of Selangor, Malaysia. A total of 65 participants who were staff at the respective hospitals, and who were literate and fluent in the Malay language, were selected using convenience sampling. Subjects were instructed to answer the modified Malay version of the PPAQ, which took about 15–20 minutes to complete. All recruited participants gave consent prior to completing the questionnaire on two occasions, 14 days apart.

Ethical consideration

This study was approved by the Ethics Committee of the Cyberjaya University College of Medical Sciences (Reference number CUCMS/CRERC/FR/023). Permission to carry out the research was granted by the General Manager and Chief Executive Officer of the respective hospitals.

Sample size

The sample size calculation for this study was based on the suggestion by Viechtbauer [6] for studies of a similar nature:

n = ln(1 − γ) / ln(1 − π)

where
  • n = required sample size,
  • γ = confidence level (0.95) and
  • π = probability of a problem (e.g. non-response) occurring (0.05).

It was anticipated that any problems that occurred would be minor, such as non-response or item misinterpretation. Hence, it was decided that, if such difficulties presented themselves with at least π = 0.05 probability (i.e. in at least 1 out of 20 participants), it would be important to detect them during the validation process. Accordingly, from the above equation, 60 participants needed to be screened to achieve 95% confidence that one or more such problem cases would be encountered.
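
As a quick check on the arithmetic above, the short sketch below evaluates the Viechtbauer formula for the values used in this study; the function name and rounding are illustrative only and are not part of the original analysis.

```python
import math

def pilot_sample_size(confidence: float, problem_prob: float) -> int:
    """Viechtbauer et al. formula: smallest n such that a problem occurring with
    probability `problem_prob` is seen at least once with the given confidence."""
    n = math.log(1 - confidence) / math.log(1 - problem_prob)
    return math.ceil(n)

# Values used in this study: 95% confidence, 5% probability of a problem.
print(pilot_sample_size(0.95, 0.05))  # 59; the study screened 60 participants
```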

Measure and procedure

Instrument modification and operational domain definition

It was imperative that the translated version of the measurement was clear to respondents and that they perceived the same meaning as the researchers intended. Therefore, in this adapted questionnaire, the content was developed and forward translated into Malay through an expert review.

Question 1

In this section, participants were asked if they engaged in any REGULAR physical activity that was long enough to work up a sweat. If the answer was yes, the next question requested the number of times per week. Physical activity was defined as any bodily movement produced by skeletal muscle that requires energy. Translated into the Malay language, this was "Aktiviti fizikal didefinasikan sebagai pergerakan badan yang memerlukan tenaga". The term skeletal muscle, translated as "otot rangka" in Malay, was removed because of the rarity of this medical term among the non-medical Malay population.

The word REGULAR in the original Paffenbarger questionnaire was replaced with the more specific phrase "engaged at least once a week", translated as "sekurang-kurangnya sekali seminggu" in Malay. By establishing consistency and weekly frequency, it becomes possible to distinguish the physically active from the more sedentary.

Sweating is commonly associated with physical endurance, with a significant linear relationship between sweat excretion and physical intensity [7]. Sweating sooner or more profusely is a good indicator of physical activity intensity [8]. A question assessing physical activity that induces sweating would therefore identify those who are physically fit.

Questions 2 and 3

These questions assessed the subject's lifestyle by identifying how many stairs they climbed each day and the distance they walked on average. Sperandio's study showed that walking less than 500 meters per week was the best predictor of physical inactivity [9], and distance walked provides a better metric for epidemiologic research and better public health targets than walking duration [10].

On the other hand, climbing stairs, an underrated form of exercise, has been shown to benefit an individual's health [11, 12], to predict longevity [13] and to lower blood pressure and improve fitness [14]. There is no universal consensus on the ideal number of stairs, but 8,900–9,900 steps per week have been recommended [15].

Question 4
  1. Seven day recall

This question was about sports or recreational activity in the past week. The seven-day recall contrasted with the original Paffenbarger physical activity question requesting details of such activity in the past year. Due to limitations in human memory, it was deemed best to keep the reporting interval relatively short.

Kjellsson's experiment showed that the overall level of recall error increases with the length of the recall period [16]. Masse and de Niet's literature review showed that a seven-day recall can validly rank individuals by physical activity and is sensitive enough to detect changes in physical activity behavior [17]. Therefore, this modified questionnaire required only a short-term recall of seven days.

  2. Restriction to sports and recreational activities

In this study, the question was restricted to sports or recreational activities, which was stated as a heading under question 4.

Questions 5 and 6

The effectiveness of public health campaigns depends on people knowing the intensity, duration and frequency of the physical activity performed [18]. The WHO recommends that adults aged 18–64 perform at least 150 minutes of moderate-intensity or 75 minutes of vigorous-intensity physical activity throughout the week to achieve the desired health outcomes. For practicality, the raw data on all components of this complex behavior, including the type (intensity), duration and frequency of physical activity, were converted into energy expenditure, i.e. metabolic equivalents of task (METs). Therefore, questions 5 and 6 asked about the frequency and duration of the physical activity performed.
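
For illustration, a minimal sketch of how frequency (Q5), duration (Q6) and an intensity value for a reported activity (Q4) could be combined into weekly energy expenditure. The MET values and function name below are assumptions drawn from commonly cited compendium figures, not the scoring rules of the Malay four-item PPAQ.

```python
# Illustrative only: these MET values are common compendium figures and are NOT
# part of the questionnaire's own scoring procedure.
ASSUMED_METS = {"walking_brisk": 4.3, "jogging": 7.0, "badminton": 5.5}

def weekly_met_minutes(activity: str, sessions_per_week: int, minutes_per_session: float) -> float:
    """Convert frequency (Q5) and duration (Q6) of a reported activity (Q4)
    into MET-minutes per week."""
    return ASSUMED_METS[activity] * sessions_per_week * minutes_per_session

# Example: brisk walking three times a week, 30 minutes per session.
print(weekly_met_minutes("walking_brisk", 3, 30))  # 387.0 MET-minutes/week
```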

Translation and back translation

Hence, the modified PPAQ was forward translated into the Malay language by a sports scientist (HH) who was well versed in both Malay and English. The questionnaire was then back translated into English by an independent professional translator. Another independent professional translator reviewed the back-translated version against the original PPAQ and concluded that no further modification was necessary.

An accredited professional translator then checked the Malay translated version to ensure the terms used were correct and culturally appropriate. The final Malay version was harmonized for any language errors by all the experts until an acceptable translation was developed.

Content validity

A total of four professional bilingual senior sports science lecturers, each with over five years of research experience in the English language medium, were asked to determine whether the items fully and sufficiently represented the targeted domain.

All four specialists were initially contacted by email and phone. They were provided with a formal invitation letter from Cyberjaya University, including details of the research and instructions. Attached to this was a set of questionnaires in the Malay language with an empty box for them to score each domain on a Likert scale.

The four experts rated the content validity of each item in relation to the five tasks in the rating protocol. The scale was scored as follows: 1 = not relevant; 2 = somewhat relevant; 3 = quite relevant and 4 = highly relevant. Ratings of 3 and 4 were considered acceptable. Apart from assessing the content, the four experts were invited to comment in more detail in boxes beside each question.

Subjects' understanding of the modified questionnaire (cognitive interview)

The final Malay version was pretested on ten respondents randomly selected from the public who fulfilled the criteria of being fair-minded and literate in Malay. They were aged between 20 and 40 years, with an equal mix of genders. The objective was to identify any wording or grammatical errors that might affect respondents' comprehension. This also included an examination of respondents' cognitive ability to recall the information, an assessment of whether the format and wording elicited appropriate responses, and whether respondents gave socially desirable answers.

Subjects were instructed to share their thoughts about each question and to describe their thought processes before answering each question. Participants were also invited to suggest alternative wording or sentences if they wished. In this session, the examiner read out the questions, and the subjects answered with minimal interference from the examiner.

At the end of the session, participants were requested to provide more feedback about the length of questions and their clarity. All ten participants agreed that the questions were reasonable, and they were able to recall events pertinent to the questions asked.

Test–retest reliability

Participants were informed that they were required to complete the questionnaire twice, 14 days apart. The researcher was present during completion to assist participants if required. All 65 volunteers completed the test–retest assessment.

Analyses and results

Sociodemographic data

A total of 65 respondents aged 22 to 55 years took part, with a mean (SD) age of 29.49 (5.54) years. Most participants were female and of Malay ethnicity. Most had studied beyond secondary school, with 47.9% having pursued further education for an additional 3.5 years.

Statistical analysis

For content validity, the content validity index (CVI) was used to analyze agreement among the four experts judging the relevance of the question items. For construct validity, principal component analysis (PCA) was performed using SPSS version 19. The reliability analysis was divided into two parts according to the type of data. For continuous data, the intraclass correlation coefficient (ICC), paired t-test and Bland–Altman plot were used to examine agreement between the two administrations of the test; in addition, the standard error of measurement (SEM), minimal detectable change (MDC) and minimal important difference (MID) were used to demonstrate the absolute reliability of the questionnaire. Agreement between categorical data was assessed with weighted kappa.

Validity test

Content validity index

A panel of four experts reached a consensus that the final items in all six questions were valid for use. An item-level CVI (I-CVI) was computed by dividing the number of experts giving a rating of 3 or 4 (relevant) by the total number of experts; all items achieved an I-CVI of 1.00, as presented in Table 1.
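
The sketch below reproduces the I-CVI calculation just described, using the expert ratings reported in Table 1. The scale-level average (S-CVI/Ave) printed at the end is a common companion statistic added here for illustration; it is not reported by the authors.

```python
# Ratings of 3 or 4 count as "relevant" (see the rating protocol above).
expert_ratings = {            # ratings by the four experts, taken from Table 1
    "Q1": [3, 4, 4, 4],
    "Q2": [3, 4, 4, 4],
    "Q3": [3, 4, 4, 3],
    "Q4": [3, 4, 4, 4],
    "Q5": [4, 4, 4, 4],
    "Q6": [4, 4, 4, 4],
}

def item_cvi(ratings):
    """I-CVI = number of experts rating the item 3 or 4 / total number of experts."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

i_cvis = {item: item_cvi(r) for item, r in expert_ratings.items()}
print(i_cvis)                                   # every item: 1.0
print(sum(i_cvis.values()) / len(i_cvis))       # S-CVI/Ave (illustrative) = 1.0
```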

Construct validity

Construct validity was assessed using confirmatory and exploratory factor analysis, with a factor loading of 0.4 or more considered good.

  1. KMO and Bartlett's test of sphericity

The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy was 0.707, and Bartlett's test of sphericity reached statistical significance, supporting the factorability of the correlation matrix [19]. The null hypothesis could be rejected, and the alternative hypothesis that there is a statistically significant interrelationship between variables was accepted. Hence, factor analysis was considered an appropriate technique for further analysis of the data.

  2. Confirmatory and exploratory factor analysis

From this analysis (Table 2), two components were identified with eigenvalues greater than 1.0, suggesting that dividing the questionnaire into two components was most appropriate.

In further analysis, orthogonal rotation (varimax) was used to delineate the two components, under the assumption that the variance explained by one factor is independent of the information in the other factors. Factor rotation made the components easier to interpret.

The rotated component matrix sorted the six variables into two groups, each with factor loadings of 0.4 or more (one item loaded on both components). Blanks in the matrix indicate loadings below 0.4 (Table 3). The factor columns represent the rotated factors extracted from the total set; these core factors were used as the final factors after data reduction.

The first component suggests that the intensity and duration items are highly correlated with each other, explaining about 51% of the variability in responses to this physical activity questionnaire. The second component, consisting of the number of stairs climbed and the walking distance per day, explained 22% of the variance in the PCA, as presented in Table 2. Notably, the second component grouped these two question items, stairs climbed per day and walking distance per day, together with high factor loadings.
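
The authors ran these steps in SPSS version 19. As a rough illustration of the same workflow (KMO, Bartlett's test, principal extraction of two components with varimax rotation and suppression of loadings below 0.4), the sketch below uses the open-source factor_analyzer package on simulated stand-in data; the data frame and column names are assumptions and will not reproduce the reported loadings.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical stand-in data: 65 respondents x 6 items (Q1..Q6). Replace with the
# real item scores to reproduce the analysis; these random values are illustrative only.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(1, 5, size=(65, 6)), columns=[f"Q{i}" for i in range(1, 7)])

chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_item, kmo_total = calculate_kmo(df)
print(f"Bartlett p = {p_value:.4f}, overall KMO = {kmo_total:.3f}")   # the study reported KMO = 0.707

# Principal component extraction with varimax rotation, retaining the two
# components with eigenvalues above 1.0 (as in Table 2).
fa = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
fa.fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns,
                        columns=["Structured PA", "Unstructured PA"])
print(loadings.where(loadings.abs() >= 0.4))   # blank out loadings below 0.4, as in Table 3
```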

  3. Internal consistency test

Cronbach's alpha was used to measure the internal consistency of the scale. As shown by the factor analysis above, two components were extracted from this scale (Tables 2 and 4).

In this analysis, all items in component 1 had moderately high corrected item-total correlations. In contrast, the internal consistency between climbing stairs and walking distances in the second component was very low, which is expected considering that the two questions were not assumed to be related to each other. The final Malay version of the PPAQ retained all the questions because it makes clinical sense to keep them in their respective components.
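
A minimal sketch of the Cronbach's alpha calculation used here, written from the standard formula; the example matrix is hypothetical and is only meant to show the expected input shape (respondents as rows, items as columns).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical scores for four respondents on the four component-1 items (Q1, Q4, Q5, Q6).
example = np.array([[3, 4, 3, 4],
                    [2, 2, 3, 2],
                    [4, 4, 4, 5],
                    [1, 2, 1, 2]])
print(round(cronbach_alpha(example), 3))
```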

Reliability test

Agreement for the continuous data, i.e. walking distance per day and stairs climbed per day from the Malay version of the PPAQ at the two measurement times, was analyzed using the ICC (two-way random effects, absolute agreement, single rater) for relative reliability, and the paired t-test and Bland–Altman plot for systematic bias. The Bland–Altman plot also provides the limits of agreement and detects outliers possibly caused by measurement error [20]. The SEM, MDC and MID were used to estimate the minimal change in score that is not due to error [21]. Categorical data in this reliability analysis were examined using weighted kappa, which provides the strength of agreement between the two measurements.

Determine relative reliability for continuous data
  1. Intraclass correlation coefficient (ICC)

Table 3 shows acceptable test–retest reliability, with ICC values ranging from 0.534 to 0.623 for stairs climbed and walking distance per day.
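
The sketch below shows how the ICC model described above (two-way random effects, absolute agreement, single rater, i.e. ICC(2,1)) can be computed with the pingouin package. The long-format data frame is built from simulated values as a stand-in for the day-1 and day-14 responses, so the printed ICC will not match the reported 0.534.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Simulated stand-in for the 65 participants' "stairs climbed/day" reports.
rng = np.random.default_rng(0)
day1 = rng.normal(50, 40, 65).round()
day14 = day1 + rng.normal(0, 30, 65).round()   # correlated retest values

long_df = pd.DataFrame({
    "participant": np.tile(np.arange(65), 2),
    "occasion":    np.repeat(["day1", "day14"], 65),
    "stairs":      np.concatenate([day1, day14]),
})

icc = pg.intraclass_corr(data=long_df, targets="participant",
                         raters="occasion", ratings="stairs")
# "ICC2" = two-way random effects, absolute agreement, single rater, i.e. ICC(2,1).
print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])
```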

Determine systemic bias for continuous data
  1. Paired t-test

We also found no significant difference in means over the 14-day interval (both p-values > 0.05), so the null hypothesis of no statistically significant difference between the two tests was not rejected.

  2. Bland–Altman plot

Potential measurement error was further analyzed using the Bland–Altman plot, which addresses whether there is any systematic difference between two sets of measurements and identifies possible outliers (see Figure 1).

Each participant was represented on the graph by plotting the mean of the two assessments (x-axis) against the difference between the two assessments (y-axis). The mean difference estimated the bias, and the SD of the differences measured the fluctuation around this mean (outliers being those beyond 1.96 SD of the differences).
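
A minimal sketch of the Bland–Altman construction just described, assuming two arrays of day-1 and day-14 values; the function name and plotting choices are illustrative, not the authors' SPSS output.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(day1, day14, label):
    """Plot the mean of the two assessments against their difference, with the
    bias line and 95% limits of agreement (bias ± 1.96 SD of the differences)."""
    day1, day14 = np.asarray(day1, float), np.asarray(day14, float)
    diff = day1 - day14
    mean = (day1 + day14) / 2
    bias, sd = diff.mean(), diff.std(ddof=1)
    plt.scatter(mean, diff)
    plt.axhline(bias, linestyle="-")
    plt.axhline(bias + 1.96 * sd, linestyle="--")
    plt.axhline(bias - 1.96 * sd, linestyle="--")
    plt.xlabel(f"Mean of two assessments ({label})")
    plt.ylabel("Difference (day 1 − day 14)")
    plt.show()

# Hypothetical usage: bland_altman(stairs_day1, stairs_day14, "stairs climbed/day")
```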

Determine absolute reliability for continuous data
  1. Standard error of measurement (SEM) and minimal detectable change (MDC90)

The findings demonstrate that although test–retest reliability (relative reliability) for the clinical tests was excellent, there was still a substantial degree of variability of performance for individual participants from one test session to the next (absolute reliability). The SEM and MDC90 were calculated to objectify these findings.

SEM was calculated based on the formula:

SEM = SD_difference × √(1 − ICC)

Accordingly, since the SEM assumes a normal distribution, the probabilities of the normal curve can be applied to SEM values. There is a 68% probability that repeated answers for climbing stairs and walking distances will lie within ±37.7 and ±572.6 of the mean score on the first day of assessment, respectively, and an approximately 95% probability that repeated measures will lie within ±75.4 stairs per day (2 × SEM) and ±1,145.2 meters per day for walking distance (2 × SEM).

A 90% confidence interval was used for the MDC. Using the formula MDC90 = SEM × 1.645 (the z-score at the 90% confidence level) × √2, the resulting MDC90 values are shown in Table 3. Both MDC90 values, for climbing stairs and walking distances, exceeded the observed change in means across the two time points. Because the test–retest change fell within the MDC90 interval, the observed changes were likely due to random measurement error.
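
To make the arithmetic concrete, the sketch below recomputes the SEM and MDC90 for the stairs-climbed variable from the values reported in Table 3 (SD of the differences = 55.20, ICC = 0.534); the function names are illustrative.

```python
import math

def sem(sd_difference: float, icc: float) -> float:
    """Standard error of measurement: SD of the test–retest differences × √(1 − ICC)."""
    return sd_difference * math.sqrt(1 - icc)

def mdc90(sem_value: float) -> float:
    """Minimal detectable change at 90% confidence: SEM × 1.645 × √2."""
    return sem_value * 1.645 * math.sqrt(2)

# Values reported for stairs climbed per day: SD_difference = 55.20, ICC = 0.534.
sem_stairs = sem(55.20, 0.534)
print(round(sem_stairs, 1), round(mdc90(sem_stairs)))  # ≈ 37.7 and ≈ 88, matching Table 3
```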

  2. Minimal important difference (MID)

This study is the first to determine the measurement error of the PPAQ, which is an indication of the accuracy of the measurement instrument. The COSMIN guidelines propose that the interpretation of the SEM should be based on the value of the MID [22]. However, the usual purpose of the MID, which is to represent the smallest change in score considered a relevant outcome, was not used here. Instead, the MID was used to assess statistical reliability, i.e. measurement error, relying on other statistical measures such as the SD, SEM and effect size [23].

Note: 1SEM = 1 × standard error of measurement; SD = standard deviation; ICC = intraclass correlation coefficient

Determine reliability of categorical data
  1. Weighted kappa

The observed percentage of agreement is the proportion of ratings on which the raters agree, and the expected percentage is the proportion of agreement expected to occur by chance if the raters scored randomly. Kappa is therefore the proportion of agreement observed between raters after adjusting for the proportion of agreement expected by chance [24].

By using the formula

κ = (Po − Pc) / (1 − Pc)

where Po = observed agreement and Pc = proportion of agreement expected by chance, we were able to generate the kappa values shown in Tables 5–7.

Many scholars agree that it is important to retain the hierarchical nature of the categories. Therefore, for further analysis of the ordinal data, weighted kappa was used to reflect the degree of agreement in terms of the seriousness of disagreement, as shown in Table 5. In this analysis, quadratic weighting was preferred over linear weighting because its coefficients increase with the number of categories, making it the more desirable weighting scheme given the hierarchical nature of the categories.
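
For illustration, the sketch below computes both unweighted and quadratic-weighted Cohen's kappa with scikit-learn. The category codes are hypothetical stand-ins for the physical activity index categories assigned on day 1 and day 14, so the printed values will not match Tables 5–7.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings: the physical activity index category assigned to
# each participant on day 1 and day 14 (e.g. coded 0 = low ... 3 = high).
day1_categories  = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
day14_categories = [0, 1, 2, 2, 2, 1, 1, 3, 2, 1]

unweighted = cohen_kappa_score(day1_categories, day14_categories)
quadratic  = cohen_kappa_score(day1_categories, day14_categories, weights="quadratic")
print(round(unweighted, 3), round(quadratic, 3))
```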

Discussion

This study aimed to translate the PPAQ into the Malay language. The Malay version of the PPAQ demonstrated acceptable test–retest reliability and a sound internal structure. The panel of experts reached a consensus that the final items in both domains were valid for use, with the item-level CVI reaching total mutual agreement.

To our knowledge, this is the first study to assess the internal structure of the PPAQ. Our analysis identified two components with eigenvalues greater than one in the Malay version of the PPAQ. The ability of the validated questionnaire to divide the population by intensity of physical activity is novel and may be useful in public health studies, where higher intensity of physical activity, and hence greater energy expenditure, is associated with increased longevity, better health and improved cognitive function.

In addition, the second domain, "unplanned exercise", was successfully extracted, with Q2 and Q3 grouped together by the principal component analysis.

Analysis of measurement error in this study was divided into two parts according to the type of variable. For the continuous data, which form the unplanned exercise component, we found that the self-reported Malay version of the PPAQ had fair relative reliability over the 14-day interval.

Limitations of this study

The Malay version of the PPAQ provides a locally validated physical activity questionnaire. Future studies in larger, more heterogeneous samples, along with additional reliability tests, are encouraged to evaluate the validity of this instrument against more objective measures such as accelerometry, as this study measured only the reliability and content validity of the translated PPAQ. Such studies are particularly important in view of the limitations of subjective measurement in accurately identifying those who need further recommendations for health activity.

Conclusion

The PPAQ has been used worldwide but is less familiar in Malaysia. The lack of a translated version and of psychometric analysis makes this study an important starting point for further research. Our statistical analysis successfully identified and delineated two major components in accordance with our operational domain definitions, with fair internal consistency; the six items were thereby compressed into a four-item questionnaire. Further research is recommended in larger, more heterogeneous samples along with additional reliability tests.

Figures

Figure 1

Bland–Altman plots showing limits of agreement between two sets of measurement at 14 days interval for numbers of stairs climbed per day (left-hand side) and distance of walking per day (right-hand side)

Content validity index of Malay version PPAQ

| Question item (Malay version) | Question item (English version) | Expert 1 | Expert 2 | Expert 3 | Expert 4 | Item-level content validity index (I-CVI) |
| --- | --- | --- | --- | --- | --- | --- |
| Q1. Adakah anda melibatkan diri dalam mana-mana aktiviti fizikal seperti berjalan pantas…aktiviti yang mengeluarkan peluh? | Q1. Do you engage in any regular physical activity like brisk walking, i.e. long enough to work up a sweat? | 3 | 4 | 4 | 4 | 4/4 = 1.00 |
| Q2. Berapakah jumlah anak tangga yang anda naik pada setiap hari? | Q2. How many stairs do you climb up every day? | 3 | 4 | 4 | 4 | 4/4 = 1.00 |
| Q3. Berapa jauhkah anda berjalan dalam purata setiap hari? | Q3. How much of a distance do you walk per day? | 3 | 4 | 4 | 3 | 4/4 = 1.00 |
| Q4. Senaraikan aktiviti sukan atau rekreasi yang anda mengambil bahagian dalam tempoh masa seminggu yang lepas. Kami hanya berminat dengan aktiviti yang aktif | Q4. List down any sport or recreational activities you participated in during the past week. We are only interested in the time you were physically active | 3 | 4 | 4 | 4 | 4/4 = 1.00 |
| Q5. Berapa kali dalam seminggu anda menjalankan aktiviti tersebut? | Q5. How frequent do you perform the activity in one week? | 4 | 4 | 4 | 4 | 4/4 = 1.00 |
| Q6. Purata masa setiap aktiviti? | Q6. How long do you do the activity per session? | 4 | 4 | 4 | 4 | 4/4 = 1.00 |

Principal component analysis of the Malay version PPAQ

| Component | Initial eigenvalues: Total | Initial eigenvalues: % of variance | Initial eigenvalues: Cumulative % | Extraction sums of squared loadings: Total | Extraction: % of variance | Extraction: Cumulative % |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 3.035 | 50.581 | 50.581 | 3.035 | 50.581 | 50.581 |
| 2 | 1.349 | 22.488 | 73.069 | 1.349 | 22.488 | 73.069 |
| 3 | 0.571 | 9.516 | 82.585 | | | |
| 4 | 0.477 | 7.944 | 90.529 | | | |
| 5 | 0.323 | 5.383 | 95.912 | | | |
| 6 | 0.245 | 4.088 | 100.000 | | | |

Rotated component matrix: factor loadings (>0.4) for the Malay version PPAQ

| Item | Component 1: Structured physical activity | Component 2: Unstructured physical activity |
| --- | --- | --- |
| Malay version: intensity of physical activity (Q4) | 0.919 | |
| Malay version: duration of physical activity (Q6) | 0.870 | |
| Malay version: frequency of physical activity (Q5) | 0.752 | |
| Malay version: involvement in any physical activity (Q1) | 0.544 | 0.456 |
| Malay version: climbing stairs/day (Q2) | | 0.852 |
| Malay version: walking distances/day (m) (Q3) | | 0.797 |

Note(s): Rotation method: varimax with Kaiser normalization

a. Rotation converged in three iterations

Cronbach’s alpha on each component and their proposed names for Malay version PPAQ

| Component | Cronbach's alpha | Items | Cronbach's alpha if item deleted | Corrected item-total correlation |
| --- | --- | --- | --- | --- |
| Physical activity (component 1) | 0.774 | Intensity of physical activity (Q4) | 0.591 | 0.853 |
| | | Duration of physical activity (Q6) | 0.732 | 0.665 |
| | | Frequency of physical activity (Q5) | 0.719 | 0.578 |
| | | Involvement in any physical activity (Q1) | 0.799 | 0.468 |
| Unplanned exercise (component 2) | 0.083 | Climbing stairs/day (Q2) | 0.002 | 0.454 |
| | | Walking distances/day (meters) (Q3) | 0.420 | 0.454 |

Average scores at day 1 and day 14, paired t-test, intraclass correlation coefficient (ICC) with significance level p ≤ 0.001, standard error of measurement (SEM) and minimal detectable change (MDC) at 90% CI

| Malay version PPAQ | Day 1 mean (SD) | Day 14 mean (SD) | Mean difference (SD) | p-value (paired t-test) | ICC (p ≤ 0.001) | SEM | MDC90 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Climbing stairs/day | 50.08 (40.14) | 54.22 (56.23) | −4.138 (55.20) | 0.188 | 0.534 | 37.7 | 88 |
| Walking distances/day (m) | 798.77 (843.96) | 745.38 (930.94) | 53.385 (932.49) | 0.873 | 0.623 | 572.6 | 1,337 |

Distribution-based estimates of the minimal important difference (MID)

| Method | MID calculation | MID climbing stairs per day | MID walking distances per day (meters) |
| --- | --- | --- | --- |
| 1SEM | SD baseline × √(1 − ICC) | 27 | 518 |
| Empirical rule effect size | 0.08 × 6 × SD difference | 26 | 448 |
| Cohen's effect size | 0.5 × SD difference | 28 | 466 |
| 0.5 × SD | 0.5 × SD baseline | 20 | 422 |

Proportions of agreement of physical activity index scores, physical intensity, frequency and duration

| Category | Observed agreement (Po) | Chance agreement (Pc) | Unweighted Cohen's kappa (95% CI) | Quadratic weighted kappa (95% CI) |
| --- | --- | --- | --- | --- |
| Physical activity index score | 0.778 | 0.561 | 0.494 (0.260–0.728) | 0.613 (0.316–0.909) |
| Physical activity intensity | 0.677 | 0.371 | 0.487 (0.306–0.667) | 0.660 (0.374–0.945) |
| Physical activity frequency | 0.723 | 0.380 | 0.553 (0.378–0.729) | 0.603 (0.161–1.00) |
| Physical activity duration | 0.846 | 0.469 | 0.710 (0.545–0.875) | 0.778 (0.662–0.895) |

References

1. Bassett DR Jr. Validity and reliability issues in objective monitoring of physical activity. Res Q Exerc Sport. 2000 Jun; 71(Suppl 2): 30-6. doi: 10.1080/02701367.2000.11082783.

2. Lee IM, Matthews CE, Blair SN. The legacy of Dr. Ralph Seal Paffenbarger, Jr. - past, present, and future contributions to physical activity research. Pres Counc Phys Fit Sports Res Dig. 2009 Mar; 10(1): 1-8.

3. Paffenbarger RS Jr, Hyde RT, Wing AL, Hsieh CC. Physical activity, all-cause mortality, and longevity of college alumni. N Engl J Med. 1986 Mar; 314(10): 605-13.

4. Tonstad S, Herring P, Lee J, Johnson JD. Two physical activity measures: Paffenbarger physical activity questionnaire versus Aerobics Center Longitudinal Study as predictors of adult-onset type 2 diabetes in a follow-up study. Am J Health Promot. 2018 May; 32(4): 1070-7. doi: 10.1177/0890117117725282.

5. Carnegie Mellon University. The common cold project. https://www.cmu.edu/common-cold-project/measures-by-study/health-practices/physical-activity/index.html#cahsq.

6. Viechtbauer W, Smits L, Kotz D, Budé L, Spigt M, Serroyen J, Crutzen R. A simple formula for the calculation of sample size in pilot studies. J Clin Epidemiol. 2015 Nov; 68(11): 1375-9. doi: 10.1016/j.jclinepi.2015.04.014.

7. Holmes N, Miller V, Bates G, Zheo Y. The effect of exercise intensity on sweat rate and sweat sodium loss in well trained athletes. J Sci Med Sport. 2011; 14: e112. doi: 10.1016/j.jsams.2011.11.234.

8. Shibasaki M, Crandall CG. Mechanisms and controllers of eccrine sweating in humans. Front Biosci (Schol Ed). 2010 Jan; 2: 685-96. doi: 10.2741/s94.

9. Sperandio EF, Arantes RL, Silva RPD, Matheus AC, Lauria VT, Bianchim MS, Romiti M, Gagliardi ARDT, Dourado VZ. Screening for physical inactivity among adults: the value of distance walked in the six-minute walk test. A cross-sectional diagnostic study. Sao Paulo Med J. 2016 Jan-Feb; 134(1): 56-62. doi: 10.1590/1516-3180.2015.00871609.

10. Williams PT. Distance walked and run as improved metrics over time-based energy estimation in epidemiological studies and prevention; evidence from medication use. PLoS ONE. 2012; 7(8): e41906. doi: 10.1371/journal.pone.0041906.

11. Shenassa ED, Frye M, Braubach M, Daskalakis C. Routine stair climbing in place of residence and body mass index: a pan-European population based study. Int J Obes (Lond). 2008 Mar; 32(3): 490-4. doi: 10.1038/sj.ijo.0803755.

12. Meyer P, Kayser B, Kossovsky MP, Sigaud P, Carballo D, Keller PF, Eric Martin X, Farpour-Lambert N, Pichard C, Mach F. Stairs instead of elevators at workplace: cardioprotective effects of a pragmatic intervention. Eur J Cardiovasc Prev Rehabil. 2010 Oct; 17(5): 569-75. doi: 10.1097/HJR.0b013e328338a4dd.

13. Lee IM, Paffenbarger RS Jr. Associations of light, moderate, and vigorous intensity physical activity with longevity. The Harvard Alumni Health Study. Am J Epidemiol. 2000 Feb; 151(3): 293-9. doi: 10.1093/oxfordjournals.aje.a010205.

14. Andersen LL, Sundstrup E, Boysen M, Jakobsen MD, Mortensen OS, Persson R. Cardiovascular health effects of internet-based encouragements to do daily workplace stair-walks: randomized controlled trial. J Med Internet Res. 2013 Jun; 15(6): e127. doi: 10.2196/jmir.2340.

15. Tudor-Locke C. Steps to better cardiovascular health: how many steps does it take to achieve good health and how confident are we in this number? Curr Cardiovasc Risk Rep. 2010 Jul; 4(4): 271-6. doi: 10.1007/s12170-010-0109-5.

16. Kjellsson G, Clarke P, Gerdtham UG. Forgetting to remember or remembering to forget: a study of the recall period length in health care survey questions. J Health Econ. 2014 May; 35: 34-46. doi: 10.1016/j.jhealeco.2014.01.007.

17. Masse LC, de Niet JE. Sources of validity evidence needed with self-report measures of physical activity. J Phys Act Health. 2012 Jan; 9(Suppl 1): S44-55. doi: 10.1123/jpah.9.s1.s44.

18. Wicker P, Frick B. The relationship between intensity and duration of physical activity and subjective well-being. Eur J Public Health. 2015 Oct; 25(5): 868-72. doi: 10.1093/eurpub/ckv131.

19. Cerny BA, Kaiser HF. A study of a measure of sampling adequacy for factor-analytic correlation matrices. Multivariate Behav Res. 1977 Jan; 12(1): 43-7.

20. Watson PF, Petrie A. Method agreement analysis: a review of correct methodology. Theriogenology. 2010 Jun; 73(9): 1167-79. doi: 10.1016/j.theriogenology.2010.01.003.

21. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006 May; 86(5): 735-43.

22. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, De Vet HC. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010 May; 19(4): 539-49. doi: 10.1007/s11136-010-9606-8.

23. Copay AG, Subach BR, Glassman SD, Polly DW Jr, Schuler TC. Understanding the minimum clinically important difference: a review of concepts and methods. Spine J. 2007 Sep-Oct; 7(5): 541-6. doi: 10.1016/j.spinee.2007.01.008.

24. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005 Mar; 85(3): 257-68.

Acknowledgements

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest concerning the research, authorship and/or publication of this article. Funding support for this study was provided by a grant from the Cyberjaya University College of Medical Sciences: Grant number CRG/01/03/2018.

Corresponding author

Fazlisham Binti Ghazali can be contacted at: adarhusni@gmail.com
