Evidence for Reliability and Validity of Functional Performance Testing in the Evaluation of Nonarthritic Hip Pain
Context
The single-legged–squat test (SLST) and step-down test (SDT) are 2 functional performance tests commonly used to evaluate active people with nonarthritic hip pain and dysfunction. However, evidence to support the use of the SLST and SDT in this population is lacking.
Objective
To offer evidence of reliability and validity for the SLST and SDT in evaluating patients with nonarthritic hip pain.
Design
Cross-sectional study.
Setting
Orthopaedic surgeon's clinical office.
Patients or Other Participants
Forty-five patients (27 female and 18 male participants; age = 28.5 ± 10 years, height = 171.6 ± 10.1 cm, weight = 73.9 ± 15.2 kg, and body mass index = 25 ± 4.1) diagnosed with nonarthritic hip pain.
Main Outcome Measure(s)
Participants performed the SLST and SDT. We assessed the interrater reliability of test grading. We assessed validity by comparing passive internal rotation of the hip, visual analog scale (VAS) scores, and hip outcome scores (HOSs) for limitations in activities of daily living and sport-related activities (SRAs) between participants who passed and those who failed each test.
Results
Interrater reliability was moderate to excellent for both the SLST (0.603–0.939) and SDT (0.745–0.943). Participants who passed or failed the SLST and SDT differed on the following measures: VAS for the SLST (F1,43 = 16.21, P < .001); VAS for the SDT (F1,43 = 13.41, P = .001); HOS-activities of daily living for the SLST (F1,40 = 5.15, P = .029); HOS-SRAs for the SLST (F1,40 = 7.48, P = .009); and HOS-SRAs for the SDT (F1,40 = 6.42, P = .015).
Conclusions
Our study offers evidence for the use of the SLST and SDT as reliable and valid functional performance tests in the evaluation of physical function for patients with nonarthritic hip pain.
Functional performance tests are often used to evaluate dynamic movement patterns that combine range of motion, strength, and proprioception. These tests reflect the physical demands and neuromuscular control needed for sport-related movements. The single-legged–squat test (SLST) and step-down test (SDT) are 2 functional performance tests commonly used in the clinical setting. Although the SLST and SDT are frequently performed to evaluate basic dynamic movement patterns of the trunk and lower extremity,1 their use as functional performance tests for patients with nonarthritic hip pain and dysfunction has not yet been defined in the literature.
The SLST and SDT capture several deviations of the hip, pelvis, and trunk that are considered important when assessing patients with hip pain and dysfunction.1,2 The normal movement pattern during descent in both the SLST and SDT includes hip and knee flexion with anterior pelvic tilt, trunk flexion, and hip adduction with knee internal rotation and abduction.1,3,4 Visual observation of the SLST and SDT has been shown to be reliable for evaluating kinematic and biomechanical deficiencies of the hip, pelvis, and trunk in healthy people.5,6 These tests have also been established as valid for assessing dynamic lower extremity control and hip muscle function in both healthy people and those with diagnosed hip chondropathy.2,5,7,8 Although the SLST and SDT are performed in similar ways, they produce different movement patterns, muscle-recruitment patterns, and stresses on the intra-articular structures of the hip.1,9,10 Specifically, the SLST is performed with more knee abduction, whereas the SDT is performed with greater hip adduction.1 The greater hip adduction during the SDT can also produce greater activation of the medial and lateral hamstrings than the SLST.4
Conditions associated with hip-joint pain in the absence of severe degenerative joint disease are defined as nonarthritic hip pain and include femoroacetabular impingement (FAI), acetabular labral tears, dysplasia, structural instability (eg, acetabular retroversion, femoral anteversion), and ligamentum teres tears.11,12 These conditions are believed to result from repetitive microtrauma between the proximal femur and the acetabulum during dynamic movement.11,13 Excessive femoral head motion and joint instability can also cause deficiencies in and overactivation of the surrounding hip musculature, leading to increased intra-articular symptoms over time.14–16 With increased attention to nonarthritic hip injuries,17,18 these conditions are being identified and diagnosed more commonly, especially in young, athletic people. Although functional performance tests are commonly used to evaluate active people with hip pain and dysfunction,1,4,19 studies establishing their reliability and validity in patients with nonarthritic hip pain are limited.
Both the SLST and SDT could be useful in assessing deficiencies relating to the hip and surrounding musculoskeletal structures among patients with nonarthritic hip pain and dysfunction. However, evidence supporting the use of the SLST and SDT in this population is lacking. Therefore, the purpose of our study was to offer evidence of reliability and validity for the SLST and SDT in evaluating patients with nonarthritic hip pain. Our first hypothesis was that the interrater reliability between differently trained musculoskeletal experts evaluating both the SLST and SDT would be moderate to excellent. Our second hypothesis was that we could establish validity by demonstrating that people who passed the SLST and SDT had greater passive internal rotation of the hip (IR), lower reported pain levels, and greater self-reported levels of function than those who failed.
METHODS
For our cross-sectional study, we compared evaluations between a certified athletic trainer (R.P.M.) and a Board-certified orthopaedic surgeon and sports medicine specialist with more than 10 years' experience performing arthroscopic hip preservation surgery (J.J.C.). The independent variables were evaluation of test performance (passing or failing) on the SLST and SDT. The main outcome variables were passive IR, visual analog scale (VAS) score, and hip outcome score (HOS) for limitations in activities of daily living (ADLs) and sport-related activities (SRAs).
Participants
Forty-five patients consecutively diagnosed with nonarthritic hip pain who met the inclusion criteria and none of the exclusion criteria participated in our study. They consisted of 27 female and 18 male participants with a mean age of 28.5 ± 10 years (range, 14–48 years), height of 171.6 ± 10.1 cm (range, 155–190.5 cm), weight of 73.9 ± 15.2 kg (range, 41.7–108.9 kg), and body mass index (BMI) of 25 ± 4.1 (range, 16.3–35.4). These physically active participants reported symptoms relating to nonarthritic hip pain for an average of 24.2 ± 24.2 months (range, 1–144 months). They were evaluated by the orthopaedic surgeon and diagnosed with the following conditions (participants could have more than 1 diagnosis): 40 with acetabular labral tears (89%), 20 with FAI (44%), 9 with dysplasia (20%), 5 with structural instability (11%), and 3 with partial ligamentum teres tears (7%). All participants, and parents or guardians when applicable, signed the written informed consent and authorization to disclose protected health information for this research study, which was approved by the Allegheny Singer Research Institute Institutional Review Board.
Inclusion criteria were age between 14 and 49 years; BMI <40; clinical diagnosis of intra-articular injury (confirmed by magnetic resonance imaging or magnetic resonance arthrogram evaluated by a radiologist and the orthopaedic surgeon); ambulation without mobility aids or assistance; physical ability to perform the SLST and SDT using the unaffected leg; and ability to read and understand English.
Exclusion criteria were age >49 years; BMI ≥40; moderate to severe (Tönnis 2 or 3) osteoarthritic change of the hip20; any previous surgical intervention on the affected hip; documented injury to the lumbar spine, knee, or ankle on the affected side within the previous 6 months; or a concurrent extra-articular musculoskeletal condition confirmed by magnetic resonance imaging or magnetic resonance arthrogram (eg, gluteal tendinopathy, trochanteric bursitis, hamstring tendinopathy).
Data Collection
During the initial physical examination, the orthopaedic surgeon evaluated and recorded IR with the participant supine and the hip and knee each flexed to 90°. Participants completed the VAS for current pain level, the HOS-ADLs, and the HOS-SRAs before performing the functional tests. The VAS was scored from 0 to 10, and the HOS-ADLs and HOS-SRAs were each scored from 0 to 100. The VAS has been shown to be a reliable and valid psychometric response scale for pain in patients with spine fractures and dislocations.21 Both the HOS-ADLs and HOS-SRAs have demonstrated high reliability and responsiveness, as well as high correlations with measures of physical function in patients with nonarthritic hip pain.22–24
Functional Test Procedures
A standardized protocol for administering both the SLST (Figure 1) and SDT (Figure 2) was developed from a prior literature review25 and incorporated into the routine clinical practice of the orthopaedic surgeon. Participants were required to wear shorts or tight-fitting pants so that the evaluators could observe their lower extremity position throughout both functional tests. The primary investigator (R.P.M.) demonstrated the SLST and SDT. Participants then performed 3 repetitions of each test on the unaffected limb in the presence of the primary investigator to confirm that they understood and could perform the proper technique before performing the tests on the affected side.
Citation: Journal of Athletic Training 54, 3; 10.4085/1062-6050-33-18
Single-Legged–Squat Test
A T-shape (horizontal bar = 15.24 cm [6 in], vertical stem = 25.4 cm [10 in]) was marked on the floor with 3.81-cm-wide (1.5-in-wide) white athletic tape. Participants were instructed to stand barefoot with their legs shoulder-width apart and parallel and their arms at their sides. They were told to place the unaffected foot along the long axis of the T-shape, with the second metatarsal aligned with the stem (perpendicular to the horizontal bar) and the toes close to but not touching the horizontal line. The participants then transitioned to single-legged stance on the unaffected leg with the non-stance knee flexed to 90° and the thigh vertically aligned with the stance leg. While maintaining a straight trunk, the participants were then asked to squat in a balanced and controlled motion at a rate of 1 squat per 2 seconds until they could no longer see the line in front of their toes (approximately 45° to 60° of flexion).
Step-Down Test
Participants were instructed to stand barefoot on a standardized step 20 to 25 cm high with their legs shoulder-width apart and parallel and their arms at their sides. They were then asked to transition to single-legged stance on the unaffected leg, with the non-stance knee extended out from the step and the foot in dorsiflexion. The stance leg was positioned so that the toes were even with the front edge of the step. While maintaining a straight trunk, participants were then told to bend the knee of the stance leg until the heel of the contralateral leg touched the floor. Without putting weight on that heel, they returned to the starting position at a rate of 1 repetition per 2 seconds.
Functional Test Evaluation
Three trials of the SLST and SDT using the affected extremity were performed in front of the primary investigator and orthopaedic surgeon. The testing order for the SLST and SDT was randomized for all participants. Both investigators completed forms to evaluate each participant's performance on the SLST and SDT. Each repetition of both tests on the affected extremity was assessed for (1) overall impression of the trials (including balance and arm strategy), (2) posture or movement of the trunk, (3) posture or movement of the pelvis, (4) hip-joint movement and posture, (5) knee-joint movement and posture, and (6) depth of the squat.7,8,26,27 In addition to the overall impression, each repetition was graded as positive or negative for deviation on each of the other 5 criteria (Table 1). To pass a repetition, the evaluator had to grade the overall impression as passing, and at least 4 of the 5 specific criteria had to be negative for deviation. The overall evaluation was graded as passing if at least 1 of the 3 repetitions was passed; therefore, a participant who failed 2 of the 3 repetitions could still pass the test.
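The grading rule above can be expressed as a short function. This is a minimal sketch, assuming each repetition is recorded as an overall-impression flag plus 5 deviation flags (trunk, pelvis, hip, knee, squat depth); the `Repetition` record, its field names, and `evaluate_test` are hypothetical and are not the study's evaluation form.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Repetition:
    """One evaluator's grading of a single repetition (hypothetical record layout)."""
    overall_pass: bool  # overall impression, including balance and arm strategy
    trunk: bool         # True = positive for deviation
    pelvis: bool
    hip: bool
    knee: bool
    squat_depth: bool

    def deviations(self) -> int:
        """Count of the 5 specific criteria graded positive for deviation."""
        return sum([self.trunk, self.pelvis, self.hip, self.knee, self.squat_depth])

    def passed(self) -> bool:
        """Passing overall impression and at least 4 of 5 criteria negative for deviation."""
        return self.overall_pass and self.deviations() <= 1


def evaluate_test(repetitions: Sequence[Repetition]) -> bool:
    """The SLST or SDT is passed if at least 1 of its repetitions is passed."""
    return any(rep.passed() for rep in repetitions)


reps = [
    Repetition(False, True, False, False, False, False),  # fails: overall impression failing
    Repetition(True, True, True, False, False, False),    # fails: 2 deviations
    Repetition(True, False, True, False, False, False),   # passes: 1 deviation
]
print(evaluate_test(reps))  # True
```

Because `evaluate_test` uses `any`, a single passing repetition is enough. This mirrors the rule that failing 2 of 3 repetitions can still yield an overall pass.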

Sample Size
To determine the sample size needed for our study, we performed an a priori power analysis (version 3.1.9.1; G*Power; Universität Düsseldorf, Düsseldorf, Germany) for validity based on a 1-way, fixed-effects, omnibus analysis of variance (ANOVA). Our power analysis was derived from a pilot study of 9 patients with nonarthritic hip pain evaluated by the primary investigator and orthopaedic surgeon. Two people passed the SLST with a mean HOS-SRAs of 61.05 ± 3.92, whereas 7 people failed the SLST with a mean HOS-SRAs of 45.72 ± 16.31. Based on this sample, with a calculated effect size (Cohen f) of 0.6373290, an α error probability of .05, and a power of 0.80, the total sample size needed was 22 (2 groups of 11 people). Because roughly 25% of the pilot patients (2 of 9) passed the SLST, 44 participants were required for the passing group to be expected to reach 11 people (11 / 0.25 = 44).
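The sample-size reasoning can be checked with a short sketch. Cohen f for a 1-way ANOVA is the size-weighted standard deviation of the group means around the grand mean, divided by a common within-group SD σ. The section does not state which σ was entered. Assuming σ = 10 HOS points (an assumed value, not reported here) reproduces the reported 0.6373290 from the pilot means. The function names are illustrative. The noncentral-F power calculation that yields 22 is left to G*Power.

```python
import math


def cohens_f(means, sizes, sigma):
    """Cohen f = sigma_m / sigma, where sigma_m is the size-weighted SD of the
    group means around the grand mean and sigma is the common within-group SD."""
    n_total = sum(sizes)
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / n_total
    sigma_m = math.sqrt(sum(n * (m - grand_mean) ** 2 for m, n in zip(means, sizes)) / n_total)
    return sigma_m / sigma


def required_enrollment(n_per_group, expected_pass_rate):
    """Total participants needed so the smaller (passing) group is expected to reach n_per_group."""
    return math.ceil(n_per_group / expected_pass_rate)


# Pilot HOS-SRAs means: 2 participants passed the SLST, 7 failed.
f = cohens_f([61.05, 45.72], [2, 7], sigma=10.0)  # sigma = 10 is an assumed value
print(round(f, 3))                    # 0.637
print(required_enrollment(11, 0.25))  # 44
```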
Statistical Analysis
Reliability
Reliability was evaluated as the interrater reliability between the primary investigator and orthopaedic surgeon. Interrater reliability was first assessed with an intraclass correlation coefficient (ICC) using a 2-way mixed model (3,1) to compare the total number of deviations (out of 6) identified by both investigators for each repetition of the SLST and SDT. The Cohen κ statistic was used to assess interrater reliability for the overall evaluation of passing or failing on each repetition of the SLST and SDT. The κ statistic was also used for the dichotomous assessment of positive versus negative for deviation in the trunk, pelvis, hip, knee, and depth of squat on each repetition of the SLST and SDT. Values of both the ICC (3,1) and κ coefficient generally range from 0.0 to 1.0, with values closer to 1 indicating higher reliability.28 An ICC or κ value of ≥0.75 was considered excellent; 0.40 to 0.74, moderate; and <0.40, poor.29
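The reliability statistics named above can be sketched in plain Python. `cohens_kappa` computes observed agreement corrected for chance agreement from each rater's marginal proportions. `icc_3_1` follows the Shrout and Fleiss ICC(3,1) formula (consistency, single rater) built from 2-way ANOVA mean squares. `classify` applies the study's bands. The interval between 0.74 and 0.75, which the bands leave unassigned, is treated as moderate; this is an assumption of the sketch. The ratings shown are made-up toy data, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen kappa for 2 raters assigning categorical grades (eg, pass/fail) to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal proportions (missing keys count as 0).
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def icc_3_1(ratings):
    """ICC(3,1): 2-way mixed model, consistency, single rater.
    ratings[i][j] is rater j's score (eg, number of deviations) for repetition i."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def classify(value):
    """Reliability bands used in this study (0.74-0.75 treated as moderate here)."""
    if value >= 0.75:
        return "excellent"
    if value >= 0.40:
        return "moderate"
    return "poor"


a = ["pass", "pass", "fail", "fail"]
b = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(a, b))                # 0.5
print(icc_3_1([[1, 2], [2, 1], [3, 3]]))  # 0.5
print(classify(0.603), classify(0.933))   # moderate excellent
```

Because ICC(3,1) measures consistency, a rater who scores every repetition exactly 1 deviation higher than the other still yields an ICC of 1.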
Validity
Validity was assessed by comparing IR, VAS, HOS-ADLs, and HOS-SRAs between participants who passed and those who failed the SLST and SDT. We performed a separate 1-way ANOVA for each measure (SPSS version 23; IBM Corp, Armonk, NY) to identify any differences between the means of those who passed and those who failed the SLST and SDT.
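With 2 groups, a 1-way ANOVA compares the between-group and within-group mean squares, with 1 and N − 2 degrees of freedom. By that formula, the F1,43 results correspond to all 45 participants and the F1,40 results to 42 participants. The sketch below computes only the F statistic, using hypothetical scores. SciPy's `scipy.stats.f_oneway` returns the same statistic together with its P value.

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a 1-way ANOVA on independent groups."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


passed = [1, 2, 3]  # hypothetical VAS scores, not study data
failed = [4, 5, 6]
print(one_way_anova(passed, failed))  # (13.5, 1, 4)
```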
RESULTS
Reliability
The ICC (3,1) and κ values for interrater reliability are presented in Table 2. The ICC (3,1) values of 0.939 for the SLST and 0.942 for the SDT demonstrated excellent interrater reliability between the primary investigator and orthopaedic surgeon in evaluating participants for the total number of deviations on each repetition. The κ values for the overall evaluation of passing or failing on each repetition of the SLST (0.933) and SDT (0.841) indicated excellent reliability. The κ values for the evaluation of the trunk, pelvis, hip, knee, and depth of squat showed moderate to excellent interrater reliability for both the SLST (0.603–0.831) and SDT (0.745–0.943).

Validity
Of the 45 people who participated in this study, 11 passed the SLST and 6 passed the SDT. The mean and standard deviation values of IR, VAS, HOS-ADLs, and HOS-SRAs for patients who passed and those who failed the SLST and SDT are presented in Table 3. One-way ANOVAs were conducted to compare participants who passed with those who failed the SLST and SDT (Table 4). The patients who passed and those who failed differed on the following measures: VAS for the SLST (F1,43 = 16.21, P < .001); VAS for the SDT (F1,43 = 13.41, P = .001); HOS-ADLs for the SLST (F1,40 = 5.15, P = .029); HOS-SRAs for the SLST (F1,40 = 7.48, P = .009); and HOS-SRAs for the SDT (F1,40 = 6.42, P = .015). No differences were found for the following measures: IR for the SLST (F1,43 = 0.63, P = .431); IR for the SDT (F1,43 = 0.14, P = .710); and HOS-ADLs for the SDT (F1,40 = 2.83, P = .101).


DISCUSSION
Our study offers evidence of reliability and validity for the SLST and SDT as measures of functional performance for patients with nonarthritic hip pain and dysfunction. These results confirmed our first hypothesis, demonstrating moderate to excellent interrater reliability between a certified athletic trainer and an orthopaedic surgeon in evaluating the SLST and SDT. Although both tests were reliable, agreement on the individual criteria (trunk, pelvis, hip, knee, and depth of squat) was greater for the SDT. The SDT was also more difficult to pass than the SLST for people with nonarthritic hip pain (6 versus 11 of 45 participants passed). Self-reported pain and physical function during SRAs differed between participants who passed and those who failed both the SLST and the SDT. However, self-reported physical function in ADLs differed only between those who passed and those who failed the SLST. Given the difficulty of the SDT and the absence of a significant difference in physical function during ADLs, the SDT could reflect higher-level functional performance than the SLST. Therefore, including both the SLST and SDT in a comprehensive clinical examination could be an effective way of evaluating limitations in the daily and sport-related function of patients with nonarthritic hip pain.1,4
Nonarthritic hip pain is typically evaluated through a combination of diagnostic imaging and a comprehensive clinical examination.11 Internal rotation is commonly measured during the examination for people with intra-articular hip conditions, and limited IR could affect performance on the SLST and SDT. However, our results demonstrated no difference in IR between those who passed and those who failed the SLST and SDT. Thus, our findings did not support the hypothesis that participants who passed the SLST and SDT would have greater passive IR than those who failed the tests. The diversity of hip conditions among our participants could explain why IR was not associated with performance on the SLST and SDT. Not all intra-articular conditions limit IR: our sample included participants with dysplasia and structural instability, and only 20 of the 45 participants were diagnosed with FAI.
Together with a thorough physical examination, a comprehensive clinical examination should include outcome measures that have been shown to be reliable and valid for characterizing a person's self-reported pain and physical function. All participants completed the VAS and HOS before they performed the SLST and SDT, which allowed us to relate these outcomes to each participant's success in passing the SLST and SDT. Participants who passed the SLST reported less pain (VAS) and greater functional ability in both their ADLs (HOS-ADLs) and their SRAs (HOS-SRAs) than those who failed. This confirms our hypothesis that participants who passed the SLST would report less pain and greater levels of physical function in their ADLs and SRAs.
The patients who passed and those who failed the SDT differed on the VAS and HOS-SRAs but not on the HOS-ADLs. Participants who passed the SDT reported less pain and greater functional ability in their SRAs than those who failed. Although the 2 groups' mean HOS-ADLs scores differed by 9.7 points, this difference was not significant (P = .101). Because most participants had difficulty performing the SDT, the test could reflect higher-level function in patients with nonarthritic hip pain and therefore be less sensitive to the lower-level function associated with ADLs.
Limitations of our study need to be considered when interpreting the results. Internal rotation was evaluated visually by the secondary investigator (the orthopaedic surgeon) during the comprehensive physical examination. Previous authors30 demonstrated no difference between an experienced orthopaedic surgeon's visual assessment of hip IR and goniometric measurements performed by 2 experienced physiotherapists. The orthopaedic surgeon had 11 years' experience at the time of our study, which suggests that IR was assessed accurately during the initial physical examination. Other passive range-of-motion measurements, including hip flexion, extension, abduction, and external rotation, could also have been evaluated for relationships with the functional performance tests. Caution should be exercised when generalizing the results of our study to other populations. Further studies are needed to confirm these results with multiple testers of different backgrounds (eg, physical therapist, primary care physician) and among participants with other lower extremity and hip disorders. In future studies, 3-dimensional motion-analysis technology could add quantitative information to validate the use of the SLST and SDT.
Deficiencies in neuromuscular control during dynamic weight-bearing activities have been shown to substantially alter functional movement patterns and increase the risk of musculoskeletal injury.31,32 Losses of strength, functional motion, and proprioception during weight-bearing activities combine to produce neuromuscular deficiencies that decrease the dynamic stability of the hip, pelvis, and trunk.14 In people with nonarthritic hip pain, such deficiencies should be evaluated during dynamic movements before a rehabilitation intervention or conservative treatment is prescribed.33 Both the SLST and SDT could be beneficial for evaluating and screening people reporting nonarthritic hip pain; however, these functional performance tests should not be used to identify specific impairments.
CONCLUSIONS
The SLST and SDT were used to assess patients with diagnoses of different intra-articular hip conditions. Interrater reliability for both tests was moderate to excellent. Self-reported pain and physical function during ADLs and SRAs differed between participants who passed the SLST and those who failed. Self-reported pain and physical function during SRAs differed between participants who passed the SDT and those who failed. We offer evidence for the SLST and SDT as reliable and valid functional performance tests in evaluating physical function among patients with nonarthritic hip pain.

Figure 1. The single-legged–squat test. A, Initial test position. B, Squat position.

Figure 2. The step-down test. A, Initial test position. B, Step-down position.