We conducted a cross-sectional study on 374 subjects – 196 (52.4%) with hemoglobin AA (HbAA), 97 (25.9%) with hemoglobin AS (HbAS) and 81 (21.7%) with hemoglobin SS (HbSS) pattern on cellulose acetate paper electrophoresis at pH 8.4. The subjects were recruited from the Departments of Pathology at two centers in India – Mahatma Gandhi Institute of Medical Sciences, Wardha (n = 216) and Indira Gandhi Government Medical College, Nagpur (n = 158). The characteristics of the subjects recruited at the Wardha center have been described previously. In this group of subjects ~69% were males and none of the studied blood parameters differed significantly by gender. At the Nagpur center, all the sickle cell subjects were males. Overall, our study sample was predominantly male (~82%). The two study centers were comparable with respect to the level of health care provided, the socioeconomic status of the subjects, the diagnostic protocol for sickle cell disease (including hemoglobin electrophoresis and measurement of peripheral blood indexes on an automated cell counter), the therapeutic protocol for sickle cell disease and the frequency of infectious disease observed in the sickle cell subjects (data not shown). There were 196 (52.4%) children (age < 12 years), 71 (19.0%) adolescents (age 12 – <18 years) and 107 (28.6%) adults (age ≥ 18 years) in our study sample. The study protocol was approved by the Ethical Committee of the Indira Gandhi Government Medical College, Nagpur and the institutional review board of the Lata Medical Research Foundation.
We used principal components factor analysis (PCF) to gain insights into the pathophysiology of sickle cell anemia. This analysis permits the representation of each variable as a linear combination of the latent factors as $z_{ij} = \sum_{k} b_{ik} f_{kj} + u_{i}$, where $z_{ij}$ represents the standardized value of the $i$th variable for the $j$th subject, $b_{ik}$ represents the factor loading of the $i$th variable on the $k$th factor, $f_{kj}$ represents the factor score of the $k$th factor for the $j$th subject and $u_{i}$ represents the uniqueness of the $i$th variable. For the purposes of our analyses, we used PCF on nine variables: red blood cell count (RBC), hemoglobin concentration (HB), packed cell volume (PCV), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (RDW), platelet count (PLT) and total leukocyte count (TLC).
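The decomposition can be illustrated with a minimal sketch. It runs on simulated data and is not the software used in the study. The function name `pcf_extract`, the choice of two retained factors and the synthetic nine-variable matrix are assumptions made only for this example. Loadings are taken as the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. The uniqueness is reported here as the share of each standardized variable's variance that the retained factors leave unexplained.

```python
import numpy as np

def pcf_extract(X, n_factors):
    """Principal components factoring of the correlation matrix.

    X is a (subjects x variables) array. Returns the loadings b
    (variables x factors) and, for each standardized variable, the
    share of variance not explained by the retained factors.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]   # largest eigenvalues first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    uniqueness = 1.0 - (loadings ** 2).sum(axis=1)  # 1 - communality
    return loadings, uniqueness

# Synthetic stand-in for nine blood parameters driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 9))
X = latent @ mixing + 0.5 * rng.normal(size=(200, 9))

loadings, uniqueness = pcf_extract(X, n_factors=2)
```

When all nine components are retained, the loadings reproduce the correlation matrix exactly and every uniqueness is zero. With fewer factors, each uniqueness lies between 0 and 1.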
Our overall analytical strategy was as follows: i) We used PCF only on the HbAA subjects. We took this step for two reasons: first, to understand the principal components derived from normal subjects and, second, to see whether this derived factor structure is altered in subjects carrying the sickle cell gene, in either the homozygous or the heterozygous state. The hemoglobin AA genotype represents the normal, non-mutated adult hemoglobin. Therefore, this group of subjects served as the reference group for deriving the principal components. The PCF solution in this group was obtained using a criterion of a minimum eigenvalue of 1. We then optimized the factor solution using varimax rotation. ii) Using the factor structure so obtained, we estimated the factor scores for all subjects, including the HbAS and HbSS subjects. In other words, we applied the factor structure of the HbAA subjects to all the subjects. iii) We then compared the factor scores for each identified factor across the HbAA, HbAS and HbSS groups as well as across the different age groups. Our decision to categorize age was based on the following reasons considered together: a) we wished to examine whether particular clinically relevant age groups influenced the factor structure, rather than whether age is a predictor of the factor scores; b) had age been used as a continuous variable, it would have constrained our analysis to an implicit assumption of a linear association between factor scores and age, whereas, as our age categorization indicates, we had no a priori reason for assuming this linear relationship; and c) we conducted analysis of variance using factor scores as the dependent variable and the sickle cell genotype and age as the predictors. The results of these analyses indicated (especially for the second factor score) that the model fit was better when age categories were used as the predictor. These results (data not shown) suggested that the assumption of linearity was not valid for our dataset. iv) We used analysis of variance (ANOVA) to understand the simultaneous and interacting influence of age and hemoglobin genotype on the identified factor scores.
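Steps i) to iv) can be sketched as follows on simulated data. This is an illustration under stated assumptions, not the analysis code of the study. All function names, the regression method of computing factor scores, the standardization of every subject by the HbAA means and standard deviations, and the test for interaction by comparing nested least-squares models are assumptions introduced for this example.

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (variables x factors)."""
    p, k = L.shape
    R, crit = np.eye(k), 0.0
    for _ in range(n_iter):
        Lr = L @ R
        U, s, Vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = U @ Vt
        if s.sum() < crit * (1 + tol):   # criterion stopped improving
            break
        crit = s.sum()
    return L @ R

def fit_reference(X_ref, min_eigenvalue=1.0):
    """Step i: PCF on the reference group, eigenvalue >= 1, then varimax."""
    mean, sd = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
    R = np.corrcoef(X_ref, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    loadings = varimax(eigvecs[:, keep] * np.sqrt(eigvals[keep]))
    weights = np.linalg.solve(R, loadings)   # regression-method scoring (assumed)
    return mean, sd, loadings, weights

def score(X, mean, sd, weights):
    """Step ii: apply the reference factor structure to any subjects."""
    return ((X - mean) / sd) @ weights

def dummies(labels):
    """Indicator columns for every level except the first."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)

def rss(D, y):
    """Residual sum of squares and rank of a least-squares fit."""
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    r = y - D @ beta
    return float(r @ r), np.linalg.matrix_rank(D)

def interaction_F(y, a, b):
    """Steps iii-iv: F statistic for the a-by-b interaction (nested models)."""
    n = len(y)
    A, B = dummies(a), dummies(b)
    AB = np.hstack([A[:, [i]] * B for i in range(A.shape[1])])
    ones = np.ones((n, 1))
    rss0, r0 = rss(np.hstack([ones, A, B]), y)        # main effects only
    rss1, r1 = rss(np.hstack([ones, A, B, AB]), y)    # with interaction
    return ((rss0 - rss1) / (r1 - r0)) / (rss1 / (n - r1))

# Simulated subjects: genotype, age group and nine blood parameters.
rng = np.random.default_rng(1)
n = 300
genotype = rng.choice(np.array(["AA", "AS", "SS"]), size=n)
age = rng.choice(np.array(["child", "adolescent", "adult"]), size=n)
latent = rng.normal(size=(n, 2))
latent[genotype == "SS", 0] -= 1.5                    # artificial group shift
X = latent @ rng.normal(size=(2, 9)) + 0.5 * rng.normal(size=(n, 9))

ref = genotype == "AA"
mean, sd, loadings, weights = fit_reference(X[ref])
scores = score(X, mean, sd, weights)
F = interaction_F(scores[:, 0], genotype, age)
```

By construction, the factor scores have zero mean, unit variance and no mutual correlation within the reference group. Departures from that pattern in the other groups are what the comparison in steps iii) and iv) would detect. Obtaining a p-value from `F` requires an F distribution routine, which is omitted here.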