R Tutorial: Partition LR Chi‐Square I x J (5 x 2) Contingency Table

An R Tutorial by W. Greg Alvord, National Cancer Institute.

We partition (decompose) a 5 x 2 contingency table with R. The data are from Alan Agresti’s Categorical Data Analysis (1st ed., 1990), Table 3.10, Problem 3.6, page 72.

Two hundred seventy-six (276) psychiatric patients were cross-classified by diagnosis, in one of five psychiatric groups: (1) Schizophrenia, (2) Affective Disorder, (3) Neurosis, (4) Personality Disorder, and (5) Special Symptoms, and by whether or not they were prescribed drugs in their treatment regimens.

We examine the relationship between the patients’ diagnostic class (Diagnosis) and whether or not drugs were prescribed (Drugs.Rx). To test for independence between Diagnosis and Drugs.Rx we use the Likelihood Ratio Chi-Squared statistic, also written LR χ2, LR X^2, or G2, rather than the Pearson statistic. Independent partitionings of χ2 have the property that their LR values and degrees of freedom are additive (Agresti, 1990, pp 50-51).

One way to compute the LR statistic is with the loglm() function from the MASS package (Venables and Ripley, 2002). First, load the MASS package. Next, perform a ‘global’ test of the hypothesis of independence (no association) between Diagnosis and Drugs.Rx. The null hypothesis states that Diagnosis and Drugs.Rx are independent.
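The original code listing is not reproduced in this text, so the following is only a minimal sketch of these two steps. The cell counts are an assumption: they are entered from memory of Agresti’s Table 3.10 and agree with the total of 276 and the percentages quoted in this tutorial, but they should be checked against the book. The object names drugs and fit.all are illustrative.

```r
library(MASS)  # provides loglm()

# Assumed counts (rows: diagnosis; columns: drugs prescribed, Yes / No).
drugs <- matrix(c(105,  8,
                   12,  2,
                   18, 19,
                   47, 52,
                    0, 13),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Diagnosis = c("Schizophrenia", "Affective.Disorder",
                                "Neurosis", "Personality.Disorder",
                                "Special.Symptoms"),
                  Drugs.Rx  = c("Yes", "No")))

# Independence model: main effects only, no Diagnosis-by-Drugs.Rx interaction.
fit.all <- loglm(~ Diagnosis + Drugs.Rx, data = drugs)
fit.all  # prints the likelihood ratio and Pearson statistics, df and p-values

# The LR statistic, its df, and the p-value can also be extracted directly.
c(LR = fit.all$lrt, df = fit.all$df)
pchisq(fit.all$lrt, fit.all$df, lower.tail = FALSE)
```

Printing the fitted object shows both the likelihood ratio and the Pearson statistic; only the likelihood ratio line is used in the partitioning.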

Reject the null hypothesis that Diagnosis and Drugs.Rx are independent; the LR X^2 value is 96.54 on 4 df, p << 0.0001. [Note: Ignore the Pearson X^2 value in these analyses.]

The loglinear analysis reveals a strong relationship between Diagnosis and Drugs.Rx. However, we wish to ascertain more specifically which diagnostic categories, or groupings of diagnostic categories, account for the relationship. We partition (decompose) the table in a statistically rigorous way to “describe similarities and differences among the diagnoses in terms of the relative frequencies of the prescribed drugs” (Agresti, page 72).

The decomposition partitions the contingency table and its corresponding Likelihood Ratio Chi-Square statistic, LR χ2, into independent (orthogonal), additive components (Agresti, pp 50-54). The advantage is that independent inferences can be drawn for each component of the partitioning. “A [correct] partitioning may show that an association primarily reflects differences between certain categories or groupings of categories” (Agresti, page 50). Rules for partitioning the table are provided in Agresti (page 53).

Search for sub-tables that might be homogeneous and can therefore be combined (collapsed). For example, identify two rows of this table (i.e., two psychiatric diagnostic groups) that appear to have comparable proportions (percentages) of cases classified as Yes (or, equivalently, as No). Prepare a table of row percentages that, for each diagnostic group, sum to 100% across the two categories of whether or not drugs were prescribed (Yes or No).
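A table of row percentages can be sketched with prop.table() from base R. As before, the counts are assumed (entered from memory of Agresti’s Table 3.10), and the names drugs and pct are illustrative.

```r
# Assumed counts (rows: diagnosis; columns: drugs prescribed, Yes / No).
drugs <- matrix(c(105, 8, 12, 2, 18, 19, 47, 52, 0, 13),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Diagnosis = c("Schizophrenia", "Affective.Disorder",
                                "Neurosis", "Personality.Disorder",
                                "Special.Symptoms"),
                  Drugs.Rx  = c("Yes", "No")))

# margin = 1 divides each cell by its row total, so every diagnostic
# group sums to 100% across Yes and No.
pct <- round(100 * prop.table(drugs, margin = 1), 1)
pct
```

Rows with similar Yes percentages are candidates for a homogeneous sub-table.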

For Neurosis, 48.6% of patients were prescribed drugs (Yes) while 51.4% were not (No). For Personality.Disorder, 47.5% were prescribed drugs while 52.5% were not. The percentages of patients who were prescribed drugs for Neurosis (48.6%) and Personality.Disorder (47.5%) appear to be comparable. From the original 5 x 2 table of observed frequencies, extract the following 2 x 2 sub-table.

Test for independence between Diagnosis and Drugs.Rx for these two diagnostic classes alone.
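One possible way to extract the sub-table and test it is sketched here. The counts are assumed as described earlier, and the names drugs, neur.pers and fit.np are illustrative.

```r
library(MASS)  # provides loglm()

# Assumed counts (rows: diagnosis; columns: drugs prescribed, Yes / No).
drugs <- matrix(c(105, 8, 12, 2, 18, 19, 47, 52, 0, 13),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Diagnosis = c("Schizophrenia", "Affective.Disorder",
                                "Neurosis", "Personality.Disorder",
                                "Special.Symptoms"),
                  Drugs.Rx  = c("Yes", "No")))

# Keep only the two rows of interest; the result is a 2 x 2 table.
neur.pers <- drugs[c("Neurosis", "Personality.Disorder"), ]
neur.pers

# Test independence within this sub-table alone.
fit.np <- loglm(~ Diagnosis + Drugs.Rx, data = neur.pers)
c(LR = fit.np$lrt, df = fit.np$df,
  p  = pchisq(fit.np$lrt, fit.np$df, lower.tail = FALSE))
```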

Do not reject the null hypothesis of independence: LR X^2 = 0.015 on 1 df, p = 0.90.

Now continue the search for other homogeneous patterns in the original 5 x 2 table. Extract the 2 x 2 sub-table considering only those cases associated with Schizophrenia and Affective.Disorder and test for independence.
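The same steps can be sketched for the second sub-table. The counts are again assumed rather than taken from the original listing, and schiz.aff and fit.sa are illustrative names.

```r
library(MASS)  # provides loglm()

# Assumed counts (rows: diagnosis; columns: drugs prescribed, Yes / No).
drugs <- matrix(c(105, 8, 12, 2, 18, 19, 47, 52, 0, 13),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Diagnosis = c("Schizophrenia", "Affective.Disorder",
                                "Neurosis", "Personality.Disorder",
                                "Special.Symptoms"),
                  Drugs.Rx  = c("Yes", "No")))

# 2 x 2 sub-table for Schizophrenia and Affective.Disorder only.
schiz.aff <- drugs[c("Schizophrenia", "Affective.Disorder"), ]

fit.sa <- loglm(~ Diagnosis + Drugs.Rx, data = schiz.aff)
c(LR = fit.sa$lrt, df = fit.sa$df,
  p  = pchisq(fit.sa$lrt, fit.sa$df, lower.tail = FALSE))
```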

LR X^2 = 0.75 on 1 df, p = 0.39. Do not reject the null.

We have identified two 2 x 2 sub-tables from the original that are homogeneous, one for Neurosis and Personality.Disorder and one for Schizophrenia and Affective.Disorder. When this occurs, the two rows of the sub-table can be combined or ‘collapsed’, i.e., their counts summed into a single row, without loss of information. The original 5 x 2 table can now be collapsed (combined) into a 3 x 2 table.

The observed counts for Schizophrenia and Affective.Disorder are combined into a single category now labeled Schiz.or.Aff.Dis. Similarly, the observed counts for Neurosis and Personality.Disorder are combined into a single category now labeled Neur.or.Pers.Dis. Since the counts associated with Special.Symptoms have not been used in a previous sub-table, they are retained in the table here.
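The collapsing step might be carried out as in the sketch below, which sums the rows of each homogeneous pair with colSums(). The counts are assumed as described earlier; the row labels follow the text, while the object names drugs and collapsed are illustrative.

```r
# Assumed counts (rows: diagnosis; columns: drugs prescribed, Yes / No).
drugs <- matrix(c(105, 8, 12, 2, 18, 19, 47, 52, 0, 13),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Diagnosis = c("Schizophrenia", "Affective.Disorder",
                                "Neurosis", "Personality.Disorder",
                                "Special.Symptoms"),
                  Drugs.Rx  = c("Yes", "No")))

# Sum each homogeneous pair of rows; carry Special.Symptoms over unchanged.
collapsed <- rbind(
  Schiz.or.Aff.Dis = colSums(drugs[c("Schizophrenia", "Affective.Disorder"), ]),
  Neur.or.Pers.Dis = colSums(drugs[c("Neurosis", "Personality.Disorder"), ]),
  Special.Symptoms = drugs["Special.Symptoms", ])
names(dimnames(collapsed)) <- c("Diagnosis", "Drugs.Rx")
collapsed
```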

Test for independence between Diagnosis and Drugs.Rx in the combined table. Actually, we are less concerned with independence here; we compute the LR X^2 statistic to complete the steps for the partitioning.
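A sketch of this test follows. The collapsed counts are entered directly and follow from the assumed cell counts used throughout (not from the original listing); collapsed and fit.col are illustrative names.

```r
library(MASS)  # provides loglm()

# Collapsed 3 x 2 table, derived from the assumed original counts.
collapsed <- matrix(c(117, 10,
                       65, 71,
                        0, 13),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(
                      Diagnosis = c("Schiz.or.Aff.Dis", "Neur.or.Pers.Dis",
                                    "Special.Symptoms"),
                      Drugs.Rx  = c("Yes", "No")))

fit.col <- loglm(~ Diagnosis + Drugs.Rx, data = collapsed)
c(LR = fit.col$lrt, df = fit.col$df,
  p  = pchisq(fit.col$lrt, fit.col$df, lower.tail = FALSE))
```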

Reject the null hypothesis of independence: LR X^2 = 95.77 on 2 df, p << 0.0001.

Summarizing to this point: (1) with respect to the original table, Diagnosis and Drugs.Rx are not independent (not homogeneous); (2) patients diagnosed with either Schizophrenia or Affective Disorder are homogeneous; (3) patients diagnosed with Neurosis or Personality Disorder are homogeneous; (4) from the combined (collapsed) table, Diagnosis and Drugs.Rx are not homogeneous.

When the partitioning is performed correctly, the LR X^2 values of the sub-tables sum, exactly, to the LR X^2 value for the original table (Agresti, 1990, pp. 50-51). Similarly, the degrees of freedom associated with each test sum to the degrees of freedom associated with the test from the original table.

Add the three LR X^2 values associated with the three sub-tables, 0.015 + 0.75 + 95.77, and compare the total with the LR X^2 value for the original 5 x 2 table, 96.54 on 4 degrees of freedom. To the rounding shown, they are equal. Also, the degrees of freedom for the components are, respectively, 1, 1, and 2, which sum to the 4 degrees of freedom associated with the original table.
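The additivity check can be sketched in a few lines. The counts are assumed as described earlier, and lr() is a hypothetical helper written for this illustration, not a function from MASS. In the formula ~ 1 + 2, the numerals refer to the first and second dimensions of the table.

```r
library(MASS)  # provides loglm()

# Assumed counts (rows: diagnosis; columns: drugs prescribed, Yes / No).
drugs <- matrix(c(105, 8, 12, 2, 18, 19, 47, 52, 0, 13),
                ncol = 2, byrow = TRUE,
                dimnames = list(
                  Diagnosis = c("Schizophrenia", "Affective.Disorder",
                                "Neurosis", "Personality.Disorder",
                                "Special.Symptoms"),
                  Drugs.Rx  = c("Yes", "No")))

# Hypothetical helper: LR statistic and df for independence in a two-way table.
lr <- function(tab) {
  fit <- loglm(~ 1 + 2, data = tab)
  c(LR = fit$lrt, df = fit$df)
}

schiz.aff <- drugs[c("Schizophrenia", "Affective.Disorder"), ]
neur.pers <- drugs[c("Neurosis", "Personality.Disorder"), ]
collapsed <- rbind(Schiz.or.Aff.Dis = colSums(schiz.aff),
                   Neur.or.Pers.Dis = colSums(neur.pers),
                   Special.Symptoms = drugs["Special.Symptoms", ])

# One column per component of the partition; rows are LR and df.
parts <- sapply(list(schiz.aff = schiz.aff,
                     neur.pers = neur.pers,
                     collapsed = collapsed), lr)
total <- lr(drugs)

parts
rowSums(parts)  # component sums of LR and df
total           # LR and df for the original 5 x 2 table
all.equal(sum(parts["LR", ]), total[["LR"]])
```

Comparing unrounded values avoids the small discrepancy that arises when the rounded components are added by hand.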

Summary and Interpretation

Psychiatric patients were relatively more or less likely to be prescribed drugs depending on their respective diagnoses. Patients diagnosed with Schizophrenia or Affective Disorder were more likely to be prescribed drugs than not (92% vs. 8%). Patients diagnosed with Neurosis or Personality Disorder were about equally likely to be prescribed drugs or not (48% vs. 52%). Patients with Special Symptoms were not likely to be prescribed drugs; in fact, no drugs were prescribed for these patients in this sample (0.0% vs. 100.0%).

References

Agresti A (1990) Categorical Data Analysis, 1st ed. Wiley, New York.

R Core Team (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/

Venables WN & Ripley BD (2002) Modern Applied Statistics with S, 4th ed. Springer, New York. ISBN 0-387-95457-0

Alvord WG, “Partition (Decompose) a 5 x 2 Contingency Table using R” provides a more thorough (and verbose) analysis of the problem considered here:
http://css.ncifcrf.gov/services/alvord/PartitionDecompose5x2ContingencyTableWithR.pdf

Alvord WG, “R Tutorial: Partition LR Chi-Square I x J (5 x 2) Contingency Table” provides a PDF document for this problem that is similar to this HTML document: http://css.ncifcrf.gov/services/alvord/R_Tutorial_Partition_LR_Chi-Square_IxJ_Contingency_Table.pdf