University of Pittsburgh

May 18, 2000

Census data on race is flawed, experts say

Although the 2000 U.S. census is expected to collect more information than any census in the past on the racial make-up of the U.S. population, that data is fraught with flaws in its accuracy and completeness, according to three experts.

For the first time, U.S. residents are able to identify themselves as members of more than one race on the census form. Racial category questions in one form or another have been included in every census since the first one in 1790.

One of two forms was sent to every U.S. household earlier this year: a short form, with six questions per person in the household (plus one question on property ownership per household), and a long form with 34 questions, which went to about 17 percent of the households nationwide. Both forms included two race-related questions per person: "Is this person Spanish/Hispanic/Latino?" and "What is this person's race?"

"First of all, race is not a scientific entity in its own right," said Gary M. Marsh, professor of biostatistics at Pitt's Graduate School of Public Health (GSPH). "It is a social and cultural construct with no biological basis. It acts as a surrogate for other things."

However, race nearly always is used in evaluating the etiology, risk factors, occurrence rates, diagnosis and prevention of diseases, Marsh said. "In combination with age and gender, race is always a factor in health studies. It is also used in health policy. Race information, despite its weakness as a concept, is used for federal programs, especially those targeted toward under-served minority populations."

Marsh and Stephen E. Fienberg, Maurice Falk University Professor of Statistics and Social Science and acting director of the Center for Automated Learning and Discovery at Carnegie Mellon University, were panelists at the second annual GSPH school-wide symposium titled "Census 2000 … Scientifically or Politically Correct."

Judith R. Lave, professor of health economics at GSPH, moderated the May 5 panel discussion, which also addressed the merits of population sampling as a tool to improve census data and to counterbalance under-reporting.

Fienberg said that dividing racial information into two questions on the 2000 census form was a political decision, not a scientific one. "Most scientists argued for a combined question," said Fienberg, who has served in a number of professional statistics organizations. "In essence, the choice of race questions has always been politically motivated. We've been counting by race, in effect, since 1790, when slaves were counted as three-fifths of a person."

On the 1990 census form, the question on race preceded the question on Hispanic heritage; in 2000 the order is reversed. "Why? Tests going back to 1980 showed that a large number of people checked 'other,' and then wrote in Hispanic, before going to the next question," Fienberg said. "And that if they could check Hispanic first, they were more willing to then check a category other than 'other' in the second question. It seems 'Hispanic' is not a race by the census standards, but it is for everybody else."

Marsh pointed out that no more than two consecutive census forms, which are distributed every 10 years, have had the same racial categories, meaning that comparing data from the past has built-in statistical limitations. "Japanese has been in some times, out in others. Korean is another example."

Marsh said that comparing census data on race with other known data also is imperfect. "Death rates, for example, are taken from administrative records and direct observations. A funeral director puts race down on a death certificate based on observation and maybe by consulting a relative. We've found that this leads to a bias toward larger group categories and a misclassification of minorities. This information versus the self-reporting information of the census mixes two sources of data."

Featured speaker Nancy M. Gordon, associate director for demographic programs at the U.S. Census Bureau, provided background for the questions on race and described how the data are used.

Given the option to specify two or more races, there are 63 possible categories derived from combinations of the six major categories (white; black or African American; American Indian and Alaska native; Asian; native Hawaiian and other Pacific Islander; and other), Gordon said: six single-race categories plus 57 combinations of two or more races. Four main statistical tables are derived from the racial data: total population by race; non-Hispanic population by race; total population by race for those 18 years and older (the voting-age population); and non-Hispanic population by race for those 18 and older.
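The figure of 63 follows from simple counting: each of the six major categories can be either marked or left blank, giving 2^6 = 64 patterns, minus the one pattern with nothing marked. The short Python sketch below enumerates the non-empty combinations to check that arithmetic. It assumes only the six category labels named in the article and is an illustration, not Census Bureau tabulation code.

```python
from itertools import combinations

# The six major race categories as named in the article (labels assumed, not official wording).
CATEGORIES = [
    "white",
    "black or African American",
    "American Indian and Alaska native",
    "Asian",
    "native Hawaiian and other Pacific Islander",
    "other",
]

def race_combinations(categories):
    """Return every non-empty combination of the given categories."""
    combos = []
    for k in range(1, len(categories) + 1):
        combos.extend(combinations(categories, k))
    return combos

combos = race_combinations(CATEGORIES)
single = [c for c in combos if len(c) == 1]
multiple = [c for c in combos if len(c) >= 2]
print(len(combos), len(single), len(multiple))  # 63 6 57
```

Of the 63 categories, six are single-race responses and 57 are combinations of two or more races, which matches 2^6 - 1 = 63.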

Gordon said that, among other things, the data are used for civil rights monitoring issues. "If you check white plus one minority, you are counted as the minority for the purposes of civil rights issues only. If you check two minority races, and if civil rights complaints are in either one of these, you are counted in the complaint category," she said.

Gordon also acknowledged certain limitations in the data, but said the Census Bureau was confident the 2000 data would be more accurate than in the past. She said the bureau is taking a number of steps to improve the comparability of the race data with past census statistics.

"We sent 10,000 of the 1990 forms out this year, to give us a kind of global-level comparison of how 1990 questions are answered in the 2000 context. Second, we will do a stratified random sample of a number of households reporting residents of more than one race, plus a single-race group for control. We'll go back to that sample and ask who filled out the form in the household and, regarding race, if you have to pick only one, which one would it be, to give us an analysis on how one-race reporting relates to two or more. Third, we'll take 2000 data and match it with other completed [non-census] survey data where respondents could not choose more than one race, to develop an algorithm that might be used as a bridge to past data."

According to Fienberg, there are two types of errors in recording census data: errors of omission and errors of commission. While the Census Bureau said that "actual enumeration" counted 98.4 percent of the population in 1990, that figure accounted only for the estimated net omissions of 1.6 percent. "It's probably true that 1 in 10 were improperly counted. Either they're counted multiple times, or counted from the wrong place or, according to the census, they don't exist at all," he said, distinguishing these errors from the population that was contacted but did not respond. "These errors are not spread uniformly across the population. Blacks are always under-counted," Fienberg said. Selective Service statistics in the 1940s, for example, revealed a much higher count of blacks than census figures indicated, he said.

"The net [number of errors] has gone down, but the differential between whites and blacks has remained constant. Here's where sampling comes in. For those who say we have to try to count everybody, I say, 'We did, we do, and it's failed.'"

He said he was not aware of a single credible statistical organization that did not agree that population sampling yielded a more accurate count, despite an ongoing battle in Congress about its merits.

Marsh pointed out that the U.S. Supreme Court ruled in January 1999 that only actual counts, not figures adjusted by statistical sampling, could be used to apportion seats in the House of Representatives among the states. However, the court ruled that sampling-adjusted numbers were permissible for all other uses of census data.

Gordon added that the Census Bureau is required by law to report all short-form data both as actual counts and as numbers adjusted by sampling, if, in the court's words, "it's feasible."

Gordon said the Census Bureau hopes to phase out the long form entirely by 2010 if Congress funds a new program called the American Community Survey, which would gather information year-round beginning in 2003, the year long-form data tables from this year's census are expected to be released.

As of the end of April, Gordon said, the total response rate was 65 percent, with a higher than expected rate for the short forms and lower than expected for the long forms. She said the long forms, although pared to include only government- and court-ordered questions, had a much lower response rate than in 1990, putting a larger onus on the follow-up census workers, who contact non-respondents in person.

The census will generate some 1.5 billion pieces of paper, Gordon said. About $4.5 billion will be spent this year and another $400 million next year; the total cost likely will reach $7 billion. Nearly a million people will be hired by the bureau, mostly for following up with non-responding households. Census information will be posted on the Internet as it becomes available, she added.

Fienberg said, "I need to remind you that no new follow-up methods have been developed, so there is no reason to expect that they will find the missing people, let alone the missing households. We could see an increase in overall erroneous data."

The GSPH census symposium was co-sponsored by Pitt's Departments of Biostatistics and Health Services Administration.

–Peter Hart

