NCS-R: Answers to Frequently Asked Questions

Q: I am thinking of running some comparisons between the NCS and NCS-R and am wondering if I could have a copy of the severity rating index variable used in your group's NCS vs. NCS-R comparisons of service use (e.g., the NEJM report).

We did not have a severity variable in the NCS. Instead, we used a simulation model that generated multiply imputed predicted probabilities of each value on the severity distribution for the NCS data. We don't have user-friendly documentation of this simulation and we're not set up to provide consultation on the statistical methods required to implement this approach. As a result, we are not making the simulation programs available in our public data release. However, it is possible to develop the same logic based on the information provided in the NEJM paper. If you can't figure out how to implement the approach based on that information, you should not be trying to work with this method.

Q: Do you know the total US population on which the NCS-R is based?

In making our population projections, we have used a population estimate of 209,128,094 people. This is the number of people ages 18+ in the US in 2001 based on published Census data. The majority of all respondents were interviewed in 2001 and therefore this is the best population we can use for our projections. We did not pull out homeless or institutionalized people or people who don't speak English, all of whom were excluded from our sample. These people probably make up about 5% of the population. We don't want to reify our rough estimate of population size, though, so you should feel free to use another estimate. For example, you might want to average the Census population estimates over the years of the survey and/or adjust for the exclusion of the non-household population.
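As a rough illustration of the arithmetic behind such a projection, the sketch below multiplies a prevalence proportion by the population figure quoted above and optionally scales the population down by the roughly 5% that the sample did not cover. The function name, the 10% prevalence value, and the way the exclusion is applied are assumptions made for the example; they are not NCS-R results or the procedure used in any published paper.

```python
# Population figure quoted in the answer above: US residents aged 18+ in 2001.
US_ADULT_POPULATION_2001 = 209_128_094


def project_count(prevalence, population=US_ADULT_POPULATION_2001, excluded_share=0.0):
    """Project a head count from a prevalence proportion.

    excluded_share is an assumed fraction of the population that falls outside
    the sampling frame (homeless, institutionalized, non-English speakers).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    if not 0.0 <= excluded_share < 1.0:
        raise ValueError("excluded_share must be in [0, 1)")
    covered_population = population * (1.0 - excluded_share)
    return round(prevalence * covered_population)


# Hypothetical 10% prevalence, without and with the rough 5% exclusion.
unadjusted = project_count(0.10)
adjusted = project_count(0.10, excluded_share=0.05)
print(unadjusted, adjusted)
```

Swapping in a different population estimate, for example an average over the survey years, only requires passing another value for the population argument.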

Q: What population was used to post-stratify the sample?

We used the best data available at the time of our weighting. Originally we used the year 2000 Census data at the level of both the Block Group (BG) and the Census Tract (CT). Prediction equations were estimated to compare the BG and CT information of survey respondents to that of non-respondents. Information about significant differences was used to develop a weighting equation that adjusted for differences between respondent and non-respondent characteristics. The Census information did not exclude people who do not speak English even though the sample excludes such people. We needed the 2000 Census to obtain the block group and census data, but better data were available for the post-stratification. A final post-stratification weight was created to adjust for the variation between the joint distribution of several socio-demographic variables in this weighted sample compared to the March 2002 Current Population Survey (CPS) data. For more details go to our website for an overview paper on the survey design:

Kessler RC, Berglund P, Chiu WT, Demler O, Heeringa S, Hiripi E, Jin R, Pennell BE, Walters EE, Zaslavsky A, Zheng H. The US National Comorbidity Survey Replication (NCS-R): design and field procedures. Int J Methods Psychiatr Res. 2004;13:69-92.    

Q: I am working with the NCS-R data and Obsessive Compulsive Disorder is not on the public access data. Is there any way I can get these diagnoses?

There was a problem with the skip logic for OCD which caused the disorder to be underestimated in the CIDI. Therefore we are no longer using this disorder in our papers and did not release the data.

Q: I was wondering if you know approximately when the NCS-2 and the NCS-A datasets will be publicly released?

We put aside the NCS-2 and NCS-A data cleaning work in order to concentrate on the preparation of a public release version of the NCS-R dataset. Now that the latter has been prepared, we're turning our attention to NCS-2 and NCS-A. We have not yet advanced far enough in our data cleaning and coding to have even tentative release dates, but we will post such dates as soon as our work with these data has progressed sufficiently for us to feel that realistic estimates of release dates can be provided.

Update (October 2011): SAMHDA is making the NCS-2 data publicly available and they have said that they can make restricted copies of the dataset available prior to public release.  Interested parties will need to apply for restricted release; for more information please click here.

Q: I am researching prevalence rates for depressive episodes over the past 12 months using the NCS-R data. I am using only employed individuals and cross-tabulating the data by age, region and sex (n=6314). I have weighted the variables and am coming up with a depression rate of 7.9%. This seems very high especially when compared to your paper on prevalence and severity of 12-month DSM-IV disorders from Arch Gen Psychiatry 2005;62:617-627. I have attached my cross tabulations and was wondering if you could give me some feedback on where I may be miscalculating the data.

We get many questions of this sort. We do not have the staff resources needed to reproduce all these analyses with our own dataset or to work our way through the code of other researchers to discover errors. It's consequently important for users to do as much detective work as they can before contacting us with such questions. Questions to ask yourself are: (i) Are any memos posted on the NCS web site about changes in diagnostic codes that might account for such discrepancies? We changed the codes for bipolar disorder, for example, shortly after publishing the paper cited in the question, and this could be involved in some cases. (ii) Are there indications in the published NCS-R reports about sub-sample differences inconsistent with the ones found? For example, if total population prevalence was less than 7.9%, was there a suggestion in an NCS-R report that prevalence would be lower rather than higher in the sub-sample under consideration? (iii) Can you reproduce the total-sample prevalence estimate in the cited report? If not, then that's the problem to focus on before worrying about sub-sample results. (iv) If you can reproduce the total-sample prevalence estimate, then do other aspects of sub-sample results (e.g., among the retired, homemakers, etc.) look suspicious? (v) Be thoughtful about the diagnoses you're studying. In the present case, are you referring to MDD or MDE? If the latter, 12-month prevalence is likely to be about 1-1.5% higher.

Follow-up: The researcher got back to us in response to our answer to the original question and reported that the problem was resolved. We asked for an explanation and here it is: We were using the Major Depressive Episode (MDE) variable. When I reran the data using the Major Depressive Disorder (MDD) variable our findings were much more in line with your findings in the Arch Gen Psychiatry 2005;62:617-627 article.
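The checks suggested above (reproduce the total-sample estimate first, then look at the sub-sample, and compare MDE with MDD) can be sketched as a small weighted-prevalence helper. This is only an illustration on toy records: the field names weight, mde, mdd, and employed are hypothetical stand-ins and are not the variable names in the public-use file. It also produces point estimates only; standard errors for a complex sample design need design-based methods that are not shown here.

```python
def weighted_prevalence(records, flag, weight="weight", subset=None):
    """Weighted proportion of records where `flag` equals 1.

    subset is an optional function that returns True for records to keep.
    """
    selected = [r for r in records if subset is None or subset(r)]
    total_weight = sum(r[weight] for r in selected)
    if total_weight == 0:
        raise ValueError("no weighted cases in the selected group")
    positive_weight = sum(r[weight] for r in selected if r[flag] == 1)
    return positive_weight / total_weight


# Toy records; every field name here is a hypothetical placeholder.
records = [
    {"weight": 1.0, "mde": 1, "mdd": 1, "employed": 1},
    {"weight": 2.0, "mde": 1, "mdd": 0, "employed": 1},
    {"weight": 3.0, "mde": 0, "mdd": 0, "employed": 1},
    {"weight": 4.0, "mde": 0, "mdd": 0, "employed": 0},
]

# Step 1: total-sample estimates for both diagnosis variables.
total_mde = weighted_prevalence(records, "mde")
total_mdd = weighted_prevalence(records, "mdd")

# Step 2: the same estimate restricted to the sub-sample of interest.
employed_mde = weighted_prevalence(records, "mde", subset=lambda r: r["employed"] == 1)

print(total_mde, total_mdd, employed_mde)
```

Running both diagnosis variables through the same function makes it easy to see when a discrepancy comes from the choice of variable rather than from the weighting or the sub-sample definition.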

Q: We are interested in pursuing some projects using the PEA variables in the NCS-R.  I'm writing to inquire about the origin of these variables (e.g., were they adapted from an existing, validated scale/measure?), and their intended correspondence to DSM-IV Personality Disorder criteria. 

The questions in the PEA section include nine items from the social desirability scale of the Zuckerman Personality Scales and a subset of the screening questions from the screening scale developed in conjunction with the International Personality Disorder Examination (IPDE). The Zuckerman items were included to facilitate the study of social desirability response bias in the survey. The IPDE screening questions were included as a screening scale for a small clinical reappraisal study of PDs that was carried out in a probability sub-sample of NCS-R respondents. This reappraisal study administered the full IPDE. The method of multiple imputation (MI) was used to generate predicted probabilities of DSM-IV diagnoses of Clusters A, B, C, and any PDs (including NOS) as well as diagnoses of antisocial PD and borderline PD. The latter two were the only specific PDs included in the MI analysis because we had a special interest in them and we included the full set of IPDE screening questions for those two but only a subset of screening questions for other PDs. A paper reporting the results of the analysis, written by Mark Lenzenweger et al., is scheduled for publication in Biological Psychiatry in late 2006. That paper describes the MI procedure and also reports results for the IPDE scales in the clinical reappraisal sub-sample. We’re currently preparing the MI recodes of the screening scale data for inclusion in the public use dataset and will release this data file along with a series of appendix tables at the same time the Lenzenweger et al. paper is published. You will need to read up on the use of MI methods to figure out how to work with this kind of data file. You can find out about MI by Googling the term “Multiple imputation.” References to the Zuckerman scales and to the IPDE are below.

Zuckerman Personality Scales: Zuckerman M, Psychology of Personality (Cambridge University Press: Cambridge, 1991); Zuckerman M, Behavioral expressions and biosocial bases of personality (Cambridge University Press: New York, 1994); Zuckerman M, Link K, Construct validity for the sensation-seeking scale, J Consult Clin Psychol (1968), 32:420-6; Zuckerman M, Bone RN, Mangelsdorff D, Brustman B, What is the sensation seeker? Personality trait and experience correlates of the sensation-seeking scales, J Consult Clin Psychol (1972), 39:308-21; Zuckerman M, Eysenck S, Eysenck HJ, Sensation seeking in England and America: Cross-cultural, age, and sex comparisons, J Consult Clin Psychol (1978), 46:139-49; Zuckerman M, Kuhlman DM, Personality and risk-taking: Common biosocial factors, J Pers (2000), 68:999-1029.

IPDE: Loranger AW, Sartorius N, Andreoli A, Berger P, Buchheim P, Channabasavanna SM, Coid B, Dahl A, Diekstra RFW, Ferguson B, Jacobsberg LB, Mombour W, Pull C, Ono Y, Regier D, The International Personality Disorder Examination (IPDE): The World Health Organization/Alcohol, Drug Abuse, and Mental Health Administration International Pilot Study of Personality Disorders. Arch Gen Psychiatry (1994) 51:215-24. Loranger AW, Sartorius N, Janca A, Assessment and Diagnosis of Personality Disorders: The International Personality Disorder Examination (IPDE) (Cambridge University Press: New York, 1996).
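For readers new to the MI methods mentioned in the answer above, the sketch below shows the standard way results from several imputed datasets are usually combined, commonly called Rubin's rules: average the point estimates, then add the average within-imputation variance to an inflated between-imputation variance. It is a generic illustration, not the specific procedure described in the Lenzenweger et al. paper, and the function name and example numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules.

    estimates : one point estimate per imputed dataset
    variances : the squared standard error of each of those estimates
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)                # average of the point estimates
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # sample variance across imputations
    total = within + (1 + 1 / m) * between  # total variance of the pooled estimate
    return pooled, sqrt(total)


# Made-up prevalence estimates from three hypothetical imputed datasets.
pooled, se = pool_estimates([0.10, 0.12, 0.11], [0.0004, 0.0004, 0.0004])
print(pooled, se)
```

The same analysis is run once per imputed dataset, and only the resulting estimates and variances are passed to the pooling step.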

Q: Hello Dr. X (A COLLABORATOR WHO WAS THE FIRST AUTHOR ON AN UNDER-REVIEW NCS-R PAPER): I work with the (X) Research Group in (X). We've been corresponding a bit with Dr. Kessler's group about pursuing projects with the (X) variables in the NCS-R. I understand you have a related paper on this topic. I was wanting to get in touch with you to inquire about what you've already investigated with these variables, so as to avoid duplication of efforts. Do you by chance have a preprint of this publication available? Also, I believe multiple imputation methods are required in order for these variables to be interpreted as equivalent to DSM diagnoses. I've read a bit about multiple imputation, and we have strong statistical support here, but any details or information you can provide regarding this process would be greatly appreciated. And finally, I'm wondering which route your group went with respect to incorporating the (X) criterion. Thanks in advance for any info. I really look forward to hearing from you.

We cannot send out papers until they are posted on the NCS website. Otherwise we would be overwhelmed by the large number of people who write to ask whether we have started work on various topics and, if so, whether we could tell them what we are doing in order to avoid overlap. Although we are working on a number of areas, it would go well beyond our limited personnel capacity to describe our line of thinking or to send out preliminary copies of tables or papers. With regard to MI: We explain the MI procedures in each paper that uses this method and we are preparing to post all individual-level MI values for people to use in secondary analysis. However, we cannot give consultation on the process of doing this kind of analysis. Our general policy is not to give data analysis consultation. We can answer questions about problems with the data, but that's usually the limit of what we're able to do.

Q: What criteria were used for Bipolar 2 Subsyndromal Diagnosis?

Please see the "Diagnosis" link on the NCS-R website and proceed to the NCS-R diagnostic documents that describe the algorithms used for the diagnostic variables. From there you will need to drill down to the correct NCS-R diagnostic zip file with diagnostic variable descriptions.

Q: I have been working with baseline NCS data file. My area of interest is psychosis. I noticed, however, that the psychosis diagnosis variable(s) is not in the NCS-R. Would it be possible to get access to the non-affective psychosis variable as described in Kessler et al (2005) to merge with the NCS-R data file? The full reference is:
The Prevalence and Correlates of Nonaffective Psychosis in the National Comorbidity Survey Replication ( NCS-R). Biological Psychiatry, Volume 58, Issue 8, Pages 668-676, R. Kessler, H. Birnbaum, O. Demler, I. Falloon, E. Gagnon, M. Guyer, M. Howes, K. Kendler, L. Shi, E. Walters

Our evaluation of the NCS-R NAP variable, as described in the paper you cited, shows that it's not sufficiently robust to be used in analysis. That's why we did not release it in the public use data file. We're unable to make it available to you for the same reason. We're sorry, but we're concerned that the variable could do more harm than good.

Q: I was at the recent NCS-R workshop in Ann Arbor. One of the analyses we suggested we would like to undertake was to look at the influence of neighborhood on disorder prevalence. This would entail geocoding each subject (ideally to a census tract level) and then exploring the influence of different neighborhood variables (eg SES) after accounting for individual level variables (such as income). I have recently been an author on a similar paper which found that, indeed, neighborhood SES did influence the incidence of depression, even after accounting for individual level factors. So we could do this analysis relatively simply. We would also be interested in rural/urban comparisons. To do so, however, we need additional NCS-R data to be released that will enable us to have a finer spatial identifier. I also wonder if there is some information recorded by the interviewer on housing conditions that may be useful. I remember speaking to someone at the workshop about getting these data, but it was suggested that I wait a month or two before progressing this request. Can you advise me of how I should now follow this up?

You will need to make this request through the SAMHDA Disclosure Committee, which makes decisions regarding release of the restricted data through ICPSR. The best way to do this would be to email the help line at ICPSR and ask for information regarding making a proposal to the SAMHDA Disclosure Committee for the release of restricted data.

Q: I was hoping that you will be able to help me with a query regarding the NCS-R, in particular the PTSD section.

I have been trying to use a subset of cases, those with a diagnosis of PTSD (either 12-month or lifetime). I have been selecting those cases based on the diagnosis variables (dsm_pts or d_pts12). However, I have found that data are missing on all the symptom variables (pt68 to pt106) for a large number of cases who have a positive diagnosis. For example there are 604 cases with a diagnosis of lifetime PTSD, but there are only about 434 cases with full information on all the symptom variables (pt68 to pt106). Indeed the majority of cases with incomplete data have missing data on all the symptom variables. The NCS-R literature shows that the diagnosis is based largely on these responses. My query is how these cases generated the positive diagnosis. I'd be grateful for any help as I would like to treat the missing data in the most appropriate fashion.

Please go to this link: for a description of the criteria used for each diagnosis.  Also please check the PTSD interview schedule.  You will see that people with only one event reported skip to pt 118 and are asked about this one event (this is the Random Event section).  For people who have more than one event we ask what they consider to be their worst event and then we select a random event to ask about.  If the worst event and the random event are the same occurrence of the same event they are skipped to pt121a and asked about it as a random event. Therefore the only people asked pt68-pt106 are those that had multiple events or multiple occurrences of one event and the randomly selected event was different from the worst event.  We evaluate both events and if they meet full criteria for the worst event or they meet full criteria for the random event then they meet full criteria for PTSD.
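The rule in the last sentence above can be written as a simple either/or check, which also shows why a respondent can have a positive diagnosis while every item in a skipped symptom series is missing. This is a minimal sketch only: the function and argument names are hypothetical, and the actual criteria are those in the diagnostic documentation.

```python
def meets_ptsd(worst_event_full_criteria, random_event_full_criteria):
    """Return True if full criteria are met for either evaluated event.

    Each argument is True, False, or None. None stands for an event whose
    symptom questions were skipped, so it cannot contribute a positive result.
    """
    return bool(worst_event_full_criteria) or bool(random_event_full_criteria)


# One series skipped entirely, yet the diagnosis is still positive
# because the other event meets full criteria.
skipped_but_positive = meets_ptsd(None, True)

# Neither evaluated event meets full criteria.
negative = meets_ptsd(False, None)

print(skipped_but_positive, negative)
```

Cases with a positive diagnosis and missing values on one symptom series are therefore expected under the skip logic and are not necessarily item non-response.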

Q: Included in the NCS-R Part II are questions relating to eating disorders, however, eating disorders diagnoses variables are not included. How should these Part II questions be grouped to have a measure of eating disorders? Do you have this coded somewhere?

We decided not to include our coded diagnostic variables for eating disorders because they are among the group of rare disorders and we could not include every diagnosis in our public release file. In order to code these correctly yourself, you would need to study the DSM and ICD rules as detailed in the manuals and apply them using variables from Part II of the NCSR.

Q: I'm working on understanding the role of the bereavement exclusion in diagnoses of major depression in the DSM. I see from the NCS-R MDE algorithm documentation (Mjdepepi_ncsr.doc) that this was not operationalized in the NCS-R. Why was question d23 deleted from the questionnaire? Were there any other attempts made to operationalize the bereavement exclusion?

We inadvertently did not ask D23 in the NCSR. As a result we were unable to operationalize the bereavement criterion.

Q: I am writing with a question regarding the NCS-R data. I would like to know how respondents were selected for the "couples sample." I understand that this was based on the respondent's ID number. However, I do not know the specific way in which respondents were selected for this sample (e.g., random sample of married individuals, married individuals in the first/last __% of the sample, etc.). I would appreciate it if you could provide a description of who is included in the couples sample.

The "couples sample" refers to a second random respondent (often the spouse of the primary respondent) selected in some households to complete the NCS-R interview. Please see the following paper, available on our website, for complete details:

Kessler, R.C., Berglund, P., Chiu, W.T., Demler, O., Heeringa, S., Hiripi, E., Jin, R., Pennell, B-E., Walters, E.E., Zaslavsky, A., Zheng, H. (2004). The US National Comorbidity Survey Replication (NCS-R): Design and field procedures. The International Journal of Methods in Psychiatric Research, 13(2), 69-92.

Q: I was wondering if there was a way to identify those cases with marijuana abuse/dependence from the data set? We have seen substance abuse and dependence diagnoses in the data set, and we are trying to play with the data and see if we can figure out how to derive these marijuana diagnoses, but some expert advice would be much appreciated.

There is no direct way to determine this type of abuse/dependence diagnosis from the Substance section of the NCSR. You can, however, examine marijuana, cocaine, prescription, and other drug use (and associated ages) by using the SU41-SU48d series of questions. You could tease out the drug used in the case of only 1 drug used and a subsequent abuse/dep diagnosis but we don’t have the necessary questions to do abuse/dependence for each individual drug type in the case of multiple drugs used.

Q: We have been exploring your NCS Replication dataset, with a particular interest in the prevalence of anxiety and depression in new parents. As far as we can tell, from the codebook and the interview, this subset of participants (those with young biological children) would be identified in Part II: The Demographics section with variable/code DM22A, and The Children section with variable/code CN1A.

In an effort to be as clear and concise as possible, we have questions regarding each variable/code, that we would appreciate any help with:

1) CN1A vs. CN1AA

We have looked at CN1a and CN1aa and notice that both questions are phrased the same. We are assuming CN1A specifies the NUMBER of children under age 5, while CN1AA seems to offer YES or NO to the question of whether or not the participant has children under the age of 5. Unfortunately, the respective frequencies (in codebook pg 2685 in the pdf) do not correspond to the numbers in CN1a (same page).
(The Frequencies of YES to CN1AA do not add up to the sum frequencies of those with 1,2,3, and 4 children under 5, as detailed in CN1A)
Are we missing something? Also what does Suppressed Information entail?

The variable CN1AA is asked only of those who had only one child (CN1=1), and the question is whether this child was under 5. The response options are yes/no. So if you look at the frequency of CN1=1 you should see that this number is equal to the number of responses to CN1AA. These people were not asked CN1A. In other words, when someone says they have only 1 child, you don’t ask them how many of these children are under the age of 5; instead they are asked “Is this child under the age of 5?” and the responses to this question are in CN1AA.

Suppressed information means that ICPSR has removed cases from these items where the data may be potentially identifying. You can go to the link that ICPSR has on their site for disclosure analysis and read more about this.
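Because CN1A and CN1AA are asked of different respondents, a single "has a child under 5" indicator has to be built from both items. The sketch below shows one way to do that. It rests on assumptions that should be checked against the codebook: that "yes" is coded 1, that CN1A is asked when CN1 is greater than 1, and that suppressed or missing values arrive as None. The function name is hypothetical.

```python
def has_child_under_5(cn1, cn1a, cn1aa, yes_code=1):
    """Derive one under-5 indicator from the two skip-dependent items.

    cn1   : number of children
    cn1a  : number of children under 5 (assumed asked when cn1 > 1)
    cn1aa : yes/no, whether the only child is under 5 (asked when cn1 == 1)
    Returns True or False, or None when the item that applies is missing.
    """
    if cn1 is None:
        return None
    if cn1 == 0:
        return False
    if cn1 == 1:
        return None if cn1aa is None else cn1aa == yes_code
    return None if cn1a is None else cn1a > 0


# One child, answered "yes" on CN1AA (CN1A skipped).
only_child_yes = has_child_under_5(1, None, 1)

# Three children, two of them under 5 on CN1A (CN1AA skipped).
several_children = has_child_under_5(3, 2, None)

print(only_child_yes, several_children)
```

Counting "yes" responses on CN1AA together with nonzero responses on CN1A in this way gives the total number of respondents with a child under 5, which is why neither item matches that total on its own.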

2) DM22a - Nearly all (98%) System Missing

It seems that this question was only asked for a very small percentage of participants (roughly 98% has system missing). Namely, only 237 participants with data to this question, of which only 45 people with children under age of 5, and 192 people without such children.

Could you please help clarify or confirm this?
Are there any other codes/variables of interest for our subset of interest?

The skip pattern for DM22 was changed in April 2001; after that, all respondents went from DM22 to DM23. There are no data for DM22a for samples with a version date of 4/20/2001 or later. You will notice that at DM22 there is a goto DM23 for all responses, so DM22a is not asked.

Q: I am hoping that you may be able to help me resolve the inconsistency between the prevalence of Bipolar I or II presented in your 2005 Arch Gen Psychiatry article (Table 2, 3.9% of total population) and the prevalence estimate provided by the online public version of the NCS-R data (2.1% of total population). Any selection or coding issues that you may be able to clarify would be very helpful.

We modified our definition of bipolar disorder in order to get a better calibration with our clinical sample. You should go to our website and look at the revised algorithm (open the zip file with the algorithms and look for bipolar I, II and sub). You might also want to check out these 2 papers, which are listed on our website.

Merikangas, K.R., Akiskal, H.S., Angst, J., Greenberg, P.E., Hirschfeld, R.M., Petukhova, M., Kessler, R.C. (2007). Lifetime and 12-month prevalence of bipolar spectrum disorder in the National Comorbidity Survey Replication. Archives of General Psychiatry, 64(5), 543-552.

Kessler, R.C., Akiskal, H.S., Angst, J., Guyer, M., Hirschfeld, R.M., Merikangas, K.R., Stang, P.E. (2006). Validity of the assessment of bipolar spectrum disorders in the WHO CIDI 3.0. Journal of Affective Disorders, 96(3), 259-269.

Q: Was there a specific reason that thought disorder (included as nonaffective psychosis in the 1994 Arch Gen Psych article) was not included as a category of mental illness in your 2005 NEJM & Arch Gen Psych articles?

It took us more time to do the clinical reappraisal interviews and to develop imputation equations for NAP than to complete the other diagnoses. We knew that the number of people with NAP would be small enough and their comorbidity with other disorders high enough that we would not be missing anything in global terms by doing the analyses in the 2005 papers without NAP. So that's what we did. This is one instance of a larger pattern: we did not include disorders of secondary focus in our analyses until we had examined those diagnoses more closely than we could at the beginning of our work with the data. Pathological gambling (PG), for example, is not included in the 2005 papers, but is now included in our more recent papers because we have subsequently studied PG.

Q: Are there ways I can retrieve the data with original code for job categories (29), which interviewers recorded instead of recoded EM15 from NCS-R?

We cannot release this type of detailed and potentially identifying data due to confidentiality issues. You could pursue applying for access to restricted data via the process we have in place with ICPSR. Please go to the ICPSR site for details on the process.

Q: Do you have permission to recontact respondents if my colleagues and I obtained funding for a repeat survey on (TOPIC)?

We plan to follow up the sample over the course of time. We would be happy to maintain on file a list of questions that other investigators would suggest that we include in future data collections. However, we cannot guarantee at this time that follow-up data collection will take place. Nor can we guarantee that we will agree to collaborate with outside investigators in collecting/analyzing the questions they propose. The sample is not available for use as a sampling frame by other investigators who want to carry out a repeat survey of either all or some respondents.

Q: I was just perusing the NCS instruments page and didn’t find the questionnaire for the NCS-R Part 2. Is Part 2 not part of the publicly available NCS-R data? Would it be possible to get a copy of the Part 2 questionnaire to determine if the data would be useful?

The interview schedule includes both Part I and Part II. It takes a bit of study of the skip logic to see exactly where it is that we skip people out of the remainder of the survey. If you want to go through the skip logic, refer to the series that begins with PH101.

Q: Thank you so much for your work on the survey. I have a question about the PTSD findings. In 1995 the original survey published not only the prevalence rates in PTSD, but also the percentage of individuals who qualified for "Criterion A" in the DSM: 60.7% of males and 51.2% of females qualified as having experienced a significant traumatic event, yet the lifetime prevalence of PTSD was only 10.4% in females and 5.0% in males...

I am giving a talk on PTSD and am making the point that while many people experience traumatic events as defined by the criteria, only a minority will go on to manifest the disorder. I have the prevalence data but can't seem to find the updated numbers on the number of individuals in the replication sample that qualified as having experienced a traumatic event. Can you help?

I recommend starting by checking the NCSR diagnostic algorithms in the Word documents on the NCS website. The PTSD diagnostic Word document shows how the individual variables define the criteria as well as the full diagnosis. There will be an issue, though: many of the PTSD variables at the beginning of that section are not in the public release due to confidentiality concerns. Unfortunately, you will not be able to use many of the variables that pertain to the various PTSD criteria or the full diagnosis. If you would like to pursue obtaining confidential data you could work through the CPES restricted data request process. You may be able to use one of the variables released publicly to approximate what you did previously with the NCS.

Q: I am writing about Dr. Kessler's article "Prevalence and effects of mood disorders on work performance in a nationally representative sample of US workers", which was published in the American Journal of Psychiatry. I list my questions as follows:
a) The definition of "employed or self-employed 20 hrs or more per week in the month before the interview". Would you please let me know which question you use to this definition in the article?
b) How to measure "work performance" by days? In the measure section of this article, you defined "absenteeism" and "presenteeism" to describe work performance. Would you please let me know which questions you use to these two definition? From NCS survey, I don't know which question you use to define this.
c) How to transform the measures of lost work performance from a time metric to a salary metric? Would you please explain more about how you transform? Also, I would like to know which question in the survey you used here.

We are not able to provide consultation in replication of results from papers published using the data we archive. The questions you ask are not specific. For instance you ask how we defined being employed or self-employed for at least 20 hours per week. You can see from the instrument that EM7.1 asks if the respondent is employed or self-employed. Those who are currently employed or self-employed are asked at EM17 how many hours they work in the average week. Are you not able to find EM7.1 in the public release data set? If so that is a question for the ICPSR help line.

However, if you look at the number of people either currently employed or self-employed and working more than 20 hours a week in the public release dataset and get a different n than is published in Dr. Kessler’s paper, that is a specific question for Dr. Kessler.

It is appropriate to write to the author of the paper with your detailed questions, but first you need to do the legwork yourself to find the variables and frequency distributions in the public dataset before contacting him/her.

Once you have done the legwork, and you have the detailed questions for Dr. Kessler, please send your questions to him through our NCSR website contact email:


Q: Since we are not familiar with CPES and also NCS-R survey, we are confused which variable you use in NCS-R and what's the corresponding variable name in CPES. That's why we come to you about the variable issue in the paper.

We can't help you for two reasons. First, we have nothing to do with the CPES dataset. That dataset is put together by the University of Michigan, not by us. We know nothing about the numbering system they used. You'll have to sit down and do the comparisons of question wording across our interview schedule and their codebook to figure out the cross-walk. Call the Michigan people if you have problems.

Second, your request for us to provide information about the exact questions we used in our analyses goes well beyond the kind of support we can give to public users. As we mentioned earlier, it would be a different matter if you had tried to replicate our results using your best estimates of the way we specified our models and found dramatically different results. Were this to occur, we would have a scientific responsibility to check our work for errors and to examine your code and results carefully, which we would do.

Short of that, though, we think the best way to proceed would be for you to attempt to replicate our results based on a plausible reading of our paper and to see if your results are close to those we obtained. The issue here is whether or not two independent groups developing roughly comparable specifications can reach the same basic results. If so, then that's good and you can use your specification for your further analyses. If you find a major discrepancy with our published results, though, we will be happy to review your documented code and to check our analyses for errors.

Q: How was the poverty ratio calculated? Was the poverty threshold chosen based on the poverty guideline published by the US Dept. of Health and Human Services (SOURCE: Federal Register, Vol. 66, No. 33, February 16, 2001, pp. 10695-10697)?

Please check the online documentation for the NCS-R at the ICPSR/CPES site and see the supplemental variable section. The poverty index is the ratio of household income to the Census 2001 poverty threshold. It was calculated from the federal poverty threshold for 2001 and incorporates family size and household income.
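As a rough illustration of the ratio described above, here is a minimal Python sketch. The function name is hypothetical, and the threshold value in the example is a made-up placeholder rather than an actual Census figure; you would substitute the 2001 Census threshold that matches the household's family size.

```python
def poverty_index(household_income, poverty_threshold):
    """Return household income divided by the poverty threshold.

    poverty_threshold is assumed to be the Census 2001 threshold that
    applies to the household's family size (looked up by the caller).
    """
    if poverty_threshold <= 0:
        raise ValueError("poverty_threshold must be positive")
    return household_income / poverty_threshold


# Placeholder numbers for illustration only (not real Census thresholds).
example_ratio = poverty_index(household_income=36000, poverty_threshold=18000)
print(example_ratio)
```

A ratio below 1 would indicate household income under the threshold for that family size.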

Q: What were the criteria for asking the SC_8 question (physical health rating) in the NCS-R? There are 7,565 system-missing values; does this mean the question was not asked of those respondents? How did interviewers determine whom to ask the SC_8 question and whom not to ask? I looked at all the documentation for the NCS-R but failed to locate the information I am searching for.

Please see the Word document for the screening section of the NCS-R instrument and look carefully at the skip logic. 100% of respondents skip from SC7 to SC9c. Early in the field we made this change in order to shorten the instrument length.

Q: Can you tell me if questions about parental divorce and interparental conflict were included in the NCS-A?

There are general questions of this type in the survey and they are located within the "Childhood" Section of the instrument. Once the data is publicly available you will be able to analyze these variables.

Q: Does the NCS-A have measures of parental psychological disorders, parental personality characteristics, adolescent coping, adolescent temperament, and adolescent emotion regulation? Also, did you use the C-DISC to assess adolescent psychological disorders?

I’m sorry, but we cannot provide this level of detail about a future public-release file. Unfortunately, we simply do not have the resources for this type of support. When the data are publicly released, the full instrument and data will be posted and you can then delve into the actual sections and questions.

Q: We were wondering, are there any publications out of the NCS-R of 30-day prevalence of PTSD and depression? We have seen plenty on lifetime and 12-month, but no 30-day.

Also, we had to use the SDA to calculate 30-day prevalence in these conditions by gender and had to pull the gender variable from the “created variable” list. Is there any way of knowing if we used an accurate gender variable? Is there no gender variable in the NCS-R dataset?

You can search for publication abstracts on our NCS website. At this point, we have not focused on 30-day PTSD as a major topic. The diagnostic variables you list above are available from the CPES site for download, however, and you can develop your research using them if you choose.

The gender variable is from the NCS-R data set and is one of the cleaned and “created” variables included in the file, but the data for it does come directly from our survey. For some key variables we cleaned and imputed missing values and released these to the public. They are based on actual questions from the survey.

Q: It seems that the variables corresponding with survey item DA36_2 (was R ever imprisoned since age 18) and CD39 (imprisonment before age 18) are not included in the public release file, perhaps to protect confidentiality. Is this correct? If so, is there a way of formally requesting these variables?

You can formally request confidential variables through ICPSR and the best way to handle this is to email the help link on their site to request information. Your question will then be directed to the director of disclosure analysis and related issues.

Q: I find a large proportion of data coded as system missing reported in the codebook. I wonder if that incorporated the previous category of NSC1. For part II of the demographics and tobacco, it seems that only half of the subsample (5692) observations are available. Is there a reason for that or any particular procedure I need to take care of?

The NCS I and NCS-R are separate datasets and there is no cross collection of data between the two files. Part II of the NCS-R is explained in the documentation preceding the variable frequencies. The n of 5692 is correct since this is the number of people who went on to Part II and answered questions from the second part of the survey. Please see the documentation on the use of the correct weights for Part I and Part II.

Q: I am interested in using the NCS-R to examine bipolar individuals in a manner similar to the Kessler book chapter, “Comorbidity of Unipolar and Bipolar Depression with other Psychiatric Disorders in a General Population Survey.” I am curious if there are any articles currently being worked on that may already be focusing on this population using the newer data.

We regret that we cannot send papers until they are posted on the NCS website. This is to prevent being overwhelmed with requests from the large number of people who write us asking if we have started to do work on various topics and, if so, if we could tell them what we are doing in order to avoid overlap.

Although we are working on a number of areas, it would go well beyond our limited person power to describe our line of thinking, or send out preliminary copies of tables or papers.

Q: In the NCS-R was the participant’s worst event or random event used in the PTSD diagnosis using the DSM definition?

Actually both the worst and the random event were used in the DSM definition of PTSD. Please see the PTSD.doc verbal explanation of the disorder coding under and then the link "diagnosis" on the left side of the webpage.

The worst and random events were used in the formulation of the disorder variable dsm_pts by using "or" type of logic. Then, following the verbal description on our website you can see how the parts fit together using either the random event or the worst event.

Q: I recently downloaded the NCS-R data set from the ICPSR website. I believe the download was successful but I've encountered one problem that I'm hoping you could solve. When I ran the SPSS syntax file to create an SPSS data file, the diagnostic information was not there. All the variable labels and variable names were present for all variables (including the diagnostic information) but there was no actual diagnostic data present. However, all the data from parts I and II were present. Is there something else that needs to be done to create a file containing the diagnostic data?

Questions of this type (i.e. technical support issues, downloads, etc.) are handled by ICPSR and we suggest contacting them via their website for the NCS-R. It appears that you have a problem with both obtaining the diagnostic data as well as the raw data. I am sure that the ICPSR help group can assist in solving this matter.

Q: I have a question about the age variables in the NCS-R. There are two age variables in the dataset, SC1 and AGE. The variables are different. For example, the oldest age coded in SC1 is 95, but is 99 in AGE. Please advise as to which AGE variable should be used and why.

In general, use the AGE variable, as it has been cleaned by our staff.

Q: I notice that SC1 has 3 missing values but AGE does not; AGE does, however, have values of 98 and 99, which together account for 3 cases. There are no labels on the AGE variable to indicate missing, but since there are three additional cases in the AGE variable relative to SC1, is it safe to say that 98 and 99 are not the age of the person but missing values? Other than the 98 and 99 (which are not in the SC1 variable), all other n's for each age are the same in SC1 and AGE. I don't want to call these 3 people 98 and 99 years old if in fact they are missing on age.

The people with values of 98 and 99 had birthdates of 1901 and 1902, so their ages of 98 and 99 were assumed to be real.

Q: Is MDD in the NCS-R dataset equal to 2 or more episodes of MDE? Does this also mean that MDD in Kessler's 2003 paper titled, "The Epidemiology of Major Depressive Disorder" refers to 2 or more major depressive episodes?

Please refer to the diagnostic algorithms section of the NCS-R website and examine the documents describing the coding for MDE and MDDH for a good comparison of the two variables. You can link to the diagnostic section by following the links on the website.

Q: I am working with a sample of women with the premutation of fragile X syndrome. We are comparing their DSM-IV disorder rate to the NCS-R public use dataset and have a question. Can you differentiate within the NCS-R dataset who has MDD, single episode versus those with MDD, recurrent?

Please look at our website at the diagnostic algorithms for major depressive episode and major depressive disorder. You will need to cross the variable MDD by number of episodes to see who has multiple episodes. Number of episodes can be found in the raw variables D38a1 (number of episodes in the last 12 months) and D52 (number of episodes in your life).
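The cross-classification suggested above might be sketched as follows in Python. This is only an illustration under stated assumptions: `has_mdd` is a hypothetical flag you would derive from the lifetime MDD diagnostic variable, and `lifetime_episodes` is assumed to be the D52 value with any missing codes already recoded to `None`. Whether a raw episode count is an adequate basis for the "recurrent" label should be checked against the posted diagnostic algorithms.

```python
def classify_mdd_course(has_mdd, lifetime_episodes):
    """Cross an MDD flag with a lifetime episode count (assumed from D52).

    has_mdd: hypothetical boolean derived from the MDD diagnostic variable.
    lifetime_episodes: integer count, or None when the value is missing.
    """
    if not has_mdd:
        return "no MDD"
    if lifetime_episodes is None or lifetime_episodes < 1:
        # Do not guess a course when the episode count is unusable.
        return "MDD, episodes unknown"
    if lifetime_episodes >= 2:
        return "MDD, recurrent"
    return "MDD, single episode"


print(classify_mdd_course(True, 1))
print(classify_mdd_course(True, 16))
print(classify_mdd_course(True, None))
```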


Q: Thank you for your earlier response, in which you highlighted D52 as the variable we use to answer whether a participant had MDD, single episode or MDD, recurrent. Our current question is regarding the nature of the question for D52 titled "# Life episodes/othr prob. > 2wks". We have looked at the data and see a wide range of numbers for this variable. Can one just answer the question "I have had 16 episodes" and acquire a 16? Or, are they questioned about each major depressive episode via your diagnostic algorithm in order to "meet" for an episode?

Please carefully examine the diagnostic algorithms for MDE and MDDH to understand the relationship between the coding for the DSM and ICD algorithms and the instrument sections/questions. The coding used to operationalize the DSM and ICD diagnoses is included on our website under the "Diagnosis" section. The documents describe how we coded the diagnoses and should give you a complete picture of how the instrument and the coding work together.

Q: I have a question regarding the SC10.4a-g items. These items are "system-missing" for a subset of individuals; however, we cannot identify a skip pattern that skips over these items. Can we assume that "system missing" equates to "no", i.e., the person does not have the condition in question? If not, can you provide any information about why these items might be system missing for some people?

If you refer back to sc10.1g you will see that if a person had 1+ in the sc10.1a-1f series they are directed to sc10.4a-g. Have you already checked this set of questions and still cannot understand the logic and number of respondents? In most cases you should not assume that missing means no and definitely not for this group of questions.


Q: We have followed up on this system-missing issue. It appears that a small number of participants answered “No” to all of the SC10.1 questions, and were then skipped to SC19. Most participants who answered “No” to the SC10.1 questions still received the SC10.4 questions, however. So it appears as if this subset of participants simply was not asked the questions from SC10.2 until SC19. Perhaps the checkpoint at SC10.1i was changed at some point, with some early participants being routed directly to SC19?

This appears to be the case. There was a change in the instrument at some point to correct the skip to SC19. Therefore the data on SC10.4 is missing for those people who skipped to SC19 before the change.
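To see why system-missing should not be recoded to "no" here, consider this small Python sketch. The response labels and the function name are hypothetical, and the example is unweighted for simplicity; a real analysis would apply the appropriate survey weights.

```python
def prevalence_among_asked(responses):
    """Share of 'yes' answers among respondents who were actually asked.

    responses: list of 'yes', 'no', or None, where None stands for a
    system-missing value (the respondent was skipped past the item).
    """
    asked = [r for r in responses if r is not None]
    if not asked:
        return None  # nobody was asked, so no estimate is possible
    return sum(1 for r in asked if r == "yes") / len(asked)


responses = ["yes", "no", None, "no", None]
print(prevalence_among_asked(responses))           # based on the 3 people asked
print(responses.count("yes") / len(responses))     # wrongly treats missing as 'no'
```

The second figure is lower only because skipped respondents are counted as if they had answered "no".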

Q: I’ve read several articles that use a severity rating for the individuals with psychiatric disorders; they all reference Kessler et al., 2005, Arch Gen Psychiatry. 2005;62:617-627. Participants are categorized into: no disorder, mild, moderate, or severe disorder.

a) From reading this article and several other articles that replicate this severity rating (e.g., Uebelacker et al., 2006, Wang et al., 2006), we are not able to identify all the variables necessary to create this severity rating. Moreover, the description of the variable is slightly different in different articles. For example, Kessler et al. describe one component as “work disability or substantial limitation due to a mental or substance disorder;” and I am unsure exactly from which variable(s) this is derived. Wang et al., 2006 describes one component of the rating as “multivariate functional impairment score equivalent to a Global Assessment Scale score (26) of less than 55.” Again, it is unclear to which variable this refers, as GAF scores are not available.

We have used more than one definition of severity. An older definition includes many of the items in the screener section of the NCS-R measuring impairment in functioning due to mental disorders and another, more recent definition does not include these variables but instead uses a predicted GAF based on our clinical sample. In addition, we have both of these definitions including substance disorders and excluding substance disorders. The Wang et al., 2006 paper uses the definition of severity that uses the predicted GAF. We have decided to use this version of severity in our future papers as opposed to the former definition which was used in the Kessler, et al. paper (2005). However, you will not be able to replicate this because in order to do so you would need access to our clinical data which is not available in our public release. The predicted probabilities were created by running a prediction equation in our clinical sample and then applying this equation to the entire part II population.

b) Should we use the description in Kessler et al., 2005, rather than other articles?

No, this is an older definition of severity that uses the self-rated GAF in the screening section among those who report a mental disorder. The Wang, 2006 paper is a better description of our latest severity variable which uses a predicted GAF score based on the Global Assessment of Functioning done by clinicians within our clinical validity sample.

c) Would it be possible to gain access to this severity rating or a more detailed description of it that included specific variables involved in its computation?

We are not releasing the clinical data and therefore you would not be able to re-create the predicted GAF score for the total part II population. However, using the Wang paper as your guide you should be able to recreate the remaining aspects of the definition.

Q: I am interested in using the NCS-R and NCS data in a proposal that I will be submitting shortly to NIH on Feb 5. It looks very useful but we need state of residence. Can you tell me if it is available in the data? I couldn't find it in the codebook, either by going through it or by searching 'state' or 'residence.' I guess it is a question of whether it is publicly available. I would of course sign a confidentiality agreement such as those used by the Census. Also, I wondered if parents' education, as opposed to occupation/industry, is in the data?

Variables such as state are not in the public release for either the NCS or the NCS-R. In order to obtain these variables you would need to follow the process set up by ICPSR, fill out confidentiality forms, and formally request the data. Please refer to the ICPSR website for how to do this.

As for parents' education, please refer to the instrument and check the childhood section of the NCS-R. You would also need to check the NCS instrument for similar questions about the number of years of education of the parents.

Q: I am looking for the size of the adult bipolar population in San Diego. Is this a statistic(s) that could be obtained from the National Comorbidity Survey Replication (NCS-R)? I'm just looking for that one factoid. How might I go about getting it without downloading the public data set, etc.?

This type of statistic is not available from the currently published work using the NCS-R data and could not be obtained from the public release dataset as is, because we don’t release confidential information such as city/state without a formal request to ICPSR for confidential data. Even if you did download the public release file, you would not be able to produce a factoid like this without first requesting the confidential data and then developing the estimate yourself.

a) The NCS-R and NCS-A have different "modules" for different conditions, like depression, anxiety, substance abuse, etc. Is it possible to administer subsets of these modules or do these modules have to be administered in their entirety? In other words, if we did not have enough time to administer all the modules fully, can we consider administering a subset of questions from each module of interest, or would we have to administer *all* the questions in each module, so that our only option would be to incorporate fewer modules into our research?

The short answer is yes. However, one must consider hierarchy when doing this. In other words, to assess major depressive disorder you must also assess mania. For a complete answer to this question and for other details about using the CIDI in your own research please see the following website:

b) On a related note, how many minutes, on average, does each of the modules in the NCS-R and the NCS-A take to administer?

See the following publication for data on the NCS-R. This information has not been published as of yet for the NCS-A:

Kessler, R.C., Berglund, P., Chiu, W.T., Demler, O., Heeringa, S., Hiripi, E., Jin, R., Pennell, B-E., Walters, E.E., Zaslavsky, A., Zheng, H. (2004). The US National Comorbidity Survey Replication (NCS-R): design and field procedures. The International Journal of Methods in Psychiatric Research. 13(2):69-92.

c) Last, do you know if the NCS-A has been tried in other studies with youth between the ages of 10-14? Or, from your experience, do you know how reliable answers to mental health modules like the ones used in the NCS-A are among youth this age, as compared to older youth ages 15-20? We noticed that for the adolescent supplement of the National Comorbidity Survey, in addition to asking youth directly about their mental health, you also collect parental reports.

The NCS-A instrument was used in Mexico and in Colombia. Both these countries used the same age range as we used in the US (13-17). We are currently in the process of writing the validity paper for the US survey, which will validate the instrument for youth in this age range. Yes, we collect information from both the child and the parent.

Q: I searched the website to find measures of personality used in data collection, and I was unable to find a list of personality measures/scales used. If someone could please provide me with some information about how personality variables were assessed for the NCS-R, I would greatly appreciate it.

Please refer to page 64 in the article by Kessler and Merikangas (International Journal of Methods in Psychiatric Research) for a discussion of the IPDE scale used in the NCS-R. This paper is available via our website and is under the year 2004. You can also see the raw questions asked in the NCS-R by examining the personality section of the instrument.

Q: I have a question about the Gambling module in the NCS-R. I was wondering if an overall pathological gambling variable was computed in the data (if so, do you know the variable name)? If a computed variable is not available, are you aware of whether the gambling questions in the NCS-R are from the SOGS questionnaire or DSM-IV criteria or perhaps both? Knowing the origins of the questions will help me work with the data.

The constructed gambling diagnostic variables for DSM/ICD were not included in the public release datasets during the first round of public release files. We are considering including gambling constructed variables in the future but have made no firm decisions at this point. To answer the second part of your question, there were no official scales used in the construction of the gambling section and the best way to understand how the questions were written/selected would be to study the DSM-IV and ICD disorder criteria and other related gambling scales.

Q: We are looking at the difference of tobacco use between those with lifetime mental disorders and disorder-free population by using the NCS-R 2003 data. When looking at the questionnaire and data, I have a question regarding the skip pattern. According to the questionnaire (the Tobacco Section (TB), I downloaded both the questionnaire and data set from ICPSR), if SC7=1 then TB5INTR1 should be answered, and followed by TB6 -> TB6A -> TB9.... In the data, however, I found that only part of the related respondents gave answers to questions TB5INTR1, TB6, TB6A; most of the "current smokers" (SC7=1) skipped the three questions and jumped to TB9 (number of days smoked in the past 12m). The same situation happened to those "former smokers" (SC7=2). Could you give me some information on how to identify those who skipped the three questions from those who did not, and how different these two groups of respondents are? I really appreciate your time and help.

There is a mistake in the public release variables that will be corrected shortly by ICPSR. The tobacco section was revised very early on in data collection and the age variables were updated to prompt the users who said "don't know" with the prompts "Was it before your 20s?" and "Was it before your teens?" When we made this change in the instrument, we changed the variable names from, for instance, TB5 to ATB5 in the dataset. The variable that should have been made public was ATB5 (the updated variable) and instead the old variable which was only used very early on in data collection and therefore has mostly missing values was released. ICPSR has recently been made aware of this situation and we have sent them the updated data. You should write the help line for ICPSR and ask when that data will be made available.

Q: There have been several papers published using the NCS and NCS-R that have used latent class analysis. Unfortunately, these papers do not indicate what software was used to conduct the latent class analyses. I have also had little success in obtaining replies from some of the authors of these papers. Is there a recommended statistical package for this type of analysis? Any information you can provide would be appreciated.

The LCA analyses done in most NCS-R and NCS papers used non-commercial software that is not publicly available. It is based on a Fortran routine that was shared privately with our data analysis staff. However, a number of commercially available software packages, such as Mplus, LISREL/HLM, and other lesser-known packages, do LCA. I would recommend Mplus for this type of work, but a quick review of software should provide a number of good options.

Q: I am currently using NCS-R data to do some analysis with datasets, and I have attended the CIDI and NCS-R analyst training. My question is: in the dataset I was given, there are diagnostic variables for both Borderline Personality disorder (DSM_BOR) and Antisocial Personality disorder (DSM_APD). They are both coded as ‘0’, ‘1’ or ‘2’. Unfortunately, I cannot find what these codes represent in the public access codebook. Could you please let me know what these codes represent, and why I cannot locate them in the codebook?

These two variables were not included in the public release file as they are both part of either “rare” disorders or ones that Dr. Kessler decided not to include for public release. Because of this, the codebook and associated Word documents describing the coding are not available from the ICPSR site. I have examined the coding that created these variables and here is the meaning of the values:
0=non case
1=possible case
2=probable case

You may want to check the DSM and ICD manuals for a full explanation of the criteria and rules for these diagnoses.

Q: Did this survey capture smoking as a variable and can this data be used to assess what proportion of those surveyed that meet the diagnosis for schizophrenia are smokers?

Please see the Screener section (Question SC7) as well as the tobacco section for questions of this type. Once you have created your smoking variable, you can then use it along with your created schizophrenia variables to examine the proportions needed for your research.


Q: Did the survey capture trends in smoking in this population over time?

Please read through the tobacco section for the details on the questionnaire and content of this type. By the way, you may want to consider attending a CIDI training course to fully understand the instrument and design.

Q: I'm interested in the reports of the first episode of depression, but have found some problems in the dataset.

a) Question D37b.1 (Was that episode brought on by some stressful event?) is not in the dataset.

I am currently looking at D37b.1 in the ICPSR/CPES (NCS-R part) version of the public release file and see this variable and its data there. One thing to keep in mind is that our public release dataset was modified during June and July 2007 and we now have a linked site which is part of the CPES. See for the most recent version of the data and online documentation.

b) There is a large jump in the question numbering in the depression section. There's D72 that asks about ever seeing a doctor for depression and then there's D86 that asks about professional treatment for depression in the past year. Why the jump in question numbers?

This was just a convention of the original programming for data collection. It doesn’t really mean anything.

c) Is there a question regarding medication use for depression? This was asked in the NCS, but is missing in the NCS-R public use dataset. Why?

I do not know all of the details of the instrument changes but you can find details about medication use along with type of drug in the pharmacoepi section. You could put together variables from these sections along with the diagnostic variables to piece together medication use, drug type, and diagnosis of MDE or MDDH.

Q: I am currently using NCS-R data for analyses focused on individuals with pain conditions, and had a question regarding the scoring/interpretation of the PEA variables. It appears from my reading of information provided on your website, and Dr. Lenzenweger's recent article in Biological Psychiatry, that the PEA variables contain a portion of the screener for the IPDE. How was this scored, and what was considered a positive screen for a personality disorder? Thank you in advance for your time in answering this inquiry.

Here are the criteria used for the screen.

* Cluster A disorder screen: SUM PEA Items (79,80,81,82,83) *
* Scoring: 4 or 5 = probable case, 1-3 = possible case, 0 = non-case *

* Cluster C disorder screen: SUM PEA Items (73,74,75,76,77,78) *
* Scoring: 5 or 6 = probable case, 1-4 = possible case, 0 = non-case *
* *
* Note: Changed code 8/8/01 from >= 4 criteria to >4 criteria *
*Anti-social Personality Disorder Screen

Criteria A: There is a pervasive pattern of disregard for and violation of the rights of others *
* occurring since age 15 years, as indicated by three(or more) of the following: *
* *
* 1. failure to conform to social norms with respect to lawful behaviors as indicated by *
* repeatedly performing acts that are grounds for arrest *
* *
* PEA59 is False(5) OR PEA60 is True(1) *
* *
* 2. deceitfulness, as indicated by repeated lying, use of aliases, or conning others for *
* personal profit or pleasure
* PEA63 is True(1) OR PEA69 is True(1)
* 3. impulsivity or failure to plan ahead
* PEA65 is True(1)
* 4. irritability and aggressiveness, as indicated by repeated physical fights or assaults
* PEA64 is True(1)
* 5. reckless disregard for safety of self or others
* PEA66 is True(1)
* 6. consistent irresponsibility, as indicated by repeated failure to sustain consistent *
* work behavior or honor financial obligations *
* *
* PEA62 is True(1) OR PEA67 is True(1) *
* *
* 7. lack of remorse, as indicated by being indifferent to or rationalizing having hurt, *
* mistreated, or stolen from another *
* *
* PEA61 is False(5) *
* *
* Scoring: 3 or more = probable case, 1-2 = possible cases, 0 = non-case *

Borderline Personality Disorder Screen:
*Criteria A: A pervasive pattern of instability of interpersonal relationships, self-image, and *
* affects, and marked impulsivity beginning by early adulthood and present in a variety of *
* contexts as indicated by five(or more) of the following: *
* *
* 1. frantic efforts to avoid real or imagined abandonment. Note: Do not include suicidal *
* or self-mutilating behavior covered in criterion 5 *
* *
* PEA57 is True(1) *
* *
* 2. a pattern of unstable and intense interpersonal relationships characterized by *
* alternating between extremes of idealization and devaluation *
* *
* PEA51 is True(1) *
* *
* 3. identity disturbance: markedly and persistently unstable self-image or sense of self *
* *
* PEA58 is True(1) *
* *
* 4. impulsivity in at least two areas that are potentially self-damaging(e.g., spending, *
* sex, substance abuse, reckless driving, binge eating). Note: Do not include suicidal *
* or self-mutilating behavior covered in Criterion 5. *
* *
* PEA54 is True(1) *
* *
* 5. recurrent suicidal behavior, gestures, or threats, or self-mutilating behavior *
* *
* SD4 is Yes(1) OR SD6 is Yes(1) *
* *
* 6. affective instability due to a marked reactivity of mood (e.g., intense episodic *
* dysphoria, irritability, or anxiety usually lasting a few hours and only rarely more *
* than a few days) *
* *
* PEA53 is True(1) *
* *
* 7. chronic feelings of emptiness *
* *
* PEA52 is True(1) *
* *
* 8. inappropriate, intense anger or difficulty controlling anger (e.g., frequent displays of *
* temper, constant anger, recurrent physical fights) *
* *
* PEA55 is True(1) OR PEA72 is True(1) *
* *
* 9. transient, stress-related paranoid ideation or severe dissociative symptoms *
* *
* PEA56 is True(1) *
* *
* Scoring: 5 or more = probable case, 1-4 = possible cases, 0 = non-case *
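The scoring rules listed above for the four screens can be summarized in a short Python sketch. This is an illustration, not our production code: the function and dictionary names are hypothetical, the 0/1/2 output codes follow the non-case/possible/probable convention, and "SUM PEA Items" is assumed here to mean the number of listed items endorsed as True(1).

```python
# Minimum count needed for a "probable case" on each screen, as listed above.
PROBABLE_THRESHOLDS = {
    "cluster_a": 4,   # 4 or 5 of the items PEA79-PEA83
    "cluster_c": 5,   # 5 or 6 of the items PEA73-PEA78
    "antisocial": 3,  # 3 or more of the 7 criteria
    "borderline": 5,  # 5 or more of the 9 criteria
}


def count_true(responses, items, true_code=1):
    """Count how many of the named items are coded True (assumed code 1)."""
    return sum(1 for item in items if responses.get(item) == true_code)


def screen_code(screen, n_met):
    """Return 0 = non-case, 1 = possible case, 2 = probable case."""
    if n_met < 0:
        raise ValueError("n_met cannot be negative")
    if n_met == 0:
        return 0
    return 2 if n_met >= PROBABLE_THRESHOLDS[screen] else 1


# Hypothetical respondent: four Cluster A items True(1), one False(5).
answers = {"PEA79": 1, "PEA80": 1, "PEA81": 5, "PEA82": 1, "PEA83": 1}
cluster_a_items = ["PEA79", "PEA80", "PEA81", "PEA82", "PEA83"]
print(screen_code("cluster_a", count_true(answers, cluster_a_items)))
```

For the antisocial and borderline screens, `n_met` would be the number of criteria satisfied after applying the item combinations given for each criterion (for example, a criterion met when either of two items is endorsed).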

Q: I am trying to determine the diagnostic criteria used for Major Depression and am having difficulties understanding the exclusion criteria used for 'the death of a loved one.’

The diagnostic Algorithms for the NCS guide states on page 6 that "...define it as depression if the reactions go beyond the normal bounds of uncomplicated bereavement. The exceeding of these bounds are operationalised as having two or more of the following:

Part 1

Part 2
D63 (84<D63a<98 and D63b=1) or for single episode
D63 (12<D63a<98 and D63b=2) or
D63 (3<D63a<98 and D63b=3) or
D63 (1<D63a<98 and D63b=4) or

Can you please clarify what Part 2 above relates to as I can not find a D63a and D63b question in the questionnaire (only D63), and also, what does the 84<,12<,3< and 1< and the <98 represent?

As posted on the NCS-R website in the diagnostic algorithm section, the bereavement item was mistakenly left out of the questionnaire for the NCS-R and therefore the bereavement criterion is not operationalized in the NCS-R. Please see the algorithm description for the NCS-R posted at

Q: I'm interested in using the NCS-A data to examine the prevalence of psychological disorders in adolescents with divorced parents. Can you estimate the public release date of the NCS-A data and codebooks? If the codebooks are not currently available, can you tell me if questions about parental divorce and interparental conflict were included?

Here is our statement regarding the release of the NCS-A to the public:
We put aside the NCS-2 and NCS-A data cleaning work in order to concentrate on the preparation of a public release version of the NCS-R dataset. Now that the latter has been prepared, we're turning our attention to NCS-2 and NCS-A. We have not yet advanced far enough in our analysis of data cleaning and coding to have even tentative release dates, but we will post such dates as soon as our work with these data has progressed sufficiently for us to feel that realistic estimates of release dates can be provided.

As for questions about parental divorce and interparental conflict, there are general questions of this type in the survey, located within the "Childhood" section of the instrument. Once the data are publicly available you will be able to analyze these variables.

Q: I'm working through the GAD criteria as listed in the diagnostic algorithm that is currently posted. I'm not understanding how Criterion D is operationalized. The algorithm says: "D. The focus of the anxiety and worry is not confined to features of an Axis I disorder. At least 1 value of 1-10, 13, 20-32 in G1." I'm confused about how this fits with the DSM-IV Criterion D. Thanks for your help.

Look at the items in G1 that are not included in this definition. They include your mental health, your substance use, your social phobia, agoraphobia, sp, OCD and separation anxiety (in other words, features related to Axis I disorders). If you list only these things as the things you are worried about and do not list anything else, the focus of your anxiety and worry is confined to features of an Axis I disorder.
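The rule quoted in the question ("At least 1 value of 1-10, 13, 20-32 in G1") can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the project's scoring program. It assumes, hypothetically, that a respondent's G1 answers are available as a list of integer codes; the function and constant names are invented for the example.

```python
# G1 codes that count toward Criterion D, taken from the algorithm text
# quoted in the question: 1-10, 13, and 20-32.
NON_AXIS_I_CODES = set(range(1, 11)) | {13} | set(range(20, 33))


def meets_gad_criterion_d(g1_codes):
    """Return True if at least one endorsed G1 code is in 1-10, 13, or 20-32.

    g1_codes is a hypothetical list of the integer codes a respondent
    endorsed in G1. Codes outside the set (the Axis I related worries)
    do not count toward the criterion on their own.
    """
    return any(code in NON_AXIS_I_CODES for code in g1_codes)


# A respondent who endorses only codes outside the set does not meet D.
print(meets_gad_criterion_d([11, 12]))
# Adding a single code inside the set is enough to meet D.
print(meets_gad_criterion_d([11, 12, 3]))
```

A respondent who endorses nothing in G1 also fails the check, because `any` over an empty list is false.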

Q: I want to use the public version of the NCS-R data. I may have overlooked it on your website, but I can't seem to find anywhere which variables in the dataset need to be reversed in terms of scaling (e.g., "Having a lot more energy than usual" in the Depression section should be reversed so that a score "1" points to "no depression"). I have a copy of the DSM-IV, but for some questions I find it rather difficult to decide whether its scoring should be reversed or not. In sum, I would like to ask you whether there exists a document that indicates which items should be reversed, and if so, whether I could obtain a copy. Thank you so much in advance for your help.

For each disorder there is a detailed description of the diagnostic algorithm coding we use for determining if the criteria are met for the disorder. These can be found at the following url:

Q: a) Does the NCS-R currently use debriefing questions (field interviewer and/or respondent) within the survey as data quality indicators? If no, have they used them in the past?

b) IF YES (to current or past use): What are the debriefing questions (can send or if on website can provide the link)? Are they respondent or field interviewer debriefing questions (or both)? Why were these particular questions selected? How are they being used? How useful are they in predicting sources of measurement error within the survey? (if past use only) – Why stop using debriefing questions?

c) IF NO: Why have such questions never been used?

To clarify what I meant by debriefing questions, I have included some example questions.
Field interviewer debriefing questions:

FIDBF01 Did you conduct this interview at the respondent's home — either inside or outside?

2 NO

FIDBF02 [IF FIDBF01 = 2] Where did you conduct this interview?


FIDBF06 How cooperative has the respondent been?


FIDBF07 Indicate on this scale of 1 through 5 how private the interview was. Please do not count yourself as another person in the room.


Respondent debriefing questions:
“How truthful were you when you answered the illicit drug use?”
“Did you do your best to answer the survey questions?”

Both the NCS and the NCS-R have one section for interviewer debriefing: the interviewer observation section. For a full list of these items, the NCS and NCS-R instruments can be found at To see which of these variables are available through the public release you need to go to either the NCS public release: or the NCS-R public release at The only item asked of the respondent was a single item, A54 in the NCS section A on daily activities and SC19 in the NCS-R screening section. These were not debriefing items; they were asked of the respondent before continuing the interview and required an affirmative response in order to continue. There was also a dementia section, which was given to those who the interviewer felt were not able to understand the questions in the interview. Anyone asked the dementia questions was also not included in the final sample. As to your other questions, I believe that the questions on the home environment may come from the Home Observation for Measurement of the Environment.


Q: Thank you for your answers about the NCS/NCS-R debriefing questions. However, answers to several of my questions, listed below, are still missing. Do you have answers to these questions?

a) Why were these particular questions selected?
b) How are they being used?
c) How useful are they in predicting sources of measurement error within the survey?

SAMHSA is asking us to turn around the results of our investigation in a short time period, so it would be great if you could provide us your feedback by COB of tomorrow (July 25).

The interviewer observation questions we use are all ones developed by Bob Groves and Mick Couper at ISR. The HU observation Qs were significant predictors of nonresponse and were used in our post-stratification analysis along with BG-level geocode data. The observations about respondents, in comparison, have not been used very much because we have very little item-missing data and no other real uses for them. We did use them in an analysis of psychosis to see if the interviewers detected the people who are psychotic as being odd in any way, but this was not very useful. We also have some data showing that responses to Qs about things like drug use are lower when other people were in the room. I'd think this kind of info could be useful for SAMHSA in light of the purpose of the survey, but your use of A-CASI probably makes these observational data less useful than otherwise in that regard.

Q: I recently attended an ICPSR workshop on using the CPES data, and Dr. Kessler mentioned that we could email him with our interests in the data to see if others are pursuing similar lines of research. I am interested in using the CPES data for my dissertation to examine racial/ethnic differences in risk and expression of PTSD. I’m particularly interested in looking at PTSD among Latinos. I know that there has been a lot of work on PTSD with the NCS-R, and also work that explores race/ethnic differences, but I haven’t seen much work on PTSD with the CPES.

Josh Breslau has done some work in this area using the NCS-R. If you search the publications list on the NCS-R website for Breslau, you will find these publications for your reference. You should also search there for PTSD papers.

Q: Regarding Past 30-Day Symptoms:

a) Are items Nsd1 through Nsd2t part of some published scale? (They appear similar to the SCL-90, but the score ranges for items are different.)

b) Since these items appear to be measuring different aspects of psychopathology, e.g. depression, hostility, etc., how should these items be tabulated?

The non-specific distress scale NSD5a-NSD5j is the K10 and information on this scale can be found on the scales section of the NCS-R website at See the following papers for more information on how this scale was developed (including a list of published scales from which items were taken):

Kessler, R.C., Andrews, G., Colpe, L.J., Hiripi, E., Mroczek, D.K., Normand, S.-L.T., Walters, E.E., & Zaslavsky, A.M. (2002). Short screening scales to monitor population prevalences and trends in nonspecific psychological distress. Psychological Medicine 32(6), 959-976.

Kessler, R.C., Barker, P.R., Colpe, L.J., Epstein, J.F., Gfroerer, J.C., Hiripi, E., Howes, M.J., Normand, S.-L.T., Manderscheid, R.W., Walters, E.E., & Zaslavsky, A.M. (2003). Screening for Serious Mental Illness in the General Population. Archives of General Psychiatry 60 (2), 184-189.
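As a rough illustration of how the ten K10 items might be tabulated, the sketch below sums ten item responses into a single distress score. It is not the scoring program used for the NCS-R. It assumes one widely used convention in which each item is scored 1 to 5 and the total ranges from 10 to 50 (some reports instead score items 0 to 4). The `reverse` flag is a hypothetical convenience for datasets in which a low code means more frequent distress; check the codebook for the direction of the response codes before using anything like this.

```python
def k10_total(items, reverse=False):
    """Sum ten K10 item responses, each assumed to be coded 1-5.

    items   : list of ten integer responses (hypothetical layout).
    reverse : if True, flip each code (1<->5, 2<->4) before summing, for
              files where a low code means the symptom was MORE frequent.
    """
    if len(items) != 10:
        raise ValueError("the K10 has exactly ten items")
    if any(v not in (1, 2, 3, 4, 5) for v in items):
        raise ValueError("each item must be coded 1-5")
    if reverse:
        items = [6 - v for v in items]
    return sum(items)


responses = [1, 1, 2, 1, 3, 1, 1, 2, 1, 1]
print(k10_total(responses))                # sum of the codes as stored
print(k10_total(responses, reverse=True))  # sum after flipping direction
```

Respondents with missing items are deliberately rejected here rather than silently scored; how to handle them is a separate analytic decision.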

Q: I am curious as to how the National Comorbidity Survey dealt with changes from baseline to follow up regarding DSM-III-R vs. IV diagnoses of alcohol abuse/dependence. Any information regarding this topic would be greatly appreciated.

The NCS-R (cross-sectional) and the NCS-2 (panel data, not yet public) use algorithms to score the DSM-IV diagnoses of all disorders. The baseline NCS used algorithms to score DSM-III-R diagnoses of all disorders. We simply report this distinction when presenting analyses that compare the two.

Q: As I'm looking at the CPES and NCS-R data sets, I can see that there are some questions under the heading "Childhood" that ask about abuse and neglect as an extension of "List A". I don't see "List A" anywhere in the data. If you could direct me to the right area, that would be great! Also, if there are other variables concerning maltreatment in the data set, please let me know.

Go to the NCS website and look at the respondent booklet. There are questions on abuse in the childhood section, the marriage section, and the PTSD section, but you should thoroughly examine the instrument on the NCS website for other questions. The instrument can be found at

Q: I have downloaded the NCS-R data and codebook and am interested in the coding of A- for the variable "SR1181 "- Label "Main hope for treatment." What does each of these letters represent? Is it in the codebook and I have just been unable to locate it?

This question should be directed to the CPES help line.

Q: a) The article by Benjamin Druss et al., "Understanding Mental Health Treatment in Persons without Mental Health Diagnoses," in Arch Gen Psychiatry 64(10), Oct 2007, states on page 1198 that detailed results on patients' utilization of various providers are available upon request. I am interested in finding out more about these data. Are they the data from "Twelve-Month Use of Mental Health Services in the United States" (2005)?

The appendix tables are located on our website at

b) I am interested in other data on CAM that were analyzed or published in this regard. I would like to know whether a secondary analysis of CAM use and its correlates in the NCS-R has already been done or remains to be done. In the 2000-2008 publication list on the National Comorbidity website, I saw CAM utilization data relative to demographics and DSM-IV diagnosis only in "Twelve-Month Use of Mental Health Services in the United States" (2005), and nothing more in-depth on CAM correlates with other variables of the NCS-R.

We have not done any CAM-specific analyses other than those listed in the publication list on our website.

Q: In your appendix Table 2 (12-month prevalence of DSM-IV / WMH-CIDI disorders by sex and cohort), footnote “2” is referenced at the end of “Adult Separation Anxiety Disorder.” Is this correct, or should it instead be “4”? I’m asking in order to determine the age range of sample respondents for this disorder. Other sources I have come across that use the NCS-R have referenced something similar to your footnote “4” for “adult separation anxiety disorder,” limiting the age range to 18-44. It also seems odd that going down the table the footnotes go in the order 1, 2, 3, 5, skipping 4, which leads me to believe that this might be a typo.

The footnote is correct.

Q: I am in the process of applying for access to the restricted data set to complete some analyses on data from NCS-R participants who reported (or were reported as having) a hearing loss (items SC10_4b, SC10_5a2, SC10_5a4). According to the summary data that was posted on the website at one time, 269 people reported having a hearing problem that prevents them from hearing what is said in a conversation, even with a hearing aid. I am wondering: given that the survey is administered by an interviewer using a computer, how did the interviewer communicate with the deaf person so the deaf person could complete the survey? Was there a sign language interpreter? Did the deaf person read the items on the screen themselves? I'd like to be able to describe the method of data collection with the sample of people with hearing problems.

We did not use sign language interpreters and we did not instruct interviewers to let the respondent read the computer screen. My assumption is that these cases are a combination of respondents who misunderstood the extremeness of the question (maybe because they were hard of hearing!), interviewers speaking loudly throughout the interview, and possibly a few miscoded answers.

Q: Can you recommend an appropriate citation for both NCS-R lifetime prevalence estimates and 12-month prevalence estimates?


You can cite the webpage as you would any internet citation. The following papers are citations for lifetime and 12-month prevalence in the NCS-R, but do not have the breakdown by gender and age.

Kessler, R.C., Berglund, P.A., Demler, O., Jin, R., Merikangas, K.R., & Walters, E.E. (2005). Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication (NCS-R). Archives of General Psychiatry, 62(6), 593-602.

Kessler, R.C., Chiu, W.T., Demler, O., Merikangas, K.R., & Walters, E.E. (2005). Prevalence, severity, and comorbidity of twelve-month DSM-IV disorders in the National Comorbidity Survey Replication (NCS-R). Archives of General Psychiatry, 62(6), 617-627.

Q: Where do the occupation and industry codes come from for the NCS-R? 

These codes come from the ISCO-88 (International Standard Classification of Occupations).

Q: I'm doing research on mental health in a global/comparative context, and one source I've been looking at is the World Mental Health Survey at  In looking over the site, I didn't see any apparent route or means for accessing the data. Are the data from this project available to faculty/students for research purposes? If so, how might the data be accessed?

Requests for permission to access international WMH survey data should be sent to the coordinating center at Harvard Medical School via this email:

Q: When will the NCS-2 data be available for public use?

We do not have an exact release date from SAMHDA as yet, but SAMHDA can make restricted copies of the dataset available prior to public release.  Interested parties will need to apply for restricted release; for more information please click here:

Q: In looking at the NCS-R data set, I found in the FAQ section of the CPES site that the NCS-R data was compromised by invalid skip logic executed during interviews. I have been unable to find any details on the nature or severity of the invalid logic, and I would very much appreciate if you could provide more details on this issue.

We selected a clinical sample using the original CIDI. When we conducted the clinical interview (SCID) for OCD, we also re-interviewed these respondents using a revised CIDI with the skip error removed. We then used multiple imputation to impute the disorders for those subjects who were not re-interviewed. We do not make our multiply imputed data available to the public due to the extensive support we would need to provide for its use.

Q: I have a question about the suicidal ideation question of the National Comorbidity Survey Replication. Why are 1588 (17.1%) participants missing data on the SD2 variable? Were they not asked about suicidal ideation?

The instrument skip logic is such that if a respondent reports being unable to read, they are skipped to SD15 and asked a version of this question there. This explains why they are missing at SD2. They were asked about suicidal ideation at SD15.
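For analysts who want a single ideation indicator, the skip pattern described above suggests taking the SD2 response where it exists and falling back to SD15 otherwise. The Python sketch below shows that logic only. It assumes, hypothetically, that a skipped item is represented as `None` and that SD2 and SD15 use the same response codes; neither assumption is confirmed here, so check the codebook before combining the two items.

```python
def combine_ideation(sd2, sd15):
    """Return the SD2 response if present, otherwise the SD15 response.

    None stands for "not asked / missing" (an assumption about how the
    analyst has read in the data, not a documented NCS-R code).
    """
    return sd2 if sd2 is not None else sd15


# Hypothetical respondents as (SD2, SD15) pairs.
respondents = [(1, None), (None, 5), (None, None)]
combined = [combine_ideation(sd2, sd15) for sd2, sd15 in respondents]
print(combined)
```

Respondents missing on both items stay missing in the combined variable.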

Q: I am a Ph.D. candidate who is struggling with how to address missing data with categorical variables in the NCS-R dataset. I am looking at the construction of variables using categorical items in published studies. Do you, by chance, have information on how the variable for Neglect in the following study was developed?

Green, J.G., McLaughlin, K.A., Berglund, P.A., Gruber, M.J., Sampson, N.A., Zaslavsky, A.M., & Kessler, R.C. (2010). Childhood adversities and adult psychiatric disorders in the National Comorbidity Survey Replication I: associations with first onset of DSM-IV disorders. Archives of General Psychiatry, 67(2), 113-123.

There are two issues: How to construct scales given the available items; and how to deal with missing data. On the first one: We did not release scale procedures because we do not want to reify the simple scales based on the small number of items about diverse constructs we used in the survey. But we made all items available, so you can decide on your own which items you want to try to put together by doing your own factor analyses and deciding on how to construct scales. As to missing values: A number of standard procedures exist. We typically use rational imputation when we can. Then we use mean/median/mode imputation when the amount of missing data is small and multiple imputation when the amount of missing data is larger. A number of methods exist to do MI and you will have to seek local consultation from statistics people on how to select among them and how to implement the procedures.
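The "simple imputation when little is missing" step described above might look something like the following Python sketch. It is an illustration under stated assumptions, not the procedure used in the published analyses: missing values are represented as `None`, and the 5% cutoff (`max_missing_share`) is a hypothetical placeholder, since the answer gives no numeric threshold. When more than that share is missing, the function refuses to fill and leaves the choice of a multiple imputation method to the analyst.

```python
from statistics import mean, median, mode


def simple_impute(values, method="mode", max_missing_share=0.05):
    """Fill missing entries (None) with the mean, median, or mode.

    Raises ValueError when the share of missing values exceeds
    max_missing_share, a hypothetical cutoff chosen for illustration;
    in that case multiple imputation would be the route described above.
    """
    if not values:
        return []
    observed = [v for v in values if v is not None]
    missing_share = 1 - len(observed) / len(values)
    if missing_share > max_missing_share:
        raise ValueError("too much missing data for single-value imputation")
    fill = {"mean": mean, "median": median, "mode": mode}[method](observed)
    return [fill if v is None else v for v in values]


# A categorical item with one missing response out of 24.
item = [1, 1, 2, None] + [1] * 20
filled = simple_impute(item, method="mode")
print(filled[3])
```

For categorical items the mode is usually the only sensible choice of the three; the mean and median options apply to ordinal or continuous items.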