NBER Patent Database - 2006

See the official NBER Patent Data Project website for the latest data. This site contains some interim files and additional data that may not be fully cleaned.

Errors on this website can be reported to Bronwyn H. Hall. Other queries about the patent data (such as “when will it be available?”) should be sent to the NBER patent data site. If you cannot read the Stata 9 file format, I suggest acquiring a copy of Stat/Transfer, an excellent program that converts data files to and from a wide variety of formats. Please do not ask me to convert data for you. The following data are available here:

US patents issued 1976-2006

US patent citations 1976-2006


Patent data for 1976-2006 – This file contains the application and grant years, the US patent class and its HJT recode to technology classes, all the original IPCs for the patent, the original assignee number, and all the pdpass numbers that result from splitting jointly owned patents, standardizing names, and so on. The pdpass variable is the one used for matching to Compustat. US class and IPC classification information is available here.


Note that a patent can have multiple pdpass values and multiple IPCs (icl), so this file has only 3,209,376 unique patent numbers across 4,855,982 observations. Be careful not to double count!
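Because one patent can occupy several rows, any patent count should be taken over distinct patent numbers rather than over rows. A minimal Python sketch of that idea, using a few made-up rows that carry only four of the file's columns (patent, pdpass, icl, nclass); the rows and the helper name `count_patents_by_class` are hypothetical, for illustration only:

```python
from collections import defaultdict

# Toy rows shaped like the patent file: (patent, pdpass, icl, nclass).
# The values are invented for illustration; they are not real records.
rows = [
    (4000001, 101, "A61K 31/00", 514),
    (4000001, 101, "C07D 213/00", 514),  # same patent, second IPC
    (4000001, 202, "A61K 31/00", 514),   # same patent, second pdpass
    (4000002, 303, "G06F 15/00", 364),
]

def count_patents_by_class(rows):
    """Count distinct patent numbers per US class, ignoring repeated rows."""
    seen = defaultdict(set)
    for patent, _pdpass, _icl, nclass in rows:
        seen[nclass].add(patent)
    return {nclass: len(patents) for nclass, patents in seen.items()}

n_obs = len(rows)                      # rows, not patents
n_patents = len({r[0] for r in rows})  # distinct patent numbers
print(n_obs, n_patents)                # 4 2
print(count_patents_by_class(rows))    # {514: 1, 364: 1}
```

Counting rows here would report four patents in class 514 terms of observations, while the set-based count correctly reports one.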


File format:

  obs:     4,855,982                          Patents granted through 2006,
                                                including IPCs
 vars:            13                          28 Mar 2009 15:28
 size:   267,079,010 (57.5% of memory free)

              storage  display     value
variable name   type   format      label      variable label

appyear         int    %8.0g                  Year patent applied for
assignee        long   %12.0g                 Original assignee number
cat             byte   %8.0g                  HJT tech class (1-6)
gyear           int    %8.0g                  Year patent granted
icl             str18  %18s                   clas/international
icl_class       str4   %9s                    Main 4-char IPC
icl_maingroup   float  %9.0g                  Main group within 4char IPC
iclnum          byte   %8.0g                  clas/icl seq. number (imc)
nclass          int    %8.0g                  3-digit US patent class (10=D)
patent          long   %12.0g                 patent number
pdpass          long   %12.0g                 Unique assignee number for
                                                match to CS
subcat          byte   %8.0g                  HJT technology
subclass        float  %9.0g                  Numerical subclass

Sorted by:  patent  pdpass  icl
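The reported file sizes can be reproduced from the storage types, which helps when estimating how much memory a load will need. Stata stores byte, int, long and float values in 1, 2, 4 and 4 bytes and a strN variable in N bytes. Both the patent listing and the citations listing report a size equal to the number of observations times (row width + 4). We assume the extra 4 bytes are per-observation overhead in the Stata version that produced these listings. This is a back-of-the-envelope sketch, and `dataset_size` is a hypothetical helper:

```python
# Byte widths of Stata numeric storage types; strN takes N bytes.
WIDTH = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}

def type_width(stype):
    """Bytes used by one value of the given Stata storage type."""
    if stype.startswith("str"):
        return int(stype[3:])
    return WIDTH[stype]

def dataset_size(n_obs, storage_types, overhead=4):
    """n_obs * (sum of variable widths + assumed per-observation overhead)."""
    width = sum(type_width(t) for t in storage_types)
    return n_obs * (width + overhead)

# Storage types in the order of the patent file's variable list (width 51).
patent_types = ["int", "long", "byte", "int", "str18", "str4", "float",
                "byte", "int", "long", "long", "byte", "float"]
# citing, cited, ncites7606 (width 10).
citation_types = ["long", "long", "int"]

print(dataset_size(4_855_982, patent_types))     # 267079010
print(dataset_size(23_650_891, citation_types))  # 331112474
```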


Citations data for 1976-2006


File format:

  obs:    23,650,891                        USPTO citing-cited patent pairs
                                            for patents issued 1976-2006
 vars:             3                          23 Aug 2008 10:39
 size:   331,112,474

              storage  display     value
variable name   type   format      label      variable label

citing          long   %12.0g                 Number of citing patent
cited           long   %12.0g                 Number of cited patent
ncites7606      int    %9.0g                  Tot cites received by pats iss

Sorted by:  cited  citing
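Each row is one citing-cited pair, and the file is sorted by cited and then by citing. Because of this ordering, citations received per cited patent can be counted in one streaming pass without holding a table of all patents in memory. A minimal Python sketch with made-up pairs follows. `cites_received` is a hypothetical helper, and we assume, without having checked, that its count over the full file matches ncites7606:

```python
from itertools import groupby
from operator import itemgetter

# Toy (citing, cited) pairs, already sorted by cited and then citing,
# as the citations file is. Values are invented for illustration.
pairs = [
    (5000010, 4000001),
    (5000020, 4000001),
    (5000030, 4000001),
    (5000030, 4000002),
]

def cites_received(pairs):
    """Yield (cited, number of citing rows) for pairs sorted by cited.

    groupby only merges adjacent rows, so this relies on the sort order.
    """
    for cited, group in groupby(pairs, key=itemgetter(1)):
        yield cited, sum(1 for _ in group)

print(list(cites_received(pairs)))  # [(4000001, 3), (4000002, 1)]
```

If the rows were not sorted by cited, `collections.Counter` over the cited column would give the same totals at the cost of keeping every cited patent in memory.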


Last Updated 26 July 2018 by Bronwyn H. Hall