


I am new to the world of coding, so please bear with me if I misuse terminology or generally do not know what I am talking about. I am doing a research project in which I am trying to scrape public company 10-Ks from sec.gov via EDGAR. I have read various sources and watched various videos, but I found the reference below to be the most relevant to my project, and quite frankly, it is easy for me to follow along with. The explanation for my code begins on page 194, and the code is on page 195. I am first attempting to download the index files (image below), which will help me write code to get 10-Ks specifically. So, I am in the early stages of my project.

This is just a reference to the paper I am using. It is currently on SSRN, so I realize not everyone may have access. I would upload the PDF, but I don't see that as an option. I listed this purely to show I have a source for what I am doing. I can provide screenshots if necessary.

Anand, V., Bochkay, K., Chychyla, R., & Leone, A. Using Python for text analysis in accounting research. Forthcoming, Foundations and Trends in Accounting.

Currently, I have two issues: my code doesn't work as intended, and I appear to be getting blocked by sec.gov. I will discuss the former first and the latter at the end.

When I run the code below, it should download the index files for both start_year and end_year to the down_direct path. However, this code only grabs the 2018 index files. The log/IDLE shell results below show a "successful" and an unsuccessful run. The unsuccessful run makes me think I have been blocked by sec.gov. It is my understanding that certain websites look for requests from urllib.request and may automatically screen for them. However, sec.gov is researcher friendly as long as you attempt downloads after hours in spaced attempts, both of which I have done (I worked on this from 7pm to 10pm last night and waited 10ish minutes between attempts).

    import os

    def get_index(start_year:int, end_year:int, down_direct:str):
        down_direct = r"C:/Users/Documents/Student Files/~Current Student/~RESEARCH/~First Summer Paper/Data/EDGAR/"
        for year in range(start_year, end_year+1):
            Url = r"" + str(year) + '/' + 'QTR' + str(qtr) + '/master.idx'
            dl_file = down_direct + 'master' + str(year) + str(qtr) + '.idx'

    down_direct = os.path.join((), 'edgar', 'indexfiles')

Log from the "successful" run:

    Downloaded C:/Users/Documents/Student Files/~Current Student/~RESEARCH/~First Summer Paper/Data/EDGAR/master20184.idx

My questions:

How should I adjust my code to make it run as intended (i.e., pull all 4 qtrs of the start_year and end_year)?

Am I being blocked by sec.gov? If so, can I tweak my code to get around that?
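
For reference, the behavior I am after (all 4 qtrs for every year from start_year through end_year) could be sketched roughly as below. This is only a minimal sketch under assumptions, not the code from the paper: the helper names build_index_targets and fetch_index, the base_url and user_agent parameters, and the pause between requests are all hypothetical and introduced for illustration. The base URL of the index directory is passed in rather than hard-coded, and a descriptive User-Agent header is attached to each request on the assumption that the default urllib.request header may be what gets screened.

```python
import os
import time
import urllib.request


def build_index_targets(start_year, end_year, down_direct, base_url):
    """Return (url, local_path) pairs for all 4 quarters of each year in range."""
    targets = []
    for year in range(start_year, end_year + 1):
        # The inner loop is what covers QTR1..QTR4 for every year.
        for qtr in range(1, 5):
            url = f"{base_url.rstrip('/')}/{year}/QTR{qtr}/master.idx"
            dl_file = os.path.join(down_direct, f"master{year}{qtr}.idx")
            targets.append((url, dl_file))
    return targets


def fetch_index(url, dl_file, user_agent):
    """Download one index file, sending an explicit User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp, open(dl_file, "wb") as out:
        out.write(resp.read())


def get_index(start_year, end_year, down_direct, base_url, user_agent, pause=1.0):
    """Download every quarterly index file, pausing between requests."""
    os.makedirs(down_direct, exist_ok=True)
    targets = build_index_targets(start_year, end_year, down_direct, base_url)
    for url, dl_file in targets:
        fetch_index(url, dl_file, user_agent)
        print("Downloaded", dl_file)
        time.sleep(pause)  # assumed spacing between requests
```

Calling get_index(start_year, end_year, down_direct, base_url, user_agent) with this layout would attempt one download per quarter per year. Building the file path with os.path.join instead of string concatenation also avoids depending on a trailing slash in down_direct.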
