Large-Scale Analysis of HTTP Response Headers
Poster Number
118
Session Title
Physical Sciences, Math, and Computer Science
College
College of Business Administration
Department
Computer Science & Quantitative Methods
Faculty Mentor
Andrew Besmer, Ph.D.; R. Stephen Dannelly, Ph.D.; and William Thacker, Ph.D.
Abstract
This paper examines trends in the use of security-related HTTP response headers: how long they take to become widely adopted after release, and how quickly they are phased out after deprecation. The data come from Common Crawl's monthly web crawls, which collect responses from what we treat as effectively the entire public web. The crawl metadata are delivered as JSON in WAT format and analyzed in Python on an AWS EMR cluster running PySpark, which allows the data to be processed in parallel across the nodes of the cluster. For this research, the entire dataset will be analyzed, as well as a subset representative of Fortune 500 companies. For each website in the dataset, we will check for the presence of 16 different HTTP response headers that pertain to security (e.g., X-XSS-Protection). Tracking the presence of each header across several monthly crawls indicates how quickly it is adopted or abandoned.
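The header check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each WAT record is a JSON object whose response headers sit under Envelope, Payload-Metadata, HTTP-Response-Metadata, Headers, and it tracks only an illustrative handful of headers, since the study's full list of 16 is not given here. The function names are hypothetical.

```python
import json
from collections import Counter

# Illustrative subset only (an assumption); the study's 16 headers are not listed.
SECURITY_HEADERS = (
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
)


def response_headers(wat_json):
    """Return the HTTP response headers of one WAT record, or None.

    Records without response metadata (for example request records)
    yield None so that they can be skipped by the caller.
    """
    record = json.loads(wat_json)
    meta = (
        record.get("Envelope", {})
        .get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata")
    )
    if meta is None:
        return None
    return meta.get("Headers", {})


def security_headers_present(headers):
    """Return the tracked security headers that appear in a header mapping."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    seen = {name.lower() for name in headers}
    return seen.intersection(SECURITY_HEADERS)


def adoption_rates(wat_records):
    """Fraction of HTTP responses in one crawl that carry each tracked header."""
    counts = Counter()
    total = 0
    for line in wat_records:
        headers = response_headers(line)
        if headers is None:
            continue  # not an HTTP response record
        total += 1
        counts.update(security_headers_present(headers))
    return {h: (counts[h] / total if total else 0.0) for h in SECURITY_HEADERS}


def make_record(headers):
    """Build a tiny synthetic WAT-style record for demonstration."""
    return json.dumps(
        {"Envelope": {"Payload-Metadata": {"HTTP-Response-Metadata": {"Headers": headers}}}}
    )


sample_crawl = [
    make_record({"X-XSS-Protection": "1; mode=block", "Content-Type": "text/html"}),
    make_record({"Content-Type": "text/html"}),
    json.dumps({"Envelope": {"Payload-Metadata": {}}}),  # skipped: no response
]
rates = adoption_rates(sample_crawl)
print(rates["x-xss-protection"])  # 0.5
```

Computing these rates once per monthly crawl and comparing them over time gives the adoption or abandonment trend. In a PySpark job the same per-record function could presumably be mapped over the records in parallel, although the sketch above runs on a single machine.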
Grant Support?
Supported by funding from Amazon
Start Date
24-4-2020 12:00 AM