Large-Scale Analysis of HTTP Response Headers

Poster Number

118

Session Title

Physical Sciences, Math, and Computer Science

College

College of Business Administration

Department

Computer Science & Quantitative Methods

Faculty Mentor

Andrew Besmer, Ph.D.; R. Stephen Dannelly, Ph.D.; and William Thacker, Ph.D.

Abstract

This paper examines trends in the use of security-related HTTP response headers: how long they take to become widely adopted after release, and how quickly they are phased out after deprecation. The data come from Common Crawl's monthly web crawls, which collect responses from what we treat as approximating the entire internet. The data are delivered as JSON in WAT format and analyzed in Python on an AWS EMR cluster running PySpark, which allows the data to be analyzed in parallel across the nodes in the cluster. For this research, we will analyze the entire dataset as well as a subset representative of Fortune 500 companies. We will check each website in the dataset for the presence of 16 different security-related HTTP response headers (e.g., X-XSS-Protection). The change in each header's presence over several months indicates the speed of its adoption or abandonment.
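
The per-record check described in the abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the header list is a hypothetical subset (the abstract names only X-XSS-Protection and does not list all 16), the nested key names follow the WAT layout as we understand it from Common Crawl's published format, and the function names and sample records are invented for the example.

```python
import json
from collections import Counter

# Hypothetical subset for illustration; the abstract names only
# X-XSS-Protection and does not list the 16 headers it tracks.
SECURITY_HEADERS = (
    "x-xss-protection",
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
)


def security_headers_present(wat_line):
    """Return the tracked headers found in one WAT JSON record.

    Returns None when the line is not valid JSON or is not an HTTP response.
    The key names below are assumed from the WAT format.
    """
    try:
        record = json.loads(wat_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    payload = record.get("Envelope", {}).get("Payload-Metadata", {})
    response = payload.get("HTTP-Response-Metadata")
    if not response:
        return None  # request or metadata record, not an HTTP response
    # HTTP header names are case-insensitive, so normalize before comparing.
    names = {name.lower() for name in response.get("Headers", {})}
    return names.intersection(SECURITY_HEADERS)


def adoption_rates(wat_lines):
    """Fraction of HTTP responses that carry each tracked header."""
    counts = Counter()
    total = 0
    for line in wat_lines:
        present = security_headers_present(line)
        if present is None:
            continue
        total += 1
        counts.update(present)
    return {h: (counts[h] / total if total else 0.0) for h in SECURITY_HEADERS}


def _response(headers):
    """Build a made-up WAT-style response record for the example."""
    return json.dumps(
        {"Envelope": {"Payload-Metadata": {
            "HTTP-Response-Metadata": {"Headers": headers}}}}
    )


# Made-up sample: two responses and one request record (which is skipped).
SAMPLE_WAT_LINES = [
    _response({"X-XSS-Protection": "1; mode=block", "Content-Type": "text/html"}),
    _response({"content-type": "text/html"}),
    json.dumps({"Envelope": {"Payload-Metadata": {"HTTP-Request-Metadata": {}}}}),
]

if __name__ == "__main__":
    for header, rate in adoption_rates(SAMPLE_WAT_LINES).items():
        print(f"{header}: {rate:.2f}")
```

Computing these rates once per monthly crawl and comparing them across months would give the adoption or abandonment trend the abstract describes. On a cluster, a function like security_headers_present could presumably be mapped over the WAT lines in PySpark rather than looped over locally; that step is omitted here to keep the sketch runnable without Spark.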

Grant Support?

Supported by funding from Amazon

Start Date

24-4-2020 12:00 AM


 