Causality and Observational Case - Pharmaceutical Detailing

Author: Jingy
Instructor: Dr. Avery Haviv
Date: Dec 2021

(picture from: Pharmaceutical Technology)

This case revolves around a data of real-world observations from a pharmaceutical company. The company markets to doctors through detailed visiting, where pharmaceutical representatives meet directly with doctors who might prescribe the drug to inform them of its capabilities. Detailing are an important part of the U.S. pharmaceutical industry, and more is spent on it than on clinical trials, free samples, educational conferences, and other forms of advertising.

The data tracks the number of prescriptions each doctor wrote for the drug in a given month, the number of detailed visits they received, and other characteristics. The company is interested in targeting their detailed visits more effectively by identifying which doctors are most likely to increase their prescriptions.

(picture from: The Pew Charitable Trusts)

The descriptions of the variables are as follows:

Explore the Dataset

detailData = read.csv('Detailing Case Data.csv')
names(detailData)
## [1] "X"              "scripts"        "detailing"      "lagged_scripts"
## [5] "mean_samples"   "doctorType"     "doctorID"
summary(detailData)
##        X            scripts         detailing      lagged_scripts  
##  Min.   :    1   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 5751   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000  
##  Median :11500   Median : 3.000   Median : 2.000   Median : 3.000  
##  Mean   :11500   Mean   : 5.061   Mean   : 1.883   Mean   : 5.093  
##  3rd Qu.:17250   3rd Qu.: 6.000   3rd Qu.: 3.000   3rd Qu.: 6.000  
##  Max.   :23000   Max.   :96.000   Max.   :18.000   Max.   :96.000  
##   mean_samples     doctorType           doctorID     
##  Min.   :0.0000   Length:23000       Min.   :   1.0  
##  1st Qu.:0.1424   Class :character   1st Qu.: 250.8  
##  Median :0.3435   Mode  :character   Median : 500.5  
##  Mean   :0.5640                      Mean   : 500.5  
##  3rd Qu.:0.7783                      3rd Qu.: 750.2  
##  Max.   :4.9609                      Max.   :1000.0

Observation:

Basic Correlations

First, we can check the correlation of interested columns scripts and detailing, whether there is a positive correlation between these two variables.

cor(detailData$scripts,detailData$detailing)
## [1] 0.2175696
cor.test(detailData$scripts,detailData$detailing)
## 
##  Pearson's product-moment correlation
## 
## data:  detailData$scripts and detailData$detailing
## t = 33.804, df = 22998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2052229 0.2298471
## sample estimates:
##       cor 
## 0.2175696

Observation:

Next, we can check multiple correlations.

cor(detailData[,c('scripts','detailing','mean_samples')])
##                scripts detailing mean_samples
## scripts      1.0000000 0.2175696    0.4140847
## detailing    0.2175696 1.0000000    0.3766691
## mean_samples 0.4140847 0.3766691    1.0000000

Observation:

scripts is positively correlated with detailing and mean_samples.

Interpretation of a Univariate Regression

First, we run a simple linear regression with scripts as dependent variable and detailing as the independent variable.

summary(lm(scripts~detailing,data=detailData))
## 
## Call:
## lm(formula = scripts ~ detailing, data = detailData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.448  -3.990  -2.231   0.889  90.829 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.29142    0.07081   46.48   <2e-16 ***
## detailing    0.93977    0.02780   33.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.232 on 22998 degrees of freedom
## Multiple R-squared:  0.04734,    Adjusted R-squared:  0.0473 
## F-statistic:  1143 on 1 and 22998 DF,  p-value: < 2.2e-16

Observation:

Multivariate Analysis

Next, we will control for different variables. The multiple linear regression with scripts as dependent variable and detailing, algged_scripts, mean_samples as the independent variables.

summary(lm(scripts~detailing+lagged_scripts+mean_samples,data=detailData))
## 
## Call:
## lm(formula = scripts ~ detailing + lagged_scripts + mean_samples, 
##     data = detailData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.565  -1.850  -0.413   1.511  32.228 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.330081   0.041177   8.016 1.14e-15 ***
## detailing      0.071813   0.016308   4.404 1.07e-05 ***
## lagged_scripts 0.809921   0.003831 211.427  < 2e-16 ***
## mean_samples   0.834614   0.046108  18.101  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.921 on 22996 degrees of freedom
## Multiple R-squared:  0.7201, Adjusted R-squared:   0.72 
## F-statistic: 1.972e+04 on 3 and 22996 DF,  p-value: < 2.2e-16

Observation:

Categorical Variables

There are three different kinds of doctors in this dataset (General Physicians, Area Specialists, and Other Specialists). Different types of doctor may have different attitudes to prescribe this drug, so we need to treat this as a categorical variable.

summary(lm(scripts~detailing+lagged_scripts+mean_samples+factor(doctorType),data=detailData))
## 
## Call:
## lm(formula = scripts ~ detailing + lagged_scripts + mean_samples + 
##     factor(doctorType), data = detailData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.384  -1.840  -0.300   1.538  32.859 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          2.09960    0.08008  26.218  < 2e-16 ***
## detailing                            0.09579    0.01615   5.931 3.05e-09 ***
## lagged_scripts                       0.75809    0.00428 177.109  < 2e-16 ***
## mean_samples                         0.88280    0.04586  19.249  < 2e-16 ***
## factor(doctorType)General Physician -1.92859    0.07674 -25.132  < 2e-16 ***
## factor(doctorType)Other Specialist  -1.95691    0.09037 -21.653  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.865 on 22994 degrees of freedom
## Multiple R-squared:  0.7279, Adjusted R-squared:  0.7278 
## F-statistic: 1.23e+04 on 5 and 22994 DF,  p-value: < 2.2e-16

Observation:

Interaction Effects

Next, we need to focus more on the company’s fundamental questio: who should they target? Here, we need to figure out the interaction effect of detailing and doctorType, in order to know the effect of detailing change for different groups.

summary(lm(scripts~detailing*factor(doctorType)+lagged_scripts+mean_samples,data=detailData))
## 
## Call:
## lm(formula = scripts ~ detailing * factor(doctorType) + lagged_scripts + 
##     mean_samples, data = detailData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.623  -1.836  -0.358   1.541  32.337 
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                    1.616035   0.098075  16.478
## detailing                                      0.364417   0.035337  10.313
## factor(doctorType)General Physician           -1.327022   0.106831 -12.422
## factor(doctorType)Other Specialist            -1.333558   0.121866 -10.943
## lagged_scripts                                 0.753227   0.004314 174.618
## mean_samples                                   0.887155   0.045947  19.308
## detailing:factor(doctorType)General Physician -0.319701   0.039395  -8.115
## detailing:factor(doctorType)Other Specialist  -0.359380   0.050366  -7.135
##                                               Pr(>|t|)    
## (Intercept)                                    < 2e-16 ***
## detailing                                      < 2e-16 ***
## factor(doctorType)General Physician            < 2e-16 ***
## factor(doctorType)Other Specialist             < 2e-16 ***
## lagged_scripts                                 < 2e-16 ***
## mean_samples                                   < 2e-16 ***
## detailing:factor(doctorType)General Physician 5.09e-16 ***
## detailing:factor(doctorType)Other Specialist  9.94e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.859 on 22992 degrees of freedom
## Multiple R-squared:  0.7288, Adjusted R-squared:  0.7287 
## F-statistic:  8825 on 7 and 22992 DF,  p-value: < 2.2e-16

Observation:

Fixed Effects

Next we want to identity each doctor to see is it possible that even within a doctor type, different doctors have different attitudes to perscribe the drug and detailing visit. And, including a large categorical variable like this is formally called a fixed effect.

#install.packages('lfe', repos='http://cran.us.r-project.org')
library('lfe')
## Loading required package: Matrix
summary(felm(scripts~detailing*factor(doctorType)+lagged_scripts+mean_samples|doctorID, data = detailData))
## Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either
## rank-deficient or indefinite

## Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either
## rank-deficient or indefinite
## 
## Call:
##    felm(formula = scripts ~ detailing * factor(doctorType) + lagged_scripts +      mean_samples | doctorID, data = detailData) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.112  -1.481  -0.280   1.279  43.134 
## 
## Coefficients:
##                                                Estimate Std. Error t value
## detailing                                      0.303723   0.037798   8.035
## factor(doctorType)General Physician                  NA         NA      NA
## factor(doctorType)Other Specialist                   NA         NA      NA
## lagged_scripts                                 0.277893   0.006519  42.627
## mean_samples                                         NA         NA      NA
## detailing:factor(doctorType)General Physician -0.249304   0.043776  -5.695
## detailing:factor(doctorType)Other Specialist  -0.244269   0.055680  -4.387
##                                               Pr(>|t|)    
## detailing                                     9.79e-16 ***
## factor(doctorType)General Physician                 NA    
## factor(doctorType)Other Specialist                  NA    
## lagged_scripts                                 < 2e-16 ***
## mean_samples                                        NA    
## detailing:factor(doctorType)General Physician 1.25e-08 ***
## detailing:factor(doctorType)Other Specialist  1.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.371 on 21996 degrees of freedom
## Multiple R-squared(full model): 0.8021   Adjusted R-squared: 0.793 
## Multiple R-squared(proj model): 0.08132   Adjusted R-squared: 0.03943 
## F-statistic(full model):88.86 on 1003 and 21996 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 278.2 on 7 and 21996 DF, p-value: < 2.2e-16

Observation:

Conclusion

The firm should target Area Specialists, it should also give out more free samples, as they have a very strong relationship with scripts. There may have many more potential variables to consider. Detailing visits in previous months, and the pharma rep all seem immediately relevant.

The key takeaways from his case which relate to important business problems: