--- title: "policy_learn" output: rmarkdown::html_vignette: fig_caption: true toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{policy_learn} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: ref.bib --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r lib, message = FALSE} library("data.table") library("polle") ``` This vignette is a guide to `policy_learn()` and some of the associated S3 methods. The purpose of `policy_learn` is to specify a policy learning algorithm and estimate an optimal policy. For details on the methodology, see the associated paper [@nordland2023policy]. We consider a fixed two-stage problem as a general setup and simulate data using `sim_two_stage()` and create a `policy_data` object using `policy_data()`: ```{r simdata} d <- sim_two_stage(n = 2e3, seed = 1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B", "BB"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd ``` ## Specifying and applying a policy learner `policy_learn()` specify a policy learning algorithm via the `type` argument: Q-learning (`ql`), doubly robust Q-learning (`drql`), doubly robust blip learning (`blip`), policy tree learning (`ptl`), and outcome weighted learning (`owl`). Because each policy learning type has varying control arguments, these are passed as a list using the `control` argument. To help the user set the required control arguments and to provide documentation, each type has a helper function `control_type()` which sets the default control arguments and overwrite values if supplied by the user. As an example we specify a doubly robust blip learner: ```{r plblip} pl_blip <- policy_learn( type = "blip", control = control_blip( blip_models = q_glm(formula = ~ BB + L + C) ) ) ``` For details on the implementation, see Algorithm 3 in [@nordland2023policy]. The only required control argument for blip learning is a model input. The `blip_models` argument expects a `q_model`. In this case we input a simple linear model as implemented in `q_glm`. The output of `policy_learn()` is again a function: ```{r plblipout} pl_blip ``` In order to apply the policy learner we need to input a `policy_data` object and nuisance models `g_models` and `q_models` for computing the doubly robust score. ```{r plblipapply} (po_blip <- pl_blip( pd, g_models = list(g_glm(), g_glm()), q_models = list(q_glm(), q_glm()) )) ``` ## Cross-fitting the doubly robust score Like `policy_eval()` is it possible to cross-fit the doubly robust score used as input to the policy model. The number of folds for the cross-fitting procedure is provided via the `L` argument. As default, the cross-fitted nuisance models are not saved. The cross-fitted nuisance models can be saved via the `save_cross_fit_models` argument: ```{r plcross} pl_blip_cross <- policy_learn( type = "blip", control = control_blip( blip_models = q_glm(formula = ~ BB + L + C) ), L = 2, save_cross_fit_models = TRUE ) po_blip_cross <- pl_blip_cross( pd, g_models = list(g_glm(), g_glm()), q_models = list(q_glm(), q_glm()) ) ``` From a user perspective, nothing has changed. However, the policy object now contains each of the cross-fitted nuisance models: ```{r plcrossinspect} po_blip_cross$g_functions_cf ``` ## Realistic policy learning Realistic policy learning is implemented for types `ql`, `drql`, `blip` and `ptl` (for a binary action set). 
## Realistic policy learning

Realistic policy learning is implemented for the types `ql`, `drql`, `blip`, and `ptl` (for a binary action set). The `alpha` argument sets the probability threshold for defining the realistic action set. For implementation details, see Algorithm 5 in [@nordland2023policy]. Here we set a 5\% restriction:

```{r pl_alpha}
pl_blip_alpha <- policy_learn(
  type = "blip",
  control = control_blip(
    blip_models = q_glm(formula = ~ BB + L + C)
  ),
  alpha = 0.05,
  L = 2
)
po_blip_alpha <- pl_blip_alpha(
  pd,
  g_models = list(g_glm(), g_glm()),
  q_models = list(q_glm(), q_glm())
)
```

The policy object now lists the `alpha` level as well as the g-model used to define the realistic action set:

```{r viewalpha}
po_blip_alpha$alpha
po_blip_alpha$g_functions
```

## Implementation/Simulation and `get_policy_functions()`

A `policy` function is great for evaluating a given policy or even for implementing or simulating from a single-stage policy. However, it is not useful for implementing or simulating from a learned multi-stage policy. To access the policy function for each stage, we use `get_policy_functions()`. In this case we get the second-stage policy function:

```{r}
pf_blip <- get_policy_functions(po_blip, stage = 2)
```

The stage-specific policy function requires a `data.table` with named columns as input and returns a character vector of recommended actions:

```{r}
pf_blip(
  H = data.table(BB = c("group2", "group1"),
                 L = c(1, 0),
                 C = c(1, 2))
)
```

## Policy objects and `get_policy()`

Applying the policy learner returns a `policy_object` containing all of the components needed to specify the learned policy. In this case, the only component of the policy is a model for the blip function:

```{r inspectblip}
po_blip$blip_functions$stage_1$blip_model
```

To access and apply the policy itself, use `get_policy()`. The result behaves as a `policy`, meaning that we can apply it to any (suitable) `policy_data` object to get the policy actions:

```{r}
get_policy(po_blip)(pd) |> head(4)
```

# SessionInfo

```{r sessionInfo}
sessionInfo()
```

# References