## Sorting preliminaries

Check the documentation for `?sort` and `?order`, then write code that outputs the values of `Y` sorted according to the order of `X`.

``````
library(ggplot2)  # provides qplot()

X <- runif(10)
Y <- X + rnorm(10)
qplot(X, Y)
``````

Base R solution
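One possible base R approach, sketched here rather than given as the intended answer: `order(X)` returns the indices that would sort `X` in ascending order, and indexing `Y` by those indices rearranges it to match. The names `idx` and `Y_by_X` and the call to `set.seed()` are assumptions added for illustration; they do not appear in the exercise.

```r
# Regenerate the example vectors so the block runs on its own.
# The seed is an added assumption, used only to make reruns repeatable.
set.seed(1)
X <- runif(10)
Y <- X + rnorm(10)

idx <- order(X)    # indices that put X in ascending order
Y_by_X <- Y[idx]   # Y rearranged to follow the sorted order of X
Y_by_X
```

Note that `sort(Y)` would sort `Y` by its own values, which is not what the exercise asks for; `X[idx]` is the same as `sort(X)`, which gives a quick sanity check on `idx`.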

#### Check with tidyverse solution

``````
library(dplyr)  # provides %>% and arrange()

egdf <- data.frame(X = X, Y = Y)
egdf %>%
  arrange(X)
``````

``````
##             X          Y
## 1  0.05571544  2.6500709
## 2  0.08926600  0.5578119
## 3  0.16266617  0.2374518
## 4  0.17869520  0.2483979
## 5  0.25586316 -2.3250955
## 6  0.45478863 -0.2310837
## 7  0.82033903  1.1852317
## 8  0.84673601 -1.2461330
## 9  0.89994850  3.1002198
## 10 0.94981537  0.2989222
``````

## Within-leaf averages

The code below computes the average value of `Y` on each side of a given split point: one mean for the observations with `X <= x_split` and one for those with `X > x_split`.

Base R

``````
x_split <- 0.5
c(mean(Y[X <= x_split]),
  mean(Y[X > x_split]))
``````

``````
## [1] 0.1895922 0.8345602
``````

tidyverse

``````
egdf %>%
  group_by(X <= x_split) %>%
  summarize(avg_Y = mean(Y))
``````

``````
## # A tibble: 2 x 2
##   `X <= x_split` avg_Y
##   <lgl>          <dbl>
## 1 FALSE          0.835
## 2 TRUE           0.190
``````

## Numeric predictor

Write a function that takes a single numeric predictor and an outcome as inputs and returns the split point that achieves the lowest residual sum of squares (RSS).
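For reference, the RSS of a split can be written out as follows. This assumes that each leaf predicts its own within-leaf average and that ties go to the left leaf (`<=`), matching the within-leaf averages code earlier. The symbols $s$, $\bar{y}_L(s)$, and $\bar{y}_R(s)$ are notation introduced here, not taken from the exercise.

$$
\mathrm{RSS}(s) = \sum_{i:\, x_i \le s} \bigl(y_i - \bar{y}_L(s)\bigr)^2 + \sum_{i:\, x_i > s} \bigl(y_i - \bar{y}_R(s)\bigr)^2
$$

Here $\bar{y}_L(s)$ is the mean of the $y_i$ with $x_i \le s$ and $\bar{y}_R(s)$ is the mean of the $y_i$ with $x_i > s$. The task is to find the $s$ that minimizes $\mathrm{RSS}(s)$.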

### Example data

``````
n <- 1000
mixture_ids <- rbinom(n, 1, .5)
x <- rnorm(n) + 3 * mixture_ids
y <- rnorm(n) + 3 * mixture_ids
qplot(x, y)
``````
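A minimal sketch of one way to approach the exercise, not necessarily the intended solution. It tries every midpoint between consecutive distinct values of the predictor, computes the RSS when each side is predicted by its own mean, and keeps the split with the smallest RSS. The function name `best_split`, its return format, and the seed are assumptions made for this illustration.

```r
# Hypothetical helper: exhaustive search for the best single split on x.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) stop("need at least two distinct x values")

  # Candidate splits: midpoints between consecutive distinct x values,
  # so both sides of every candidate are non-empty.
  candidates <- (xs[-length(xs)] + xs[-1]) / 2

  # RSS for each candidate, predicting each side by its own mean
  rss <- vapply(candidates, function(s) {
    left  <- y[x <= s]
    right <- y[x > s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))

  best <- which.min(rss)
  list(split = candidates[best], rss = rss[best])
}

# Example data generated as in the section above.
# The seed is an added assumption, used only to make reruns repeatable.
set.seed(1)
n <- 1000
mixture_ids <- rbinom(n, 1, .5)
x <- rnorm(n) + 3 * mixture_ids
y <- rnorm(n) + 3 * mixture_ids

fit <- best_split(x, y)
fit
```

This brute-force version recomputes both means for every candidate, so it does work proportional to `n^2`. Sorting once and updating cumulative sums would be faster, but the direct version is easier to check against the within-leaf averages code.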

## Multiple splits

``````
n <- 1000
mixture_ids <- rbinom(n, 1, .5)
x <- rnorm(n) + 3 * mixture_ids
y <- rnorm(n) + 3 * mixture_ids
x <- c(x, rnorm(n / 2, mean = -2))
y <- c(y, rnorm(n / 2, mean = 5))
egdf <- data.frame(x = x, y = y)
egplot <- egdf %>%
  ggplot(aes(x, y)) +
  geom_point()
egplot
``````