Here we illustrate the optimal univariate clustering function `Ckmeans.1d.dp`. It clusters data into a given number of clusters \(k\), or estimates \(k\) if a range is provided; if \(k\) is unspecified, a default range from 1 to 9 is used. The function can also perform optimal weighted clustering when a non-negative weight vector is given as input. Weighted clustering is useful for analyzing signals such as time series, spectral data, and genetic or epigenetic events along a chromosome.

```
require(Ckmeans.1d.dp)
require(RColorBrewer)
```
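
As a quick sketch of the interface, `Ckmeans.1d.dp` accepts either a single value of \(k\) or a range `c(kmin, kmax)` from which the number of clusters is selected. The data below are illustrative toy values, not part of the examples that follow.

```
require(Ckmeans.1d.dp)

# Three well-separated groups on the real line (illustrative data)
x <- c(-1.1, -1.0, -0.9, 0.0, 0.1, 0.2, 4.9, 5.0, 5.2)

res.fixed <- Ckmeans.1d.dp(x, 3)        # k given exactly
res.range <- Ckmeans.1d.dp(x, c(1, 5))  # k selected from the range 1 to 5
max(res.range$cluster)                  # estimated number of clusters
```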

Cluster data generated from a Gaussian mixture model of three components. The number of clusters is given:

```
x <- c(rnorm(50, sd=0.3), rnorm(50, mean=1, sd=0.3), rnorm(50, mean=2, sd=0.3))
k <- 3 # Divide x into 3 clusters
result <- Ckmeans.1d.dp(x, k)
colors <- brewer.pal(k, "Dark2")
plot(result, col.clusters = colors)
```

```
plot(x, col=colors[result$cluster], pch=result$cluster, cex=1.5,
     main="Optimal univariate clustering given k",
     sub=paste("Number of clusters given:", k))
abline(h=result$centers, col=colors, lty="dashed", lwd=2)
legend("bottomright", paste("Cluster", 1:k), col=colors, pch=1:k, cex=1, bty="n")
```

Cluster data generated from a Gaussian mixture model of three components. The number of clusters is determined by the Bayesian information criterion (BIC):

```
x <- c(rnorm(50, mean=-1, sd=0.3), rnorm(50, mean=1, sd=1), rnorm(50, mean=2, sd=0.4))
# Divide x into k clusters, k automatically selected (default: 1~9)
result <- Ckmeans.1d.dp(x)
k <- max(result$cluster)
colors <- brewer.pal(k, "Dark2")
plot(result, col.clusters = colors)
```

```
plot(x, col=colors[result$cluster], pch=result$cluster, cex=1.5,
     main="Optimal univariate clustering with k estimated",
     sub=paste("Number of clusters is estimated to be", k))
abline(h=result$centers, col=colors, lty="dashed", lwd=2)
legend("topleft", paste("Cluster", 1:k), col=colors, pch=1:k, cex=1, bty="n")
```
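
Beyond the cluster assignments, the returned object carries summary statistics of the fit. The components shown below (`centers`, `size`, `withinss`) are part of the package's documented return value; the data here are illustrative:

```
require(Ckmeans.1d.dp)

set.seed(1)
x <- c(rnorm(50, mean=-1, sd=0.3), rnorm(50, mean=1, sd=0.3))
result <- Ckmeans.1d.dp(x)  # k estimated over the default range 1 to 9

result$centers   # cluster means
result$size      # number of points in each cluster
result$withinss  # within-cluster sums of squared distances
```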

We segment a time course to identify peaks using weighted clustering. The input data are the time stamps at which intensity measurements were taken; the weights are the signal intensities.

```
n <- 160
t <- seq(0, 2*pi*2, length=n)
n1 <- 1:(n/2)
n2 <- (max(n1)+1):n
y1 <- abs(sin(1.5*t[n1]) + 0.1*rnorm(length(n1)))
y2 <- abs(sin(0.5*t[n2]) + 0.1*rnorm(length(n2)))
y <- c(y1, y2)

w <- y^8 # stress the peaks
res <- Ckmeans.1d.dp(t, k=c(1:10), w)
k <- max(res$cluster)
colors <- brewer.pal(k, "Set1")
plot(res, col.clusters = colors)
grid()
```

```
plot(t, w, main = "Time course clustering / peak calling",
     col=colors[res$cluster], pch=res$cluster, type="h",
     xlab="Time t", ylab="Transformed intensity w")
grid()
abline(v=res$centers, col="gray", lty="dashed")
text(res$centers, max(w) * .95, cex=0.75, font=2,
     paste(round(res$size / sum(res$size) * 100), "/ 100"))
```

It is often desirable to visualize boundaries between consecutive clusters. The `ahist()` function offers several ways to estimate cluster boundaries. The simplest is to use the midpoint between the two closest points in two consecutive clusters, as illustrated in the code below.

```
x <- c(7, 4, 1, 8, 15, 22, -1)
k <- 3
ckm <- Ckmeans.1d.dp(x, k=k)
midpoints <- ahist(ckm, style="midpoints", data=x, plot=FALSE)$breaks[2:k]

colors <- brewer.pal(k, "Set2")
plot(ckm, col.clusters = colors, lwd=5,
     main="Midpoints as cluster boundaries")
abline(v=midpoints, col="RoyalBlue", lwd=3, lty=2)
legend("topright", "Midpoints", lwd=3, lty=2, col="RoyalBlue")
```
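
The `style="midpoints"` call above is one of several options. By default, `ahist()` draws an adaptive histogram whose bin breaks follow the optimal clustering rather than a uniform grid. A minimal sketch on the same toy vector, assuming `ahist()` also accepts a numeric data vector directly (as described in the package documentation):

```
require(Ckmeans.1d.dp)

x <- c(7, 4, 1, 8, 15, 22, -1)
# Adaptive histogram: bin breaks adapt to the optimal clustering of x into 3 groups
h <- ahist(x, k = 3, plot = FALSE)
h$breaks  # bin boundaries derived from the clustering
```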