7.7 Histogram
geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ..., binwidth = NULL, origin = NULL, breaks = NULL, bins = NULL, na.rm = FALSE, orientation = NA, show.legend = NA, inherit.aes = TRUE)
binwidth
The width of the bins. Can be specified as a numeric value or as a function that calculates width from unscaled x. Here, “unscaled x” refers to the original x values in the data, before application of any scale transformation. When specifying a function along with a grouping structure, the function will be called once per group.- The default is to use the number of bins in
bins
, covering the range of the data.stat_bin()
usingbins=30
; this is not a good default, but the idea is to get you experimenting with different number of bins. You can also experiment modifying thebinwidth
withcenter
orboundary
arguments.binwidth
overridesbins
so you should do one change at a time. - You should always override this value, exploring multiple widths to find the best to illustrate the stories in your data.
- The default is to use the number of bins in
center
,boundary
numeric values specify bin positions. One value for either center or boundary is adequate, other values will be automatically filled usingbinwidth
.center
specifies the center of one of the bins. Default figure will use center position to identify bins.boundary
specifies the boundary between two bins. Boundary values are more informative. [suggest to specify one boundary value; just easier to say boundaries]- Worth noting that
center
andboundary
can be either above or below the range of the data, in this case the value provided will be shifted of a multiple number ofbinwidth
.
bins
Number of bins. Overridden by binwidth. Defaults to 30.breaks
Actual breaks to use. Intervals are created as left open, right closed. But specifying insidegeom_histogram
might show weird breaks in y-axis labels.Specifying breaks using
scale_x_continuous
is a better practice.
p <- ggplot(data=data, aes(tmp) ) +
geom_histogram(fill="#BDBCBC", color="black", binwidth = 2, boundary=0) +
labs(x="Average temperature [ºC]")
p
geom_histogram(aes(..density..))
surrounding the variable names with ..
means to call after_stat
function. It delays the mapping until later in the rendering process when summary statistics have been calculated. The expression ..density..
is deprecated; use after_stat()
in stead.
Most aesthetics are mapped directly from variables found in the data, called direct input (stage1). Sometimes, however, you want to delay the mapping until later stages of the data that you can map aesthetics from, and three functions to control at which stage aesthetics should be evaluated.
after_stat(x)
and after_scale(x)
can be used inside the aes()
function, used as the mapping
argument in layers.
after_stat(x)
uses variables calculated after the transformation by the layer stat (stage 2);E.g., the height of bars in
geom_histogram()
can be density probability;# this shows the count frequency ggplot(faithful, aes(x = waiting)) + geom_histogram(fill="#BDBCBC", color="black") # this shows the density plot, can replace after_stat(density) with ..density.. # surrounding the variable name with two dots ggplot(faithful, aes(x = waiting)) + geom_histogram(aes(y = after_stat(density)), fill="#BDBCBC", color="black") + geom_density() # empirical density
after_stat(count)
show frequncy count;after_stat(ncount)
count, scaled to a maximum of 1;after_stat(density)
show density;after_stat(ndensity)
density, scaled to a maximum of 1;
after_scale
uses variables calculated after the scale transformation (stage 3); see documents here.- could be used to label a bar plot;
Add fitted density from a distribution
# fit a lognormal distribution
library(MASS)
fit_params <- fitdistr(prices_monthly$AdjustedPrice,"lognormal")
fit_params$estimate
ggplot(prices_monthly, aes(x=AdjustedPrice)) +
geom_histogram(bins=40,
aes(y=..density..),
fill="#BDBCBC", color="black") +
stat_function(fun=dlnorm,
args=list(meanlog = fit_params$estimate['meanlog'],
sdlog = fit_params$estimate['sdlog']),
colour = "red"
) +
scale_x_continuous(limits=c(0, 170))
Histogram of a vector
This returns an error: data
must be a data.frame
. If you don’t provide argument name explicitly, sequential rule is used – data
arg is used for aes(x=dice_results)
.
To correct it – use arg name explicitly:
Alternatively, you may use it inside geom_
functions familiy without explicit naming mapping
argument since mapping
is the first argument unlike in ggplot
function case where data
is the first function argument.
ggplot() + geom_bar(aes(dice_results))
# or use the `aes` function
ggplot() +
aes(dice_results) +
geom_bar()
Vertical histogram
https://stackoverflow.com/a/13334294/10108921
geom_bar
and geom_col
plots bar charts.
geom_bar
makes the height of the bar proportional to the number of cases in each group.geom_col
the heights of the bars to represent values in the data
geom_ribbon(data=sim_obs_quantile, aes(ymin=`17%`, ymax=`83%`), alpha=0.2, fill="#F8766D")
plot confidence interval (CI) in shaded areas.
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), col = "red", arrow = arrow(length = unit(0.3, "cm"))
draws a straight line between points (x, y)
and (xend, yend)
in the plot.
arrow
specification for arrow heads, as created bygrid::arrow()
.
annotate("segment", x=12, y=-0.05, xend=12, yend=0, col="red", arrow=arrow(length=unit(0.3, "cm")))
draw arrows outside the plot.