Title: | Visualization for Quantitative Language Comparison |
---|---|
Description: | Collection of visualizations as used in quantitative language comparison. Currently implemented are visualisations dealing nominal data with multiple levels ("level map" and "factor map"), and assistance for making weighted geographical Voronoi-maps ("weighted map"). |
Authors: | Michael Cysouw [aut, cre] |
Maintainer: | Michael Cysouw <[email protected]> |
License: | GPL-3 |
Version: | 0.4 |
Built: | 2024-10-24 06:01:26 UTC |
Source: | https://github.com/cysouw/qlcvisualize |
A collection of specific visualisations of data as used in quantitative language comparison.
Package: | qlcVisualize |
Type: | Package |
Version: | 0.4 |
Date: | 2024-07-31 |
License: | GPL-3 |
Currently implemented are visualisations dealing nominal data with multiple levels lmap ("levelMap") and fmap ("factorMap"), and some assistance for making of weighted geographic Voronoi maps wmap ("weigthedMap").
Michael Cysouw <[email protected]>
Convenience function to add contourlines to a map, specifically geared towards suggesting boundaries to the result of weightedMap. Internally based on a krige-interpolation.
addContour(heights, points, window, crs, add = TRUE, polygons = FALSE, levels = c(0.4, 0.45, 0.5), grid = 50000, ...)
addContour(heights, points, window, crs, add = TRUE, polygons = FALSE, levels = c(0.4, 0.45, 0.5), grid = 50000, ...)
heights |
Numeric vector with the same length as |
points |
Locations of the datapoints as |
window |
Window for the interpolation as |
crs |
A crs in WKT format. |
add |
By default, contourlines are added to the previous plot. Otherwise, they are returns as |
polygons |
Should polygons be returned instead of linestrings? Only applicable when |
levels |
Levels on which to draw the contourlines. Multiple lines get thicker towards higher values to suggest a center. These levels have to be related relative to the |
grid |
Number of points inside the window for the krige-interpolation. Higher numbers lead to nicers contourlines, but take longer to evaluate. |
... |
Additional specifications passed internally to |
Internally, a grid is made inside the window and the height is interpolated using ordinary kriging from [gstat]{krige}
with a model suggested by autofitVariogram
.
Contourlines are added to the current plot or (when add=FALSE
) they are returned as an sf object with a sfc_LINESTRING
geometry.
This is a preliminary convenience function that will be used to overhaul levelMap
Michael Cysouw <[email protected]>
weightedMap
for more involved example
data(hessen) # continuous variable between 0 and 1 data <- hessen$data[,1:3] heights <- round(data[,1]/rowSums(data), digits = 1) cols <- heat.colors(11) names(cols) <- names(table(heights)) # boundary as sf w <- sf::st_as_sf(hessen$boundary) sf::st_crs(w) <- 4326 w <- sf::st_transform(w, 2397) # points as sf p <- sf::st_as_sf(hessen$villages, coords = c("longitude", "latitude")) sf::st_crs(p) <- 4326 p <- sf::st_transform(p, 2397) # plot map plot(sf::st_geometry(p), col = cols[as.character(heights)], pch = 19) plot(sf::st_geometry(w), add = TRUE, border = "grey") # add boundary addContour(heights, points = p, window = w, crs = 2397, grid = 1000, levels = c(0.25, 0.35, 0.45, 0.55), col = "blue")
data(hessen) # continuous variable between 0 and 1 data <- hessen$data[,1:3] heights <- round(data[,1]/rowSums(data), digits = 1) cols <- heat.colors(11) names(cols) <- names(table(heights)) # boundary as sf w <- sf::st_as_sf(hessen$boundary) sf::st_crs(w) <- 4326 w <- sf::st_transform(w, 2397) # points as sf p <- sf::st_as_sf(hessen$villages, coords = c("longitude", "latitude")) sf::st_crs(p) <- 4326 p <- sf::st_transform(p, 2397) # plot map plot(sf::st_geometry(p), col = cols[as.character(heights)], pch = 19) plot(sf::st_geometry(w), add = TRUE, border = "grey") # add boundary addContour(heights, points = p, window = w, crs = 2397, grid = 1000, levels = c(0.25, 0.35, 0.45, 0.55), col = "blue")
levelMap
The function levelMap
can be tweaked by various parameters determining the boundary of the interpolation. The function boundary
helps finding suitable parameters.
boundary(points, density = 0.02, grid = 10, box.offset = 0.1 , tightness = "auto", manual = NULL, plot = TRUE)
boundary(points, density = 0.02, grid = 10, box.offset = 0.1 , tightness = "auto", manual = NULL, plot = TRUE)
points |
Points, typically a two-column matrix with x and y coordinates. |
density |
Density of points below which there should be no interpolation. |
grid |
Density of the grid. |
box.offset |
Distance of the box around the points. |
tightness |
Parameter influencing how tightly the boundary should be wrapped around the points. Passed internally to kde2d. When |
manual |
Manually added boundary points in the form of a two-column matrix with coordinates. |
plot |
Logical: by default the impact of the chosen parameters is shown. If |
Instead of trying to use a polygon as a boundary for the interpolation internally in levelMap
it turned out to be easier to use a collection of points that mark the outside.
By default, returns a plot with the original points in black, the points below density in red, and the box around the points in blue. Contour lines of the density are shown to choose different density parameters.
When plot = FALSE
, the blue and red points from the graphic are returned as a two-column matrix of x and y coordinates.
Michael Cysouw <[email protected]>
Used internally in levelMap
. The parameters of this function can be passed through, typically density
and box.offset
.
data(hessen) # show impact of the chosen parameters boundary(hessen$villages, density = 0.1, grid = 20 , manual = cbind(x = c(8.3, 9.2), y = c(49.9, 50.0))) # return coordinates boundary(hessen$villages, plot = FALSE) # abstract example, showing tightness in action oldpar<-par("mfrow") par(mfrow = c(1,3)) p <- cbind(c(1:10, 1:10), c(1:10, 10:1)) boundary(p, density = 0.005, grid = 20, tightness = "auto") boundary(p, density = 0.005, grid = 20, tightness = 5) boundary(p, density = 0.005, grid = 20, tightness = 3) par(mfrow = oldpar)
data(hessen) # show impact of the chosen parameters boundary(hessen$villages, density = 0.1, grid = 20 , manual = cbind(x = c(8.3, 9.2), y = c(49.9, 50.0))) # return coordinates boundary(hessen$villages, plot = FALSE) # abstract example, showing tightness in action oldpar<-par("mfrow") par(mfrow = c(1,3)) p <- cbind(c(1:10, 1:10), c(1:10, 10:1)) boundary(p, density = 0.005, grid = 20, tightness = "auto") boundary(p, density = 0.005, grid = 20, tightness = 5) boundary(p, density = 0.005, grid = 20, tightness = 3) par(mfrow = oldpar)
In total 34 different words in which an f-like sound occurs. The different pronunciations of this sound in 183 different German villages are included in this dataset.
data(dialects)
data(dialects)
List of 2:
villages
Dataframe with two variables LONGITUDE and LATITUDE for all 183 villages.
data
Matrix with 34 columns showing the pronunciation in the 183 villages.
Excerpt from https://github.com/cysouw/PAD/
## Not run: # might give error message because of non-ASCII phonetic symbols data(dialects) require(mapdata) map("worldHires", "Germany", fill = TRUE, col = "grey90") lmap(dialects$villages, dialects$data[,21] , levels = c(0.20, 0.22, 0.24), add = TRUE, position = "topleft") title(main = "f-sound in \'Kochlöffel\'") ## End(Not run)
## Not run: # might give error message because of non-ASCII phonetic symbols data(dialects) require(mapdata) map("worldHires", "Germany", fill = TRUE, col = "grey90") lmap(dialects$villages, dialects$data[,21] , levels = c(0.20, 0.22, 0.24), add = TRUE, position = "topleft") title(main = "f-sound in \'Kochlöffel\'") ## End(Not run)
A factor map ("fmap") is a counterpart of the base function image
. In contrast to an image, a factor map can be used for nominal data with various levels (instead of continuous numerical data). A matrix (or a dataframe coerced as matrix) is visualised by showing the most frequent contents of the cells by colouring. There are various methods for ordering of rows and columns provided, alike to a heatmap
).
factorMap(x, order = NULL, col = rainbow(4), show.remaining = FALSE, col.remaining = "grey", pch.na = 20, col.na = "lightgrey", legend = length(col), labels.x = rownames(x), labels.y = colnames(x), cex.axis = 1, cex.legend = 1, cex.remaining = 1, font = "", asp = nrow(x)/ncol(x), method = "hamming", control = NULL, plot = TRUE)
factorMap(x, order = NULL, col = rainbow(4), show.remaining = FALSE, col.remaining = "grey", pch.na = 20, col.na = "lightgrey", legend = length(col), labels.x = rownames(x), labels.y = colnames(x), cex.axis = 1, cex.legend = 1, cex.remaining = 1, font = "", asp = nrow(x)/ncol(x), method = "hamming", control = NULL, plot = TRUE)
x |
A matrix or dataframe with the data to be displayed. Rows are shown on the x-axis, columns on the y-axis, showing the row- and column-names in the display. All data in the whole matrix is interpreted as one large factor with different levels. |
order |
How should rows and columns be ordered? By default the order of the data matrix |
col |
Colors to be used for the display. By default, the colours specified here are used in order of frequency of the phenomena in the data (i.e. |
show.remaining |
Logical: should all levels without color be shown inside the boxes as text? |
col.remaining |
Which color should the text of the uncolored levels have? |
pch.na |
Symbol to be used for NA elements. Use |
col.na |
Color to be used for NA elements. |
legend |
How many levels should be shown in the legend (in the order of frequency? Alternatively, provide a vector with names of the levels to be shown. Use |
labels.x |
Labels to be used on the x-axis. Defaults to rownames of the data. Use |
labels.y |
Labels to be used on the y-axis. Defaults to colnames of the data. Use |
cex.axis |
Size of the row and columns names of |
cex.legend |
Size of the legend text. |
cex.remaining |
Size of the text in the boxes. Only shown when |
font |
Font to be used in the plotting, can be necessary for unusual unicode symbols. Passed internally to |
asp |
Aspect-ratio of the plotting of the boxes. By default the complete plot will be approximately square. Use the value 1 for all square boxes. Manually resizing the boxes by changing the plotting window can be achieved by setting |
method |
Method used to determine similarity, passed to sim.obs, which is used internally to determine the order of rows and columns, using the method chosen in |
control |
List of options passed to seriate. |
plot |
By default, a plot is returned. When |
There are many different orderings implemented: "pca"
and "varimax"
use the second dimension of prcomp and varimax respectively. "eig"
will use the first eigenvector as computed by eigs. This is really quick for large datasets. "mds"
will use the first dimension of cmdscale.
Further, all methods as provided in the function seriate can be called. Specifcally, "R2E"
and "MDS_angle"
seem worthwhile to try out. Any parameters for these methods can be passed using the option control
.
A plot is returned by default. When plot = FALSE
, a list is returned with the reordering of the rows and the columns.
Note that it is slightly confusing that the resulting image is a transposed version of the data matrix (rows of the matrix are shown as horizontal lines in the graphic, and they are shown from bottom to top). This is standard practice though, also used in image and heatmap, so it is continued here.
Michael Cysouw <[email protected]>
image
in base and pimage in the package seriation
.
# a simple data matrix x <- matrix(letters[1:5],3,5) x[2,3] <- x[1,4] <- NA rownames(x) <- c("one", "two", "three") colnames(x) <- 1:5 x # some basic factor maps factorMap(x, asp = 1) factorMap(x, col = heat.colors(5), asp = NA) factorMap(x, col = list(b = "red", e = "blue"), show.remaining = TRUE) ## Not run: # more interesting example, different "f" sounds in german dialects # note that fonts might be problematic on some platforms # plotting window should be made really large as well data(dialects) factorMap(dialects$data, col = rainbow(8), order = "R2E" , cex.axis = 0.3, cex.legend = 0.7 , show.remaining = TRUE, cex.remaining = 0.2) # get reordering of rows # to identify the group of words with "p-f" correspondences factorMap(dialects$data, order = "R2E", plot = FALSE) ## End(Not run)
# a simple data matrix x <- matrix(letters[1:5],3,5) x[2,3] <- x[1,4] <- NA rownames(x) <- c("one", "two", "three") colnames(x) <- 1:5 x # some basic factor maps factorMap(x, asp = 1) factorMap(x, col = heat.colors(5), asp = NA) factorMap(x, col = list(b = "red", e = "blue"), show.remaining = TRUE) ## Not run: # more interesting example, different "f" sounds in german dialects # note that fonts might be problematic on some platforms # plotting window should be made really large as well data(dialects) factorMap(dialects$data, col = rainbow(8), order = "R2E" , cex.axis = 0.3, cex.legend = 0.7 , show.remaining = TRUE, cex.remaining = 0.2) # get reordering of rows # to identify the group of words with "p-f" correspondences factorMap(dialects$data, order = "R2E", plot = FALSE) ## End(Not run)
Summary of the data in Haspelmath (1997), classical example of the usage of semantic maps.
data(haspelmath)
data(haspelmath)
The dataset is a matrix with nine rows, describing the different indefinite functions, and 134 columns documenting for each individual construction which functions are possibly expressed by it.
Haspelmath, Martin. Indefinite Pronouns. Oxford Studies in Typology and Linguistic Theory. Oxford: Clarendon, 1997.
data(haspelmath) ## English data haspelmath[,7:9]
data(haspelmath) ## English data haspelmath[,7:9]
Proposed in Heeringa (2004) to colour a (dis)similarity by decomposing it into three dimensions (using cmdscale
here) and then mapping these dimensions to RGB to make colours. Highly useful to visualize pairwise similarities between geographic regions.
heeringa(dist, power = 0.5, mapping = c(1, 2, 3), method = "eigs", center = NULL)
heeringa(dist, power = 0.5, mapping = c(1, 2, 3), method = "eigs", center = NULL)
dist |
|
power |
Factor used to influence the results of the multidimensional scaling. Values closer to one will lead to clearer separated colours, while higher values will lead to more gradual colours. |
mapping |
Optional vector to change the mapping of the dimensions to the colours. Should be of length 3, specifying to which color each of the three dimensions is mapped. A 1 means 'red', a 2 means 'green' and a 3 means 'blue'. Adding a minus reverses the mapping. |
method |
Method used to determine the colour dimensions. Either |
center |
Optionally, specify an index of one of the points to be put in the center of the coloring scheme, i.e. this point will become grey and all other points will be colored relative to this point. |
This proposal goes back to Heeringa (2004). The idea is to visualize distances by mapping the first three dimensions of a multidimensional scaling to the the red-green-blue scales. The mapping
vector can be used to change the mapping to the colours.
A vector of colours of the same length as the size of the dist object.
Michael Cysouw <[email protected]>
Heeringa, Wilbert. "Measuring Dialect Pronunciation Differences Using Levenshtein Distance." Ph.D. Thesis, Rijksuniversiteit Groningen, 2004.
data(hessen) tess <- weightedMap(hessen$villages, window = hessen$boundary, crs = 2397) d <- dist(hessen$data, method = "canberra") # different mappings of the colors c1 <- heeringa(d) plot(tess$weightedVoronoi, col = c1, border = NA) c2 <- heeringa(d, power = 1, mapping = c(3, -2, 1)) plot(tess$weightedVoronoi, col = c2, border = NA)
data(hessen) tess <- weightedMap(hessen$villages, window = hessen$boundary, crs = 2397) d <- dist(hessen$data, method = "canberra") # different mappings of the colors c1 <- heeringa(d) plot(tess$weightedVoronoi, col = c1, border = NA) c2 <- heeringa(d, power = 1, mapping = c(3, -2, 1)) plot(tess$weightedVoronoi, col = c2, border = NA)
An example dataset of dialect data.
data(hessen)
data(hessen)
List of 3
boundary
An object of type owin
describing a geographical boundary. This format is necessary for Voronoi diagrams.
villages
A dataframe with two variables "longitude" and "latitude" for the 157 villages on this there is data in this dataset.
data
Dataframe with 56 different characteristics of these 157 villages, distributed over 15 different variables (as indicated in the column names).
Data from https://www.syhd.info/startseite/index.html
Jürg Fleischer, Simon Kasper & Alexandra N. Lenz (2012): Die Erhebung syntaktischer Phänomene durch die indirekte Methode: Ergebnisse und Erfahrungen aus dem Forschungsprojekt "Syntax hessischer Dialekte" (SyHD). In: Zeitschrift für Dialektologie und Linguistik 79/1, 2-42.
data(hessen) tessalation <- weightedMap(hessen$villages, window = hessen$boundary, crs = 2397) plot(tessalation$weightedVoronoi)
data(hessen) tessalation <- weightedMap(hessen$villages, window = hessen$boundary, crs = 2397) plot(tessalation$weightedVoronoi)
A multi-level map ("lmap") is a plot of the distribution of nominal data with multiple levels in space. Such visualisations have two direct use-case in linguistics, viz. semantic maps and isoglosses. The drawing if the lines in space is performed by interpolation in this function (see details).
Semantic maps (Haspelmath 2003) are a visualisation of linguistic diversity. A semantic map shows a predefined configuration of functions/senses in two-dimensional space with an overlay of language-specific encoding of these functions/senses. An level-map tries to emulate this linguistic visualisation in an automatic fashion with various options for visual presentation.
Isoglosses show lines surrounding similar phenomena in space. Instead of drawing an exact boundary around measured points, an interpolation-technique is used here to show areas of interest. By only showing boundaries, multiple phenomena can be shown in one graphic.
levelMap(points, data, main = NULL, draw = 5, levels = c(0.41, 0.46, 0.51), labels = NULL, cex = 0.7, col = "rainbow", add = FALSE, ignore.others = FALSE, normalize.frequency = FALSE, scale.pies = FALSE, lambda = NA, legend = TRUE, position = "bottomleft", cex.legend = 0.7, font = "", note = TRUE, file.out = NULL, ...)
levelMap(points, data, main = NULL, draw = 5, levels = c(0.41, 0.46, 0.51), labels = NULL, cex = 0.7, col = "rainbow", add = FALSE, ignore.others = FALSE, normalize.frequency = FALSE, scale.pies = FALSE, lambda = NA, legend = TRUE, position = "bottomleft", cex.legend = 0.7, font = "", note = TRUE, file.out = NULL, ...)
points |
Coordinates of the data points specified as a two-column matrix or dataframe. |
data |
Language data to be plotted as contour-overlay over the points. Either specified as a vector of language-specific forms, or as a numeric matrix with the forms as columns and the points as rows (the language-specific forms should be specified as |
main |
Title for the plot |
draw |
Which forms to be drawn by contours. Specifying a numeric value will only draw the uppermost frequent forms in the data, by default only the topmost five forms are drawn (automatically ordered by frequency). Alternatively, a vector with names or column-indices of the forms to be drawn can be specified. |
levels |
height of contours to be drawn. Internally, all values are normalized between zero and one, so only values between those extremes are sensible. Line thickness is automatically balanced. |
labels |
Optionally, character vector with labels for the points, to be drawn instead of symbols in the plot. Should be a vector of the same length as the number of points. Alternatively, a single character-string is repeated for all points. |
cex |
Character expansion of the labels (see previous option). Also influences the size of symbols or pie-charts. |
col |
Colour specification, either in the form of the name of a built-in color palettes, like rainbow, or a manually specified vector of colors. When NULL, an attempt is made to use grey-scales. |
add |
Logical: should the plot be added to an existing plot or not? |
ignore.others |
Logical: ignore all other categories, not selected through |
normalize.frequency |
Logical: should rows of data (points in the plot) be normalized to 1? Useful only for data that represent frequency of occurence as columns. Note that setting this to |
scale.pies |
logical: for multivalued data: should the size of the pies represent frequencies ( |
lambda |
Parameter for the interpolation, passed internally to the function Krig. Low values result in more detailed boundaries around the measured points. |
legend |
Logical: should a legend be added? |
position |
Where should the legend be positioned? Passed internally to legend. |
cex.legend |
Character expansion passed to legend, and also used for the indication of the levels in the plot |
font |
Font to be used for the legend and the labels. Passed internally to |
note |
Logical: should a note be added to the bottom of the graphic to document the levels of the countour lines? |
file.out |
Location for writing the image to a file instead of plotting it on screen |
... |
Additional parameters optionally passed to boundary for the specification of the area of interpolation. |
The basic idea is to use some kind of interpolation to show areas of high-occurrence of a specific phenomenon. Internally Kriging is used, and then only contour lines are shown of the interpolation. Multiple lines are suggested to indicate the probibalistic interpretation of the lines.
A plot is produces with the different phenomena in space surrounded by lines. When multiple options are possible at each point then pie charts are added.
Michael Cysouw <[email protected]>
# isogloss example # choose one feature from hessen dataset (number 4) data(hessen) f4 <- hessen$data[,9:13] # look for area for interpolation, changing density and grid parameters # suitable parameters can be passed through to function levelMap below boundary(hessen$villages, density = 0.1, grid = 10) # useful size of pies has to be determined by changing cex plot(hessen$boundary, main = NULL) levelMap(hessen$villages, f4, draw = 3, cex = 0.8, normalize.frequency = TRUE , density = 0.1, grid = 10, add = TRUE, cex.legend = 0.5, scale.pies = TRUE) ## Not run: # another isogloss example: # "f" sounds in German dialects in the words "Kochlöffel" # might give Unicode-errors because of phonetic symbols require(mapdata) map("worldHires", "Germany", fill = TRUE, col = "grey90") data(dialects) levelMap(dialects$villages, dialects$data[,21], levels = c(0.20, 0.22, 0.24) , add = TRUE, position = "topleft") title(main = "f-sound in \'Kochlöffel\'") ## End(Not run) # semantic map example # location of points via multidimensional scaling of complete data data(haspelmath) d <- dist(haspelmath) p <- MASS::isoMDS(d)$points # testing boundary parameters boundary(p) boundary(p, density = 0.004, box = 0.15, tightness = 8) # labels to be plotted instead of points text <- gsub("\\.", "\n", rownames(haspelmath)) # show a few languages for Haspelmaths indefinite data # using a quick dummy function to set all parameters indef <- function(columns) { levelMap(p, haspelmath[,columns] , levels = 0.1, labels = text , density = 0.004, box = 0.15, tightness = 8 , lambda = 0.1, note = FALSE) } oldpar <- par("mfcol") par(mfcol = c(2,3)) indef(1:3) indef(4:6) indef(7:9) indef(10:12) indef(13:17) indef(18:22) par(mfcol = oldpar)
# isogloss example # choose one feature from hessen dataset (number 4) data(hessen) f4 <- hessen$data[,9:13] # look for area for interpolation, changing density and grid parameters # suitable parameters can be passed through to function levelMap below boundary(hessen$villages, density = 0.1, grid = 10) # useful size of pies has to be determined by changing cex plot(hessen$boundary, main = NULL) levelMap(hessen$villages, f4, draw = 3, cex = 0.8, normalize.frequency = TRUE , density = 0.1, grid = 10, add = TRUE, cex.legend = 0.5, scale.pies = TRUE) ## Not run: # another isogloss example: # "f" sounds in German dialects in the words "Kochlöffel" # might give Unicode-errors because of phonetic symbols require(mapdata) map("worldHires", "Germany", fill = TRUE, col = "grey90") data(dialects) levelMap(dialects$villages, dialects$data[,21], levels = c(0.20, 0.22, 0.24) , add = TRUE, position = "topleft") title(main = "f-sound in \'Kochlöffel\'") ## End(Not run) # semantic map example # location of points via multidimensional scaling of complete data data(haspelmath) d <- dist(haspelmath) p <- MASS::isoMDS(d)$points # testing boundary parameters boundary(p) boundary(p, density = 0.004, box = 0.15, tightness = 8) # labels to be plotted instead of points text <- gsub("\\.", "\n", rownames(haspelmath)) # show a few languages for Haspelmaths indefinite data # using a quick dummy function to set all parameters indef <- function(columns) { levelMap(p, haspelmath[,columns] , levels = 0.1, labels = text , density = 0.004, box = 0.15, tightness = 8 , lambda = 0.1, note = FALSE) } oldpar <- par("mfcol") par(mfcol = c(2,3)) indef(1:3) indef(4:6) indef(7:9) indef(10:12) indef(13:17) indef(18:22) par(mfcol = oldpar)
These functions are deprecated: use weightedMap instead.
A Voronoi-map (voronoi-tessellation, also known as dirichlet tessellation) is used in quantitative dialectology. This function is a convenience wrapper to easily produce dialect maps with voronoi tessellations. Also described here are a helper functions to produce the tessellation.
vmap(tessellation, col = NULL, add = FALSE, outer.border = "black", border = "grey", lwd = 1, ...) voronoi(points, window)
vmap(tessellation, col = NULL, add = FALSE, outer.border = "black", border = "grey", lwd = 1, ...) voronoi(points, window)
tessellation |
Tessellation of class |
col |
Vector of colors for the filling of the tessellation. Is recycled when there are more tiles than colours. The order of the tiles is the same as the order of the points as specified in the function |
add |
Add graphics to an existing plot |
outer.border |
Colour of the outer border. Specifying |
border |
Colour of the inner borders. Specifying |
lwd |
Line width of borders. |
... |
Further arguments passed to polygon. |
points |
Two-column matrix with all coordinates of the points to make a Voronoi tessellation. |
window |
Outer boundary for the Voronoi tessellation. Should be in the form of an |
This code is almost completely based on functions from the spatstat.geom
package. For convenience, first some geographical boundaries can easily be accessed and converted for use in spatstat.geom
. Then a Voronoi tessellation can be made (based on the function dirichlet
, which in turn is based on deldir
from the package deldir
). Finally, this tessellation can be plotted filled with different colours.
Any legends have to be added manually by using legend
, see examples below.
The function voronoi
returns a warning when points are attested that lie outside of the specified border. For these points there is no polygon specified. Indices for the rejected points outside the border can be accessed by attr(x, "rejects")
.
voronoi
returns a tessellation of the class tess
from the package spatstat.geom
. When points outside of the border are attested, the indices of these points are added to an attribute "rejects".
vmap
plots a map.
Michael Cysouw <[email protected]>
## Not run: # make a Voronoi tessellation for some villages in hessen data(hessen) plot(hessen$boundary) points(hessen$villages, cex = 0.3) tessellation <- voronoi(hessen$villages, hessen$boundary) plot(tessellation) # make a resizable plot with random colour specification vmap(tessellation, col = rainbow(5), border = NA) legend("bottomright", legend = c("a","b","c","d","e"), fill = rainbow(5)) # use actual colors from data, using first feature from supplied data # multiple levels cannot easily be shown # consider \link{lmap} for more detail d1 <- hessen$data[,1:3] d1 <- d1[,1]/rowSums(d1) vmap(tessellation, col = rgb(1, 1-d1, 1-d1)) text(hessen$villages,labels=hessen$data[,1],cex=.5) legend("bottomright", legend = c("es mir", "mir es / other"), fill = c("red", "white")) # Use distances to determine colour, as proposed by Heeringa (2004) # Note that different methods to establish distances can lead to rather # different results! Also try method = "euclidean" d <- dist(hessen$data, method = "canberra") cols <- heeringa(d) vmap(tessellation, col = cols, border = NA) ## End(Not run)
## Not run: # make a Voronoi tessellation for some villages in hessen data(hessen) plot(hessen$boundary) points(hessen$villages, cex = 0.3) tessellation <- voronoi(hessen$villages, hessen$boundary) plot(tessellation) # make a resizable plot with random colour specification vmap(tessellation, col = rainbow(5), border = NA) legend("bottomright", legend = c("a","b","c","d","e"), fill = rainbow(5)) # use actual colors from data, using first feature from supplied data # multiple levels cannot easily be shown # consider \link{lmap} for more detail d1 <- hessen$data[,1:3] d1 <- d1[,1]/rowSums(d1) vmap(tessellation, col = rgb(1, 1-d1, 1-d1)) text(hessen$villages,labels=hessen$data[,1],cex=.5) legend("bottomright", legend = c("es mir", "mir es / other"), fill = c("red", "white")) # Use distances to determine colour, as proposed by Heeringa (2004) # Note that different methods to establish distances can lead to rather # different results! Also try method = "euclidean" d <- dist(hessen$data, method = "canberra") cols <- heeringa(d) vmap(tessellation, col = cols, border = NA) ## End(Not run)
A weighted map ("wmap") is a combination of a Voronoi tessellation with cartogram weighting. A Voronoi map is a tessellation of a surface based on a set of geographic points. It is used to display areal patterns without overlap. Additionally, the size of the tiles can be weighted by cartogram-deformation to allow for varying the visual impression of the data. Specifically, this allows for equal-area-sized tiles to equally represent all data-points in the visual display.
weightedMap(x, y = NULL, window = NULL, crs = NULL, weights = "equal", grouping = NULL, holes = NULL, concavity = 2, expansion = 1000, method = "dcn", maxit = 10, verbose = 0)
weightedMap(x, y = NULL, window = NULL, crs = NULL, weights = "equal", grouping = NULL, holes = NULL, concavity = 2, expansion = 1000, method = "dcn", maxit = 10, verbose = 0)
x |
Coordinates of the data-points, either an |
y |
Latitude (y-coordinates), when the parameter |
window |
Geographical window within which the Voronoi-tessellation will be displayed. Typically an When no window is provided (by default), then a concave hull is induced from the coordinates (using |
crs |
Coordinate reference system that is necessary for the projection of the map. When not provided, an attempt is made to extract a crs from the provided coordinates or from the provided window. Without any crs there will be a warning and |
weights |
Vector with weights for the deformation of the Voronoi-tiles. Should have the same length as the number of coordinates provided. The weights are passed to |
grouping |
Influence the form of the concave window around the coordinates. Only used when there is no |
holes |
A list of |
concavity |
Influence the form of the concave window around the coordinates. Only used when there is no |
expansion |
Influence the form of the concave window around the coordinates. Only used when there is no |
method |
Method used for cartogram deformation, passed to |
maxit |
Maximum number of iterations to find a suitable deformation. Parameter passed internally to |
verbose |
With |
Internally, the Voronoi-tessellation is made without respecting the window, and only afterwards the window is superimposed on the tessellation. In some circumstances with internal holes in the window provided, an attempt is made to return tiles that do not jump across such holes. However, sometimes artefacts are still visible in the output.
Warnings are produced when coordinates lie outside the window provided. The results should still work, but without these points outside. Any colouring or other uses of the results have to be adapted accordingly by using the information in $outsideWindow
. Polygons without any points inside are likewise removed with a warning.
To deal with overlapping coordinates some jitter is automatically applied to the coordinates provided.
List of various lengths, depending on specified parameters. Use the names to select any of these results:
crs: |
The crs in WKT format. |
points: |
The coordinates as provided, but as a projected |
grouping: |
Character vector with the provided grouping, or the grouping as induced from any provided window. |
window: |
The window around the coordinates as a projected |
emptyPolygons: |
Numeric vector with the indices of the polygons that are removed because they do not contain any of the coordinates provided. |
outsideWindow: |
Numeric vector with the indices of the coordinates that are removed because they are outside of the window provided. |
voronoi: |
Voronoi-tessellation of the window as a projected |
weights: |
Numeric vector with the weights used for the deformation. Weights for coordinates outside of the window are removed. |
weightedPoints: |
Coordinates after the weighting-deformation, specified as a projected |
weightedWindow: |
Window after the weighting-deformation, specified as a projected |
weightedMap: |
Voronoi-tessellation after the weighting-deformation, specified as a projected |
Various additional checks and corrections are added to prevent as many as possible errors that arise in the deformation process. However, this comes at the cost of rather long runtime with large maps.
With more complex windows the deformation by cartogramR
might throw errors (like "IllegalArgumentException") or fall into an infinite loop. Try to reduce maxit
and check verbose = 1
to get an idea what might be going wrong.
Michael Cysouw <[email protected]>
# generate a window from coordinates # note the Germany-centered Gauss-Kruger projection "EPSG:2397" # consider increasing 'maxit' to remove the warning about convergence data(hessen) v <- weightedMap(hessen$villages, expansion = 4000, crs = 2397, maxit = 2) plot(v$weightedVoronoi) # show the original locations before the transformation in orange plot(v$points, add = TRUE, col = "green", cex = .5) # show the new locations after transformation in red plot(v$weightedPoints, add = TRUE, col = "blue", cex = .5) # add the real border of Hessen for comparison h <- sf::st_as_sf(hessen$boundary) sf::st_crs(h) <- 4326 h <- sf::st_transform(h, 2397) plot(h, add = TRUE, border = "red") # use the Voronoi tiles e.g. for Heeringa-colouring (see function "heeringa()") d <- dist(hessen$data, method = "canberra") plot(v$weightedVoronoi, col = heeringa(d), border = NA) plot(v$weightedWindow, add = TRUE, lwd = 2) # grouping-vector can be used to make separations in the base-map groups <- rep("a", times = 157) groups[157] <- "b" groups[c(58,59)] <- "c" groups[c(101, 102, 107)] <- "d" # holes-list can be used to add holes inside the region holes <- list(c(9, 50.5), c(9.6, 51.3), c(8.9, 51)) v <- weightedMap(hessen$villages, grouping = groups, holes = holes, crs = 2397, expansion = 3000) plot(v$weightedVoronoi, col = "grey") ## Not run: # extensive example using data from WALS (https://wals.info). Both the worldmap # and the WALS data are downloaded directly. The worldmap is projected and # deformed so that each datapoint has equal area on the map. # load worldmap data(world) # load WALS data, example feature 13: "tone" library(lingtypology) feature <- "13A" wals <- wals.feature(feature) head(wals) # get glottolog coordinates and correct errors in WALS wals$glottocode[wals$glottocode == "poqo1257"] <- "poqo1253" wals$glottocode[wals$glottocode == "mamc1234"] <- "mamm1241" wals$glottocode[wals$glottocode == "tuka1247"] <- "tuka1248" wals$glottocode[wals$glottocode == "bali1280"] <- "unea1237" wals <- merge(wals[,c("glottocode", "wals.code", feature)], glottolog[,c("glottocode", "longitude", "latitude")]) # calculate an equally-weighted voronoi transformation # using atlantic-centered interrupted mollweide projection v <- weightedMap(wals$longitude, wals$latitude, window = world, crs = "+proj=imoll +lon_0=11.5", method = "dcn", maxit = 10) # prepare colors cols <- c("lightsalmon", "lightgrey", "lightpink") names(cols) <- names(table(wals[,feature])) cols <- cols[c(2,3,1)] # the map plot(v$weightedVoronoi, col = cols[wals[,feature]], border = "darkgrey", lwd = 0.2) plot(v$weightedWindow, border = "black", add = TRUE, lwd = 0.5) legend("bottomleft", legend = names(cols), fill = cols, cex = .7) # add contourlines height <- c(0, 0.5, 1) names(height) <- c("No tones", "Simple tone system", "Complex tone system") addContour(heights = height[wals[,feature]], points = v$weightedPoints, window = v$weightedWindow, crs = v$crs, col = "darkred", levels = c(0.2, 0.4, 0.6)) # Alternative: using points instead of polygons cols[2:3] <- c("orange", "red") plot(v$weightedPoints, col = cols[wals[,feature]], cex = 1, pch = 19) plot(v$weightedWindow, add = T, border = "darkgrey", lwd = 0.5) ## End(Not run)
# generate a window from coordinates # note the Germany-centered Gauss-Kruger projection "EPSG:2397" # consider increasing 'maxit' to remove the warning about convergence data(hessen) v <- weightedMap(hessen$villages, expansion = 4000, crs = 2397, maxit = 2) plot(v$weightedVoronoi) # show the original locations before the transformation in orange plot(v$points, add = TRUE, col = "green", cex = .5) # show the new locations after transformation in red plot(v$weightedPoints, add = TRUE, col = "blue", cex = .5) # add the real border of Hessen for comparison h <- sf::st_as_sf(hessen$boundary) sf::st_crs(h) <- 4326 h <- sf::st_transform(h, 2397) plot(h, add = TRUE, border = "red") # use the Voronoi tiles e.g. for Heeringa-colouring (see function "heeringa()") d <- dist(hessen$data, method = "canberra") plot(v$weightedVoronoi, col = heeringa(d), border = NA) plot(v$weightedWindow, add = TRUE, lwd = 2) # grouping-vector can be used to make separations in the base-map groups <- rep("a", times = 157) groups[157] <- "b" groups[c(58,59)] <- "c" groups[c(101, 102, 107)] <- "d" # holes-list can be used to add holes inside the region holes <- list(c(9, 50.5), c(9.6, 51.3), c(8.9, 51)) v <- weightedMap(hessen$villages, grouping = groups, holes = holes, crs = 2397, expansion = 3000) plot(v$weightedVoronoi, col = "grey") ## Not run: # extensive example using data from WALS (https://wals.info). Both the worldmap # and the WALS data are downloaded directly. The worldmap is projected and # deformed so that each datapoint has equal area on the map. # load worldmap data(world) # load WALS data, example feature 13: "tone" library(lingtypology) feature <- "13A" wals <- wals.feature(feature) head(wals) # get glottolog coordinates and correct errors in WALS wals$glottocode[wals$glottocode == "poqo1257"] <- "poqo1253" wals$glottocode[wals$glottocode == "mamc1234"] <- "mamm1241" wals$glottocode[wals$glottocode == "tuka1247"] <- "tuka1248" wals$glottocode[wals$glottocode == "bali1280"] <- "unea1237" wals <- merge(wals[,c("glottocode", "wals.code", feature)], glottolog[,c("glottocode", "longitude", "latitude")]) # calculate an equally-weighted voronoi transformation # using atlantic-centered interrupted mollweide projection v <- weightedMap(wals$longitude, wals$latitude, window = world, crs = "+proj=imoll +lon_0=11.5", method = "dcn", maxit = 10) # prepare colors cols <- c("lightsalmon", "lightgrey", "lightpink") names(cols) <- names(table(wals[,feature])) cols <- cols[c(2,3,1)] # the map plot(v$weightedVoronoi, col = cols[wals[,feature]], border = "darkgrey", lwd = 0.2) plot(v$weightedWindow, border = "black", add = TRUE, lwd = 0.5) legend("bottomleft", legend = names(cols), fill = cols, cex = .7) # add contourlines height <- c(0, 0.5, 1) names(height) <- c("No tones", "Simple tone system", "Complex tone system") addContour(heights = height[wals[,feature]], points = v$weightedPoints, window = v$weightedWindow, crs = v$crs, col = "darkred", levels = c(0.2, 0.4, 0.6)) # Alternative: using points instead of polygons cols[2:3] <- c("orange", "red") plot(v$weightedPoints, col = cols[wals[,feature]], cex = 1, pch = 19) plot(v$weightedWindow, add = T, border = "darkgrey", lwd = 0.5) ## End(Not run)
These functions are deprecated: use weightedMap instead.
Different ways to easily produce windows of class "owin" from the package "spatstat" are presented here. These are used by voronoi.
hullToOwin(points, shift, alpha) mapsToOwin(country, database = "world") gadmToOwin(country, sub = NULL, level = 0)
hullToOwin(points, shift, alpha) mapsToOwin(country, database = "world") gadmToOwin(country, sub = NULL, level = 0)
points |
Set of points that need a window around them. Two column matrix. |
shift |
The amount of space around the outer points at determining the window. |
alpha |
Parameter for the 'curviness': lower values show more detail. Passed internally to ahull. |
country |
Name of the country to obtain borders and turn them into an |
database |
Database as used by |
sub , level
|
Names for Subdivisions of countries as available in the GADM database |
For hullToOwin
, the function ahull is used to make a hull around the points. This is then converted to an "owin" window.
The functions mapsToOwin
and GadmToWin
use external topogaphic boundaries to produce windows.
All functions return an object of class owin
from the package spatstat
.
Includes code from code from Andrew Bevan, based on code from Dylan Beaudette, see https://stat.ethz.ch/pipermail/r-sig-geo/2012-March/014409.html.
The function gadmToOwin
needs online access to download the data. The data is saved in the current working directory, and will not be downloaded again when it is already available there.
Michael Cysouw <[email protected]>
## Not run: # Boundary of the German state "Hessen" # This will need to access the online GADM database # and might take some time boundary <- gadmToOwin("DEU", "Hessen", 1) # A window does not have to be continuous random <- mapsToOwin(c("Germany", "Greece")) plot(random, main = NULL) # hull around some points # note influence of alpha and shift data(hessen) hull <- hullToOwin(hessen$villages, shift = 0.2, alpha = 1) plot(hull) points(hessen$villages) hull <- hullToOwin(hessen$villages, shift = 0.1, alpha = 0.2) plot(hull) points(hessen$villages) ## End(Not run)
## Not run: # Boundary of the German state "Hessen" # This will need to access the online GADM database # and might take some time boundary <- gadmToOwin("DEU", "Hessen", 1) # A window does not have to be continuous random <- mapsToOwin(c("Germany", "Greece")) plot(random, main = NULL) # hull around some points # note influence of alpha and shift data(hessen) hull <- hullToOwin(hessen$villages, shift = 0.2, alpha = 1) plot(hull) points(hessen$villages) hull <- hullToOwin(hessen$villages, shift = 0.1, alpha = 0.2) plot(hull) points(hessen$villages) ## End(Not run)
A polygon representing a worldmap tailored to be not too detailed, but still fitting all glottolog languages inside the polygons.
data("world")
data("world")
The format is a sfc_POLYGON
of length 477 with an EPSG:4326 projection.
Some trickery was needed to produce a lightweight polygon to represent the worldmap in reasonably accuracy without becoming too large and unwiedly. The polygons are such that all coordinates for languages as listed in the glottolog (version 5) are inside these polygons.
The map has a basic EPSG:4326 projection, so longitude-latitude coordinates can immediately be added to it. However, this does not look very nice, because the polygon from Eurasia wraps around. Consider more suitable projections, see examples. To allow for a nice pacific-centered projection Greenland has been clipped.
Polygons are based on the data from https://www.naturalearthdata.com with adjustments. Glottolog coordinates to select and adjust the polygons are from https://glottolog.org.
data(world) plot(world) # try different projections mollweide_atlantic <- "+proj=imoll +lon_0=11.5" equalearth_pacific <- "+proj=eqearth +lon_0=151" azimuth_equaldist <- "+proj=aeqd +lat_0=90 +lon_0=45" plot(sf::st_transform(world, crs = azimuth_equaldist))
data(world) plot(world) # try different projections mollweide_atlantic <- "+proj=imoll +lon_0=11.5" equalearth_pacific <- "+proj=eqearth +lon_0=151" azimuth_equaldist <- "+proj=aeqd +lat_0=90 +lon_0=45" plot(sf::st_transform(world, crs = azimuth_equaldist))