I just published my first R package, ravelRy! You can read more about it in the introduction post here.
This post is part of a series that will explore data pulled from the Ravelry.com API into R using the package. Here, we'll pull a sample of group data and explore which terms in their titles and descriptions are most characteristic of each craft: knitting, crochet, weaving, and spinning.
Ravelry groups are communities formed by users for users to share and discuss patterns, yarn, projects, techniques, and anything else.
How do group names and descriptions vary across crafts? What do crocheters talk about and how does it differ from knitters or weavers?
The finished product:
The most distinctive or important word for crochet groups is amigurumi, which are small, stuffed crochet yarn dolls, like this adorable Baby Yoda. The second most important is cal, or CAL, an acronym for crochet-along groups, where users all work on the same pattern at the same time. The same acronym shows up under knitting as KAL for knit-along. Other distinctive words are CGOA (the Crochet Guild of America), Tunisian (a style of crochet that combines elements from knitting), test, hook, and block.
For knitting groups, design, KAL (knit-along), test, sock, needle, and podcast are the most distinctive, along with stricken, the German word for knitting, a reminder that Ravelry has users around the globe. For example, Sockenstrickereien is a sock-related knitting group where the primary language of conversation is German.
Spinning groups were defined by words like wheel, spindle, spinner, handspin, and wheel-specific words like louet and lendrum, while weaving groups were defined strongly by the word loom, as well as loom-related words like heddle and warp and the brand Ashford.
See below for step-by-step R code to build this plot.
Installation
First, install the package. You can install either the released version from CRAN or the development version from GitHub.
install.packages("ravelRy")
devtools::install_github("walkerkq/ravelRy")
Authenticating
To access the API, you'll need a free developer account, which you can create at https://www.ravelry.com/pro/developer. Then, create an app with basic read-only authentication to receive a username and password.
You can either set the environment variables RAVELRY_USERNAME and RAVELRY_PASSWORD in your .Renviron file, or set them from the R console using the ravelry_auth function.
ravelry_auth(key = 'username') # you will be prompted to enter your username
ravelry_auth(key = 'password') # you will be prompted to enter your password
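If you'd rather go the environment-variable route without editing .Renviron, here is a minimal sketch that sets both variables for the current session with base R's Sys.setenv. The values are placeholders I made up, not real credentials; in a .Renviron file the same pairs would go one per line as NAME=value.

```r
# Placeholder credentials (assumptions for illustration); substitute the
# username and password issued for your own Ravelry app.
Sys.setenv(
  RAVELRY_USERNAME = "your-app-username",
  RAVELRY_PASSWORD = "your-app-password"
)

# Confirm both variables are visible to the current R session
creds_set <- nzchar(Sys.getenv(c("RAVELRY_USERNAME", "RAVELRY_PASSWORD")))
print(creds_set)
```

Variables set this way last only until the session ends, which is why .Renviron is the more convenient home for them.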
Getting data
Next, search for groups using the search_groups function. Here, I've defined a custom function to return the results of four queries, one for each craft category: knitting, crochet, spinning, and weaving. The resulting data frame, groups, holds mostly knitting groups.
library(ravelRy)
library(dplyr)

# run one group search for a single craft and tag the results with that craft
search_groups_by_craft <- function(x){
  search_groups(page_size = 2000,
                `group-crafts` = x,
                `group-active` = 'yes') %>%
    mutate(craft = x)
}
crafts <- c('knitting', 'crochet', 'spinning', 'weaving')
groups <- lapply(crafts, search_groups_by_craft) %>%
  bind_rows() # combine the list of data frames into one data frame
groups %>% count(craft)
# A tibble: 4 x 2
  craft        n
  <chr>    <int>
1 crochet    166
2 knitting  1340
3 spinning   202
4 weaving     58
The search parameters I used in the search_groups function are page_size, to return up to 2000 results for each query; group-crafts, to filter by craft name; and group-active, to restrict to groups that have had a post in the last 30 days. The parameters group-crafts and group-active are not built-in arguments of the function; they are passed through the ... argument instead.
You can discover new parameters by using the search filters on the Ravelry website and inspecting the resulting URL. For example, the search URL for active "stitch-n-bitch" knitting groups is: https://www.ravelry.com/groups/search#group-crafts=knitting&group-type=stitch-n-bitch&group-active=yes
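To make the URL-inspection trick concrete, here is a small base R sketch that pulls the filter names and values out of that URL's fragment (the part after the #). The parsing is my own illustration, not something the package does for you, and the final do.call line is an untested assumption about how the result could be forwarded.

```r
url <- "https://www.ravelry.com/groups/search#group-crafts=knitting&group-type=stitch-n-bitch&group-active=yes"

# Keep only the part after '#', then split it into key=value pairs
fragment <- sub("^[^#]*#", "", url)
pairs <- strsplit(strsplit(fragment, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Named list: names are the parameter names, values are the filter values
params <- setNames(
  lapply(pairs, function(p) p[2]),
  vapply(pairs, function(p) p[1], character(1))
)
str(params)

# With credentials set, the list could be forwarded to the search, e.g.
# (untested sketch): do.call(ravelRy::search_groups, params)
```

Each name in the list is a candidate argument to pass through ..., wrapped in backticks because of the hyphen.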
Next, use tidytext to unnest the words (tokens) from the description and name fields.
library(tidytext)
# one row per word in each group's short description
groups_tidy_desc <- groups %>%
  select(id, craft, short_description) %>%
  unnest_tokens(output = 'word', input = 'short_description')
# one row per word in each group's name, stacked with the description words
groups_tidy <- groups %>%
  select(id, craft, name) %>%
  unnest_tokens(output = 'word', input = 'name') %>%
  rbind(groups_tidy_desc) %>%
  anti_join(stop_words, by = 'word') # remove words like `the`, `and`
head(groups_tidy)
# A tibble: 6 x 3
     id craft    word
  <int> <chr>    <chr>
1     6 knitting sock
2     6 knitting knitters
3     6 knitting anonymous
4     6 knitting knitters
5     6 knitting stop
6     6 knitting knitting
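If you want to see what the tokenizing step does without calling the API, here is a toy sketch. The two rows are made up for illustration and are not real Ravelry groups; the calls are the same unnest_tokens and anti_join used above.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the groups data (made-up rows, not real Ravelry results)
toy_groups <- tibble(
  id = c(1L, 2L),
  craft = c("knitting", "weaving"),
  name = c("The Sock Knitters", "Warp and Weft")
)

toy_tidy <- toy_groups %>%
  unnest_tokens(output = "word", input = "name") %>%  # one lowercase word per row
  anti_join(stop_words, by = "word")                  # drops "the" and "and"

toy_tidy
```

Note that unnest_tokens lowercases by default, which is why acronyms like KAL show up as kal in the final plot.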
To clean things up, we can also stem the words using the SnowballC package to combine similar concepts like play and playing or knitter and knitters.
library(SnowballC)
groups_tidy_stemmed <- groups_tidy %>%
  mutate(wordstem = wordStem(word))
head(groups_tidy_stemmed)
# A tibble: 6 x 4
     id craft    word      wordstem
  <int> <chr>    <chr>     <chr>
1     6 knitting sock      sock
2     6 knitting knitters  knitter
3     6 knitting anonymous anonym
4     6 knitting knitters  knitter
5     6 knitting stop      stop
6     6 knitting knitting  knit
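Here is a quick standalone check of the stemming idea on the word pairs mentioned above, using wordStem with its default Porter stemmer. The four input words are just examples.

```r
library(SnowballC)

words <- c("play", "playing", "knitter", "knitters")
stems <- wordStem(words)  # default language = "porter"

# Pair each word with its stem to see which ones collapse together
data.frame(word = words, stem = stems)
```

Stems are not always dictionary words (anonym in the output above is one example), so the labels on the final plot are stems rather than the original words.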
Next, count the number of times each word stem occurs in each craft's group names and descriptions, and then calculate the term frequency-inverse document frequency (tf-idf) of each craft-word pair. tf-idf measures how important a word is to one document relative to the other documents in a collection; here, the combined text of each craft's groups is treated as one document.
groups_tidy_counted <- groups_tidy_stemmed %>%
count(craft, wordstem) %>%
bind_tf_idf(term = wordstem, document = craft, n = n)
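To show what bind_tf_idf is computing, here is a sketch on made-up counts that reproduces the numbers by hand. It assumes tidytext's definitions: tf is a word's count divided by the total count in its document, and idf is the natural log of the number of documents divided by the number of documents containing the word.

```r
library(dplyr)
library(tidytext)

# Made-up counts: two "documents" (crafts), one word shared by both
toy_counts <- tibble(
  craft    = c("knitting", "knitting", "weaving", "weaving"),
  wordstem = c("knit", "sock", "knit", "loom"),
  n        = c(8, 2, 1, 4)
)

toy_tfidf <- toy_counts %>%
  bind_tf_idf(term = wordstem, document = craft, n = n)

# The same calculation written out step by step
n_docs <- n_distinct(toy_counts$craft)
manual <- toy_counts %>%
  group_by(craft) %>%
  mutate(tf = n / sum(n)) %>%                         # share of the craft's words
  group_by(wordstem) %>%
  mutate(idf = log(n_docs / n_distinct(craft))) %>%   # rarer across crafts = larger
  ungroup() %>%
  mutate(tf_idf = tf * idf)

all.equal(toy_tfidf$tf_idf, manual$tf_idf)
```

Because knit appears in both toy documents, its idf is log(2 / 2) = 0, so its tf-idf is zero no matter how often it occurs. That is the property that pushes craft-specific words like loom to the top.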
Finally, restrict the dataset to the top 10 words by tf-idf for each craft and plot!
library(ggplot2)

groups_tidy_counted %>%
  # keep only word stems that appear more than 5 times
  filter(n > 5) %>%
  group_by(craft) %>%
  # rank within each craft; ties are broken at random
  mutate(rank = rank(-tf_idf, ties.method = 'random')) %>%
  filter(rank <= 10) %>%
  ungroup() %>%
  ggplot(aes(x = reorder(as.factor(rank), tf_idf), y = tf_idf, label = wordstem, color = craft)) +
  geom_point(size = 3) +
  geom_segment(aes(x = reorder(as.factor(rank), tf_idf), xend = reorder(as.factor(rank), tf_idf), y = 0, yend = tf_idf)) +
  geom_text(aes(y = 0), color = 'black', hjust = 1.1, size = 2.75) +
  facet_grid(craft ~ ., switch = 'y') +
  coord_flip() +
  ylim(c(-0.004, 0.06)) +
  labs(title = 'Most characteristic words in Ravelry craft-specific group names',
       subtitle = 'Stemmed words ranked by tf-idf',
       x = '') +
  # scale_color_kp() is a custom color scale, not part of ggplot2;
  # define your own or remove this line
  scale_color_kp() +
  theme(panel.grid.major.y = element_blank(),
        axis.text.y = element_blank(),
        legend.position = 'none')
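One detail in the ranking step is worth a closer look: breaking ties at random guarantees exactly 10 rows per craft, so every facet has the same number of positions. A small sketch with made-up scores shows the difference from the "min" tie method, which lets tied words share a rank.

```r
library(dplyr)

toy <- tibble(
  wordstem = c("loom", "warp", "heddl", "weav"),
  tf_idf   = c(0.3, 0.2, 0.2, 0.1)   # made-up scores with a tie for second place
)

set.seed(1)  # only affects which of the tied words is kept
top_random <- toy %>%
  mutate(rank = rank(-tf_idf, ties.method = "random")) %>%
  filter(rank <= 2)

top_min <- toy %>%
  mutate(rank = rank(-tf_idf, ties.method = "min")) %>%
  filter(rank <= 2)

nrow(top_random)  # 2 rows: exactly one of the tied words survives
nrow(top_min)     # 3 rows: both tied words share rank 2
```

The trade-off is that with random ties, which of two equally scored words makes the cut can change between runs unless you set a seed.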