Which keywords are most characteristic of knitting vs. crochet? [R + ravelRy]

I just published my first R package, ravelRy! You can read more about it in the introduction post here.

Background image by Paul Hanaoka via Unsplash

This post is part of a series that will explore data pulled from the Ravelry.com API into R using the package. Here, we'll pull a sample of group data and explore which terms in their titles and descriptions are most characteristic of each craft: knitting, crochet, weaving, and spinning.

Ravelry groups are user-created communities where members share and discuss patterns, yarn, projects, techniques, and anything else.

Knitting group Sock Madness Forever

How do group names and descriptions vary across crafts? What do crocheters talk about and how does it differ from knitters or weavers?

The finished product:

The most distinctive or important word for crochet groups is amigurumi, which are small, stuffed crocheted yarn dolls, like this adorable Baby Yoda. The second most important is cal, or CAL, an acronym for crochet-along groups, where users all work on the same pattern at the same time. The same acronym shows up under knitting as KAL for knit-along. Other distinctive words are CGOA (the Crochet Guild of America), Tunisian (a style of crochet that borrows elements from knitting), test, hook, and block.

For knitting groups, design, KAL (knit-along), test, sock, needle, and podcast are the most distinctive, as is stricken, German for "to knit," a sign that Ravelry has users around the globe. For example, Sockenstrickereien is a sock-related knitting group where the primary language of conversation is German.

Spinning groups were defined by words like wheel, spindle, spinner, handspin, and wheel-specific words like louet and lendrum, while weaving groups were defined strongly by the word loom, as well as loom-related words like heddle and warp and the brand Ashford.

See below for step-by-step R code to build this plot.


Installation

First, install the package, either the released version from CRAN or the development version from GitHub.

install.packages("ravelRy") # released version from CRAN

devtools::install_github("walkerkq/ravelRy") # development version from GitHub

Authenticating

To access the API, you'll need a free developer account, which you can create at https://www.ravelry.com/pro/developer. Then, create an app with basic read-only authentication to receive a username and password.

You can either set the environment variables RAVELRY_USERNAME and RAVELRY_PASSWORD in your .Renviron file or set them from the R console with the ravelry_auth function.

library(ravelRy)

ravelry_auth(key = 'username') # you will be prompted to enter your username
ravelry_auth(key = 'password') # you will be prompted to enter your password
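If you go the environment-variable route instead, the two variables can also be set for the current session only. This is a minimal sketch that assumes ravelRy reads RAVELRY_USERNAME and RAVELRY_PASSWORD from the environment when it makes a request (as the .Renviron option implies); has_ravelry_credentials() is a hypothetical helper, not part of the package.

```r
# Set credentials for this R session only (placeholder values, not real ones).
# Assumption: ravelRy looks these up with Sys.getenv() when it calls the API.
Sys.setenv(
  RAVELRY_USERNAME = "your_username",
  RAVELRY_PASSWORD = "your_password"
)

# Hypothetical helper: TRUE only if both variables are set and non-empty
has_ravelry_credentials <- function() {
  vals <- Sys.getenv(c("RAVELRY_USERNAME", "RAVELRY_PASSWORD"))
  all(nzchar(vals))
}

has_ravelry_credentials()
```

Sys.setenv() only lasts for the session. The .Renviron file is read at startup, so edits there take effect after restarting R.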

Getting data

Next, search for groups using the search_groups function. Here, I've defined a small wrapper function that runs one query for each of the four craft categories (knitting, crochet, spinning, and weaving) and labels each result with its craft. The resulting data frame, groups, holds mostly knitting groups.

library(dplyr)

search_groups_by_craft <- function(x){
  search_groups(page_size = 2000,
                `group-crafts` = x,
                `group-active` = 'yes') %>% 
    mutate(craft = x) # tag each result with the craft that was queried
}

crafts <- c('knitting', 'crochet', 'spinning', 'weaving')

groups <- lapply(crafts, search_groups_by_craft) %>%
  bind_rows() # combine the list of data frames into one data frame

groups %>% count(craft)

# A tibble: 4 x 2
  craft        n
  <chr>    <int>
1 crochet    166
2 knitting  1340
3 spinning   202
4 weaving     58
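Since the real calls need API credentials, here is an offline sketch of the same query, tag, and stack pattern in base R. fake_search_groups() is a hypothetical stand-in for search_groups() that returns a small made-up data frame, and do.call(rbind, ...) plays the role of bind_rows().

```r
# Hypothetical stand-in for search_groups(): made-up rows, no API call
fake_search_groups <- function(craft, n) {
  data.frame(id = seq_len(n), name = paste(craft, "group", seq_len(n)))
}

# Query one craft and tag every row with the craft that was queried
search_by_craft_sketch <- function(craft, n) {
  res <- fake_search_groups(craft, n)
  res$craft <- craft
  res
}

crafts <- c("knitting", "crochet", "spinning", "weaving")
sizes  <- c(4, 2, 3, 1)  # made-up result counts per craft

# One data frame per craft, stacked into a single data frame
groups_sketch <- do.call(rbind, unname(Map(search_by_craft_sketch, crafts, sizes)))

table(groups_sketch$craft)
```

Tagging inside the per-craft function matters: once the results are stacked, the craft column is the only record of which query a row came from.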

The search parameters I used in search_groups are page_size, to return up to 2000 results for each query; group-crafts, to filter by craft name; and group-active, to restrict the results to groups that have had a post in the last 30 days. group-crafts and group-active are not built-in arguments of the function; they are passed through its ... argument instead, and they are wrapped in backticks because R names containing hyphens are non-syntactic.
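To illustrate how arguments passed through ... can end up as query parameters, here is a minimal sketch. build_query_sketch() is hypothetical, and its defaults are made up; ravelRy's actual internals may differ. It only shows that hyphenated names written in backticks survive the trip through ... intact.

```r
# Hypothetical sketch: combine fixed arguments with anything passed via ...
# into a URL query string. Default values here are illustrative only.
build_query_sketch <- function(page = 1, page_size = 48, ...) {
  params <- c(list(page = page, page_size = page_size), list(...))
  # name=value pairs joined with '&' (no URL-encoding, for simplicity)
  paste(names(params), unlist(params), sep = "=", collapse = "&")
}

build_query_sketch(page_size = 2000,
                   `group-crafts` = "knitting",
                   `group-active` = "yes")
```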

You can discover new parameters by using the Ravelry group search filters and inspecting the URL. For example, the search URL for active "stitch-n-bitch" knitting groups is: https://www.ravelry.com/groups/search#group-crafts=knitting&group-type=stitch-n-bitch&group-active=yes
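Going the other way, the filters in such a URL can be turned into arguments programmatically. url_to_params() below is a hypothetical helper (not part of ravelRy) that splits the part of the URL after # into a named list. The commented-out do.call() line shows how it could feed search_groups(), assuming every filter is accepted through ....

```r
# Hypothetical helper: turn the '#key=value&key=value' part of a Ravelry
# search URL into a named list of parameters.
url_to_params <- function(url) {
  fragment <- sub("^[^#]*#", "", url)                     # keep text after '#'
  pairs <- strsplit(strsplit(fragment, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- vapply(pairs, `[`, character(1), 2)           # right side of '='
  names(values) <- vapply(pairs, `[`, character(1), 1)    # left side of '='
  as.list(values)
}

url <- "https://www.ravelry.com/groups/search#group-crafts=knitting&group-type=stitch-n-bitch&group-active=yes"
params <- url_to_params(url)
str(params)

# With credentials set, the list could be spliced into a search:
# do.call(search_groups, c(list(page_size = 2000), params))
```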

Next, use tidytext to unnest the words (tokens) from the description and name fields.

library(tidytext)

groups_tidy_desc <- groups %>%
  select(id, craft, short_description) %>%
  unnest_tokens(output = 'word', input = 'short_description') 

groups_tidy <- groups %>%
  select(id, craft, name) %>% 
  unnest_tokens(output = 'word', input = 'name') %>%
  rbind(groups_tidy_desc) %>%
  anti_join(stop_words)  # remove words like `the`, `and`
  
head(groups_tidy)

# A tibble: 6 x 3
     id craft    word     
  <int> <chr>    <chr>    
1     6 knitting sock     
2     6 knitting knitters 
3     6 knitting anonymous
4     6 knitting knitters 
5     6 knitting stop     
6     6 knitting knitting 
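To see roughly what unnest_tokens() and the stop-word anti-join do, here is a simplified base-R approximation: lowercase the text, split on anything that isn't a letter, digit, or apostrophe, and drop stop words. It is an approximation only. tidytext's default tokenizer handles punctuation and non-ASCII letters (such as those in German group names) more carefully, and the three-word stop list below is a made-up stand-in for tidytext's stop_words.

```r
# Simplified stand-in for unnest_tokens() followed by anti_join(stop_words)
tokenize_sketch <- function(text, stop = c("the", "and", "for")) {
  words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  words <- words[nzchar(words)]   # drop empty strings left by leading separators
  words[!words %in% stop]         # remove stop words
}

tokenize_sketch("Sock Knitters Anonymous")
tokenize_sketch("The home for knitters and spinners!")
```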

To clean things up, we can also stem the words using the SnowballC package to combine similar concepts like play and playing or knitter and knitters.

library(SnowballC)

groups_tidy_stemmed <- groups_tidy %>%
  mutate(wordstem = wordStem(word)) # reduce each word to its stem

head(groups_tidy_stemmed)

# A tibble: 6 x 4
     id craft    word      wordstem
  <int> <chr>    <chr>     <chr>   
1     6 knitting sock      sock    
2     6 knitting knitters  knitter 
3     6 knitting anonymous anonym  
4     6 knitting knitters  knitter 
5     6 knitting stop      stop    
6     6 knitting knitting  knit  
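A quick way to check what stemming merges is to stem a handful of words directly. This uses SnowballC's wordStem() with its default Porter stemmer. knitters and knitting stem to knitter and knit, matching the output above, and the example also shows that knit and knitter remain separate stems.

```r
library(SnowballC)

words <- c("knit", "knitting", "knitter", "knitters", "sock", "socks")
stems <- wordStem(words)   # default language is Porter's English stemmer

# How many surface forms collapse into each stem
table(stems)
```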

Next, count the number of times each word stem occurs in each craft's group names and descriptions, then calculate the term frequency-inverse document frequency (tf-idf) of each craft-word pair. tf-idf measures how important a word is to a document relative to a collection of documents; here, the combined text of each craft's groups is treated as a separate document.

groups_tidy_counted <- groups_tidy_stemmed %>%
  count(craft, wordstem) %>%
  bind_tf_idf(term = wordstem, document = craft, n = n) 
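For intuition, here is what bind_tf_idf() computes, written out in base R on a made-up two-document example. tf is a term's count divided by the total count in its document, idf is the natural log of the number of documents divided by the number of documents containing the term, and tf_idf is their product. A stem that appears in every craft's text therefore gets an idf, and a tf-idf, of zero, however common it is.

```r
# Made-up counts: two "documents" (crafts), one row per craft-stem pair,
# the same shape count(craft, wordstem) produces
counts <- data.frame(
  craft    = c("crochet", "crochet", "knitting", "knitting"),
  wordstem = c("amigurumi", "pattern", "pattern", "sock"),
  n        = c(2, 2, 3, 1)
)

tf_idf_sketch <- function(df) {
  # term frequency: the term's share of all words in its document
  df$tf <- df$n / ave(df$n, df$craft, FUN = sum)
  # inverse document frequency: log(total documents / documents containing term);
  # counting rows per stem works because each craft-stem pair appears once
  n_docs <- length(unique(df$craft))
  docs_with_term <- ave(df$n, df$wordstem, FUN = length)
  df$idf <- log(n_docs / docs_with_term)
  df$tf_idf <- df$tf * df$idf
  df
}

tf_idf_sketch(counts)
```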

Finally, restrict the dataset to the top 10 words by tf-idf for each craft and plot!

library(ggplot2)

groups_tidy_counted %>%
  # keep only word stems appearing more than 5 times
  filter(n > 5) %>%
  group_by(craft) %>%
  mutate(rank = rank(-tf_idf, ties.method = 'random')) %>% 
  filter(rank <= 10) %>%
  ungroup() %>% 
  ggplot(aes(x = reorder(as.factor(rank), tf_idf), y = tf_idf, label = wordstem, color = craft)) +
  geom_point(size = 3) + 
  geom_segment(aes(x = reorder(as.factor(rank), tf_idf), xend = reorder(as.factor(rank), tf_idf), y = 0, yend = tf_idf)) +
  geom_text(aes(y = 0), color = 'black', hjust = 1.1, size = 2.75) +
  facet_grid(craft ~ ., switch = 'y') +
  coord_flip() +
  ylim(c(-0.004, 0.06)) +
  labs(title = 'Most characteristic words in Ravelry craft-specific group names', 
       subtitle = 'Stemmed words ranked by tf-idf',
       x = '') +
  # the original used a custom palette, scale_color_kp(), which isn't defined here
  scale_color_brewer(palette = 'Dark2') +
  theme(panel.grid.major.y = element_blank(),
        axis.text.y = element_blank(),
        legend.position = 'none')
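The filter-and-rank step at the top of that pipeline can be sketched in base R on made-up numbers, which makes two details easy to check: filter(n > 5) drops stems that appear exactly 5 times, and ranking -tf_idf within each craft puts the highest tf-idf at rank 1. top_k_sketch() is a hypothetical helper, and it uses ties.method = "first" so the result is reproducible, whereas the pipeline above breaks ties at random.

```r
# Hypothetical helper: keep stems seen more than min_n times, then the
# k highest tf-idf stems within each craft.
top_k_sketch <- function(df, k = 10, min_n = 5) {
  df <- df[df$n > min_n, ]
  # rank within craft; negating tf_idf puts the largest value at rank 1
  df$rank <- ave(-df$tf_idf, df$craft,
                 FUN = function(x) rank(x, ties.method = "first"))
  df <- df[df$rank <= k, ]
  df[order(df$craft, df$rank), ]
}

# Made-up numbers, for illustration only
scores <- data.frame(
  craft    = c("crochet", "crochet", "crochet", "weaving", "weaving"),
  wordstem = c("amigurumi", "cal", "hook", "loom", "warp"),
  n        = c(40, 5, 12, 30, 8),
  tf_idf   = c(0.050, 0.040, 0.010, 0.060, 0.020)
)

top_k_sketch(scores, k = 1)
```

Here cal is dropped before ranking despite its high tf-idf, because it appears exactly 5 times.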



