Which pattern attributes are associated with higher pattern difficulty? [R + ravelRy]

I just published my first R package, ravelRy! You can read more about it in the introduction post here.

Background image by Paul Hanaoka via Unsplash

This post is part of a series that will explore data pulled from the Ravelry.com API into R using the package. Here, we'll pull a sample of pattern data and explore how pattern tags (like buttonholes, post-stitch or in the round) are associated with pattern difficulty ratings.

The finished product:

Crochet baby hat patterns tagged with preemie have the lowest difficulty ratings, likely because they are so small! There isn't much room for complicated stitching, and fewer stitches overall mean quicker projects.

Hat patterns tagged top-down had difficulty averages on the low side. Starting from the top of the hat lets the crocheter check the fit and add rows as they see fit, while starting from the bottom and working toward a converging point locks you into the row count given by the pattern.

The patterns rated most difficult had the tag phototutorial, which makes sense; authors probably wouldn't include a series of photos with the pattern if it were straightforward, but may include them to help clarify complicated directions.

Patterns tagged textured, post-stitch and ribbed were also rated more difficult. Creating texture or ribbing requires more complex stitches or more complicated sequences of stitches.

See below for step-by-step R code to build this plot.


Installation

First, install the package, either from CRAN or as the development version from GitHub.

# from CRAN
install.packages("ravelRy")

# or the development version from GitHub (requires the devtools package)
devtools::install_github("walkerkq/ravelRy")

Authenticating

To access the API, you'll need a free developer account, which you can create at https://www.ravelry.com/pro/developer. Then create an app with basic read-only authentication to receive a username and password.

You can either set the environment variables RAVELRY_USERNAME and RAVELRY_PASSWORD in your .Renviron file, or set them from the R console using the ravelry_auth function.
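If you'd rather skip the interactive prompts, here is a minimal sketch of the environment-variable route. It assumes only what the paragraph above says, that ravelRy reads RAVELRY_USERNAME and RAVELRY_PASSWORD; the values are placeholders, not real credentials.

```r
# Option 1: set the credentials for the current R session only.
# Placeholder values: substitute the username/password from your Ravelry app.
Sys.setenv(
  RAVELRY_USERNAME = "your-api-username",
  RAVELRY_PASSWORD = "your-api-password"
)

# Option 2: persist them across sessions by adding these two lines
# (no quotes or Sys.setenv needed) to your .Renviron file, then restarting R:
#   RAVELRY_USERNAME=your-api-username
#   RAVELRY_PASSWORD=your-api-password
# usethis::edit_r_environ() opens the user-level .Renviron for editing.

# Check that both variables are set without printing their values
creds_set <- nzchar(Sys.getenv(c("RAVELRY_USERNAME", "RAVELRY_PASSWORD")))
print(creds_set)
```

Keeping credentials in .Renviron also keeps them out of scripts you might share or commit.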

library(ravelRy)

ravelry_auth(key = 'username') # you will be prompted to enter your username
ravelry_auth(key = 'password') # you will be prompted to enter your password

Getting data

Next, search for patterns. My favorite projects are baby hats, so let's get specific and pull the 100 most popular crochet hat patterns for babies.

Photo: a pair of ridiculous turkey hats for my twin niece + nephew

# craft, pc, fit and sort are Ravelry search filters passed through `...`
patterns <- search_patterns(page_size = 100,
                            craft = 'crochet',
                            pc = 'hat',
                            fit = 'baby',
                            sort = 'popularity')

The parameters craft, pc, fit, and sort are not named arguments of the search_patterns function, but they can be passed through its ... argument. You can discover other parameters by applying filters in the Ravelry pattern search and inspecting the resulting URL. For example, the search URL for crochet baby hats, sorted by popularity, is: https://www.ravelry.com/patterns/search#craft=crochet&pc=hat&fit=baby&sort=popularity
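Because the filters sit in the URL fragment as key=value pairs joined by &, you can also turn a search URL straight into arguments. The helper search_url_params() below is hypothetical (it is not part of ravelRy): a minimal base-R sketch that assumes the fragment holds simple key=value pairs with no repeated keys.

```r
# Hypothetical helper (not part of ravelRy): parse the filters in a Ravelry
# search URL fragment into a named list of parameters.
search_url_params <- function(url) {
  fragment <- sub("^[^#]*#", "", url)                  # keep text after '#'
  pairs <- strsplit(fragment, "&", fixed = TRUE)[[1]]  # "craft=crochet", ...
  kv <- strsplit(pairs, "=", fixed = TRUE)
  keys <- vapply(kv, `[`, character(1), 1)
  # re-join anything after the first '=' and decode %-escapes
  values <- vapply(kv, function(p) utils::URLdecode(paste(p[-1], collapse = "=")),
                   character(1))
  as.list(setNames(values, keys))
}

params <- search_url_params(
  "https://www.ravelry.com/patterns/search#craft=crochet&pc=hat&fit=baby&sort=popularity"
)
str(params)

# The list can then be spliced into search_patterns()'s ... (not run here):
# patterns <- do.call(search_patterns, c(list(page_size = 100), params))
```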

Next, retrieve the details for each pattern using the get_patterns function, which returns a data.frame with 50 variables, 11 of which are list-columns of tibbles.

library(dplyr)  # provides the %>% pipe used below

pattern_details <- get_patterns(ids = patterns$id)

pattern_details %>% str(max.level = 1)

'data.frame':	100 obs. of  50 variables:
 $ comments_count          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ created_at              : chr  "2013/03/30 23:47:32 -0400"   ...
 $ currency                : chr  "USD" "" "USD" "USD" ...
 $ difficulty_average      : num  1.67 1.77 0 1.88 1.73 ...
 $ difficulty_count        : chr  "3" "39" "" "40" ...
 $ downloadable            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ favorites_count         : int  92 1267 150 2303 556 1691 1014 124 387 ...
 $ free                    : logi  TRUE TRUE FALSE TRUE TRUE TRUE ...
 $ gauge                   : chr  "" "" "" "4" ...
 $ gauge_divisor           : chr  "4" "4" "4" "1" ...
 $ gauge_pattern           : chr  "dc" "" "" "" ...
 $ generally_available     : chr  "1995/09/01 00:00:00 -0400"   ...
 $ id                      : int  395695 479033 514754 548204 561787 ...
 $ name                    : chr  "Easy Brimmed Beanie Cap For Babies & Children" "Lisha Baby Hats"  ...
 $ pdf_url                 : chr  "" "" "" "" ...
 $ permalink               : chr  "easy-brimmed-beanie-cap-for-babies--children" "lisha-baby-hats"  ...
 $ price                   : chr  "2" "" "5" "" ...
 $ projects_count          : int  31 117 53 139 92 97 77 40 103 54 ...
 $ published               : chr  "1995/09/01" "2014/03/01" "2014/08/01" ...
 $ queued_projects_count   : int  10 137 15 307 68 187 106 24 34 11 ...
 $ rating_average          : num  5 4.67 0 4.39 4.75 ...
 $ rating_count            : chr  "3" "39" "" "38" ...
 $ row_gauge               : chr  "" "" "" "3" ...
 $ updated_at              : chr  "2019/10/21 14:29:22 -0400" "2014/03/19 13:00:52 -0400" "2018/03/21 20:03:00 -0400"  ...
 $ url                     : chr  "" "http://lindacraftycorner.blogspot.co.uk/2014/03/lisha-baby-hat.html" "" ...
 $ yardage                 : chr  "" "" "" "109" ...
 $ yardage_max             : chr  "" "" "" "218" ...
 $ personal_attributes     : chr  "" "" "" "" ...
 $ sizes_available         : chr  "can be made XS/newborn to baby, S/toddler to small child, M/child to tween, L/tween to teen"  ...
 $ product_id              : chr  "149062" "" "228104" "" ...
 $ currency_symbol         : chr  "$" "" "$" "$" ...
 $ ravelry_download        : logi  TRUE FALSE TRUE FALSE FALSE FALSE ...
 $ download_location       : List of 100
 $ pdf_in_library          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ volumes_in_library      : chr  "" "" "" "" ...
 $ gauge_description       : chr  "dc" "" "" "4 stitches and 3 rows = 1 inch" ...
 $ yarn_weight_description : chr  "Worsted (9 wpi)" "DK (11 wpi)" "Worsted (9 wpi)" "Bulky (7 wpi)" ...
 $ yardage_description     : chr  " yards" " yards" " yards" "109 - 218 yards" ...
 $ pattern_needle_sizes    : List of 100
 $ notes_html              : chr  "\n<p>This is a basic beanie hat that I created with just under 4 oz oz of worsted weight"| __truncated__  ...
 $ notes                   : chr  "This is a basic beanie hat that I created with just under 4 oz oz of worsted weight"| __truncated__ ...
 $ packs                   : List of 100
 $ printings               : List of 100
 $ craft                   : List of 100
 $ pattern_categories      : List of 100
 $ pattern_attributes      : List of 100
 $ pattern_author          : List of 100
 $ photos                  : List of 100
 $ pattern_type            : List of 100
 $ yarn_weight             : List of 100
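The entries reading "List of 100" above are the list-columns. If you want their names programmatically, one base-R line does it; below it runs on a small made-up stand-in for pattern_details, and on the real data it should return the 11 list-column names.

```r
# Made-up stand-in for pattern_details: two ordinary columns, one list-column
toy_details <- data.frame(id = c(1L, 2L), name = c("hat A", "hat B"),
                          stringsAsFactors = FALSE)
toy_details$pattern_attributes <- list(
  data.frame(id = 4L, permalink = "baby", stringsAsFactors = FALSE),
  data.frame(id = c(4L, 211L), permalink = c("baby", "top-down"),
             stringsAsFactors = FALSE)
)

# A column is a list-column when is.list() is TRUE for it
list_cols <- names(toy_details)[vapply(toy_details, is.list, logical(1))]
list_cols
```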

For this analysis, we'll focus on how the pattern_attributes are associated with the difficulty_average rating.

Pattern attributes work like tags: each pattern can have several, and they display as green boxes on the pattern's page. Here, for example, is a pattern whose attributes include adult, baby, child, in-the-round, textured, top-down and unisex, which together help describe the pattern.

Description of the La Vie en Rose Earflap Hat

The difficulty score is the average of user ratings of how challenging the pattern was to follow, on a scale from 1 (very easy) to 10 (very difficult).

Ratings for the La Vie en Rose Earflap Hat

Some crocheters prefer working in the round for hats, while others may find it challenging to remember where each row began or to keep an accurate count. Does the presence of the tag in-the-round predict that the pattern will have a higher difficulty rating?

Preparing the data

First, we unnest the pattern_attributes list-column with tidyr's unnest(), which fans out the data.frame so that each pattern gets one row per attribute.

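library(dplyr)
library(tidyr)

# one row per pattern-attribute pair; the nested columns become
# pattern_attributes_id and pattern_attributes_permalink
pattern_details <- pattern_details %>%
  unnest(cols = 'pattern_attributes', names_sep = '_')

pattern_details %>%
  select(id, starts_with('pattern_attributes_')) %>%
  head()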

# A tibble: 6 x 3
      id pattern_attributes_id pattern_attributes_permalink
   <int>                 <int> <chr>                       
1 395695                     4 baby                        
2 395695                     7 toddler                     
3 395695                     8 child                       
4 395695                     9 teen                        
5 395695                   204 one-piece                   
6 395695                   211 top-down 
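Pattern 395695 now appears once per attribute. For intuition about that fan-out, here is a rough base-R equivalent of unnest(cols = 'pattern_attributes', names_sep = '_'). It's a sketch only: unnest_base() is a hypothetical helper, not a tidyr function, it assumes every element of the list-column is a data frame with the same columns, and the toy input is made up.

```r
# Hypothetical helper: base-R approximation of
# tidyr::unnest(df, cols = col, names_sep = "_") for one list-column of data frames
unnest_base <- function(df, col) {
  n <- vapply(df[[col]], nrow, integer(1))            # rows in each nested frame
  parent <- df[rep(seq_len(nrow(df)), times = n),      # repeat each parent row n times
               setdiff(names(df), col), drop = FALSE]
  child <- do.call(rbind, df[[col]])                   # stack the nested frames
  names(child) <- paste(col, names(child), sep = "_") # e.g. pattern_attributes_id
  out <- cbind(parent, child)
  rownames(out) <- NULL
  out
}

# Made-up input: two patterns with nested attribute tables
toy_nested <- data.frame(id = c(1L, 2L))
toy_nested$pattern_attributes <- list(
  data.frame(id = c(4L, 211L), permalink = c("baby", "top-down"),
             stringsAsFactors = FALSE),
  data.frame(id = 204L, permalink = "one-piece", stringsAsFactors = FALSE)
)

unnest_base(toy_nested, "pattern_attributes")
```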

From here we can identify the most common attributes to cut down on noise (and avoid including a predictor that appears only once or twice in the dataset). Since we are working with only 100 patterns, let's restrict attributes to those appearing on at least 10 patterns (10% of the sample).

Some patterns also have low values of difficulty_count. Because difficulty_average is an average of user ratings, a pattern rated by only a few users mostly reflects those few users' opinions. A common rule of thumb is at least 30 observations to support an average, but here that cuts our dataset in half... so we will settle for at least 15 ratings, which leaves us with 86 patterns. 😬
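One catch before filtering: the str() output above shows difficulty_count stored as character ("3", "39", "", "40"), not as a number. Comparing it with 15 makes R coerce 15 to the string "15" and compare alphabetically, so a pattern with only 3 ratings slips through. Unrated patterns show an empty difficulty_count and a difficulty_average of 0, which looks like a placeholder rather than a real "very easy" score. Here is a small illustration using those first four values:

```r
# The first four difficulty_count values shown by str(pattern_details)
difficulty_count <- c("3", "39", "", "40")

# Character vs. number: 15 is coerced to "15" and compared as text,
# so "3" passes because "3" sorts after "1"
as_text <- difficulty_count >= 15

# Converting first gives a numeric comparison; "" becomes NA (unrated)
as_number <- as.numeric(difficulty_count) >= 15

as_text
as_number
```

dplyr::filter() drops rows whose condition is NA, so converting with as.numeric() also removes the unrated patterns.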

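# difficulty_count is stored as character, so convert it before comparing
top_attributes <- pattern_details %>%
  filter(as.numeric(difficulty_count) >= 15) %>%
  count(pattern_attributes_permalink) %>%
  filter(n >= 10)

# keep well-rated patterns, restricted to the common attributes
attributes_long <- pattern_details %>%
  filter(as.numeric(difficulty_count) >= 15,
         pattern_attributes_permalink %in% top_attributes$pattern_attributes_permalink)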
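The density plot answers the in-the-round question visually. For a single tag you can also put a number on it: the mean difficulty of patterns that carry the tag versus those that don't. Below is a minimal base-R sketch; compare_tag() is a hypothetical helper that expects long data with one row per pattern-tag pair (like the unnested pattern_details, before restricting to the top tags), and the toy ratings are made up.

```r
# Hypothetical helper: mean difficulty_average for patterns with vs. without a tag.
# `long` has one row per pattern-tag pair, so each pattern's rating is
# de-duplicated before averaging.
compare_tag <- function(long, tag) {
  per_pattern <- unique(long[, c("id", "difficulty_average")])
  tagged_ids <- unique(long$id[long$pattern_attributes_permalink == tag])
  has_tag <- per_pattern$id %in% tagged_ids
  c(with_tag    = mean(per_pattern$difficulty_average[has_tag]),
    without_tag = mean(per_pattern$difficulty_average[!has_tag]))
}

# Made-up long data: four patterns and their tags
toy_long <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L, 4L),
  pattern_attributes_permalink = c("in-the-round", "textured", "top-down",
                                   "in-the-round", "top-down", "textured"),
  difficulty_average = c(3, 3, 2, 2, 1.5, 4),
  stringsAsFactors = FALSE
)

compare_tag(toy_long, "in-the-round")
```

With only a handful of patterns per tag, treat a gap like this as descriptive rather than as a significance test.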

And now to plot it. Here I'm combining my custom theme, theme_kp(), with further adjustments passed through the theme() function. You can view my custom ggplot2 theme in its repo on GitHub.

library(ggplot2)

attributes_long %>%
  ggplot() +
  geom_density(aes(x = difficulty_average), fill = '#490B32', color = 'grey') +
  # one row per tag, ordered by median difficulty, with labels on the left
  facet_grid(reorder(pattern_attributes_permalink, difficulty_average, median) ~ ., switch = 'y') +
  labs(title = 'Pattern average difficulty rating by pattern tag',
       subtitle = 'For the 100 most popular crochet baby hat patterns',
       y = '', x = 'difficulty_average') +
  theme_kp() +  # my custom theme (see its repo); theme_minimal() works as a stand-in
  theme(panel.spacing = unit(-3, "lines"),  # negative spacing overlaps the rows
        panel.grid.major.y = element_blank(),
        strip.text.y = element_text(angle = 180, vjust = 0),
        strip.background = element_rect(fill = NA, color = NA),
        axis.text.y = element_blank())
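The row order in that plot comes from reorder(pattern_attributes_permalink, difficulty_average, median): it turns the tags into a factor whose levels are sorted by each tag's median difficulty, and facet_grid() stacks rows in level order, so the lowest-median tag sits at the top. A toy illustration with made-up ratings:

```r
# Made-up difficulty ratings for three tags
tags <- c("textured", "textured", "preemie", "preemie", "top-down")
difficulty <- c(4.0, 3.0, 1.2, 1.4, 2.0)

# Levels are sorted by each tag's median difficulty, ascending
ordered_tags <- reorder(tags, difficulty, median)
levels(ordered_tags)
```

Swapping median for mean would order by the average instead; the median is less sensitive to a single outlying pattern.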



