I just published my first R package, ravelRy! You can read more about it in the introduction post here.
This post is part of a series that will explore data pulled from the Ravelry.com API into R using the package. Here, we'll pull a sample of pattern data and explore how pattern tags (like buttonholes, post-stitch or in the round) associate with pattern difficulty ratings.
The finished product:
Crochet baby hat patterns tagged with preemie have the lowest difficulty ratings, likely because they are so small! Not a lot of room for complicated stitching, and fewer stitches overall leads to quicker project times.
top-down hat patterns had difficulty averages on the low side. Starting from the top of the hat allows the crocheter to check in and add rows as they deem necessary, while starting from the bottom and working toward a converging point locks you into the row count given by the pattern.
The patterns rated most difficult had the tag phototutorial, which makes sense; authors probably wouldn't include a series of photos with the pattern if it were straightforward, but may include them to help clarify complicated directions.
textured, post-stitch and ribbed patterns were also rated more difficult. In order to create texture or ribbing, more complex stitches or more complicated series of stitches are required.
See below for step-by-step R code to build this plot.
Installation
First, install the package. You can either install the released version from CRAN or the development version from GitHub.
# released version from CRAN
install.packages("ravelRy")
# or the development version from GitHub
devtools::install_github("walkerkq/ravelRy")
Authenticating
To access the API, you'll need a free developer account, which you can create at https://www.ravelry.com/pro/developer. Then, create an app with basic read-only authentication to receive a username and password.
You can either set the environment variables RAVELRY_USERNAME and RAVELRY_PASSWORD in your .Renviron file, or set them from the R console using the ravelry_auth function.
ravelry_auth(key = 'username') # you will be prompted to enter your username
ravelry_auth(key = 'password') # you will be prompted to enter your password
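If you went the .Renviron route, a quick check like the one below can confirm that the current R session sees both variables. This is only a sketch using base R; it does not call the Ravelry API or verify that the credentials are valid.

```r
# Look up both credentials; Sys.getenv() returns "" for anything unset.
creds <- Sys.getenv(c("RAVELRY_USERNAME", "RAVELRY_PASSWORD"))

# Names of the variables that are still empty in this session.
missing_vars <- names(creds)[creds == ""]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
  message("Both Ravelry credentials are set.")
}
```

Remember that changes to .Renviron only take effect after restarting R.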
Getting data
Next, search for patterns. My favorite projects are baby hats, so let's get specific and pull the 100 most popular crochet hat patterns for babies.
library(ravelRy)
patterns <- search_patterns(page_size = 100,
                            craft = 'crochet',
                            pc = 'hat',
                            fit = 'baby',
                            sort = 'popularity')
The parameters craft, pc, fit, and sort are not built into the search_patterns function, but can be passed through the ... parameter. You can discover new parameters by using the Ravelry pattern search filters and inspecting the URL. For example, the search URL for crochet baby hats, sorted by popularity, is: https://www.ravelry.com/patterns/search#craft=crochet&pc=hat&fit=baby&sort=popularity
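If you copy a search URL from the browser, the filters after the # can be split into a named list with base R. The sketch below is just one way to do it; the final commented-out line, which hands the list to search_patterns via do.call, is an untested assumption about how you might reuse it.

```r
# Search URL copied from the Ravelry pattern search page.
search_url <- "https://www.ravelry.com/patterns/search#craft=crochet&pc=hat&fit=baby&sort=popularity"

# Keep only the part after the "#", then split into key=value pairs.
fragment <- sub("^[^#]*#", "", search_url)
pairs <- strsplit(strsplit(fragment, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Build a named list: names are the keys, elements are the values.
params <- setNames(lapply(pairs, `[`, 2),
                   vapply(pairs, `[`, character(1), 1))
str(params)

# Assumption: the list could then be passed on through the ... parameter, e.g.
# patterns <- do.call(search_patterns, c(list(page_size = 100), params))
```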
Next, retrieve pattern details for each using the get_patterns function, which returns a data.frame with 50 variables, 11 of which are lists of tibbles.
library(dplyr) # provides the %>% pipe
pattern_details <- get_patterns(ids = patterns$id)
pattern_details %>% str(max.level = 1)
'data.frame': 100 obs. of 50 variables:
$ comments_count : int 0 0 0 0 0 0 0 0 0 0 ...
$ created_at : chr "2013/03/30 23:47:32 -0400" ...
$ currency : chr "USD" "" "USD" "USD" ...
$ difficulty_average : num 1.67 1.77 0 1.88 1.73 ...
$ difficulty_count : chr "3" "39" "" "40" ...
$ downloadable : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ favorites_count : int 92 1267 150 2303 556 1691 1014 124 387 ...
$ free : logi TRUE TRUE FALSE TRUE TRUE TRUE ...
$ gauge : chr "" "" "" "4" ...
$ gauge_divisor : chr "4" "4" "4" "1" ...
$ gauge_pattern : chr "dc" "" "" "" ...
$ generally_available : chr "1995/09/01 00:00:00 -0400" ...
$ id : int 395695 479033 514754 548204 561787 ...
$ name : chr "Easy Brimmed Beanie Cap For Babies & Children" "Lisha Baby Hats" ...
$ pdf_url : chr "" "" "" "" ...
$ permalink : chr "easy-brimmed-beanie-cap-for-babies--children" "lisha-baby-hats" ...
$ price : chr "2" "" "5" "" ...
$ projects_count : int 31 117 53 139 92 97 77 40 103 54 ...
$ published : chr "1995/09/01" "2014/03/01" "2014/08/01" ...
$ queued_projects_count : int 10 137 15 307 68 187 106 24 34 11 ...
$ rating_average : num 5 4.67 0 4.39 4.75 ...
$ rating_count : chr "3" "39" "" "38" ...
$ row_gauge : chr "" "" "" "3" ...
$ updated_at : chr "2019/10/21 14:29:22 -0400" "2014/03/19 13:00:52 -0400" "2018/03/21 20:03:00 -0400" ...
$ url : chr "" "http://lindacraftycorner.blogspot.co.uk/2014/03/lisha-baby-hat.html" "" ...
$ yardage : chr "" "" "" "109" ...
$ yardage_max : chr "" "" "" "218" ...
$ personal_attributes : chr "" "" "" "" ...
$ sizes_available : chr "can be made XS/newborn to baby, S/toddler to small child, M/child to tween, L/tween to teen" ...
$ product_id : chr "149062" "" "228104" "" ...
$ currency_symbol : chr "$" "" "$" "$" ...
$ ravelry_download : logi TRUE FALSE TRUE FALSE FALSE FALSE ...
$ download_location : List of 100
$ pdf_in_library : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ volumes_in_library : chr "" "" "" "" ...
$ gauge_description : chr "dc" "" "" "4 stitches and 3 rows = 1 inch" ...
$ yarn_weight_description : chr "Worsted (9 wpi)" "DK (11 wpi)" "Worsted (9 wpi)" "Bulky (7 wpi)" ...
$ yardage_description : chr " yards" " yards" " yards" "109 - 218 yards" ...
$ pattern_needle_sizes : List of 100
$ notes_html : chr "\n<p>This is a basic beanie hat that I created with just under 4 oz oz of worsted weight"| __truncated__ ...
$ notes : chr "This is a basic beanie hat that I created with just under 4 oz oz of worsted weight"| __truncated__ ...
$ packs : List of 100
$ printings : List of 100
$ craft : List of 100
$ pattern_categories : List of 100
$ pattern_attributes : List of 100
$ pattern_author : List of 100
$ photos : List of 100
$ pattern_type : List of 100
$ yarn_weight : List of 100
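Rather than scanning the str() output by eye, you can ask R which columns are list columns. The sketch below uses a tiny made-up data frame (toy_details is hypothetical, not real API output); the same vapply call should work on pattern_details.

```r
# Toy stand-in for pattern_details with one list column.
toy_details <- data.frame(id = c(395695L, 479033L), free = c(TRUE, TRUE))
toy_details$pattern_attributes <- list(c("baby", "top-down"), "baby")

# A column is a list column when is.list() is TRUE for it.
list_columns <- names(toy_details)[vapply(toy_details, is.list, logical(1))]
list_columns
```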
For this analysis, we'll focus on understanding how the pattern_attributes predict the difficulty_average rating.
Pattern attributes are stored like tags for each pattern and display as green boxes on each pattern's page. Here, for example, is a pattern with the attributes adult, baby, child, in-the-round, textured, top-down, unisex, etc., which help describe the pattern.
The difficulty score is an average of ratings that users provide assessing how challenging the pattern was to follow, where 1 is very easy and 10 is very difficult.
Some crocheters prefer working in the round for hats, while others may find it challenging to remember where each row began or to keep an accurate count. Does the presence of the tag in-the-round predict that the pattern will have a higher difficulty rating?
Preparing the data
First, we need to unnest (tidyr) the pattern_attributes list column, which fans out the data.frame by giving each attribute a row.
library(tidyr)
pattern_details <- pattern_details %>%
  unnest(cols = 'pattern_attributes', names_sep = '_')
pattern_details %>%
  select(id, starts_with('pattern_attributes_')) %>%
  head()
# A tibble: 6 x 3
      id pattern_attributes_id pattern_attributes_permalink
   <int>                 <int> <chr>
1 395695                     4 baby
2 395695                     7 toddler
3 395695                     8 child
4 395695                     9 teen
5 395695                   204 one-piece
6 395695                   211 top-down
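To see what unnest is doing on a smaller scale, here is a sketch with a made-up two-pattern tibble (toy is hypothetical; only the column naming mirrors the real data). Each nested row becomes its own row, and names_sep = '_' prefixes the inner column names with the list column's name.

```r
library(tidyr)
library(tibble)

# Two toy patterns, each carrying a nested tibble of attributes.
toy <- tibble(
  id = c(1L, 2L),
  pattern_attributes = list(
    tibble(id = c(4L, 211L), permalink = c("baby", "top-down")),
    tibble(id = 4L, permalink = "baby")
  )
)

# One row per pattern/attribute pair; inner columns gain the
# "pattern_attributes_" prefix so they don't clash with the outer id.
toy_long <- unnest(toy, cols = "pattern_attributes", names_sep = "_")
toy_long
```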
From here we can identify the most popular attributes to cut down on noise (and avoid including a predictor that only appears once or twice in the dataset). Since we are only working with 100 patterns, let's restrict attributes to those appearing in at least 10 observations (10%).
Some patterns also have low values for difficulty_count. Because difficulty_average is an average of user ratings, a low count means the average reflects the opinions of only a few raters. A common rule of thumb is to want about 30 observations behind an average. In this case, that threshold cuts our dataset in half, so we will settle for at least 15 ratings, which leaves us with 86. 😬
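One thing to watch for when filtering on the rating count: the str() output above lists difficulty_count as chr, and R compares a character value to a number as text. The sketch below uses a few made-up counts to show the difference between comparing as strings and comparing after conversion.

```r
# Made-up counts stored as text, including an empty string for "no ratings".
difficulty_count <- c("3", "39", "", "100")

# Compared as strings: "3" sorts after "15" and passes, "100" sorts before it and fails.
as_text <- difficulty_count >= 15

# Compared as numbers: only 39 and 100 pass; the empty string becomes NA.
as_number <- as.integer(difficulty_count) >= 15

data.frame(difficulty_count, as_text, as_number)
```

Inside dplyr's filter, rows where the condition is NA are dropped, so unrated patterns fall out once the column is converted.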
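# difficulty_count is stored as character, so convert it before comparing
rated_patterns <- pattern_details %>%
  filter(as.integer(difficulty_count) >= 15)
# keep attributes that appear on at least 10 of the remaining patterns
top_attributes <- rated_patterns %>%
  count(pattern_attributes_permalink) %>%
  filter(n >= 10)
# data for the plot: well-rated patterns, top attributes only
attributes_plot <- rated_patterns %>%
  filter(pattern_attributes_permalink %in% top_attributes$pattern_attributes_permalink)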
And now to plot it. Here I am using a combination of a custom theme, theme_kp(), with edits passed via the theme() function. You can view my custom ggplot2 theme in its repo on GitHub.
library(ggplot2)
attributes_plot %>%
  ggplot() +
  geom_density(aes(x = difficulty_average), fill = '#490B32', color = 'grey') +
  # one facet row per tag, ordered by median difficulty
  facet_grid(reorder(pattern_attributes_permalink, difficulty_average, median) ~ ., switch = 'y') +
  labs(title = 'Pattern average difficulty rating by pattern tag',
       subtitle = 'For the 100 most popular crochet baby hat patterns',
       y = '', x = 'difficulty_average') +
  theme_kp() +
  theme(panel.spacing = unit(-3, "lines"),
        panel.grid.major.y = element_blank(),
        strip.text.y = element_text(angle = 180, vjust = 0),
        strip.background = element_rect(fill = NA, color = NA),
        axis.text.y = element_blank())
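The facets are ordered by each tag's median difficulty, so it can help to look at those medians as a table alongside the plot. Here is a sketch with made-up numbers (toy_tags is hypothetical data, not results from the API); the same group_by and summarise steps should apply to the plotting data.

```r
library(dplyr)

# Made-up difficulty averages for two tags.
toy_tags <- tibble::tibble(
  pattern_attributes_permalink = c("preemie", "preemie", "ribbed", "ribbed", "ribbed"),
  difficulty_average = c(1.5, 1.9, 2.8, 3.1, 2.4)
)

# Count patterns and take the median difficulty per tag, easiest first.
tag_summary <- toy_tags %>%
  group_by(pattern_attributes_permalink) %>%
  summarise(patterns = n(),
            median_difficulty = median(difficulty_average)) %>%
  arrange(median_difficulty)
tag_summary
```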